As an alternative to Streaming jobs, librsync provides a "pull"-mode interface where it will repeatedly call application-provided callbacks to get more input data and to accept output data.
In this mode, rather than calling rs_job_iter(), the application calls rs_job_drive(), passing an input and an output callback. rs_job_drive() takes an opaque pointer for each of the two callbacks: this could be a FILE* or some similar object telling the callback what to read or write.
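As a rough illustration of the opaque-pointer idea, the sketch below shows an input callback that receives a FILE*-carrying object and refills a buffer from it. The types demo_buffers, demo_filebuf and demo_result, and the callback signature, are hypothetical stand-ins invented for this example; they are not librsync's real declarations, which should be taken from librsync.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; NOT librsync's definitions. */
typedef enum { DEMO_DONE = 0, DEMO_BLOCKED = 1, DEMO_IO_ERROR = 100 } demo_result;

typedef struct {
    char  *next_in;   /* start of unconsumed input */
    size_t avail_in;  /* number of unconsumed input bytes */
    int    eof_in;    /* nonzero once the source is exhausted */
} demo_buffers;

/* The object handed over as the opaque pointer: a FILE* plus scratch space. */
typedef struct {
    FILE *f;
    char  buf[4096];
} demo_filebuf;

/* Input callback: refill the buffer from the FILE* found behind `opaque`. */
static demo_result demo_in_cb(demo_buffers *b, void *opaque)
{
    demo_filebuf *fb = opaque;
    size_t n;

    assert(b != NULL && fb != NULL && fb->f != NULL);
    if (b->eof_in || b->avail_in > 0)
        return DEMO_DONE;           /* earlier data not consumed yet */

    n = fread(fb->buf, 1, sizeof fb->buf, fb->f);
    if (n == 0) {
        if (ferror(fb->f))
            return DEMO_IO_ERROR;
        b->eof_in = 1;              /* clean end of file */
        return DEMO_DONE;
    }
    b->next_in = fb->buf;
    b->avail_in = n;
    return DEMO_DONE;
}
```

The callback never needs to know where the FILE* came from; the application decides that when it passes the opaque pointer.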
The librsync interface allows nonblocking streaming processing of data. This means that the library will accept input and produce output when it suits the application. If nonblocking file IO is used and the IO callbacks support it, then librsync will never block waiting for IO.
Normally callbacks will read or write the whole buffer when they're called, but in some cases they might not be able to process all of it, or perhaps not process any at all. This might happen if the callbacks are connected to a nonblocking socket. Either of two things can happen in this case: the callback processes only part of the data, or it returns RS_BLOCKED, in which case rs_job_iter() will also return RS_BLOCKED shortly.
When an IO callback blocks, it is the responsibility of the application to work out when it will be able to make progress and therefore when it is worth calling rs_job_iter() again. Typically this involves a mechanism like select() to wait for the file descriptor to be ready.
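A minimal sketch of that waiting step, assuming a POSIX system, might look like the following. The helper name wait_readable and its millisecond timeout are assumptions made for this example; they are not part of librsync.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until `fd` has data to read.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
static int wait_readable(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int r;

    assert(fd >= 0 && fd < FD_SETSIZE);   /* select() cannot watch larger fds */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    r = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return (r > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

An application would typically call something like this after seeing RS_BLOCKED, and only call rs_job_iter() again once the descriptor is reported ready.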
The IO callbacks are allowed to block. This will of course mean that the application's call to rs_job_drive() will also block.
IO callbacks are also allowed to process or provide only part of the requested data, as will commonly happen with socket IO.
The library might not get as much input as it wanted when it is first called. If it gets a partial read, it needs to hold onto that valuable and irreplaceable data.
It cannot keep the data on the stack, because it would be lost if the read blocks and the call returns. The data needs to be kept in the job structure, or somewhere referenced from it.
The state function probably cannot proceed until it has all the needed input. So possibly this can be expressed at a high level of the job structure. Or perhaps it should just be done by each particular state function.
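One way to picture this is a small accumulation buffer stored in the job, filled across calls until the state function has everything it needs. The sketch below is only an illustration under that assumption: demo_in_job, demo_scoop_fill and the 64-byte limit are hypothetical names and values, not the library's actual internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_SCOOP_MAX 64   /* assumed upper bound on a header or checksum */

/* Hypothetical job structure holding partial input between calls. */
typedef struct {
    char   scoop[DEMO_SCOOP_MAX];
    size_t scoop_len;           /* bytes saved so far */
} demo_in_job;

/* Move input into the job until `need` bytes are held.
 * Advances *in and reduces *avail by the amount consumed.
 * Returns 1 when the state function may proceed, 0 if more input is needed. */
static int demo_scoop_fill(demo_in_job *job, const char **in, size_t *avail,
                           size_t need)
{
    assert(need <= DEMO_SCOOP_MAX);
    if (job->scoop_len < need) {
        size_t want = need - job->scoop_len;
        size_t take = want < *avail ? want : *avail;

        memcpy(job->scoop + job->scoop_len, *in, take);
        job->scoop_len += take;
        *in += take;
        *avail -= take;
    }
    return job->scoop_len >= need;
}
```

Because the saved bytes live in the job rather than in a local variable, a short read followed by a return to the caller loses nothing; the next call simply continues filling.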
When the library has output to write out, the callback might not be able to accept all of it at the time it is called. Deferred outgoing data needs to be stored in a buffer referenced from the job structure.
I think it's always OK to try to flush this buffer when entering rs_job_iter(). I think it's also OK to not do anything else until all the outgoing data has been flushed.
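The flush-first policy could be sketched roughly as follows. Everything here is hypothetical and exists only to make the idea concrete: demo_out_job, demo_write_fn, demo_flush, demo_job_step and the budget-limited demo_sink are invented for this example and do not describe the library's real structures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { DEMO_BLOCKED = 1, DEMO_RUNNING = 2 } demo_status;

/* Hypothetical output callback: returns how many bytes it accepted,
 * which may be fewer than `len`, or 0 if it cannot take any right now. */
typedef size_t demo_write_fn(const char *data, size_t len, void *opaque);

/* Hypothetical job structure holding deferred outgoing data. */
typedef struct {
    char   out[256];
    size_t out_pos;   /* first byte not yet accepted by the callback */
    size_t out_len;   /* end of valid data in `out` */
} demo_out_job;

/* Try to drain deferred output. Returns 1 when empty, 0 if data remains. */
static int demo_flush(demo_out_job *job, demo_write_fn *write_cb, void *opaque)
{
    while (job->out_pos < job->out_len) {
        size_t n = write_cb(job->out + job->out_pos,
                            job->out_len - job->out_pos, opaque);
        if (n == 0)
            return 0;               /* sink cannot accept more for now */
        job->out_pos += n;
    }
    job->out_pos = job->out_len = 0;
    return 1;
}

/* One step of the job: flush first, and do nothing else until that succeeds. */
static demo_status demo_job_step(demo_out_job *job, demo_write_fn *write_cb,
                                 void *opaque)
{
    if (!demo_flush(job, write_cb, opaque))
        return DEMO_BLOCKED;
    /* ...the state function would run here and may queue new output... */
    return DEMO_RUNNING;
}

/* Example sink that accepts at most `budget` more bytes, like a full socket. */
typedef struct { char data[256]; size_t len; size_t budget; } demo_sink;

static size_t demo_sink_write(const char *d, size_t len, void *opaque)
{
    demo_sink *s = opaque;
    size_t room = sizeof s->data - s->len;
    size_t n = len < s->budget ? len : s->budget;

    if (n > room)
        n = room;
    memcpy(s->data + s->len, d, n);
    s->len += n;
    s->budget -= n;
    return n;
}
```

Refusing to run the state function while output is pending keeps the amount of deferred data bounded by a single buffer.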
In many cases we would like to pass a pointer into the input (or pread) buffer straight to the output callback. In other cases, we need a different buffer to build up literal outgoing data.
librsync deals with short, bounded-size headers and checksums, and with arbitrarily-large streaming data. Although the commands are of bounded size, they are not of fixed size, because there are different encodings to suit different situations.
The situation is very similar to fetching variable-length headers from a socket. We cannot read the whole command in a single read, because we don't know how long it is until we have seen its first byte. As a general principle I think we should not read in too much data and buffer it, because this complicates things. Therefore we need to read the type byte first, and then possibly read some parameters.
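The type-byte-first approach can be sketched as a small incremental reader. The encoding used here is made up for the example (the low two bits of the type byte select a parameter width of 1, 2, 4 or 8 bytes) and should not be read as librsync's actual command format; demo_cmd_reader and demo_cmd_feed are likewise hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state, kept between calls so short reads are safe. */
typedef struct {
    int      have_type;   /* nonzero once the type byte has been read */
    uint8_t  type;
    size_t   param_len;   /* parameter bytes this command needs */
    size_t   param_got;   /* parameter bytes read so far */
    uint64_t param;       /* parameter value, accumulated big-endian */
} demo_cmd_reader;

/* Made-up rule: low two bits of the type give the width 1, 2, 4 or 8. */
static size_t demo_param_len(uint8_t type)
{
    return (size_t)1 << (type & 3);
}

/* Feed available input. Consumes only bytes of the current command and
 * returns how many were used; *done is set to 1 once it is complete. */
static size_t demo_cmd_feed(demo_cmd_reader *r, const uint8_t *in,
                            size_t avail, int *done)
{
    size_t used = 0;

    *done = 0;
    if (!r->have_type) {
        if (avail == 0)
            return 0;
        r->type = in[used++];                 /* the type byte comes first */
        r->param_len = demo_param_len(r->type);
        r->param_got = 0;
        r->param = 0;
        r->have_type = 1;
    }
    while (r->param_got < r->param_len && used < avail) {
        r->param = (r->param << 8) | in[used++];
        r->param_got++;
    }
    if (r->param_got == r->param_len) {
        *done = 1;
        r->have_type = 0;                     /* ready for the next command */
    }
    return used;
}
```

The reader never takes bytes beyond the current command, so whatever follows stays in the caller's buffer and nothing extra has to be buffered internally.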