librsync  2.3.4
Macros | Functions
delta.c File Reference

Generate in streaming mode an rsync delta given a set of signatures, and a new file. More...

Include dependency graph for delta.c:

Go to the source code of this file.

Macros

#define MAX_MISS_LEN   (MAX_DELTA_CMD - 3)
 Max length of a miss is 64K including 3 command bytes. More...
 

Functions

static rs_result rs_delta_s_scan (rs_job_t *job)
 Get a block of data if possible, and see if it matches. More...
 
static rs_result rs_delta_s_flush (rs_job_t *job)
 
static rs_result rs_delta_s_end (rs_job_t *job)
 
static rs_result rs_getinput (rs_job_t *job, size_t block_len)
 
static int rs_findmatch (rs_job_t *job, rs_long_t *match_pos, size_t *match_len)
 find a match at scan_pos, returning the match_pos and match_len. More...
 
static rs_result rs_appendmatch (rs_job_t *job, rs_long_t match_pos, size_t match_len)
 Append a match at match_pos of length match_len to the delta, extending a previous match if possible, or flushing any previous miss/match. More...
 
static rs_result rs_appendmiss (rs_job_t *job, size_t miss_len)
 Append a miss of length miss_len to the delta, extending a previous miss if possible, or flushing any previous match. More...
 
static rs_result rs_appendflush (rs_job_t *job)
 Flush any accumulating hit or miss, appending it to the delta. More...
 
static rs_result rs_processmatch (rs_job_t *job)
 Process matching data in the scoop. More...
 
static rs_result rs_processmiss (rs_job_t *job)
 Process miss data in the scoop. More...
 
static rs_result rs_delta_s_slack (rs_job_t *job)
 State function that does a slack delta containing only literal data to recreate the input. More...
 
static rs_result rs_delta_s_header (rs_job_t *job)
 State function for writing out the header of the encoding job. More...
 
rs_job_trs_delta_begin (rs_signature_t *sig)
 Prepare to compute a streaming delta. More...
 

Detailed Description

Generate in streaming mode an rsync delta given a set of signatures, and a new file.

The size of blocks for signature generation is determined by the block size in the incoming signature.

To calculate a signature, we need to be able to see at least one block of the new file at a time. Once we have that, we calculate its weak signature, and see if there is any block in the signature hash table that has the same weak sum. If there is one, then we also compute the strong sum of the new block, and cross check that. If they're the same, then we can assume we have a match.

The final block of the file has to be handled a little differently, because it may be a short match. Short blocks in the signature don't include their length – we just allow for the final short block of the file to match any block in the signature, and if they have the same checksum we assume they must have the same length. Therefore, when we emit a COPY command, we have to send it with a length that is the same as the block matched, and not the block length from the signature.

Profiling results as of v1.26, 2001-03-18:

If everything matches, then we spend almost all our time in rs_mdfour64 and rs_weak_sum, which is unavoidable and therefore a good profile.

If nothing matches, it is not so good.

2002-06-26: Donovan Baarda

The following is based entirely on pysync. It is much cleaner than the previous incarnation of this code. It is slightly complicated because in this case the output can block, so the main delta loop needs to stop when this happens.

In pysync a 'last' attribute is used to hold the last miss or match for extending if possible. In this code, basis_len and scan_pos are used instead of 'last'. When basis_len > 0, last is a match. When basis_len = 0 and scan_pos is > 0, last is a miss. When both are 0, last is None (ie, nothing).

Pysync is also slightly different in that a 'flush' method is available to force output of accumulated data. This 'flush' is use to finalise delta calculation. In librsync input is terminated with an eof flag on the input stream. I have structured this code similar to pysync with a seperate flush function that is used when eof is reached. This allows for a flush style API if one is ever needed. Note that flush in pysync can be used for more than just terminating delta calculation, so a flush based API can in some ways be more flexible...

The input data is first scanned, then processed. Scanning identifies input data as misses or matches, and emits the instruction stream. Processing the data consumes it off the input scoop and outputs the processed miss data into the tube.

The scoop contains all data yet to be processed. The scan_pos is an index into the scoop that indicates the point scanned to. As data is scanned, scan_pos is incremented. As data is processed, it is removed from the scoop and scan_pos adjusted. Everything gets complicated because the tube can block. When the tube is blocked, no data can be processed.

Definition in file delta.c.

Macro Definition Documentation

◆ MAX_MISS_LEN

#define MAX_MISS_LEN   (MAX_DELTA_CMD - 3)

Max length of a miss is 64K including 3 command bytes.

Definition at line 103 of file delta.c.

Function Documentation

◆ rs_delta_s_scan()

static rs_result rs_delta_s_scan ( rs_job_t job)
static

Get a block of data if possible, and see if it matches.

On each call, we try to process all of the input data available on the scoop and input buffer.

Definition at line 122 of file delta.c.

◆ rs_delta_s_flush()

static rs_result rs_delta_s_flush ( rs_job_t job)
static

Definition at line 164 of file delta.c.

◆ rs_delta_s_end()

static rs_result rs_delta_s_end ( rs_job_t job)
static

Definition at line 204 of file delta.c.

◆ rs_getinput()

static rs_result rs_getinput ( rs_job_t job,
size_t  block_len 
)
inlinestatic

Definition at line 210 of file delta.c.

◆ rs_findmatch()

static int rs_findmatch ( rs_job_t job,
rs_long_t *  match_pos,
size_t *  match_len 
)
inlinestatic

find a match at scan_pos, returning the match_pos and match_len.

Note that this will calculate weak_sum if required. It will also determine the match_len.

This routine could be modified to do xdelta style matches that would extend matches past block boundaries by matching backwards and forwards beyond the block boundaries. Extending backwards would require decrementing scan_pos as appropriate.

Definition at line 229 of file delta.c.

◆ rs_appendmatch()

static rs_result rs_appendmatch ( rs_job_t job,
rs_long_t  match_pos,
size_t  match_len 
)
inlinestatic

Append a match at match_pos of length match_len to the delta, extending a previous match if possible, or flushing any previous miss/match.

Definition at line 258 of file delta.c.

◆ rs_appendmiss()

static rs_result rs_appendmiss ( rs_job_t job,
size_t  miss_len 
)
inlinestatic

Append a miss of length miss_len to the delta, extending a previous miss if possible, or flushing any previous match.

This also breaks misses up into 32KB segments to avoid accumulating too much in memory.

Definition at line 288 of file delta.c.

◆ rs_appendflush()

static rs_result rs_appendflush ( rs_job_t job)
inlinestatic

Flush any accumulating hit or miss, appending it to the delta.

Definition at line 302 of file delta.c.

◆ rs_processmatch()

static rs_result rs_processmatch ( rs_job_t job)
inlinestatic

Process matching data in the scoop.

The scoop contains match data at scan_buf of length scan_pos. This function processes that match data, returning RS_DONE if it completes, or RS_BLOCKED if it gets blocked. After it completes scan_pos is reset to still point at the next unscanned data.

This function currently just removes data from the scoop and adjusts scan_pos appropriately. In the future this could be used for something like context compressing of miss data. Note that it also calls rs_tube_catchup to output any pending output.

Definition at line 332 of file delta.c.

◆ rs_processmiss()

static rs_result rs_processmiss ( rs_job_t job)
inlinestatic

Process miss data in the scoop.

The scoop contains miss data at scan_buf of length scan_pos. This function processes that miss data, returning RS_DONE if it completes, or RS_BLOCKED if it gets blocked. After it completes scan_pos is reset to still point at the next unscanned data.

This function uses rs_tube_copy to queue copying from the scoop into output. and uses rs_tube_catchup to do the copying. This automaticly removes data from the scoop, but this can block. While rs_tube_catchup is blocked, scan_pos does not point at legit data, so scanning can also not proceed.

In the future this could do compression of miss data before outputing it.

Definition at line 355 of file delta.c.

◆ rs_delta_s_slack()

static rs_result rs_delta_s_slack ( rs_job_t job)
static

State function that does a slack delta containing only literal data to recreate the input.

Definition at line 367 of file delta.c.

◆ rs_delta_s_header()

static rs_result rs_delta_s_header ( rs_job_t job)
static

State function for writing out the header of the encoding job.

Definition at line 384 of file delta.c.

◆ rs_delta_begin()

rs_job_t * rs_delta_begin ( rs_signature_t sig)

Prepare to compute a streaming delta.

Todo:
Add a version of this that takes a rs_magic_number controlling the delta format.

Definition at line 396 of file delta.c.