Martin Pool, 15 July 2004
I have been working on a major rewrite of librsync. I hope to have it functionally complete in the next one or two months. The overall goal is to simplify both the API and the internals.
The librsync library does network delta-compression of streams and files. The algorithm is similar to that used in the rsync and xdelta programs. Unlike diff and xdelta, librsync does not require access to both of the files on the same machine, but rather only a short signature of the old file and the complete contents of the new file.
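To make the idea of a "short signature" concrete, here is a minimal sketch of the textbook rsync-style weak rolling checksum. It is not librsync's code: the names weak_sum, weak_init, weak_roll, weak_digest and find_match are hypothetical, and librsync's own checksum may differ in its details. The side holding the new file scans for a window whose checksum matches one taken from a block of the old file, without ever seeing the old file itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is an illustration, not librsync's API. */
typedef struct {
    uint32_t a;    /* sum of the bytes in the window, mod 2^16 */
    uint32_t b;    /* sum of the running sums, mod 2^16 */
    size_t len;    /* window (block) length */
} weak_sum;

/* Compute the checksum of buf[0..len) from scratch. */
void weak_init(weak_sum *w, const unsigned char *buf, size_t len)
{
    w->a = 0;
    w->b = 0;
    w->len = len;
    for (size_t i = 0; i < len; i++) {
        w->a += buf[i];
        w->b += w->a;
    }
    w->a &= 0xffff;
    w->b &= 0xffff;
}

/* Slide the window one byte: drop `out` at the front, add `in` at the end. */
void weak_roll(weak_sum *w, unsigned char out, unsigned char in)
{
    w->a = (w->a - out + in) & 0xffff;
    w->b = (w->b - (uint32_t)w->len * out + w->a) & 0xffff;
}

uint32_t weak_digest(const weak_sum *w)
{
    return w->a | (w->b << 16);
}

/* Return the first offset in data[0..n) whose block_len-byte window has
 * the given weak checksum, or -1 if there is none. */
long find_match(const unsigned char *data, size_t n, size_t block_len,
                uint32_t target)
{
    weak_sum w;

    if (block_len == 0 || n < block_len)
        return -1;
    weak_init(&w, data, block_len);
    for (size_t off = 0; ; off++) {
        if (weak_digest(&w) == target)
            return (long)off;
        if (off + block_len >= n)
            return -1;
        weak_roll(&w, data[off], data[off + block_len]);
    }
}
```

A weak checksum can collide, so a real implementation confirms each candidate match with a strong hash of the block before trusting it. That step is left out of this sketch.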
librsync ships with a tool called rdiff which does delta compression from the command line or from scripts.
librsync does not speak the (hairy) application-specific protocol used by rsync. They only share an algorithm, not any code.
The current release is librsync-0.9.6. This works OK and has been moderately widely used in rdiff-backup amongst other things. However, I am quite dissatisfied with the code: it was written while I was trying to understand the problem and is as a result a bit convoluted.
These changes are still experimental. 0.9.6 is the stable release at the moment. I hope the changes will seem sufficiently good that people will want to switch.
More than half the public API has changed, with the goal of making the API and the internals simpler. Code that calls librsync will have to be changed. In most cases I hope this will make the client code simpler too. (Since the old library will still be available, you don't have to change it immediately, or indeed ever, if you don't want to.)
In librsync 0.9, clients had to manage memory buffers for input and output, adding input data and removing output when the library needed it. In the new code, the library makes callbacks whenever it needs input or output.
The callbacks are defined so that they will work with programs that do nonblocking network IO. The callbacks can process only part of the request, or they can indicate that the request would block. When used with nonblocking IO, librsync will never block itself. The application always has control over when and how to call select (or whatever other method).
Use of callbacks by librsync does not imply the program needs to be threaded, event-driven or structured around callbacks. The callbacks run only while the application is inside a call to rs_job_run, so that is the only time the application needs to expect them.
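The shape of such a callback interface can be sketched roughly as follows. This is only an illustration under stated assumptions: the text names rs_job_run but gives no signatures, so every type and function here (cb_status, job, job_run, mem_read, mem_write, run_to_completion) is hypothetical, and the "job" merely copies bytes rather than computing a delta. It shows callbacks doing partial work or reporting that they would block, and the caller keeping control of when to retry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; not the real librsync API. */
typedef enum { CB_OK, CB_WOULD_BLOCK, CB_EOF } cb_status;
typedef enum { JOB_DONE, JOB_BLOCKED } job_result;

typedef cb_status (*read_cb)(void *ctx, char *buf, size_t want, size_t *got);
typedef cb_status (*write_cb)(void *ctx, const char *buf, size_t len, size_t *put);

typedef struct {
    read_cb read;   void *read_ctx;
    write_cb write; void *write_ctx;
    char buf[64];       /* data read but not yet fully written */
    size_t off, len;    /* unwritten bytes are buf[off..len) */
} job;

/* Callbacks are invoked only from inside this function. It returns
 * JOB_BLOCKED instead of waiting, so the caller decides when to retry. */
job_result job_run(job *j)
{
    for (;;) {
        while (j->off < j->len) {    /* flush pending output first */
            size_t put = 0;
            if (j->write(j->write_ctx, j->buf + j->off,
                         j->len - j->off, &put) == CB_WOULD_BLOCK)
                return JOB_BLOCKED;
            j->off += put;           /* a partial write is fine */
        }
        size_t got = 0;
        cb_status st = j->read(j->read_ctx, j->buf, sizeof j->buf, &got);
        if (st == CB_WOULD_BLOCK)
            return JOB_BLOCKED;
        if (st == CB_EOF)
            return JOB_DONE;
        j->off = 0;
        j->len = got;                /* a partial read is fine too */
    }
}

/* Demo source: yields at most 3 bytes, and "would block" on every other call. */
typedef struct { const char *data; size_t len, pos; int calls; } mem_source;

cb_status mem_read(void *ctx, char *buf, size_t want, size_t *got)
{
    mem_source *s = ctx;
    *got = 0;
    if (s->pos == s->len)
        return CB_EOF;
    if (s->calls++ % 2 == 0)
        return CB_WOULD_BLOCK;
    size_t n = s->len - s->pos;
    if (n > 3) n = 3;
    if (n > want) n = want;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    *got = n;
    return CB_OK;
}

/* Demo sink: accepts at most 2 bytes per call. */
typedef struct { char data[128]; size_t len; } mem_sink;

cb_status mem_write(void *ctx, const char *buf, size_t len, size_t *put)
{
    mem_sink *k = ctx;
    size_t n = len > 2 ? 2 : len;
    assert(k->len + n <= sizeof k->data);
    memcpy(k->data + k->len, buf, n);
    k->len += n;
    *put = n;
    return CB_OK;
}

/* Drive a job to completion; returns how often it reported "would block". */
int run_to_completion(job *j)
{
    int blocked = 0;
    while (job_run(j) == JOB_BLOCKED)
        blocked++;    /* a real program would wait in select() here */
    return blocked;
}
```

Note that the application's main loop stays an ordinary loop: it calls the run function, and when that reports it is blocked, the application waits however it likes and calls again.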
The library will still work in threaded programs. It does no threading operations itself, so typically each context needs to be protected by a mutex.
These changes have considerably simplified the library and improved performance for some cases. Performance should not be much worse in any case.
The command-line syntax for rdiff has changed slightly. Scripts that call it may need small updates.
librsync is now built using SCons, rather than automake/autoconf/libtool. The SConstruct file is smaller and easier to maintain than the old build system. It should make it possible to use a single build script on Unix and Windows. If you don't want to use it, the library should be simple to compile by writing your own Makefile or whatever.
Rusty Russell proposes a hierarchy of hard-to-misuse interface styles.
In the previous branch you had to read the documentation, and maybe the implementation, to use librsync correctly. I think the new interface ought to be easy to use correctly.
Because the library is simpler, the documentation is now (I hope) more complete and easier to understand.
I have not found a completely satisfactory format for the documentation. Manpages are handy on Unix, but not easy to access on other platforms and hard to convert to decent-looking web pages. Docbook is powerful, but a bit of a pain to write, and the commands to convert it to useful output are still not well standardized. reStructuredText is nice to write and produces decent text and HTML output, on both Windows and Unix, but is not widely used. I should probably just pick one: probably ReST.
Licensing will still be GNU Lesser GPL, but I am not releasing the code until I have finished the rework.
The previous public version is 0.9.6, which is functionally stable but somewhat immature code. I'm not sure what version number this new code should have. Since it's a major API change it might deserve a new major number, which would make it 1.0.0.
Prerelease code is available through Arch, but be warned that it does not work yet — it is published only so that people can look at and comment on the API. To get it, install GNU Arch and run these commands:
% tla register-archive \
    http://sourcefrog.net/arch/mbp@sourcefrog.net--2004
% tla get \
    mbp@sourcefrog.net--2004/librsync--callback--0.11