
Java Forum / General / June 2006


rsync and network file transfer speeds

Joshua Jung - 29 Jun 2006 22:59 GMT
Hi, I appreciate the note to research rsync on my last post (thanks Rogan
and Suken!).

I've done some research on rsync and even read the doctoral thesis that
Tridgell wrote on his rsync algorithm. While I now understand how the
rsync algorithm works in practice, I am unsure whether it will work for
our application.

Our program concept involves regularly transferring huge numbers of files
from our client to our server for special processing and storage. We want
the process to be transparent (at least on the client), and we do not
want duplicate files or file blocks on the server.

This is where rsync *could* be beneficial. Unfortunately, besides the
currently *unsupported* jarsync I have been unable to find any Java
implementations of rsync. Also, from what I've read, rsync was
originally designed for slow, high-latency links (like dial-up
connections). Assuming our network is going to be fast and low-latency
(i.e. broadband based), will rsync make any sense? It just seems to me
that the rolling signature algorithm only pays off if computing it is
faster than simply sending the data over the connection.
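
For readers unfamiliar with the "rolling signature" mentioned above, here is a minimal Java sketch of the weak rolling checksum described in Tridgell's rsync work. It is only an illustration: the class name RollingChecksum and its methods are made up for this example and are not taken from jarsync or from the rsync sources, and real rsync pairs this weak checksum with a strong hash before trusting a match.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class RollingChecksum {
    private static final int MOD = 1 << 16;

    private final int window; // number of bytes covered by the checksum
    private int a;            // plain sum of the bytes in the window, mod 2^16
    private int b;            // position-weighted sum of the bytes, mod 2^16

    // Computes the checksum of data[offset .. offset + window - 1] from scratch.
    RollingChecksum(byte[] data, int offset, int window) {
        this.window = window;
        for (int i = 0; i < window; i++) {
            int x = data[offset + i] & 0xFF;
            a = (a + x) % MOD;
            b = (b + (window - i) * x) % MOD;
        }
    }

    // Slides the window one byte to the right in constant time:
    // 'outgoing' leaves on the left, 'incoming' enters on the right.
    void roll(byte outgoing, byte incoming) {
        int out = outgoing & 0xFF;
        int in = incoming & 0xFF;
        a = Math.floorMod(a - out + in, MOD);
        b = Math.floorMod(b - window * out + a, MOD);
    }

    // Combines both halves into one 32-bit value.
    int value() {
        return a | (b << 16);
    }
}
```

The point of the rolling form is that checking every byte offset of a file costs a few additions per offset instead of re-summing a whole block, so on a fast link the CPU cost is mostly the strong hash and the disk reads rather than this checksum.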

The short and sweet of my questions is this:

Assuming our transfer speeds are broadband level, will it be faster to
run the standard rsync algorithm or just do a quick check on the time
stamps of the files on client and server and just upload the entire file
(with zipping of course) if the time-stamps are different?

Any website links or data on the speed of rsync on current
machines/connections would be greatly appreciated. Also, if there is any
other option besides rsync, that would be sweet as well!

Josh <><

[P.S. I'd love to test out rsync myself, but I'd like some more advice
before diving in that direction :) ]
Dimitri Maziuk - 30 Jun 2006 00:22 GMT
Joshua Jung sez:
...
> The short and sweet of my questions is this:
>
> Assuming our transfer speeds are broadband level, will it be faster to
> run the standard rsync algorithm or just do a quick check on the time
> stamps of the files on client and server and just upload the entire file
> (with zipping of course) if the time-stamps are different?

If you can guarantee that the clocks on both ends are always in sync
(ntp will do that, most of the time), and both ends are in the same
time (and DST) zone, yes: a size + timestamp check is going to be
faster. In general you cannot trust a random internet host's clock,
hence the clever algorithms.
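
A rough Java sketch of the size + timestamp check, for illustration. The class name QuickCheck, the method names, and the tolerance parameter are assumptions made for this example; how the server reports its size and modification time is left open, and the tolerance is only there to absorb small clock or filesystem-resolution differences.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; names and the tolerance parameter are hypothetical.
final class QuickCheck {

    // Returns true when the file should be uploaded again: either the sizes
    // differ, or the modification times differ by more than the tolerance.
    static boolean needsUpload(long localSize, long localMillis,
                               long remoteSize, long remoteMillis,
                               long toleranceMillis) {
        if (localSize != remoteSize) {
            return true;
        }
        return Math.abs(localMillis - remoteMillis) > toleranceMillis;
    }

    // Convenience overload that reads size and mtime from a local file.
    // remoteSize and remoteMillis are whatever the server reported.
    static boolean needsUpload(Path local, long remoteSize, long remoteMillis,
                               long toleranceMillis) throws IOException {
        long size = Files.size(local);
        long mtime = Files.getLastModifiedTime(local).toMillis();
        return needsUpload(size, mtime, remoteSize, remoteMillis, toleranceMillis);
    }
}
```

Note that this only decides whether to send a file. It never looks at the contents, so a change that keeps the same size and timestamp goes unnoticed.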

> Any website links or data on the speed of rsync on current
> machines/connections would be greatly appreciated. Also, if there is any
> other option besides rsync, that would be sweet as well!

We're using rsync routinely on a 100Mb/s LAN, with a couple of pretty
slow machines (e.g. a Sun Ultra 10). The only problem is that rsync
seems to be very sensitive to network glitches -- most protocols will
recover from 1-2 sec. loss of connectivity just fine, rsync usually
doesn't. (Not what you'd expect from something designed for dial-up
connections.)

Dima

All whitespace is equivalent except in certain situations          
                                                 -- ANSI C standard committee

Joshua Jung - 30 Jun 2006 21:08 GMT
> Joshua Jung sez:
> ...
[quoted text clipped - 23 lines]
>
> Dima

Are there any other options out there besides the rsync algorithm? We've
noticed that a lot of backup companies use a feature that backs up
only changes at the byte or block level to reduce overhead, and we are
curious whether there are algorithms that could do this kind of file
comparison. Obviously the easy way is to store two copies on the client
machine and diff them, but that is quite resource-intensive and there
really isn't any added benefit to having two files on the client machine!
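
One way to detect block-level changes without keeping a second copy of each file is to keep only a list of per-block hashes from the last upload and compare against it. The Java sketch below assumes fixed-size blocks and SHA-256; the class name BlockManifest and its methods are invented for this example, and this is not a description of what any particular backup product does.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical.
final class BlockManifest {

    // Hashes the content in fixed-size blocks (the last block may be shorter).
    static List<byte[]> digests(byte[] content, int blockSize) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
        List<byte[]> out = new ArrayList<>();
        for (int off = 0; off < content.length; off += blockSize) {
            int len = Math.min(blockSize, content.length - off);
            sha.update(content, off, len);
            out.add(sha.digest()); // digest() also resets the MessageDigest
        }
        return out;
    }

    // Returns the indices of blocks in 'current' that are new or differ
    // from the block at the same index in 'previous'.
    static List<Integer> changedBlocks(List<byte[]> previous, List<byte[]> current) {
        List<Integer> changed = new ArrayList<>();
        for (int i = 0; i < current.size(); i++) {
            if (i >= previous.size() || !Arrays.equals(previous.get(i), current.get(i))) {
                changed.add(i);
            }
        }
        return changed;
    }
}
```

The catch with fixed offsets is that inserting a single byte near the start of a file shifts every later block, so all of them look changed. Handling that case is exactly what rsync's rolling checksum is for. A shrinking file also needs its new length sent separately, since this sketch only reports blocks present in the current version.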

Appreciate any help! Thanks so much.

Josh <><

