SSH was never really meant to be a high-performance data transfer tool, and it shows. For example, it has a hardcoded maximum receive buffer of 2 MiB (separate from the TCP one), which drastically limits transfer speed over high-BDP links (even a fast local link, like the author's 10 Gbit/s one). The encryption can also be a bottleneck. hpn-ssh [1] aims to solve this, but I'm not so sure about running an SSH fork on important systems.
The issue is the serialization of operations. There is overhead for each operation which translates into dead time between transfers.
However, there are issues that can cause single streams to underperform multiple streams in the real world once you reach a certain scale or face problems like packet loss.
rsync's man page says "pipelining of file transfers to minimize latency costs" and https://rsync.samba.org/how-rsync-works.html says "Rsync is heavily pipelined".
If pipelining is really in rsync, there should be no "dead time between transfers".
I get 40 Gbit/s over a single localhost TCP stream on my 10-year-old laptop with iperf3.
So TCP itself does not seem to be a bottleneck if 40 Gbit/s is "high" enough, which it probably is currently for most people.
I have also seen plenty of situations in which TCP is faster than UDP in datacenters.
For example, on Hetzner Cloud VMs, iperf3 gets me 7 Gbit/s over TCP but only 1.5 Gbit/s over UDP. On Hetzner dedicated servers with 10 Gbit links, I get 10 Gbit/s over TCP but only 4.5 Gbit/s over UDP. But this could also be due to my use of iperf3 or its implementation.
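For reference, the commands I'm comparing are roughly these (iperf3 defaults UDP to a tiny target bitrate, so -b has to be set explicitly or the UDP number means nothing; <server> is a placeholder):

    iperf3 -c <server>              # TCP test, fills the link by default
    iperf3 -c <server> -u -b 10G    # UDP test, only sends as fast as -b allows
    iperf3 -c <server> -P 4         # 4 parallel TCP streams, for comparison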
I also suspect that TCP, being a protocol whose state is inspectable by the network equipment between endpoints, allows that equipment to optimize for higher performance, but I have not validated whether that is actually done.
There's gotta be a less antisocial way though. I'd say using BBR and increasing the buffer sizes to 64 MiB does the trick in most cases.
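On Linux that's roughly the following sysctls (values are examples, not gospel - size the buffers for your actual BDP):

    # switch congestion control to BBR; fq is the qdisc commonly paired with it
    sysctl -w net.core.default_qdisc=fq
    sysctl -w net.ipv4.tcp_congestion_control=bbr
    # allow socket buffers up to 64 MiB so autotuning has headroom on high-BDP paths
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.ipv4.tcp_rmem="4096 131072 67108864"
    sysctl -w net.ipv4.tcp_wmem="4096 16384 67108864"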
As I understand it, this is also the approach of WEKA.io [1]. Another approach is RDMA [2], used by storage systems like Vast, which offloads the ordering and retransmission work to RDMA-capable NICs so that applications can read and write directly to the network instead of going through system buffers.
0. https://en.wikipedia.org/wiki/Fast_and_Secure_Protocol
1. https://docs.weka.io/weka-system-overview/weka-client-and-mo...
2. https://en.wikipedia.org/wiki/Remote_direct_memory_access
Yeah, this has been my experience with low-overhead streams as well.
Interestingly, I see this "open more streams to send more data" pattern all over the place in file transfer tooling.
Recent examples that come to mind are Backblaze's B2 CLI and, from taking a peek with Wireshark, Amazon's SDK for S3 uploads. (What do they know that we don't seem to think we know?)
It seems like they're all doing this? Which is maybe odd, because when I analyse what Plex or Netflix is doing, it's not the same? They do what you're suggesting, tune the application + TCP/UDP stack. Though that could be due to their 1-to-1 streaming use case.
There is overhead somewhere and they're trying to get past it via semi-brute-force methods (in my opinion).
I wonder if there is a serialization or loss handling problem that we could be glossing over here?
cuz in my experience no one is doing that tbh
If the server side scales out (as cloud services do), the parallel connections may end up hitting different endpoints and saturate the bandwidth better. A single server instance might be serving other clients as well and can't fill one particular client's pipe on its own.
Source: Been in big tech for roughly ten years now trying to get servers to move packets faster
> MPLS ECMP hashing you over a single path
This is kinda like the traffic shaping I was talking about, but fair enough. It's not an inherent limitation of a single stream, just a consequence of how your network is designed.
> a single loss event with a high BDP
I thought BBR mitigates this. Even if it doesn't, I'd still count that as a TCP stack issue.
At a large enough scale I'd say you are correct that multiple streams are inherently easier to optimize throughput for. But probably not on a single 1-10 Gbit/s link.
Depending on what you're doing, it can be faster to leave your files in a solid archive, which is less likely to be fragmented, so you get contiguous reads.
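E.g. something like this if the source is a pile of small files (paths are made up):

    # pack once so the transfer becomes one big sequential read instead of thousands of seeks
    tar -cf /tank/photos.tar /data/photos
    rclone copy /tank/photos.tar remote:backups/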
I'm currently working on the GUI if you're interested: https://github.com/rclone-ui/rclone-ui
Related to this is the very useful:
rclone serve restic ...
... workflow that allows you to create append-only (immutable) backups. This howto is not rsync.net-specific - you can follow this recipe at any standard SSH endpoint:
https://www.rsync.net/resources/notes/2025-q4-rsync.net_tech...
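Roughly the shape of it, for anyone who hasn't tried it (remote name and paths are placeholders; check the rclone/restic docs for your versions):

    # expose an SFTP-backed rclone remote over restic's REST protocol, refusing deletes
    rclone serve restic --append-only --addr localhost:8080 mysftp:restic-repo &
    # point restic at it like any other REST server
    restic -r rest:http://localhost:8080/ init
    restic -r rest:http://localhost:8080/ backup /home/me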
My goal is to smooth out some of the operational rough edges I've seen companies deal with when using the tool:
- Team workspaces with role-based access control
- Event notifications & webhooks – Alerts on transfer failure or resource changes via Slack, Teams, Discord, etc.
- Centralized log storage
- Vault integrations – Connect 1Password, Doppler, or Infisical for zero-knowledge credential handling (no more plain text files with credentials)
- 10 Gbps connected infrastructure (Pro tier) – High-throughput Linux systems for large transfers

This idea that one must “give back” after receiving a gift freely given is simply silly.
I've adjusted threads and the various other controls rclone offers, but I still feel like I'm not seeing its true potential, because the second it hits a rate limit I can all but guarantee that job will have to be restarted with new settings.
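For reference, these are the kinds of controls I mean (flag names are from rclone's docs, the values are just what I've been experimenting with):

    # --transfers: parallel file transfers, --checkers: parallel listing/hash checks
    # --tpslimit: cap API calls per second to stay under provider rate limits, --bwlimit: bandwidth cap
    rclone copy src: dst: --transfers 16 --checkers 32 --tpslimit 10 --bwlimit 200M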
That hasn't been true for more than 8 years now.
Source: https://github.com/rclone/rclone/blob/9abf9d38c0b80094302281...
And the PR adding it: https://github.com/rclone/rclone/pull/2622
Edit: oh I see, delta transfer only sends the changed parts of files?
You can also run multiple instances of rsync; the problem is how to efficiently divide the set of files.
It turns out, fpart does just that! Fpart is a Filesystem partitioner. It helps you sort file trees and pack them into bags (called "partitions"). It is developed in C and available under the BSD license.
It comes with an rsync wrapper, fpsync. Now I'd like to see a benchmark of that vs rclone!
via https://unix.stackexchange.com/q/189878/#688469 and https://stackoverflow.com/q/24058544/#comment93435424_255320...
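Usage is roughly this, if I'm reading the fpsync(1) man page right (the numbers are arbitrary):

    # run 8 rsync workers in parallel, each sync job handling at most 2000 files
    fpsync -n 8 -f 2000 /data/src/ user@dest:/data/dst/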
find a-bunch-of-files | xargs -P 10 do-something-with-a-file
-P max-procs
--max-procs=max-procs
Run up to max-procs processes at a time; the default is 1.
If max-procs is 0, xargs will run as many processes as
possible at a time.

> In fact, some compression modes would actually slow things down as my energy-efficient NAS is running on some slower Arm cores
Depending on the number/type of devices in the setup and the usage patterns, it can sometimes be effective to have a single more powerful router and use it directly as a hop for security or compression (or both) in front of a set of lower-power devices. Like, I know it's not E2EE in the same way to send unencrypted data to one OPNsense router, tunnel it with WireGuard (or Nebula or whatever you prefer) to another over the internet, and then go from there to a NAS. But if the NAS is in the same physically secure rack, directly attached to the router by hardline (or via an isolated switch), I don't think in practice it's enough less secure at the private-service level to matter. If the router is a pretty important lynchpin anyway, it can be favorable to lean more heavily on it so you can go cheaper and lower power elsewhere. Not that more efficiency, hardware acceleration, etc. are at all bad, and conversely it sometimes makes sense to have a powerful NAS/other servers and a low-power router, but there are good degrees of freedom there. Handier than ever in the current crazy times, when hardware that was formerly easily and cheaply available is now a king's ransom or gone and one has to improvise.
rsync -e "ssh -o Compression=no" ...

> Specifies whether to use compression. The argument must be yes or no (the default).
So I'm surprised you see speedups with your invocation.
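One way to check what's actually in effect for a given host (ssh -G prints the resolved client config):

    ssh -G yourhost | grep -i compression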
With rsync, you upload hashes of what you have, then the source has to do all the hashing work to figure out what to send you. It's slightly more efficient, but if you are supporting even tens of downloads it's a lot of work for the source.
The other option is to send just a diff, which I believe e.g. Google Chrome does. Google invented Courgette and Zucchini which partially decompile binaries then recompile them on the other end to reduce the size of diffs. These only work for exact known previous versions, though.
I wonder if the ideas of Courgette and Zucchini can be incorporated into zsync's hashes so that you get the minimal diff, but the flexibility of not having a perfect previous version to work from.
So the question "does rclone have that" doesn't make much sense, because it usually wouldn't be rclone implementing it.
For example, zsh does it here for rsync, which actually invokes `ssh` itself:
https://github.com/zsh-users/zsh/blob/3e72a52e27d8ce8d8be0ee...
https://github.com/zsh-users/zsh/blob/3e72a52e27d8ce8d8be0ee...
That said, some CLI tools ship helpers for shells to implement such things, e.g. `mytool completion-helper ...`
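rclone ships one of those too, at least for the static part - the subcommand name has changed over the years (`rclone completion` in recent releases, `rclone genautocomplete` in older ones), so check `rclone completion --help` on your version:

    # generate a zsh completion function and drop it somewhere on $fpath (path is just an example)
    rclone completion zsh ~/.zsh/completions/_rclone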
But I don't get rclone SSH completions in zsh, as it doesn't call `_remote_files` for rclone:
https://github.com/zsh-users/zsh/blob/3e72a52e27d8ce8d8be0ee...