upvote
The simple model for scp and rsync (it's likely more complex in rsync): for loop over all files. for each file, determine its metadata with fstat, then fopen and copy bytes in chunks until done. Proceed to next iteration.

I don't know what rsync does on top of that (pipelining could mean many different things), but my empirical experience is that copying 1 1 TB file is far faster than copying 1 billion 1k files (both sum to ~1 TB), and that load balancing/partitioning/parallelizing the tool when copying large numbers of small files leads to significant speedups, likely because the per-file overhead is hidden by the parallelism (in addition to dealing with individual copies stalling due to TCP or whatever else).

I guess the question is whether rsync is using multiple threads or otherwise accessing the filesystem in parallel, which I do not think it does, while tools like rclone, kopia, and aws sync all take advantage of parallelism (multiple ongoing file lookups and copies).

reply
> I guess the question is whether rsync is using multiple threads or otherwise accessing the filesystem in parallel

No, that is not the question. Even Wikipedia explains that rsync is single-threaded. And even if it was multithreaded "or otherwise" used concurent file IO:

The question is whether rsync _transmission_ is pipelined or not, meaning: Does it wait for 1 file to be transferred and acknowledged before sending the data of the next?

Somebody has to go check that.

If yes: Then parallel filesystem access won't matter, because a network roundtrip has brutally higher latency than reading data sequentially of an SSD.

reply
Note that rsync on many small files is slow even within the same machine (across two physical devices), suggesting that the network roundtrip latency is not the major contributor.
reply
I’m not sure why, but just like with scp, I’ve achieved significant speeds ups by tarring the directory first (optionally compressing it), transferring and then decompressing. Maybe because it makes the tar and submit, and the receive, untar/uncompress, happen on different threads?
reply
It's typically a disk-latency thing, as just stat-ing the many files in a directory can have significant latency implications (especially on spinning HDDs) vs opening a single file (the tar) and read-()ing that one file in memory before writing to the network.

If copying a folder with many files is slower than tarring that folder and the moving the tar (but not counting the untar) then disk latency is your bottleneck.

reply