My weekly rsync with 1 Gb interfaces:
When transferring my newly created tar.gz archives (which always change), I see roughly 700 Mb/s.
When transferring my "4TB nfs disk", where not much has changed, it takes about 45 minutes, usually at transfer rates in the kB/s range. Lots of small files (Linux kernel sources etc.). That's because once the initial transfer was done, not much changes and little needs to be transferred, but rsync still has to check whether each file has changed, and that takes time.
What rsync arguments are you using? Picking the right ones can make things go much faster. If you are checksumming the files, it will take as long as it takes the host and the target system to read every file and compute its checksum, which means reading 21 TiB on each side before anything happens.
Are you running rsyncd on the target side or are you doing this over a mounted filesystem?
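For reference, a rough sketch of the difference (paths and host are made up): the default quick check only compares size and modification time, while --checksum reads and hashes every file on both sides.
# Default quick check: only size and mtime are compared, unchanged files are skipped cheaply.
rsync -a /mnt/tank/data/ backuphost::backup/
# --checksum: every file is read and hashed on BOTH ends before rsync decides anything,
# which is why it can take as long as reading the whole dataset.
rsync -a --checksum /mnt/tank/data/ backuphost::backup/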
Here is a script I use to sync my music collection from my Mac to my NAS. In this case the Mac is the source of truth, so I just want the NAS to match. I am running the rsyncd app on the TrueNAS side. Also note that if you remove "--progress --stats" it runs about 15% faster. File-list generation time on 96K files is 0.053 seconds.
Please do not run this blind. You can run it with "./script.sh test" and it will do a dry run and tell you what it will do.
#!/bin/bash
# PUT: push the local music library to the NAS (the NAS is made to match the Mac).
SRC=/Volumes/data1/Media/Music/
DST=rsync://barrel:30026/music
EXCLUDE_FILE=rsync-exclude

# "./script.sh test" does a dry run instead of a real transfer.
if [ "$1" == "test" ]; then
    RSYNC_START="/opt/homebrew/bin/rsync --dry-run"
else
    RSYNC_START="/opt/homebrew/bin/rsync"
fi

if [ ! -d "$SRC" ]; then
    echo "Local directory is not there. Please fix and try again"
else
    $RSYNC_START \
        --iconv=utf-8-mac,utf-8 \
        --force \
        --size-only \
        --no-perms \
        --no-owner \
        --no-group \
        --omit-dir-times \
        --delete \
        --progress \
        --stats \
        --recursive \
        --exclude-from="$EXCLUDE_FILE" \
        "$SRC" \
        "$DST"
fi
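For completeness, the rsync-exclude file referenced above is just a list of patterns, one per line; something like this (the patterns are only examples):
# rsync-exclude: one pattern per line, relative to the source directory
.DS_Store
._*
*.tmp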
I googled to see if there was anything I could do to make it run faster and couldn't find anything, e.g. forcing parallelisation, similar to downloading a file via multiple connections.
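(Roughly the kind of thing I was imagining, with made-up paths: one rsync per top-level directory, several running at once. It presumably only helps when per-file latency rather than bandwidth is the bottleneck.)
# Run up to 4 rsync processes in parallel, one per top-level directory.
cd /mnt/tank/data || exit 1
printf '%s\0' */ | xargs -0 -P4 -I{} rsync -a {} rsync://backuphost/backup/{}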
Irrespective of the size of the files, and whether anything has changed, rsync needs to check each and every file. One hour for 2.1M files? Not too bad…
You may speed it up with a special vdev or a metadata-only L2ARC, so that rsync has faster access to all the metadata it needs while crawling the tree.
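Rough sketch with hypothetical pool and device names:
# Special vdev for metadata (add it as a mirror: losing it loses the pool).
# Note it only holds metadata written after the vdev is added.
zpool add tank special mirror /dev/ada4 /dev/ada5
# Or an L2ARC device restricted to metadata:
zpool add tank cache /dev/nvd0
zfs set secondarycache=metadata tank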
But moving the backup to ZFS and using replication will be faster.
Rsync needs to pull all the file metadata from the drives, in order to compare the directory listing to the target.
At minimum, it needs to know “does this file exist?” In order to do that, it needs to crawl through the entire directory tree.
Most rsync setups need more than that, though. They also need to compare any difference in sizes and modification timestamps.
No matter what, the entire directory tree needs to be crawled and read from the storage drives.
The only way to speed up this process is to bypass the drives and do this all in RAM. How can this all be done in RAM? By keeping the relevant metadata in the ARC.
How to keep all this metadata in the ARC? By having enough total RAM to safely house it, and adjusting a ZFS parameter to mitigate “pressure” against metadata in the ARC.
In summary: You’ll need to increase RAM and adjust the ZFS parameter to greatly favor metadata over data in the ARC.
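Rough sketch (Linux/SCALE paths; the exact tunable name depends on your OpenZFS version, so treat this as an assumption to verify):
# Recent OpenZFS exposes zfs_arc_meta_balance (higher values favor metadata in the ARC);
# older releases used zfs_arc_meta_limit / zfs_arc_meta_limit_percent instead.
echo 2000 > /sys/module/zfs/parameters/zfs_arc_meta_balance
# See which metadata-related counters your version reports:
grep meta /proc/spl/kstat/zfs/arcstats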
As @awalkerix said: ZFS replication is way better. It already knows what to transfer, without having to crawl an entire directory tree as rsync does.
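Minimal example with made-up dataset, snapshot and host names; an incremental send transfers only the blocks that changed between the two snapshots:
zfs snapshot tank/data@week24
zfs send -i tank/data@week23 tank/data@week24 | ssh backuphost zfs receive -F backup/data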