I have two TrueNAS SCALE servers. One contains the data, the other the backup.
I have connected the two servers with a 10Gb fiber-optic cable.
I use rsync for backing up the data.
It used to work fine but now this happens:
When the rsync job starts, the data is transferred at about 6Gb/s. I can see the ZFS cache fill up until there is only 0.6GB of RAM left; then the transfer drops to zero. Like it suffocates…?
I did recently upgrade the iLO and BIOS on the backup server (ML310e Gen8).
We need more information about the system: what is the pool layout, and how much RAM?
You’re most likely bottlenecked by your pool speeds, but without more information we can only guess.
The server is an HP ProLiant ML310e with 32GB of RAM (HP Smart Memory, 1600MHz). The disks are 4 × 10TB NAS drives (WD Red), and the pool layout is a ZFS stripe (I know, I know…).
If this helps, then the “permanent fix” is to check that your storage controller is set to “AHCI/SATA” or “Non-RAID” mode in the BIOS, then look for a BIOS setting called “Physical Drive Write Cache State” and set it to Enabled. ZFS is able to control the cache on individual drives and will send them flush commands for data safety.
If there’s no BIOS setting, then you could set up a scheduled task to re-enable write caching after each boot, but let’s see if this fixes your issue.
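If you do end up needing the scheduled-task route, a post-init script along these lines would do it (just a sketch; it assumes the four WD Reds show up as /dev/sda through /dev/sdd and that hdparm is present, so verify both on your box first):

# Post-init script: turn each drive's write cache back on after boot.
# TrueNAS SCALE is Linux-based; hdparm -W1 enables the drive's volatile write cache.
# ZFS will still send flush commands, so this is safe for data integrity.
for disk in /dev/sd[a-d]; do
    hdparm -W1 "$disk"
done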
I also see that you’ve got a couple USB-connected drives (the 3TB WD and the PS4 Seagate Game Drive) - USB attached drives aren’t generally encouraged with TrueNAS and the PS4 one specifically is a “shingled” or SMR drive which is doubly suspect. What are these used for?
I got the same results, alas.
For a minute it works fine, but then it sort of suffocates again.
Maybe I have an incorrect network setup? Let me explain.
I have an HP fiber network controller with two SFP+ ports. One port is connected to the internet and to my network (192.168.1.x), so that my backups are available to every device on the 192.168.1.x network.
The other SFP+ port is connected directly to the other server and has IP address 192.168.3.5.
I have mounted 192.168.3.6/hpprox/video at /mnt/backup/video (that server has the same HP network controller).
I use:
rsync -rau --no-perms /mnt/video /mnt/backup/video
to synchronise my movies and TV series every three hours (a job in crontab). Once a month I run the same job with the --delete parameter added, to mirror exactly (so if I accidentally delete a file, I have until the 1st of the month to restore it if needed).
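For completeness, the crontab entries look roughly like this (the exact times are illustrative):

# Every three hours: sync new and changed files.
0 */3 * * * rsync -rau --no-perms /mnt/video /mnt/backup/video
# On the 1st of each month: same job plus --delete, to mirror exactly.
0 4 1 * * rsync -rau --no-perms --delete /mnt/video /mnt/backup/video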
The weird thing is, this had been working for a year, so I am a bit puzzled about what has happened, other than the iLO and firmware update. I can’t seem to access Smart Provisioning either. Could this be related to my issue? Should I do a server reset, for example?
Just so you know: when I use rsync via TrueNAS’s Data Protection / Rsync Tasks, it uses the 192.168.1.x network, and I notice that when the ZFS cache is full and 0.3GB of memory is left, data gets written from cache to disk, because free memory goes up and down like an accordion. This is good, I think.
Only when using the 192.168.3.x 10Gb network does it seem to suffocate.
10Gbps is probably beyond the capability of the destination system to receive, but I’d think you would be seeing a more gradual slope and “leveling off” of the transfer at the actual throughput speeds of your vdev as the ZFS write throttle starts to constrict things down.
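If you want to watch the throttle at work, OpenZFS on Linux exposes the relevant numbers; a quick sketch (these are the standard OpenZFS paths and tools, but verify they exist on your SCALE version):

# The ceiling on dirty (not-yet-flushed) data before ZFS starts throttling writers:
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# Watch ARC size and activity once per second while a transfer runs (if arcstat is installed):
arcstat 1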
However, this raises some questions:
Are you saying that the source system (192.168.3.5) is mounting the filesystem from the destination (192.168.3.6) as a local path (potentially over NFS or SMB?) rather than sending an rsync job, ZFS replication stream, or other method of transfer?
Yes, I am very used to using Rsync in this fashion. Bad habit maybe…
And I am pulling from the backup system. Should I give pushing from the source system a go, and/or are you suggesting I use ZFS replication?
I’d definitely look at setting up a regular ZFS replication job (as a PULL from the backup system) as it will likely be much faster and more efficient, as well as letting you preserve your snapshot chains.
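Under the hood, a replication pull boils down to something like this (just a hand-rolled sketch; the dataset and snapshot names are invented, and a TrueNAS replication task handles all of the snapshot bookkeeping for you):

# Run on the backup system: pull an incremental snapshot stream from the source over SSH.
# "tank/video" and "backup/video" are made-up dataset names for illustration.
ssh 192.168.3.5 "zfs send -i tank/video@prev tank/video@latest" | zfs recv -F backup/video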
I am now using TrueNAS replication and have set up an SSH connection through the 10Gb network card. It seems to be working fine.
And indeed, that replication works pretty nicely!
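In case it helps anyone else, I check that the snapshots are arriving on the backup pool with something like this (the dataset name is just my example; yours will differ):

# List snapshots on the backup dataset, sorted oldest to newest.
zfs list -t snapshot -o name,creation -s creation backup/video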