Global performance drops when copying a large number of files to a dataset

Hi everyone;
I’ve had my TrueNAS SCALE system for about 2 years, and just this week I added a new zpool and upgraded to Dragonfish. I know I should have done one, then the other, but I was also trying to set up my second server as a backup.

I’ve run into some issues: when copying to the old dataset I get a general slowdown on the system, with copies running at only 0 to 40 MB/s (over gigabit Ethernet), and the CPU Usage graph in the Reporting tab shows iowait going through the roof during that time.

I’m unsure where to look to improve the performance. Here are both systems; my issue is on my main system:

Main TrueNAS server
TrueNAS SCALE Dragonfish-24.04.0
1x Xeon X5650 @ 2.66 GHz, 6 cores/12 threads (the motherboard has space for an extra CPU, not used at this time)
16 GB DDR3 ECC RAM
2x 1 Gbit Ethernet ports
LSI 9211-4I 9211-8I SAS/SATA 6Gbps HBA LSI P20 IT Mode for ZFS

boot drive : 64 GB Kingston SSD
data_vault pool : 5x 4 TB Seagate IronWolf NAS HDD (CMR, 3.5-inch, SATA 6Gb/s, 5900 RPM, 64 MB cache) in RAIDZ2
media_vault pool : 3x 8 TB HGST drives (refurbished, SATA 6Gb/s, 7200 RPM) in RAIDZ1

Backup TrueNAS server (not having any problems so far)
Running under Proxmox

i7, using 4 cores/4 threads
8 GB RAM allocated (non-ECC)
1x 1 Gbit Ethernet port

boot drive : 32 GB virtual disk
data_vault pool : 3x 8 TB HGST drives (refurbished, SATA 6Gb/s, 7200 RPM) in RAIDZ1 (direct access)

Running Dedupe or compression?

Have you done any fio tests on your storage systems to see how they perform in general?

Thanks for the answer.
All of my datasets have ZFS Deduplication set to OFF, and all are using Compression Level Inherit (LZ4).

However, I think I still had replication running to my other server. As soon as any file copy is in progress, even from one share to another, the system slows down a lot.

As for fio, I don’t know how to run it. I did a bit of searching, and I can’t find a clear explanation of how to run the test and understand the results. Do you have any links that I could look into?
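
For anyone landing here with the same fio question, here is a minimal sketch of a sequential-write test, wrapped in Python so the result is easy to read back. The job name, the 4G size, the 1M block size, and the /mnt/data_vault/fio-test path are assumptions for illustration, not values from this thread; the fio options themselves (--directory, --rw, --bs, --size, --ioengine, --end_fsync, --output-format) are standard ones. By default the script only prints the command it would run.

```python
import json
import shlex
import subprocess


def build_fio_cmd(directory, size="4G", block_size="1M"):
    """Return an fio argv for a sequential-write test inside `directory`."""
    return [
        "fio",
        "--name=seqwrite",            # arbitrary job name (assumption)
        f"--directory={directory}",   # fio creates its test file here
        "--rw=write",                 # sequential writes
        f"--bs={block_size}",
        f"--size={size}",
        "--ioengine=psync",
        "--end_fsync=1",              # flush at the end so the time is honest
        "--output-format=json",
    ]


def write_mib_per_s(fio_json):
    """Sum write bandwidth over all jobs; fio reports 'bw' in KiB/s."""
    report = json.loads(fio_json)
    return sum(job["write"]["bw"] for job in report["jobs"]) / 1024


def run_fio(cmd):
    """Run fio and return its JSON output. This really writes to the pool."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical path: point this at a scratch directory on the slow pool.
    cmd = build_fio_cmd("/mnt/data_vault/fio-test")
    print("would run:", shlex.join(cmd))
    # Uncomment to actually run the test:
    # print(f"{write_mib_per_s(run_fio(cmd)):.0f} MiB/s sequential write")
```

As a rough yardstick, a gigabit link tops out around 110 to 118 MB/s in practice, so if the pool itself benchmarks well above that, the bottleneck is more likely memory or network than the disks. Delete the test file fio leaves behind when you are done.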

Well, you answered my first question, which would have been about SMR drives, but it sounds like you do not have any of those.

Next, examine your SWAP Space, GUI → Reporting → Memory. Make sure you are not using any swap space, the Used value should be 0 (zero). If you are using SWAP, you have run out of RAM. Since you are using Proxmox to run TrueNAS on, just allocate another GB of RAM.
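
If you would rather check from a shell than from the GUI, the same number can be read out of /proc/meminfo. This is only a sketch: the function name is made up for illustration, and it assumes the usual Linux layout where SwapTotal and SwapFree are reported in kB.

```python
from pathlib import Path


def swap_used_kib(meminfo_text):
    """Return swap currently in use (KiB) from the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # First token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        used = swap_used_kib(meminfo.read_text())
        print(f"swap used: {used / 1024 / 1024:.2f} GiB")
    else:
        print("/proc/meminfo not found (not a Linux system?)")
```

Anything other than 0.00 GiB while the copy is running points at the RAM shortage described above.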

See if this is the issue.

EDIT: How are you copying the data? Be specific. I’m looking to understand the flow of data. Are you routing it through a second computer to copy between two SMB shares? Are you using the cp command (which would be by far the fastest way) to copy the files from the first dataset to the second dataset? Is WiFi involved at all?

Thanks, I have 2 methods for copying:

  • File copy from a Windows workstation, from one folder to another over SMB, using Windows Explorer.
  • Data Protection → Replication tasks pulling from the main TrueNAS (bare metal), the one that has the performance issue, into the Proxmox-hosted TrueNAS (I haven’t checked the performance on that one yet).

As for the SWAP, it does seem to have some impact, thanks. I have a permanent 1.44 GiB of swap in use, and it rises again when I’m copying files, so I’ll try adding more RAM tomorrow and see if it helps.

That is definitely a problem. Give the VM at least 2GB more RAM, a little more wouldn’t hurt. Check the SWAP file again, it needs to be a zero value or you lose performance.

You must be running applications on TrueNAS. I say that because I run TrueNAS on two different systems, both on top of ESXi, and I give them only 16GB RAM, but they do not do much other than being a simple NAS.

And of course the Windows copy will be half speed at best, but that is a given.

I don’t use Replication either. However, I thought I read that some people were having replication issues on Dragonfish, so it is possible you could have that issue as well. A new version should be out in a few days; not sure if it will fix that issue.

My issue is on the bare-metal one (16 GB ECC RAM). I used to run some VMs on it, but they have all been stopped and removed. I will upgrade to 32 GB RAM tomorrow and see if it helps the performance. Honestly, it’s been happening since I added the new pool of 3x 8 TB drives, and it may also be because I’m copying files to my new backup TrueNAS (that one is a VM).

Do you have another suggestion for synchronizing files between 2 TrueNAS if not via Replication tasks?

Rsync is the only other option and it can be inherently slow. Replication as I understand it is the preferred method but you may need to just wait until the next release comes out, which should be this Tuesday but again, I don’t know if that particular problem has been fixed.

I am a little confused again, that is easy to do. I just always want to be very clear on everything so proper advice, if I have any, is provided to you.

You have a TrueNAS on bare metal, that is your primary NAS. This is the unit you added three 8TB drives to and is the machine that is slow.

You have a TrueNAS on Proxmox, as your backup NAS.

The backup NAS does not have any SWAP Space issues (reads zero (0)) but the primary is having SWAP Space issues (reads 1.44GB).

I’m making assumptions here so correct me…
Both machines are connected to a 1Gbit network switch via NICs in the machines? No WiFi is in play at all.

Hopefully all of that is true.

Q) Something I didn’t ask you because it sounds like you said it worked except for replication, what brand is the NIC in both machines?

Q) Are you copying data from your backup NAS to your Primary NAS new dataset/pool? Sounds stupid but reading what you wrote, I’m not certain.

If you are copying data within your primary NAS, from your original datasets to the new datasets/pool, there is a significantly faster way to do that, and the transfers do not go out over the Ethernet to slow things down. It is neither replication nor Rsync (although it looks more like Rsync); it is simply a few cp commands. But that can’t be used if you are transferring between machines.

Add that extra RAM into your backup machine and hope that is the fix.

I have never used Proxmox, so I’m not the authority to ask for help using it. I do expect that you passed through the entire LSI card to the TrueNAS VM and not individual drives.

Hopefully you find out that Dragonfish needs more RAM and the problem is solved, only because I’m sure you just want it working properly. We will see what tomorrow brings. Troubleshoot one thing at a time.

Last thing… Do not upgrade ZFS features when you upgrade to a different version of TrueNAS. Doing so takes away the option to just roll back to the previous version you had running. My pools are from FreeNAS 12; there is no need to update them with features I will never use, and I can move between FreeNAS 12 and Dragonfish without issue. However, I do not have VMs or any jails, nothing else running. I did have SyncThing running on SCALE; of course it didn’t run on CORE, but I didn’t expect it to. I move back and forth as I please (once every few months) for development reasons.

Hope to hear the problem is fixed soon.

I’ve upgraded my bare-metal one to 32 GB RAM and there is no swap usage anymore. I’m running some tests today and will know for sure by the end of the day.

Unfortunately, I made the mistake of upgrading my pools to the new version. I wish I’d thought about it beforehand; it makes sense.

I’m not focussing on my backup NAS yet, and I’ll have to check all those steps again with it, but so far, the performance is much better with 32Gb RAM on my primary.

I’ll post back the results.

EDIT:
To answer the questions: I’m using the mainboard NIC, and I’m not sure what brand it is. My main NAS has 2 ports, but I can only see 1 of them. I think it’s because the second is tied to the 2nd CPU socket, and since I haven’t installed a second CPU, that might be the reason; not sure. I’ll cross that bridge next.

Question 2: I’m copying most of the data from the baremetal NAS to my VM NAS, but one dataset is actually going to be copied back (sync’d) to my Baremetal. The goal is to have a backup at my son’s place, but also allowing him to use the data, and be able to put his own files on the backup NAS and have those files backed up on the main one.

For future reference, making a “checkpoint” of your pool prior to an upgrade allows you to revert, undoing the pool “upgrade” itself.
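
A rough sketch of what that looks like at the command level, written as a dry run that only prints the zpool commands. The helper name is hypothetical and data_vault is just the pool name from this thread; on TrueNAS you may prefer to export and import through the UI rather than the CLI, so treat this as an outline and not a tested procedure.

```python
import shlex


def checkpoint_cmds(pool):
    """Return the zpool commands around an upgrade, keyed by step."""
    return {
        # Take the checkpoint before upgrading the pool.
        "create": ["zpool", "checkpoint", pool],
        # Happy with the upgrade: discard the checkpoint.
        "discard": ["zpool", "checkpoint", "-d", pool],
        # Not happy: export, then re-import rewound to the checkpoint.
        "rewind": [
            ["zpool", "export", pool],
            ["zpool", "import", "--rewind-to-checkpoint", pool],
        ],
    }


if __name__ == "__main__":
    cmds = checkpoint_cmds("data_vault")
    print("before upgrade :", shlex.join(cmds["create"]))
    print("keep upgrade   :", shlex.join(cmds["discard"]))
    for step in cmds["rewind"]:
        print("undo upgrade   :", shlex.join(step))
```

Rewinding throws away everything written to the pool after the checkpoint was taken, so it is best kept for a short window around the upgrade and discarded once things look healthy.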

24.04.0 has a swap-related issue with lru_gen.

It should be resolved in 24.04.1, which is due any day now.

I’d suggest upgrading to it when it comes out :)

I read that before but apparently forgot about it. Good information.

Vote for this ticket! For great justice!

I’ve upgraded to 24.04.1 and there is no more swap usage. This is perfect: the cache takes a LOT of RAM but doesn’t use swap (it’s now disabled by default), which prevents the performance drop that I was plagued with.