Huge performance loss with long file transfers

Hello,

Community Edition 25.04, Ryzen 8500G, 32GB DDR5-6000, 6 x 8TB SATA disks in RAIDZ1, 2.5GbE, OS on a 250GB NVMe.
I’ve been running it for around a week now. It worked fine, and I normally get between 200 and 240 MB/s read from my PC. This afternoon I was copying TBs back to USB disks to prepare for breaking and rebuilding the RAIDZ1 (due to the issue with free space after expansion), and suddenly the performance dropped steadily to 15-20 MB/s. The CPU is almost idle and no other tasks are running. Is this a known issue?

I rebooted TrueNAS and performance was back to normal. Then, after a couple of hours of reads, it dropped again, this time as low as 10 MB/s or even less. Pausing the file copy for a while and restarting does not change anything.

Edit: after a 2nd reboot it was fine again for a while (a stable 150 MB/s “only”, because I’m writing to a slow rotational USB disk, but that’s fine), and in less than an hour it dropped again, currently to 7-10 MB/s…

  • What external disk do you use?
  • How do you exactly copy your data?
1 Like

Hello. It’s a 5900 RPM 8TB drive in a USB 3 enclosure, plugged directly into a motherboard port (not the front panel). But I also tried copying to an internal SSD when the performance drop hits: same result. I use Total Commander for all file operations.

PS: this time I tried a long pause, around 15-20 minutes, without rebooting, and that also seems to work. Strange.

What model HDD? If it’s a Seagate Barracuda, I think that’s an SMR drive and that’s the expected throughput (from experience; I have that drive).

3 Likes

Well, as already said, if I transfer from TrueNAS to a local SSD I get the same performance when the issue is hitting. And if I transfer from a local SSD to the external disk, I get 100+ MB/s at any time, so the issue is with the source, i.e. TrueNAS.

Going out on a limb here: are the drives in the RAIDZ vdev also Barracuda drives?

Also, what size files are you copying?

1 Like

Drives are mixed in the RAIDZ, and 4 of them are indeed Barracudas, but… these 4 were previously in a middle-aged QNAP TS-431P and I never had this issue in years, including when I pulled the data from the QNAP to USB drives just before removing the drives from the QNAP and putting them in the TrueNAS machine. So this is not the issue. File sizes vary from a few MB to several GB; lots of media files there.

SMR drives will eventually clog up during large data transfers: the holding storage fills, and the drive struggles to copy the data onto the shingles. It’s worse when it’s trying to write small files (metadata, PDFs, small image files), as the write speed drops way down anyway and the interim storage still becomes full.

When reading from a RAIDZ of SMR drives, the transfer speed can also suffer if the drives are stuck reading small files.

This is unlikely to be a TrueNAS issue and more likely to be a hardware issue.

1 Like

Well, it’s mostly big files, several GB each. And again, these were the very same drives in the QNAP, and I never had this issue in years: the worst sequential speed on large files I ever saw was in the 80-90 MB/s range (admittedly over 1Gb Ethernet only), while here it’s 10 times slower when the issue hits. I understand it’s unlikely in theory to be a software issue, but all symptoms and tests suggest the disks should not be the issue either, so I wonder where it is 🙂 Maybe the way RAIDZ works with the drives… or could it be a side effect of the RAIDZ expansion? I started with 4 disks, added one more, then another a couple of days later.

Another point I already mentioned, in detail: a big file copy is stuck at 8-9 MB/s. I cancel the copy, reboot TrueNAS, restart the copy, and boom: 100 MB/s. The same file. So it doesn’t sound like a hardware issue if a simple reboot of TrueNAS solves it?

Not easy to diagnose for sure.

The files and drives may be the same, but the filesystem is not, and the way ZFS works isn’t a good fit for SMR drives.

1 Like

I understand this. But it still doesn’t explain why a simple reboot fixes the issue for some time, with the same files in the same place on these disks.
I think I’m going to try OMV since I have to break and rebuild my RAIDZ anyway; maybe I won’t have the same issues, but nothing is perfect for sure.

I imagine the dataset is using the standard 128K record size. The blocks themselves have metadata which has to be stored on the SMR disks: written during the copy, and read to be able to find each block as it needs to be copied.

So every time a block is copied, one of the drives has to stop what it was doing, look up the metadata, and then go back to copying. SMR drives are pretty poor at that.

Given your investment in SMR drives, TrueNAS is unlikely to be the best solution for your needs.
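
As a rough illustration of what a per-record seek costs, here’s a toy calculation (every number below is an assumption for illustration, not a measurement from this pool):

```python
# Toy calculation, not a benchmark: effective read throughput when the
# drive has to seek before (almost) every 128 KiB record, versus
# streaming. All numbers are assumptions.
RECORD_MB = 128 / 1024      # default ZFS recordsize, expressed in MB
STREAM_MBPS = 180           # sequential media rate (assumed)

def throughput_mbps(seek_ms):
    """Effective MB/s when each record read is preceded by one seek."""
    transfer_s = RECORD_MB / STREAM_MBPS
    return RECORD_MB / (transfer_s + seek_ms / 1000.0)

print(round(throughput_mbps(8), 1))    # ordinary 8 ms seek (assumed)  -> 14.4
print(round(throughput_mbps(100), 1))  # drive busy reshingling (assumed) -> 1.2
```

With an ordinary seek before every record, the effective rate already lands in the ~15 MB/s ballpark reported above; if the drive is also busy rearranging shingles, it drops to low single digits.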

1 Like

…but my issues there are reads only… no writes.

And… ?
Why do you assume that your reads are sequential? That everything that needs to be read (data and associated metadata) is arranged sequentially on the drive?

ZFS writes sequentially… everything that is in a transaction group. If you filled the pool by writing more than one file at a time, all incoming files were sliced into 5-second chunks of received data, mixed together, and committed as one single sequential write. Nice write performance. But due to that write optimisation, any file received across multiple transaction groups has been sliced and intermixed with other files, and reading it back is NOT a sequential operation.
And that’s before asking whether an SMR drive, doing its silent little business of rearranging shingles during idle time, could further disperse ZFS’s on-disk structures…
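
That slicing can be sketched with a toy model (plain Python, not ZFS internals; the slice size and file count are arbitrary):

```python
# Toy model (not ZFS internals): two files streamed at the same time
# are committed in alternating transaction-group slices, so each
# file's records end up interleaved on disk even though every commit
# was one nice sequential write.
BLOCKS_PER_TXG = 4          # records per file per txg (illustrative)

disk = []                   # on-disk layout, in commit order
layout = {}                 # file name -> offsets of its records

for txg in range(3):        # three transaction groups
    for name in ("file_a", "file_b"):
        for _ in range(BLOCKS_PER_TXG):
            layout.setdefault(name, []).append(len(disk))
            disk.append(name)

print(layout["file_a"])  # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19]
# Reading file_a back means jumping over every slice of file_b:
print(any(b - a > 1 for a, b in zip(layout["file_a"], layout["file_a"][1:])))  # True
```

Each commit was sequential, yet reading `file_a` back requires skipping over every slice of `file_b`, and on an SMR drive those jumps are expensive.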

2 Likes

Actually, the reboot “fixing” it is easy to understand: it’s not actually fixing anything.

If the HDDs are device-managed SMR drives, then the drive tells TrueNAS the data is written before it has actually been placed into the shingled areas. So cancelling the copy and then rebooting is probably wiping out data TrueNAS thinks has been written but hasn’t, and when the copy is restarted on a new file, previously “written” blocks might be missing. Yay, data loss.

The reason it “works” is that the cache has emptied, so when the file transfer is restarted the data goes back into the drive cache before being written into the shingles, and the drive looks like it’s copying at a faster rate again.
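
That cache-fill hypothesis can be sketched as a toy model (every number here is made up, purely to illustrate the claimed shape of the behaviour, including why the 15-20 minute pause mentioned earlier appears to help):

```python
# Toy model of a device-managed SMR drive's staging cache (all numbers
# are assumptions): transfers run fast while the CMR staging area has
# room, collapse once it is full, and recover after enough idle time
# lets background cleaning drain it.
CACHE_MB = 20_000     # staging area capacity (assumed)
FAST_MBPS = 150       # speed while staging space remains (assumed)
SLOW_MBPS = 10        # speed once writes hit the shingles directly (assumed)
DRAIN_MBPS = 40       # background cleaning rate while idle (assumed)

def speed(cache_used):
    return FAST_MBPS if cache_used < CACHE_MB else SLOW_MBPS

cache = 0
samples = []
for _ in range(300):              # 300 s of continuous transfer
    s = speed(cache)
    samples.append(s)
    cache += s                    # the transfer keeps filling the cache

print(samples[0], samples[-1])    # 150 10  (fast at first, then crawling)

# A 15-minute pause lets background cleaning drain the staging area:
cache = max(0, cache - DRAIN_MBPS * 15 * 60)
print(speed(cache))               # 150  (back to full speed)
```

The same drain would happen during the idle minutes around a reboot, which is consistent with the speed coming back afterwards without anything actually being fixed.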

Look, I’m nowhere close to an expert on this subject, but I do have two crappy SMR drives (Seagate Barracudas) that I’ve played around with in TrueNAS, and I can say this: all performance is slower than on the CMR drives I’m using (IronWolf Pros). Writes slow down. Reads aren’t great either, even when using metadata vdevs (so the metadata is on an NVMe drive, not the SMR HDD). The difference in performance is down to the nature of SMR drives; if you’re looking for a “why”, maybe the thread above can help you with it.

1 Like

Again, I was only doing reads there, and when I copied the data back to TrueNAS before this, it was one stream at a time. But OK, in the end it seems there are only 2 solutions anyway: either switch to all-CMR drives, or switch to something other than TrueNAS. Clear, thanks.

That’s really weird. I’ve been pulling a stable 2 x 130-140 MB/s in parallel from the TrueNAS (I think the 2.5GbE is maxed out) for more than 30 minutes now and it doesn’t slow down… I still don’t get it.

So… I’ve just swapped the last of my 4 SMR disks for a CMR one; it’s resilvering.
I’ll update for reference once it’s completed and I’ve run tests to see if the problem is gone.

1 Like