Brand new to TrueNAS and I’m considering it for my next home server. Currently, I use Debian 12 on my primary server with MDADM. The MDADM array consists of 20 x 4TB drives in RAID6 (with one hot spare), is formatted XFS, and is about 90% full.
This server backs up daily to a 2nd server. The 2nd server also uses MDADM: RAID5 with 10 x 8TB drives (plus one hot spare), also on XFS.
If I remember correctly, it took about 12+ hours to resilver the RAID6 array when I added another 4TB drive and about 27 hours to resilver the RAID5 array when I added another 8TB drive.
The drives are all WD Red NAS drives chugging along at 7200RPM.
My big question is, has anyone resilvered the same capacity drives (that were about 90% full) in ZFS and if so, about how long did it take?
I read that ZFS is ‘data aware’ and only copies the parts of the drives that actually hold data when resilvering, which should make that task quicker than a hardware RAID card or MDADM. But I’ve also read that when a ZFS pool gets into that range of utilized space, resilver speed takes a big hit. True?
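If it helps anyone answer, this is roughly how I plan to time it on both systems once I’m testing (the pool and array names below are just placeholders):

    # MDADM: rebuild progress and estimated finish
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # ZFS: resilver progress, throughput, and estimated completion
    zpool status tank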
I’m thinking about swapping out the 4TB drives in the RAID 6 array with 8TB drives and swapping the 8TB drives in the backup server to 16TB. Doubling my capacity in both.
I’ve seen some posts here and there saying that making arrays that big with ZFS (in particular with large-capacity drives…16TB, 20TB, etc.) is completely doable, but that resilvering is going to drop to MDADM-like speeds.
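For the drive swaps themselves, my understanding (happy to be corrected) is that the usual ZFS route is to replace one disk at a time, let each resilver finish, and turn on autoexpand so the extra capacity appears once the last drive is in. Something like this, with pool and disk names as placeholders:

    # let the pool grow once every member has been upsized
    zpool set autoexpand=on tank

    # repeat per drive, waiting for each resilver to complete
    zpool replace tank <old-disk> <new-disk>
    zpool status tank    # watch the resilver before swapping the next drive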
It is very hard to know, since it depends on the type of data, how much server memory you run, how full the drives are, etc. I am running RAIDZ2 (which is sort of like RAID6) with 8 x 16TB hard drives, but I am only 30% full. A resilver takes ~5 hours (WD X18 7200RPM). My data is a mix of documents, photos, and video.
All in all, ZFS is a WAY better data protection system than MDADM, but it also hates to be really full. If you are planning on running at 90%+ usage, I don’t know that I would go ZFS (90%+ is not great for any file system IMO).
ZFS is not just a file system; it is really a storage stack that includes functions usually implemented by the operating system kernel, such as the ARC cache.
ZFS has its own storage allocator that divides each top-level vDev into roughly 200 regions (metaslabs) to minimize fragmentation. When pool occupancy reaches about 80%, most of those regions are largely exhausted and ZFS switches to a different allocation algorithm, which greatly reduces write performance.
It should also be noted that raw performance is not ZFS’s first concern. Because of the metaslab allocator, ZFS does not do purely sequential or purely random writes in the traditional sense, and combined with its copy-on-write design this makes ZFS a poor fit for many write-intensive scenarios.
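If you want to see where a given pool sits relative to that threshold, the pool-level numbers are easy to check (the pool name below is just a placeholder):

    # overall allocation, free space, fragmentation, and capacity used
    zpool list -o name,size,allocated,free,fragmentation,capacity tank

    # per-metaslab detail if you are curious (read-only, very verbose)
    zdb -m tank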
RAIDZ and RAID5/6 store their data differently and so resilver in different ways.
RAID5/6 are physical stripes with a 1:1 block relationship and they can resilver from block 0 to the end sequentially (possibly skipping unused blocks). This has minimal seek time during the resilver.
ZFS mirrors also adopt this approach.
However, RAIDZ resilvers cannot take that approach because the blocks are not written that way, so I would expect an 85% full RAIDZ vDev to take significantly longer to resilver than a RAID5/6 equivalent.
And apparently RAIDZ resilver times are also proportional to the amount of data in the pool - so both the amount per disk and the number of disks matter.
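A rough back-of-envelope just to put numbers on that (purely illustrative figures, not a benchmark):

    # whole-disk sequential rebuild of a 4TB member at ~150 MB/s average:
    echo $(( 4000000 / 150 / 3600 ))   # ~7 hours, before overhead and competing I/O
    # a RAIDZ resilver instead walks allocated blocks, so at 90% full you read
    # nearly as much data but with far more head seeking on spinning disks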
Fortunately, I have never had to resilver, but perhaps some people who have actually done this can comment on how long it takes.
Finally, resilvering is supposed to be a very occasional activity, so resilvering performance should not matter as much as performance under normal load. (ZFS gives priority to real-workload I/Os over resilvering I/Os, and TrueNAS also has a resilver priority schedule - essentially a quiet-hours setting - to reduce the impact still further. So aside from the understandable wish to get back to a non-degraded, fully redundant state as quickly as possible, the impact of resilvering on production workloads should be fairly minor.)
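If you ever want to see how aggressively resilver I/O is being scheduled, OpenZFS on Linux exposes the relevant knob as a module parameter (read it before changing anything; the TrueNAS UI schedule is the supported way to adjust priority):

    # minimum time per transaction group spent on resilver work, in ms
    cat /sys/module/zfs/parameters/zfs_resilver_min_time_ms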
Some Western Digital Red 4TB drives are SMR, which is not suitable for ZFS. But your resilver times for your MDADM RAID-6 seem to indicate they are CMR, not SMR. Still, no harm in checking anyway.
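A quick way to check is the model number - the 4TB SMR part is the WD40EFAX, while the WD40EFRX (and the Red Plus / Red Pro lines) are CMR. Assuming smartmontools is installed:

    # list model numbers for all drives at once
    lsblk -d -o NAME,MODEL

    # or per drive, with firmware and capacity details
    smartctl -i /dev/sda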
You mention 20 x 4TB in a RAID-6. ZFS RAID-Zx (with 1, 2 or 3 drives of parity) does not do well with wide stripes. Some think 12 disks per RAID-Zx vDev is a good stopping point. Others go higher. 20 disks is a bit too wide in my opinion.
To be clear, a ZFS pool consists of at least 1 data vDev, but pools can have many more, plus special vDev types (which are generally not useful to home and small office users).
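As an example of the layout I would lean towards instead: the same 20 drives could go into one pool as two 10-wide RAID-Z2 vDevs, so each resilver only involves 10 disks. A sketch with placeholder device names (use /dev/disk/by-id paths on a real system):

    zpool create tank \
      raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj \
      raidz2 sdk sdl sdm sdn sdo sdp sdq sdr sds sdt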
I’m not too concerned with high I/O as the drives fill up… this is just going to be for a Plex server at my house and a Windows backup location for a few PCs.
My main concern is resilvering speed. HDDs die, and when they do, I rest easier at night knowing the resilver is complete and the array is healthy again. If that process drags on for days and days… I get a little worried.
This theoretical server will probably have 32GB of RAM. Is that too low for ZFS with these array sizes?
Out of the gate, there’ll be plenty of space, but just like my current array, as time goes on… it will start to fill up, and I guess I’ll bump into slow rebuild times if the unthinkable happens.
It will just be housing a Plex server with 1080p and 4K content. I asked in another reply how bad the speed dip would be. Are we talking to the point that if I was copying a file to/from the array, I’d be looking at 100 KB/s if a resilver was underway on drives that are 80% full?
These are CMR drives. I was thinking about getting fewer, higher-capacity drives, but if I stick with many smaller ones… I guess it’s better to stick with MDADM?
Knock on wood, MDADM has been performing perfectly for years and years and years. I randomly gather a checksum of all my files from time to time and compare them days/weeks/months later and so far so good.
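From what I’ve read, that manual checksum routine is basically what a ZFS scrub would do for me automatically, since every block carries its own checksum - and the by-hand version works on any filesystem too (paths below are placeholders):

    # ZFS: verify every block against its stored checksum
    zpool scrub tank
    zpool status tank     # scrub progress plus any checksum errors found

    # manual equivalent on the current XFS/MDADM setup
    find /srv/array -type f -exec sha256sum {} + > checksums.txt
    sha256sum --check --quiet checksums.txt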