(wrong) Partitions as the Optimal ZFS Solution for a New Build

Update: See the marked solution. My initial logic was heavily flawed; this is another win for “there is a reason the best practices are what they are”.

Hi all, I’m new to TrueNAS (using 25.10) but am well aware that partitioning drives is neither best practice nor fully supported by TrueNAS. Still, I am having a hard time seeing it as the wrong choice for a homelab (I also understand I’m not the TrueNAS target customer). With partitioning, I would get very high performance for part of the storage and bulk space for the rest.

I put together a system with 12 HDDs (10TB enterprise SATA 7200 RPM drives), 4 SSDs, and 4 NVMe drives, with 192 GB of RAM. The use is mixed: a few VMs, several Docker containers, and a much larger amount of file storage that generally has low throughput requirements.

For performance and redundancy, 9 HDDs are put into mirrored groups of 3 and then striped. In a standard configuration that gives roughly 27TB of usable high-performance storage. Given my somewhat slow internet speeds, keeping some larger datasets local is beneficial, so 27TB isn’t a huge amount of space. An alternative is to partition the drives: first, a 2TB partition on each drive used for VMs and databases in the mirror layout described above, giving 6TB of high-performance read/write storage; second, a 7TB partition put into a RAIDZ3 configuration over 10 drives (rather than just the 9 mirrored ones) for 49TB of usable space.
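For concreteness, here is a rough sketch of what the CLI side could look like. The device names (sda..sdj), pool names, and partition labels are placeholders, not my actual layout:

    # Each 10TB drive gets a ~2TB "fast" partition and a ~7TB "bulk" partition
    for d in /dev/sd{a..j}; do
        sgdisk -n 1:0:+2T -c 1:fast "$d"
        sgdisk -n 2:0:+7T -c 2:bulk "$d"
    done

    # Fast pool: three 3-way mirrors striped across 9 drives -> ~6TB usable
    # (the tenth drive's 2TB partition is simply left unused)
    zpool create fast \
        mirror /dev/sda1 /dev/sdb1 /dev/sdc1 \
        mirror /dev/sdd1 /dev/sde1 /dev/sdf1 \
        mirror /dev/sdg1 /dev/sdh1 /dev/sdi1

    # Bulk pool: RAIDZ3 across the 7TB partitions of all 10 drives -> ~49TB usable
    zpool create bulk raidz3 /dev/sd{a..j}2

    # Export both, then import them through the TrueNAS UI
    zpool export fast bulk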

Negatives of this approach:

  • The TrueNAS UI doesn’t support partitions anywhere. It cannot be used to create this layout, but it is easy enough to build it from the CLI, export the pools, and then import them through the UI. Hot spares cannot be set up via the UI either, but adding one via the CLI seems to work (see the sketch after this list).
  • I am not sure about potential minor performance degradation when the RAIDZ pool sees even light use, given that two pools share the same drives. ZFS, and my basic understanding of the ZIO scheduler, assumes each vdev is a dedicated disk.
  • Scrubs should be staggered to make sure both pools are not scrubbed at the same time
  • You can’t really prioritize one pool over another (as far as I’m aware), meaning low-priority I/O on the RAIDZ pool could hamper higher-priority I/O on the fast pool.
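To make the CLI bits above concrete (pool and device names are still placeholders), the hot spare and the staggered scrubs would look something like this:

    # Add the matching partitions of a spare drive to each pool from the CLI
    zpool add fast spare /dev/sdk1
    zpool add bulk spare /dev/sdk2

    # Stagger the scrubs, e.g. two cron entries two weeks apart:
    # 0 2 1 * *    zpool scrub fast
    # 0 2 15 * *   zpool scrub bulk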

Positives:

  • 55TB of usable space (6TB fast + 49TB bulk), nearly double that of a pure mirror configuration
  • The 49TB pool can take 3 full drive failures without data loss
  • Things needing the absolute best performance still get that with the fast pool

An alternative would clearly be to use different drives for different pools, though splitting the drives between pools would greatly reduce the performance of each.

Further Details
Even with the worst luck in the mirroring, it would allow two drives in the same group to fail without needing to restore from backups. The system is on a UPS and most critical data is continuously backed up remotely (although restoration would be a pain). As such, I am comfortable with asynchronous writes for all storage.

The SSDs and NVMe drives are largely just existing hardware I had on hand, but they are fairly performant. The VMs are used as my primary desktop and a development machine, and the containers range from realtime networking applications to web servers. Performance in the VMs is pretty important; activities can include things like compiling the Chrome source, which has tens (hundreds?) of thousands of files and can take over half a day. Data loss is pretty unacceptable too: while I will back up semi-often, the internet connection is not great, so remote recovery would require drives to be mailed or a long download. If the server burns down, I accept it will take me a while to restore.

Given sync=disabled there is no need for a SLOG drive. The NVMe and SATA SSDs are not all the same models, as they were just on hand, but they have decent speeds. The NVMe drives could potentially be mirrored together as a special vdev, and the SATA SSDs maybe used as an L2ARC.
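For reference, the relevant CLI bits would be something like this (pool and device names are placeholders):

    # Accept asynchronous writes everywhere, so no SLOG is needed
    zfs set sync=disabled fast
    zfs set sync=disabled bulk

    # Mirrored NVMe special vdev for metadata/small blocks on the bulk pool
    # (mirrored because losing the special vdev means losing the pool)
    zpool add bulk special mirror /dev/nvme0n1 /dev/nvme1n1

    # SATA SSDs as L2ARC; cache devices hold no pool-critical data, so no mirror needed
    zpool add fast cache /dev/sdl /dev/sdm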

Yes, I did say 12 drives and only talk about 10. At least one I will likely keep as a hot spare, and 12 is something of a hard limit; mirror groups of 3 only divide evenly into 9 or 12 drives, so I will likely only use 9 or 10 of them.

You do you.
Just don’t expect us to help fix it when that dog’s dinner goes wrong.

Oh and with 192GB of RAM on a homelab - I suggest that the L2ARC is a total waste of time.

Personally I would do a Z2/Z3 of the HDDs and use the SSDs in one high-speed pool and the NVMe drives in another. Use one for containers and one for Apps.

But as I said - you do you


Others have done some odd things with TrueNAS, including myself in the early days, but I have come to realize that using supported methods reaps the biggest gains in supportability.

As @NugentS said, “You do you.” I was going to say something similar as well.

That is a lot of data storage to hang on two mirrored NVMe drives to ensure you do not lose your data. It sounds like you may understand the gains, but I don’t think you realize the pitfalls of a sVDEV.

It isn’t generally a wise idea to go against standard practice, especially when the core system doesn’t support the use case. I just don’t quite understand the argument against it here, where it would seem like a potentially optimal fit: meeting both performance and bulk needs with two differently configured pools.

That is a lot of data storage to hang on two mirrored NVMe drives to ensure you do not lose your data. It sounds like you may understand the gains, but I don’t think you realize the pitfalls of a sVDEV.

I would likely do either a 3-way or 4-way mirror for the NVMe drives to avoid data loss. I might have more concern about some emergency need to attach the array to hardware without enough NVMe slots (motherboard failure?), but I would guess I could also back up the sVDEV to the bulk storage array and restore it to actual HDDs temporarily if needed.

Using the NVMe drives on their own, the two configurations would be a 2-wide mirror, which would only allow one failure (though NVMe drives should be less likely to fail together than HDDs), or a mirror across all four. This loses the ability to use the NVMe drives as a special vdev for the HDD array (unless the 2TB ones were partitioned, ha).

Even in the basic striped mirror, though, you would have only 1.8TB of usable high-speed space versus the 6TB.
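For illustration (device and pool names are placeholders), those two NVMe layouts would be something like:

    # A single 2-wide mirror: one drive of usable space, tolerates one failure
    zpool create nvfast mirror /dev/nvme0n1 /dev/nvme1n1

    # A mirror across all four: still one drive of usable space, tolerates three failures
    zpool create nvfast mirror /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1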

Oh and with 192GB of RAM on a homelab - I suggest that the L2ARC is a total waste of time.

Likely, but I had the drives spare, and being able to keep the VM OS drive blocks or larger database data warm on the SSDs seems beneficial (not that one can control what gets stored in the L2ARC, but given the repetitive-but-infrequent access I would guess that would be the case).
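If I do try it, the standard OpenZFS tools should show whether the cache devices ever pay off once things warm up (“fast” is the placeholder pool name from earlier):

    # arc_summary ships with OpenZFS; exact output layout varies by version
    arc_summary | grep -i l2

    # Per-vdev view of how much I/O the cache devices actually absorb, refreshed every 5s
    zpool iostat -v fast 5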

Still, there is a reason for KISS, and for something like the boot pool it seems obvious why partitioning should be avoided. On data pools, though, I am just not sure why TrueNAS chose not to support partitions, other than that for enterprise customers the answer is “if you have two use cases, buy two sets of hardware”, since the additional cost and space are not huge for them.

One nice thing about TrueNAS is that migration on these pools is easy, so while I have started some experimenting, I could shift everything to the SSDs and recreate the pools in the ‘recommended’ configuration.
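As a rough sketch (pool and dataset names are placeholders), that shuffle would just be a recursive snapshot and send to the SSD pool, then a send back once the pool is recreated through the UI:

    zfs snapshot -r fast@migrate
    zfs send -R fast@migrate | zfs receive -u ssdpool/fast-copy
    # ...destroy and recreate "fast" in the supported layout, then reverse the send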

The gross oversimplification here is “the laws of physics suck.”

Putting two pools on the same spindles will incur a huge seek penalty. You’re assigning the first 2T of physical LBAs on the drive to one pool and the remaining to another - so when ZFS needs to do I/O to both pools, the underlying drives will be furiously firing the heads back and forth between the two regions.

Potentially adding 8ms of seek time plus 4ms of rotational latency to each I/O - especially for VM workloads, which you describe as having “pretty important” performance requirements and which tend to use small record sizes - likely won’t give you the results you’re after.
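If you want to see it for yourself, something like the following fio run (device names are placeholders; random reads only, so it’s non-destructive) pits both partition regions of one disk against each other - compare the latencies with running either job alone:

    # Global options first, then one job per partition region of the same disk
    fio --direct=1 --rw=randread --bs=4k --runtime=60 --time_based \
        --name=fastpart --filename=/dev/sda1 \
        --name=bulkpart --filename=/dev/sda2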

This is all leaving aside the potential wrenches you’ll be throwing into the works w.r.t the TrueNAS middleware, which expects only one pool to exist per physical device.

My rule of thumb for arranging my storage is the following:

  • I use a Z2 array for my bulk storage (multimedia files, pictures, and some .iso files for my experiments). This array usually only has compression enabled, though those files are generally not very compressible anyway.
  • I have a separate, mirrored pair of 6TB HDDs for all the backups on my network. On this array I usually enable deduplication too, on top of the normal compression (see the property sketch after this list). This storage is not meant for bulk storage, only backups.
  • If I need speed, then I use 1, 2, or 4 PCIe 4x NVMe SSDs, striped, with proper snapshotting and backup to the previous drive to avoid data loss. VMs are ALWAYS installed to at least a single M.2 NVMe. I don’t even use SATA SSDs any more (even a Gen3 NVMe is something like 7-10 times faster than a SATA SSD, so there is absolutely no point building an overly complicated system that is just a playground for data loss).
  • Anything that does not fit into any of the above categories will live on a separate dataset on the first, bulk storage array.
  • Of course this system does not replace a proper 3-2-1 backup system!
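As a rough property sketch matching the list above (pool names are placeholders, not my actual ones):

    zfs set compression=lz4 bulktank   # bulk Z2 array: compression only
    zfs set compression=lz4 backup     # backup mirror pair...
    zfs set dedup=on backup            # ...plus deduplication on top
    zfs set compression=lz4 nvpool     # striped NVMe speed pool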

Succinct and exceptionally well put, thank you for taking the time. I did a good bit of research, and with things like FreeBSD’s ZFS guide saying that partitions suffer no performance penalty versus whole disks, I missed that semi-obvious problem. In the two-pools-on-one-drive configuration, ZFS cannot order the I/O requests in the best order for the drive as a whole; I only skimmed past this with my thought that the lack of I/O prioritization would cause an issue. HDD heads and their quite large seek costs could easily cause a lot of thrashing; my theorized worst case would really only be close to the truth for SSDs, with HDDs clearly far worse. FreeBSD’s info might still be true, but it only talks about using part of the drive for things like boot code, or undersizing to account for drive differences later, not two-pools-on-one-drive idiocy.

Apologies for the obvious question and thanks everyone for setting me straight.
