[SOLVED] Planning a SCALE deployment, looking for sanity check on Vdev Structure

I see 2 issues here.

One is that the data I need to save is currently stored on the 18TB disks. I bought the 2 extra so I could use them as temporary storage for when I load the NAS back up, since as far as I know there’s not going to be a clean migration route for my current data.

So that makes the 4-disk vdevs kind of impossible for me to set up without buying a bunch of additional hard drives. Though as I write this it occurs to me I can probably pull the 4x 4TB drives and use those. (You are suggesting a single-parity vdev, yeah?)

Second, I don’t love the idea of a separate fast vdev; I kind of love the idea of the metadata and small files being on the NVMe without me having to consider where to store them.

It also means I end up with another fragmented storage solution, and I kind of admire the simplicity of a single volume.

Actually, upon further consideration, I think you’re onto something with the fast pool. Though I still don’t love the idea, the reality is that if my projects got much larger than the space I have, I’d be more inclined to simply buy more drives to expand the fast storage.

Would you clarify what you mean about the 3-way mirror / raidz2 pool? That kind of went over my head, honestly.

I may be wrong, but I believe the small file threshold is actually based on block size, not file size.

I.e. if your block size is 1MiB, then all files will be stored on the special vdev, as the maximum block size used by any file would be 1MiB.

Seems counterintuitive, and I could be wrong too, but I do remember this being cited as a use case.

It was definitely along the lines of ‘you can set it to store everything under a certain file size’
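That matches how the `special_small_blocks` property works: the threshold is compared against block (record) size, not file size. A hedged sketch, assuming a hypothetical dataset named `tank/projects`:

```shell
# special_small_blocks is compared against each block's size, not the
# file's size: any block at or below the threshold lands on the
# special vdev. The dataset name here is hypothetical.
zfs set special_small_blocks=64K tank/projects

# The corner case discussed above: raise the threshold to the
# dataset's recordsize and effectively *all* file data qualifies,
# so everything ends up on the special vdev.
zfs get recordsize,special_small_blocks tank/projects
```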

No, I’m suggesting a raidz2 vdev with double parity: raidz1 with 18 TB drives puts too much data at risk in case of a drive failure. So the suggestion is double parity, so that you still have one level of parity left after losing a single drive—but not more.
And then a 3-way mirror as special vdev (if you go this way) to have a similar level of resiliency on the special vdev.
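As a sketch of that layout (all device names here are made-up placeholders), the 4-wide raidz2 data vdev plus a 3-way mirror special vdev would look something like:

```shell
# 4x 18 TB in raidz2 (double parity) for data, plus a 3-way mirror of
# NVMe devices as the special vdev. Device names are placeholders.
zpool create tank \
    raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd \
    special mirror /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1
```

This way both vdevs tolerate two drive failures, which is the point of matching the special vdev’s resiliency to the data vdev’s.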

Having one single pool is nice, but it’s not going to play nice and fast whenever spinning drives get involved. So my suggestion, assuming that your working set while rendering fits entirely in the NVMe drives (“1 TB or more” of base data and 1-2 TB of temporary files while rendering) is to do it all on the “fast” pool, and then store the final result in the “slow” pool.
Tiered storage, managed by user.

If you have “about 8 TB” of data, you can:

  1. Make a mirror pool with the two new 18 TB drives, and transfer everything on it. (Alternatively, all the SSDs together could almost do it.)
  2. Delete the Unraid array.
  3. Make a 5-wide raidz1 pool with the five 4 TB drives and replicate from the 2*18 pool.
  4. Destroy the mirror pool to create a 4-wide raidz2 with all four 18 TB.
  5. Replicate one last time from the 5-wide raidz1 to the 4-wide raidz2.
  6. Optionally, add a second 4-wide raidz2 vdev with the 4 TB drives.

I’m unsure whether ZFS would accept the last 4 TB drive as a spare which could only be used for one vdev but not the other. Probably not. So keep it as a cold spare.
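For what it’s worth, the six steps above roughly translate to the commands below. This is only a sketch: pool and device names are made up, and you’d want to verify each replication before destroying anything.

```shell
# Step 1: temporary mirror of the two new 18 TB drives (names made up)
zpool create temp18 mirror /dev/sde /dev/sdf
# ...copy the ~8 TB of data onto temp18, then delete the Unraid array (step 2)

# Step 3: 5-wide raidz1 from the five 4 TB drives, then replicate
zpool create slow4 raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdg /dev/sdh
zfs snapshot -r temp18@migrate
zfs send -R temp18@migrate | zfs recv slow4/staging

# Step 4: free the 18 TB drives and build the final 4-wide raidz2
zpool destroy temp18
zpool create tank raidz2 /dev/sde /dev/sdf /dev/sdi /dev/sdj

# Step 5: final replication from the raidz1 pool into the new pool
zfs snapshot -r slow4@final
zfs send -R slow4@final | zfs recv tank/data
```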


You know what, you’re absolutely right: this is the better strategy. Thank you very much for providing the step-by-step instructions for the transition too, that is very helpful!

I have one final question for you:

What makes the raidz2 solution superior to simply a pair of mirrors? If I’m recalling correctly, it’s because parity is, in effect, stored in part across the entire array, and thus this allows more disks to fail than a simple mirror?

Thanks again.

Happy to be useful. :slightly_smiling_face:
(At steps 3 and 5, make sure to create another pool and not add another vdev to the existing pool: that would be a mistake with no easy escape!)

A striped mirror can lose one drive in each vdev; two failing drives in the same 2-way mirror vdev kill the pool. Full stop.
Raidz2 can lose any two drives; striped raidz2 can lose more than two as long as there are no more than two failing drives in the same vdev.
Additionally, there’s the issue of UREs (Unrecoverable Read Errors): Occasionally, a drive cannot read a given sector. RAID as well as ZFS would then restore the data from parity… except if the array is degraded and has no redundancy left. With traditional RAID, this could kill the array; with ZFS, you cannot retrieve the affected file, which is less severe but still annoying. The URE rate is usually given as “less than 1 in N bits”, with typical values for N being 1E14 or 1E15 for HDDs (SSDs: 1E17). The former value amounts to about 12 terabytes… So, if you believe, at least a little, in manufacturers’ spec sheets, you should be worried about resilvering so much data without any redundancy left. (In practice, assume that you’re losing one level of redundancy to the risk of UREs; then you want double redundancy to be fully protected against the loss of one drive.)
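To put a rough number on that (my own back-of-envelope arithmetic, assuming the pessimistic 1-in-1E14-bits spec and an 18 TB drive read in full during a resilver):

```shell
# 18 TB = 18e12 bytes = 1.44e14 bits read during a full resilver.
# Expected UREs at a 1e-14 per-bit rate: 1.44e14 * 1e-14 = 1.44.
# Chance of at least one URE (Poisson approximation): 1 - exp(-1.44).
p_ure=$(awk 'BEGIN { printf "%.2f", 1 - exp(-18 * 8 * 0.01) }')
echo "P(at least one URE during resilver) = ${p_ure}"
```

At the 1E15 rate the same arithmetic gives about 13%, which is why the spec-sheet value matters so much.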

At 50% space efficiency, striped 2-way mirrors have more IOPS than 4-wide raidz2, but raidz2 is more resilient to multiple drive failures as well as to single drive failure + URE.
Mirrors are more flexible (can add or remove, evolve to 3-way… and can split), and the pool can grow by two drives at a time.
Raidz2 is quite inflexible, and the pool has to grow by four disks at a time. But it is safer.
Your data, your call.


Roger that, All points made clear.

Thanks again for your assistance.

Of course, since RaidZ2 is less flexible, you could consider acquiring a few more disks and starting out with 5-wide or 6-wide, where you see a much larger benefit from RaidZ2. I.e. with 6-wide you have 4 data disks and 2 parity, so circa 66% storage efficiency instead of 50% with 4-wide.

I.e. this would double your rusty pool size from 36TB to 72TB.
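The arithmetic behind those numbers, as a quick check (18 TB drives assumed):

```shell
# raidz2 usable capacity = (width - 2 parity drives) * drive size (TB)
four_wide=$(( (4 - 2) * 18 ))   # 2/4 data drives -> 50% efficiency
six_wide=$(( (6 - 2) * 18 ))    # 4/6 data drives -> ~66% efficiency
echo "4-wide: ${four_wide} TB usable, 6-wide: ${six_wide} TB usable"
```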


Mind you, RaidZ Expansion is coming :slight_smile:

72TB on 6 disks does sound very tasty…

But it will be a long time before I have need of such an amount of space. If expansion is coming that seems like something I can deal with in the future. Thank you for bringing it to my attention though, that is definitely worth consideration.


You could create a RAIDZ2 VDEV in a degraded state without two of the drives, migrate the files, then add the drives… but it’s both risky and a PITA.

Suggested readings: iX's ZFS Pool Layout White Paper and Assessing the Potential for Data Loss.

Re: RAIDZ2 expansion, be mindful it will have its own cost as well. It’s not a magic wand. Do note the caveats.


Yeah, honestly, after saying this I gave it a little more thought and decided to pick up a couple more drives to complete the 6-disk vdev straight away.

The better efficiency and numbers for when I end up pulling data from the slow storage attracted me.

I even snagged another 4TB to complete another vdev from those. Though I’m not sure how wise that is, running two vdevs of such vastly different capacities, especially given 4TB drives are very cost-inefficient to replace new; and when the pool has 72TB, I’m not sure an additional 16 will make the difference.

I also bought an additional 2TB Sata ssd to complete the 3 way mirror special vdev, I figure with such a large capacity pool I’m going to want it.

I am going to cannibalise some higher-capacity NVMe drives for the fast storage to increase its space, so I can use it for my original ingest. Just so I’m not transferring data back and forth over a few days.

Is it preferred in this community to mark a thread as solved when it’s solved? I noticed someone marked a post of mine as a solution already.


Just an annoyance.

Just do note that once added, you cannot remove it without destroying the pool.

Generally, I would say yes. We moved forums recently, and some things are not as established as they should be.

ZFS is smart enough to distribute the data between the two vdevs in such a way as to take advantage of all the disks, correct? So given it cost me £40 to get another 4TB, it kind of seems worth it to add those, if only for the performance enhancement and to not contribute e-waste.

Understood, as mentioned those drives would only gather dust in a drawer if not deployed so, makes sense.


If you’re going for capacity with 6-wide raidz2 then you probably don’t want to add a second vdev with 4 TB as you could not remove it, and the 18 TB vdev is already well above the capacity you need.


Sorry to bring this up again. But your comment has created another question I can’t wrap my head around:

I’m struggling to find the distinction between those two raid types; my only assumption is that when you say striped, you mean there is more than one vdev present?

Correct. In ZFS, multiple VDEVs are always striped. He’s saying that you can lose up to 4 drives in a pool that has 2x RAIDZ2 VDEVs, as long as no more than 2 fail per VDEV.
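A quick count makes this concrete. Assuming the 2x 6-wide raidz2 layout discussed above (12 drives total), a 3-drive failure only kills the pool if all three land in the same vdev:

```shell
# Ways to pick 3 failed drives out of 12: C(12,3)
total=$(( 12 * 11 * 10 / 6 ))
# Fatal cases: all 3 failures inside one of the two 6-wide vdevs,
# i.e. 2 * C(6,3) -- raidz2 only survives 2 failures per vdev
fatal=$(( 2 * 6 * 5 * 4 / 6 ))
echo "fatal: ${fatal} of ${total} three-drive failure combinations"
```

So roughly 18% (40 of 220) of random 3-drive failures would be fatal; the rest leave the pool degraded but alive.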