Benefits and consequences of adding more vdevs?

My server currently has one vdev, and I’m considering adding one or two identical ones.

I’ve read that writes to the pool probably won’t be any faster, but I’d like additional redundancy.

Also, would I need more RAM for the additional terabytes being added?

More vdevs won’t give you additional redundancy. In fact, they’ll reduce it: if/when any single vdev fails, the entire pool is lost.

More RAM is always good, but I wouldn’t say you need more for a good while.

I’m 99% sure that this is wrong. Throughput/IOPS of the pool is the sum of the throughput/IOPS of all its VDEVs.

However, with 8 drives in RAIDZ3, you can probably already saturate your 10G network card (at least with sequential writes).

Without knowing what kind of usage you expect and where the bottleneck is, it’s hard to say what the best course of action would be.

On the contrary, adding a second vdev will make your array approximately 100% more responsive and faster. All vdevs share the workload.

It has very little to do with how much empty space you add, or even how much space you use. It depends on the amount of metadata you will be handling, and on your data access patterns.

Depending on your usage you may end up wasting RAM (yes, you can have “too much RAM” – it won’t hurt anything except power consumption and your wallet buying it in the first place – if a smaller amount of RAM was sufficient to fit the working set of data).

Depending on your usage you may benefit from adding RAM, or adding a special device, or a SLOG, or L2ARC, or more data vdevs, or none of the above.

To provide specific advice we need to know what kind of data access your server experiences, and/or just look at the existing bottlenecks and make targeted changes to address those. TrueNAS has amazing built-in graphs. Have a look at them, specifically ARC Hit Ratio under the ZFS section, and Disk Busy and IOPS under Disks.
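If you prefer the command line, the same numbers are available there too; a minimal sketch, assuming your pool is named tank (substitute your own pool name):

```
# Per-disk and per-vdev throughput and IOPS, refreshed every 5 seconds:
zpool iostat -v tank 5

# ARC size and hit-ratio summary (ships with OpenZFS):
arc_summary
```

Watching the per-vdev lines during a transfer shows immediately whether the disks are the bottleneck.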

My server is typically accessed by one machine at a time for storing weekly backups at home.

The average sustained write speed of large files is 200MB/s.

I suspect that speed can be improved, and I’m willing to re-create the pool to get there.

Is there a command to show what options I used when creating the pool?
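For reference, zpool history keeps a log of every command ever run against a pool, including its creation; a minimal sketch, assuming a pool named tank:

```
# Show every zpool/zfs command run against the pool since creation:
zpool history tank

# Show only the properties that were explicitly set (non-default):
zfs get -s local all tank
```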

What kind of IO is this? A few large files or a lot of small files?

Are those synchronous writes?

For synchronous writes you can (sketch below):

  • disable sync on a dataset, if you have a reliable UPS
  • add a SLOG (a small Optane drive would work best)
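A minimal sketch of both options, assuming a dataset named tank/backups and a placeholder device path:

```
# Option 1: disable sync on the dataset (safe only with a reliable UPS;
# a crash can lose the last few seconds of in-flight writes):
zfs set sync=disabled tank/backups

# Option 2: add a dedicated SLOG device (the path is an example):
zpool add tank log /dev/nvme0n1
```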

This will help offload some IO from your array. But I don’t think this is the bottleneck; I think your pool configuration is:

8x 4TB drives in RAIDZ3

Why was this configuration chosen?

It looks like two raidz1 vdevs of 4 disks each would be appropriate here, and then add another vdev for more space when needed (followed by zfs send/receive). Raidz2 would have been overkill. Raidz3 is only appropriate when you have to maintain uptime at all costs, including loss of performance.

This is not the case for a low use backup server.
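A minimal sketch of that layout, with placeholder disk names:

```
# Two 4-disk raidz1 vdevs in one pool (substitute your actual disk names):
zpool create tank \
    raidz1 da0 da1 da2 da3 \
    raidz1 da4 da5 da6 da7
```

Writes are then striped across both vdevs, which is where the performance gain comes from.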

Note: if one of the disks develops bad sectors and you decide to replace it, that disk still provides redundancy for the duration of the replacement, if a secondary failure is what concerns you.

Here is some math to underscore what astronomically small probabilities we are talking about.

Let’s say you have 5 disks, no redundancy. Let’s say the probability of any one drive failing is 2% per year.

When you have no redundancy, the probability of data loss is 1 - (1 - 0.02)^5 ≈ 10% per year (the inverse of the probability of none of the drives failing during that time).

When you add one drive for redundancy, for data loss to occur you must experience a second drive failure within 1 week of the first failure, during the rebuild. Omitting the calculations here (a sketch follows below), the result is 0.016%. Vanishingly small. I’d say completely negligible, a far cry from the no-redundancy case.
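The omitted calculation, sketched under my own assumptions (5 drives total including the parity drive, and a one-week rebuild window, so a per-drive weekly failure probability of 0.02/52):

```latex
P(\text{first failure in a year}) = 1 - (1 - 0.02)^5 \approx 9.6\%
P(\text{second failure within a week}) = 1 - \left(1 - \tfrac{0.02}{52}\right)^4 \approx 0.15\%
P(\text{data loss}) \approx 0.096 \times 0.0015 \approx 0.015\%
```

The exact figure shifts a little depending on how many drives you count and how you model the rebuild window, but the order of magnitude is what matters.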

So, adding one parity drive gives you a great improvement. Totally worth it. This is raidz1.

Now add yet another disk: the probability of two disks failing within a week of the first disk failure comes out to be (I won’t bore you with the derivation, it’s pretty straightforward) 0.000012%.

Horrible deal. You pay the cost of yet another drive, and pay again in lost performance, to make a vanishingly small probability … also vanishingly small. Now we are getting into the territory of a power supply failure frying all disks at once, a house fire, etc.

Bad deal. Not worth it. This is raidz2.

You might object that this math works for uncorrelated failures. True. But if the failures are correlated, who is to say that they won’t result in the entire array dying?

Now, adding one more level of redundancy will add a bunch more zeroes after the decimal point. I did not bother computing it.

Another note: we considered complete drive failure. Just a couple of bad blocks, where the disk is still usable during a routine replacement, changes the whole picture.

On the other hand, 4TB is a small size for a drive these days, yet it uses just about the same energy as a 20TB disk. Depending on where you live, reducing the number of drives can save a lot of money.

On yet another hand, what’s the hurry in going super fast just to sit idle for a week…

Anecdotally, I disabled my 10Gb NIC to save some power and installed a 2.5Gbps one instead.
I understand there is this urge to saturate everything all the time, but rationally, you don’t have to.

Post the graph of disk IO during transfer.

1: That’s a seven-hundred gigabyte system image, mostly a single huge file.
I tried disabling sync and it didn’t give much improvement. Yes, I trust my UPS.

2: I chose 8 disks because the tower has 8 hot-swap bays, and I found a good offer for ten 4TB drives.

I use RAID-Z3 because I hate data loss.

3: The sooner the transfers are complete, the sooner I can go about my day. :stuck_out_tongue:

Edit: I noticed I could change the “Record size” for my pool: changing it to 1M instead of the default 128K has increased the average write rate to between 600 and 700 MB/s, with bursts over 900MB/s. :smiley:
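For anyone who wants to do the same from the shell, record size is a per-dataset property; a minimal sketch, assuming a dataset named tank/backups:

```
# Only affects files written after the change; existing files keep
# the record size they were written with:
zfs set recordsize=1M tank/backups
```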

Perhaps one day I’ll add an SSD for the smaller files, but I think this will work okay for now.

If raidz9 were possible, would you use it?

The line has to be drawn somewhere. With raidz1, let alone raidz2, with so few drives, the probability of data loss due to disk failure is much lower than due to any other cause, such as a lightning-caused power surge, a house fire, or other unforeseen circumstances. Instead, if you don’t want data loss, replicate everything to another offsite unit.
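Replication here is the usual snapshot send/receive; a minimal sketch, where the pool, host, and target dataset names are all placeholders:

```
# Take a recursive snapshot, then stream it to the offsite box:
zfs snapshot -r tank@weekly
zfs send -R tank@weekly | ssh offsite-host zfs receive -F backup/tank
```

Subsequent runs would use an incremental send (zfs send -i) against the previous snapshot, which is much faster.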

I’m not sure I follow. Why are you waiting for ~~paint to dry~~ the backup to complete? Doesn’t your backup run unattended at night? If not, make it run unattended at night.

Those bursts are fake, of course.

You can increase it further, for marginal improvements, but this won’t fix the root cause.

Even though you did not provide IOPS stats, in spite of repeated requests, this observation tells me the issue is indeed IO bound, as expected with such a strange pool configuration.

So the culprit here is your pool config, and the solution would be to reconfigure your pool into two raidz1 vdevs, 4 disks in each.

Your pool speed will increase with every additional, identical VDEV you add. Think of every additional VDEV as another stripe in a traditional RAID 0. The more stripes, the faster the pool. The downside is that if any individual VDEV fails 100% then the whole pool is toast too.
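In practice, “adding a stripe” is a one-liner; a minimal sketch with placeholder disk names, matching the 8-wide RAIDZ3 shape discussed above:

```
# Add a second, identically shaped vdev to the pool:
zpool add tank raidz3 da8 da9 da10 da11 da12 da13 da14 da15
```

One caveat worth knowing: ZFS does not rebalance existing data onto the new vdev; only new writes are spread across both.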

So, as one adds more and more VDEVs to a pool, it becomes increasingly important to stay on top of disks going kaploink in the night. Is it insurmountable? Absolutely not. Disks fail so rarely now that the best approach IMO for an in-house server is to qualify some spares, set them aside, and wait for something to break before replacing it.

Some folks here like to replace stuff preemptively… I rely on copious backups and replication. But then again I run a Z3 also, am perfectly happy with it, and prefer that extra cushion over running two Z1 VDEVs, even if they would give me more space to play with and a faster pool to boot. 400MB/s is plenty fast for me, and the 10GbE SFP+ port is built into the motherboard.