Why Is an upgrade from RAIDZ1 to RAIDZ2 and from there to RAIDZ3 not possible?

TheColin21 · October 24, 2024, 5:08pm

Hi,

I know that I can’t add another disk of parity to a RAIDZ.
I’d like to understand if this is just something that hasn’t been implemented yet (like RAIDZ Expansion was) or if this is actually not possible and if so, why.

Can someone help me out?

dan · October 24, 2024, 6:08pm

I suppose in theory it could be coded in, just as RAIDZ expansion has been. But it’s never been there, and AFAIK it’s never been on the roadmap.

argumentum · October 24, 2024, 6:29pm

I will, for I love emojis

TrueNAS was simple. Have a GUI for ZFS. Just buy a box, fill it with drives and have a NAS.
If you don’t know how to choose the hardware, we at iX will do it for you.
If you want more reliability, throw another box w/drives.
With enough boxes you can have anything you need.

The problem begins with “so and so have this and that there, can you do it ?”, and the answer is yes.
So now the focus is to have an all in one that does everything, as people ask.
All this is my trend of thought before coffee

But the idea is the same as it was.
You’d have more than one box o’drives mirrored or what not, and destroy one box and rebuild it as you wish. It was never the idea to have just one box with no other copy of your data elsewhere. Still is not because, shit happens and … you need your 3,2,1 backups. So no, safety first.

Maybe one day, if the developers are bored because everything is just perfect, meh, let’s make variable Z a thing.

Stux · October 24, 2024, 8:59pm

It’s not implemented… yet.

Now that RaidZ expansion is here, it’s an obvious next step… after all as you grow a VDev by making it wider, you may want to increase its redundancy level too.

In theory, the same “reflowing” technique could probably be used…

Anyway, it’s technically difficult to implement, probably not impossible though, and who is going to pay to have it implemented as it’s not really a feature needed in “enterprise” scenarios.

So. Don’t wait for it, but maybe it will come in the next 5-10 years

bonox · October 24, 2024, 10:25pm

I may have misunderstood this but Z expansion is just adding some new data blocks - it doesn’t change the way the data is written across the vdev. This is also why an expanded vdev needs a rebalance to give full space and speed. If you don’t, you hamstring the vdev somewhat since you’re more reliant on many of the free blocks existing on a single drive.

But zfs doesn’t have dedicated parity drives like unraid or raid5 - the collection of data, checksum and parity data is spread across the whole width of the vdev making it complicated to just ‘add more redundancy’

It’s mathematically possible, but may not give you the same resilience as starting from scratch and spreading the collection of bits around the whole vdev.

Arwen · October 25, 2024, 1:18am

I actually thought about it, and adding RAID-Zx parity does not work without block pointer re-write.

Here me out.

In the simple case of a 4 disk wide RAID-Z1 and adding a 5th disk for 2nd parity, it may seem like it should work. And it does for 3 data column stripes, (and the 4th for parity). You can re-write the lower level block info and add a 2nd parity because the amount of data columns remains the same.

However, what if that same 4 disk wide RAID-Z1 had 2 file blocks, of 1 data column each. So adding a single disk for 2nd parity only adds 2nd parity to 1 file block. The other remains as single parity.

Somewhat hard to explain. But, if their does not happen to be free space next to such instances, so that a 2nd parity can be added, then you need block pointer re-write.

Without the ability to re-write block pointers, the orphan file block(s) that can’t take advantage of a new column for parity will remain 1 less parity than the others.

I even thought about how ZFS status should display the output;

  NAME           STATE     READ WRITE CKSUM
  tank           ONLINE       0     0     0
    raidz12-0    ONLINE       0     0     0
...

And for RAID-Z2 to RAID-Z3, it would be “raidz23”. Naturally whence the “conversion” is complete, the pool vDev type would change to more normal “raidz2” or “raidz3”.

In someways this can still work but with manual effort. If their was a program to list the files still “stuck” on 1 less parity, the SysAdmin could manually copy them which would force the new write to use added parity.

Anyway, my 5 dollars worth of thought, (payable to KoFi).

etorix · October 25, 2024, 8:49pm

RAID5 does not have dedicated parity drives. That was RAID3 (and that’s why 3 was totally superseded by 5…).

Translation for novice readers: It will NOT happen.
“Block pointer rewrite” is ZFS lingo for an essentially impossible problem.

argumentum · October 25, 2024, 9:03pm

…I love chit-chat.
If there is space in those drives for the new setup, the new setup ( Zx ) can be written to the unused parts of the new setup, moving blocks as if to new drives.
Once is al done, replace the old tables with the new ones that were written in an temporary place. Recalculating the whole thing as many times as the free space allows.

I don’t see it as impossible but do see it as impractical.

Also, I would not like to have this feature at the constant expense of bloated code. Again, impractical.

Therefore, can it be done: yes. Should it be done: never

Arwen · October 26, 2024, 3:24am

I did have some thoughts on this as well. Not exactly bloated code avoidance, but memory bloat.

Some new features that are either not currently used, like extra checksum algorithms or compression algorithms, could be setup as loadable modules. Both user side SO, (Shared Object), which are able to be loaded and unloaded as needed. And as loadable kernel modules if needed too.

Right now so much of ZFS is loaded in memory that most people don’t need. Other features are transitory, like a scrub, disk resilver or RAID-Zx column add.

Whence those are done, why not reduce memory foot print and unload the code with associated data?

I am not saying that this is a panacea for code bloat. Nor solve memory problems. But, making OpenZFS code leaner in memory makes room for other uses of that memory.

Remember, it is probably unheard of for a single server to have every compression, checksum and other features actually in use at the same time.

I mean who really uses “lzjb” or “zle” compression today?

argumentum · October 26, 2024, 3:50am

that would be, code wise, as “Case of” and it would jump to the case and use that. Computationally is not expensive. In machine code is a jump to here or there. Removing that, … I don’t know. If it works, just leave it there.
Memory wise, nowadays with fast gig sizes ( vs. mb sizes of old ) I don’t think is a big deal.

If they had to fit it all in a 1.44 floppy, yes, remove it. But, …my point was along the lines of, does it make practical sense to do this or that, and regarding on-line / on-the-fly change of the ZFS, it does not, to me. ( not that I wouldn’t welcome the availability )

Now, if implemented as an off-line operation then yes, in my view. Write some utility that, with the pool off line, does it’s magic ( for as long as it will take, days or weeks ), then that I believe would be welcomed by the community of homelabers.
“I had 5 drives, my new rig can fit 4 and I need to…”, whatever. I do see a group of people wishing for such thing even if as off-line. My 2 cents, am not a developer

argumentum · October 26, 2024, 4:14am

…you know what, it can be done and it would not be as “expensive” as I said but would have to be coded and supported ( as in this niche scenario brakes it ).
I don’t know ( thinking on my feet here ). Would you as a company bring such feature knowing that each feature is a pain to support/maintain ?
So there is when the money people figure out if the bet, ( because it’s a gamble ) would not exceed available human resources. Meaning: are we gonna loose money by doing this ?, because if the risk is great, better not even hint it.

Again, love chit-chat
Don’t take me too seriously. Just evaluating scenarios.