Extended vdev, but no extra space

Due to delivery issues with some of the disks in my array, I started my pool with a raidz1 vdev of 3 disks (18TB Seagate IronWolf drives).

When I received the last 3 disks, I extended my vdev 3 times, once for each disk. Each extension was successful, and a scrub was completed after each one.

But the TrueNAS dashboard is still only showing usable space for part of the disks, 65.29TB:
[screenshot of the TrueNAS dashboard]

On the other hand, running zpool list directly on the machine shows the correct total and free disk space: 98.2T total and 81.3T free.
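For reference, this is roughly what I'm looking at (no pool name given, so it lists every pool):

```sh
# Totals as ZFS itself reports them; SIZE and FREE here are raw figures that
# include parity space, so they are not directly the "usable" numbers.
zpool list

# Per-vdev breakdown, to confirm the raidz1 vdev now spans all six disks.
zpool list -v
```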

How do I get the total capacity and the total size to line up (98.2TB vs. 65.29TB), and also FREE vs. Available (81.3T vs 54.05TB)?

We have been discussing RAID-Zx expansion in other threads.

It appears there are 2 problems with this new feature. The one you list seems to be a GUI problem, meaning iX can probably fix that. You might check and see if there is a bug already filed for it.

The other seems to be an OpenZFS issue: the parity-to-data ratio is not recalculated correctly for available space after an expansion. That would have to be fixed upstream (though iX does help with OpenZFS development), so it will likely take longer to resolve, after they figure out what the “correct” resolution is.
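As a rough back-of-the-envelope check, using the numbers in this thread and the usual raidz1 overhead assumptions (zpool list reports raw space including parity, so it will never match the dashboard's usable figure exactly):

```sh
# Raw pool size from zpool list: ~98.2T (parity included).
# A 6-wide raidz1 keeps roughly 5/6 of raw space for data;
# the original 3-wide layout kept roughly 2/3.
echo '98.2 * 5 / 6' | bc -l   # ~81.8 -> roughly what you would expect after expansion
echo '98.2 * 2 / 3' | bc -l   # ~65.5 -> very close to the 65.29TB the dashboard shows
```

So the dashboard figure looks like the pre-expansion 2:1 data-to-parity ratio still being applied, which fits the accounting problem described above.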


On another subject, many people think RAID-Z1 (or RAID-5 for non-ZFS) with disks larger than 2TB is risky... You have 18TB disks, and now 6 of them in a single RAID-Z1. If you have backups or understand the risk, it's your call.
1 Like

I was aware that there were differences in risk between Z1-Z3, but I did not know that there were also differences when it came to disk size - other than the obvious fact that we are talking about vastly more data with 18TB compared to 2TB.

I’ll go and see if a bug has been filed regarding my issue. Hopefully the pool itself turns out to be ok.

Edit: There does not seem to be a bug report for this issue, and I don’t have rights in the “project” to file one.

1 Like

I see. I think my best bet to get this resolved is to move the data and recreate the pool.

The issue is that there is a chance of an unrecoverable read error, URE, occurring during the disk replacement. It’s not a hard number set in concrete, but the estimate is that it becomes a real concern with 2TB and larger disks because of the amount of data that has to be re-read during the re-sync, especially if the RAID-Z1 or RAID-5 has more disks, or if the vDev has more data in use.

This URE would be on one of the other RAID-Z1 disks, while the disk being replaced has been removed completely. Since you have only 1 parity, ANY used failed sector on the source disks can lead to data loss. Most consumer hard disk drives have a specified URE rate of about 1 in 10^14 bits read, or something like that, for the statistical probability of encountering a URE. That gets REALLY noticeable for HDDs larger than 2TB.


One way to reduce the chance of the problem occurring is to do a replace in place. If the disk to be replaced has not fully failed, and you have another disk slot, you can install the replacement disk before you remove the failing disk.

This allows ZFS to use both the failing disk’s good data and any other redundancy it has, like RAID-Z1 parity or extra metadata copies. In essence, ZFS creates a temporary mirror between the failing disk and its replacement. Once the replacement is fully re-silvered / re-synced, the failing disk is detached from the pool, and the user can then remove it from the server.
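If you end up doing that from the command line rather than the GUI, it is roughly the following (pool and device names here are just placeholders):

```sh
# With the old disk still attached, ZFS builds a temporary "replacing" mirror of
# old and new, resilvers onto the new disk, and detaches the old one automatically
# once the resilver completes.
zpool replace tank /dev/disk/by-id/ata-old_failing_disk /dev/disk/by-id/ata-new_disk
```

The TrueNAS GUI's disk replace option should amount to the same thing, as long as the old disk is still attached when you start it.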

Note that when I say “ANY used failed sector on the source disks can lead to data loss”: if the failed sector can be re-read successfully, no problem. Likewise if the block has extra copies, like with “copies=2”. And by default, all metadata has more than 1 copy, even on RAID-Z1, so again no problem. But metadata is generally a very small percentage of the used storage, so a URE landing there is statistically rare anyway.
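For what it's worth, “copies=2” is just a per-dataset property; an illustrative example with a made-up dataset name:

```sh
# Keep two copies of every data block in this dataset, on top of the raidz parity.
# Only data written after the property is set gets the extra copy.
zfs set copies=2 tank/important
zfs get copies tank/important
```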

Last, ZFS has one more trick to handle such events: telling you which file(s) are bad, then allowing you to restore those files without having to do anything else to the pool. A normal RAID-5 that experiences a URE during a disk replacement is screwed; likely it is full restore time.
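In practice that list of damaged files shows up in the pool status; assuming a pool named tank:

```sh
# With -v, any files containing unrecoverable errors are listed at the bottom of
# the output, under the "errors:" section.
zpool status -v tank
```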

3 Likes

That is typically given as “less than 1 in 1E14 bits read”, whatever “less than” may mean.
So p ≤ 10^-14.
NAS/enterprise HDDs may be rated for “less than 1 in 1E15”, while SSDs are “less than 1 in 1E17”.

As a comparison, 12 TB = 12 × 10^12 × 8 bits = 0.96 × 10^14 b.
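Putting that next to this thread's pool, here is a deliberately pessimistic sketch that treats every bit read during a raidz1 rebuild as an independent error opportunity (real drives don't fail that neatly, and a resilver only reads allocated blocks, so this is an upper bound):

```sh
# Worst case: a rebuild of a 6-wide raidz1 of 18 TB disks reads everything on the
# 5 surviving disks. Expected UREs = bits read x specified error rate.
echo '5 * 18 * 10^12 * 8 / 10^14' | bc -l   # ~7.2 expected at the 1-in-1E14 consumer rating
echo '5 * 18 * 10^12 * 8 / 10^15' | bc -l   # ~0.72 expected at the 1-in-1E15 NAS rating
```

Even allowing for how pessimistic that model is, it shows why single parity makes people nervous at these disk sizes.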

1 Like

Does that mean that any used failed sector can lead to a complete failure of the rebuild, or does it mean it can only lead to a failure to rebuild the specific file that has the failure?

For classical RAID it would be total array loss. For ZFS it is “only” loss of the specific file, but still an annoyance and something to take into account if you think that raidz1 fully protects against the complete loss of one member drive.

1 Like

Happy to hear. I’m planning to keep up 3-2-1 for important data, and even then, for family photos missing one or two files will not hurt a lot anyway.

The majority of my data is not unique, and I will be able to get it back if I need to. It’s mostly kept locally either as part of a 3-2-1 backup itself, or just for convenience.