Problem/Justification
OpenZFS supports much faster resilvers for mirrors (2-3x faster than normal), but it’s not the default behavior. You have to pass the -s flag to zpool attach or zpool replace. TrueNAS should provide a checkbox for this feature in the GUI, or even make it the default behavior.
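For anyone who hasn't used the flag, here is a rough sketch of what the two commands look like. The pool name `tank` and the device paths are placeholders I made up, and the `seq_cmd` helper is hypothetical: it only prints the command line so nothing touches a real pool until you run the output yourself.

```shell
#!/bin/sh
# Placeholder names (assumptions): substitute your own pool and devices.
POOL=tank
EXISTING=/dev/sda
NEW=/dev/sdb

# Hypothetical dry-run helper: prints the zpool command with -s
# (sequential reconstruction) instead of executing it.
# $1 = attach|replace, $2 = pool, $3 = existing device, $4 = new device
seq_cmd() {
  printf 'zpool %s -s %s %s %s\n' "$1" "$2" "$3" "$4"
}

# Add a second disk to an existing device, forming (or widening) a mirror.
seq_cmd attach "$POOL" "$EXISTING" "$NEW"

# Swap a disk that is already part of a mirror for a new one.
seq_cmd replace "$POOL" "$EXISTING" "$NEW"
```

Run the printed lines as root once the names are correct. Without `-s`, the same commands do the normal healing resilver.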
More from docs:
Sequential reconstruction resilvers a device in LBA order without immediately verifying the checksums. Once complete, a scrub is started, which then verifies the checksums. This approach allows full redundancy to be restored to the pool in the minimum amount of time. This two-phase approach will take longer than a healing resilver when the time to verify the checksums is included. However, unless there is additional pool damage, no checksum errors should be reported by the scrub. This feature is incompatible with raidz configurations.
Impact
Benefits: 2-3x faster resilvers for mirrors, making it much less likely to lose a drive during replacement. Downsides: data is not checked during resilvering, so a scrub is automatically scheduled immediately afterwards. Normal resilvering does the “scrub” inline.
FWIW: according to ZFS dev Mark Maybee from the presentation linked above, the -s flag is always a good idea with a 2-way mirror because “verifying the data does not change the results.”
I’ve poked around in the docs and couldn’t find anything about sequential reconstruction being enabled by default, and it’s mentioned only in reference to dRAID. Any idea where you saw this? (I guess I could just try it out on my NAS, but I’m not sure how to tell what commands are getting issued under the hood.)
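One possible way to check might be the pool history. This is only a sketch: `tank` is a placeholder pool name, `filter_resilver_cmds` is a helper I made up, and I'm assuming the GUI's operations get logged as literal `zpool` command lines, which I haven't verified (they may only show up as internal events via `zpool history -i`).

```shell
#!/bin/sh
# Hypothetical helper: keep only history lines that record an attach or
# replace, so the presence or absence of -s is easy to spot.
filter_resilver_cmds() {
  grep -E 'zpool (attach|replace)'
}

# Usage on a real system (pool name "tank" is a placeholder):
#   zpool history tank | filter_resilver_cmds
```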
Maybe. However, I’d like to point out that according to ZFS dev Mark Maybee from the presentation linked above, the -s flag is always a good idea with a 2-way mirror because “verifying the data does not change the results.”
For other configurations, a checkbox option would make sense IMO.
@winnielinnie - The feature you are thinking of is the normal LBA rebuild, which does work with RAID-Zx (and mirrors). This is in contrast to the original design, which scanned the pool tree from the top down. That could take 2 times, or even 5 times, longer because of the HDD seeks required.
This normal LBA rebuild does perform checksum validation, like most reads in the original ZFS design. Performing reads without checksum validation is a new thing, which I think started with RAID-Zx Expansion.
Just a quick note here from my thoughts/opinions on the above quote…not necessarily the request.
This is true. But the next time a scrub runs, there’s a real possibility you could see checksum errors and have no idea whether the brand new drives you just installed are having problems or not.
Would the error reporting not indicate which drive is encountering errors?
As I see it, with a 2-way mirror:
if the original drive gets a checksum error, but the data recovers, then the original drive has to be faulty (new drive has good data)
if the new (resilvered) drive gets an error, but the data recovers, then new drive has to be faulty (original drive has good data)
if either drive gets an error and there’s no recovery, then presumably the original drive got messed up before the new drive got added and finished resilvering
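The three cases above can be written out as a small decision helper. This is just my reasoning restated, not anything ZFS runs: `diagnose_mirror_error` and its arguments are made up for illustration, and you would fill them in by hand from the per-device CKSUM column in `zpool status` and from whether the scrub reports the data as repaired.

```shell
#!/bin/sh
# Hypothetical helper mirroring the 2-way mirror reasoning above.
# $1 = which drive reported the checksum error: original | new
# $2 = whether the data was recovered from the other side: yes | no
diagnose_mirror_error() {
  case "$2" in
    yes)
      case "$1" in
        original) echo "original drive is suspect; new drive holds good data" ;;
        new)      echo "new drive is suspect; original drive holds good data" ;;
        *)        echo "unknown drive: $1" >&2; return 1 ;;
      esac
      ;;
    no)
      # Neither copy is good, so the bad data was probably copied over.
      echo "damage likely predates the resilver on the original drive"
      ;;
    *)
      echo "unknown recovery state: $2" >&2
      return 1
      ;;
  esac
}

diagnose_mirror_error new yes
```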
… which is immediately after resilver completes, so those issues will surface pretty quickly, yet there already will be another extra copy of the data.