Woke up this morning to a failure!

The errors seem to indicate that the root block pointer, rootbp, is damaged. Later the output mentions that the labels are damaged. Neither is good, because a RAID-Z2 should be able to survive 2 disk failures, and even more failures of redundant metadata, without data loss.

Some of the ways for extreme corruption like this to occur:

  • Hardware RAID controller which was doing elevator writes during a power loss
  • Running TrueNAS as a VM without proper disk controller passthrough
  • Serious memory faults

In your case, it is probably the hardware RAID controller. You can check the firmware in the boot-time messages: if they say something like MegaRAID and not IT, then it’s hardware RAID firmware.
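A minimal sketch of that check, assuming the boot messages are available through dmesg. The function name controller_hint is made up for illustration, and it only looks for the word MegaRAID; finding nothing does not prove the card runs IT firmware.

```shell
# Hypothetical helper: read boot messages on stdin and report whether
# "MegaRAID" appears anywhere (case-insensitive).
controller_hint() {
    if grep -qi 'megaraid'; then
        echo "MegaRAID mentioned: likely hardware RAID firmware"
    else
        echo "no MegaRAID mention found"
    fi
}
```

Usage would be something like `dmesg | controller_hint`, run on the affected machine.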

Yes and no. If the problem is on the disks, regular ZFS scrubs can detect and fix it. But if the corruption happens during writes, such as out-of-order writes followed by a power loss before the final write completes, well, bad things happen.

ZFS purposefully writes data and metadata in a specific order, using copy-on-write into free space. This means that until the very last write, the changes are not active. However, an out-of-order write, as with a hardware RAID card doing elevator writes, can activate garbage because the highest-level block pointers point to directory entries that have not yet been written. If power is lost at that moment, those lower-level directory entries remain garbage.

Now, a single power loss with out-of-order writes probably should not corrupt a pool. ZFS should be able to roll back transactions to something that is good. However, we always recommend rock-stable hardware (i.e. no hardware RAID controllers), just to avoid things like this.

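You could try the command below and see what it has to say. The -n option makes it a dry run: it only reports whether the -F recovery (with -X, an extreme rewind) could make the pool importable, without changing anything: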

zpool import -FfXn ISCSI

Or you could look at all the disk labels using something like the command below. Then maybe temporarily remove a disk or 2 that appear radically different from the others. In general, the first txg: number should be the same for all disks.

zdb -l /dev/sdX
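To compare the labels across several disks at once, something like the sketch below might help. The function names first_txg and label_txgs are made up for illustration, the device names you pass in are your own, and it assumes zdb -l prints the label fields one per line as "txg: NUMBER".

```shell
# Print the first "txg:" value from `zdb -l` output read on stdin.
# Matching the whole field name avoids picking up "create_txg:".
first_txg() {
    awk '$1 == "txg:" { print $2; exit }'
}

# Hypothetical helper: print "<device> <txg>" for each device given.
# Needs root and a working zdb on the machine that holds the disks.
label_txgs() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(zdb -l "$dev" 2>/dev/null | first_txg)"
    done
}
```

Calling `label_txgs /dev/sdX /dev/sdY` (with your real device names) should give one line per disk, so a disk whose txg differs from the rest stands out.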

To clarify for all, including future readers, ZFS should never corrupt a pool on power loss:
