Pool state "FAULTED" with "corrupted data", yet only one drive unavailable

Hello,

My TrueNAS SCALE had all its devices “drop out” of the storage pool after a clean shutdown / restart.

I had an electrician come by the house to do some testing, so I shut down the NAS and unplugged it. When it came back up it booted fine, but the storage pool shows “VDEVs not assigned” against all of the devices in “Topology”. At the time I shut it down, one of the drives seemed to be cactus and had been dropped out of the pool; the other three looked fine on SMART tests and the pool was working.

The Storage Dashboard shows four disks as “Unassigned”. Under “Manage disks”, three of them show up with “Exported pools (tank-1)”; the one that was unhealthy just has N/A next to it.

  • Version: TrueNAS-SCALE-23.10.1
  • Hardware: Supermicro 5028D-TN4T / 96GB ECC RAM
  • Boot disk: Samsung SSD 980 500GB M.2 drive
  • Disks: 4x shucked ?HC500? - WDC_WD180EDGZ-11B2DA0
  • Pool configuration: RAIDZ1 on the four drives, log VDEV on a 2nd partition of the boot disk (yeah I know…)
  • Self-encrypting drives: enabled system-wide (i.e. for the data drives)
$ sudo zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Fri Apr 19 03:45:06 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors
$ sudo zpool import
   pool: tank-1
     id: 6328939347888674582
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        tank-1                                    FAULTED  corrupted data
          raidz1-0                                DEGRADED
            64991bef-b24b-432c-8e6d-75f2c22fec88  ONLINE
            41bf2cc9-4e04-4657-ac1c-8d6da82c7906  ONLINE
            1f2238c4-05bf-43d4-a833-893814439452  UNAVAIL
            cd29d5d9-1936-42fb-8e37-3938123b7faf  ONLINE
        logs
          nvme0n1p5                               ONLINE

So my questions are:

  • What exactly is this trying to tell me? The linked page essentially says “bye bye data, you have backups, right?!”, but I’d like to know how I got here. I have three working data drives; isn’t that enough?
  • What are the risks associated with forcing the import? Is there a chance of silent data corruption?
  • If I want to try recovering (I have a new disk on the way), what’s the procedure? Force the import then scrub the pool?
  • I have backups (using restic) from just before this all happened, should I be trying to recover this then check my data against the backup somehow? Or is it better to just abandon/rebuild the pool and go for backups? (I ran a check on the backup recently and it was fine - but I’d rather not tempt fate by destroying the pool and trusting the only backup if I don’t have to).

Oh, I forgot, dmesg shows this:

[359728.625383] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[359728.626323] ata4.00: irq_stat 0x40000001
[359728.627226] ata4.00: failed command: READ DMA EXT
[359728.628127] ata4.00: cmd 25/00:08:80:ff:7f/00:00:2f:08:00/e0 tag 2 dma 4096 in
                         res 53/04:08:80:ff:7f/00:00:2f:08:00/40 Emask 0x1 (device error)
[359728.629993] ata4.00: status: { DRDY SENSE ERR }
[359728.630905] ata4.00: error: { ABRT }
[359728.633727] ata4.00: supports DRM functions and may not be fully accessible
[359728.640867] ata4.00: supports DRM functions and may not be fully accessible
[359728.646788] ata4.00: configured for UDMA/100
^^^ this is repeated a few hundred times ^^^
[359728.647623] sd 3:0:0:0: [sdc] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[359728.648463] sd 3:0:0:0: [sdc] tag#2 Sense Key : Illegal Request [current] 
[359728.649393] sd 3:0:0:0: [sdc] tag#2 Add. Sense: Unaligned write command
[359728.650636] sd 3:0:0:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 08 2f 7f ff 80 00 00 00 08 00 00
[359728.651630] I/O error, dev sdc, sector 35156656000 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[359728.652264] Buffer I/O error on dev sdc, logical block 4394582000, async page read
[359728.652816] ata4: EH complete

/dev/sdc corresponds to the unavailable drive in the truenas UI.
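For what it’s worth, the numbers in that dmesg output are at least self-consistent (assuming 512-byte sectors and 4 KiB logical blocks, which I believe these drives use):

```shell
# Cross-checking the dmesg numbers:
# bytes 2-9 of the Read(16) CDB are the LBA: 0x082f7fff80
printf 'lba=%d\n' 0x082f7fff80       # matches "sector 35156656000"
echo "block=$(( 35156656000 / 8 ))"  # 8 sectors per 4 KiB block, matches "logical block 4394582000"
```

So all three messages point at the same spot on the disk, rather than errors scattered all over.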

I agree - this does look odd. With one drive down in a RAIDZ1 the pool should still be available.

At this stage I am not sure that you would have anything to lose by trying an import with -f. If there is a flag for read-only, I would personally also use that, in order to keep the pool read-only while you take a copy of your data.

If it comes online, then once you have taken a backup you can make it read/write and either swap out the failing drive and resilver, or run a scrub on the remaining three drives first to see what happens.
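Something like the following (untested; check the flags against the zpool-import man page before running anything):

```shell
# Untested sketch of the sequence described above; adjust to taste
zpool import -f -o readonly=on tank-1   # force the import, read-only
# ... copy anything critical off the pool ...
zpool export tank-1
zpool import -f tank-1                  # re-import read/write
zpool scrub tank-1                      # scrub, or 'zpool replace' the dead disk first
```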

Thanks @Protopia, a read-only mount is a good idea.

I gave this a try and didn’t get anywhere:

$ sudo zpool import -f -o readonly=on
   pool: tank-1
     id: 6328939347888674582
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        tank-1                                    FAULTED  corrupted data
          raidz1-0                                DEGRADED
            64991bef-b24b-432c-8e6d-75f2c22fec88  ONLINE
            41bf2cc9-4e04-4657-ac1c-8d6da82c7906  ONLINE
            1f2238c4-05bf-43d4-a833-893814439452  UNAVAIL
            cd29d5d9-1936-42fb-8e37-3938123b7faf  ONLINE
        logs
          nvme0n1p5                               ONLINE

I also tried with --rewind-to-checkpoint, since that option looks safe for read-only mounts, but I got the same result.
I see there is a -F option for recovery; I tried -F -n to check whether it might work, and the result was the same again. It seems like the pool might be toast?

I’m fine with pulling out the backups but at this point I’m puzzled as to why things aren’t working. My best guess is the “corrupted data” output is the clue but I don’t know how to dig further.

Since it says that the RAIDZ1 vdev is degraded rather than corrupted, my guess is that it is something to do with pending data on the log drive being inconsistent with the RAIDZ1 pool.

I am unclear why you have a (S)LOG drive at all: unless you have large quantities of synchronous writes (a SLOG is a separate log device, and it only services sync writes; most I/O is typically asynchronous), the LOG device will not actually be used.

The command to import it readonly is (according to the documentation) zpool import -o readonly=on tank-1.

You can also try to import the pool without the log device using zpool import -m. I would imagine you can also combine these, i.e. zpool import -m -o readonly=on tank-1.

P.S. The -F flag is for the zfs clear -F tank-1 command as well as for zfs import. According to the documentation, the zfs clear command without -F clears transient errors associated with a specific file (found when trying to access the file or with a scrub). The documentation for bad logs shows a different status entirely for a faulting log vdev.

Does zfs status -v tank-1 give any more info?

Thanks for your suggestions, and for the documentation links.

I am unclear why you have a (S)LOG drive at all

I added the SLOG drive to try to meet the TrueCharts System Requirements. Re-reading that, it seems like they want metadata on SSD as well, or the entire app pool. In hindsight I think mixing apps and data on the same pool is a bad idea, I bet that’s called out in the docs somewhere :sweat_smile:

The command to import it readonly is (according to the documentation) zpool import -o readonly=on tank-1.

Attempting the import read-only:

$ sudo zpool import -o readonly=on tank-1
cannot import 'tank-1': I/O error
        Destroy and re-create the pool from
        a backup source.

Variants of this command also failed:

$ sudo zpool import -m -o readonly=on tank-1
cannot import 'tank-1': I/O error
        Destroy and re-create the pool from
        a backup source.
$ sudo zpool import -m
   pool: tank-1
     id: 6328939347888674582
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        tank-1                                    FAULTED  corrupted data
          raidz1-0                                DEGRADED
            64991bef-b24b-432c-8e6d-75f2c22fec88  ONLINE
            41bf2cc9-4e04-4657-ac1c-8d6da82c7906  ONLINE
            1f2238c4-05bf-43d4-a833-893814439452  UNAVAIL
            cd29d5d9-1936-42fb-8e37-3938123b7faf  ONLINE
        logs
          nvme0n1p5                               ONLINE

P.S. The -F flag is for the zfs clear -F tank-1 command as well as for zfs import. According to the documentation, the zfs clear command without -F clears transient errors associated with a specific file (found when trying to access the file or with a scrub). The documentation for bad logs shows a different status entirely for a faulting log vdev.

Interesting, that doesn’t appear in the man page for zpool-clear.

Does zfs status -v tank-1 give any more info?

zfs status does not appear to be a valid subcommand, I guess you mean zfs -> zpool.

$ sudo zpool status -v tank-1
cannot open 'tank-1': no such pool

I guess this is because the pool needs to be imported. I also tried some of the other commands in the second link you posted; none of them can see the pool. Given this seems to be the blocker, I went full YOLO:

$ sudo zpool import -FX tank-1

This is now busy doing… something. I’ll report back once it’s done. Thanks for all your suggestions so far!


You just used the big hammers to get that to do something. For the database admins out there: effectively you’re replaying database logs up until the point of failure and stopping there. Some data may be lost.

I concur this problem was probably caused by uncommitted writes that existed on the SLOG and weren’t flushed to disk. You should probably investigate that avenue further; I doubt you want to end up in a similar position again after a simple reboot.


Thanks Nick, and I appreciate you calling out the risks.

The backups are a day or two behind, so some data will be lost either way. Luckily this is a homelab NAS and a few days’ data loss is no big deal. I’m hoping forcing the import might be faster.

Is there some way to confirm uncommitted SLOG writes are the cause? Either way I see the risk now - I’ll certainly drop the log from the pool. There’s no battery backup on this device.
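(One idea I might try in the meantime: apparently zdb can read device labels and pool configuration without importing anything. These flags are from the zdb man page, untested on my system:)

```shell
# Untested: inspect the SLOG partition's ZFS label without importing the pool
zdb -l /dev/nvme0n1p5
# and dump the cached configuration of the exported pool
zdb -e -C tank-1
```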

I tried getting an extra SSD into the system that I could dedicate to the app pool, but the spare SSD I was using didn’t get picked up; probably a dud. I’ll try again with another one that’s freed up.

Thing is, this is precisely what a SLOG is supposed to avoid.

Of course, your SLOG device actually needs to implement sync writes correctly, which a normal SSD will be super slow at unless it has PLP (power-loss protection).

But I notice you don’t specify what the SLOG device actually is… It’s not a partition on your Samsung 980 boot device, is it? Because that device is not suitable for SLOG.

EDIT:

Well… there ya go.

  1. the Samsung 980 is not a suitable SLOG device (no power-loss protection); if its sync-write performance looks decent, it is because the drive is cheating on cache flushes.
  2. hacking the SLOG onto the boot device is well outside recommended practice.

I suspect a combination of 1+2 may have led to the situation you’re in, with a corrupted SLOG leading to the pool corruption.

Probably would’ve been better to just disable sync.


Thanks @Stux, I appreciate the advice. I’ll be changing this once I get the pool back online (or recreate it).

Is there any way to interrogate the status of the SLOG with the pool not imported and confirm the theory? Or is this a case where the pool is corrupted because of (maybe) missing writes to the log and we can’t verify whether that happened or not?

Here’s an interesting post.

It doesn’t really discuss the issue at hand. But I’d be surprised if what’s written to a SLOG isn’t checksummed, so it shouldn’t be capable of corrupting your pool on playback.

With no read errors on your pool other than the failed drive, there shouldn’t have been an issue.

Unless there has been another failure and things are now too far gone to tell.

Thanks @Stux, that helped me understand the SLOG a bit better.

The import finished some time in the last two days. I’m not sure how long it took in the end; I wasn’t watching it that closely.

$ sudo zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Fri Apr 19 03:45:06 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: tank-1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 05:40:56 with 0 errors on Sat Mar  2 22:35:53 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank-1                                    DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            64991bef-b24b-432c-8e6d-75f2c22fec88  ONLINE       0     0     0
            41bf2cc9-4e04-4657-ac1c-8d6da82c7906  ONLINE       0     0     0
            11429662463470536091                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/1f2238c4-05bf-43d4-a833-893814439452
            cd29d5d9-1936-42fb-8e37-3938123b7faf  ONLINE       0     0     0
        logs
          nvme0n1p5                               ONLINE       0     0     0

errors: No known data errors

I’ve removed the log device:

$ sudo zpool remove tank-1 nvme0n1p5
$ sudo zpool status tank-1
  pool: tank-1
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 05:40:56 with 0 errors on Sat Mar  2 22:35:53 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank-1                                    DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            64991bef-b24b-432c-8e6d-75f2c22fec88  ONLINE       0     0     0
            41bf2cc9-4e04-4657-ac1c-8d6da82c7906  ONLINE       0     0     0
            11429662463470536091                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/1f2238c4-05bf-43d4-a833-893814439452
            cd29d5d9-1936-42fb-8e37-3938123b7faf  ONLINE       0     0     0

errors: No known data errors

I’ve restarted the system and the pool is back. Weirdly, it’s mounted under /tank-1 now, whereas before it was /mnt/tank-1. I’ve kicked off a scrub; the mounts/apps can stay broken until that completes. I should have the replacement drive in a day or two.
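(I think the mountpoint difference is because TrueNAS normally imports pools with an altroot of /mnt, which my manual import skipped. Once the scrub finishes, exporting and re-importing through the UI should sort it out, or something like this from the shell, untested:)

```shell
# Untested: re-import with the altroot TrueNAS expects
zpool export tank-1
zpool import -R /mnt tank-1
```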

Thanks to everyone for your advice!


The most recent snapshot is from 2024-03-24, along with some files I was working on, suggesting at least two weeks of data went missing. Looks like a restore from backup is in order.


In a way that’s a good result. If you didn’t have backups… well, losing two weeks is better than losing everything.

Theoretically, if your backup is a pool, you could restore and it would rebase off that snapshot… if that snapshot was still present in the backup.

but at this stage I’m sure you’d rather create a new, definitely-not-corrupted pool.

Theoretically, if your backup is a pool, you could restore and it would rebase off that snapshot… if that snapshot was still present in the backup.

That is actually a good scenario to be aware of. Though I’m not sure it can be accomplished reliably: replication (depending on the configuration) could ruin the backup pool itself and cause the loss of data from before the event two weeks ago.

Replication can be detrimental sometimes, and TrueNAS doesn’t provide any no-fault guarantee.

After a resilver + scrub, I did a full checksum comparison of what’s on disk against my backups, and it turns out nothing of value was lost. Chalk one up for ZFS working hard to prevent me breaking shit.

Thanks again TrueNAS forums!


Glad to see a happy ending here :boom:
