Pool lost after drive failed

Hi all,

I’ve been running TrueNAS SCALE (Cobia) on an old HP PC for a while, with 2x storage drives in a mirror (RAID 1). Recently I started getting errors on one of the drives, and then the whole pool went offline. The disks are still visible, with the SMART test failing on the bad drive and passing on the second one.

It is showing VDEVs not assigned, and when I try Import Pool there is nothing in the drop-down. However, when I try the Pool Creation Wizard it says: “The following disks have exported pools on them. Using those disks will make existing pools on them unable to be imported. You will lose any and all data in selected disks.” So I haven’t gone any further, as I don’t want to lose all my data.

Before I go out and get a new disk to replace the failed one, is there anything I can do to get it working on the remaining drive so I can at least do a backup?

DO NOT TRY ANYTHING WITHOUT GETTING EXPERT ADVICE HERE as issuing the wrong command may make things worse.

Can you open a shell and do sudo zpool status -xv and post the results here please.

It says “all pools are healthy” but when I do it without the -xv it only shows the boot-pool, no storage pool:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdc3      ONLINE       0     0     0

Sounds like you’re being prevented from importing a degraded pool (for safety reasons).

As a non-root user (without sudo), does this show that any such pools can be imported?

zpool import

It says command not found if I don’t use sudo. If I do, this is what I get:

   pool: storage
     id: 16667583890468274134
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        storage                                   ONLINE
          mirror-0                                ONLINE
            72dda190-08f9-4174-b175-dd9dc9539c73  ONLINE
            c2967e0b-caa9-44f7-a864-3106593c9ec4  ONLINE

So it looks like it can see the storage pool, what next?

To recap:

You can’t even attempt to import the pool using the GUI, since it doesn’t show in the drop-down.

You never “upgraded” the pool, and never upgraded your version of SCALE and then reverted back to Cobia?

Correct

I upgraded from Bluefin to Cobia a couple of months back and upgraded the pool at the time. But I haven’t upgraded beyond that or reverted back.

Probably shouldn’t have done that, but it might be irrelevant.


What happens if you attempt to import the pool manually with the command-line?

Don’t “force” anything if it fails to do so.

zpool import -R /mnt storage

Tried it twice; nothing happened, not even a message on the CLI.

A successful import in the command-line will not output any messages.

But I assume you mean that when you check again with zpool list, it still does not appear?
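For instance (assuming the pool name “storage” from your earlier zpool import output), a quick check would look something like this:

```shell
# Check whether the pool is now known to ZFS at all.
# "zpool list storage" prints an error and exits non-zero if it is not imported.
sudo zpool list storage

# If it is imported, show its vdev tree and any read/write/checksum error counters.
sudo zpool status storage
```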

It’s still only showing the boot pool:

NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
boot-pool   220G  6.43G   214G        -         -     2%     2%  1.00x    ONLINE  -

Does the storage pool have encryption enabled?

A further option might be to add -Fn to determine whether the pool could be recovered. But I’d like to know first whether it’s an encrypted pool, so hold off on the -Fn for now.

No it’s not encrypted

Do you have a backup? I’m taking that as a no, from your first post. You always need backups, even if you were using RAIDZ3.

For me, the only thing left to try is a forced import, but see if @winnielinnie has any other ideas first. I’d start with the -Fn option.

Pool feature upgrades are IMO best done only after your latest TrueNAS upgrade is fully stable and you’ve decided you will never want to roll back, and before you attempt the next TrueNAS upgrade (so you don’t have too many pool features unavailable in the new release).


Yes - it certainly sounds like it is time to use the CLI to try to import the pool successfully. Once you have managed to import it successfully and made it work perfectly, then you can export it using the CLI and reimport it using the GUI.

I should have noted that the -Fn option will only test whether the pool COULD be imported. If so, then remove the n.
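To spell that out (using the pool name and altroot from earlier in the thread — treat this as a sketch, not a guaranteed recipe):

```shell
# Dry run: -F attempts recovery by discarding the last few transactions;
# -n only reports whether that recovery WOULD succeed, without changing anything.
sudo zpool import -Fn -R /mnt storage

# Only if the dry run reports the pool can be recovered, run it for real:
sudo zpool import -F -R /mnt storage
```

Note that a recovery import with -F can lose the last few seconds of writes, which is why you test with -n first.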

I’ve made some progress: I have managed to import the storage pool using the CLI. I read somewhere that I should then export it and reimport it using the GUI. Is that correct? (I haven’t tried rebooting yet.)

Anyway here is the output of sudo zpool status -xv

admin@truenas[~]$ sudo zpool status -xv
  pool: storage
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 1M in 06:36:22 with 0 errors on Sat May 11 04:43:51 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        storage                                   ONLINE       0     0     0
          mirror-0                                ONLINE       5     0     0
            72dda190-08f9-4174-b175-dd9dc9539c73  ONLINE       0     0     0
            c2967e0b-caa9-44f7-a864-3106593c9ec4  ONLINE       5     0     0

errors: Permanent errors have been detected in the following files:

There is also some weird behaviour, such as the Storage dashboard still showing 1 unassigned disk even though both disks are in the pool, and the main dashboard showing the pool as offline even though it now appears on the Storage dashboard.

I also still can’t see the server folders on the network.

Another weird behaviour I’ve found: This is what the ACL screen looks like:

Yes. The GUI won’t be able to see or manage the pool and its associated disks until you import the pool through the GUI.

(Or you could import it with the right mount options)

The simplest solution is to export the pool again, and then you should be able to import it in the gui.
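In other words, something like this (assuming the pool is currently imported from the CLI and nothing is actively using it):

```shell
# Export the pool from the CLI so the TrueNAS middleware can take it over.
# This will fail if any dataset is busy, so stop anything using /mnt/storage first.
sudo zpool export storage
```

After that, Storage → Import Pool in the GUI should list “storage” in the drop-down, and importing it there lets the middleware manage the pool, shares, and ACLs properly.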