Somehow all the disks inside my storage pool have been disconnected.
What happened was that a few days ago I got a few notifications about a degraded vdev, which happens once in a while, and I couldn't do anything about it at the time. Today I wanted to watch a movie with the kids and noticed I couldn't access it, so I went online and found that vdev1 had faulted and vdev2 had degraded. So I hit ONLINE on the disks and nothing happened, then I hit restart on TrueNAS SCALE, and when it booted back up I saw 19 disks available and my pool 'rust' offline.
If I run zpool status I can see my pools, except for the one in question, 'rust':
root@truenas[~]# zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:01:28 with 0 errors on Tue Jul 9 03:46:30 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdg3      ONLINE       0     0     0

errors: No known data errors

  pool: lightning
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:23:05 with 0 errors on Sun Jun 23 00:23:12 2024
config:
What is the output of zpool import rust and zpool online rust?
Hardware list please, because that's a considerable number of drives you have... and I guess you are not using an HBA to connect them.
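For reference, these can be run from the TrueNAS shell. A quick sketch, assuming the pool is named rust and is currently exported; note that zpool online takes both a pool and a device, and only works on an imported pool (the <device> placeholder below is just illustrative):

zpool import                  # with no arguments, lists pools that are available for import
zpool import rust             # attempts the actual import
zpool status -v rust          # once imported, shows per-device state and errors
zpool online rust <device>    # tries to bring one specific faulted device back online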
root@truenas[~]# zpool import
   pool: rust
     id: 10465576058172428127
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: Message ID: ZFS-8000-5E - OpenZFS documentation
 config:
I would investigate how the drives are connected to the motherboard and to the PSU, since that many drives becoming unavailable points to a different kind of hardware failure than the drives themselves giving up.
And with the full output we can now make sense of the above summary. There are five failed drives in two vdevs, exceeding what the pool can cope with. Check the cables and power. Check whether the drives are spinning. If you cannot bring back at least one of the failed drives in raidz2-1, the pool is lost, and all your data with it.
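A minimal way to check that from the shell, assuming TrueNAS SCALE (device names like /dev/sdX are examples, adjust to your system):

lsblk -o NAME,MODEL,SERIAL,SIZE    # every block device the kernel can see, with model and serial
smartctl -i /dev/sdX               # identity info for one drive; fails if the drive does not respond
smartctl -H /dev/sdX               # quick SMART health verdict for that drive

A drive that has lost power or cabling simply will not show up in lsblk at all, which is the quickest way to tell a disconnected drive from one that ZFS has merely faulted.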
Alright, so it's the Norco backplane that's a bust. I rearranged the disks and now my zpool import looks like this. What do I do now?
root@truenas[~]# zpool import
   pool: rust
     id: 10465576058172428127
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
19 unassigned drives out of 24 in this pool means that the 5 failed drives are not connected at all, so their SMART reports are not available. @Solen needs to map the drives to know what is where. To check the cables, as a loose or damaged cable could have taken out four drives in one go. To check whether the failed drives are spinning; moving them to different bays could be an option here.
And above all, to be thorough, because it seems there are multiple issues: at best, a failed cable and a failed drive; at worst, five old drives having reached end of life.
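A rough sketch of the next steps now that zpool import reports the pool as ONLINE, assuming the pool name rust (importing through the SCALE web UI, which should be under Storage → Import Pool, is the more TrueNAS-friendly route):

ls -l /dev/disk/by-id/ | grep -v part    # map serial-based IDs to sdX names, so you know which bay holds which drive
zpool import rust                        # or import from the web UI instead
zpool status -v rust                     # confirm all 24 drives are present and watch for resilvering
zpool scrub rust                         # scrub afterwards to verify every block is readable

Once the scrub is clean, long SMART self-tests (smartctl -t long) on the five drives that dropped out should tell you whether they are genuinely dying or were only victims of the backplane.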