Trying to get a pool back online

The name of the pool is ‘backup’. After a full reboot of the TrueNAS server and the EMC KTN-STL3, the pool went offline and showed two of the three disks as N/A. The reboot happened because of a power failure. I have everything set up on battery backup, and the server monitors it so that when we lose power the servers detect that and shut down. I don’t know of any way to shut down the EMC, but the TrueNAS server was down before the battery ran out and the EMC lost power.

Anyway, after power returned and everything came back up, my other pool was fine but the one labeled ‘backup’ was offline. I looked at the disk list and all disks were present, but two of the three were marked N/A instead of ‘backup’. I’ve tried rebooting the TrueNAS server with no change. I’ve ‘exported’ the pool in the GUI with the destroy data and delete saved configurations options unchecked. When I try to import from the GUI the pool doesn’t show up, so I went to the shell.

With ‘zpool import’ I get:

pool: backup
id: 5273489763793982778
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: Message ID: ZFS-8000-5E
config:

    backup                                    UNAVAIL  insufficient replicas
      raidz1-0                                UNAVAIL  insufficient replicas
        e5e1b608-6603-47aa-972f-d62b59af1257  ONLINE
        8da7254a-d292-4224-931e-e562e7f9e506  UNAVAIL
        9da4b257-8128-4e6b-9078-b47e1db9a367  UNAVAIL

And with ‘zpool import -f -F -R /mnt backup’ I get:

cannot import ‘backup’: no such pool or dataset
Destroy and re-create the pool from
a backup source.

Sorry if things look weird, it says I can’t include media or links in my post. Any ideas?

You need to check all connections and attempt to run SMART long tests on the drives. We need at least two of the three drives showing ONLINE, since a raidz1 vdev can only tolerate the loss of one disk.

Full description of your hardware, OS Version and how everything attaches may help. Need to figure out if two drives died due to power, HBA or cables affected, etc.

You can expand the Details section in my sig to get a general idea of the kind of info to post.
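To make the "at least two of three" point concrete, here is a minimal sketch that tallies member states from `zpool import` output. It assumes a single raidz1 vdev whose members are listed by partition UUID, as in the listing in the first post; the function name `raidz1_importable` is made up for illustration and is not a ZFS tool.

```shell
# Read `zpool import` output on stdin and report whether a single
# raidz1 vdev still has enough ONLINE members to import.
# Assumption: member lines start with a UUID-style name (five
# dash-separated hex groups) followed by the state.
raidz1_importable() {
    awk '
        $1 ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/ {
            total++
            if ($2 == "ONLINE") online++
        }
        END {
            # raidz1 tolerates the loss of exactly one member
            verdict = (total > 0 && online >= total - 1) ? "importable" : "not importable"
            printf "%d of %d members ONLINE: %s\n", online, total, verdict
        }
    '
}

# Usage on the TrueNAS shell (not run here):
#   zpool import | raidz1_importable
```

Fed the listing from the first post, this prints `1 of 3 members ONLINE: not importable`, which matches the "insufficient replicas" status: one more disk has to come back before ZFS can assemble the vdev.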

I’ll work on getting those SMART long tests done.

As for what I’m running, I’ve got TrueNAS Community Edition 25.10.1 installed in a VM on a Proxmox server. The server is a Dell R630 with a Xeon E5-2640 v4 and 96GB of RAM total. I’m giving TrueNAS 16GB of that and 4 cores. I’m using an LSI SAS 9207-8e HBA to interface with the EMC KTN-STL3, connected with an SFF-8088 external mini-SAS cable. The three drives I’m using for this pool are Seagate Enterprise Capacity 3.5 HDD, model ST10000NM0016-1TT101, 10TB. They are SATA, not SAS. I think I bought specific adapters to plug them into the backplane of the EMC, but it’s been a couple of years since I set this up. I believe the adapters I bought were supposed to handle both SAS and SATA. The pool was working just fine until the shutdown and reboot, so the adapters were definitely working.

Also thanks for the tutorial tip you DM’d me. Got that done and did they really remove the SMART test from the GUI….

Well that’s going to take around 13 hours so I’ll get back to you sometime tomorrow. :laughing:
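For anyone else looking for the test outside the GUI, the long test can be started from the shell with `smartctl`. A minimal sketch: the helper name `smart_long_plan` is made up, and the device names are placeholders that have to be replaced with whatever the affected disks are called on your system. It only prints the commands so they can be reviewed before running.

```shell
# Print (do not run) the smartctl commands for an extended self-test.
# Arguments are bare device names such as sdX; these are placeholders.
smart_long_plan() {
    for dev in "$@"; do
        echo "smartctl -t long /dev/$dev"      # start the extended (long) self-test
        echo "smartctl -l selftest /dev/$dev"  # read the self-test log once it finishes
    done
}

# Usage: review the output, then run the lines as root.
#   smart_long_plan sdX sdY
```

The test runs inside the drive, so the shell can be closed while it is in progress; the second command shows the result once it has finished.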

Are you passing the entire HBA controller to TrueNAS and is it blacklisted from Proxmox? If Proxmox touched it, it can be a problem since both use ZFS

I know I am passing it through to the TrueNAS VM. I’m not sure whether I blacklisted it in Proxmox. It’s been so long since I set it up. When I go to ‘Hardware’ on the TrueNAS VM it is listed as ‘PCI Device (hostpci0) 0000:81:00,pcie=1’.

When I run ‘lspci -v’ on the host machine I get the below output:

81:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
Subsystem: Broadcom / LSI 9207-8e SAS2.1 HBA
Flags: bus master, fast devsel, latency 0, IRQ 89, NUMA node 1, IOMMU group 12
I/O ports at 8000 [size=256]
Memory at c8040000 (64-bit, non-prefetchable) [size=64K]
Memory at c8000000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at [disabled]
Capabilities: [50] Power Management version 3
Capabilities: [68] Express Endpoint, MSI 00
Capabilities: [d0] Vital Product Data
Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
Capabilities: [100] Advanced Error Reporting
Capabilities: [1e0] Secondary PCI Express
Capabilities: [1c0] Power Budgeting
Capabilities: [148] Alternative Routing-ID Interpretation (ARI)
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas

I did the tutorial but still can’t post links or media I guess.

Sorry it’s been like 18 months since I set this up and this is the only issue I’ve had with it so far so I haven’t revisited it since then. My other pool that I use for media files has never had any issues besides a couple of drive failures but I bought those drives used and also had some extras. Because of that I never dug into the issue at all. I was able to swap them out with no problems. I have like 10 4TB HGST SAS drives in that pool.

Taken from the proxmox forums:

You have to add 2 lines to /etc/modprobe.d/blacklist.conf:

softdep mpt3sas pre: vfio-pci
options vfio-pci ids=1000:00ac

(options needs the correct id of the hba, i guess)

and one has to run

update-initramfs -u

for your active kernel, otherwise it won’t be active.

So you could check that file to see whether you have blacklisted it. Apparently, once that is in place, the HBA’s disks should not be visible to the host during boot or on the PVE disks tab.
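A quick way to check both halves of that is sketched below. The helper name `driver_in_use` is made up; the `lspci` and `grep` invocations in the comments use standard options, and `81:00.0` is the slot from the `lspci -v` output earlier in the thread.

```shell
# Extract the "Kernel driver in use" value from `lspci -v` or
# `lspci -k` output read on stdin.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on the Proxmox host (not run here):
#   lspci -k -s 81:00.0 | driver_in_use     # vfio-pci means the host driver is not bound
#   lspci -nn -s 81:00.0                    # shows the [vendor:device] ID for the options line
#   grep -rn 'vfio-pci' /etc/modprobe.d/    # shows whether an ids= line was ever added
```

One caveat: a result of `vfio-pci` while the VM is running only describes the state at that moment. It does not by itself show that the host never loaded `mpt3sas` for the card earlier in boot, so the `grep` over `/etc/modprobe.d/` is the more telling check.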

Both drives passed the long test without any errors. They both have around 21,000 hours on them which makes sense. I bought all three at the same time and used them all together in another setup initially before moving them to the EMC.

Can you give me a link to the forum post you used?

Here you go!

I’m not sure what to try to check. I’m not familiar with Proxmox nor EMC. Are all three disks physically showing in TrueNAS for your BACKUP pool? I am wondering if TrueNAS is even seeing them. Not sure if UNAVAIL means physically missing. Maybe check the DISKs tab in the GUI?

All three disks are listed when I look at ‘Disks’. The first one shows ‘backup (Exported)’, the other two show ‘N/A’.

I am going to give it some time to see if others respond. I’ll check back in a day. I just don’t know the logical next step to try. If there is no help by then, I will try to get others to check in.

Just watched the VM boot and noticed these error messages:

sdc and sdd are the disks in question. Don’t know if this helps.

What did the smart test return?

You said you had a power outage and TrueNAS shut down safely, but not the whole server, right?

So the disks would still have been powered until the battery ran out. Maybe that damaged the disks, or maybe Proxmox was trying to access them at that moment?

So I have TrueNAS running in a VM on a Proxmox server. The server gets a signal from the battery backup when the power goes out and it starts a total shutdown. Everything was shut down except the EMC, because I don’t know how to tell it to shut down and I’m not sure you even can. The only thing running when the battery backup died would have been the EMC, so nothing would have been reading or writing to the disks, though they would have had power.

Both disks returned no errors on the long SMART test.

If given proper hardware, ZFS was specifically written to be crash and power-loss safe. There is so little chance of losing data due to a power loss with ZFS that it is not worth mentioning, UNLESS you virtualize or use hardware RAID.

I wrote this on the subject:

Ah ok, then I misunderstood what you were saying.

I think I may have something. I got tired of trying to restore the pool, so I decided to wipe the drives. I wiped the one that showed up in the pool, then went to wipe the other two and couldn’t because of an input/output error. So I started checking the drives and ran smartctl -x on all three, saved the output and compared the results.

The ATA security on the working drive reads as Disabled. The ATA security on the other two drives read as ENABLED, PW level HIGH, LOCKED [SEC4]. Is this why I’m having this issue? Did the security lock after the reboot and that’s why I can’t do anything with the data?
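For anyone comparing drives the same way, here is a minimal sketch that pulls just that field out of saved `smartctl -x` output. The helper names are made up, and it assumes the line begins with `ATA Security is:` and that a locked drive reports the uppercase word `LOCKED`, as in the state quoted above.

```shell
# Print the ATA security state from `smartctl -x` output read on stdin.
# Assumption: the relevant line starts with "ATA Security is:".
ata_security_state() {
    sed -n 's/^ATA Security is:[[:space:]]*//p'
}

# Succeed (exit 0) only when the reported state contains LOCKED.
ata_is_locked() {
    case "$(ata_security_state)" in
        *LOCKED*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage against the saved outputs (file names as attached to this post):
#   ata_security_state < sdc.txt
#   ata_is_locked < sdc.txt && echo "sdc is locked"
```

A drive in the locked state refuses normal reads and writes, which would be consistent with both the input/output error during the wipe and ZFS being unable to read its labels on those two disks. That is an inference from the symptoms in this thread, not something confirmed here.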

If so, how do I disable this? Everything I’ve found says to go into Advanced Settings and go to SED but it’s not listed there at all. I’m using TrueNAS Community Edition if it makes any difference.

sdo.txt (10.9 KB)

sdc.txt (10.9 KB)

sdbWorking.txt (14.3 KB)

Edit: Looks like I need to use openSeaChest_Security to unlock it. I’m not sure what the password would be, though. Will it be my admin account password or some sort of Seagate default password? I don’t remember choosing to encrypt the drives, and the one drive doesn’t have security enabled at all, so I feel like this was something that was triggered by the power failure.

Maybe you could try an older boot environment, though that might not work due to feature flags. Or try sedutil-cli.