CRITICAL Pool xxx state is OFFLINE: Help?

RickNils · January 18, 2025, 11:34pm

I’m new to TrueNAS SCALE. I’ve just rebooted, and my primary pool, which holds several shares, has been offline since that reboot. Would anyone be willing to help me figure out how to get the pool back (without losing data)?

I can see the disks, which now show “N/A” in the Pool column of the Storage > Disks screen. The SMART tests return the expected results. Are the disks okay?

There are two alerts:
First, this: New ZFS version or feature flags are available for pool n5x1x4TB…error CRITICAL
Second, this: Pool n5x1x4TB state is OFFLINE: None

Help? Can I restore this pool? Funnily enough, the Windows shares that use this pool are still running, but they don’t actually share anything.

Arwen · January 19, 2025, 4:55am

Welcome to the TrueNAS forums!

Please start with a full hardware listing, including disk manufacturer & model and how the disks are wired to the computer.

Next, please supply the output of the following commands, in code tags:
zpool list -v n5x1x4TB
zpool status -v n5x1x4TB

Last, in general don’t update your “ZFS version” unless you have a specific need or desire for a later feature.

RickNils · January 19, 2025, 5:25am

Thank you for your reply, Arwen of Rivendell!

zpool list -v n5x1x4TB
root@truenas[~]# zpool list -v n5x1x4TB
cannot open ‘n5x1x4TB’: no such pool
root@truenas[~]#

root@truenas[~]# zpool status -v n5x1x4TB
cannot open ‘n5x1x4TB’: no such pool
root@truenas[~]#

The drives are SATA HDDs, WDC_WD40EFAX-68JH4N1, 3.64 TiB.
There are five drives connected to a PCIe SATA controller.
Only four drives show up in the disk listing, and all four show N/A in the Pool column.

FYI
I did a full restart immediately before the pool disappeared. I can see only four of the five drives that are (or were) part of the pool n5x1x4TB. I’ve read on this forum, I think in one of your replies to another post, that TrueNAS does not automatically import pools with failed disks.

Could that be an issue? I see only four of five disks listed, so perhaps one has failed?

Thanks again,

Rick

Fleshmauler · January 19, 2025, 5:52am

SMR drives and a SATA controller… ouchies. I’m hoping that you meant HBA.

Any chance at all that you have free SATA ports on your motherboard you can use to see whether the drives are detected? A full list of your hardware could also help: motherboard, NIC, CPU, RAM, the exact model of the SATA controller (link to it if you have to), etc.

Quick & dirty tips: have you checked the physical connections? Any chance there is a loose power or SATA data cable? Does your BIOS see the drives?

Protopia · January 19, 2025, 10:36am

What we need to diagnose this is detail.

Please confirm that TrueNAS is running bare metal and not virtualised under e.g. Proxmox.
Please run the following commands and copy and paste the output here (with the output of each command in a separate </> box):

lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
lspci
sudo sas2flash -list
sudo sas3flash -list
sudo zpool import
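A minimal sketch for collecting the outputs requested above, assuming a root shell on the NAS (the prompts later in the thread are root@truenas, so sudo is dropped). It runs each command and writes its output, including errors, to its own file so each one can be pasted into a separate </> box. The helper name run and the default directory /tmp/pool-diag are hypothetical choices.

```shell
#!/bin/sh
# Run each diagnostic command and save its output (stdout and stderr)
# to its own file. Assumes a root shell, so sudo is omitted.
# OUTDIR is a hypothetical default; any writable directory works.
OUTDIR="${OUTDIR:-/tmp/pool-diag}"
mkdir -p "$OUTDIR"

# run NAME CMD [ARGS...]: write CMD's output to $OUTDIR/NAME.txt.
# A missing tool or a non-zero exit status is noted in the file
# instead of stopping the script.
run() {
    name=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUTDIR/$name.txt" 2>&1 ||
            echo "(exit status $?)" >>"$OUTDIR/$name.txt"
    else
        echo "$1: command not found" >"$OUTDIR/$name.txt"
    fi
}

run lsblk        lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
run lspci        lspci
run sas2flash    sas2flash -list
run sas3flash    sas3flash -list
run zpool-import zpool import

echo "Wrote one file per command to $OUTDIR"
```

Each file can then be opened with cat and pasted into its own box.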

Thanks.

P.S. @Fleshmauler is absolutely right: your WD Red EFAX drives are SMR drives, and even WD states that these are completely unsuitable for redundant ZFS pools (because their bulk write performance is terrible, and during bulk writes the drives themselves or ZFS can time the drives out, causing ZFS drive errors that can degrade your pool or take it offline).

RickNils · January 19, 2025, 11:41am

The hardware advice will be helpful for the next build, thanks for that!
The system is bare metal, yes.

CLI returns:

root@truenas[~]# lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
lsblk: unknown column: START,SIZE,PARTTYPENAME,PARTUUID

root@truenas[~]# sudo sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

    No LSI SAS adapters found! Limited Command Set Available!
    ERROR: Command Not allowed without an adapter!
    ERROR: Couldn't Create Command -list
    Exiting Program.

root@truenas[~]#

root@truenas[~]# sudo sas3flash -list
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

    No Avago SAS adapters found! Limited Command Set Available!
    ERROR: Command Not allowed without an adapter!
    ERROR: Couldn't Create Command -list
    Exiting Program.

root@truenas[~]#
root@truenas[~]# sudo zpool import
pool: n5x1x4TB
id: 5758399647352221700
state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the ‘-f’ flag.
see: Message ID: ZFS-8000-5E (OpenZFS documentation)
config:

    n5x1x4TB                                  FAULTED  corrupted data
      raidz2-0                                DEGRADED
        975fc7ef-ddf7-4646-8d17-5da8bda3fcef  ONLINE
        b8249431-6be3-4987-829e-eb2bf0181e98  ONLINE
        6c806227-d73a-4704-9836-3d97adb3c47d  ONLINE
        5b1c341f-fcf0-47fd-bb25-b2be247cd174  ONLINE
        ac55d93e-930e-436e-96f3-1ec323f7e8f9  UNAVAIL

root@truenas[~]#
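For context on the output above: raidz2 keeps two disks’ worth of parity, so a raidz2 vdev can keep serving data with up to two members missing. The sketch below is a hypothetical awk helper, assuming a single raidz2 vdev laid out like the config section above; it counts members that are not ONLINE and compares that count with the two-disk parity. Here one UNAVAIL member is within that tolerance, which fits the DEGRADED vdev, and suggests the pool-level FAULTED/corrupted data state is not explained by the missing disk alone.

```shell
#!/bin/sh
# raidz2_summary: read saved `zpool import` output on stdin, count the
# members of the raidz2 vdev that are not ONLINE, and compare that count
# with raidz2's two-disk parity.
# Assumes a single raidz2 vdev laid out like the output in this thread.
raidz2_summary() {
    awk '
        $1 ~ /^raidz2-/    { in_vdev = 1; next }  # member lines follow this
        in_vdev && NF == 0 { in_vdev = 0 }        # blank line ends the block
        in_vdev && NF >= 2 {
            total++
            if ($2 != "ONLINE") not_online++
        }
        END {
            printf "members=%d not_online=%d parity=2 ", total, not_online
            print (not_online <= 2 ? "within_redundancy" : "exceeds_redundancy")
        }'
}

# The config section posted above
# prints: members=5 not_online=1 parity=2 within_redundancy
raidz2_summary <<'EOF'
    n5x1x4TB                                  FAULTED  corrupted data
      raidz2-0                                DEGRADED
        975fc7ef-ddf7-4646-8d17-5da8bda3fcef  ONLINE
        b8249431-6be3-4987-829e-eb2bf0181e98  ONLINE
        6c806227-d73a-4704-9836-3d97adb3c47d  ONLINE
        5b1c341f-fcf0-47fd-bb25-b2be247cd174  ONLINE
        ac55d93e-930e-436e-96f3-1ec323f7e8f9  UNAVAIL
EOF
```

A member counted as not ONLINE may be DEGRADED rather than missing, so treat the result as a rough check only.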

neofusion · January 19, 2025, 12:24pm

The part I bolded (the lsblk column list) was supposed to be all on one line; please try again without the line break.

The OpenZFS page linked in the output of that last zpool import command has this line:
The device listed as FAULTED with ‘corrupted data’ cannot be opened due to a corrupt label. ZFS will be unable to use the pool, and all data within the pool is irrevocably lost. The pool must be destroyed and recreated from an appropriate backup source. Using replicated configurations will prevent this from happening in the future.

Do you have a backup?
I wonder how the label could have been corrupted… bad RAM?

RickNils · January 19, 2025, 1:09pm

Unfortunately, I have only older backups. I will lose data, and be very sad, if I can’t import the pool.

The pool was working before a recent restart. The CLI output in our case includes “The pool may be active on another system, but can be imported using the ‘-f’ flag.”

I’m hoping that the import can be forced, so that I can make backups.
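One cautious option for a forced import, offered here as an assumption rather than advice given in this thread, is to import the pool read-only so nothing is written to it while data is copied off. The flags are standard zpool import options: -f overrides the “may be active on another system” check, -o readonly=on imports without writes, and -R /mnt mounts the datasets under /mnt, where TrueNAS normally mounts pools. DRY_RUN is a hypothetical convenience variable; by default the sketch only prints the command.

```shell
#!/bin/sh
# Sketch of a forced, read-only import of the pool from this thread.
#   -f              override the "may be active on another system" check
#   -o readonly=on  import without writing anything to the pool
#   -R /mnt         mount datasets under /mnt (where TrueNAS mounts pools)
# POOL defaults to the pool name in this thread. DRY_RUN is a hypothetical
# convenience variable: 1 (the default) only prints the command.
POOL="${POOL:-n5x1x4TB}"

import_readonly() {
    set -- zpool import -f -o readonly=on -R /mnt "$POOL"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"       # show what would be run
    else
        "$@"            # actually run it (needs root)
    fi
}

# prints: zpool import -f -o readonly=on -R /mnt n5x1x4TB
import_readonly
```

Running it with DRY_RUN=0 performs the import; it is worth confirming on the forum before forcing anything on a pool in this state.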

Hmm, after a restart, the pool is showing now…

Farout · January 19, 2025, 1:37pm

I feel sorry for you, and I truly hope you recover some data. Maybe @HoneyBadger can help.

ZFS is a big boy filesystem, without training wheels.

You chose to use unsuitable hardware (SMR drives and a SATA port multiplier), and unfortunately you are paying the price.

If you recover your pool, back it up and choose proper hardware.

RickNils · January 19, 2025, 1:57pm

I think I’m okay. We’ll see in a few hours; the pool is online with two of six disks degraded/offline.

The WD40EFAX uses SMR (shingled magnetic recording), according to the manufacturer’s specifications. It is marketed as a NAS drive; the whole WD Red series is supposed to be the best choice for NAS/RAID applications. I don’t see any mention of SMR being unsuitable for RAIDZ2. Amazing. How would I have known?

neofusion · January 19, 2025, 2:49pm

Are you running TrueNAS in a VM or something along those lines?

In short, for the uninitiated, it would be difficult to know this because of WD’s marketing.
After they snuck these types of drives into their Red line, there was a big backlash, and they eventually posted (among other things) this blog post while trying to spin it as something positive for the customer:

I personally voted with my wallet and stopped buying their products, period.

RickNils · January 19, 2025, 2:51pm

What drives do you recommend?

neofusion · January 19, 2025, 2:55pm

Recommend is a strong word here, but I rather like my Toshiba 18TB MG09s.

Seagate is also fine, I guess; just don’t get their Barracudas, as that’s the segment where they have their SMR drives.

RickNils · January 19, 2025, 3:00pm

I’m going to add the basic details of my current TrueNAS hardware/NAS build by editing my original post above. The point is moot, though: if the degraded pool lives long enough for me to make a copy of the current data, I’ll likely build a new NAS.

To answer your question, though: no, this is TrueNAS on bare metal. It’s an ATX mid-tower case with an older Gigabyte mainboard. I use two PCE8SAT-M01 VER0065 cards; these are SATA expansion cards in PCIe x1 slots. They seem to work extremely well.

I have two pools of five drives: one with the WD Red NAS drives, and one with five Seagate drives.

prez02 · January 19, 2025, 3:01pm

I think the EFAX models were the first to use SMR; the ones before that were fine.
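Given the EFAX point above, one rough way to check a system for these drives is to look at the model strings lsblk reports. The hypothetical flag_efax helper below reads `lsblk -dno NAME,MODEL` output (-d skips partitions, -n drops the header) and prints the device name of every disk whose model text contains “EFAX”. It is a plain name match for the WD Red models discussed here, not a general SMR detector, so SMR drives from other product lines (such as the Barracudas mentioned above) are not flagged.

```shell
#!/bin/sh
# flag_efax: read `lsblk -dno NAME,MODEL` output on stdin and print the
# device name of every disk whose model text contains "EFAX".
# This is a plain text match for the WD Red EFAX models discussed here,
# not a general SMR detector.
flag_efax() {
    awk 'index($0, "EFAX") > 0 { print $1 }'
}

# Hypothetical sample input; on a live system use:
#   lsblk -dno NAME,MODEL | flag_efax
# prints: sda
printf 'sda  WDC WD40EFAX-68JH4N1\nsdb  EXAMPLE-OTHER-MODEL\n' | flag_efax
```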

Protopia · January 19, 2025, 3:39pm

I think that this is not actually the case, because if they were working well you might not have needed to create this topic.

Weird: this lsblk command works perfectly on my TrueNAS SCALE Dragonfish instance.

What version of TrueNAS are you running?

Protopia · January 19, 2025, 3:54pm

Exactly: how indeed? Shame on Western Digital for acting the way they did, and for continuing to act the way they do, i.e. not giving an explicit warning on the drives, on the packaging and in their marketing literature that they are SMR drives and unsuitable for ZFS.

I was aware of the SMR fiasco before I bought my drives, so I could have bought Red Plus, but I decided to do my small part in making WD pay for their actions by buying Seagate IronWolf drives instead.

I note that the linked blog post says “While we work with iXsystems on DMSMR solutions for lower-workload ZFS customers…”, and that strikes me as odd, since iX are NOT directly responsible for OpenZFS, and other NAS software (and non-NAS software like that lesser-known Linux distro called Ubuntu) also uses ZFS. Perhaps @kris could enlighten us as to whether WDC did indeed “work with iXsystems”, and maybe tell us what choice words were spoken by iX when they were approached by WDC.

(And if iX didn’t actively “work with” WDC, does this blog post constitute corporate libel, and I wonder whether iX will request WDC to take it down?)

Fleshmauler · January 19, 2025, 3:59pm

I sadly have no relevant advice on data recovery other than validating physical connections and checking whether the drives work properly when connected directly to the motherboard. The reason being: even if some CLI magic can bring the pool online, the underlying hardware, and therefore the chance of getting data out, is questionable.

If you don’t have enough SATA connections on your motherboard but have a second system, see if you can temporarily boot TrueNAS on it (USB is fine; this is just to copy data), move the drives over, connected directly to the motherboard, and see if that gives any joy.

As for port multipliers, they might ‘work great’ at first glance, but they generally set you up for failure. Frankly, it would have been better if they had never worked in the first place.

RickNils · January 19, 2025, 4:07pm

Thank you, (hopefully, non-literally?) Fleshmauler,
I have three of five drives online, and the pool was imported after another reboot. Long may it last. I’m copying data now to a SATA drive, a desktop model, also from Western Digital.

That drive is in an external drive bay connected via USB 3, so it will take several hours to copy just part of the data. For the rest, I’m making a prioritized list of directories and preparing rsync commands to copy data from the NAS to a local RAID array on my Fedora machine. That will be quicker than the first directory copy, which is going through the usual Nautilus file manager. If I can get everything copied to my local machine, I’ll make more copies to various SATA drives that are kicking around. Once I have a few backups made, I’ll try to replace the failed drives in the TrueNAS pools. If I can repair the existing pools, I’ll have copies on and off site, and a working NAS. Then I’ll build two more NAS boxes.
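The prioritized rsync plan described above could be sketched like this. The hypothetical plan_rsync function reads directory names (one per line, most important first) and prints one mkdir/rsync pair per directory so the commands can be reviewed before anything runs. The host name truenas, the source root /mnt/n5x1x4TB and the destination /srv/raid/nas-copy are placeholders, and it assumes SSH access from the Fedora machine to the NAS is enabled. -a preserves permissions, timestamps and symlinks, -H preserves hard links, and --partial keeps partly transferred files so a re-run can reuse them instead of starting those files over.

```shell
#!/bin/sh
# plan_rsync: read directory names (one per line, most important first)
# on stdin and print a mkdir + rsync command for each, for review.
# SRC_HOST, SRC_ROOT and DEST_ROOT are placeholders; set them to match
# your NAS host name, pool mount point and local RAID path.
SRC_HOST="${SRC_HOST:-truenas}"
SRC_ROOT="${SRC_ROOT:-/mnt/n5x1x4TB}"
DEST_ROOT="${DEST_ROOT:-/srv/raid/nas-copy}"

plan_rsync() {
    # The -n test also handles a last line without a trailing newline.
    while IFS= read -r dir || [ -n "$dir" ]; do
        case $dir in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # -a: permissions, times, symlinks; -H: hard links;
        # --partial: keep partly copied files for a re-run.
        printf 'mkdir -p "%s/%s" && rsync -aH --partial "%s:%s/%s/" "%s/%s/"\n' \
            "$DEST_ROOT" "$dir" "$SRC_HOST" "$SRC_ROOT" "$dir" "$DEST_ROOT" "$dir"
    done
}

# Example with a hypothetical priority list:
printf 'documents\n# lower priority below\nphotos\n' | plan_rsync
```

Once the printed commands look right, they can be executed by piping them to sh, e.g. `plan_rsync < priority.txt | sh`, where priority.txt is a hypothetical list file. Directory names containing double quotes, $ or other shell-special characters would need extra escaping.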

Fleshmauler · January 19, 2025, 4:24pm

It is what I eat! Everyone immediately goes to cannibalism for some reason; I just like meat.

Happy to hear it came online and you’re actively making backups! Even happier to hear you’re rolling up some hardware for a new build, man.

Hopefully all the transfers go through successfully and this ends up as nothing more than a (mildly painful) learning experience.

When I first joined, I thought everyone was crazy conservative about best practices and generally sticks in the mud. Over the years, though, I found myself spreading the same advice as the old guard, after learning the hard way myself…