Pool offline, all drives exported, please help

Hi guys, I need a bit of help. My TrueNAS SCALE 24.04.0 was running fine; I have a pool of 4x WD 4TB NAS drives in RAIDZ1.

Suddenly I received a message that 2 of the drives had developed errors. It has happened before; usually rebooting the system clears the error, and I have also run SMART tests on all the drives and they all passed. But today was different: after the reboot all the drives were fine but marked as Exported, and there was no pool.

I tried to run zpool import HDD but it was giving some IO errors, so I tried zpool import -f HDD. It did import and the pool is back online, but there is no data: all the apps disappeared, and browsing the disk I only see the users folder, no other datasets. But I see the drive usage is 3.5TB used, so the data is still there. Can anybody help me recover the apps and data please?

Going to Datasets I see all the datasets, but when clicking on one I get this error: “[EFAULT] Failed retreiving GROUP quotas for HDD/App_Data”

I can’t help you with your error message but if you post details about your hardware maybe someone else can.

You need to be thorough.
Model numbers of the HDDs used, the motherboard, how the drives are connected, and any PCIe cards or adapters are of interest, and so on. Full SMART output from the drives would also be helpful, as would the output of zpool status HDD.
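
For reference, a rough set of commands that would gather most of that from the shell (just a sketch; the /dev/sdX names are an assumption, adjust to your system):

# Pool layout and current scan status
zpool status HDD

# Full SMART output for each data disk (device names assumed)
smartctl -a /dev/sda
smartctl -a /dev/sdb
smartctl -a /dev/sdc
smartctl -a /dev/sdd

# Models, serials and how the drives are attached
lsblk -o NAME,MODEL,SERIAL,SIZE,TRAN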

pool: HDD
state: ONLINE
scan: scrub in progress since Sun May 12 19:32:40 2024
2.49T / 4.35T scanned at 2.25G/s, 561G / 4.35T issued at 509M/s
0B repaired, 12.61% done, 02:10:25 to go
config:

    NAME                                      STATE     READ WRITE CKSUM
    HDD                                       ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        73e8954f-fd43-4770-a5f8-78faa91fd6ee  ONLINE       0     0     0
        25087691-9aac-45c6-99a2-741fcda14a58  ONLINE       0     0     0
        f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58  ONLINE       0     0     0
        c842d9a6-4ca8-4477-81aa-cae554d19506  ONLINE       0     0     0

errors: No known data errors

Server info:
Motherboard: Supermicro A1SRi X10
CPU: Intel Atom
RAM: 32GB Kingston ECC
Drives: 4x WDC WD40EFZX-68AWUN0
Boot pool: Kingston M.2 NVMe connected to a USB 3 port on the motherboard using an adapter
PSU: 250W 80 Plus Bronze

Running zfs list -t all I get this, if it helps:

Your mountpoint looks wrong; it's directly under the root instead of the expected /mnt/.
If it was in /mnt previously, the new location would likely break all sorts of things, like SMB, apps, and so on.

Edit: This is somewhat uncharted territory for me, but perhaps let the scrub finish, then export the pool and import it using the GUI. Or add the altroot option when doing it in the shell: Source
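
For the shell route, the import would look roughly like this (a sketch only; -R /mnt sets the altroot so the pool lands under /mnt where TrueNAS expects it):

zpool export HDD
zpool import -R /mnt HDD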

So how do I fix this?

Is that the configured mountpoint? Seems weird. It should be /mnt/HDD, not /HDD
Can you run zfs get mountpoint HDD && zfs get mounted HDD and paste the output?

Can you also check if the directory at /mnt/HDD exists?
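
Checking the directory is just (a sketch):

ls -ld /mnt/HDD    # should exist as a directory, even if it is an empty mountpoint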

root@HomeServer[~]# zfs get mountpoint HDD
NAME  PROPERTY    VALUE  SOURCE
HDD   mountpoint  /HDD   default

root@HomeServer[~]# zfs get mounted HDD
NAME  PROPERTY  VALUE  SOURCE
HDD   mounted   no     -

I made an edit to my previous post with ideas on what the next step could be.

2 Likes

But if I cd /mnt I see HDD, and if I cd HDD I don't see any of the folders:
root@HomeServer[~]# cd /mnt
root@HomeServer[/mnt]# ls
HDD
root@HomeServer[/mnt]#
root@HomeServer[/mnt]# cd HDD
root@HomeServer[/mnt/HDD]# ls
users
root@HomeServer[/mnt/HDD]#

Same conclusion; seeing the IO error from TrueNAS would be useful.
I've got no idea why the mountpoint has been configured as /HDD. You can change it with zfs set mountpoint=/mnt/HDD HDD and it should in theory automatically attempt to mount at that location, but it would be good to know why this happened in the first place.
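
Roughly, that sequence would be (a sketch; the extra commands just verify the result):

zfs set mountpoint=/mnt/HDD HDD
zfs get mountpoint,mounted HDD
zfs mount -a        # mount any child datasets that did not come up automatically
ls /mnt/HDD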

1 Like

OK, I'll let the scrub finish and then try zfs set mountpoint=/mnt/HDD HDD?

I should have all this

It looks like it's complaining because the path /HDD doesn't actually exist. If you move the mountpoint under /mnt it should be able to mount.

Before doing that though, try letting the scrub finish and exporting and reimporting the pool via the GUI, as @neofusion mentioned.

Edit / Note:
Do not mess with pool configuration (e.g. mountpoints) via the CLI as a solution; the middleware will not be happy.
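
Watching the scrub from the shell is read-only and safe while you wait (a sketch; zpool wait blocks until the scrub finishes):

zpool status HDD | grep -A 2 scan
zpool wait -t scrub HDD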

1 Like

OMG, I ran zfs set mountpoint=/mnt/HDD HDD and now I can see all the datasets!

The apps are still missing; I guess I have to restart.

2 Likes

It’s great to read that you can see your data again.

What worries me is the unknown event that got you into this situation in the first place. You shouldn't have needed to do a manual force import.

Do you have a backup? If not, now would be a good time to make copies of anything valuable.

1 Like

I rebooted the server, no more errors, and all the apps are coming back online. THANK YOU 10000000000000000 times. If you were here I would kiss both of you; I was nearly crying.

It all started with this message, but I don't think the drives are bad. It has done the same a few times before, and usually after a reboot they go back to normal, and if I run a SMART test on them they come back OK. The drives are 2 years old and have 16201 hours; should I replace them?

TrueNAS @ HomeServer

The following alert has been cleared:

Pool HDD state is OFFLINE: None

Current alerts:

Device: /dev/sdb [SAT], ATA error count increased from 0 to 162.
Device: /dev/sda [SAT], ATA error count increased from 0 to 174.
Failed to configure kubernetes cluster for Applications: Missing "HDD/ix-applications/k3s, HDD/ix-applications/releases" dataset(s) required for starting kubernetes.
'boot-pool' is consuming USB devices 'sde' which is not recommended.
Failed to configure kubernetes cluster for Applications: Missing "HDD/ix-applications/releases, HDD/ix-applications/k3s" dataset(s) required for starting kubernetes.

You shouldn’t be ignoring I/O errors or ATA error counts increasing.

Even if the drives are healthy, this could be hinting at an issue with your cables, connections (data and power), HBA, power supply, etc.
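
A quick way to see whether this looks like cabling/backplane trouble rather than failing platters (a sketch; device names assumed):

# ATA error log recorded by the drive itself
smartctl -l error /dev/sdb

# UDMA CRC errors usually point at the cable or backplane, not the disk
smartctl -A /dev/sdb | grep -i crc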

I'm thinking maybe it's the connection board (backplane) the disks connect to when inserted in the bay. I will try connecting the cables directly and see what happens, but it only happens maybe once a month or less, and usually after a reboot they go back to being OK.

Folks, folks, folks … don't mess with the mountpoints. Please. Pools always have their mountpoint set to /poolname, but TrueNAS imports pools with the altroot=/mnt property set. If you use the CLI instead of the UI, you had better know what you are doing.

So export and reimport from the UI. The middleware needs to know about your pool for apps, shares, etc. to work.
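
To illustrate the difference (a sketch only; the UI import effectively handles the -R /mnt part for you):

# Plain CLI import: no altroot, so the pool mounts at its stored mountpoint (/HDD)
zpool import HDD

# What TrueNAS does: altroot=/mnt, so the same mountpoint becomes /mnt/HDD
zpool import -R /mnt HDD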

4 Likes

Much appreciated advice. I hadn't touched mountpoints in TrueNAS before, so I only really had prior knowledge of ZFS to go off of (and I did recommend exporting and importing via the UI before messing around in the CLI…).

On that note though, any reason why this would be attempting to locate the mount in /poolname as opposed to via its altroot?

Edit: Ah! Of course, I completely missed the fact that he reimported the pool via the CLI, so altroot would have been set to /!
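
For anyone reading along later, the altroot on an imported pool can be checked with (a sketch):

zpool get altroot HDD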