Pool offline, all drives exported, please help

Hi guys, I need a bit of help. My TrueNAS SCALE 24.04.0 was running fine; I have a pool of 4x WD 4TB NAS drives in RAIDZ1.

Suddenly I received a message that 2 of the drives had developed errors. It has happened before; usually rebooting the system clears the error, and I have also run SMART tests on all the drives and they all passed. But today was different: after the reboot all the drives were fine but marked as Exported, and there was no pool.

I tried to run zpool import HDD but it was giving some IO errors, so I tried zpool import -f HDD. It did import and the pool is back online, but there is no data: all the apps disappeared, and browsing the disk I only see the users folder, no other datasets. But I see the drive usage is 3.5TB used, so the data is still there. Can anybody help me recover the apps and data please?

Going to Datasets I see all the datasets, but when clicking on one I get this error: “[EFAULT] Failed retreiving GROUP quotas for HDD/App_Data”

I can’t help you with your error message but if you post details about your hardware maybe someone else can.

You need to be thorough.
Model numbers of the HDDs used, the motherboard, how the drives are connected, and any PCIe cards or adapters are of interest, and so on. Full SMART output from the drives would also be helpful, as would the output of zpool status HDD.
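
For reference, a rough set of commands that would gather most of that from the shell (just a sketch; the /dev/sdX names are an assumption, adjust to your system):

# Pool layout and current scan status
zpool status HDD

# Full SMART output for each data disk (device names assumed)
smartctl -a /dev/sda
smartctl -a /dev/sdb
smartctl -a /dev/sdc
smartctl -a /dev/sdd

# Models, serials and how the drives are attached
lsblk -o NAME,MODEL,SERIAL,SIZE,TRAN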

pool: HDD
state: ONLINE
scan: scrub in progress since Sun May 12 19:32:40 2024
2.49T / 4.35T scanned at 2.25G/s, 561G / 4.35T issued at 509M/s
0B repaired, 12.61% done, 02:10:25 to go
config:

    NAME                                      STATE     READ WRITE CKSUM
    HDD                                       ONLINE       0     0     0
      raidz1-0                                ONLINE       0     0     0
        73e8954f-fd43-4770-a5f8-78faa91fd6ee  ONLINE       0     0     0
        25087691-9aac-45c6-99a2-741fcda14a58  ONLINE       0     0     0
        f4d8b7aa-08e7-49b8-a06e-c0c1bca59d58  ONLINE       0     0     0
        c842d9a6-4ca8-4477-81aa-cae554d19506  ONLINE       0     0     0

errors: No known data errors

Server info:
Motherboard: Supermicro A1SRi X10
CPU: Intel Atom
RAM: 32GB Kingston ECC
Drives: 4x WDC WD40EFZX-68AWUN0
Boot pool: Kingston M.2 NVMe connected to a USB 3 port on the motherboard using an adapter
PSU: 250W 80 Plus Bronze

Running zfs list -t all I get this, if it helps:

Your mountpoint looks wrong; it's directly under the root instead of the expected /mnt/.
If it was in /mnt previously, the new location would likely break all sorts of things, like SMB, apps, and so on.

Edit: This is somewhat uncharted territory for me, but perhaps let the scrub finish, then export the pool and import it using the GUI. Or add the altroot option when doing it in the shell: Source
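
For the shell route, the import would look roughly like this (a sketch only; -R /mnt sets the altroot so the pool lands under /mnt where TrueNAS expects it):

zpool export HDD
zpool import -R /mnt HDD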

So how do I fix this?

Is that the configured mountpoint? Seems weird. It should be /mnt/HDD, not /HDD
Can you run zfs get mountpoint HDD && zfs get mounted HDD and paste the output?

Can you also check if the directory at /mnt/HDD exists?
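
Checking the directory is just (a sketch):

ls -ld /mnt/HDD    # should exist as a directory, even if it is an empty mountpoint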

root@HomeServer[~]# zfs get mountpoint HDD
NAME  PROPERTY    VALUE  SOURCE
HDD   mountpoint  /HDD   default

root@HomeServer[~]# zfs get mounted HDD
NAME  PROPERTY  VALUE  SOURCE
HDD   mounted   no     -

I made an edit to my previous post with ideas on what the next step could be.

2 Likes

But if I cd /mnt I see HDD, and if I cd HDD I don't see any of the folders:
root@HomeServer[~]# cd /mnt
root@HomeServer[/mnt]# ls
HDD
root@HomeServer[/mnt]#
root@HomeServer[/mnt]# cd HDD
root@HomeServer[/mnt/HDD]# ls
users
root@HomeServer[/mnt/HDD]#

Same conclusion; seeing the IO error from TrueNAS would be useful.
I've got no idea why the mountpoint has been configured as /HDD. You can change it with zfs set mountpoint=/mnt/HDD HDD and it should in theory automatically attempt to mount at that location, but it would be good to know why this happened in the first place.
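
Roughly, that sequence would be (a sketch; the extra commands just verify the result):

zfs set mountpoint=/mnt/HDD HDD
zfs get mountpoint,mounted HDD
zfs mount -a        # mount any child datasets that did not come up automatically
ls /mnt/HDD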

1 Like

OK, I'll let the scrub finish and then try zfs set mountpoint=/mnt/HDD HDD?

I should have all this

It looks like it's complaining because the path /HDD doesn't actually exist. If you move the mountpoint under /mnt it should be able to mount.

Before doing that though, try letting the scrub finish and exporting and reimporting the pool via the GUI, as @neofusion mentioned.

Edit / Note:
Do not mess with pool configuration (e.g. mountpoints) via the CLI as a solution; the middleware will not be happy.
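
Watching the scrub from the shell is read-only and safe while you wait (a sketch; zpool wait blocks until the scrub finishes):

zpool status HDD | grep -A 2 scan
zpool wait -t scrub HDD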

1 Like

OMG, I ran zfs set mountpoint=/mnt/HDD HDD and now I can see all the datasets!

The apps are still missing; I guess I have to restart.

2 Likes

It’s great to read that you can see your data again.

What worries me is the unknown event that got you into this situation in the first place. You shouldn't have needed to do a manual force import.

Do you have a backup? If not, now would be a good time to make copies of anything valuable.

1 Like

I rebooted the server, no more errors, and all the apps are coming back online. THANK YOU 10000000000000000 times. If you were here I would kiss both of you; I was nearly crying.

It all started with this message, but I don't think the drives are bad. It has done the same a few times before, and usually after a reboot they go back to normal, and if I run a SMART test on them they come back OK. The drives are 2 years old and have 16201 hours; should I replace them?

TrueNAS @ HomeServer

The following alert has been cleared:

Pool HDD state is OFFLINE: None

Current alerts:

Device: /dev/sdb [SAT], ATA error count increased from 0 to 162.
Device: /dev/sda [SAT], ATA error count increased from 0 to 174.
Failed to configure kubernetes cluster for Applications: Missing "HDD/ix-applications/k3s, HDD/ix-applications/releases" dataset(s) required for starting kubernetes.
'boot-pool' is consuming USB devices 'sde' which is not recommended.
Failed to configure kubernetes cluster for Applications: Missing "HDD/ix-applications/releases, HDD/ix-applications/k3s" dataset(s) required for starting kubernetes.

You shouldn’t be ignoring I/O errors or ATA error counts increasing.

Even if the drives are healthy, this could be hinting at an issue with your cables, connections (data and power), HBA, power supply, etc.
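
A quick way to see whether this looks like cabling/backplane trouble rather than failing platters (a sketch; device names assumed):

# ATA error log recorded by the drive itself
smartctl -l error /dev/sdb

# UDMA CRC errors usually point at the cable or backplane, not the disk
smartctl -A /dev/sdb | grep -i crc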

I'm thinking maybe it's the connection board (backplane) the disks connect to when inserted in the bay. I will try connecting the cables directly and see what happens, but it only happens maybe once a month or less, and usually after a reboot they go back to being OK.

Folks, folks, folks … don't mess with the mountpoints. Please. Pools always have their mountpoint set to /poolname, but TrueNAS imports pools with the altroot=/mnt property set. If you use the CLI instead of the UI, you had better know what you are doing.

So export and reimport from the UI. The middleware needs to know about your pool for apps, shares, etc. to work.
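
To illustrate the difference (a sketch only; the UI import effectively handles the -R /mnt part for you):

# Plain CLI import: no altroot, so the pool mounts at its stored mountpoint (/HDD)
zpool import HDD

# What TrueNAS does: altroot=/mnt, so the same mountpoint becomes /mnt/HDD
zpool import -R /mnt HDD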

4 Likes

Much appreciated advice. I hadn't touched mountpoints in TrueNAS before, so I only really had prior knowledge of ZFS to go off of (and I did recommend exporting and importing via the UI before messing around in the CLI…).

On that note though, any reason why this would be attempting to locate the mount in /poolname as opposed to via its altroot?

Edit: Ah! Of course, I completely missed the fact that he reimported the pool via the CLI, so altroot would have been set to /!
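
For anyone reading along later, the altroot on an imported pool can be checked with (a sketch):

zpool get altroot HDD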