After upgrading TrueNAS Scale, my MainPool (RAIDZ1, 3x 4TB Seagate ST4000NM0035) won’t import. zpool import shows the pool as ONLINE with all devices present, but every import attempt fails with “cannot import ‘MainPool’: one or more devices is currently unavailable” (error code EZFS_BADDEV).
System Info:
TrueNAS Scale latest version (kernel 6.12.15)
Pool: MainPool, ID: 7508289598988259504
Configuration: RAIDZ1 with 3x 4TB drives (sdb, sdc, sdd)
All disks on pci-0000:03:00.0 SATA controller
What Works:
All disks are readable: dd if=/dev/sdb1 of=/dev/null works on all 3
Key Question: Why would zpool import -nFX succeed (dry-run) but actual import fail with EZFS_BADDEV when all devices are present and readable? Is this a known bug in recent TrueNAS Scale versions?
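For reference, this is roughly what I am comparing (a sketch from memory; the exact flags in my shell history may have differed slightly):
zpool import -nFX MainPool    # dry-run recovery import with extreme rewind: reports it would succeed
zpool import MainPool         # plain import attempt: fails with EZFS_BADDEV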
Data recovery is critical. Any help would be greatly appreciated!
Please show us the zdb -l output from all 3 disks. We have seen import failures caused by transaction group (TXG) differences between the member disks. Sometimes this can be overcome by importing at a TXG number common to the surviving disks.
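Something along these lines will dump the labels from each member (sdb1/sdc1/sdd1 here is only a guess based on your pool layout; substitute your actual partition names):
zdb -l /dev/sdb1
zdb -l /dev/sdc1
zdb -l /dev/sdd1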
Also, please describe how the disks are connected to the server.
This is likely the problem (meaning more than one disk problem on a RAID-Z1 prevents import):
sda1 (026760de...)
txg: 9248975
sdb1 (51880de8...)
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
sdd1 (04d45059...)
txg: 9272794
With one failed disk, AND the TXG difference between the only 2 surviving disks, the pool will not import.
Generally there are 2 causes for TXG differences:
Incorrect virtualization of TrueNAS. You still have not answered if this instance of TrueNAS is virtualized.
Hardware RAID (or USB-attached storage using a poorly chosen enclosure). Again, you have not answered how the disks are wired to the server.
As for the fix, well, there is a difference of 23,819 ZFS write transactions between the 2 surviving disks (TXG 9,272,794 versus TXG 9,248,975). A difference of more than, say, 16 or 32 is bad. Over 10,000 could be fatal.
You could try:
zpool import -o readonly=on -fT 9248975 MainPool
But, be very clear, you are throwing out 23,819 writes and it is likely the pool will be corrupt. I’ve included the Read Only option just in case.
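If that readonly import does come back, treat the pool as fragile and copy the data off before doing anything else. Roughly (the destination path is just a placeholder; on SCALE the pool normally mounts under /mnt/MainPool):
zpool status -v MainPool
zfs list -r MainPool
rsync -aHAX /mnt/MainPool/ /path/to/backup/target/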
TrueNAS is running on bare metal, not in a virtual machine.
It is installed directly on an ODROID H4 Ultra.
All disks are connected directly to the ODROID (direct SATA connections). There is no hardware RAID controller, no HBA, and no USB storage involved — the disks are presented directly to the OS.
I’m not saying that the hardware itself is inherently bad or at fault, but from my perspective this is not a particularly reliable platform for a NAS (reports online suggest there are occasional issues with drive handling and detection on this platform).
While the disks may be directly connected to the ODROID H4 Ultra, this is not the same as being attached to a native chipset SATA controller. AFAIK, the H4 uses a PCIe-to-SATA bridge (ASM1064 or similar, sometimes poorly cooled), which means all SATA ports share a single PCIe lane. This can lead to bandwidth limitations and potential I/O instability, unlike a standard motherboard SATA controller where each port is typically fully managed by the chipset.
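If you want to confirm what is actually behind those ports, something like this (using the pci-0000:03:00.0 address from your first post) should name the controller and the kernel driver bound to it:
lspci -nnk -s 03:00.0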
Of course, it remains a question why it apparently worked before and only started showing problems after an upgrade.
Is there a recommended tool to safely rebuild the pool? Or software that can restore the files, since I know the data exists but the pool configuration is the problem.
Otherwise known as a “SATA controller”. The ASM1064 is old and an ASM11x4 would be better, but even the ASM1064 should be at least OK-ish.
It could be a RAM issue with the N305.
Or any of the above on top of a drive failure…
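If you want to rule the RAM out, a few passes of memtest86+ from a boot stick is the usual check; from a running system, something like the following also works (the size is just an example, pick whatever fits in free memory, and memtester may need to be installed first):
memtester 2G 3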
Please report the results either way.
At 130 €, and available for macOS and Linux as well, UFS Explorer RAID Recovery could be a valuable alternative to Klennet.
From what I’m seeing, Klennet does not show me the exact folder structure, while UFS Explorer RAID Recovery shows it correctly. I am still waiting for the scan to complete.