Hello,
My TrueNAS SCALE system had all of its devices “drop out” of the storage pool after a clean shutdown/restart.
I had an electrician come by the house to do some testing, so I shut down the NAS and unplugged it. When it came back up it booted fine, but the storage pool shows “VDEVs not assigned” against all of the devices in “Topology”. At the time I shut it down, one of the drives seemed to be cactus (dead) and had already been dropped from the pool. The other three looked fine on SMART tests and the pool was working.
The Storage Dashboard shows all four disks as “Unassigned”. Under “Manage disks”, three of them show “Exported pools (tank-1)”; the one that was unhealthy just has N/A next to it.
- Version: TrueNAS-SCALE-23.10.1
- Hardware: Supermicro 5028D-TN4T / 96GB ECC RAM
- Boot disk: Samsung SSD 980 500GB M.2 drive
- Disks: 4x shucked ?HC500? - WDC_WD180EDGZ-11B2DA0
- Pool configuration: RAIDZ1 on the four drives, log VDEV on a 2nd partition of the boot disk (yeah I know…)
- Self-encrypting drives is ON system-wide (i.e. for the data drives)
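For context on why three good drives out of four "should" be enough: RAIDZ1 keeps one parity block per stripe, so in principle a four-disk RAIDZ1 vdev can lose any one member. The Python below is a simplified illustration of single-parity (XOR) reconstruction only, assuming a fixed stripe of three data blocks plus one parity block. Real RAIDZ uses variable-width stripes and checksums, and this sketch does not explain why the pool as a whole reports FAULTED.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of one stripe: three data blocks, one per data disk.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # the parity block stored on the fourth disk

# Simulate losing the disk that held data[2] (like the UNAVAIL member):
# XOR of the surviving data blocks and the parity gives back the missing block.
survivors = [data[0], data[1], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[2])  # True
```

With two members missing, the surviving blocks no longer determine the lost data, which is why RAIDZ1 only tolerates a single failed disk.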
```
$ sudo zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Fri Apr 19 03:45:06 2024
config:

    NAME             STATE     READ WRITE CKSUM
    boot-pool        ONLINE       0     0     0
      nvme0n1p3      ONLINE       0     0     0

errors: No known data errors
```
```
$ sudo zpool import
   pool: tank-1
     id: 6328939347888674582
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

    tank-1                                      FAULTED  corrupted data
      raidz1-0                                  DEGRADED
        64991bef-b24b-432c-8e6d-75f2c22fec88    ONLINE
        41bf2cc9-4e04-4657-ac1c-8d6da82c7906    ONLINE
        1f2238c4-05bf-43d4-a833-893814439452    UNAVAIL
        cd29d5d9-1936-42fb-8e37-3938123b7faf    ONLINE
    logs
      nvme0n1p5                                 ONLINE
```
So my questions are:
- What exactly is this trying to tell me? The linked page essentially says “bye bye data, you have backups, right?!”, but I’d like to know how I got here. I have three working data drives; isn’t that enough?
- What are the risks associated with forcing the import? Is there a chance of silent data corruption?
- If I want to try recovering (I have a new disk on the way), what’s the procedure? Force the import, then scrub the pool?
- I have backups (using restic) from just before this all happened. Should I try to recover the pool and then check my data against the backup somehow? Or is it better to just abandon/rebuild the pool and restore from backups? (I ran a check on the backup recently and it was fine, but I’d rather not tempt fate by destroying the pool and trusting my only backup if I don’t have to.)
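On the "check my data against the backup somehow" point, one generic approach is a file-by-file checksum comparison between the recovered pool and a copy of the backup. Below is a minimal Python sketch, assuming the pool has been imported and mounted, and that the backup is readable as an ordinary directory tree (for example a snapshot exposed with `restic mount` or extracted with `restic restore`). The function names and the mount paths in the usage comment are hypothetical, not part of restic or TrueNAS.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each regular file under root (as a relative POSIX path) to its SHA-256.

    os.walk does not follow symlinked directories by default, and symlinked
    files are skipped, so only real file contents are compared.
    """
    digests: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                digests[path.relative_to(root).as_posix()] = file_digest(path)
    return digests


def diff_digests(pool: dict[str, str], backup: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between two {relative_path: digest} maps."""
    return {
        "missing_from_pool": sorted(backup.keys() - pool.keys()),
        "only_in_pool": sorted(pool.keys() - backup.keys()),
        "content_differs": sorted(
            p for p in pool.keys() & backup.keys() if pool[p] != backup[p]
        ),
    }


# Hypothetical mount points; adjust to wherever the pool and backup actually live:
# report = diff_digests(tree_digests(Path("/mnt/tank-1")),
#                       tree_digests(Path("/mnt/restic-restore")))
```

Files that were modified after the backup was taken will also land in `content_differs` or `only_in_pool`, so a non-empty report is not by itself proof of corruption; it narrows down which files need a closer look.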