Help! Is my data gone? Can't import pool after adding new drives and a failed scrub task

Hi. I’m hoping someone can give me some insight into whether I will be able to recover any of my data from my pool.

I’m running Scale 24.10.0.2, and my pool originally consisted of two 14 TB Western Digital Ultrastar DC HC530 drives in a mirror configuration. The other day, I moved my system to a new case, and in doing so I added two 2 TB Western Digital drives (1 WD Black and 1 WD Green). After the swap, I was hoping I could just create a mirror with these drives and add it to my pool; however, I had trouble accomplishing that. I can’t remember the exact errors, but when I tried wiping the drives, the messages suggested they were “busy”. Needless to say, I paused any further troubleshooting and left the drives as they were. Then, this Sunday, my scheduled scrub task started and ran into “IO failures” on these two drives. I’ll paste the alert I received below:

New alerts:

  • Snapshot Task For Dataset “BupStorage” failed: cannot open ‘BupStorage’: pool I/O is currently suspended usage: snapshot [-r] [-o property=value] … @ … For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow For further help on a command or topic, run: zfs help

Current alerts:

  • Snapshot Task For Dataset “BupStorage” failed: cannot open ‘BupStorage’: pool I/O is currently suspended usage: snapshot [-r] [-o property=value] … @ … For the property list, run: zfs set|get For the delegated permission list, run: zfs allow|unallow For further help on a command or topic, run: zfs help
  • Pool BupStorage state is SUSPENDED: One or more devices are faulted in response to IO failures.
    The following devices are not healthy:
    • Disk 348045067173541495 is UNAVAIL
    • Disk 16627099099116621792 is UNAVAIL

When I went into the GUI the next morning, the scrub task was still running but stuck at 0%. I tried stopping the scrub through the GUI, but that also seemed to hang without completing. Eventually, I decided to just reboot the system, and after doing so the pool seemed to change from a state of “Suspended” to “Offline”. It then notified me that the previous alerts were “cleared”, and the new alerts were:

Pool BupStorage state is OFFLINE: None
SMB shares have path-related configuration issues that may impact service stability: media: ACL type mismatch with child mountpoint at /mnt/apps: boot-pool/ROOT/24.10.0.2/mnt - OFF

This same error seemed to repeat for each of the various apps that I had installed on my system.

When running “sudo zpool list”, my pool “BupStorage” no longer shows up and only my “apps” pool and a pool named “boot-pool” appear. If I run “sudo zpool import”, it returns the following:

pool: BupStorage
id: 1033671487070951771
state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
see: link to open zfs github
config:

    BupStorage                                UNAVAIL  insufficient replicas
      mirror-0                                ONLINE
        703dc698-531d-4041-933a-e7cc6fe23c1b  ONLINE
        d1022e29-d974-402b-bf09-8fb36f08bea4  ONLINE
      mirror-1                                UNAVAIL  insufficient replicas
        a26520ff-bd9a-4056-ad59-68ac2e593427  UNAVAIL
        6b66d189-b017-4003-8e1e-6aa80fcc62b9  UNAVAIL

admin@truenas[~]$

Mirror-0 corresponds to the original two 14 TB WD drives, and mirror-1 corresponds to the two 2 TB drives I tried to add. Any time I run “zpool import -f BupStorage” or a similar variation, it tells me that no such pool exists and that I will need to destroy it and recreate it from a backup.
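
For reference, another standard variation I could try (just a sketch; the numeric id is the one shown in the output above) is importing by pool id instead of by name:

    # Import using the numeric pool id reported by "zpool import"
    sudo zpool import -f 1033671487070951771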

I’m at my wits’ end and have tried everything I can find, with no luck. Do I have any options here, or is my data toast? Thanks in advance!

The zpool import output shows that you have lost all the disks in one mirror vDev. This is BAD. Unless you can get at least one of them back and functional, you have more or less lost your pool.

Loss of any critical vDev (like “mirror-X”) means complete data loss. However, other vDev types, like SLOG / LOG, CACHE / L2ARC or Hot Spares, are not critical vDevs.

If you can’t get one of the “mirror-1” disks back to functioning, your only other options are these:

  • Restore from backups
  • Use Klennet ZFS Recovery
  • Use an expensive recovery service
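
If you do manage to get one of the mirror-1 disks functioning again, here is a rough sketch of import attempts in roughly increasing order of risk (these are standard zpool import options, not something tested against your exact pool):

    # 1. Plain import, scanning by partition UUID (how TrueNAS normally references disks)
    sudo zpool import -d /dev/disk/by-partuuid BupStorage

    # 2. Read-only import, useful for copying data off before attempting any repairs
    sudo zpool import -o readonly=on -R /mnt BupStorage

    # 3. Dry-run rewind: reports whether discarding the last few transactions
    #    would make the pool importable, without changing anything yet
    sudo zpool import -F -n BupStorage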

The pool is not toast IF you can bring the mirror-1 vDev back online. If you can, the pool may just spring back to life, it may need some help recovering, or it may still be dead. But with that entire vDev UNAVAIL, the whole pool is unavailable.

So the question is WHY are these drives shown as no longer available? And the first thing to determine is whether Linux can see the drives and the ZFS partitions.

Can you please run the following commands and post the output of each command in a separate </> box:

  • lsblk -bo NAME,MODEL,ROTA,PTTYPE,TYPE,START,SIZE,PARTTYPENAME,PARTUUID
  • lspci
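
A couple of optional extras along the same lines, if you are comfortable running them (a sketch; replace /dev/sdX with whatever device names the 2 TB drives actually show up as):

    # Do the partition UUIDs of the two UNAVAIL members still exist on any disk?
    ls -l /dev/disk/by-partuuid/ | grep -iE 'a26520ff|6b66d189'

    # SMART health of a suspect drive (sdX is a placeholder)
    sudo smartctl -a /dev/sdX

    # Kernel messages about link resets or I/O errors on the disk ports
    sudo dmesg | grep -iE 'ata[0-9]|i/o error|link'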

I was able to make some progress and (so far, at least) get things working again, albeit with the help of ChatGPT. Running “sudo zpool import -f -m -d /dev/disk/by-id/ -R /mnt BupStorage” brought the pool back online. Both mirrors are detected and seem to be operating normally. It still left me a bit confused, since the last time I worked on this it didn’t seem like I had successfully created the mirror-1 vdev, so I didn’t think it would interfere with the rest of the pool the way it did. However, after checking the pool’s history, it turns out the mirror-1 vdev was in fact created a couple of weeks earlier, which is why the entire pool depended on it being online. I’m still not entirely sure what caused the stalled scrub and the pool ending up exported, but I’m running a scrub now and everything seems to be operating normally.
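
In case it helps anyone else who ends up here, these are the sanity checks I’m running now that the pool is back (just standard commands, nothing exotic):

    sudo zpool status -v BupStorage            # both mirrors ONLINE, no read/write/cksum errors
    sudo zpool history BupStorage | tail -50   # recent pool operations, e.g. when mirror-1 was added
    sudo zpool scrub BupStorage                # kick off a scrub
    sudo zpool status BupStorage | grep scan   # confirm the scrub is actually progressing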

I appreciate the insight and willingness to help, @Arwen and @Protopia. I’m relatively new to TrueNAS and ZFS, so there’s a lot I still don’t understand. Needless to say, this was a good learning experience, and I’m going to exercise a lot more caution next time I add drives to an existing pool.

:cry: Hey @Arwen - it looks like we have just been the victims of having our jobs taken by AI. :cry: