Data recovery of pool and questions about misconception TrueNAS SCALE

Davvo · April 20, 2024, 11:53am

That’s not how ZFS works: with a few exceptions, losing a VDEV means losing the entire pool.
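To make the rule above concrete, here is a minimal sketch of a toy model, not of how ZFS is implemented. It assumes only data VDEVs are modeled (the "few exceptions", such as cache devices, are left out), and the names `Vdev`, `parity` and `pool_alive` are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vdev:
    """Toy model of a data VDEV (hypothetical, for illustration only)."""
    name: str
    disks: int
    parity: int      # disk failures tolerated: RAIDZ1 = 1, 2-way mirror = 1, single disk = 0
    failed: int = 0  # number of disks currently lost

    def alive(self) -> bool:
        # A VDEV survives as long as it has lost no more disks than it tolerates.
        return self.failed <= self.parity


def pool_alive(vdevs: List[Vdev]) -> bool:
    # Data is striped across all data VDEVs, so one dead VDEV kills the pool.
    return all(v.alive() for v in vdevs)


pool = [
    Vdev("raidz1-0", disks=4, parity=1),
    Vdev("mirror-1", disks=2, parity=1),
]

pool[1].failed = 1
print(pool_alive(pool))  # one mirror disk lost: pool is degraded but alive

pool[1].failed = 2
print(pool_alive(pool))  # both mirror disks lost: the whole pool is gone
```

Note that the healthy RAIDZ1 VDEV does not help in the second case: the pool is only as available as its least available VDEV.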

Data recovery on ZFS is an order of magnitude more difficult than on other file systems and, as such, very expensive. As far as I am aware, the only tool worth looking at is Klennet ZFS Recovery.

Your best bet here is to bring the failed VDEV back online, if possible: this means resurrecting one of the two drives from the dead. If you have wiped them, however, there is little we can do. Can you please tell us, step by step, what you did?

I suggest reading the following resources to improve your understanding of ZFS:

Now, about the specific issue… I always recommend running https://memtest.org/; a memory error resulting in a kernel panic or just a straight crash is a suspect.

Generally, we advise against running overclocks or underclocks for stability reasons, but it’s not a ZFS-specific recommendation.

About recommendations for the current pool: without knowing your use case there is not much we can say about the pool’s geometry. As far as we understand, your data loss is a result of your own actions; generally, unlike with HDDs, a 4-wide RAIDZ1 SSD VDEV is considered safe. You could post the output of sas3flash -list to check if something is wrong with the HBA, but I think we won’t find anything there.

If you want to use SSDs to speed up your HDD pool, you want to use a sVDEV^[1], specifically a metadata VDEV. If you search either forum you will find plenty of material (search for fusion pools as well), but to make a quick recap:

if you lose the metadata VDEV you lose the entire pool, so you want it to have the same level of redundancy as the rest of the pool
it can drastically improve your HDD pool performance during traverse operations, which is especially useful with macOS
it can be used to store data whose block size is smaller than a threshold you set, so that your HDDs do not hit their IOPS ceiling
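The last point in the recap can be sketched as a small routing function. This is a simplified illustration, not ZFS code: it assumes the threshold corresponds to the `special_small_blocks` dataset property (where 0 means no data blocks are redirected), and it ignores what happens when the special VDEV fills up. The function name `allocation_class` is hypothetical.

```python
def allocation_class(block_size: int, is_metadata: bool, special_small_blocks: int) -> str:
    """Return which class of VDEV a block would land on in this toy model.

    block_size and special_small_blocks are in bytes.
    A threshold of 0 is assumed to mean that only metadata goes to the sVDEV.
    """
    if is_metadata:
        # Metadata always goes to the special VDEV when one exists.
        return "special"
    if special_small_blocks > 0 and block_size <= special_small_blocks:
        # Small data blocks are redirected to the SSDs as well.
        return "special"
    # Everything else stays on the regular (HDD) VDEVs.
    return "normal"


KIB = 1024
threshold = 32 * KIB  # example value, chosen only for illustration

for size in (4 * KIB, 32 * KIB, 128 * KIB):
    print(size // KIB, "KiB ->", allocation_class(size, False, threshold))
```

With a threshold like this, the many tiny files that generate random I/O end up on the SSDs, while large sequential blocks stay on the HDDs, which is where they perform acceptably anyway.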

Generally, you want to explain your use case in order to receive better help. For VMs and virtualization in general (both VMs running on TN and TN itself running as a VM) there are a few resources I linked to above that you should have read before building your current system: one in particular is critical knowledge if you virtualize TN itself, as you are doing.

A SLOG is needed for sync writes if you have an HDD pool.
L2ARC is a read cache that is generally only worthwhile when TN has at least 64GB of RAM at its disposal.

Oh, and if you aren’t already using one, consider a UPS for your system.

[1] special VDEV