I know that redundancy does not guarantee data safety, that RAID is not a backup, and that there should be multiple copies of important data (3-2-1). When it comes to pool layout, I know that Z1 is quite risky if a disk fails: during the resilver/pool rebuild after the drive replacement, the rebuild can stress the remaining drives. This is well understood for spinning drives, but does it still apply to SSDs as well?
In short, I’m asking about the reliability of Z1 on an HDD pool versus an SSD pool. Any guidance would be highly appreciated.
I would say that depends on the size and the quality of the SSD.
A RAIDZ1 made of enterprise-grade SSDs is probably a safer bet than one made of no-name SSDs from AliExpress.
The question you have to ask yourself is: are you ok with no redundancy after 1 drive fails or not?
It’s all about the pool being vulnerable during the (possibly lengthy) resilver after losing a member, so it applies irrespective of the kind of drive.
But SSDs would resilver (much) faster, and typically boast lower URE rates than HDDs, so the risk window is shorter.
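To put rough numbers on the "risk window", here is a back-of-the-envelope sketch. The sustained rates (150 MB/s for an HDD, 450 MB/s for a SATA SSD) and the 4 TB of data per drive are my own illustrative assumptions, not measurements; real resilvers depend on pool fill, fragmentation and concurrent load.

```python
def resilver_hours(data_tb: float, rate_mb_s: float) -> float:
    """Hours needed to rewrite data_tb terabytes at a sustained rate_mb_s MB/s."""
    total_bytes = data_tb * 1e12
    seconds = total_bytes / (rate_mb_s * 1e6)
    return seconds / 3600


# Assumed, illustrative figures only.
hdd_hours = resilver_hours(data_tb=4, rate_mb_s=150)
ssd_hours = resilver_hours(data_tb=4, rate_mb_s=450)
print(f"HDD: {hdd_hours:.1f} h, SSD: {ssd_hours:.1f} h")
```

The point is only that the window without redundancy scales with data divided by throughput, so a faster device shrinks it proportionally.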
Besides your tolerance for risk, the question is (ta-da!!!):
What’s the use case?
Anything dealing with small blocks (database, apps, VMs, zvols) benefits from mirrors.
So are you doing bulk storage on SSDs, to make best use of raidz?
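One reason small blocks do poorly on raidz is space efficiency (IOPS is another, not modelled here). The sketch below follows my understanding of how OpenZFS sizes a raidz allocation: data sectors, plus parity per stripe row, rounded up to a multiple of (parity + 1). The 5-wide RAIDZ1 and ashift=12 are assumptions for illustration, and compression is ignored.

```python
def raidz_alloc_sectors(block_bytes: int, width: int, nparity: int = 1,
                        ashift: int = 12) -> int:
    """Sectors allocated for one block on a raidz vdev (simplified model)."""
    sector = 1 << ashift
    data = -(-block_bytes // sector)                   # ceil division
    parity = nparity * -(-data // (width - nparity))   # parity per stripe row
    total = data + parity
    padding = (-total) % (nparity + 1)                 # round up to multiple of nparity + 1
    return total + padding


def efficiency(block_bytes: int, width: int, nparity: int = 1,
               ashift: int = 12) -> float:
    """Fraction of allocated sectors that hold user data."""
    data = -(-block_bytes // (1 << ashift))
    return data / raidz_alloc_sectors(block_bytes, width, nparity, ashift)


# Assumed layout: 5-wide RAIDZ1, 4 KiB sectors.
for size_kib in (4, 8, 16, 128):
    print(size_kib, "KiB ->", round(efficiency(size_kib * 1024, width=5), 3))
```

Under these assumptions, 4K and 8K blocks come out no more space-efficient than a 2-way mirror, while 128K records get the 80% you would expect from a 5-wide RAIDZ1.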
HDDs and SSDs have a stated statistical chance of UREs (unrecoverable read errors) occurring, published in the storage device’s specifications. The odds of reading a whole device without hitting one were pretty good, up until we started seeing >2TB storage devices. Then, the odds start to take a bad turn.
Having one or just a few URE(s) when using ZFS is not fatal, but can take out part of file(s). Thus, you would need to recover the affected file(s).
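For a feel of why device size matters, here is a rough sketch of the chance of hitting at least one URE while reading a given amount of data. The rates used (1 error per 10^14, 10^15 and 10^17 bits) are figures I am assuming as typical of datasheets, not taken from any specific drive; datasheets state them as upper bounds, and treating every bit as independent is a simplification, so read the results as pessimistic ballpark numbers.

```python
import math


def p_at_least_one_ure(read_tb: float, bits_per_error: float) -> float:
    """Probability of one or more UREs when reading read_tb terabytes,
    assuming independent errors at a rate of 1 per bits_per_error bits."""
    bits = read_tb * 1e12 * 8
    # 1 - (1 - p)**bits, written with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


# Assumed spec figures, for illustration only.
for label, rate in (("1e14", 1e14), ("1e15", 1e15), ("1e17", 1e17)):
    for size_tb in (2, 12):
        print(f"{size_tb:>2} TB read at 1 per {label} bits: "
              f"{p_at_least_one_ure(size_tb, rate):.4f}")
```

On a RAID-Z1 resilver the amount read is roughly the data held on all surviving members, so the read size to plug in grows with both drive size and vdev width.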
With dataset defaults, if the URE(s) happen to affect metadata, the chance of data loss is VERY LOW. This is because metadata is stored redundantly by default, EVEN ON A RAID-Zx vdev / pool.
To be clear, it is entirely possible to have a 2nd storage device, HDD or SSD, fail completely during a resilver. If you are not using “replace in place” for the first failure, then with RAID-Z1 that means total pool loss.
By “replace in place”, I mean replacing a storage device that has not failed completely with another one while both are installed. ZFS will simply mirror the 2 devices and, when the resilver is complete, detach the failing device. For any block that is unavailable on the source disk, ZFS will use whatever redundancy is available, like RAID-Z1 parity.
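As a sketch of what that looks like in practice: the underlying command is `zpool replace <pool> <old-device> <new-device>`, issued while the old device is still attached. The small Python wrapper below is just my own illustration; the pool name `tank` and the device paths are hypothetical placeholders, and it defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def replace_in_place(pool: str, failing_dev: str, new_dev: str,
                     dry_run: bool = True) -> list[str]:
    """Build (and optionally run) a zpool replace with both devices attached."""
    argv = ["zpool", "replace", pool, failing_dev, new_dev]
    if dry_run:
        print(shlex.join(argv))          # show the command without touching the pool
    else:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv


# Hypothetical names; substitute your own pool and device identifiers.
replace_in_place("tank",
                 "/dev/disk/by-id/old-failing-disk",
                 "/dev/disk/by-id/new-disk")
```

Progress can then be followed with `zpool status`, and the old device should only be pulled once the resilver has finished.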
IMHO SSDs are more prone to failure, because of firmware errors.
So I would recommend you use two different drives, from different vendors, with different controllers. That way something like a “Samsung we instantly shut down instead of throttling” or an “ADATA we lie about sync writes” firmware bug does not lead to the loss of the pool, just of one disk.