Which of my understandings are wrong: How spares work, or what my box is telling me now?

I’m a fairly new user of TrueNAS and ZFS; I set up my NAS this summer. I had three disks available, and figured I would use two in a mirror setup and add the third as a spare.

My understanding was that if/when one of the disks in the mirror failed, the spare would step in and take its place and I could replace the failing disk when I had time. Until that time, everything would work fine, with the redundancy that mirroring offers. :crossed_fingers:

A few days ago, one of the disks in the mirror failed and a resilver was done, but the pool is still shown as degraded, and from the UI I’m not sure whether the spare has been added to the mirror, or whether the entire mirror has been taken out of use and I’m now running on a single disk. :confused:

It doesn’t help that I don’t remember which disks were in the mirror and which was the spare. :grimacing:

The UI also says I have 1 unused disk, but it doesn’t tell me which one it is, only the size. All the disks are the same size, so that’s not really helpful at all. :person_facepalming:

I’ve discussed this with some people who have used ZFS for a long time, but they all have lots of disks and have never used TrueNAS, so their recommendations don’t “cleanly apply” to my situation/UI.

I’m not allowed to embed a screenshot or a link to a screenshot (new account I guess?), so I will try to describe what I see:

Under Storage → Manage devices.

Data VDEVs
 MIRROR     DEGRADED               No errors
   sda      ONLINE      2.73 TiB   No errors
 SPARE      DEGRADED               No errors
   sdc      FAULTED     2.73 TiB   18 Errors
   sdd      ONLINE      2.73 TiB   No errors
Spare
   sdd      UNAVAIL     2.73 TiB   No errors
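For completeness, I believe the same layout can be inspected from a shell on the NAS with `zpool status` (a sketch; `tank` stands in for my real pool name):

```shell
# Shows each vdev and device state; a nested "spare-N" grouping under the
# mirror means a hot spare is currently standing in for a faulted disk.
zpool status tank
```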

I guess I have two questions:

  1. What should my immediate next step be?

I think I have another 3TB drive in a different machine, which is not in use at the moment. Would any solution be easier by using that drive (either by moving into the NAS, or using via network)?

  2. Longer term, is the mirror + spare setup a bad choice? Should I do something else instead?

The people I talked to suggested RAIDZ2, but that requires 4 drives. I guess I could use RAIDZ1, but I’m not sure I understand when you would use mirror+spare vs RAIDZ1.
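If I understand the layouts right, the usable-capacity arithmetic for my three 3 TB disks works out like this (nominal TB, just a back-of-envelope sketch):

```shell
disk_tb=3
n=3
mirror_spare_tb=$disk_tb              # a 2-way mirror holds one disk's worth; the spare adds none
raidz1_tb=$(( (n - 1) * disk_tb ))    # RAIDZ1 gives up one disk's worth to parity
echo "mirror+spare: ${mirror_spare_tb} TB usable"
echo "raidz1:       ${raidz1_tb} TB usable"
```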

sdd is the spare.

sdc has faulted and been replaced by sdd.

When you pull sdc, be aware that the sda/sdb/sdc/sdd names are subject to change between reboots, so make sure you pull the correct drive.
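One way to pin down the physical drive regardless of kernel naming is to go by serial number, using standard Linux tools (device names here are examples):

```shell
# Map current kernel names to serial numbers; serials are printed on the
# drive label and do not change between reboots.
lsblk -o NAME,SIZE,SERIAL

# Or query a single disk with smartmontools:
smartctl -i /dev/sdc
```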


The vdev should tell you which disk is bad and its serial number. Expand the vdev, click on the bad disk, and it should give you the information for that disk, including the serial number. I keep an Excel sheet that tells me which disk is connected to which port on my rig, based on serial numbers, because TrueNAS doesn’t care which port a disk sits on.

The “Spare” is unavailable because it is acting as your reserve mirror member until the failed disk can be replaced. Once the failed disk has been replaced, the pool will go back to healthy and the spare will go back to being a spare.
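In CLI terms, the usual replacement flow looks roughly like this (a sketch; the pool name `tank` and device paths are placeholders, and the TrueNAS UI’s Replace action does the same thing):

```shell
# After physically installing the replacement disk, tell ZFS to resilver
# onto it (old device first, new device second):
zpool replace tank sdc /dev/sdc

# Once the resilver completes, the spare drops back to the Spare list:
zpool status tank
```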

Ah. I think my confusion stems from how the spare returns to being a spare when the failing disk is replaced. I thought it would “step up” and enter into service as a normal drive fully replacing the failed drive. Instead it is more like those tiny spare wheels on cars, which you can use just to get to the workshop, where you get a proper wheel to replace the failed one, putting the spare back in the trunk.

That makes sense, thanks for clearing that up.

Longer term, would I benefit more from a RAIDZ1 setup than the current mirror+spare, or is the difference not that meaningful at this level?

SSDs tend to fail due to the write limits of their NAND. Identical SSDs will likely fail within a short period of each other when similar write stress is applied to them. A mirror applies identical write stress to each disk, while RAIDZ distributes the writes across the disks. RAIDZ offers more storage capacity and should extend SSD life. In either case, you’ll want a hot spare. If resilvering is successful, you could lose 2 drives and still be operational until the failed disk(s) are replaced.

I personally make it a rule to keep my SSDs below 50% capacity to preserve their life. I have SSDs from Samsung and Mushkin that are well over 10 years old, actively in use and showing healthy. You can fill an SSD, just don’t keep it that way, since the SSD uses the free space to cycle writes. Because of the identical data writes in a mirror, I’m not a fan of it for SSDs, but what you have works because of the hot spare. Basically, it’s your choice, weighing the pros and cons of each setup.


It can. But in any case this must come from an explicit decision by the administrator—YOU!

Interesting, how would I do that? It’s not obvious from the UI to me at least …

I am not sure how to do it from the GUI.

But the concept is: if you remove the failed disk from the pool (i.e. do NOT replace it), then the hot spare takes over the failed disk’s position in the pool. This of course causes the Hot Spare list to disappear, because you no longer have one. (If you had 2 hot spares, the list would still exist, just not show the disk that is now a normal data vDev device.)
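From the command line, that promotion is a one-liner (a sketch; `tank` is a placeholder pool name):

```shell
# Detach the FAULTED disk (not the spare) from the spare-N grouping;
# ZFS then promotes the in-use hot spare to a permanent mirror member.
zpool detach tank sdc
```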

At some future date, you could add a new Hot Spare disk.

Note that Hot Spares are optional vDevs that can be added or removed at any time. (Well, as long as it / they are not in use.)

As for mirror pairs versus RAID-Z1, there are a lot of opinions on this subject. My opinion is that with 2 TB and larger disks, RAID-Z1 is too much of a risk for my data. So either RAID-Z2, or perhaps even 3-way mirrors.

