Through a mixture of neglect and ignorance I have created a pool with two vdevs, which started out as two mirror vdevs. I added a hot spare, which has subsequently been put into use, and I have removed the related failed drive. This is the status of the pool as it stands now:
root@kore[~]# zpool status heracles
  pool: heracles
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 64.6G in 00:08:33 with 0 errors on Thu May  2 08:32:47 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        heracles                                  ONLINE       0     0     0
          spare-0                                 ONLINE       0     0     0
            bd2928b9-abb0-46b8-95cb-f6a2b775017e  ONLINE       0     0     0
            32f92237-eb6f-403f-9135-a1d4b54ab9c6  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            a2b236c0-20c3-4c16-a5cc-f0ecdf951440  ONLINE       0     0     0
            55c81f95-608c-46d7-b2f7-1e9d9865329b  ONLINE       0     0     0
        spares
          32f92237-eb6f-403f-9135-a1d4b54ab9c6    INUSE     currently in use

errors: No known data errors
I am awaiting delivery of new disks, as the "INUSE" spare has had a few errors in the past. I can only add a single disk at a time due to SATA port occupancy.
Is there ANY way I can 'convert' the vdev 'spare-0' back into 'mirror-0'? Is replacing the "INUSE" spare with a new drive going to get me back to where I need to be?
What you have here is a stripe of a mirror and a single-disk vdev. Are you sure you removed the right drive?
The single disk had a failure and is being replaced by the spare. From there, ZFS expects decisions by the administrator.
You can extend the single drive with a new drive to make the vdev a mirror and remove the spare.
Storage > Pool > (gear) > Status > (disk) > (…) > Extend
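For reference, a rough CLI equivalent of that GUI "Extend" action is `zpool attach` followed by a detach of the spare. This is a sketch only: the new-disk device name below is a placeholder, and on TrueNAS it is safer to stay in the GUI so it remains in step with the pool.

```shell
# Sketch only -- verify before running anything; /dev/sdX is a placeholder
# for the new disk. GUIDs are from the zpool status output above.
# 1. Attach a new drive to the surviving data disk, making that vdev a mirror:
zpool attach heracles bd2928b9-abb0-46b8-95cb-f6a2b775017e /dev/sdX
# 2. Once the resilver completes, detach the in-use spare so it returns
#    to the "spares" list:
zpool detach heracles 32f92237-eb6f-403f-9135-a1d4b54ab9c6
```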
DISCLAIMER: I am NOT a ZFS expert. Get corroborating answers before doing any ZFS actions I suggest.
I actually see only 4 drives here, because the spare is listed twice. However, I would have expected the 5th drive to be shown as UNAVAIL, something like:
root@kore[~]# zpool status heracles
  pool: heracles
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 64.6G in 00:08:33 with 0 errors on Thu May  2 08:32:47 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        heracles                                  ONLINE       0     0     0
          spare-0                                 ONLINE       0     0     0
            bd2928b9-abb0-46b8-95cb-f6a2b775017e  ONLINE       0     0     0
            xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  UNAVAIL      0     0     0  cannot open
            32f92237-eb6f-403f-9135-a1d4b54ab9c6  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            a2b236c0-20c3-4c16-a5cc-f0ecdf951440  ONLINE       0     0     0
            55c81f95-608c-46d7-b2f7-1e9d9865329b  ONLINE       0     0     0
        spares
          32f92237-eb6f-403f-9135-a1d4b54ab9c6    INUSE     currently in use

errors: No known data errors
What does the TrueNAS Storage GUI show under "Manage Disks" and "Manage Devices"? Can you post screenshots, please?
That said, the pool looks good, and hopefully you will not have any difficulty getting it back into the correct shape. I think we should see your screenshots before you do anything, but in principle here is what I would do:
I think the first thing to do is to see what happened to the failed drive. Can TN still see the 5th drive and if so what is its status? Can you run a short / long SMART test on it? What errors does it report?
Next you need to restore this pool to its normal state without a spare. According to the Solaris ZFS docs, in the CLI you would do this with zpool detach heracles xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, and this would also switch spare-0 back to mirror-0. However, I am not sure what you should do here, since the failed drive is not shown in the status output. In any case, you should try to do this via the GUI in order to keep the GUI in step with the pool definitions (and avoid an export/import).
Then replace the failed drive with a new one and, once it is visible in TrueNAS, run some tests on it, e.g. a SMART long test; if that is OK, add it to the pool as a spare, again using the GUI. (Or do it the other way around, so that you have the spare available as quickly as possible.)
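As a sketch of the CLI version of that test-then-add sequence (the device name is a placeholder, and the GUI route above is still preferable on TrueNAS):

```shell
# Sketch only -- /dev/sdX is a placeholder for the new disk.
smartctl -t long /dev/sdX     # start a long SMART self-test
smartctl -a /dev/sdX          # review attributes and test results once it finishes
# If the disk looks healthy, add it to the pool as a hot spare:
zpool add heracles spare /dev/sdX
```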
Yes, I have physically removed the UNAVAIL drive. I can't give any update until the replacement drive(s) arrive early this week, and today I'm hosting a 1st birthday party, so I'm not permitted to abscond to the study…
@etorix your picture is what I recall from before my ignorant REMOVE action.
I’ll give more detail when I have fresh drives to add…
Do you mean that you have physically removed the drive?
What does zpool status show now? What does the GUI show now?
I recommend that you do NOT make ANY further changes without getting advice here - it is pretty easy to make a mistake and lose your data if you just do things without being absolutely 100% certain that they are the right things to do.
But it is your NAS and it is your data, so you do also have the absolute freedom to do as you wish - but you will have to live with the consequences good or bad.
I have five drive bays (and SATA ports) available in total. My biggest mistake (so far?) has been to “REMOVE” from the pool the totally failed drive in state UNAVAIL. I believe now that I could have physically replaced it with a new drive, then replaced it in the pool with the new drive.
I do not understand the meaning of the ‘spare-0’ element. Clearly the drive I had designated as a spare has been combined with ‘bd2928b9…’ in some way - is it replacing it or mirrored with it? My current guess is: it’s a mirror, where the hot spare has replaced the now-removed drive.
My current plan of last resort is to reanimate my previous TrueNAS server and migrate all data onto it, then remove the pool completely and recreate it with four good drives (yes, I'm sticking with a stripe of two mirrors).
@etorix your original suggestion still sounds feasible (extend the vdev to turn it back in to a mirror) so I’ll see if that works first (and seek further advice on dealing with the spare if needed). Still waiting for a courier to arrive…
@Stux I guess my limited experience with other SAN systems led me to the expectation that the spare would automatically become the replacement in the case of a failure. I believe I’ve cut myself off from the option to use it in that way this time, though. I will certainly be considering burning in, thanks for the advice.
@Protopia I do have freedom. It’s not always the best thing, as I’m proving. Reanimation of my old TrueNAS, replication, recreate pool, replication is my next step as the alternatives seem to be unworkable…
…largely because I did a stupid thing.
ZFS will use a hot spare to automatically replace a drive (resilver onto the hot spare) if it is configured properly and is at least as large as the drive it is replacing. When I was using very old drives, and mostly playing around, this happened about every month for me, lol. It does not, however, do anything with the old drive; that part is manual. That's what a hot spare is: a cold spare is all manual.
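For illustration, this is roughly how such a pool would be set up from the CLI. The pool name and device names here are hypothetical, not from this thread, and the automatic spare activation depends on the ZFS event daemon (zed) running:

```shell
# Hypothetical example: a two-way mirror with a hot spare attached.
zpool create tank mirror /dev/sda /dev/sdb spare /dev/sdc
# If sda or sdb fails, zed (when running and configured) resilvers onto
# the spare automatically; detaching the old drive remains a manual step.
```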
The spare is mirroring the failing drive but not replacing it. ZFS does not make decisions about replacing drives: decisions are to be made by the administrator.
Possible choices include:
Trusting the failing drive back into the array, and returning the spare to spare.
Replacing the failing drive, and returning the spare to spare.
Making the spare a permanent data drive, and removing the other drive.
Making the spare a permanent data drive, and turning the other drive into a spare.
Making the spare a permanent data drive, and bringing in another drive as spare.
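As I understand the spare mechanics (a sketch only; please verify before running anything), which drive you detach from the spare-0 group determines the outcome:

```shell
# GUIDs taken from the zpool status output earlier in the thread.
# Return the spare to "spares", keeping the original data drive:
zpool detach heracles 32f92237-eb6f-403f-9135-a1d4b54ab9c6
# Or promote the spare to a permanent data drive by detaching the original:
zpool detach heracles bd2928b9-abb0-46b8-95cb-f6a2b775017e
```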
Of these, the only feasible options seem to me to be those that involve “Making the spare a permanent data drive”. In the GUI I cannot see any obvious way to do that. To clarify, this is what I currently see:
I have identical-model drive(s) now available, one of which is currently 'warming' in the vacated slot of the failed drive. Having got myself into this corner, I'm not willing to risk hitting the 'Replace' button on 'sdd'.
“Remove”, “Detach”, or “Replace” on sdd or sdc, as appropriate, should make any option possible.
Ideally, new drives should be burnt-in for some time before being put into production.
And, given the particular situation, the most useful button is "Extend", to make the vdev a 2-way mirror again. However, I'm not sure what the best, or even the correct, order of steps is here: extend and then remove, or remove and then extend.