I was trying to expand my pool 'Ocean' from 2x12TB (mirrored) 'sda' and 'sdb' + 2x10TB (mirrored) 'sdc' and 'sdd' to 2x24TB (mirrored) 'sde' and 'sdf' + 2x10TB (mirrored) 'sdc' and 'sdd'. In other words, I was replacing the 12TB mirrored storage with 24TB mirrored storage, for a total of 34TB of mirrored storage.
I did a replace of sda with sde and sdb with sdf, one at a time, letting each resilver complete. After this finished, I tried to expand the pool 'Ocean' because it did not expand automatically. TrueNAS gave me an error and told me to reboot, so I did. However, after the reboot I noticed that drive 'sde', one of the new 24TB drives, was showing as 'Unavailable' in the pool's 'Manage Devices' screen, even though it responded to SMART tests just fine. I tried onlining it through the command line, but that didn't work. Finally, I tried 'Detach' on 'sde' in the GUI, reasoning that 'sdf' and my 'sdc'/'sdd' mirror were healthy, so I could expand onto the single drive first and then re-mirror onto 'sde'; worst case, I could simply run on the three-drive setup while I waited for a replacement if 'sde' turned out to be bad. However, when I hit 'Expand' for the pool in the three-drive configuration, TrueNAS threw an error and told me to reboot. When I rebooted, the pool had ZERO drives assigned to it. The disk list still showed all the drives, but 'sde' read as assigned to pool 'N/A' and the three 'healthy' drives read as assigned to pool 'Ocean (Exported)'.
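For reference, the rough CLI equivalent of the replace steps would be something like this (TrueNAS actually addresses disks by partition GUID rather than these short kernel names, so treat them as illustrative only):

zpool replace Ocean sda sde   # wait for the resilver to complete
zpool replace Ocean sdb sdf   # wait for the resilver to complete
zpool status Ocean            # confirm the new mirror is healthy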
So now I’m very concerned that if I do the wrong thing, I’m going to lose my data. Does anyone have any advice on how to recover to a healthy 4-disk mirrored setup?
It flashed up on the screen; I should have taken a screenshot of it. It said something about the drive still being in use and unable to expand, and recommended a reboot before making any further changes.
Is there any HBA or “SATA card” involved?
All my drives are connected directly to the motherboard.
What do you mean it responded to SMART tests? From the GUI you ran a short test for sde?
Correct. I ran a short test for drive ‘sde’.
What did you try? Did you actually specify the kernel identifier name sde or the PARTUUID?
I wasn’t really winging it, but trying to follow some examples from the old forums. In any case, I used the PARTUUID and it returned that the drive had been onlined but the pool would continue to be degraded.
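For the record, the attempt looked roughly like this (the PARTUUID below is a placeholder for the one belonging to 'sde'):

zpool online Ocean <partuuid-of-sde>

It reported the device as onlined but warned that the pool would remain degraded.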
As far as I know, TrueNAS does not allow you to do such actions on a degraded pool. (Or maybe it does now?)
Don't know what to tell you; there was a 'Detach' button for the Unavailable drive.
Output of zpool import:
root@truenas[~]# zpool import
   pool: Ocean
     id: 12947942587710781215
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        Ocean                                     UNAVAIL  insufficient replicas
          mirror-0                                ONLINE
            f00e5072-d12c-11ea-94de-40167e27b4a2  ONLINE
            f0136e31-d12c-11ea-94de-40167e27b4a2  ONLINE
          1b97be74-19e9-48f5-9ead-844b66857f35    UNAVAIL  invalid label
I was referring to the “Expand” part, later in the quote. Sorry for not being more specific.
Regardless, you shouldn’t “detach” a drive from a mirror vdev, unless you wish to convert it into a stripe.
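Schematically, that's this single command (shown for illustration only; the PARTUUID is a placeholder):

zpool detach Ocean <partuuid>   # removes the device from its mirror; a two-way mirror becomes a single-disk stripe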
This is unnerving. There are only three drives, one of which has an “invalid label”, and I’m assuming the other one was tossed to the void when you “detached” it.
This is out of my comfort level.
@HoneyBadger, maybe there’s a safe approach to this?
My guess is that mirror-1 was always running "degraded" with a single working drive, and then that drive was supposedly detached, leaving only the "invalid label" drive as a sole stripe. Basically, mirror-1 was effectively destroyed.
Thank you for your help. I hope there is a safe path back to my data.
Just as a note, I'm almost certain that I 'detached' the drive that was originally labeled Unavailable, not the 'healthy' one by mistake. I checked, and the serial number of the drive shown in the zpool import output matches the drive that was reading 'Healthy' before I detached the other one.
I've given up on trying to find a way to recover my data directly. I'm just going to wipe the disks, rebuild the mirror with the new disks, and restore as much data as possible from my old server to the new one. I'm losing a couple of years of data this way, since I stopped updating my old server several years ago, but nothing critical; all my critical stuff has offsite backups I can restore from. It's just a bummer and a frustration, is all.
Lessons learned:
A) Use checkpoints! (See the sketch after this list.)
B) Don't ever just detach a drive from a mirror.
C) Update your independent backup more often than once every few years.
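On point A, a minimal sketch of the checkpoint workflow that would have helped here, assuming the pool name from this thread:

zpool checkpoint Ocean        # take a checkpoint before any replace/expand surgery
# ...do the risky operations, verify the result...
zpool checkpoint -d Ocean     # discard the checkpoint once everything checks out

Had things gone sideways instead, the pool could have been rewound to the checkpoint at import time:

zpool export Ocean
zpool import --rewind-to-checkpoint Ocean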
OK, so the error, I'm betting, is the same old crap of
“Partition(s) 1, 4 on /dev/sdwhatever have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.”
To be clear, this is an old problem; the Expand button seems to be broken every other release of TrueNAS, and I would not rely on it. I've found the only reliable way is to offline the disk, resize the partition, and 'online -e' it; anything else turns into a gamble of missing devices. There used to be an unofficial guide for this from back in the day, in case anyone else searches this error before rebooting and disaster strikes.
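Roughly, the sequence I mean, done one disk at a time (the PARTUUID, device name, and partition number are placeholders; check yours with lsblk first):

zpool offline Ocean <partuuid>          # drop the disk from its mirror
parted -s /dev/sdX resizepart 2 100%    # grow the ZFS data partition to fill the disk (partition number varies)
partprobe /dev/sdX                      # make the kernel re-read the partition table
zpool online -e Ocean <partuuid>        # rejoin the mirror and expand into the new space

Let the resilver finish before touching the second disk.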
At this point, don't try to write new partitions under any circumstances. That said, can you post the partition layout of the unhappy disk? The output of 'fdisk -l /dev/sdwhatever' and 'lsblk -o NAME,SIZE,PARTUUID' would help.
sdg is correct. It matches the unavailable drive label I get from the zpool import command.
The other drive you're looking for is in the list: it's sdd. This should be the drive I 'detached', because IT was the one initially reading as Unavailable. (I think the error affected me twice: once when I tried to expand the full mirrored pool, and once after I detached the first unavailable disk and tried to expand the pool onto the remaining disk.)
If it helps, sdj and sdl are the drives the two 24TB drives were meant to replace and should have the same (mirrored) data.