ZFS disk replacement fails (despite identical replacement disk)

Hi all,

Somewhat of a long-time lurker here - happy with TrueNAS in general, and I’ve never really had any issues with the system until now.

I had been running TrueNAS CORE v13.0-U6.1, did the upgrade/migration to SCALE v24.04.1 (using the update file), and later updated to SCALE v24.04.1.1 via the UI. The migration and updates went without any issues, and the system has been running smoothly so far.

However, a disk in one of my pools has now failed and I’m struggling with the replacement on SCALE. I had replaced disks on CORE without issues before, though.

When attempting to replace the failed disk, I get the following error (it doesn’t matter whether I use the ‘force’ flag or not when the disk is empty):

Error: [EZFS_BADDEV] cannot replace 8525966784619120927 with /dev/disk/by-partuuid/127f08dc-c35c-4560-b8c0-aa5f13e5d33b: device is too small

I’ve been browsing the archived forums and stumbled upon the swap partition issue - however, setting the swap size for the pool to ‘0’ and attempting the replacement again still yields the same error message (I set the swap back to 2 GB afterwards - I don’t want to fiddle with the defaults unless I have to).

So I started digging through disk sizes and partitions and here is where I’m struggling to make sense of things.

This is the affected pool (it was created on an earlier version of CORE, and all previous disk replacements were done through the UI):

pool: Rusty
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 06:51:10 with 0 errors on Thu May 23 07:50:30 2024
config:

        NAME                     STATE     READ WRITE CKSUM
        Rusty                    DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            sdn2                 ONLINE       0     0     0
            sdm2                 ONLINE       0     0     0
            sda2                 ONLINE       0     0     0
            sdc2                 ONLINE       0     0     0
            sdf2                 ONLINE       0     0     0
            sdg2                 ONLINE       0     0     0
            8525966784619120927  UNAVAIL      0     0     0  was /dev/sdj2
            sde2                 ONLINE       0     0     0
            sdb2                 ONLINE       0     0     0
            sdd2                 ONLINE       0     0     0

These are the corresponding disks and partitions (minus the broken one):

Disk /dev/sdn: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68N
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D2D4D3AC-31C0-11EC-9AA7-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdn1      128    4194431    4194304    2G FreeBSD swap
/dev/sdn2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sdm: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WL4000GSA6454G  
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 42DB5D16-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdm1      128    4194431    4194304    2G FreeBSD swap
/dev/sdm2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS

Disk /dev/sdc: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: ST4000NM0053    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 9CB7AEA1-17A6-11EF-ADE0-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdc1      128    4194431    4194304    2G FreeBSD swap
/dev/sdc2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sdf: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43F6024C-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdf1      128    4194431    4194304    2G FreeBSD swap
/dev/sdf2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sdd: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 449365B8-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdd1      128    4194431    4194304    2G FreeBSD swap
/dev/sdd2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS

Disk /dev/sdb: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 44455FDA-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdb1      128    4194431    4194304    2G FreeBSD swap
/dev/sdb2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sde: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 440DAE54-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sde1      128    4194431    4194304    2G FreeBSD swap
/dev/sde2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sda: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 43DD9ACA-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sda1      128    4194431    4194304    2G FreeBSD swap
/dev/sda2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS


Disk /dev/sdg: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: WDC WD40EFRX-68W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 44896AC8-3FEA-11EB-9C96-1831BFC83E68

Device       Start        End    Sectors  Size Type
/dev/sdg1      128    4194431    4194304    2G FreeBSD swap
/dev/sdg2  4194432 7814037127 7809842696  3.6T FreeBSD ZFS

And this is the disk I intend to use as replacement (when empty):

Disk /dev/sdk: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: MB4000GFEMK     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

From what I can tell, size should not be an issue here, since the byte and sector counts match the disks already present in the pool?
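
(For reference, the raw byte counts can also be compared directly - e.g. with blockdev - and the fdisk output above already shows both disks at 4000787030016 bytes:)

# print the size in bytes of an existing pool member and the replacement disk
blockdev --getsize64 /dev/sda /dev/sdk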

Now, the strange thing is - after the replace fails in the UI, this is what I end up with on the replacement disk:

Disk /dev/sdk: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: MB4000GFEMK     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: C11AFC81-0CBC-4AB4-90AB-30D6DB4241ED

Device     Start       End   Sectors   Size Type
/dev/sdk1   4096 435294208 435290113 207.6G Solaris /usr & Apple ZFS

Incidentally, the partition size matches the one from my boot-pool disks (however, the swap partition is missing and the partition type is different, so it may just be a coincidence):

Disk /dev/sdj: 223.57 GiB, 240057409536 bytes, 468862128 sectors
Disk model: ADATA SU630     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 388D3A21-0E7A-11EC-945F-1831BFC83E68

Device        Start       End   Sectors   Size Type
/dev/sdj1        40      1063      1024   512K BIOS boot
/dev/sdj2  33555496 468845607 435290112 207.6G FreeBSD ZFS
/dev/sdj3      1064  33555495  33554432    16G FreeBSD swap

Partition table entries are not in disk order.


Disk /dev/sdi: 223.57 GiB, 240057409536 bytes, 468862128 sectors
Disk model: ADATA SU630     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 38A3D7BB-0E7A-11EC-945F-1831BFC83E68

Device        Start       End   Sectors   Size Type
/dev/sdi1        40      1063      1024   512K BIOS boot
/dev/sdi2  33555496 468845607 435290112 207.6G FreeBSD ZFS
/dev/sdi3      1064  33555495  33554432    16G FreeBSD swap

To be sure this is not an issue with the disk hardware type, I’ve also tried using a disk identical to /dev/sdc (ST4000NM0053) - with the same result.

I’m at a loss here; my best guess is that TrueNAS (for some reason) is looking at the boot pool when partitioning the replacement disk and ends up with a partition that is too small for my data pool.

I haven’t tried manual partitioning and replacement through the shell yet. I also haven’t tried the replace command in the TrueNAS CLI, since I was unable to find documentation for the exact parameters required.

Would some kind soul be able to point me in the right direction? Happy to provide further details of the system where required.

Many thanks!

Certainly looks like it’s setting up the disk wrong.

AFAIK the right way is how your other disks are set up.

Then you want to replace the unavailable disk with the partid of the 2nd partition on the new disk.

AFAIK that’s what the GUI should do.

You can use lsblk to get the partids.
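
Something along these lines should list them (the exact columns available may vary with your lsblk version):

# show every disk/partition with its size, type, and partition UUID
lsblk -o NAME,SIZE,TYPE,PARTUUID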

Hi Stux,

thanks for taking a look.

So I should be OK if I create a partition layout identical to the existing disks and then do a “zpool replace” referencing the second partition?

Just asking because, while browsing through the archived forums, I got the impression that bypassing the UI is somewhat frowned upon and may lead to other issues down the road.

Do I have to do anything with the swap partition?

Other than that, I might try exporting/importing the pool first, and maybe a clean reinstall with a config import, to make sure this isn’t some weird relic of the upgrade process.

Much obliged!

Yes - but by getting the partitioning right, you’re doing what the GUI would, if it worked right.

That’s the key to going behind the GUI’s back.
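
Roughly, the manual route looks something like this - just a sketch, with /dev/sdk as the new disk and /dev/sda as a healthy member to copy the layout from; double-check the device names before running anything destructive:

# copy the partition table from a healthy pool member onto the new disk,
# then give the new disk its own unique disk/partition GUIDs
sgdisk --replicate=/dev/sdk /dev/sda
sgdisk --randomize-guids /dev/sdk
# re-read the partition table so the new PARTUUIDs show up
partprobe /dev/sdk

# note the PARTUUID of the new second (ZFS) partition
lsblk -o NAME,PARTUUID /dev/sdk

# replace the missing vdev (by its ZFS GUID) with the new partition
zpool replace Rusty 8525966784619120927 /dev/disk/by-partuuid/<new-partuuid>

Replicating the layout brings the swap partition along too, so AFAIK there’s nothing extra to do there.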

Quick question regarding the UUID then:

Normally, in “zpool status” I can only see the partitions:

root@venture[~]# zpool status Rusty
  pool: Rusty
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 06:51:10 with 0 errors on Thu May 23 07:50:30 2024
config:

        NAME                     STATE     READ WRITE CKSUM
        Rusty                    DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            sdn2                 ONLINE       0     0     0
            sdm2                 ONLINE       0     0     0
            sda2                 ONLINE       0     0     0
            sdc2                 ONLINE       0     0     0
            sdf2                 ONLINE       0     0     0
            sdg2                 ONLINE       0     0     0
            8525966784619120927  UNAVAIL      0     0     0  was /dev/sdj2
            sde2                 ONLINE       0     0     0
            sdb2                 ONLINE       0     0     0
            sdd2                 ONLINE       0     0     0

errors: No known data errors

So should I go with “zpool status -g”?

root@venture[~]# zpool status Rusty -g
  pool: Rusty
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 06:51:10 with 0 errors on Thu May 23 07:50:30 2024
config:

        NAME                      STATE     READ WRITE CKSUM
        Rusty                     DEGRADED     0     0     0
          9863754867359506781     DEGRADED     0     0     0
            3609866525612761193   ONLINE       0     0     0
            4933973449480118574   ONLINE       0     0     0
            5620212995317312806   ONLINE       0     0     0
            12188169839306550085  ONLINE       0     0     0
            16225804577128544654  ONLINE       0     0     0
            16691950763733439981  ONLINE       0     0     0
            8525966784619120927   UNAVAIL      0     0     0  was /dev/sdj2
            13215263822972852517  ONLINE       0     0     0
            1625842107130927213   ONLINE       0     0     0
            15847930523293626441  ONLINE       0     0     0

errors: No known data errors

However, these UUIDs don’t seem to match the ones from “lsblk -f”:

root@venture[~]# lsblk -f
NAME     FSTYPE     FSVER LABEL     UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                     
├─sda1                                                                                  
└─sda2   zfs_member 5000  Rusty     13810091753465678436                                
sdb                                                                                     
├─sdb1                                                                                  
└─sdb2   zfs_member 5000  Rusty     13810091753465678436                                
sdc                                                                                     
├─sdc1                                                                                  
└─sdc2   zfs_member 5000  Rusty     13810091753465678436                                
sdd                                                                                     
├─sdd1                                                                                  
└─sdd2   zfs_member 5000  Rusty     13810091753465678436                                
sde                                                                                     
├─sde1                                                                                  
└─sde2   zfs_member 5000  Rusty     13810091753465678436

What seems to match is “UUID_SUB” from “blkid”:

root@venture[~]# blkid
/dev/sdf2: LABEL="Rusty" UUID="13810091753465678436" UUID_SUB="16225804577128544654" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="459927b6-3fea-11eb-9c96-1831bfc83e68"
/dev/sdd2: LABEL="Rusty" UUID="13810091753465678436" UUID_SUB="15847930523293626441" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="4614a252-3fea-11eb-9c96-1831bfc83e68"
/dev/sdm2: LABEL="Rusty" UUID="13810091753465678436" UUID_SUB="4933973449480118574" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="436dbb2f-3fea-11eb-9c96-1831bfc83e68"
/dev/sdb2: LABEL="Rusty" UUID="13810091753465678436" UUID_SUB="1625842107130927213" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="45902ddf-3fea-11eb-9c96-1831bfc83e68"

Would that be the ID type I’m looking for? Is there any way to see what was originally used as the reference when the pool was created?
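
(For reference, I assume these would at least show the paths and partition UUIDs currently in play, even if they don’t tell me what was originally used:)

# show full device paths for each vdev instead of the short names
zpool status -P Rusty
# map partition UUIDs back to sdX device names
ls -l /dev/disk/by-partuuid/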

Apologies for the amount of questions - I’m still learning my way around this.

See:
https://wiki.familybrown.org/manual-replacement

Cheers, dan, that did the trick - great writeup!

Pool is now back to full health.

One question remains: after replacing the failed disk, it is the only disk being referenced by its UUID:

pool: Rusty
 state: ONLINE
  scan: resilvered 18.8M in 00:00:04 with 0 errors on Mon Jun 10 20:42:05 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        Rusty                                     ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            sdm2                                  ONLINE       0     0     0
            sdn2                                  ONLINE       0     0     0
            sda2                                  ONLINE       0     0     0
            sdf2                                  ONLINE       0     0     0
            sde2                                  ONLINE       0     0     0
            sdg2                                  ONLINE       0     0     0
            3dc31f7d-3adf-48cb-9ed6-80f7e4babeb2  ONLINE       0     0     0
            sdd2                                  ONLINE       0     0     0
            sdc2                                  ONLINE       0     0     0
            sdb2                                  ONLINE       0     0     0

Am I right to assume that TrueNAS is referencing all the other disks using the “normal” Linux disk identifiers? The pool was automatically imported during the migration from CORE to SCALE.

If so, why? From what I’ve read so far, using the UUIDs is highly desirable, since these won’t change between reboots or when adding new disks.

Should I be doing something to remediate this?

Thanks for the great support, guys!

This appears to be an artifact of the CORE-to-SCALE migration.

It happened to my pool too. I’m not sure if it’s actually referencing by disk name/partition or if that’s just how it’s displayed.

In CORE it was the glabel for the partition.

When you replace a disk, it becomes the partition UUID.

I don’t know if this is an issue or not. But be very certain you have the right disk identified before replacing a failed disk!
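
Something like this helps to cross-check model and serial numbers before pulling or wiping anything (smartctl -i shows the serial too if lsblk doesn’t):

# confirm which physical disk is which before touching it
lsblk -o NAME,MODEL,SERIAL,SIZE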

Meanwhile, I’ve been replacing disks one at a time.

If you have a spare, you just roll it through: replace one disk, then use the disk that came out to replace the next, and so on.

If you have enough redundancy you can do that too.
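
One round of that looks roughly like this - the device name and placeholder UUID are only examples, and each resilver has to finish before moving on to the next disk:

# swap an online member (sdm2 here) for the freshly partitioned spare
zpool replace Rusty sdm2 /dev/disk/by-partuuid/<spare-partuuid>
# keep an eye on the resilver and only continue once it completes
zpool status Rusty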