Dear TrueNAS Fellowship,
I am in need of your assistance to help me bring my server back to life.
My TrueNAS server is currently running in a degraded state, and I get this error:
Critical
Pool {ServerName} state is DEGRADED: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
- Disk 7831626758070777019 is UNAVAIL
Here I have included a snapshot of my zpool status, showing that all of my drives are ONLINE and that the one ID showing UNAVAIL is the issue.
admin@{ServerName}[~]$ sudo zpool status
[sudo] password for admin:
pool: {ServerName}
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see:
scan: scrub repaired 0B in 06:51:25 with 0 errors on Sun May 4 06:51:32 2025
config:
NAME                                        STATE     READ WRITE CKSUM
{ServerName}                                DEGRADED     0     0     0
  raidz1-0                                  DEGRADED     0     0     0
    89281c87-59cd-4165-bb86-ce8a6664551a    ONLINE       0     0     0
    7831626758070777019                     UNAVAIL      0     0     0  was /dev/disk/by-partuuid/91c95d3b-6ba2-43fc-a56a-5d52aa418854
    57093934-5f88-4fcb-b1b7-d7892c179c65    ONLINE       0     0     0
    3c32f442-34ee-4aba-ae6d-b1b554bc08cf    ONLINE       0     0     0
    269dd20f-7756-44a0-9a76-42e029d1a91a    ONLINE       0     0     0
dedup
  7f7a12d1-a01a-4283-aac7-8a6282886a93      ONLINE       0     0     0
cache
  5ee090ff-e43b-4b64-b5f5-eda75b330cc5      ONLINE       0     0     0
errors: No known data errors
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:01:01 with 0 errors on Sat Jul 26 03:46:02 2025
config:
NAME        STATE     READ WRITE CKSUM
boot-pool   ONLINE       0     0     0
  sdc3      ONLINE       0     0     0
errors: No known data errors
I have read a few forum threads saying it could be an issue from when one of the drives resilvered, or possibly a bad SATA connection.
If I could work out which drive is bad I could replace it, as I have a spare drive for that purpose, but I have a feeling this issue is related to something else.
If there is anyone out there who has experienced this issue and has any advice on my next steps, I would hugely appreciate it.
thanks
You should be able to use the following commands to get the disk serial numbers. You can use those to rule out the working drives from your list.
Do you have a spare hard drive port to do an in-place replacement of the UNAVAIL drive? You would physically add the new disk without removing the UNAVAIL one, then select the UNAVAIL disk in the GUI, choose the Replace option, and point it at the new drive.
Otherwise, you just have to swap the bad drive for the spare and then do the replace procedure.
lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
sudo ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -vLtsc lsblk,serial,smartx,smart
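For what it's worth, the CLI shape of that replace is roughly the sketch below - on TrueNAS the GUI Replace flow is preferred since it also partitions the new disk, and the new-disk path here is a placeholder you would fill in yourself.
# Sketch only: replace the UNAVAIL member by its GUID.
# <new-disk-id> is hypothetical - substitute the actual /dev/disk/by-id entry.
sudo zpool replace {ServerName} 7831626758070777019 /dev/disk/by-id/<new-disk-id>
# Then watch the resilver progress:
sudo zpool status -v {ServerName}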
FYI, this configuration means that your deduplication drive is a single point of failure. Do you have a second identical or larger drive you could ATTACH to this device to provide redundancy?
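To illustrate the attach idea (a sketch; <second-disk-id> is a placeholder, and the GUI Extend option achieves the same result):
# Attaching a second disk turns the single-disk dedup vdev into a mirror:
sudo zpool attach {ServerName} 7f7a12d1-a01a-4283-aac7-8a6282886a93 /dev/disk/by-id/<second-disk-id>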
In addition to the two commands that @SmallBarky has come up with, please let us know why you think you need dedup, and how much memory your system has.
Once we have the output from these commands we will probably want you to run a couple of zdb commands to see what the labels say, and also likely some smartctl commands to check the drive diagnostic status.
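For example, the kind of commands we are likely to ask for look like this (the device names are placeholders - pick them from your lsblk output):
# Print the ZFS labels on a pool member partition:
sudo zdb -l /dev/sdb1
# Full SMART diagnostic report for the underlying drive:
sudo smartctl -a /dev/sdb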
In the meantime I would refrain from rebooting, because the pool is currently working while degraded, and after a reboot it may have difficulty being imported.
Hello SmallBarky,
Thank you for your reply. I am posting the results from the first command you sent me here; I will now run the second one.
admin@{server name}[~]$ lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
NAME     LABEL          MAJ:MIN  TRAN  ROTA  ZONED  VENDOR  MODEL  SERIAL  PARTUUID                              START     SIZE            PARTTYPENAME
sda                     8:0      sata     0  none   ATA     SAMSU  S206NX                                                  128035676160
└─sda1                  8:1               0  none                          5ee090ff-e43b-4b64-b5f5-eda75b330cc5  2048      128034275328    Solaris /usr & Apple ZFS
sdb                     8:16     sata     1  none   ATA     WDC W  5MG1GB                                                  14000519643136
└─sdb1   {server name}  8:17              1  none                          57093934-5f88-4fcb-b1b7-d7892c179c65  4096      14000517529088  Solaris /usr & Apple ZFS
sdc                     8:32     sata     0  none   ATA     Samsu  S39KNW                                                  256060514304
├─sdc1                  8:33              0  none                          2769aa24-912e-4096-9fd6-5521a1b3b2b6  4096      1048576         BIOS boot
├─sdc2   EFI            8:34              0  none                          a8591b9f-f687-4674-b288-39d8817c5d7f  6144      536870912       EFI System
├─sdc3   boot-pool      8:35              0  none                          201caab2-3aa6-458e-b423-73db31b84504  34609152  238340611584    Solaris /usr & Apple ZFS
└─sdc4                  8:36              0  none                          e81fb459-24e5-4998-a8a2-65b44a53da9e  1054720   17179869184     Linux swap
sdd                     8:48     sata     1  none   ATA     WDC W  6AGB36                                                  14000519643136
└─sdd1   {server name}  8:49              1  none                          3c32f442-34ee-4aba-ae6d-b1b554bc08cf  4096      14000517529088  Solaris /usr & Apple ZFS
sde                     8:64     sata     1  none   ATA     WDC W  5MG1E6                                                  14000519643136
└─sde1   {server name}  8:65              1  none                          269dd20f-7756-44a0-9a76-42e029d1a91a  4096      14000517529088  Solaris /usr & Apple ZFS
sdf                     8:80     sata     1  none   ATA     WDC W  5MG1EN                                                  14000519643136
└─sdf1   {server name}  8:81              1  none                          7f7a12d1-a01a-4283-aac7-8a6282886a93  2048      14000518577664  Solaris /usr & Apple ZFS
sdg                     8:96     sata     1  none   ATA     WDC W  6AGB7R                                                  14000519643136
└─sdg1   {server name}  8:97              1  none                          89281c87-59cd-4165-bb86-ce8a6664551a  4096      14000517529088  Solaris /usr & Apple ZFS
Here are the results from the second command.
pool: {ServerName}
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see:
scan: scrub repaired 0B in 06:51:25 with 0 errors on Sun May 4 06:51:32 2025
config:
NAME                     STATE     READ WRITE CKSUM  SLOW    size  vendor  model                       serial          hours_on  pwr_cyc  health  realloc  cmd_to  temp  off_ucor  ata_err  rep_ucor  pend_sec
{ServerName}             DEGRADED     0     0     0     -
  raidz1-0               DEGRADED     0     0     0     -
    sdg1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       6AGB7R7U        10193     27       PASSED  0        -       46    0         -        -         0  (trim unsupported)
    7831626758070777019  UNAVAIL      0     0     0     0  was /dev/disk/by-partuuid/91c95d3b-6ba2-43fc-a56a-5d52aa418854  -  -  -  -  -  -  -  -  -  -  -  -  -  -  (trim unsupported)
    sdb1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1GBVK        8635      24       PASSED  0        -       43    0         -        -         0  (trim unsupported)
    sdd1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       6AGB365U        10314     27       PASSED  0        -       46    0         -        -         0  (trim unsupported)
    sde1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1E6YK        8611      23       PASSED  0        -       47    0         -        -         0  (trim unsupported)
dedup
  sdf1                   ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1EN9K        8168      22       PASSED  0        -       43    0         36       -         0  (trim unsupported)
cache
  sda1                   ONLINE       0     0     0     0  119.2G  ATA     SAMSUNG MZNLN128HCGR-000H1  S206NXAGA06786  11582     3606     PASSED  0        16      36    0         -        -         -  (untrimmed)
errors: No known data errors
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:01:01 with 0 errors on Sat Jul 26 03:46:02 2025
config:
NAME        STATE     READ WRITE CKSUM  SLOW    size    vendor  model                      serial           hours_on  pwr_cyc  health  realloc  cmd_to  temp  off_ucor  ata_err  rep_ucor  pend_sec
boot-pool   ONLINE       0     0     0     -
  sdc3      ONLINE       0     0     0     0  238.5G    ATA     Samsung SSD 850 PRO 256GB  S39KNWAJ303212A  25536     3420     PASSED  0        -       29    -         -        -         -  (untrimmed)
errors: No known data errors
To add, I do not have a spare port unfortunately; as it stands I can only take a drive out and replace it.
Hey Protopia, to be honest I had no idea what the dedup drive was; I don't remember ever putting it into this setup. Should I replace that drive?
I also wanted to add: the system has 64 GB of RAM and a 128 GB cache drive.
I have rebooted the system a couple of times since I first saw the error, and the system has actually been in a degraded state for some months now. I will avoid rebooting from this point on, as you have instructed.
Don't touch the dedup drive. It is critical to the pool once it is there. Are the dedup and L2ARC cache drives the same size? The L2ARC can be removed from a pool without damage; removing the dedup drive would destroy the pool.
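If you ever do want the cache disk back for another purpose, removing the L2ARC is safe and looks like this (a sketch using the cache device's PARTUUID from your zpool status output; the GUI can do the same):
# Cache (L2ARC) devices can be removed at any time without data loss:
sudo zpool remove {ServerName} 5ee090ff-e43b-4b64-b5f5-eda75b330cc5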
You need to copy and paste any CLI commands and results back to the forum using Preformatted Text (the </> button on the toolbar, or Ctrl+e). I think you are just copy/pasting everything into the reply box and letting it autoformat, which makes it hard to read. Take a look at the results for the second command and the box of text describing the pool setup: as you scroll the window to the right, a huge blank space is inserted before the results appear on the far right of the box.
AFAIK you cannot remove a dedup vdev - but as I understand it, if you haven't got a use case for dedup then it is likely to have a significant detrimental effect on the performance of your NAS.
So as a long-term strategy you should aim to move your data off, destroy and recreate your pool, and move your data back again. In the meantime, if that dedup drive fails you will lose the entire pool's worth of data.
You would also be advised to create the new pool as RAIDZ2: with 5x 14TB drives, RAIDZ1 carries a real risk that a resilver (which you now need to do) pushes a second drive into failing, which would lose all your data.
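For the record, the shape of that pool is sketched below, though on TrueNAS you would actually create it from the GUI, which handles partitioning for you; the pool name and disk ids here are hypothetical placeholders.
# Sketch only - a 6-wide RAIDZ2 pool built from whole disks by id:
sudo zpool create {NewPool} raidz2 /dev/disk/by-id/<disk1> /dev/disk/by-id/<disk2> /dev/disk/by-id/<disk3> /dev/disk/by-id/<disk4> /dev/disk/by-id/<disk5> /dev/disk/by-id/<disk6>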
I see that your current drives have between 8,600 and 10,200 power-on hours - so they are only between just under 1 year and about 1.25 years old, and should still be under warranty.
It looks like the failing drive can no longer be seen by Linux, as the devices listed by lsblk are all accounted for:
- sda - cache
- sdb - data
- sdc - boot
- sdd - data
- sde - data
- sdf - dedup
- sdg - data
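To pair each of those device names with a physical drive, a quick loop prints every serial number (a sketch, assuming the seven disks still enumerate as sda through sdg):
# Print the serial number reported by SMART for each disk:
for d in /dev/sd[a-g]; do echo "== $d"; sudo smartctl -i "$d" | grep -i serial; done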
I have three theories about what could have happened to this drive:
- It has failed completely - you will need a physical replacement and will need to resilver - and you can likely return the broken drive under warranty;
- It has lost connectivity with the motherboard (e.g. a SATA or power cable came loose) - but you can bring it back to life by reconnecting it and then resilvering (see the sketch after this list);
- The dedup drive was until recently the missing data drive and has somehow been reassigned as a dedup drive. You cannot remove this, and you don't have any spare slots to resilver into, so backup, recreate and restore is your only solution.
What you do next will depend on which of these is the case.
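For the second theory: ZFS will often pick the disk up again on its own after you reseat the cables; if it doesn't, something like this (a sketch) tells the pool to retry the device before resilvering:
# Ask the pool to bring the missing member (identified by its GUID) back online:
sudo zpool online {ServerName} 7831626758070777019
# The disk should start resilvering if it came back; check with:
sudo zpool status -v {ServerName}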
How many 14TB drives do you physically have in your system? And do you have a record of the serial numbers for them (without having to physically withdraw the drives to look)?
Hello Protopia, and Small Barky,
Apologies for not understanding the correct formatting for the results; I'm a little new to the forum and will improve on that moving forward.
I think I will move the data off and rebuild the NAS. It is still relatively new, and I can hopefully make all the data safe within a few days. I will then rebuild the NAS on RAIDZ2 for the future. I might also order some new connectors, just to be safe.
Thank you so much for your answers, I really appreciate it.
Copying your data elsewhere is a good safety action anyway; however, I would advise that you work out what happened and why before you destroy the existing pool and recreate it.
How many 14TB drives do you physically have in your system? And did you take any actions before the pool got into this state that might explain it?