Dear TrueNAS Fellowship,
I am in need of your assistance to help me bring my server back to life.
My TrueNAS server is currently running in a degraded state, and I get this error:
Critical
Pool {ServerName} state is DEGRADED: One or more devices could not be used because the label is missing or invalid. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
- Disk 7831626758070777019 is UNAVAIL
Here I have included a snapshot of my zpool status, showing that all of my drives are ONLINE and that the one ID showing UNAVAIL is the issue.
admin@{ServerName}[~]$ sudo zpool status
[sudo] password for admin:
pool: {ServerName}
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see:
scan: scrub repaired 0B in 06:51:25 with 0 errors on Sun May 4 06:51:32 2025
config:
NAME                                        STATE     READ WRITE CKSUM
{ServerName}                                DEGRADED     0     0     0
  raidz1-0                                  DEGRADED     0     0     0
    89281c87-59cd-4165-bb86-ce8a6664551a    ONLINE       0     0     0
    7831626758070777019                     UNAVAIL      0     0     0  was /dev/disk/by-partuuid/91c95d3b-6ba2-43fc-a56a-5d52aa418854
    57093934-5f88-4fcb-b1b7-d7892c179c65    ONLINE       0     0     0
    3c32f442-34ee-4aba-ae6d-b1b554bc08cf    ONLINE       0     0     0
    269dd20f-7756-44a0-9a76-42e029d1a91a    ONLINE       0     0     0
dedup
  7f7a12d1-a01a-4283-aac7-8a6282886a93      ONLINE       0     0     0
cache
  5ee090ff-e43b-4b64-b5f5-eda75b330cc5      ONLINE       0     0     0
errors: No known data errors
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:01:01 with 0 errors on Sat Jul 26 03:46:02 2025
config:
NAME        STATE     READ WRITE CKSUM
boot-pool   ONLINE       0     0     0
  sdc3      ONLINE       0     0     0
errors: No known data errors
I have read a few forum threads saying it could be an issue from when one of the drives resilvered, or possibly a bad SATA connection.
If I could work out which drive is bad I could replace it, as I have a spare drive for that purpose, but I have a feeling this issue is related to something else.
If there is anyone out there who has experienced this issue and has any advice on my next steps, I would hugely appreciate it.
thanks
You should be able to use the following commands to get the disk serial numbers. You can use those to rule out the working drives from your list.
Do you have a spare hard drive port to do an in-place replacement of the UNAVAIL drive? You would physically add the new disk without removing the UNAVAIL one, then select the UNAVAIL disk in the GUI, choose the Replace option, and point it at the new drive.
Otherwise, you just have to swap the bad drive for the spare and then do the replace procedure.
lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
sudo ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -vLtsc lsblk,serial,smartx,smart
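For what it's worth, the CLI shape of that replace is roughly the sketch below - on TrueNAS the GUI Replace flow is preferred since it also partitions the new disk, and the new-disk path here is a placeholder you would fill in yourself.
# Sketch only: replace the UNAVAIL member by its GUID.
# <new-disk-id> is hypothetical - substitute the actual /dev/disk/by-id entry.
sudo zpool replace {ServerName} 7831626758070777019 /dev/disk/by-id/<new-disk-id>
# Then watch the resilver progress:
sudo zpool status -v {ServerName}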
FYI, this configuration means that your deduplication drive is a single point of failure. Do you have a second identical or larger drive you could ATTACH to this device to provide redundancy?
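To illustrate the attach idea (a sketch; <second-disk-id> is a placeholder, and the GUI Extend option achieves the same result):
# Attaching a second disk turns the single-disk dedup vdev into a mirror:
sudo zpool attach {ServerName} 7f7a12d1-a01a-4283-aac7-8a6282886a93 /dev/disk/by-id/<second-disk-id>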
In addition to the two commands that @SmallBarky has come up with, please let us know why you think you need dedup, and how much memory your system has.
Once we have the output from these commands we will probably want you to run a couple of zdb commands to see what the labels say, and also likely some smartctl commands to check the drive diagnostic status.
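For example, the kind of commands we are likely to ask for look like this (the device names are placeholders - pick them from your lsblk output):
# Print the ZFS labels on a pool member partition:
sudo zdb -l /dev/sdb1
# Full SMART diagnostic report for the underlying drive:
sudo smartctl -a /dev/sdb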
In the meantime I would refrain from rebooting, because the pool is currently working while degraded, and after a reboot it may have difficulty being imported.
Hello SmallBarky,
Thank you for your reply. I am posting the results from the first command you sent me here; I will now run the second one.
admin@{server name}[~]$ lsblk -bo NAME,LABEL,MAJ:MIN,TRAN,ROTA,ZONED,VENDOR,MODEL,SERIAL,PARTUUID,START,SIZE,PARTTYPENAME
NAME     LABEL          MAJ:MIN  TRAN  ROTA  ZONED  VENDOR  MODEL  SERIAL  PARTUUID                              START     SIZE            PARTTYPENAME
sda                     8:0      sata     0  none   ATA     SAMSU  S206NX                                                  128035676160
└─sda1                  8:1               0  none                          5ee090ff-e43b-4b64-b5f5-eda75b330cc5  2048      128034275328    Solaris /usr & Apple ZFS
sdb                     8:16     sata     1  none   ATA     WDC W  5MG1GB                                                  14000519643136
└─sdb1   {server name}  8:17              1  none                          57093934-5f88-4fcb-b1b7-d7892c179c65  4096      14000517529088  Solaris /usr & Apple ZFS
sdc                     8:32     sata     0  none   ATA     Samsu  S39KNW                                                  256060514304
├─sdc1                  8:33              0  none                          2769aa24-912e-4096-9fd6-5521a1b3b2b6  4096      1048576         BIOS boot
├─sdc2   EFI            8:34              0  none                          a8591b9f-f687-4674-b288-39d8817c5d7f  6144      536870912       EFI System
├─sdc3   boot-pool      8:35              0  none                          201caab2-3aa6-458e-b423-73db31b84504  34609152  238340611584    Solaris /usr & Apple ZFS
└─sdc4                  8:36              0  none                          e81fb459-24e5-4998-a8a2-65b44a53da9e  1054720   17179869184     Linux swap
sdd                     8:48     sata     1  none   ATA     WDC W  6AGB36                                                  14000519643136
└─sdd1   {server name}  8:49              1  none                          3c32f442-34ee-4aba-ae6d-b1b554bc08cf  4096      14000517529088  Solaris /usr & Apple ZFS
sde                     8:64     sata     1  none   ATA     WDC W  5MG1E6                                                  14000519643136
└─sde1   {server name}  8:65              1  none                          269dd20f-7756-44a0-9a76-42e029d1a91a  4096      14000517529088  Solaris /usr & Apple ZFS
sdf                     8:80     sata     1  none   ATA     WDC W  5MG1EN                                                  14000519643136
└─sdf1   {server name}  8:81              1  none                          7f7a12d1-a01a-4283-aac7-8a6282886a93  2048      14000518577664  Solaris /usr & Apple ZFS
sdg                     8:96     sata     1  none   ATA     WDC W  6AGB7R                                                  14000519643136
└─sdg1   {server name}  8:97              1  none                          89281c87-59cd-4165-bb86-ce8a6664551a  4096      14000517529088  Solaris /usr & Apple ZFS
Here are the results from the second command.
pool: {ServerName}
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see:
scan: scrub repaired 0B in 06:51:25 with 0 errors on Sun May 4 06:51:32 2025
config:
NAME                     STATE     READ WRITE CKSUM  SLOW    size  vendor  model                       serial          hours_on  pwr_cyc  health  realloc  cmd_to  temp  off_ucor  ata_err  rep_ucor  pend_sec
{ServerName}             DEGRADED     0     0     0     -
  raidz1-0               DEGRADED     0     0     0     -
    sdg1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       6AGB7R7U        10193     27       PASSED  0        -       46    0         -        -         0  (trim unsupported)
    7831626758070777019  UNAVAIL      0     0     0     0  was /dev/disk/by-partuuid/91c95d3b-6ba2-43fc-a56a-5d52aa418854  -  -  -  -  -  -  -  -  -  -  -  -  -  -  (trim unsupported)
    sdb1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1GBVK        8635      24       PASSED  0        -       43    0         -        -         0  (trim unsupported)
    sdd1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       6AGB365U        10314     27       PASSED  0        -       46    0         -        -         0  (trim unsupported)
    sde1                 ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1E6YK        8611      23       PASSED  0        -       47    0         -        -         0  (trim unsupported)
dedup
  sdf1                   ONLINE       0     0     0     0  12.7T   ATA     WDC WD142KFGX-68AFPN0       5MG1EN9K        8168      22       PASSED  0        -       43    0         36       -         0  (trim unsupported)
cache
  sda1                   ONLINE       0     0     0     0  119.2G  ATA     SAMSUNG MZNLN128HCGR-000H1  S206NXAGA06786  11582     3606     PASSED  0        16      36    0         -        -         -  (untrimmed)
errors: No known data errors
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:01:01 with 0 errors on Sat Jul 26 03:46:02 2025
config:
NAME        STATE     READ WRITE CKSUM  SLOW    size    vendor  model                      serial           hours_on  pwr_cyc  health  realloc  cmd_to  temp  off_ucor  ata_err  rep_ucor  pend_sec
boot-pool   ONLINE       0     0     0     -
  sdc3      ONLINE       0     0     0     0  238.5G    ATA     Samsung SSD 850 PRO 256GB  S39KNWAJ303212A  25536     3420     PASSED  0        -       29    -         -        -         -  (untrimmed)
errors: No known data errors
To add, I do not have a spare port unfortunately; as it stands I can only take a drive out and replace it.
Hey Protopia, to be honest I had no idea what the dedup drive was; I don't remember ever putting it into this setup. Should I replace that drive?
I also wanted to add: the system has 64 GB of RAM and a 128 GB cache drive.
I have rebooted the system a couple of times since I first saw the error, and the system has actually been in a degraded state for some months now. I will avoid rebooting from this point on, as you have instructed.
Don't touch the dedup drive. It is critical to the pool once it is there. Are the dedup and L2ARC cache drives the same size? The L2ARC can be removed from a pool without damage; removing the dedup drive would destroy the pool.
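If you ever do want the cache disk back for another purpose, removing the L2ARC is safe and looks like this (a sketch using the cache device's PARTUUID from your zpool status output; the GUI can do the same):
# Cache (L2ARC) devices can be removed at any time without data loss:
sudo zpool remove {ServerName} 5ee090ff-e43b-4b64-b5f5-eda75b330cc5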
You need to copy and paste any CLI commands and results back to the forum using Preformatted Text (the </> button on the toolbar, or Ctrl+e). I think you are just copy/pasting everything into the reply box and letting it autoformat, which makes it hard to read. Take a look at the results for the second command and the box of text describing the pool setup: as you scroll the window to the right, a huge blank space is inserted before the results appear on the far right of the box.
AFAIK you cannot remove a dedup vdev - but as I understand it, if you haven't got a use case for dedup then it is likely to have a significant detrimental effect on the performance of your NAS.
So as a long-term strategy you should aim to move your data off, destroy and recreate your pool, and move your data back again. In the meantime, if that dedup drive fails you will lose the entire pool's worth of data.
You would also be advised to create the new pool as RAIDZ2: with 5x 14TB drives, RAIDZ1 carries a real risk that a resilver (which you now need to do) pushes a second drive into failing, which would lose all your data.
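For the record, the shape of that pool is sketched below, though on TrueNAS you would actually create it from the GUI, which handles partitioning for you; the pool name and disk ids here are hypothetical placeholders.
# Sketch only - a 6-wide RAIDZ2 pool built from whole disks by id:
sudo zpool create {NewPool} raidz2 /dev/disk/by-id/<disk1> /dev/disk/by-id/<disk2> /dev/disk/by-id/<disk3> /dev/disk/by-id/<disk4> /dev/disk/by-id/<disk5> /dev/disk/by-id/<disk6>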
I see that your current drives have between 8,600 and 10,200 power-on hours - so they are only between just under 1 year and about 1.25 years old, and should still be under warranty.
It looks like the failing drive can no longer be seen by Linux, as the devices listed by lsblk are all accounted for:
- sda - cache
- sdb - data
- sdc - boot
- sdd - data
- sde - data
- sdf - dedup
- sdg - data
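To pair each of those device names with a physical drive, a quick loop prints every serial number (a sketch, assuming the seven disks still enumerate as sda through sdg):
# Print the serial number reported by SMART for each disk:
for d in /dev/sd[a-g]; do echo "== $d"; sudo smartctl -i "$d" | grep -i serial; done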
I have three theories about what could have happened to this drive:
- It has failed completely - you will need a physical replacement and will need to resilver - and you can likely return the broken drive under warranty;
- It has lost connectivity with the motherboard (e.g. a SATA or power cable came loose) - but you can bring it back to life by reconnecting it and then resilvering (see the sketch after this list);
- The dedup drive was until recently the missing data drive and has somehow been reassigned as a dedup drive. You cannot remove this, and you don't have any spare slots to resilver into, so backup, recreate and restore is your only solution.
What you do next will depend on which of these is the case.
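For the second theory: ZFS will often pick the disk up again on its own after you reseat the cables; if it doesn't, something like this (a sketch) tells the pool to retry the device before resilvering:
# Ask the pool to bring the missing member (identified by its GUID) back online:
sudo zpool online {ServerName} 7831626758070777019
# The disk should start resilvering if it came back; check with:
sudo zpool status -v {ServerName}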
How many 14TB drives do you physically have in your system? And do you have a record of the serial numbers for them (without having to physically withdraw the drives to look)?
Hello Protopia, and Small Barky,
Apologies for not understanding the correct formatting for the results; I'm a little new to the forum and will improve on that moving forward.
I think I will move the data off and rebuild the NAS. It is still relatively new, and I can hopefully make all the data safe within a few days. I will then rebuild the NAS on RAIDZ2 for the future. I might also order some new connectors, just to be safe.
Thank you so much for your answers, I really appreciate it.
Copying your data elsewhere is a good safety action anyway; however, I would advise that you work out what happened and why before you destroy the existing pool and recreate it.
How many 14TB drives do you physically have in your system? And did you take any actions before the pool got into this state that might explain it?