Help getting a drive to finish being set up as the second disk of a RAID 1 mirror

Hi,

I am using HexOS, which runs on TrueNAS SCALE 25.10.1 (Goldeye).

I recently had an HDD fail and replaced it with a new one. After 3 weeks I started having issues with the new WD Red Plus drive. I tried another drive and it had the same problem instantly (clicking, checksum errors, etc.).
I eventually replaced the SATA cable, which fixed the issue, but I’m now unable to get the original replacement drive (the WD Red Plus) to become part of the RAID 1.
When I look at the VDEVs I can see the Mirror VDEV with the drive that has worked the whole time, and a Replacing VDEV containing an unknown drive entry and the WD Red Plus drive. When I first put it in it was resilvering, and I’m not sure whether that completed, but it’s kinda just stuck there, and I would absolutely prefer having some drive protection in case the other one also fails.

Could someone please help me figure out how to fix this? I really don’t wanna have to restart from scratch!

First off, we like to see the output of the following command, which clearly shows how the pool is laid out and its status (in CODE tags please):

zpool status

Next, we need to see your disks, (also in CODE tags):

lsblk -o NAME,MODEL,SIZE,LABEL,TYPE

Last, few of us here use HexOS, so any recommendations will be either through the TrueNAS GUI (which is supposed to be available to HexOS users) or via the Linux command line shell (not the TrueNAS CLI).

CODE tags means use Preformatted Text mode: (</>) or Ctrl+e on the reply toolbar. It makes items easier to read. Arwen used it for the zpool and lsblk commands above.

zpool status: (I started a scrub to see if that would kick off anything)

  pool: HDDs
 state: DEGRADED
  scan: scrub in progress since Wed Jan 21 13:14:18 2026
        1.07T / 2.61T scanned at 338M/s, 188G / 2.61T issued at 57.7M/s
        896K repaired, 7.03% done, 12:14:23 to go
config:

        NAME                                        STATE     READ WRITE CKSUM
        HDDs                                        DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            238dfdfa-0a7c-40d9-90ca-3569714c58b8    ONLINE       0     0     0
            replacing-1                             DEGRADED    45     0     0
              8695138f-e042-4dab-bc81-62f97b6dbbb6  ONLINE       0     0    48  (repairing)
              2435261468018471615                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/a88dafb9-ab3c-4f60-92a8-7a28734131d9

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jan 17 03:47:54 2026
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdc3      ONLINE       0     0     0

errors: No known data errors

Disks:

NAME   MODEL                   SIZE LABEL     TYPE
sda    WDC WD40EFPX-68C6CN0    3.6T           disk
└─sda1                         3.6T HDDs      part
sdb    ST4000DM004-2U9104      3.6T           disk
└─sdb1                         3.6T HDDs      part
sdc    KINGSTON SA400S37240G 223.6G           disk
├─sdc1                           1M           part
├─sdc2                         512M EFI       part
└─sdc3                       223.1G boot-pool part
sdd    SD/MMC                    0B           disk
sde    Compact Flash             0B           disk
sdf    SM/xD-Picture             0B           disk
sdg    MS/MS-Pro                 0B           disk
zd0                            100G           disk

Thanks for the help. I’ve gotten familiar with TrueNAS SCALE’s GUI (barely touched the CLI though).

In general, you don’t want to run a scrub at the same time the pool is replacing a disk. In the case of a single, simple Mirror vDev, a scrub while replacing a failed disk, is somewhat useless / redundant.

But, since it is running, let it continue and check back in a few hours with another zpool status.
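
When comparing one zpool status against the next, a small read-only filter can make the non-zero error counters stand out. This is only a sketch: nonzero_errors is a made-up helper name, and it assumes the usual NAME STATE READ WRITE CKSUM column layout. It only reads text from standard input, so it changes nothing on the pool.

```shell
# Hypothetical helper: reads `zpool status` text on stdin and prints
# every vdev/device row whose READ, WRITE or CKSUM counter is non-zero.
nonzero_errors() {
    awk '
        # Only consider rows whose second column is a device state.
        $2 ~ /^(ONLINE|DEGRADED|UNAVAIL|OFFLINE|FAULTED|REMOVED)$/ &&
        ($3 + $4 + $5) > 0 {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Example use on a live system (view only):
#   sudo zpool status HDDs | nonzero_errors
```

Running it before and after the scrub finishes gives two short lists that are easy to compare by eye.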

Now as for the status. Your server only shows 2 x 3.6TB, (aka 4TB), disks. So that may imply you removed the failing / failed disk.

Based on this output, your pool is degraded but your data is fine at the moment. The first Mirror disk has no errors at the ZFS level.


PS: There is a difference between normal Linux shell command line, and the TrueNAS API command line, (sometimes referred to as CLI). The TrueNAS API / CLI manages TrueNAS, and the Linux shell can manage / view everything else.

You do not want to make any changes via Linux SHELL, that affect something that TrueNAS normally does. That can confuse the TrueNAS API… However, the commands that I’ve listed are “view” only type commands.

Ahh perfect!

Luckily I’m using the shell command line.

For the resilvering, the last I saw (last night) was that it was scanning, and when I woke up I couldn’t see it running. Could it just be that it wasn’t visible in the GUI, or could something have gone wrong with the resilvering process?

I’ll check on the scrub later today.

  pool: HDDs
 state: DEGRADED
  scan: scrub repaired 5.37M in 17:33:04 with 0 errors on Thu Jan 22 06:47:22 2026
config:

        NAME                                        STATE     READ WRITE CKSUM
        HDDs                                        DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            238dfdfa-0a7c-40d9-90ca-3569714c58b8    ONLINE       0     0     0
            replacing-1                             DEGRADED    45     0     0
              8695138f-e042-4dab-bc81-62f97b6dbbb6  ONLINE       0     0    86
              2435261468018471615                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/a88dafb9-ab3c-4f60-92a8-7a28734131d9

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jan 17 03:47:54 2026
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdc3      ONLINE       0     0     0

errors: No known data errors

I just noticed that the checksum errors increased by about 40 (from 48 to 86).
I haven’t heard any problematic sounds from the HDD anymore. Could there still be a problem with the drive or is it fine-ish?

Also, could that be why it’s not resilvering at the moment?

Some additional hardware details may be of use: how this hard drive is connected, motherboard make/model, etc.

If you heard weird noises it could be that a head or platter was already damaged. We’ve got rust spinning several thousand times per minute while metal needles read/write 0s & 1s without touching the magic spinning rust. Insane that it works even in the best of conditions…

The likely reason the resilver has stopped is that the new, replacement drive developed too many errors. People here tend to attribute “CKSUM” errors to cables or the disk controller used by the drive. Proof of that stall is likely in logs somewhere, (and I don’t remember where it is).

If you have good backups, I would use something like this to restart the resilver:

  1. Offline the replacement drive
  2. Shut down and replace the data cable to the replacement drive
  3. Power up
  4. Using zpool status make a note of the errors
  5. Clear the pool errors
  6. Online the replacement drive, this should re-start the resilver
  7. Monitor the pool for new errors
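
As a rough sketch, the ZFS side of steps 1, 4, 5, 6 and 7 could look like the following from the Linux shell. The pool name and disk ID are copied from the zpool status output earlier in this thread; the function names and the DRY_RUN switch are made up for illustration. By default it only prints what it would run, and on TrueNAS the GUI may well be the better place to offline and online a disk, so treat this as a reading aid and not as the official procedure.

```shell
#!/bin/sh
# Values taken from the zpool status output in this thread.
POOL="HDDs"
DISK="8695138f-e042-4dab-bc81-62f97b6dbbb6"

# Hypothetical wrapper: prints each command unless DRY_RUN=0 is set,
# in which case the command is actually executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

offline_disk() { run sudo zpool offline "$POOL" "$DISK"; }  # step 1
show_status()  { run sudo zpool status -v "$POOL"; }        # steps 4 and 7
clear_errors() { run sudo zpool clear "$POOL"; }            # step 5
online_disk()  { run sudo zpool online "$POOL" "$DISK"; }   # step 6
```

Steps 2 and 3 (shut down, swap the cable, power up) happen between offline_disk and show_status. If any command prints a warning or error, stop there and post the output before continuing.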

Now I can’t be certain this is the correct method to “fix” your problem. Lots of variables here, including your disk controller, power supply, HexOS and a potential lack of skill to implement the above procedure. (Skill in this context means that if you get a warning or error, you know when to stop and figure out why, and not continue and potentially make things worse.)

It was more of a quick spin-up followed by a sudden stop sound, and sometimes some “clicking” noises. They stopped with the new SATA cable, though.

The motherboard is something from a HP prebuilt from 2011-2013 with an i7-870e

How would I clear the pool errors?

Try sudo zpool clear. Linux has you using ‘sudo’ before a lot of commands.

When trying to run the zpool clear command I get the following error:
cannot clear errors for HDDs: permission denied
Should I run as root or not risk it?

It just clears errors. I’d be very impressed if you managed to sudo yourself into trouble with zpool clear.

Alternatively, a reboot should also clear the errors if you’d prefer.

We have to be careful. HexOS users may have a special skill set. You know those users that IT departments talk about, the ones who can crash anything. The PEBKAC. (Problem Exists Between Keyboard and Chair)

@Stressed_Out09 You are doing very well with running things and replying with info.

I work in I.T. XD

Just new to self-hosting. I’m just worried about an 1D10T issue being caused by the chair warmer himself lol.

Give me some enterprise servers and I’ll figure it out. (I also have plenty of backups and test deployments to muck around with.) Though as you can probably see from my homelab server, it’s not very expensive…

Though HexOS did get me into TrueNAS, which I’m now using so much more than HexOS’s dashboard.

  pool: HDDs
 state: DEGRADED
  scan: scrub repaired 5.37M in 17:33:04 with 0 errors on Thu Jan 22 06:47:22 2026
config:

        NAME                                        STATE     READ WRITE CKSUM
        HDDs                                        DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            238dfdfa-0a7c-40d9-90ca-3569714c58b8    ONLINE       0     0     0
            replacing-1                             UNAVAIL      0     0     0  insufficient replicas
              8695138f-e042-4dab-bc81-62f97b6dbbb6  OFFLINE      0     0     0
              2435261468018471615                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/a88dafb9-ab3c-4f60-92a8-7a28734131d9

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jan 17 03:47:54 2026
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0

errors: No known data errors

So far clean, about to online the drive. Wish me luck!

Resilvering: 0.00%

2 days 12 hours 50 minutes 28 seconds remaining

Cooked.

Looks like the resilver finished nearly instantly… Any ideas?

  pool: HDDs
 state: DEGRADED
  scan: resilvered 233M in 00:00:28 with 0 errors on Thu Jan 22 22:26:24 2026
config:

        NAME                                        STATE     READ WRITE CKSUM
        HDDs                                        DEGRADED     0     0     0
          mirror-0                                  DEGRADED     0     0     0
            238dfdfa-0a7c-40d9-90ca-3569714c58b8    ONLINE       0     0     0
            replacing-1                             DEGRADED     0     0     0
              8695138f-e042-4dab-bc81-62f97b6dbbb6  ONLINE       0     0     0
              2435261468018471615                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/a88dafb9-ab3c-4f60-92a8-7a28734131d9

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:02:52 with 0 errors on Sat Jan 17 03:47:54 2026
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sda3      ONLINE       0     0     0

errors: No known data errors