It has gone all wrong - trying to replace a disk in my volume

bchmura · July 26, 2025, 5:13pm

I had started receiving SMART warnings on one of my disks, so today I figured I would take care of that. I also figured I would do an upgrade before tackling it, in case any odd issues were present. I went from 13 to the latest 13; I am on 13.0-U6.8 now. Upon restart, my BIOS halted on an F1 prompt due to a bad disk. I hit F1 and got past that. TrueNAS shows all the disks online, but the errors for ATA2 still persist. I have a spare disk to replace any failed ones.

So, like other times, I popped into the disks to take the bad one offline. I hit offline… and it has been spinning forever; well, at least an hour so far.

The log seems to show nothing about this, except for errors from that device:

Jul 26 13:02:55 freenas smartd[1553]: Device: /dev/ada2, FAILED SMART self-check. BACK UP DATA NOW!
Jul 26 13:02:55 freenas smartd[1553]: Device: /dev/ada2, Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.

If this disk is so bad that I can’t even take it offline, can I power down, replace it, and bring the whole volume back up?

I am not sure at all what to do here…

As an aside, in case it makes a difference, after the upgrade I could not start my virtual machines either.

[EFAULT] VM will not start as DISK Device: /dev/zvol/Main_Volume/VMStorage/AlpineDock-g6xlgd device(s) are not available.

Any advice, or at least an “it’s going to be alright”, would be great!

eqartimus · July 26, 2025, 5:24pm

Without knowing a little more about your system (drive configuration and such), it’s hard to weigh in on what is likely happening or how to diagnose it.
I suggest adding a detailed description of your configuration if you want people to dig in and help: disks, memory. What is your RAID config? How much redundancy?

SmallBarky · July 26, 2025, 5:30pm

Expand the Details under my post to get an idea of the hardware and pool layout details we’re asking for. We can only go off what you post to diagnose.

Browse some other threads and do the Tutorial by the Bot to get your forum trust level up if you need to post images, etc.

TrueNAS-Bot
Type this in a new reply and send to bring up the tutorial, if you haven’t done it already.

@TrueNAS-Bot start tutorial

bchmura · July 26, 2025, 5:44pm

Well, that is part of the issue… once I issued the disk offline, I lost access to all pool information, disk information, etc… nothing.

From memory though, I have 64GB of memory in the server. I have 6 disks, and I recall from designing it long ago I could lose two of them and still run… they are all SATA drives. I am sure they are striped, not mirrored.

This was a long time ago

SmallBarky · July 26, 2025, 6:00pm

If you are sure you had a RAIDZ2 configuration, you should be able to power down and physically replace the disk. Did you note the serial number of the bad disk? You need to remove the failing or failed disk, and only that one.
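
A minimal sketch for double-checking the serial number before pulling anything, assuming smartmontools is available on the box and that the bad disk is /dev/ada2, as the smartd log above suggests. The helper name `serial_of` is made up for illustration; it only picks the "Serial Number:" line out of `smartctl -i` output fed to it on stdin.

```shell
#!/bin/sh
# serial_of: hypothetical helper (not part of smartmontools or TrueNAS).
# Reads `smartctl -i` output on stdin and prints the value of the
# "Serial Number:" line, if there is one.
serial_of() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}
```

Something like `smartctl -i /dev/ada2 | serial_of` should then print the serial to compare against the label on the physical drive.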

bchmura · July 26, 2025, 6:30pm

Thanks. I got antsy and sent a restart. After a long wait for it to come back up, I have access to the setup screens again. It is definitely RAIDZ2 and I do have the serial number already.

I want to get another backup of some of my data before trying this. I did find that my main volume was not mounted to /mnt for some reason, and even after I got it mounted, it is not being found by either the SMB shares or the virtual machine disk entries. Sigh.

It took a long time, but they did appear after I manually mounted them.
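
For anyone hitting the same thing, a rough sketch for spotting datasets that did not get mounted. It assumes the standard `zfs list` properties `mounted` and `mountpoint`; the helper name `not_mounted` is made up. It reads tab-separated `zfs list -H -o name,mounted,mountpoint` output on stdin and prints the entries whose `mounted` column is `no` (zvols show `-` there and are skipped).

```shell
#!/bin/sh
# not_mounted: hypothetical helper. Expects tab-separated lines of
# "name<TAB>mounted<TAB>mountpoint" on stdin, as printed by
# `zfs list -H -o name,mounted,mountpoint`, and prints the
# filesystems that are currently not mounted.
not_mounted() {
    awk -F'\t' '$2 == "no" { print $1 " (mountpoint: " $3 ")" }'
}
```

Usage would be along the lines of `zfs list -H -o name,mounted,mountpoint | not_mounted`.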

etorix · July 26, 2025, 6:57pm

That’s likely because the pool is not mounted to /mnt.
Would you care to share the outputs of the following commands, formatted using the </> button?
zpool status
camcontrol devlist
smartctl -x /dev/ada# (for all the relevant numbers #)

Also indicate what your motherboard is (how many SATA ports?), whether you’re using an HBA (or a “SATA card”), ECC RAM, etc.
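
A minimal sketch for gathering all of that in one go, assuming a FreeBSD-based system (TrueNAS CORE) where `sysctl -n kern.disks` prints the disk names on one line. The function names `ada_disks` and `collect_diag` are made up for illustration and are not TrueNAS tools.

```shell
#!/bin/sh
# ada_disks: hypothetical helper. Reads a whitespace-separated list of
# disk names on stdin (the format `sysctl -n kern.disks` prints) and
# keeps only the adaN entries, one per line, in input order.
ada_disks() {
    tr -s ' \t' '\n\n' | grep -E '^ada[0-9]+$'
}

# collect_diag: hypothetical wrapper that runs the three requested
# commands, with smartctl repeated for every adaN disk found.
collect_diag() {
    zpool status
    camcontrol devlist
    sysctl -n kern.disks | ada_disks | while read -r d; do
        echo "=== smartctl -x /dev/$d ==="
        smartctl -x "/dev/$d"
    done
}
```

Running `collect_diag > /tmp/diag.txt 2>&1` (the path is only an example) would leave a single file to paste from.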

bchmura · July 26, 2025, 7:44pm

Hey, thanks for posting back…

I’ve replaced the bad disk and it is currently resilvering. I had a problem with the pool not mounting in /mnt, but I got around that. The VMs were looking in /dev/zvol, which did not have the devices for the pool in it.

But, now that the new drive is in, the array went from ONLINE to DEGRADED and is incorporating the new disk. It also placed the relevant volumes in the /dev/zvol directory.

I THINK I AM GOOD!

Thank you!

ZPOOL STATUS

errors: No known data errors
root@freenas:~ # zpool status
  pool: Main_Volume
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Jul 26 15:31:07 2025
        1.79T scanned at 6.63G/s, 261M issued at 966K/s, 11.9T total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                              STATE     READ WRITE CKSUM
        Main_Volume                                       DEGRADED     0     0   0
          raidz2-0                                        DEGRADED     0     0   0
            gptid/b7650a52-483a-11e8-b678-ac1f6b83258e    ONLINE       0     0   0
            gptid/b8121849-483a-11e8-b678-ac1f6b83258e    ONLINE       0     0   0
            replacing-2                                   DEGRADED     0     0   0
              11565328451382339695                        UNAVAIL      0     0   0  was /dev/gptid/b8c51a1d-483a-11e8-b678-ac1f6b83258e
              gptid/de680275-6a56-11f0-9df7-ac1f6b83258e  ONLINE       0     0   0
            gptid/b9749884-483a-11e8-b678-ac1f6b83258e    ONLINE       0     0   0
            gptid/fbeb27ac-5189-11ec-aeba-ac1f6b83258e    ONLINE       0     0   0
            gptid/bad9a8ad-483a-11e8-b678-ac1f6b83258e    ONLINE       0     0   0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:16:42 with 0 errors on Fri Jul 25 04:01:42 2025
config:

        NAME          STATE

CAM CONTROL DEVLIST

            da1p2     ONLINE       0     0     0

errors: No known data errors
root@freenas:~ #
root@freenas:~ # camcontrol devlist
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD60EFRX-68L0BN1 82.00A82>    at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD60EFAX-68JH4N1 83.00A83>    at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD60EFRX-68L0BN1 82.
