Degraded pool - Support Needed

I am running TrueNAS on Proxmox with 48GB DDR5 RAM and a 3950X with 24 CPUs assigned to the VM. I am running 25.04.0.

I have a RAIDZ2 with 5x18TB drives. One drive developed bad sectors; I replaced it and resilvered with no issues. Next, I added another drive with the intention of expanding the vdev into a 6x18TB pool. The process started, but at some point TrueNAS crashed and got stuck on boot. I removed the new drive and booted, and it showed a degraded pool.

I ran SMART checks, both long and short, and both passed for the disk. I manually removed and re-added the disk; it restarted the resilver, and when that finished it crashed again during the expansion and got stuck on boot again. I have tried this multiple times with the same result.
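For reference, the SMART tests were roughly along these lines (sdX is a placeholder, the actual device name will differ):

smartctl -t short /dev/sdX     # quick self-test, a couple of minutes
smartctl -t long /dev/sdX      # full surface scan, many hours on an 18TB drive
smartctl -a /dev/sdX           # check the self-test log and attributes afterwards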

I did a scrub and everything is fine on the data front, but the pool is degraded. I am unable to cancel the expansion or get it to re-run successfully.
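The scrub was just the standard one:

zpool scrub Storage
zpool status -v Storage        # scrub result plus the paused expansion state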

Here is the output from zpool status
pool: Storage
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: msg/ZFS-8000-4J
scan: resilvered 230G in 00:57:04 with 0 errors on Sat Apr 26 18:07:16 2025
expand: expansion of raidz2-0 in progress since Thu Apr 17 18:51:53 2025
29.0T / 58.1T copied at 38.0M/s, 49.88% done, paused for resilver or clear
config:

    NAME                                      STATE     READ WRITE CKSUM
    Storage                                   DEGRADED     0     0     0
      raidz2-0                                DEGRADED     0     0     0
        f88a42c3-7304-496d-ba86-0db7af32d212  ONLINE       0     0     0
        f95bcba6-ebcb-484c-8680-bed661af0b89  ONLINE       0     0     0
        13e1c816-a9a9-4035-9a81-eaf115f6d127  ONLINE       0     0     0
        e9a031aa-e5ca-4e56-8660-e93f3bb7b5c1  ONLINE       0     0     0
        3f4c9848-d7b6-4ce5-85af-f4217d979504  ONLINE       0     0     0
        7580203092224097384                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/0c754d23-17af-483c-8928-f67f70955149

errors: No known data errors

Would it not be best to reinstall the new drive and try to complete the expansion?

I already took the drive out, formatted it, and did a resilver, after which the same result occurs when the expansion resumes. I don't have an additional drive, and given that the drive is new and all the tests pass, I am assuming the drive isn't the issue.

You made a big mistake in removing the drive.

Expansion continues after a reboot. You should have left the drive in.

So just put the drive back in, do a resilver for the missing drive and let expansion continue once the resilver has finished.
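The safest way is to do the replace through the TrueNAS UI, since it handles the partitioning for you. A rough CLI sketch of the same thing, using the GUID from your zpool status output (the by-partuuid path is whatever the re-inserted drive ends up with, so adjust it):

zpool replace Storage 7580203092224097384 /dev/disk/by-partuuid/<new-partuuid>
zpool status Storage           # watch the resilver; expansion should resume once it completes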

And next time, ask for advice before you do the wrong thing.


Thanks for the suggestion, but as I mentioned above, TrueNAS won't boot with the drive inside. I tried multiple reboots, etc., but it would freeze. I have added the drive and done the resilver, and when the expansion starts, the same thing happens: it gets stuck on reboot.

What is your HBA or drive controller?
Are you passing through the drives or the controller to your VM?

I am passing the HBA entirely to the VM.

HBA is 9300-8i LSI SAS in IT Mode


Whenever the expansion is in progress, TrueNAS reboots and gets stuck; here is a screenshot. If I remove the drive, it boots. If I format the drive and resilver, it works, but when the expansion starts, the same thing happens.

It remains stuck at 34 seconds and doesn't move.

Did you also blacklist the HBA PCIe card in Proxmox? This is recommended in order to prevent Proxmox from using it before it starts the VMs.
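A minimal sketch of what that looks like on the Proxmox host, assuming the 9300-8i (driver mpt3sas; the PCI ID below is an example, confirm yours with lspci -nn):

# /etc/modprobe.d/vfio.conf -- bind the HBA to vfio-pci at boot
options vfio-pci ids=1000:0097

# /etc/modprobe.d/blacklist-hba.conf -- keep the host driver off the card
blacklist mpt3sas

Then rebuild the initramfs and reboot the host:

update-initramfs -u -k all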

New drives can still fail; failure rates follow what is sometimes called the bathtub curve (see example graphs).
How did you test/verify it?


I didn't blacklist it, I will have a look, but with 7 drives installed it works fine after every reboot, no issues.

I tested the drive by running the short and long smartctl self-tests on it, and I also added it to a Synology and it worked there as well.

I have now reproduced the problem: it happens when I import the Storage pool.

I booted TrueNAS with the pool not yet imported and then ran this to confirm:

zpool import -f Storage

After this it reboots and gets stuck. If I import it read-only it works, or if I remove the new drive and import, it works but is degraded.
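For reference, the read-only import that does work is roughly:

zpool import -o readonly=on -f Storage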

I ran the SMART long and short tests on both TrueNAS and the Synology and the drive says OK. What else can I test to confirm a drive issue?

When I add the drive as new, the resilver works without issue; this happens only when the expansion resumes.