Help with drive assignment

I’m having some difficulties with my server and have rebooted it a few times while troubleshooting. The issue is that I added 2 new drives and one of them is not being recognized. I’m still digging into that, but so far it seems 2 drive bays (Dell R540) are not working.

My issue is that it seems every time I reboot, TrueNAS picks a drive and says it needs to be resilvered. I have the 2 new drives still in the system but never added either of them to my pool.

Is this expected behavior, or does it maybe add some additional information to my troubleshooting?

Any insight is greatly appreciated!!!

No, it is definitely not normal to resilver a vDev in a pool after reboot.

  • What is your pool layout? (Output of zpool status please.)
  • How are the existing drives wired to the server?
  • Make, model & firmware revision of the SAS controller, please.
  • What are the 2 new drives? SAS, SATA or NVMe?
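
If it helps, roughly these commands will gather that info from a shell (device names here are just examples):

  # pool layout and recent resilver activity
  zpool status
  # every disk with model, serial number and transport (sas/sata/nvme)
  lsblk -o NAME,MODEL,SERIAL,TRAN
  # identity details for one specific drive
  smartctl -i /dev/sda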

Hardware RAID controllers are known to cause problems with ZFS because they can re-order writes; ZFS writes in a specific order for data integrity.

Arwen, thanks for the reply!

All of the drives in the pool are SAS and are connected to the server drive backplane.

SAS Controller is:
Dell HBA330 Adp 16.17.01.00

The 2 new drives are the same as the existing 7 drives. (Manufacturer, model, capacity, etc.)

Here is the zpool status:

  pool: data_pool
 state: ONLINE
  scan: resilvered 10.3G in 00:08:19 with 0 errors on Mon Oct 20 09:51:12 2025
config:

        NAME                                        STATE     READ WRITE CKSUM
        data_pool                                   ONLINE       0     0     0
          raidz2-0                                  ONLINE       0     0     0
            ae983e3f-e3c8-4978-ba96-c33219e10e09    ONLINE       0     0     0
            cacea778-8e19-4b72-ac69-0e6f2a0fa658    ONLINE       0     0     0
            a9b84d5e-1e6c-433e-9d27-35aeab6f0162    ONLINE       0     0     0
            5f120ead-79d5-4d73-bbf6-dff913978904    ONLINE       0     0     0
            0131b224-6128-43af-9b11-38283772efc9    ONLINE       0     0     0
            spare-5                                 ONLINE       0     0     0
              b528a750-a698-4130-85fe-0acf1cb9b796  ONLINE       0     0     0
              540970a4-23d5-401a-b4ad-8900486d006c  ONLINE       0     0     0
        spares
          540970a4-23d5-401a-b4ad-8900486d006c      INUSE     currently in use

errors: No known data errors

You have a Spare drive in use. I am not sure why ZFS would want to resilver repeatedly, though.

My recommendation is to either replace the failing disk, in which case your Hot Spare will be restored to available status, or promote the Hot Spare to a permanent pool disk by detaching the failing disk.

With the second option you will no longer have a Hot Spare, but you can then pull the failing disk to test whether it really is dead, and you can always add another Hot Spare later.
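
For reference, the command-line equivalents look roughly like this (a sketch using the GUIDs from your zpool status output; on TrueNAS the GUI is usually the safer route, and the new-disk name is a placeholder):

  # Option A: replace the suspect disk; once the resilver finishes,
  # the spare detaches on its own and returns to AVAIL.
  zpool replace data_pool b528a750-a698-4130-85fe-0acf1cb9b796 <new-disk>

  # Option B: detach the suspect disk; the spare is promoted to a
  # permanent member of raidz2-0, leaving the pool with no spare.
  zpool detach data_pool b528a750-a698-4130-85fe-0acf1cb9b796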

As for the other problems, I am not sure.

Thanks for the advice. The strange thing is that the “failing” drive is one that I newly added to the server. I never added it to a pool. The other strange thing is that if I move that drive to a different bay in the chassis, it works. I verified this by moving around the 2 “new to me” drives.

At one point while checking drives I accidentally removed one of the active drives, which, understandably, started a resilver. I waited for that to completely finish before doing anything else. I then put another “new to me” drive into the problem bay, and that kicked off a resilver. Unfortunately, I didn’t have a good handle on which drives were where, so I’m not 100% certain which drive caused the issue.

I now have my drives labeled by serial number and I’m keeping a visual representation of where the problem drive is every time I reboot (only once so far) to see what happens.
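
In case it helps anyone else, this is roughly how I’m tying the partition UUIDs from zpool status back to physical serial numbers (output and device names will differ per system):

  # map the GPT partition UUIDs that ZFS reports to block devices
  ls -l /dev/disk/by-partuuid/
  # list every disk with its model and serial number
  lsblk -o NAME,MODEL,SERIAL,SIZE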

I have a live USB of SystemRescue to be able to do some testing tonight and take TrueNAS out of the picture.
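
The rough test plan, with /dev/sdX standing in for whichever disk is in the problem bay:

  # confirm the drive enumerates and grab its identity/serial
  smartctl -i /dev/sdX
  # run a long self-test, then read back the results afterwards
  smartctl -t long /dev/sdX
  smartctl -a /dev/sdX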

After doing a bunch of research, it seems that there is an obscure firmware bug in the Dell HBA330.

When swapping (or inserting new) drives it can mark the drive in that bay as bad and the flag never clears.

Solutions I have found are:

  1. Get a special tool from Dell (about $1,000 if out of warranty)
  2. Replace the HBA
  3. Re-flash with generic LSI firmware (rough sketch below)
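
For option 3, the procedure people describe is roughly the following; I haven’t done it on an HBA330 myself, so treat it as a sketch. It assumes the card is SAS3008-based, that you have a matching IT-mode firmware image (the filename here is a placeholder), and that you accept the risk of bricking the card if the flash is interrupted:

  # list attached LSI/Avago SAS3 controllers and their current firmware
  sas3flash -list
  # advanced mode: erase the flash on controller 0, then write the image
  sas3flash -o -c 0 -e 6
  sas3flash -o -c 0 -f <IT_firmware.bin>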

It may be possible to clear the failed-drive flag using the boot-time menu system of the HBA. It may be worth 30 minutes or so of poking around to see if that is possible. The $1,000 tool from Dell may be an OS “live” version of the same ability.

I’ve had to use such a boot-time menu for other reasons, like setting boot drives on a Cisco server’s SAS controller (also based on LSI).

Arwen,

Thank you so much!

It turns out that, in my panic when the 2nd drive didn’t work, I didn’t keep good track of my testing. I also didn’t believe I could have gotten a second bad drive.

Well, after about 4 hours of very methodical testing and keeping good track of serial numbers, it does seem that I did get 2 bad drives.

I am still investigating whether a firmware issue is causing different behavior with the power-disable pin on the drive and keeping it from spinning up.

Ah, yes, that is an annoying one. Lots of people work around it with:

  • Special power cords that don’t pass 3.3v through, like Molex-to-SATA power adapters, which can’t supply 3.3v (the 4-pin Molex has only 2 ground, 1 x 12v and 1 x 5v lines).
  • Taping over the 3.3v contacts on the hard drive’s power connector.

Plus, you can check the make & model’s user manual to see whether the drive actually supports the Power Disable pin function.
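
And if the drive does show up on the bus but won’t spin up, sg3_utils can ask it to start directly; a sketch, with /dev/sgN standing in for whichever SCSI-generic node the drive gets:

  # list SCSI devices along with their sg nodes
  lsscsi -g
  # issue a START STOP UNIT command to spin the drive up
  sg_start --start /dev/sgN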