TrueNAS 13 on SuperMicro SuperServer 1029P-N32R

I have a TrueNAS 13 server that has been running for 550 days. The nvd12 drive failed in TrueNAS, and we purchased a replacement. When we swapped the drive, the process of ejecting and inserting via IPMI did not work as expected. The insert option under IPMI failed several times, but the drive was eventually presented to TrueNAS, so I chose the replace option for the named nvd12 drive. After the drive was imported and resilvering completed, the removed drive was still showing as removed, and nvd20 was replaced with nvd12. diskinfo for /dev/nvd12 now points to nvme20. But pulling nvme20 (or the drive in slot 20 of the chassis) now removes nvme12.
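For anyone wanting to check the same nvd-to-nvme mapping across many bays, here is a rough Python sketch. It assumes the `# Attachment` line that `diskinfo -v` prints on recent FreeBSD releases; the helper names and the SAMPLE text are made up for illustration and are not output from this server.

```python
import subprocess

# Illustrative text only, shaped like `diskinfo -v` output.
# The values are invented and do NOT come from the server in this thread.
SAMPLE = (
    "/dev/nvd12\n"
    "\t512         \t# sectorsize\n"
    "\tEXAMPLE NVME MODEL\t# Disk descr.\n"
    "\tEXAMPLESERIAL01\t# Disk ident.\n"
    "\tnvme20      \t# Attachment\n"
)


def parse_attachment(diskinfo_output):
    """Return the value on the '# Attachment' line, or None if it is absent."""
    for line in diskinfo_output.splitlines():
        # Split on the last '#' so a '#' inside a description cannot confuse us.
        value, sep, label = line.rpartition("#")
        if sep and label.strip() == "Attachment":
            return value.strip()
    return None


def map_disks_to_controllers(disks):
    """Run `diskinfo -v` for each disk name and collect its attachment.

    Hypothetical helper: only meaningful on a FreeBSD/TrueNAS Core host
    where the diskinfo binary exists.
    """
    mapping = {}
    for disk in disks:
        result = subprocess.run(
            ["diskinfo", "-v", disk],
            capture_output=True,
            text=True,
            check=False,
        )
        mapping[disk] = parse_attachment(result.stdout)
    return mapping
```

On the box itself, something like `map_disks_to_controllers(["nvd12", "nvd20"])` would give one place to compare what the OS reports against the chassis slot labels before pulling anything.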

My question is: short of a reboot, is there a way to rescan the chassis to force the BIOS to present the drive to the OS?

As I was reading your post, my mind kept saying: reboot.

But Core doesn’t support NVMe hot-plug, does it?

Thought that was a Linux/SCALE feature.

Good point. But there are people who would do it. I ALWAYS power off before replacing anything.

Never hot-plug PCI slots or M.2!

You shouldn’t hot-plug SATA either, unless you have a hot-plug backplane (look for the capacitors).

The drive that was hot-plugged was in a drive caddy, which is built to be hot-plugged. This server is a SuperMicro with hot-pluggable drive bays. This is the first TrueNAS Core system on which I have had an issue hot-plugging a drive.

Because it was in active use, the server could not be rebooted. The storage build was not my design; it is a single-server storage array. I have since rebooted the server, and both nvd12 and nvd21 now show as online. Unfortunately, I lost nvd31 during the reboot. The disk shows in the BIOS, in IPMI, and during the boot process. I am in talks with the application owner to migrate this array to a TrueNAS SCALE instance.

Turns out to be faulty hardware. The drive bay is not allowing the OS to fully control or access the drive. Apart from the very strange issue where ejecting the bad drive via IPMI takes a random drive offline within TrueNAS (i.e., ejecting nvme12, which is not showing within the OS, actually ejects nvme20, which is linked to nvd19 within TrueNAS), everything TrueNAS-wise has been working flawlessly. The random eject is also probably a hardware issue.

Time for a new server.

Thanks for your thoughts,
Jared

P.S. Always read the subject and description before assuming people are stupid and pulling cables directly out of running hardware. TrueNAS is more than just consumer-level software.