I’ve got an issue with a known-good TrueNAS system that I’ve recently upgraded, and it’s doing my head in. The system gets to the login prompt and then the whole machine reboots, but only when the SATA SSDs are plugged in. If I boot without the SSDs, everything is fine. If I boot without the SSDs and then plug them in, everything is fine. If I pull all the drives and put them into another machine, everything is fine. I have no idea what could cause this kind of problem, but it’s probably hardware related (possibly the HBA). I’ve tried almost every possible combination at this point, and the only thing that consistently reproduces the issue is having the SSDs plugged in at boot.
Hardware list:
AMD Ryzen 5800G
Asrock B550 Pro4
128GB of DDR4 RAM overclocked to 3600MHz (known-good RAM, known-stable configuration, tested to death)
1000W Super Flower PSU
There is a lot plugged into the PCIe slots, which is one thing I’m suspicious of (one of the PSU rails might be going over capacity). PCIe config:
1st slot (16x):
x8/x8 bifurcation with a Lenovo 430-16i (flashed with Lenovo’s 24.00.07.00 firmware) and an Intel X710 dual-port 10GbE SFP+ NIC
3rd slot (4x):
NVMe adapter
5th slot (1x):
The venerable GTX 1080 Ti
Storage:
4x Seagate Exos 2x14 Mach.2 (ST14000NM0001)
4x Dell 400GB SSDs (LB406M; these don’t cause any issues and don’t affect boot)
1x Patriot M.2 P300 128GB NVMe (the OS drive)
1x KINGSTON OM8PGP4512Q-A0 NVMe
1x CT500P2SSD8 NVMe
And the rest (this is the pile that causes issues):
3x Silicon Power Ace A55 1TB SATA SSD
1x CT1000MX500SSD1
1x Ediloca ES106 1TB
1x Samsung SSD 860 EVO 1TB
There are two pools: one with the HDDs in two RAIDZ1 vdevs plus the two 512GB NVMe drives, and a second pool of mirrored pairs of the cheap SSDs, with each pair the same size.
Any ideas why it would fail at this point, and why there is nothing in the logs? And why does it succeed if I just wait a minute and plug the drives in afterwards?