"Power-on or device reset" errors under load on 9400-16i

I am having trouble tracking down an issue with my TrueNAS SCALE system and wonder if anyone else has seen this or can help me figure it out. Any time I put load on a pool connected to my 9400-16i card, I get a burst of errors like this:

Jul 29 20:43:22 truenas kernel: sd 0:0:0:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:3:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:1:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:4:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:5:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:2:0: Power-on or device reset occurred

Both the boot-pool (a mirror of Samsung 870s) and a pool of four Micron 5100 Max 960GB drives (mirrored pairs) are connected to the card. It appears that all the drives drop out and then come back. I saw reports of similar problems on the 9300-16i that required a firmware update, but I could find nothing about the 9400 having this problem. I have already replaced the controller, and both cards show the same problem. Did I get really unlucky and receive a second bad controller? Is there anything I can do to get more details about what is happening? There is no critical data on any of these drives: I have the configuration backed up, and the Micron drives are only used for temporary storage.
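One way to get more detail out of the logs is to check whether each burst hits every device behind the HBA in the same second, which would point to a controller- or link-level reset rather than individual drives failing. The sketch below is a hypothetical helper (the script name resets.py and its function names are made up for illustration, not part of TrueNAS). It assumes the default short output format of journalctl -k, as in the log lines above, and also collects any kernel messages mentioning mpt3sas (the Linux driver for these Broadcom HBAs) logged in the same second.

```python
import re
import sys

# Default `journalctl -k` line: "<Mon> <day> <HH:MM:SS> <hostname> kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+(.*)$")
# Reset notice from the SCSI disk driver; group 1 is the H:C:T:L address.
RESET_RE = re.compile(r"^sd (\d+:\d+:\d+:\d+): Power-on or device reset occurred")


def group_resets(lines):
    """Map each timestamp to the devices reset and any mpt3sas messages logged then.

    Returns {timestamp: {"resets": [H:C:T:L, ...], "mpt3sas": [message, ...]}},
    in the order timestamps first appear (the log is chronological).
    """
    bursts = {}
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        ts, msg = m.groups()
        reset = RESET_RE.match(msg)
        if reset:
            bursts.setdefault(ts, {"resets": [], "mpt3sas": []})["resets"].append(reset.group(1))
        elif "mpt3sas" in msg:
            bursts.setdefault(ts, {"resets": [], "mpt3sas": []})["mpt3sas"].append(msg)
    return bursts


def summarize(bursts):
    """Print one line per second that contained resets, plus driver messages from that second."""
    for ts, info in bursts.items():
        if not info["resets"]:
            continue
        hosts = sorted({addr.split(":")[0] for addr in info["resets"]}, key=int)
        print(f"{ts}: {len(info['resets'])} device(s) reset on SCSI host(s) {', '.join(hosts)}")
        for msg in info["mpt3sas"]:
            print(f"    {msg}")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        summarize(group_resets(f))
```

For example, save the log with journalctl -k > kernel.log and then run python3 resets.py kernel.log. If every burst lists all six devices on host 0 at once, the drives themselves are unlikely to be the cause.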

For reference, here are the details of my machine:
TrueNAS Community Edition 25.04.1
AMD Ryzen 9 5900X
128GB Non-ECC RAM
ASRock Rack X470D4U
9400-16i HBA
Intel X550-T2
Intel A310 GPU

Motherboard NVME:
2x 480GB Seagate Nytro XP480LE30002 in mirror

Motherboard SATA:
5x 10TB Seagate Exos SATA HDDs in RAIDZ2 pool

9400-16i:
2x 500GB Samsung 870 Evo SATA SSDs for boot-pool
4x 960GB Micron 5100 Max SATA SSDs in pool of mirrored pairs (technically HPE-branded versions)

Have you checked your firmware version info?

What power supply do you have? Is it enough for all your hardware?

Is the airflow across the card good? LSI/Broadcom have listed about 200 linear feet per minute (LFM) of airflow for their earlier models. I didn't check the 9400 docs, though.
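To relate a fan's rating to that figure: average air velocity in LFM is volumetric flow (CFM) divided by the area it passes through (square feet). A minimal sketch, assuming you know the fan's nominal free-air CFM and its diameter; the 80 mm / 20 CFM values are hypothetical. Real airflow over the heatsink will be lower once static pressure, distance, and obstructions come into play, so treat the result as an upper bound.

```python
import math

MM_PER_FOOT = 304.8


def face_velocity_lfm(cfm: float, fan_diameter_mm: float, hub_diameter_mm: float = 0.0) -> float:
    """Average air velocity (linear feet per minute) through a fan's annular opening.

    velocity [ft/min] = flow [ft^3/min] / area [ft^2]
    """
    r_outer = fan_diameter_mm / 2 / MM_PER_FOOT
    r_hub = hub_diameter_mm / 2 / MM_PER_FOOT
    area_ft2 = math.pi * (r_outer**2 - r_hub**2)
    if area_ft2 <= 0:
        raise ValueError("fan diameter must exceed hub diameter")
    return cfm / area_ft2


if __name__ == "__main__":
    # Hypothetical 80 mm fan rated at 20 CFM free-air, ignoring the hub.
    velocity = face_velocity_lfm(cfm=20.0, fan_diameter_mm=80.0)
    target = 200.0  # LFM figure quoted above for earlier LSI/Broadcom cards
    print(f"{velocity:.0f} LFM (upper bound) vs {target:.0f} LFM target")
```

Subtracting the hub area raises the result, since the same flow passes through a smaller opening.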

Since you're on a 9400-series card, I think this is the command to run from the CLI to check it:

sudo storcli show all

Earlier series used sudo sas2flash -list or sudo sas3flash -list instead.
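On Linux the mpt3sas driver also exposes some controller details through sysfs, which avoids needing the vendor tool at all. A minimal sketch, assuming the per-host attribute names proc_name, board_name, version_fw, version_bios and ioc_reset_count under /sys/class/scsi_host/hostN/. To my knowledge mpt3sas provides these, but names can vary between kernel versions, so any missing file is reported as None rather than treated as an error. If ioc_reset_count rises after a burst of the errors above, that would suggest the controller itself is being reset.

```python
from pathlib import Path
from typing import Dict, Iterator, Optional, Tuple

# Per-host attributes I believe mpt3sas exposes; names may vary by kernel version.
ATTRS = ("board_name", "version_fw", "version_bios", "ioc_reset_count")


def read_attr(host_dir: Path, name: str) -> Optional[str]:
    """Return a stripped sysfs attribute, or None if it is missing or unreadable."""
    try:
        return (host_dir / name).read_text().strip()
    except OSError:
        return None


def mpt3sas_hosts(sysfs_root: str = "/sys/class/scsi_host") -> Iterator[Tuple[str, Dict[str, Optional[str]]]]:
    """Yield (host name, {attribute: value}) for every SCSI host driven by mpt3sas."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return
    # Compare length first so host2 sorts before host10.
    for host_dir in sorted(root.glob("host*"), key=lambda p: (len(p.name), p.name)):
        if read_attr(host_dir, "proc_name") != "mpt3sas":
            continue
        yield host_dir.name, {name: read_attr(host_dir, name) for name in ATTRS}


if __name__ == "__main__":
    for host, attrs in mpt3sas_hosts():
        print(host, attrs)
```

Running it once before and once after a load test shows whether the reset counter moved.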

I updated the firmware to the latest version available from Broadcom. There is plenty of power (650W PSU).

The HBA runs consistently at about 39-40°C. There is a fan mounted directly over it for cooling.

Can you switch around some power cables? For example, mix SSDs and HDDs on each power cable.
The power supply can have enough total power, but that does not mean that every power rail has enough.
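To make the per-cable point concrete, you can add up the draw on each rail for every cable and convert it to current with I = P / V. A minimal sketch; every wattage and per-cable limit in it is a placeholder, not a datasheet figure, so substitute the numbers from your drives' and PSU's documentation. As a rule of thumb, 2.5" SATA SSDs draw mainly from the 5 V rail, while 3.5" HDDs use 12 V for the spindle motor (peaking at spin-up) plus 5 V for the electronics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Device:
    name: str
    watts_12v: float = 0.0  # draw on the 12 V rail (use the spin-up peak for HDDs)
    watts_5v: float = 0.0   # draw on the 5 V rail


def rail_currents(devices: List[Device]) -> Tuple[float, float]:
    """Return (amps on 12 V, amps on 5 V) for all devices on one cable, using I = P / V."""
    amps_12v = sum(d.watts_12v for d in devices) / 12.0
    amps_5v = sum(d.watts_5v for d in devices) / 5.0
    return amps_12v, amps_5v


def report(cables: Dict[str, List[Device]], max_12v: float, max_5v: float) -> None:
    """Print per-cable currents and flag cables above the given (hypothetical) limits."""
    for cable, devices in cables.items():
        a12, a5 = rail_currents(devices)
        status = "ok" if a12 <= max_12v and a5 <= max_5v else "over limit"
        print(f"{cable}: {a12:.2f} A @ 12 V, {a5:.2f} A @ 5 V -> {status}")


if __name__ == "__main__":
    # Placeholder figures only; replace with values from your datasheets.
    ssd = Device("SATA SSD", watts_5v=4.0)
    hdd = Device("3.5in HDD", watts_12v=24.0, watts_5v=5.0)
    cables = {
        "cable A (all SSDs)": [ssd] * 6,
        "cable B (mixed)": [ssd] * 3 + [hdd] * 2,
    }
    report(cables, max_12v=5.0, max_5v=5.0)
```

Putting all SSDs on one cable concentrates their load on the 5 V wires of that cable, which is the situation the suggestion above is trying to avoid.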

Another suggestion I received (from ChatGPT, I admit 😏) was to disable some power-saving features in the BIOS, as the LSI card could have issues with them.
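If the BIOS option in question is PCIe ASPM (Active State Power Management), which is an assumption since the suggestion doesn't name a setting, you can at least check which ASPM policy the Linux kernel is using. The kernel reports it in /sys/module/pcie_aspm/parameters/policy with the active entry in brackets, for example "[default] performance powersave powersupersave". A minimal sketch that reads it; note that this shows only the OS-side policy, not what the board firmware has configured. As far as I know, the performance policy is the one that turns ASPM off for links the kernel controls.

```python
from pathlib import Path
from typing import Optional

POLICY_FILE = "/sys/module/pcie_aspm/parameters/policy"


def active_aspm_policy(text: str) -> Optional[str]:
    """Return the bracketed (active) entry, e.g. 'default' from '[default] performance ...'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_aspm_policy(path: str = POLICY_FILE) -> Optional[str]:
    """Read the kernel's ASPM policy; None if the file is missing or unreadable."""
    try:
        return active_aspm_policy(Path(path).read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("active PCIe ASPM policy:", read_aspm_policy())
```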

I have different hardware but have also seen this kind of error. Recently I took the whole machine apart and replaced all the thermal paste I could find (motherboard, HBA, network card), since it had been in use for over 10 years. I also replaced the power supply a few months back, which did help (it was also 10 years old).