I am having trouble figuring out an issue with my TrueNAS Scale system and wonder if anyone else has seen this or could help me figure it out. Anytime I put a load on a pool connected to my 9400-16i card I get a bunch of errors like this:
Jul 29 20:43:22 truenas kernel: sd 0:0:0:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:3:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:1:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:4:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:5:0: Power-on or device reset occurred
Jul 29 20:43:22 truenas kernel: sd 0:0:2:0: Power-on or device reset occurred
Both the boot-pool (a mirror of Samsung 870s) and a pool of 4 Micron 5100 Max 960GB drives (mirrored pairs) are connected to the card. It appears that all the drives drop out and then come back. I saw that there were problems like this on the 9300-16i that required a firmware update, but I could find nothing about 9400s having this problem. I have changed out the controller and both had the same problem. Did I get really unlucky and get a second bad controller? Is there anything I can do to get more details of what is happening? There is no critical data on any of these drives. I have the configuration backed up and the Micron drives are just used for temporary storage.
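One way to get a bit more detail is to tally the reset messages per SCSI address. If every device behind the HBA resets together, as in the excerpt above, that points more toward the controller or its PCIe link than toward a single drive or cable. The following is only a rough sketch: it assumes the kernel log has first been saved to a file (for example with `journalctl -k > kernel.log`), and the function name `count_resets` and the file name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count "Power-on or device reset occurred" messages
# per SCSI address (host:channel:target:lun, e.g. 0:0:3:0) in a saved log.
count_resets() {
    logfile="$1"
    grep 'Power-on or device reset occurred' "$logfile" |
        # Keep only the address that follows "sd " on each matching line.
        sed -n 's/.* sd \([0-9:]*\): Power-on.*/\1/p' |
        # Group identical addresses and print "address count".
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Calling `count_resets kernel.log` prints one line per device address followed by how many resets were logged for it.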
For reference, here are the details of my machine:
TrueNAS Community Edition 25.04.1
AMD Ryzen 9 5900X
128GB Non-ECC RAM
AsrockRack X470D4U
9400-16i HBA
Intel X550-T2
Intel A310 GPU
Motherboard NVME:
2x 480GB Seagate Nytro XP480LE30002 in mirror
Motherboard sata:
5x 10TB Seagate Exos sata in raidz2 pool
9400-16i:
2x 500GB Samsung 870 Evo sata ssds for boot-pool
4x 960GB Micron 5100 Max sata ssds in pool of mirrored pairs (technically HPE branded versions)
Have you checked your firmware version info?
What power supply do you have? Is it enough for all your hardware?
Is the cooling across your card good? LSI / Broadcom have listed about 200 linear feet per minute of airflow for their earlier models. I didn't check the 9400 docs, though.
Since you're on a 9400 series, I think this is the command to run in the CLI to check:
sudo storcli show all
Previous series used sudo sas2flash -list
or sudo sas3flash -list
I updated the firmware to the latest version available from broadcom. Plenty of power (650W).
The hba is running about 39-40c consistently. There is a fan directly mounted over it for cooling.
Can you switch around some power cables? Like mixing ssd and hdd on each power cable.
The power supply can have enough total power, but that does not mean that every power rail has enough.
Another suggestion I received (I admit, from ChatGPT) was to disable some power savings in the BIOS, as the LSI could have issues with them.
I have different hardware but have also seen this kind of error. Recently I decided to take the whole machine apart and replace all the paste I could find (mobo, HBA, network card), as it had been in use for over 10 years. I also replaced the power supply a few months back (it was also 10 years old), which did help.
Thanks for the help. I did verify it wasn't power. The supply is capable of the full 650W on its single 12V rail and it can do 130W on the 5V rail. Different power cable arrangements made no difference.
All the power saving settings in BIOS were disabled which makes sense since this is a server board.
Sounds similar: LSI 9300-xx Firmware Update | TrueNAS Community
If you are using an LSI 9300 HBA with FreeNAS or the soon-to-be TrueNAS CORE, you may experience some performance issues causing the controller to reset when using SATA HDDs.
Appreciate you're using a 9400 but…
What firmware are you currently on?
Another one to read: Anyone using LSI SAS 9400-16i | TrueNAS Community
Looks like Broadcom provides Mixed Tri-Mode and SAS-SATA only firmware. Perhaps try the SAS-SATA only if you haven't already.
I did see a bunch about the 9300 including that exact link.
Currently the latest firmware from broadcom for the 9400. I have not tried swapping from the Tri-Mode.
Unfortunately no luck with the other firmware option. Still get the same behavior.
Good news. I think I was able to solve it by adding the kernel option to disable pcie_aspm. I found details on this here:
One question I have is whether this is a permanent change or something that will go away after an update?
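For anyone wanting to record which ASPM policy the kernel reports before changing the boot option, Linux kernels built with ASPM support expose it in `/sys/module/pcie_aspm/parameters/policy`, with the active policy in square brackets. A small sketch, assuming that file is present on the system; the function name `active_aspm_policy` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the ASPM policy the kernel marks as active.
# The file normally looks like: [default] performance powersave powersupersave
active_aspm_policy() {
    policy_file="${1:-/sys/module/pcie_aspm/parameters/policy}"
    # Extract the word between the square brackets.
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$policy_file"
}
```

Running `active_aspm_policy` with no argument reads the default sysfs path; a different path can be passed for testing.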
How did you make the change? Did you add them in System, Advanced Settings in the Init or Sysctl sections?
I made the change with this command via the shell:
midclt call system.advanced.update '{ "kernel_extra_options": "pcie_aspm=off" }'
I believe it should persist since that is just the truenas API but I am not completely sure.
I think it needs to be added to the GUI so it persists on reboot and upgrades?
cat /proc/cmdline
does show "pcie_aspm=off" after a reboot, so it does persist in that case. Still not sure about upgrades. I suppose I can find out. 25.04.2 is available today so I can test this.
It does appear to be a persistent change at least for this update. cat /proc/cmdline
still shows the right stuff.
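To repeat that check after each future update without reading the whole command line by eye, something like the following could be used. It is only a sketch: the function name `has_kernel_option` is made up, and it simply looks for an exact match among the space-separated parameters in `/proc/cmdline`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the running kernel was booted with the
# given parameter (exact match, e.g. "pcie_aspm=off").
has_kernel_option() {
    option="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # Put each parameter on its own line, then require a whole-line match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$option"
}
```

For example, `has_kernel_option pcie_aspm=off && echo "ASPM still disabled"` prints the message only if the option survived the update.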