Upgraded from Cobia > Dragonfish > Electric Eel, now my fans are running 100% - HPE DL380 Gen 9

I have a DL380 G9 that I’ve been running with TrueNAS Scale since late 2021. I stayed on Cobia longer than I’d have liked because I was slow to migrate my TrueCharts apps to the community apps.

I finally bit the bullet and started to migrate the last 10 apps, starting with Plex. Immediately after moving that to the community version, things got weird, mainly with communication to other network resources on the NAS, such as Pi-Hole. The same thing happened when I migrated the next app, so I figured I’d do them all and go to EE.

After doing the 2 step update, on first boot into EE, none of my apps were there, and the migration script didn’t appear to be running. I found and ran the command to kick it off, and it threw an error. After checking network, I noticed it didn’t like something with my bridge. So I rebooted. On this reboot the migration kicked off and I let it run. After it finished I saw there were a few apps missing, so I ran the migration script again, with a successful completion.

Here’s where it started getting (more) weird. As soon as I started my apps, I heard the chassis fans go to full speed and they didn’t stop. I checked the iLO and sure enough, every fan showed 100%. I decided to reboot. The fans dropped back to normal (~40%) as things spun down, but as soon as the system got to where the apps appeared to be deploying/starting, they went back to 100%. Temps are very close to what they were under Cobia, so that probably isn’t the issue. CPU utilization is between 5% and 8%, so it’s not being taxed. I still have 1 thread that always shows pegged at 100%, but this has been the case since a version or 2 prior to Cobia. I’m not sure what’s using that thread, but in top, qemu-system-x86 stays at the top of the list. Before the upgrade, it was something related to k8s. I was hoping that would go away with EE.
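In case it helps anyone chasing a similar pegged thread, here is a rough Python sketch for confirming which logical CPU is the one sitting at 100%. It samples /proc/stat twice and compares the counters, assuming the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The 5-second window and the 90% threshold are arbitrary picks of mine, not anything from TrueNAS.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each per-core 'cpuN' line."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr" or "ctxt".
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        values = [int(v) for v in parts[1:]]
        # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
        # Guest time is already counted inside user, so only the first 8 are summed.
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])
        stats[parts[0]] = (total - idle, total)
    return stats


def busiest_cpus(before, after, threshold=0.9):
    """List (cpu, utilization) pairs at or above threshold between two samples."""
    hot = []
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        delta_total = total1 - total0
        if delta_total <= 0:
            continue
        util = (busy1 - busy0) / delta_total
        if util >= threshold:
            hot.append((cpu, util))
    return sorted(hot, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(5)  # arbitrary sampling window
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for cpu, util in busiest_cpus(first, second):
        print(f"{cpu}: {util:.0%} busy")
```

This only tells you which logical CPU is busy, not which process owns it. Running top with -H (per-thread view) is the usual next step for narrowing it down to a specific thread.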

I’ve also noticed that on maybe 1 out of every 3 reboots, the apps service doesn’t start and the network is weird. Rebooting again fixes it. This seems like DNS, so I set my primary DNS to the Pi-Hole running on Proxmox. I’ll see if that fixes that part at least.

This is installed on an HP DL380 Gen9, with the most recent iLO firmware (2.82) and System ROM (p89v3.30) that is available.

  • 2x Intel(R) Xeon(R) CPU E5-2660 v4 (28 total cores, 56 threads)
  • 2x LSI 6Gb/s HBAs in IT Mode (one internal, the other for external)
  • 2x EMC KTN-STL3 15-Bay 3.5" disk shelves (about 40% populated)
  • 384GB RAM
  • OS installed on 2x 250GB SSDs mounted in the front bays.
  • Tesla P4 GPU

Update
As I was putting this post together over 20-30 minutes, the fans magically went back to somewhat normal, dropping to ~50% speed. I didn’t change anything. So it looks like this period of maxed fans lasted about 13 hours. No idea what changed, but I’m posting this anyway in case it comes back, or in case someone has suggestions of what to check. I’ve already typed all this out, and with my luck, the fans will ramp up again as soon as I delete it…

Thanks for reading!

2nd update:
I needed a reboot last night, and the fans have been running at 100% for the past 16 hours, starting right as the apps began deploying. Hopefully it’ll quiet down again by the morning. So something in my DL380 G9 setup doesn’t like EE.

I’m also fighting the apps service erroring out 3/4 of the time because it can’t determine the default interface. Since 22.x or earlier, my only connection to the network has been a single bridge containing both of my 10G links. This seems to be flaking out at bootup on a regular basis. Last night it took 7-8 reboots, plus removing one of the 10G links from the bridge, to get a working network stack. And I’m afraid to reboot again…
I’ll check the issues list/Jira to see if there is anything related on there.
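
For anyone hitting the same "can’t determine the default interface" error, a quick sanity check after boot is whether the kernel actually has an IPv4 default route and which interface it points at. Below is a minimal Python sketch that reads /proc/net/route for that. To be clear, this is not how the TrueNAS middleware makes its decision (I don’t know its logic); it only shows what the kernel routing table says. The br0 name in the usage note is a made-up example.

```python
def default_interfaces(route_table_text):
    """Return interface names that carry an IPv4 default route, lowest metric first."""
    candidates = []
    lines = route_table_text.strip().splitlines()
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue
        # Columns: Iface Destination Gateway Flags RefCnt Use Metric Mask ...
        iface, destination, metric, mask = fields[0], fields[1], fields[6], fields[7]
        flags = int(fields[3], 16)
        route_is_up = bool(flags & 0x1)  # RTF_UP
        # A default route has an all-zero destination and an all-zero mask.
        if destination == "00000000" and mask == "00000000" and route_is_up:
            candidates.append((int(metric), iface))
    return [iface for _, iface in sorted(candidates)]


if __name__ == "__main__":
    with open("/proc/net/route") as f:
        found = default_interfaces(f.read())
    if found:
        print("Default route via:", ", ".join(found))
    else:
        print("No IPv4 default route found")
```

On a healthy boot with a single bridge, I’d expect this to print just the bridge (for example br0). An empty result right after a bad boot would at least confirm the default route never came up.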

Thanks for reading.

I think we can mark this solved… Probably.

I’ve now migrated to newer Dell hardware, and I’m not having the issue on the R740XD. But I did notice, right after the new server arrived, that I could quiet the fans down on the DL380 after a reboot by resetting the iLO 4. So something in 24.10.0 and 24.10.1 caused the iLO to act up every time the Docker engine started.