HP ProLiant MicroServer Gen8 unstable on v25, OK on v24 and Debian 12: kernel issue?

TL;DR: An HPE ProLiant MicroServer Gen8 crashes with “Unrecoverable System Error (NMI)” on TrueNAS SCALE v25.04 but not on v24.10 or Debian 12. Different kernel = cause?

Hello,

For some time now I’ve been trying to get TrueNAS SCALE working on an HPE ProLiant MicroServer Gen8 (CPU: E3-1220L V2, RAM: 16GB PC3L 12800E, Memtest86+ OK) with an extra PCI Express 9211-8i SAS card (to extend the existing storage provided by the integrated HPE Dynamic Smart Array B120i controller).

I get “Unrecoverable System Error (NMI)” errors and the server reboots. The symptoms are:

The server reboots, the hardware “Health LED” blinks red, and the iLO’s “Integrated Management Log” (BMC tool) page says:

  • Class: System Error, Description: Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible
  • Class: OS, Description: User Initiated NMI Switch
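
(Side note: the same events can usually also be read from the OS side through iLO’s IPMI interface, if it is enabled. A minimal sketch, assuming ipmitool is installed; on iLO machines the System Event Log largely mirrors the IML:)

```
# Load the IPMI kernel drivers, then dump the System Event Log (SEL)
modprobe ipmi_si ipmi_devintf
ipmitool sel elist
```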

I’ll detail all my incremental attempts below, but I am at a point where it looks like it works with v24.10 “Electric Eel” but fails with v25.04 “Fangtooth”.

On my last attempt, I managed to have it running for a full week (without any crash/reboot) on TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) (thanks to CertainBumblebee769 on Reddit),
whereas the previous attempt on TrueNAS-SCALE v25.04.2.3 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 20 13:31:09 UTC 2025 x86_64 GNU/Linux) failed after 4 days with a crash/reboot.

It took me a while because I suspected my PCI Express 9211-8i SAS card to be faulty (I also had to downgrade its firmware), or the SSD support, so I’ve tested with/without the card, with/without SSDs, on vanilla Debian (kernel v6.1.140-1) and on TrueNAS v24/v25.

Those 2 tests are without my SAS card; I’m currently testing with the SAS card, then I’ll add the storage HDDs.

If it works fine, I’ll have built a working v24.10 setup, but I’d like to have a v25.04 one :sweat_smile:.

Let’s say it works: the issue would then most likely be kernel-related? Is it possible to run TrueNAS SCALE v25.04 on a v6.6 kernel? Or on a version between 6.6 and 6.12 (to find the latest working one)?

Thanks

Recap:

  • TrueNAS SCALE v25.04.2.3 runs v6.12.15 (not working)
  • TrueNAS SCALE v24.10.2.4 runs v6.6.44 (working)
  • Debian v12.11 runs v6.1.140-1 (working)

Server firmware/BIOS are up-to-date:

  • System ROM: J06 04/04/2019
  • System ROM Date: 04/04/2019
  • Backup System ROM: J06 11/02/2015
  • iLO Firmware Version: 2.82 Feb 06 2023
  • Server Platform Services (SPS) Firmware: 2.2.0.31.2
  • System Programmable Logic Device: Version 0x06
  • System ROM Bootblock: 02/04/2012
  • Embedded Flash/SD-CARD: Controller firmware revision 2.10.00
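
(Those values are from the iLO pages; to cross-check them from a running Linux system, a quick sketch using the standard dmidecode tool, run as root:)

```
dmidecode -s bios-version       # System ROM family, expected: J06
dmidecode -s bios-release-date  # expected: 04/04/2019
```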
Here are all my incremental attempts (:wrench: highlights the change):
  • Test #1
    • Setup:
      • One 3.5" HDD on B120i
      • No 9211-8i PCIe SAS card
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s HDD
    • Duration: 3 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #2
    • Setup:
      • One 2.5" (:wrench:) HDD on B120i
      • No 9211-8i PCIe SAS card
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s HDD
    • Duration: 3 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #3
    • Setup:
      • One 2.5" HDD on B120i
      • 9211-8i PCIe SAS card inserted (:wrench:)
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s HDD
    • Duration: 3 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #4
    • Setup:
      • One 2.5" HDD on B120i
      • 9211-8i PCIe SAS card inserted
      • One 3.5" HDD powered and SATA-connected to the PCIe SAS card (:wrench:)
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s HDD
    • Duration: 3 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #5
    • Setup:
      • One 2.5" SSD (:wrench:) on B120i
      • 9211-8i PCIe SAS card inserted
      • One 3.5" HDD powered and SATA-connected to the PCIe SAS card
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s SSD
    • Duration: 3 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #6
    • Setup:
      • One 2.5" SSD on B120i
      • 9211-8i PCIe SAS card inserted
      • Four (:wrench:) 3.5" HDDs powered and SATA-connected to the PCIe SAS card
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s SSD
    • Duration: Was OK while idle, but failed when it started to process data on those HDDs (disk I/O)
    • Verdict: :red_circle: kernel errors (“kernel: DMAR: ERROR: DMA PTE for vPFN 0xf1f80 already set (to f1f80003 not 120d5c001)”), No reboot
  • Test #6a
    • Setup:
      • One 2.5" SSD on B120i
      • 9211-8i PCIe SAS card inserted
      • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card
      • Debian 12.11 (kernel v6.1.140-1) installed on B120i’s SSD
      • Added intel_iommu=off to GRUB’s GRUB_CMDLINE_LINUX_DEFAULT (source) (:wrench:) (see the GRUB sketch after this test list)
    • Duration: (Sadly, I didn’t write it down)
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #7
    • Setup:
      • Two 2.5" SSDs on B120i
      • 9211-8i PCIe SAS card inserted
      • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card
      • TrueNAS SCALE v25.04.2.3 (:wrench:) (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 20 13:31:09 UTC 2025 x86_64 GNU/Linux) installed on SSDs
      • ZFS Data-pool on the 9211-8i HDDs (:wrench:)
    • Duration: 42 hours
    • Verdict: :red_circle: NMI errors, Server reboot
  • Test #8
    • Setup:
      • One (:wrench:) SSD on B120i
      • 9211-8i PCIe SAS card inserted
      • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card
      • TrueNAS-SCALE v25.04.2.3 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 20 13:31:09 UTC 2025 x86_64 GNU/Linux) installed on SSD
      • ZFS Data-pool on 4 9211-8i HDDs
      • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied (:wrench:)
    • Duration: 19 hours
    • Verdict: :red_circle: NMI errors, Server reboot
  • Test #9
    • Setup:
      • One SSD on B120i
      • No 9211-8i PCIe SAS card (:wrench:)
      • Four 3.5" HDDs powered but not SATA-connected (:wrench:)
      • TrueNAS-SCALE v25.04.2.3 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 20 13:31:09 UTC 2025 x86_64 GNU/Linux) installed on SSD
      • ZFS Data-pool on 4 HDDs, but offline
      • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied
    • Duration: 4 days and 5 hours
    • Verdict: :red_circle: NMI errors, Server reboot
  • Test #10
    • Setup:
      • Two SSDs on B120i
      • No 9211-8i PCIe SAS card
      • No HDD
      • TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) installed on SSDs
    • Duration: 7 days
    • Verdict: :green_circle: No crash, no reboot, no NMI error
  • Test #11
    • Setup:
      • Two SSDs on B120i
      • 9211-8i PCIe SAS card inserted (:wrench:)
      • No HDD
      • TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) installed on SSDs
      • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied (:wrench:)
    • Duration: ? (pending)
    • Verdict: ? (pending)
  • Test #12
    • Setup:
      • Two SSDs on B120i
      • 9211-8i PCIe SAS card inserted
      • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card (:wrench:)
      • TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) installed on SSDs
      • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied
    • Duration: ? (not started)
    • Verdict: ? (not started)
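
For reference, a minimal sketch of the GRUB change described in Test #6a (assuming a stock, single-line GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub):

```
# Append intel_iommu=off to the default kernel command line,
# then regenerate /boot/grub/grub.cfg
sudo sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 intel_iommu=off"/' /etc/default/grub
sudo update-grub
# After the next reboot, confirm it reached the kernel:
grep -o 'intel_iommu=[a-z]*' /proc/cmdline
```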

Hi,

I have a Gen8 MicroServer with the same Celeron and 16GB of ECC memory; I do not remember which brand/type, likely Crucial or Kingston. I bought it later, so it is not HPE.

The firmware versions you have listed are the newest, AFAIK.

It boots via an SD card with GRUB pointing to an SSD on the ODD SATA port. 4 HDDs are connected to the onboard SATA in AHCI mode; no PCIe card in use.

I updated it to the newest version of Fangtooth, 25.04.2.4, just now, from 25.04.1.

It has always been a kind of cold-storage backup system for me, so it has never been running for several days on end; although it is rather old hardware, it has never clocked that kind of uptime.

Not sure; wouldn’t a kernel issue prevent it from working at all?

Edit:

Why was that? Usually one has to make sure to use the latest versions?

And those types of cards can get quite hot; they do need airflow.

My 2x 8GB of RAM are SK Hynix ones, with ECC.

It looks like we have the same setup, except for the boot device, where I use the B120i.

Because it was running v7.39.02.00, which is said to be incompatible with the Gen8; v7.39.00.00 seems to be the last good one.

This I don’t know; your assumption looks fairly right, but I don’t really know. I guess some issues might only trigger in special/rare cases?

Ah, strictly speaking, you do not need the BIOS, just the HBA flashed with the IT-mode P.20 firmware. The BIOS is for configuring the disks, but we want the HBA to just give TrueNAS full access so that it can do its thing. I think I remember something about some P.20 versions being buggy.


Or just under load, if the drivers are buggy?

My Gen8 has been running for 25 hours now without any errors or reboots.

Quick update:

My 11th test ended without issue:

Test #11

  • Two SSDs on B120i
  • 9211-8i PCIe SAS card inserted (:wrench:)
  • No HDD
  • TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) installed on SSDs
  • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied (:wrench:)

From 2025-09-11 22:59:11 to 2025-09-22 10:06

Duration: 10 days and 11 hours

Verdict: :green_circle: No crash, no reboot, no NMI error
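
(For the record, the applied option can be double-checked on the running system. A small sketch; it assumes system.advanced.config exposes the kernel_extra_options field and that jq is available, which seems to be the case on SCALE:)

```
# What the kernel actually booted with:
grep -o 'intel_iommu=[a-z]*' /proc/cmdline
# What the TrueNAS middleware has stored:
midclt call system.advanced.config | jq -r '.kernel_extra_options'
```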

My last test (12th) is still running (for 27 days now):

Test #12

  • Two SSDs on B120i
  • 9211-8i PCIe SAS card inserted
  • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card (:wrench:)
  • TrueNAS-SCALE v24.10.2.4 (Linux truenas 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug 6 20:07:31 UTC 2025 x86_64 GNU/Linux) installed on SSDs
  • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied
    (both “intel_iommu=on” and “intel_iommu=off” are present, in this order, in /boot/grub/grub.cfg)

From 2025-09-22 12:07:30 to ???

I’ve found this bug report on Debian for the ProLiant Gen8:

It suggests adding intel_idle.max_cstate=2 to the kernel command line.
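
(Before and after applying it, the C-states exposed by the intel_idle driver can be inspected through sysfs; a sketch, assuming the usual layout:)

```
# List the idle states the driver offers on CPU 0
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# Show the current intel_idle max_cstate cap
cat /sys/module/intel_idle/parameters/max_cstate
```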

I’ll try upgrading my v24.10 to v25.04 (or v25.10) and test it.

Sorry I didn’t spot your posts on this earlier! This problem’s been around for quite a while now, and is well known.

You can fix it by applying the “intel_iommu=off” flag to the kernel, or turning off a couple of Intel virtualisation features in the BIOS:

My “Test #12” setup (running ElectricEel-24.10.2.4, with intel_iommu=off visible in /proc/cmdline) had been running for about 89 days, so I’ve updated it to v25.04.2.6 using the WebUI.

Test #13

  • Two SSDs on B120i

  • 9211-8i PCIe SAS card inserted

  • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card

  • TrueNAS-SCALE v25.04.2.6 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Oct 29 14:40:06 UTC 2025 x86_64 GNU/Linux) installed on SSDs (:wrench:)

  • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied
    (both “intel_iommu=on” and “intel_iommu=off” are present, in this order, in /boot/grub/grub.cfg)

So far it’s been running fine for 2 days :crossed_fingers:.

I need to double check, but I’m pretty sure I did not disable “Intel Virtualisation Technology” and “Intel VT-d” in the BIOS.

There is also another possible solution: setting intel_idle.max_cstate to 2.

Source:

Test #13 (TrueNAS-SCALE v25.04.2.6) crashed after only 2 days and 8 hours.

From 2025-12-20 17:34:37 to 2025-12-23 11:54:40

Duration: 2 days and 8 hours

Verdict: :red_circle: NMI errors, Server reboot


So I’m trying the intel_idle.max_cstate=2 fix:

Test #14

  • Two SSDs on B120i

  • 9211-8i PCIe SAS card inserted

  • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card

  • TrueNAS-SCALE v25.04.2.6 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Oct 29 14:40:06 UTC 2025 x86_64 GNU/Linux) installed on SSDs

  • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off intel_idle.max_cstate=2"}' applied (:wrench:)
    (both “intel_iommu=on” and “intel_iommu=off” are present, in this order, in /boot/grub/grub.cfg)

No luck: Test #14 (TrueNAS-SCALE v25.04.2.6) rebooted. iLO mentions a “User Initiated NMI Switch” but no “Unrecoverable System Error (NMI) has occurred. System Firmware will log additional details in a separate IML entry if possible”.

Test #14 (TrueNAS-SCALE v25.04.2.6) crashed after only 3 days and 4 hours.

From 2025-12-23 02:09:31 to 2025-12-26 06:25:00

Duration: 3 days and 4 hours

Verdict: :red_circle: Server reboot


I am now testing with intel_iommu=off but without the default intel_iommu=on (from /etc/default/grub.d/truenas.cfg) and without intel_idle.max_cstate=2 (which might not be the right value anyway, as grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name only lists POLL, C1, C1E, C3 and C6 for my Intel Xeon E3-1220L V2 CPU):

Test #15

  • Two SSDs on B120i

  • 9211-8i PCIe SAS card inserted

  • Four 3.5" HDDs powered and SATA-connected to the PCIe SAS card

  • TrueNAS-SCALE v25.04.2.6 (Linux truenas 6.12.15-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Oct 29 14:40:06 UTC 2025 x86_64 GNU/Linux) installed on SSDs

  • Fix midclt call system.advanced.update '{"kernel_extra_options": "intel_iommu=off"}' applied (:wrench:)
    (“intel_iommu=on” removed from /boot/grub/grub.cfg using sed -i -E 's,(\Wlinux\W/ROOT/.*) intel_iommu=on(.*$),\1\2,' /boot/grub/grub.cfg)
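
A check to run after the reboot (a sketch; note that, AFAIK, duplicate intel_iommu= occurrences are parsed in order so the last one wins, and that /boot/grub/grub.cfg is regenerated on updates, meaning this manual sed edit will not survive an upgrade):

```
# Confirm that only intel_iommu=off reached the kernel this time
tr ' ' '\n' </proc/cmdline | grep '^intel_iommu='
```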