After upgrading from 24.10.2 to 25.04-RC.1, my TrueNAS server keeps hanging every morning

Hello!

On 24.10.2, I had almost 2 months of uptime.

After upgrading to 25.04-RC.1, my TrueNAS hangs every morning:

Should I return to 24.10.2? :frowning:

Thanks!

After searching on Google, I found the following solutions, with some small variations between them:

ethtool -K eno1 tx off rx off
ethtool -K eno1 tx off rx off gso off gro off tso off
ethtool -K eno1 tso off gso off

I’m not sure which one I should use.
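
Before picking one, it can help to see which offloads are currently enabled. This is a minimal sketch, assuming the interface is eno1 as in the commands above. offload_summary is a made-up helper name, and the feature names are the long forms that ethtool -k (lowercase k) prints for the rx, tx, tso, gso and gro shorthands:

```shell
# Keep only the top-level offload features that the commands above toggle.
# Reads `ethtool -k` output on stdin; sub-features (tab-indented) are skipped.
offload_summary() {
    grep -E '^(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload):'
}

# Usage on the real system (needs ethtool and the eno1 interface):
#   sudo ethtool -k eno1 | offload_summary
```

Running it before and after an ethtool -K command shows exactly which features that variant changed.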

You should provide more details about your system as it can make a big difference in the answer. Please refer to the link below called Joe’s Rules.

As for rolling back, that is probably the soundest option. However, if the hang is repeatable, you should file a Jira ticket through the TrueNAS GUI to report the failure.

Those commands look like offload settings for the NIC. Not sure which one, if any, will help you.

  1. First, file that Jira ticket.
  2. “Clone” your boot environment and make that clone Active, then reboot.
  3. Make any changes to the cloned boot environment. If it fails to work or you cause more damage, you can just boot up to the previous boot environment and make it active again.

This is the safest way to test these commands out.

Good luck.

admin@truenas[~]$ sudo ethtool -i eno1
driver: e1000e
version: 6.12.15-production+truenas
firmware-version: 0.2-4
expansion-rom-version: 
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
admin@truenas[~]$ sudo lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 0a)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10)
00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
admin@truenas[~]$ sudo lspci -v
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 0a)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] 8th Gen Core Processor Host Bridge/DRAM Registers
        Flags: bus master, fast devsel, latency 0
        Capabilities: [e0] Vendor Specific Information: Len=10 <?>
        Kernel driver in use: skl_uncore
        Kernel modules: ie31200_edac

00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] (prog-if 00 [VGA controller])
        DeviceName: Onboard - Video
        Subsystem: Micro-Star International Co., Ltd. [MSI] CoffeeLake-S GT2 [UHD Graphics 630]
        Flags: bus master, fast devsel, latency 0, IRQ 124
        Memory at a0000000 (64-bit, non-prefetchable) [size=16M]
        Memory at 90000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 3000 [size=64]
        Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
        Capabilities: [40] Vendor Specific Information: Len=0c <?>
        Capabilities: [70] Express Root Complex Integrated Endpoint, MSI 00
        Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [100] Process Address Space ID (PASID)
        Capabilities: [200] Address Translation Service (ATS)
        Capabilities: [300] Page Request Interface (PRI)
        Kernel driver in use: i915
        Kernel modules: i915

00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
        Flags: fast devsel, IRQ 255
        Memory at a103a000 (64-bit, non-prefetchable) [disabled] [size=4K]
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [dc] Power Management version 2
        Capabilities: [f0] PCI Advanced Features

00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH Thermal Controller
        Flags: fast devsel, IRQ 16
        Memory at a1039000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 3
        Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
        Kernel driver in use: intel_pch_thermal
        Kernel modules: intel_pch_thermal

00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10) (prog-if 30 [XHCI])
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH USB 3.1 xHCI Host Controller
        Flags: bus master, medium devsel, latency 0, IRQ 121
        Memory at a1020000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [70] Power Management version 2
        Capabilities: [80] MSI: Enable+ Count=1/8 Maskable- 64bit+
        Capabilities: [90] Vendor Specific Information: Len=14 <?>
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci

00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Lake PCH Shared SRAM
        Flags: fast devsel
        Memory at a1032000 (64-bit, non-prefetchable) [disabled] [size=8K]
        Memory at a1038000 (64-bit, non-prefetchable) [disabled] [size=4K]
        Capabilities: [80] Power Management version 3

00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH HECI Controller
        Flags: bus master, fast devsel, latency 0, IRQ 123
        Memory at a1037000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [50] Power Management version 3
        Capabilities: [8c] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [a4] Vendor Specific Information: Len=14 <?>
        Kernel driver in use: mei_me
        Kernel modules: mei_me

00:17.0 SATA controller: Intel Corporation Cannon Lake PCH SATA AHCI Controller (rev 10) (prog-if 01 [AHCI 1.0])
        DeviceName: Onboard - SATA
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH SATA AHCI Controller
        Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 122
        Memory at a1030000 (32-bit, non-prefetchable) [size=8K]
        Memory at a1036000 (32-bit, non-prefetchable) [size=256]
        I/O ports at 3090 [size=8]
        I/O ports at 3080 [size=4]
        I/O ports at 3060 [size=32]
        Memory at a1035000 (32-bit, non-prefetchable) [size=2K]
        Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [70] Power Management version 3
        Capabilities: [a8] SATA HBA v1.0
        Kernel driver in use: ahci
        Kernel modules: ahci

00:1f.0 ISA bridge: Intel Corporation Z390 Chipset LPC/eSPI Controller (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Z390 Chipset LPC/eSPI Controller
        Flags: bus master, medium devsel, latency 0

00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH SMBus Controller
        Flags: medium devsel, IRQ 16
        Memory at a1034000 (64-bit, non-prefetchable) [size=256]
        I/O ports at efa0 [size=32]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801

00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
        DeviceName: Onboard - Other
        Subsystem: Micro-Star International Co., Ltd. [MSI] Cannon Lake PCH SPI Controller
        Flags: fast devsel
        Memory at fe010000 (32-bit, non-prefetchable) [size=4K]

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
        DeviceName: Onboard - Ethernet
        Subsystem: Micro-Star International Co., Ltd. [MSI] Ethernet Connection (7) I219-V
        Flags: bus master, fast devsel, latency 0, IRQ 120
        Memory at a1000000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: e1000e
        Kernel modules: e1000e
admin@truenas[~]$ sudo ifconfig -a
br0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.8  netmask 255.255.255.0  broadcast 10.0.0.255
        inet6 fd98:e1d2:7c83:0:2002:59ff:febf:d90c  prefixlen 64  scopeid 0x0<global>
        inet6 2804:1b2:9500:5838:2002:59ff:febf:d90c  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::dcd7:d8ff:fefe:7317  prefixlen 64  scopeid 0x20<link>
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 8355492  bytes 12044168172 (11.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5966443  bytes 81461499093 (75.8 GiB)
        TX errors 0  dropped 16 overruns 0  carrier 0  collisions 0

eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::2d8:61ff:fe0e:f637  prefixlen 64  scopeid 0x20<link>
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 12474157  bytes 15221788861 (14.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6564442  bytes 2260505979 (2.1 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xa1000000-a1020000  

incusbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 10.162.23.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fd42:fa96:99d6:155f::1  prefixlen 64  scopeid 0x0<global>
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 88 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 72513  bytes 45960468 (43.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 72513  bytes 45960468 (43.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tailscale0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1280
        inet yyy.yyy.yyy.yyy  netmask 255.255.255.255  destination yyy.yyy.yyy.yyy
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
        RX packets 151048  bytes 8044146 (7.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 207688  bytes 328009213 (312.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tapda261bbb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
        RX packets 2035808  bytes 1517024387 (1.4 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4006434  bytes 83672060940 (77.9 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethe57e653: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::88ce:8cff:fe44:bfd  prefixlen 64  scopeid 0x20<link>
        ether xx:xx:xx:xx:xx:xx  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1297  bytes 614353 (599.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

I just did ethtool -K eno1 tso off gso off. No problem… so far.
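
Note that settings changed with ethtool -K are not stored anywhere and are lost on reboot. Here is a rough sketch of a script that re-applies them, assuming it gets registered as a post-init script (for example under System > Advanced Settings > Init/Shutdown Scripts; that location is an assumption, not something confirmed in this thread). The function name and the ETHTOOL and IFACE variables are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: re-apply the offload workaround after each boot.
IFACE="${IFACE:-eno1}"          # interface name used in this thread
ETHTOOL="${ETHTOOL:-ethtool}"   # overridable, e.g. ETHTOOL=echo for a dry run

disable_offloads() {
    # Same flags as the command above: turn off TSO and GSO only.
    "$ETHTOOL" -K "$IFACE" tso off gso off
}

# In the real post-init script, call the function as the last line:
#   disable_offloads
```

Setting ETHTOOL=echo prints the command instead of running it, which is a cheap way to check the script before trusting it at boot.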

If it lasts 1 week, then you likely fixed it. I say 1 week because 1 day is not enough to positively state it is fixed.

Let us know how it turns out.

The command disables hardware offload, so expect somewhat lower performance, but otherwise it should be safe. The question is whether the driver in the new version started to use some functionality that is buggy in hardware, or whether it is a regression in the kernel driver itself. Quite likely it is model-specific, so I’d search for feedback specific to the model and the 6.12 kernel version.

@joeschmuck @mav, ok, I’m gonna wait 1 week to see if this fixed it.

Anyway: 118721 – e1000e hardware unit hangs when TSO is on

From comment #11 onwards, the reports are identical to mine: everything worked fine on kernel 6.6, but the problems started after updating to kernel 6.12.
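
To check whether a machine is hitting the same bug, one option is to look for the e1000e hang message in the kernel log of the boot that hung. This is a sketch, assuming the driver logged its usual "Detected Hardware Unit Hang" line (this thread does not show the actual log) and that the journal from the previous boot was kept. count_unit_hangs is a made-up helper name:

```shell
# Count e1000e "unit hang" messages in kernel log text read from stdin.
count_unit_hangs() {
    # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
    grep -c 'Detected Hardware Unit Hang' || true
}

# Usage on the real system, against the kernel log of the previous boot:
#   sudo journalctl -k -b -1 | count_unit_hangs
```

A non-zero count from the boot that hung would point at the same driver problem; a zero does not rule it out if the log was not written to disk before the hang.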

Ticket created: Jira

Thanks, guys!

@protoman It would be good to bump the mentioned Linux kernel issue to keep them aware. We’d be happy to include whatever upstream patches it produces, but we have no realistic means to debug it ourselves.

Uptime: 6 days 17 hours 2 minutes as of 09:26.

I think the problem is solved…
