Unfortunately, my main system also froze today. Three Incus VMs were running (idle). The system was perfectly stable on 24.10 and has since been upgraded to 25.04-RC.1.
During the freeze, the web UI and the VMs were unreachable. I could still ping the TrueNAS IP, though. Weirdly, pings to the VMs only got replies about 4 minutes(!) later, and only some of them.
I force-rebooted the machine after about 10 minutes, since it still hadn't recovered.
This was the only relevant thing in the syslog after the reboot, logged about two minutes after the freeze started (some other messages might not have been saved):
Apr 02 07:50:24 Prime kernel: INFO: task txg_sync:1264 blocked for more than 120 seconds.
Apr 02 07:50:24 Prime kernel: Tainted: P OE 6.12.15-production+truenas #1
Apr 02 07:50:27 Prime kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 02 07:50:31 Prime kernel: task:txg_sync state:D stack:0 pid:1264 tgid:1264 ppid:2 flags:0x00004000
Apr 02 07:50:31 Prime kernel: Call Trace:
Apr 02 07:50:31 Prime kernel: <TASK>
Apr 02 07:50:31 Prime kernel: __schedule+0x461/0xa10
Apr 02 07:50:31 Prime kernel: schedule+0x27/0xd0
Apr 02 07:50:31 Prime kernel: schedule_timeout+0x9e/0x170
Apr 02 07:50:31 Prime kernel: ? __pfx_process_timeout+0x10/0x10
Apr 02 07:50:31 Prime kernel: io_schedule_timeout+0x51/0x70
Apr 02 07:50:31 Prime kernel: __cv_timedwait_common+0x129/0x160 [spl]
Apr 02 07:50:31 Prime kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Apr 02 07:50:31 Prime kernel: __cv_timedwait_io+0x19/0x20 [spl]
Apr 02 07:50:31 Prime kernel: zio_wait+0x11a/0x240 [zfs]
Apr 02 07:50:31 Prime kernel: dsl_pool_sync+0xb9/0x410 [zfs]
Apr 02 07:50:31 Prime kernel: spa_sync_iterate_to_convergence+0xd8/0x200 [zfs]
Apr 02 07:50:31 Prime kernel: spa_sync+0x30a/0x600 [zfs]
Apr 02 07:50:31 Prime kernel: txg_sync_thread+0x1ec/0x270 [zfs]
Apr 02 07:50:31 Prime kernel: ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
Apr 02 07:50:31 Prime kernel: ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Apr 02 07:50:31 Prime kernel: thread_generic_wrapper+0x5a/0x70 [spl]
Apr 02 07:50:31 Prime kernel: kthread+0xcf/0x100
Apr 02 07:50:31 Prime kernel: ? __pfx_kthread+0x10/0x10
Apr 02 07:50:31 Prime kernel: ret_from_fork+0x31/0x50
Apr 02 07:50:31 Prime kernel: ? __pfx_kthread+0x10/0x10
Apr 02 07:50:31 Prime kernel: ret_from_fork_asm+0x1a/0x30
Apr 02 07:50:31 Prime kernel: </TASK>
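In case anyone wants to check their own logs for the same kind of message, here is a rough Python sketch that picks out the "blocked for more than N seconds" lines. It assumes the syslog line format shown in the trace above; the names `HungTask` and `find_hung_tasks` are made up for illustration and are not part of any TrueNAS tooling.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class HungTask(NamedTuple):
    """One hung-task warning pulled out of a syslog line (hypothetical helper type)."""
    timestamp: str
    task: str
    pid: int
    seconds: int


# Assumes lines shaped like:
#   "Apr 02 07:50:24 Prime kernel: INFO: task txg_sync:1264 blocked for more than 120 seconds."
HUNG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+INFO: task "
    r"(?P<task>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines: Iterable[str]) -> Iterator[HungTask]:
    """Yield one HungTask per matching line; every other line is skipped."""
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            yield HungTask(
                timestamp=match["ts"],
                task=match["task"],
                pid=int(match["pid"]),
                seconds=int(match["secs"]),
            )


if __name__ == "__main__":
    sample = [
        "Apr 02 07:50:24 Prime kernel: INFO: task txg_sync:1264 blocked for more than 120 seconds.",
        "Apr 02 07:50:31 Prime kernel: Call Trace:",
    ]
    for hit in find_hung_tasks(sample):
        print(hit)
```

Since the function takes any iterable of strings, it can also be fed an open file object for a saved log.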
I’ve had issues with my other test system freezing on RC.1 as well. I kinda hoped that was due to its unsupported install config, but this system has a dedicated NVMe SSD for booting TrueNAS and a separate one for VMs and Docker. The latter is also set as the “System Pool”.
Not sure if it’s the same issue as I’ve had on my HPE server: TrueNAS 25.04-RC.1 is Now Available! - #85 by TheJulianJES
This system is running an i7-7700K, a 500 GB WD Black NVMe SSD for booting TrueNAS, and a 1 TB Samsung 970 Evo for VMs + Docker + “System Pool”. Both SSDs have the latest firmware, are mostly empty, and “work fine”.
(Auto TRIM was off, but they’re trimmed every couple of weeks. No TRIM happened before or during the freeze.)
After the system was force restarted, it’s working just fine again.
I guess I should create a ticket about these freezes, now that it’s happened on two separate systems… both of which have been fine on all previous versions. Guessing this will be hard to debug.
Saved a debug file directly after I restarted the system.
Just curious, has anyone else experienced seemingly random freezes of the entire TrueNAS machine (maybe especially when running Incus VMs)? I kinda have a feeling that could be related.