Truenas Scale 24.10.1 -- pool fails to load

Mike_Horist · January 24, 2025, 7:55pm

Hello,

I have an IX Systems Mini 3.0 X+ running TrueNAS Scale 24.10.1 (upgraded in December , no problems)

I’ve been moving large files (10+ Terabytes of files) between data mounts in the single large pool when I ran out of space (I have lots of space)

I found out that it was the snapshots eating up all the space, so I removed them from the appropriate (/mnt/Spinner/Media and /mnt/Spinner/Media_Jellyfin ) points in the web UI.

The web UI wouldn’t do anything (yes I held down shift before hitting the ok button after clicking the “confirm” checkbox) , so I restarted from the Web UI .

When I woke up this morning it was back up and running so I continued to transfer files but noticed it was VERY slow (it would transfer a large chunk then drop to 0 bytes/second for a while)

I restarted again from the Web UI and since then none of the following works:

WebUI : won’t load the Storage tab and the Data tab shows nothing (which makes sense if the pool isn’t loading)
Web UI: If I click on disks , it can’t load the plugin
WebUI: only widget that will load on the front page are “CPU Model” and “System Information”
CLI: zpool status hangs
CLI: zfs list hangs

I am getting a bunch of info on bootup (see attachments) indicating
“task xyz blocked for more than xxx seconds” , which includes middlewared, zpool, txg_sync, etc… , which causes it to take about 20 mins to boot on restart now.

I’ve left the system running for 6 hours and I see activity going on with hte hard drives, and during boot it finds all 7 (5 spinners, 2 SSDs) , and I do notice a rather large IOWait when i run sar 1 1 (I’m watching it on the display running non stop and I can see it go up to 70% disk i/o wait , then drops down to 11% then comes back up)

I’ve checked dmesg and nothing stands out, also /var/log/messages doesn’t show much.

I’m seeing messages in either the daemon.log or net_api.log file talking about trying to start services.container .

Now when I woke up this morning, as mentioned I could access the shares and was able to delete the large snapshots , but then after another reboot (not forcefully) in the WEB UI , it is having issues with the pool.

Anyone have any suggestions or are there other logs besides /var/log/messages and dmesg I should check to see what exactly the disk i/o is doing?

Is there a log or a command (obviously nothing that’s zfs or zpool as those will just hang , even zpool history and zpool status hang) I can run to see what it is doing on the hard drives ? (is it trying to recover the RAID for some reason? Is it running fsck on the large pool )

Any assistance is appreciated

Mike_Horist · January 24, 2025, 8:13pm

error log:
Jan 24 13:18:15 truenas blkmapd[2102]: exit on signal(15)
Jan 24 13:18:15 truenas upsd[2467]: mainloop: Interrupted system call
Jan 24 13:19:45 truenas systemd[1]: Failed unmounting data.mount - /data.
Jan 24 13:19:45 truenas systemd[1]: Failed unmounting var-log-journal.mount.
Jan 24 13:19:46 truenas systemd[1]: Failed unmounting var-log.mount.
Jan 24 13:19:46 truenas systemd[1]: Failed unmounting var.mount.
Jan 24 13:30:23 truenas kernel: Error: Driver ‘pcspkr’ is already registered, aborting…
Jan 24 13:32:37 truenas systemd[1]: Failed to start ix-etc.service - Generate TrueNAS /etc files.
Jan 24 13:34:40 truenas systemd[1]: Failed to start ix-netif.service - Setup TrueNAS Network.
Jan 24 13:36:02 truenas kernel: INFO: task middlewared (wo:1293 blocked for more than 120 seconds.
Jan 24 13:36:02 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:36:02 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:36:02 truenas kernel: INFO: task zpool:1406 blocked for more than 121 seconds.
Jan 24 13:36:02 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:36:02 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:36:03 truenas kernel: INFO: task txg_sync:1690 blocked for more than 121 seconds.
Jan 24 13:36:03 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:36:03 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:38:05 truenas kernel: INFO: task middlewared (wo:1293 blocked for more than 243 seconds.
Jan 24 13:38:05 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:38:05 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:38:05 truenas kernel: INFO: task zpool:1406 blocked for more than 243 seconds.
Jan 24 13:38:05 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:38:05 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:38:05 truenas kernel: INFO: task txg_sync:1690 blocked for more than 244 seconds.
Jan 24 13:38:06 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:38:06 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:40:08 truenas kernel: INFO: task middlewared (wo:1293 blocked for more than 366 seconds.
Jan 24 13:40:08 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:40:08 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:40:08 truenas kernel: INFO: task zpool:1406 blocked for more than 366 seconds.
Jan 24 13:40:08 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:40:08 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:42:09 truenas kernel: INFO: task middlewared (wo:1293 blocked for more than 487 seconds.
Jan 24 13:42:09 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:42:09 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:42:09 truenas kernel: INFO: task zpool:1406 blocked for more than 487 seconds.
Jan 24 13:42:09 truenas kernel: Tainted: P OE 6.6.44-production+truenas #1
Jan 24 13:42:09 truenas kernel: “echo 0 > /proc/sys/kernel/hung_task_timeout_secs” disables this message.
Jan 24 13:46:37 truenas systemd[1]: Failed to start ix-zfs.service - Import ZFS pools.
Jan 24 13:46:37 truenas blkmapd[2003]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Jan 24 13:46:37 truenas nfsdcld[2024]: Failed to open main database: -2
Jan 24 13:46:39 truenas systemd[1]: Failed to start nfs-mountd.service - NFS Mount Daemon.
Jan 24 13:46:43 truenas systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking…
Jan 24 13:46:47 truenas systemd[1]: Failed to start winbind.service - Samba Winbind Daemon.
Jan 24 13:48:13 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:49:49 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:51:24 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:53:00 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:54:35 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:56:11 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.
Jan 24 13:57:46 truenas systemd[1]: Failed to start containerd.service - containerd container runtime.

in message I see alot of stack dumps for middlewared:

Jan 24 13:38:05 truenas kernel: task:middlewared (wo state:D stack:0 pid:1293 ppid:1165 flags:0x00004002
Jan 24 13:38:05 truenas kernel: Call Trace:
Jan 24 13:38:05 truenas kernel:
Jan 24 13:38:05 truenas kernel: __schedule+0x349/0x950
Jan 24 13:38:05 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:38:05 truenas kernel: schedule_preempt_disabled+0x15/0x30
Jan 24 13:38:05 truenas kernel: __mutex_lock.constprop.0+0x399/0x700
Jan 24 13:38:05 truenas kernel: cv_wait_common+0xc7/0x130 [spl]
Jan 24 13:38:05 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:38:05 truenas kernel: spa_lookup+0x51/0x140 [zfs]
Jan 24 13:38:05 truenas kernel: spa_open_common+0x79/0x440 [zfs]
Jan 24 13:38:05 truenas kernel: spa_get_stats+0x4e/0x210 [zfs]
Jan 24 13:38:05 truenas kernel: ? spl_kmem_alloc_impl+0xb4/0xf0 [spl]
Jan 24 13:38:05 truenas kernel: zfs_ioc_pool_stats+0x40/0x90 [zfs]
Jan 24 13:38:05 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:38:05 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:38:05 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:38:05 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:38:05 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:38:05 truenas kernel: ? kmem_cache_free+0x20/0x380
Jan 24 13:38:05 truenas kernel: ? __call_rcu_common.constprop.0+0xe5/0x6b0
Jan 24 13:38:05 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:38:05 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:38:05 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:38:05 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:38:05 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:38:05 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:38:05 truenas kernel: ? __sysvec_apic_timer_interrupt+0x50/0x140
Jan 24 13:38:05 truenas kernel: ? __irq_exit_rcu+0x3b/0xc0
Jan 24 13:38:05 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:38:05 truenas kernel: RIP: 0033:0x7efd1634ec5b
Jan 24 13:38:05 truenas kernel: RSP: 002b:00007ffd0b4c6790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:38:05 truenas kernel: RAX: ffffffffffffffda RBX: 00000000038d5d30 RCX: 00007efd1634ec5b
Jan 24 13:38:05 truenas kernel: RDX: 00007ffd0b4c6810 RSI: 0000000000005a05 RDI: 000000000000001d
Jan 24 13:38:05 truenas kernel: RBP: 00007ffd0b4c9e00 R08: 000000000394ae40 R09: 00007efd16423d80
Jan 24 13:38:05 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd0b4c6810
Jan 24 13:38:05 truenas kernel: R13: 00000000038d5d30 R14: 0000000004455310 R15: 00007ffd0b4c9e24
Jan 24 13:38:05 truenas kernel:
Jan 24 13:38:05 truenas kernel: task:zpool state:D stack:0 pid:1406 ppid:1376 flags:0x00004002
Jan 24 13:38:05 truenas kernel: Call Trace:
Jan 24 13:38:05 truenas kernel:
Jan 24 13:38:05 truenas kernel: __schedule+0x349/0x950
Jan 24 13:38:05 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:38:05 truenas kernel: io_schedule+0x46/0x70
Jan 24 13:38:05 truenas kernel: cv_wait_common+0xaa/0x130 [spl]
Jan 24 13:38:05 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:38:05 truenas kernel: txg_wait_synced_impl+0xc0/0x110 [zfs]
Jan 24 13:38:05 truenas kernel: txg_wait_synced+0x10/0x40 [zfs]
Jan 24 13:38:05 truenas kernel: spa_config_update+0x43/0x160 [zfs]
Jan 24 13:38:05 truenas kernel: spa_import+0x5d6/0x690 [zfs]
Jan 24 13:38:05 truenas kernel: zfs_ioc_pool_import+0x15b/0x180 [zfs]
Jan 24 13:38:05 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:38:05 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:38:05 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:38:05 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:38:05 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:38:05 truenas kernel: ? exc_page_fault+0x77/0x170
Jan 24 13:38:05 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:38:05 truenas kernel: RIP: 0033:0x7f11f42c3c5b
Jan 24 13:38:05 truenas kernel: RSP: 002b:00007ffe7246f300 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:38:05 truenas kernel: RAX: ffffffffffffffda RBX: 0000560caae35960 RCX: 00007f11f42c3c5b
Jan 24 13:38:05 truenas kernel: RDX: 00007ffe7246f3c0 RSI: 0000000000005a02 RDI: 0000000000000003
Jan 24 13:38:05 truenas kernel: RBP: 00007ffe72473ab0 R08: 00007f11f4399450 R09: 00007f11f4399450
Jan 24 13:38:05 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000560caad10dd0
Jan 24 13:38:05 truenas kernel: R13: 00007ffe7246f3c0 R14: 00007f11e4004cb0 R15: 0000560caad3fec0
Jan 24 13:38:05 truenas kernel:
Jan 24 13:38:06 truenas kernel: task:txg_sync state:D stack:0 pid:1690 ppid:2 flags:0x00004000
Jan 24 13:38:06 truenas kernel: Call Trace:
Jan 24 13:38:06 truenas kernel:
Jan 24 13:38:06 truenas kernel: __schedule+0x349/0x950
Jan 24 13:38:06 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:38:06 truenas kernel: schedule_timeout+0x98/0x160
Jan 24 13:38:06 truenas kernel: ? __pfx_process_timeout+0x10/0x10
Jan 24 13:38:06 truenas kernel: io_schedule_timeout+0x50/0x80
Jan 24 13:38:06 truenas kernel: __cv_timedwait_common+0x12a/0x160 [spl]
Jan 24 13:38:06 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:38:06 truenas kernel: __cv_timedwait_io+0x19/0x20 [spl]
Jan 24 13:38:06 truenas kernel: zio_wait+0x124/0x240 [zfs]
Jan 24 13:38:06 truenas kernel: dsl_process_async_destroys+0x326/0x580 [zfs]
Jan 24 13:38:06 truenas kernel: dsl_scan_sync+0x1e1/0x9b0 [zfs]
Jan 24 13:38:06 truenas kernel: ? list_head+0x9/0x30 [zfs]
Jan 24 13:38:06 truenas kernel: spa_sync_iterate_to_convergence+0x127/0x200 [zfs]
Jan 24 13:38:06 truenas kernel: spa_sync+0x30a/0x600 [zfs]
Jan 24 13:38:06 truenas kernel: txg_sync_thread+0x1ec/0x270 [zfs]
Jan 24 13:38:06 truenas kernel: ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
Jan 24 13:38:06 truenas kernel: ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
Jan 24 13:38:06 truenas kernel: thread_generic_wrapper+0x5e/0x70 [spl]
Jan 24 13:38:06 truenas kernel: kthread+0xe8/0x120
Jan 24 13:38:06 truenas kernel: ? __pfx_kthread+0x10/0x10
Jan 24 13:38:06 truenas kernel: ret_from_fork+0x34/0x50
Jan 24 13:38:06 truenas kernel: ? __pfx_kthread+0x10/0x10
Jan 24 13:38:06 truenas kernel: ret_from_fork_asm+0x1b/0x30
Jan 24 13:38:06 truenas kernel:
Jan 24 13:40:08 truenas kernel: task:middlewared (wo state:D stack:0 pid:1293 ppid:1165 flags:0x00004002
Jan 24 13:40:08 truenas kernel: Call Trace:
Jan 24 13:40:08 truenas kernel:
Jan 24 13:40:08 truenas kernel: __schedule+0x349/0x950
Jan 24 13:40:08 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:40:08 truenas kernel: schedule_preempt_disabled+0x15/0x30
Jan 24 13:40:08 truenas kernel: __mutex_lock.constprop.0+0x399/0x700
Jan 24 13:40:08 truenas kernel: cv_wait_common+0xc7/0x130 [spl]
Jan 24 13:40:08 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:40:08 truenas kernel: spa_lookup+0x51/0x140 [zfs]
Jan 24 13:40:08 truenas kernel: spa_open_common+0x79/0x440 [zfs]
Jan 24 13:40:08 truenas kernel: spa_get_stats+0x4e/0x210 [zfs]
Jan 24 13:40:08 truenas kernel: ? spl_kmem_alloc_impl+0xb4/0xf0 [spl]
Jan 24 13:40:08 truenas kernel: zfs_ioc_pool_stats+0x40/0x90 [zfs]
Jan 24 13:40:08 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:40:08 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:40:08 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:40:08 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:40:08 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:40:08 truenas kernel: ? kmem_cache_free+0x20/0x380
Jan 24 13:40:08 truenas kernel: ? __call_rcu_common.constprop.0+0xe5/0x6b0
Jan 24 13:40:08 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:40:08 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:40:08 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:40:08 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:40:08 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:40:08 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:40:08 truenas kernel: ? __sysvec_apic_timer_interrupt+0x50/0x140
Jan 24 13:40:08 truenas kernel: ? __irq_exit_rcu+0x3b/0xc0
Jan 24 13:40:08 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:40:08 truenas kernel: RIP: 0033:0x7efd1634ec5b
Jan 24 13:40:08 truenas kernel: RSP: 002b:00007ffd0b4c6790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:40:08 truenas kernel: RAX: ffffffffffffffda RBX: 00000000038d5d30 RCX: 00007efd1634ec5b
Jan 24 13:40:08 truenas kernel: RDX: 00007ffd0b4c6810 RSI: 0000000000005a05 RDI: 000000000000001d
Jan 24 13:40:08 truenas kernel: RBP: 00007ffd0b4c9e00 R08: 000000000394ae40 R09: 00007efd16423d80
Jan 24 13:40:08 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd0b4c6810
Jan 24 13:40:08 truenas kernel: R13: 00000000038d5d30 R14: 0000000004455310 R15: 00007ffd0b4c9e24
Jan 24 13:40:08 truenas kernel:
Jan 24 13:40:08 truenas kernel: task:zpool state:D stack:0 pid:1406 ppid:1376 flags:0x00004002
Jan 24 13:40:08 truenas kernel: Call Trace:
Jan 24 13:40:08 truenas kernel:
Jan 24 13:40:08 truenas kernel: __schedule+0x349/0x950
Jan 24 13:40:08 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:40:08 truenas kernel: io_schedule+0x46/0x70
Jan 24 13:40:08 truenas kernel: cv_wait_common+0xaa/0x130 [spl]
Jan 24 13:40:08 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:40:08 truenas kernel: txg_wait_synced_impl+0xc0/0x110 [zfs]
Jan 24 13:40:08 truenas kernel: txg_wait_synced+0x10/0x40 [zfs]
Jan 24 13:40:08 truenas kernel: spa_config_update+0x43/0x160 [zfs]
Jan 24 13:40:08 truenas kernel: spa_import+0x5d6/0x690 [zfs]
Jan 24 13:40:08 truenas kernel: zfs_ioc_pool_import+0x15b/0x180 [zfs]
Jan 24 13:40:08 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:40:08 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:40:08 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:40:08 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:40:08 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:40:08 truenas kernel: ? exc_page_fault+0x77/0x170
Jan 24 13:40:08 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:40:08 truenas kernel: RIP: 0033:0x7f11f42c3c5b
Jan 24 13:40:08 truenas kernel: RSP: 002b:00007ffe7246f300 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:40:08 truenas kernel: RAX: ffffffffffffffda RBX: 0000560caae35960 RCX: 00007f11f42c3c5b
Jan 24 13:40:08 truenas kernel: RDX: 00007ffe7246f3c0 RSI: 0000000000005a02 RDI: 0000000000000003
Jan 24 13:40:08 truenas kernel: RBP: 00007ffe72473ab0 R08: 00007f11f4399450 R09: 00007f11f4399450
Jan 24 13:40:08 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000560caad10dd0
Jan 24 13:40:08 truenas kernel: R13: 00007ffe7246f3c0 R14: 00007f11e4004cb0 R15: 0000560caad3fec0
Jan 24 13:40:08 truenas kernel:
Jan 24 13:42:09 truenas kernel: task:middlewared (wo state:D stack:0 pid:1293 ppid:1165 flags:0x00004002
Jan 24 13:42:09 truenas kernel: Call Trace:
Jan 24 13:42:09 truenas kernel:
Jan 24 13:42:09 truenas kernel: __schedule+0x349/0x950
Jan 24 13:42:09 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:42:09 truenas kernel: schedule_preempt_disabled+0x15/0x30
Jan 24 13:42:09 truenas kernel: __mutex_lock.constprop.0+0x399/0x700
Jan 24 13:42:09 truenas kernel: cv_wait_common+0xc7/0x130 [spl]
Jan 24 13:42:09 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:42:09 truenas kernel: spa_lookup+0x51/0x140 [zfs]
Jan 24 13:42:09 truenas kernel: spa_open_common+0x79/0x440 [zfs]
Jan 24 13:42:09 truenas kernel: spa_get_stats+0x4e/0x210 [zfs]
Jan 24 13:42:09 truenas kernel: ? spl_kmem_alloc_impl+0xb4/0xf0 [spl]
Jan 24 13:42:09 truenas kernel: zfs_ioc_pool_stats+0x40/0x90 [zfs]
Jan 24 13:42:09 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:42:09 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:42:09 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:42:09 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:42:09 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:42:09 truenas kernel: ? kmem_cache_free+0x20/0x380
Jan 24 13:42:09 truenas kernel: ? __call_rcu_common.constprop.0+0xe5/0x6b0
Jan 24 13:42:09 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:42:09 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:42:09 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:42:09 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:42:09 truenas kernel: ? syscall_exit_to_user_mode+0x22/0x40
Jan 24 13:42:09 truenas kernel: ? do_syscall_64+0x65/0xb0
Jan 24 13:42:09 truenas kernel: ? __sysvec_apic_timer_interrupt+0x50/0x140
Jan 24 13:42:09 truenas kernel: ? __irq_exit_rcu+0x3b/0xc0
Jan 24 13:42:09 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:42:09 truenas kernel: RIP: 0033:0x7efd1634ec5b
Jan 24 13:42:09 truenas kernel: RSP: 002b:00007ffd0b4c6790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:42:09 truenas kernel: RAX: ffffffffffffffda RBX: 00000000038d5d30 RCX: 00007efd1634ec5b
Jan 24 13:42:09 truenas kernel: RDX: 00007ffd0b4c6810 RSI: 0000000000005a05 RDI: 000000000000001d
Jan 24 13:42:09 truenas kernel: RBP: 00007ffd0b4c9e00 R08: 000000000394ae40 R09: 00007efd16423d80
Jan 24 13:42:09 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd0b4c6810
Jan 24 13:42:09 truenas kernel: R13: 00000000038d5d30 R14: 0000000004455310 R15: 00007ffd0b4c9e24
Jan 24 13:42:09 truenas kernel:
Jan 24 13:42:09 truenas kernel: task:zpool state:D stack:0 pid:1406 ppid:1376 flags:0x00004002
Jan 24 13:42:09 truenas kernel: Call Trace:
Jan 24 13:42:09 truenas kernel:
Jan 24 13:42:09 truenas kernel: __schedule+0x349/0x950
Jan 24 13:42:09 truenas kernel: schedule+0x5b/0xa0
Jan 24 13:42:09 truenas kernel: io_schedule+0x46/0x70
Jan 24 13:42:09 truenas kernel: cv_wait_common+0xaa/0x130 [spl]
Jan 24 13:42:09 truenas kernel: ? __pfx_autoremove_wake_function+0x10/0x10
Jan 24 13:42:09 truenas kernel: txg_wait_synced_impl+0xc0/0x110 [zfs]
Jan 24 13:42:09 truenas kernel: txg_wait_synced+0x10/0x40 [zfs]
Jan 24 13:42:09 truenas kernel: spa_config_update+0x43/0x160 [zfs]
Jan 24 13:42:09 truenas kernel: spa_import+0x5d6/0x690 [zfs]
Jan 24 13:42:09 truenas kernel: zfs_ioc_pool_import+0x15b/0x180 [zfs]
Jan 24 13:42:09 truenas kernel: zfsdev_ioctl_common+0x680/0x790 [zfs]
Jan 24 13:42:09 truenas kernel: ? __kmalloc_node+0xc6/0x150
Jan 24 13:42:09 truenas kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
Jan 24 13:42:09 truenas kernel: __x64_sys_ioctl+0x97/0xd0
Jan 24 13:42:09 truenas kernel: do_syscall_64+0x59/0xb0
Jan 24 13:42:09 truenas kernel: ? exc_page_fault+0x77/0x170
Jan 24 13:42:09 truenas kernel: entry_SYSCALL_64_after_hwframe+0x78/0xe2
Jan 24 13:42:09 truenas kernel: RIP: 0033:0x7f11f42c3c5b
Jan 24 13:42:09 truenas kernel: RSP: 002b:00007ffe7246f300 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 24 13:42:09 truenas kernel: RAX: ffffffffffffffda RBX: 0000560caae35960 RCX: 00007f11f42c3c5b
Jan 24 13:42:09 truenas kernel: RDX: 00007ffe7246f3c0 RSI: 0000000000005a02 RDI: 0000000000000003
Jan 24 13:42:09 truenas kernel: RBP: 00007ffe72473ab0 R08: 00007f11f4399450 R09: 00007f11f4399450
Jan 24 13:42:09 truenas kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000560caad10dd0
Jan 24 13:42:09 truenas kernel: R13: 00007ffe7246f3c0 R14: 00007f11e4004cb0 R15: 0000560caad3fec0
Jan 24 13:42:09 truenas kernel:
Jan 24 13:42:09 truenas kernel: Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

Mike_Horist · January 24, 2025, 8:15pm

middlewared log :

[2025/01/24 13:31:37] (INFO) PoolService.import_on_boot():434 - Start bring up of NVMe/RoCE
[2025/01/24 13:31:37] (INFO) PoolService.import_on_boot():443 - No JBOFs need to be configured
[2025/01/24 13:31:37] (DEBUG) PoolService.import_on_boot():464 - Creating ‘/data/zfs’ (if it doesnt already exist)
[2025/01/24 13:31:37] (DEBUG) PoolService.import_on_boot():471 - Creating ‘/data/zfs/zpool.cache’ (if it doesnt already exist)
[2025/01/24 13:31:37] (DEBUG) PoolService.import_on_boot_impl():309 - Importing ‘Spinner’ with guid: ‘3436693175471485359’
[2025/01/24 13:33:38] (INFO) InterfaceService.configure():20 - Configuring interface ‘eno1’
[2025/01/24 13:33:38] (DEBUG) InterfaceService.configure():99 - eno1: adding 192.0.1.10/255.255.255.0
[2025/01/24 13:33:39] (INFO) InterfaceService.configure():20 - Configuring interface ‘eno2’
[2025/01/24 13:33:39] (DEBUG) InterfaceService.configure():99 - eno2: adding 192.0.2.10/255.255.255.0
[2025/01/24 13:33:39] (INFO) InterfaceService.sync():1748 - Interfaces in database: eno1, eno2
[2025/01/24 13:33:39] (INFO) InterfaceService.unconfigure():138 - Unconfiguring interface ‘enp2s0’
[2025/01/24 13:33:40] (INFO) RouteService.sync():83 - Adding IPv4 default route to 192.0.1.1
[2025/01/24 13:46:52] (WARNING) SystemDatasetService.sysdataset_path():112 - /var/db/system: mountpoint not found
[2025/01/24 13:46:52] (ERROR) middlewared.render_body():31 - Netdata configuration file could not be generated
[2025/01/24 13:46:52] (ERROR) ServiceService.start():203 - Service ‘netdata’ not running after start
[2025/01/24 13:46:52] (DEBUG) middlewared.upssched_event():306 - Shutdown not initiated as ups.status (OL CHRG LB) indicates ups is ONLINE (OL).
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task alert.flush_alerts to run every 3600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task alert.process_alerts to run every 60 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task auditbackend._AuditBackendService__lifecycle_cleanup to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task auth.twofactor.remove_expired_secrets to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task certificate.renew_certs to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task certificateauthority.crl_generation to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task docker.state.periodic_check to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task kerberos.keytab.check_updated_keytab to run every 3600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task kmip.sync_keys to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task mail.send_mail_queue to run every 600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task pool.dataset.sync_db_keys to run every 86400 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task rate.limit.cache_clear to run every 600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task reporting.netdataweb_expire to run every 600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task service.check_deprecated_services to run every 3600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task smb.sharesec.check_share_info_tdb to run every 3600 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task truecommand.health_check to run every 1800 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task vmware.delete_pending_snapshots to run every 10800.0 seconds
[2025/01/24 13:48:14] (DEBUG) middlewared._setup_periodic_tasks():1118 - Setting up periodic task zettarepl.flush_state to run every 3600 seconds
[2025/01/24 13:48:14] (ERROR) middlewared.job.run():522 - Job <bound method DirectoryServices.setup of <middlewared.plugins.directoryservices.DirectoryServices object at 0x7f8018aaa350>> failed: CallError(‘Skipping directory service setup due to SMB service being unconfigured’)
[2025/01/24 13:52:20] (WARNING) SystemDatasetService.sysdataset_path():112 - /var/db/system: mountpoint not found

Mike_Horist · January 24, 2025, 8:25pm

Also what I’m seeing is hte jobs
pool.dataset.sync_db_keys and pool.import_on_boot are both hanging , staying at 0%

Mike_Horist · January 24, 2025, 9:02pm

Final update, I found in the middlewared.log that it took over 2 hours last night to mount my pool (looking at previous times it’s rebooting, it was very very quick as in under a minute), is it maybe doing a rebuild? I wish I knew how to see what is happening not just looking at the jobs in the UI sitting at 0%

neofusion · January 24, 2025, 9:07pm

Do you have a different drive you could install TrueNAS to?
Maybe your boot-pool has been corrupted.

Be careful not to install over your data pool drives.

If your data pool (or pools) are encrypted using key encryption, make sure you save a copy of the key before it’s too late.

Mike_Horist · January 24, 2025, 9:24pm

Good suggestion.

I do have some spare NVME drives (I believe the system boot drive is NVME) that I could toss in , then I would need to import the pool correct?

The hard drive lights are all blinking like crazy, so it does seem that it’s doing something (plus the IOWait , it’s not 60-90% now, but is staying at a constant 11-13% with no activity going on.

I’ll leave it up for a bit longer (I kept rebooting to see if I could see anyting on startup, so it never got to the point of finishing importing the pool)

Fortunately I don’t have encryption enabled (this is a personal home setup) .

Thanks!

neofusion · January 24, 2025, 9:35pm

That depends on if you get that far.
The first step is simply to see if you, in a fresh install, can get zpool status to not stall out. The output of zpool status is instrumental to know what’s going on.

If TrueNAS fails, maybe because it tries to auto import it during boot, perhaps a different livebooted Linux based OS (ubuntu maybe?) can see something.

Mike_Horist · January 25, 2025, 2:03am

To Provide an update as of now in case it helps anyone

I don’t know if i’m out of the water yet, but, after 3 hours without me meddling (trying to change things, rebooting) the pool finally loaded.

my NFS and Samba services are not running, but I’m guessing they’ll start upon a good reboot.

since I can ssh into the box, and access all of the data, I’m in process of copying everything to my 2nd NAS (my old Thecus N8800 … not as much storage since it’s only 8x3TB and in RAID6, but has enough space for me to scp the files between the devices)

Once I have a good backup, I’ll try rebooting, and if that fails… I’ll build a new instance of TrueNAS on a spare NVME drive and try to import the pool to that.

I’m still wishing I could find a way to at least see what it was doing for those 3 hours (the 2 jobs mentioned that were at 0% eventually were marked failed … but I can access the data so that’s the important part)

So , as of now my advice to anyone facing something similar… be patient and wait …