I use TrueNAS SCALE Dragonfish-24.04.2. For the second time within a week, I have had a problem accessing network shares.
The problem seems to start when a user tries to add an attachment in some program, e.g. Thunderbird. The user opens the network drive in the “Open” dialog and then tries to enter a directory. At this point, the Open dialog in Windows hangs and stops responding. The strange thing is that this never happens when we open a directory in Microsoft File Explorer, only when using the Open dialog of some program. On other computers, the same shares remain available unless someone uses the Open dialog again. Outside of these failures, the Open dialog in Windows works normally.
When I run “ps -aux | grep smbd” on TrueNAS during a failure, I see smbd processes in state “D” (uninterruptible sleep). I also observe a significant increase in the system load average without an increase in CPU load. I also checked iostat -x and did not see anything disturbing there either. Restarting Samba doesn’t help because smbd processes in state “D” cannot be killed. I have to reboot TrueNAS.
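On Linux the load average also counts tasks in uninterruptible sleep, which would explain a rising load with idle CPUs. To see what the stuck processes are waiting on, something like the following could be run during the next failure. It is only a minimal sketch: the function names are my own, and it assumes a Linux /proc where /proc/<pid>/wchan is readable (it may show "0" or be empty without root).

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Read a small /proc file; return '' if the task vanished or access is denied."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc="/proc"):
    """List (pid, comm, wchan) for every task currently in state D."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return []
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        stat = read_text(f"{proc}/{entry}/stat")
        if not stat:
            continue
        pid, comm, state = parse_stat(stat)
        if state == "D":
            # wchan names the kernel function the task is sleeping in
            wchan = read_text(f"{proc}/{entry}/wchan") or "?"
            found.append((pid, comm, wchan))
    return found


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7}  {comm:<20} {wchan}")
```

If all the blocked smbd processes report the same wait channel, that function name is probably the most useful thing to search for or to include in a bug report.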
Unfortunately, I didn’t find anything disturbing in the logs, with the possible exception of these entries:
[2024/09/30 09:21:14.021888, 0] …/…/source3/smbd/smb2_trans2.c:2394(smbd_do_qfsinfo)
get_user_quota: access_denied service [dzialy] user [WENUS\xxx]
TrueNAS had been working properly for at least 3 years before this.
Please give me some tips on how to diagnose this problem.
I found the following in syslog:
Sep 30 09:21:55 ds-wenus kernel: INFO: task spl_delay_taskq:649 blocked for more than 120 seconds.
Sep 30 09:21:55 ds-wenus kernel: Tainted: P OE 6.6.32-production+truenas #1
Sep 30 09:21:55 ds-wenus kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 30 09:21:55 ds-wenus kernel: task:spl_delay_taskq state:D stack:0 pid:649 ppid:2 flags:0x00004000
Sep 30 09:21:55 ds-wenus kernel: Call Trace:
Sep 30 09:21:55 ds-wenus kernel:
Sep 30 09:21:55 ds-wenus kernel: __schedule+0x349/0x950
Sep 30 09:21:55 ds-wenus kernel: schedule+0x5b/0xa0
Sep 30 09:21:55 ds-wenus kernel: schedule_timeout+0x151/0x160
Sep 30 09:21:55 ds-wenus kernel: wait_for_completion_state+0x156/0x220
Sep 30 09:21:55 ds-wenus kernel: call_usermodehelper_exec+0x16e/0x1a0
Sep 30 09:21:55 ds-wenus kernel: zfsctl_snapshot_unmount+0xda/0x1a0 [zfs]
Sep 30 09:21:55 ds-wenus kernel: snapentry_expire+0x65/0x100 [zfs]
Sep 30 09:21:55 ds-wenus kernel: taskq_thread+0x1e1/0x350 [spl]
Sep 30 09:21:55 ds-wenus kernel: ? __pfx_default_wake_function+0x10/0x10
Sep 30 09:21:55 ds-wenus kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
Sep 30 09:21:55 ds-wenus kernel: kthread+0xe8/0x120
Sep 30 09:21:55 ds-wenus kernel: ? __pfx_kthread+0x10/0x10
Sep 30 09:21:55 ds-wenus kernel: ret_from_fork+0x34/0x50
Sep 30 09:21:55 ds-wenus kernel: ? __pfx_kthread+0x10/0x10
Sep 30 09:21:55 ds-wenus kernel: ret_from_fork_asm+0x1b/0x30
Sep 30 09:21:55 ds-wenus kernel:
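Reading the trace, the blocked task appears to be a ZFS kernel thread (spl_delay_taskq) that is waiting in call_usermodehelper_exec, called from zfsctl_snapshot_unmount, i.e. it is waiting for a userspace helper while unmounting an expired snapshot mount. To check whether every hang shows the same pattern, the reports can be pulled out of syslog with a short script. This is a rough sketch: the function name and regular expressions are my own and only assume the line format shown above.

```python
import re
import sys

# "INFO: task <name>:<pid> blocked for more than <N> seconds."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# "kernel: [? ]<function>+0x<off>/0x<size> [<module>]"
FRAME_RE = re.compile(
    r"kernel:\s+(?P<q>\? )?(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)


def hung_task_reports(lines):
    """Group hung-task warnings with the stack frames that follow them."""
    reports = []
    current = None
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            current = {
                "comm": m["comm"],
                "pid": int(m["pid"]),
                "secs": int(m["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        f = FRAME_RE.search(line)
        # Frames prefixed with "? " are unreliable guesses, so skip them.
        if f and current is not None and not f["q"]:
            current["frames"].append((f["func"], f["mod"]))
    return reports


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 hung.py <path to a syslog file>
    with open(sys.argv[1], errors="replace") as fh:
        for r in hung_task_reports(fh):
            zfs = [func for func, mod in r["frames"] if mod == "zfs"]
            print(r["comm"], r["pid"], f'{r["secs"]}s', "zfs frames:", ", ".join(zfs))
```

If each report lists zfsctl_snapshot_unmount among the zfs frames, the hangs are most likely tied to snapshot mounts under .zfs/snapshot rather than to the disks themselves.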
I also see hundreds of entries similar to:
mnt-pool-dzialy-.zfs-snapshot-auto\x2d2024\x2d09\x2d13_07\x2d30.mount: Deactivated successfully.
There are as many entries as there are snapshots. I don’t see such entries on my other TrueNAS system. What mechanism is this related to? Does it have anything to do with the “Enable Shadow Copies” option in SMB?
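For reference, the names in those “Deactivated successfully” entries are systemd mount units, and a unit name is just the mount path with systemd's escaping applied. Decoding it shows which snapshot directory was mounted and later unmounted. Below is a minimal sketch of the decoding, assuming standard systemd escaping and ASCII-only names; the function name is my own.

```python
import re


def unit_to_path(unit):
    """Turn a systemd mount unit name back into the path it refers to."""
    suffix = ".mount"
    name = unit[:-len(suffix)] if unit.endswith(suffix) else unit
    if name == "-":  # systemd's name for the root mount
        return "/"
    # "/" was written as "-" and a literal "-" as "\x2d".
    # Restore the separators first so escaped dashes are not confused with them.
    path = "/" + name.replace("-", "/")
    # Decode the remaining \xNN escapes (ASCII only in this sketch).
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        path,
    )


unit = r"mnt-pool-dzialy-.zfs-snapshot-auto\x2d2024\x2d09\x2d13_07\x2d30.mount"
print(unit_to_path(unit))
# /mnt/pool/dzialy/.zfs/snapshot/auto-2024-09-13_07-30
```

So each entry corresponds to one snapshot under /mnt/pool/dzialy/.zfs/snapshot being unmounted. Those directories are mounted on demand when something walks into them, which would be consistent with a client or SMB module enumerating every snapshot, though that link is my assumption and not confirmed by the logs.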