NFS kernel panic on TrueNAS SCALE ElectricEel-24.10.0.2

Hi,

I am having constant kernel panics on TrueNAS SCALE ElectricEel-24.10.0.2.

The following messages are constantly printed in the kernel log.

receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000099d7accb xid ca9f48e2
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000ce3c3c3b xid 98b994d2
receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000099d7accb xid cb9f48e2

Subsequently, the NFS server hangs, and a system reboot is required to make it functional again.

Dec 03 06:08:08.635064 nas kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000099d7accb xid cd9f48e2
Dec 03 06:08:53.824917 nas kernel: rpc-srv/tcp: nfsd: sent 963716 when sending 1048688 bytes - shutting down socket
Dec 03 06:08:53.825363 nas kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000078fe01fa xid 1f1428df
Dec 03 06:11:13.088816 nas kernel: rpc-srv/tcp: nfsd: sent 201028 when sending 1048688 bytes - shutting down socket
Dec 03 06:11:13.089309 nas kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000078fe01fa xid 341428df
Dec 03 06:11:13.092418 nas kernel: receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 0000000078fe01fa xid 351428df
Dec 03 06:14:35.839497 nas kernel: INFO: task nfsd:345490 blocked for more than 120 seconds.
Dec 03 06:14:35.843823 nas kernel:       Tainted: P           OE      6.6.44-production+truenas #1
Dec 03 06:14:35.843935 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 03 06:14:35.844005 nas kernel: task:nfsd            state:D stack:0     pid:345490 ppid:2      flags:0x00004000
Dec 03 06:14:35.844102 nas kernel: Call Trace:
Dec 03 06:14:35.844177 nas kernel:  <TASK>
Dec 03 06:14:35.844240 nas kernel:  __schedule+0x349/0x950
Dec 03 06:14:35.844305 nas kernel:  schedule+0x5b/0xa0
Dec 03 06:14:35.844378 nas kernel:  schedule_timeout+0x151/0x160
Dec 03 06:14:35.844452 nas kernel:  wait_for_completion+0x86/0x170
Dec 03 06:14:35.844580 nas kernel:  __flush_workqueue+0x144/0x440
Dec 03 06:14:35.844660 nas kernel:  ? __queue_work+0x1bd/0x410
Dec 03 06:14:35.844712 nas kernel:  nfsd4_destroy_session+0x1ce/0x2b0 [nfsd]
Dec 03 06:14:35.844795 nas kernel:  nfsd4_proc_compound+0x356/0x680 [nfsd]
Dec 03 06:14:35.844846 nas kernel:  nfsd_dispatch+0xee/0x200 [nfsd]
Dec 03 06:14:35.844919 nas kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd]
Dec 03 06:14:35.844993 nas kernel:  svc_process_common+0x2f5/0x6f0 [sunrpc]
Dec 03 06:14:35.845118 nas kernel:  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
Dec 03 06:14:35.845176 nas kernel:  ? __pfx_nfsd+0x10/0x10 [nfsd]
Dec 03 06:14:35.845239 nas kernel:  svc_process+0x131/0x180 [sunrpc]
Dec 03 06:14:35.845303 nas kernel:  nfsd+0x84/0xd0 [nfsd]
Dec 03 06:14:35.845366 nas kernel:  kthread+0xe5/0x120
Dec 03 06:14:35.845466 nas kernel:  ? __pfx_kthread+0x10/0x10
Dec 03 06:14:35.845547 nas kernel:  ret_from_fork+0x31/0x50
Dec 03 06:14:35.845600 nas kernel:  ? __pfx_kthread+0x10/0x10
Dec 03 06:14:35.845649 nas kernel:  ret_from_fork_asm+0x1b/0x30
Dec 03 06:14:35.845698 nas kernel:  </TASK>
Dec 03 06:14:35.845760 nas kernel: INFO: task nfsd:345491 blocked for more than 120 seconds.
Dec 03 06:14:35.848613 nas kernel:       Tainted: P           OE      6.6.44-production+truenas #1
Dec 03 06:14:35.848713 nas kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
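
For anyone triaging a similar log, here is a minimal Python sketch that tallies the three message types quoted above: the `receive_cb_reply` warnings (grouped by `xpt_bc_xprt` value), the short-send socket shutdowns, and the hung `nfsd` tasks. The function name `summarize` and the regular expressions are my own assumptions, derived only from the lines in this post; other kernel versions may format these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines quoted in this thread.
CB_REPLY = re.compile(
    r"receive_cb_reply: Got unrecognized reply: .*xpt_bc_xprt (\w+) xid (\w+)"
)
HUNG_TASK = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
SHORT_SEND = re.compile(r"nfsd: sent (\d+) when sending (\d+) bytes")


def summarize(lines):
    """Count callback-reply warnings per transport, hung tasks, and short sends."""
    cb_per_xprt = Counter()   # xpt_bc_xprt value -> number of warnings
    hung = []                 # (task name, pid) for each hung-task report
    short_sends = 0           # "sent N when sending M bytes" events
    for line in lines:
        if m := CB_REPLY.search(line):
            cb_per_xprt[m.group(1)] += 1
        elif m := HUNG_TASK.search(line):
            hung.append((m.group(1), int(m.group(2))))
        elif SHORT_SEND.search(line):
            short_sends += 1
    return cb_per_xprt, hung, short_sends
```

One way to use it is to save the kernel log to a file (for example with `journalctl -k`) and pass the open file object to `summarize`. A rising short-send count followed by hung `nfsd` tasks would match the sequence seen above.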

I have a relatively simple setup: the NFS server runs on TrueNAS, which is a VM on ESXi with the IT-mode controller passed through to it. I am exporting the NFS share to a Debian Bookworm host running Plex.

Is NFSv3 or NFSv4 being used?
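
If it helps to check, the negotiated version is visible in the client's mount table. Below is a small Python sketch, assuming a Linux client where `/proc/mounts` lists the filesystem type (`nfs` or `nfs4`) and a `vers=` or `nfsvers=` option; the helper name `nfs_mounts` is hypothetical.

```python
def nfs_mounts(mounts_text):
    """Return (mountpoint, fstype, version) for each NFS entry in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        version = None
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                version = value
        found.append((fields[1], fields[2], version))
    return found
```

On the Debian client this could be called as `nfs_mounts(open("/proc/mounts").read())`. The version comes back as a string such as `"3"` or `"4.2"`, or `None` if the option is not listed.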

@Dhiru see if my response in "TrueNAS NFS random crash" (#28, by Stebu) helps.

Thanks! I applied the fix, and so far there haven’t been any kernel panics. However, these log messages still appear. I don’t mind them as long as the NFS server works.

[386722.796590] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000567155e5 xid e6f2ecc8
[386722.798274] receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt 00000000567155e5 xid e7f2ecc8