Switched to TrueNAS CORE 13.3; so far so good. Uptime: 18 days, 2:31 as of 16:17.
This is the first time uptime has gone past one week without NFS dying.
Anyway, until the upstream bug gets solved, avoid TrueNAS SCALE; the NFS feature is not usable.
Use TrueNAS CORE instead; it works well.
If not done already, and if possible, please open a ticket on this.
Please provide a debug (ixdiagnostic).
Also, if possible, please provide reproduction steps.
Thanks, but I have switched to TrueNAS CORE 13.3 and it works great now, so it may be difficult to get a debug log unless I switch back to TrueNAS SCALE.
I can only reply from memory now.
First, the hang only occurs on NFSv4; NFSv3 works. But NFSv3 causes random bus errors on file read/write, so I have to use NFSv4.
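In case it helps anyone reproduce this, the protocol version can be pinned explicitly on the client at mount time; a minimal sketch (the server name and export path are placeholders):

```
# Force NFSv4 (the version that eventually hangs for me)
mount -t nfs -o vers=4 truenas.local:/mnt/pool/share /mnt/share

# Force NFSv3 (no hang, but random bus errors on read/write)
mount -t nfs -o vers=3 truenas.local:/mnt/pool/share /mnt/share
```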
Second, as far as I remember, I could see the clients in the client list in the GUI; there were three types of client:
Working clients
Dead clients
Connecting clients
At some random moment, the NFS server just dies (nfsd enters D state and errors show up in dmesg). Then:
Existing clients may keep working, or die at any time. Once a client dies, its "last handshake" timestamp no longer updates in the server GUI.
No new connections succeed. If I connect to the server from a new client, the client just hangs; I can see it in the server GUI, but it is stuck in a connecting state (never established).
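For anyone who wants to confirm the same hang, this is roughly how it shows up from the server shell; these are standard Linux commands, not anything TrueNAS-specific:

```
# List nfsd threads stuck in uninterruptible sleep (STAT contains "D")
ps -eLo pid,stat,comm | awk '$2 ~ /D/ && $3 == "nfsd"'

# The kernel logs hung-task warnings for the stuck threads
dmesg | grep -i -A 2 "hung_task"
```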
By the way, this problem does not seem to be limited to TrueNAS; I have seen similar bug reports for Debian and other Linux-based operating systems.
It's very annoying, especially since I have to reboot the whole server (which takes forever, as killing the NFS server does not work, and once the systemd timeout is reached it just keeps climbing).
I mostly end up hard-resetting the server after waiting for about 5 minutes, which I really don't like doing.
I currently only need this for my mediastack VM running Docker. I am going to migrate these containers to EE natively as soon as I update, so I hope I won't see this issue again.
I encountered the same issue as others in this thread while running TrueNAS-SCALE-24.04.2 with NFS shares. After some time, both the NFS clients and the NFS server would hang, and I noticed the nfsd hung_task_timeout_secs message in kernel.log/syslog/dmesg.
With TrueNAS, ARC is configured to use all available memory by default. You can check the current values by running sudo arc_summary in the shell. I adjusted the zfs_arc_max value to 75-85% of my total available memory, and since making this change I haven't experienced the issue again; it's been 33 days without any crashes.
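For reference, here is one way to check the values before and after the change (the exact labels in arc_summary output vary a bit between OpenZFS versions):

```
# Current limit in bytes; 0 means ZFS is using its built-in default
cat /sys/module/zfs/parameters/zfs_arc_max

# Human-readable ARC statistics, including current and maximum size
sudo arc_summary | grep -iE "arc size|max size"
```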
To set the zfs_arc_max, go to System Settings => Advanced => Init/Shutdown Scripts => Add.
Type: Command
Command: echo 51539607552 >> /sys/module/zfs/parameters/zfs_arc_max
When: Post Init
Enabled: Check mark
1 GB = 1024 x 1024 x 1024 = 1073741824 bytes
48 GB = 48 x 1 GB = 51539607552 bytes
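If you'd rather not hard-code the byte count, here is a small sketch of the same idea that computes the cap at boot; the 80% figure is just one point in the 75-85% range above, and the script assumes a Linux /proc/meminfo:

```
#!/bin/sh
# Total memory in kB, as reported by the kernel
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Cap the ARC at 80% of total memory (value must be in bytes)
arc_max=$(( total_kb * 1024 * 80 / 100 ))

echo "$arc_max" > /sys/module/zfs/parameters/zfs_arc_max
```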
No, I haven't looked into submitting a bug ticket for this. To do so, I'd need to create a new Jira account, and honestly, I've already gone through the process of setting up a forum account just to share what worked for me, hoping it might help others.
I just wanted to chime in to say that I am experiencing the same issue (AMD EPYC 7453 processor on a Gigabyte MZ32-AR0 Rev. 3 motherboard). I encountered it on both 24.04 and 24.10. Reverting back to Cobia (23.10.2) with an otherwise identical configuration resolves the issue. (Unfortunately, because I reverted I won't have any further useful information for troubleshooting.)