Hi TrueNAS Community,
I wanted to share a workaround I found for what may be a race condition affecting NFS service startup on TrueNAS Core. On one system, the NFS service does not work correctly after the system boots, but restarting it manually afterward fixes it.
Issue Description:
I have several TrueNAS Core systems that are used only for NFS, and only one of them exhibits this behavior. When the NFS service is set to start automatically, everything appears to work normally, but sometimes NFS clients are unable to (re)connect. I suspect a race condition or other startup timing issue with the NFS services.
However, all of the NFS services appear to be running normally when I check via SSH:
service rpcbind status; service mountd status; service nfsd status; service statd status; service lockd status
I get the following:
rpcbind is running as pid 3596.
mountd is running as pid 3620.
nfsd is running as pid 3633 3634.
statd is running as pid 3648.
lockd is running as pid 3666.
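The same status check can be wrapped in a small loop that flags any daemon that is not running. This is only a sketch: `check_nfs_services` and `NFS_SERVICES` are names I made up, and it uses the rc.subr `onestatus` form so the check also works when a service is not enabled in rc.conf (as with the init-command workaround below). In my case all five daemons report as running, so this only rules out a daemon that failed to start outright.

```shell
#!/bin/sh
# Sketch: report which NFS-related daemons are not running.
# "onestatus" skips the rc.conf enable check, so this also works when the
# services were started with "onestart" rather than by the boot process.

NFS_SERVICES="rpcbind mountd nfsd statd lockd"

# check_nfs_services: prints one line per service and returns the number
# of services whose status check failed (0 means all are running).
check_nfs_services() {
    failed=0
    for svc in $NFS_SERVICES; do
        if service "$svc" onestatus >/dev/null 2>&1; then
            echo "OK      $svc"
        else
            echo "MISSING $svc"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Only run when explicitly asked, e.g. "sh check-nfs.sh check"
case "${1:-}" in
    check) check_nfs_services ;;
esac
```

Saved under a hypothetical name such as check-nfs.sh, it would be run with `sh check-nfs.sh check`; a non-zero exit status means at least one daemon is missing.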
When a client is connected to an NFS share and I reboot the TrueNAS Core NFS box, the client reports the usual “server ‘servername’ not responding, still trying” message until the box has finished booting. After that, however, the client still cannot read from or write to the share (e.g., ‘df’ hangs) until I manually stop and restart the NFS service in the TrueNAS Core GUI (sometimes it takes a few tries of switching the service off and back on). Once that is done, everything works normally again without any action on the client.
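Since the daemons show as running while the client hangs, a next check could be whether each RPC program actually answers through rpcbind. The sketch below assumes FreeBSD's `rpcinfo -t host prognum` (which pings a program over TCP) exits non-zero when the program does not respond; `probe_rpc`, `probe_nfs_rpc` and `NFS_HOST` are hypothetical names. The program numbers are the standard ONC RPC assignments (100000 rpcbind, 100003 nfs, 100005 mountd, 100021 nlockmgr, 100024 status).

```shell
#!/bin/sh
# Sketch: check that each NFS-related RPC program answers via rpcbind.
# A daemon can be "running" yet not registered or reachable, which could
# look like the client hang described above.

NFS_HOST="${NFS_HOST:-localhost}"   # assumption: set to the IP clients mount from

# probe_rpc PROGNUM LABEL: ping one RPC program over TCP
probe_rpc() {
    if rpcinfo -t "$NFS_HOST" "$1" >/dev/null 2>&1; then
        echo "OK   $2 ($1)"
        return 0
    fi
    echo "FAIL $2 ($1)"
    return 1
}

# probe_nfs_rpc: returns the number of programs that did not answer
probe_nfs_rpc() {
    failed=0
    probe_rpc 100000 rpcbind  || failed=$((failed + 1))
    probe_rpc 100003 nfs      || failed=$((failed + 1))
    probe_rpc 100005 mountd   || failed=$((failed + 1))
    probe_rpc 100021 nlockmgr || failed=$((failed + 1))
    probe_rpc 100024 status   || failed=$((failed + 1))
    return "$failed"
}

# Only run when explicitly asked, e.g. "sh probe-nfs.sh probe"
case "${1:-}" in
    probe) probe_nfs_rpc ;;
esac
```

Running it once right after boot (while the client hangs) and again after a manual restart, and comparing the two outputs, might show which piece is missing.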
Workaround:
By disabling the automatic start of the NFS service and starting the services from an init command instead, I was able to work around the issue.
Workaround Steps:
- Disable Automatic NFS Service Start:
  - Navigate to Services in the TrueNAS GUI.
  - Uncheck Start Automatically for the NFS service.
- Create an Init Command:
  - Go to Tasks.
  - Scroll down to Init/Shutdown Scripts.
  - Click Add to create a new init script.
  - Set Type to Command.
  - In the Command field, enter the following command:
    sleep 5; service rpcbind onestart; sleep 1; service mountd onestart; sleep 1; service nfsd onestart; sleep 1; service statd onestart; sleep 1; service lockd onestart
  - Set When to Post Init.
  - Check Enabled.
  - Set Timeout to 30 seconds.
  - Save the init script.
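The same init command could also live in a script that checks each daemon after starting it and logs the result to syslog, which would show whether something fails during boot. This is a sketch under assumptions: it would be saved to a hypothetical path such as /mnt/tank/scripts/nfs-init.sh and called from the Command field as `sh /mnt/tank/scripts/nfs-init.sh start`; `start_nfs_stack` and the `nfs-init` syslog tag are my own names.

```shell
#!/bin/sh
# Sketch: the init command as a script, with the same order and delays,
# plus a status check after each start and a syslog entry for each result.

NFS_ORDER="rpcbind mountd nfsd statd lockd"
INITIAL_DELAY=5   # seconds before the first service, as in the command
STEP_DELAY=1      # seconds between services, as in the command

# log: send a tagged message to syslog via logger(1)
log() {
    logger -t nfs-init "$*"
}

start_nfs_stack() {
    sleep "$INITIAL_DELAY"
    first=1
    for svc in $NFS_ORDER; do
        # pause between services, but not before the first one
        if [ "$first" -eq 0 ]; then
            sleep "$STEP_DELAY"
        fi
        first=0
        service "$svc" onestart
        if service "$svc" onestatus >/dev/null 2>&1; then
            log "$svc started"
        else
            log "$svc failed to start; not starting the remaining services"
            return 1
        fi
    done
    return 0
}

# Only run when explicitly asked, e.g. "sh nfs-init.sh start"
case "${1:-}" in
    start) start_nfs_stack ;;
esac
```

It stops at the first daemon that fails its status check, since the services are started in dependency order and the later ones rely on rpcbind being up.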
Explanation:
The init command waits 5 seconds before starting the NFS-related services and then pauses 1 second between each service start. The ‘onestart’ form starts a service even when it is not enabled in rc.conf. The delays do not seem to be strictly necessary, but I’ve included them out of an abundance of caution since I suspect the issue is timing-related.
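Instead of fixed delays, each step could wait until the previous one is actually ready, up to a limit. In this sketch, `wait_for` and `start_with_readiness_checks` are hypothetical helpers; rpcbind readiness is judged by `rpcinfo -t localhost 100000` answering, and the other daemons by `service <name> onestatus`. With 5 tries per step, the waiting should stay around 25 seconds in the worst case, under the 30-second Timeout used above.

```shell
#!/bin/sh
# Sketch: poll for readiness instead of relying only on fixed sleeps.

# wait_for MAX_TRIES COMMAND [ARGS...]
# Runs COMMAND once per second until it succeeds (return 0) or
# MAX_TRIES attempts have failed (return 1).
wait_for() {
    max_tries=$1
    shift
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        try=$((try + 1))
    done
    return 1
}

start_with_readiness_checks() {
    service rpcbind onestart
    # the other daemons register with rpcbind, so wait until it answers
    wait_for 5 rpcinfo -t localhost 100000 || return 1
    for svc in mountd nfsd statd lockd; do
        service "$svc" onestart
        wait_for 5 service "$svc" onestatus || return 1
    done
    return 0
}

# Only run when explicitly asked, e.g. "sh nfs-init.sh start"
case "${1:-}" in
    start) start_with_readiness_checks ;;
esac
```

The idea is that a fast boot does not wait needlessly while a slow one gets more time, and a daemon that never comes up ends the script with a non-zero status instead of being silently followed by the next one.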
System Details:
- TrueNAS-13.0-U6.2
- Supermicro X9SRL-F Motherboard, v3.3 BIOS
- Intel Xeon E5-2697 v2 CPU
- Dynatron R24 2U Server CPU Cooler
- 8x 32GB SK Hynix HMT84GL7AMR4C-RD ECC RAM
- 2x 120GB Kingston SSD as mirrored boot
- 48x Western Digital 8TB HDD
- 2x 4TB TeamGroup SSD as L2ARC
- Intel 800GB 910 Series SSD as SLOG
- Adaptec 71605 SAS Controller in HBA Mode
- Chenbro NR40700 4U Chassis w/4 PSU
- Intel X540-T2 Network Card
Other than the NFS service issue, the system works perfectly, and all temperatures are well within normal operating limits.
Questions:
This workaround seems to resolve the issue, but what steps can I take to further troubleshoot why NFS does not work properly when TrueNAS Core starts it automatically?
Cheers,
Greg