We’re considering using TrueNAS to share some VM storage with a small group of ESXi hypervisors. For testing, I’ve been trying to create virtual machines on, and move them to and from, an NFS41 datastore mounted on an ESXi version 8 host.
Often the clone / relocate operations fail with: “the virtual disk is either corrupted or not a supported format”.
I wasn’t sure where the problem was, so I asked Broadcom/VMware for help. They asked me to update to the latest versions of everything and perform various extra tests.
Their detailed analysis of logs generated by the host indicated this:
The host’s NFS 4.1 client checks for changeID to figure out if it needs to flush or update its cache. This relates to getAttr/setAttr operations or responses. If the current changeID received is greater than the previous changeID stored in the cache, it updates its entry in the cache with the current response’s value. We suspect the storage server is not updating the changeID.
We strongly recommend engaging the storage vendor on this - since tests with NFS version 3 in the same environment work without issues.
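To make the quoted analysis a bit more concrete, here is a minimal Python sketch of the cache check as described above. It is only an illustration of the "refresh when the changeID is greater" rule, not ESXi's actual implementation; `AttrCache`, `CachedAttrs`, `on_getattr`, and the file handle string are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CachedAttrs:
    """Attributes the client remembers for one file (hypothetical shape)."""
    change_id: int
    size: int


@dataclass
class AttrCache:
    """Toy model of a client-side attribute cache keyed by file handle."""
    entries: Dict[str, CachedAttrs] = field(default_factory=dict)

    def on_getattr(self, file_handle: str, change_id: int, size: int) -> bool:
        """Apply a GETATTR-style response; return True if the cache was refreshed."""
        cached = self.entries.get(file_handle)
        # Refresh only when there is no entry yet, or the server reports
        # a changeID greater than the one already cached.
        if cached is None or change_id > cached.change_id:
            self.entries[file_handle] = CachedAttrs(change_id, size)
            return True
        # Same (or lower) changeID: the client keeps trusting its cached copy.
        return False


cache = AttrCache()
cache.on_getattr("disk.vmdk", change_id=1, size=1024)

# Well-behaved server: the file changed and the changeID was bumped.
refreshed = cache.on_getattr("disk.vmdk", change_id=2, size=2048)

# Suspected failure mode: the file changed again, but the changeID did not.
stale = cache.on_getattr("disk.vmdk", change_id=2, size=4096)
```

In the last call the cached size stays at 2048 even though the server now reports 4096, which is the kind of stale view that could plausibly make a virtual disk look corrupted to the host.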
Any chance I could get access to the host vmkernel logs and the vmware.log of an impacted machine to see the detailed errors? I’ll send you a DM with a drop location if that’s okay, and you’ve confirmed there’s no sensitive data or PII in there.
NFSv4 has been somewhat problematic with VMware in my experience due to its stateful nature (even outside of TrueNAS) - I tend to use NFSv3 or iSCSI with those initiators, preferring the latter because of the VAAI support available in TrueNAS.
Can I ask if your test TrueNAS machine is virtualized as well?
Yup, I’ll try to extract those files from the host’s support bundle. Otherwise, if the host has already overwritten the NFS test log entries, I’ll run the tests again.
For our use-case, performance isn’t a concern, so I think NFS3 will work fine. That said, given how long NFS41 has been around, it feels like NFS41 really should work fine too.
I think I’ve found the vmkernel log file, but vmware.log files associated with VMs don’t have recent log entries inside. Is there another file I should look for instead?
@Lx0 let’s start with the vmkernel.log - I’ve been trying to reproduce this on my own (small) cluster here but oddly enough I can’t seem to provoke the same kind of failure.
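If the vmkernel.log is large, something like the following Python sketch could help pull out just the NFS-related entries before sharing them. The `filter_log_lines` and `print_matches` helpers are hypothetical, and the default keyword (`"nfs"`) is only an assumption about what the relevant lines contain; adjust it to whatever actually appears in your log.

```python
from typing import Iterable, Iterator, Sequence

# Assumed keyword: change this to match the entries seen in your own log.
DEFAULT_KEYWORDS: Sequence[str] = ("nfs",)


def filter_log_lines(
    lines: Iterable[str],
    keywords: Sequence[str] = DEFAULT_KEYWORDS,
) -> Iterator[str]:
    """Yield lines containing any keyword (case-insensitive), without the newline."""
    lowered = tuple(keyword.lower() for keyword in keywords)
    for line in lines:
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            yield line.rstrip("\n")


def print_matches(path: str, keywords: Sequence[str] = DEFAULT_KEYWORDS) -> None:
    """Print the matching lines from the log file at `path`."""
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    with open(path, encoding="utf-8", errors="replace") as handle:
        for match in filter_log_lines(handle, keywords):
            print(match)
```

Calling `print_matches("vmkernel.log")` on a copy of the log taken from the support bundle would print only the matching lines, which also makes it easier to check for anything sensitive before uploading.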