I’m running a new TrueNAS M40-HA Scale Enterprise system that backs an ESXi environment. I’m trying to understand what failover time I should expect when going from controller 1 to controller 2.
I have a zvol shared via iSCSI with ALUA enabled. I’ve added the share to ESXi and made sure multipathing is enabled with a path to each controller. A failover leaves the disk unavailable to the guest OSes for around 30 seconds. The network side stays up and I can ping the VM continuously, but if I’m working on the VM itself it is frozen for roughly 30 seconds. Is this expected, and has anyone done any fine-tuning to shorten the time it takes for the secondary controller to come online and for the secondary I/O path to become available?
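For anyone who wants to reproduce the measurement, here is a rough sketch of how the stall could be timed from inside a guest instead of estimating it by eye. It is only an illustration: the function names, the file name `io_probe.tmp`, and the 0.1 s write interval are my own assumptions, not anything from TrueNAS or VMware.

```python
import os
import time


def longest_gap(timestamps):
    """Return (gap_seconds, gap_start) for the largest gap between consecutive samples."""
    best = (0.0, None)
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > best[0]:
            best = (cur - prev, prev)
    return best


def probe(path, duration_s=120.0, interval_s=0.1):
    """Write and fsync a small record in a loop; return the completion time of each write."""
    stamps = []
    deadline = time.monotonic() + duration_s
    with open(path, "wb") as f:
        while time.monotonic() < deadline:
            f.write(b"x" * 512)
            f.flush()
            os.fsync(f.fileno())  # blocks while the underlying disk is unavailable
            stamps.append(time.monotonic())
            time.sleep(interval_s)
    return stamps
```

Calling `longest_gap(probe("io_probe.tmp"))` on a disk that lives on the iSCSI datastore, then triggering a failover while it runs, should report the longest pause between completed writes. The result includes the sleep interval, so it overestimates the stall by up to 0.1 s.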
Any insight would be greatly appreciated.