TrueNAS Scale HA Failover Expectations

ataeschner-KIT · August 7, 2024, 6:58pm

I’m currently using a new TrueNAS M40-HA Scale Enterprise system to support an ESXi environment. What I’m trying to understand is what failover time I should expect when going from controller 1 to controller 2.

I have a zvol shared via iSCSI with ALUA enabled. I’ve added the share to ESXi and made sure multipathing is enabled for both controllers’ paths. A failover appears to cause a period of around 30 seconds during which the disk is unavailable to the underlying OSes. The control plane stays up and I can ping the VM continuously, but if I’m on the device it’s just frozen for around 30 seconds. Is this intended, and has anyone done any fine-tuning to reduce the time required for the secondary controller to come online and the secondary I/O path to become available?

Any insight would be greatly appreciated.
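
For anyone wanting to put a number on the freeze described above, here is a minimal sketch of a probe that could be run inside a guest VM during a failover test. It is only an illustration, not anything provided by TrueNAS or VMware: the file name `failover_probe.bin`, the function names `probe` and `longest_stall`, and the default duration and interval are all hypothetical, and it assumes the guest’s virtual disk sits on the iSCSI datastore so that `os.fsync` blocks while the path is unavailable.

```python
import os
import time


def longest_stall(timestamps):
    """Return the largest gap in seconds between consecutive timestamps."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return max(gaps, default=0.0)


def probe(path, duration_s=120.0, interval_s=0.5):
    """Write and fsync a small record repeatedly; return completion times."""
    done = []
    deadline = time.monotonic() + duration_s
    with open(path, "wb") as f:
        while time.monotonic() < deadline:
            f.write(b"x" * 512)
            f.flush()
            os.fsync(f.fileno())  # blocks until the OS reports the write as on disk
            done.append(time.monotonic())  # monotonic clock, unaffected by NTP jumps
            time.sleep(interval_s)
    return done


if __name__ == "__main__":
    # Hypothetical file name; start the probe, then trigger the failover.
    stamps = probe("failover_probe.bin")
    print(f"longest gap between completed writes: {longest_stall(stamps):.1f} s")
```

The reported gap includes the sleep interval, so the result is only accurate to roughly `interval_s`. Guest or hypervisor write caching could also hide part of the stall, so treat the number as an estimate.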

ericloewe · August 7, 2024, 7:06pm

Good question, but you should ask your support contact directly. HA knowledge is thin on the ground here…

ataeschner-KIT · August 7, 2024, 7:53pm

Roger that! I wanted to get familiar with the community and see if there were any SMEs on HA out there. I’ll ping support and update this thread with any general recommendations or info that they provide.

awalkerix · August 7, 2024, 8:00pm

Enterprise HA is an enterprise-licensed feature only. There will be no community consumers of this feature. Please submit support and HA-related questions to the support team.

ataeschner-KIT · August 7, 2024, 8:15pm

Roger. Reached out to our support contact and received the answers we were looking for.