TrueNAS X10-HA issues (2 different units)

We have 2 separate X10-HA units (clustered controllers) that did not survive an upgrade to v13.

Some history: The units were taken out of a production environment to be used in a test set-up. Disks have been cleaned and the config reset. Both units came up, and the cluster function was restored after setting up a CARP interface and creating a pool.

One unit was still running V11 and the other V12. We performed an upgrade on both units. After the first node upgrade finished, it asked to perform a failover. After the failover it signalled to finish the update, which completed as expected, and the cluster came back running first version 12 and eventually version 13.0-U6.8. We then removed the pool because we wanted a different disk config, which required us to disable failover mode. After disabling failover mode, removing the pool and re-enabling failover mode, neither of the 2 X10-HA units recovered. Both have a secondary unit that won’t come back up. The unit can be pinged, and if it is the only unit still connected it even listens on the cluster IP/VIP, but it shows no GUI.

We also tried the IPMI recovery procedure using the USB cable with the 3.5mm jack plug, but it fails to switch from SES to IPMI mode.

I know this is very specific, but does anyone have any experience with recovering this or gaining access to the IPMI console?

Extra info: When accessing the GUI on the secondary after it refuses to come up, it displays this message:

Waiting for Active TrueNAS controller to come up…

We even removed the primary unit to see if that would wake it up and power-cycled many times, but to no avail.

The struggle continues. Since the X10-HA does not have a display port, there is no visual feedback. We are stuck in a state where X10 units A and B seem to start, as we can ping them on their IPs. The cluster IP also comes up. But none of the 3 IPs shows a login prompt.

Unit A shows: Checking HA status
Active IP Addresses: CLUSTER-IP

Unit B shows: Connecting to TrueNAS … Make sure the TrueNAS system is powered on and connected to the network.

The cluster IP shows: Connecting to TrueNAS … Make sure the TrueNAS system is powered on and connected to the network.
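
For anyone checking the same symptom (ping works, but no login page), a minimal sketch of a TCP port survey may help separate "host is up" from "web service is listening". The addresses in the usage comment are placeholders from the documentation range, and the port list (SSH on 22, web UI on 80/443) is an assumption about what a healthy node would expose, not something confirmed for these units.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def survey(targets, ports=(22, 80, 443), timeout=2.0):
    """Map each label to {port: reachable} for the given addresses.

    targets: dict of label -> address, e.g. the two node IPs and the VIP.
    The default ports are an assumption (SSH and the web UI).
    """
    return {
        label: {port: port_open(addr, port, timeout) for port in ports}
        for label, addr in targets.items()
    }


# Example usage with placeholder addresses (replace with your own):
# for label, result in survey({"unit A": "192.0.2.11",
#                              "unit B": "192.0.2.12",
#                              "cluster IP": "192.0.2.10"}).items():
#     print(label, result)
```

If a node answers ping but every port reports closed, the OS has its network stack up while the services behind the GUI have not started, which matches what was seen here.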

We have tried a new switch with a default config. We removed the SFP interface in the hopes one of the other interfaces would come up and show a console.

The IPMI console does not work on either unit. After $%^0 and two enters, nothing happens and the console seems stuck. After a $%^2 with two enters, I return to the SES console.

Am I missing steps? How do I gain access?

My next move may be to buy a display card for PCI-E and put it in place of the SFP+ card.
If I unplug power, plug it back in and quickly switch from SES to IPMI, I can see some info on the screen:

BIOS drive C: is disk0
/
FreeBSD/x86 boot
Default: zfs:freenas-boot/ROOT/13.0-U6.8:/boot/zfsloader
boot:

After that it is unresponsive again.

Update for whoever gets to deal with this as well:

I purchased a small AST2400 PCI-e VGA card. With visibility back to 100% I could see what went wrong. One node was still on 12.x while the other went on to 13.x. The 12.x unit did not respond as it was half functional and the 13.x unit threw errors all around about files not existing.

Using the boot-menu I changed both units back to 12.0-U8.1, reset the config on both units and set them up each with new configs, which was mainly the interface IP configuration and creation of a pool. That took some pulling and pushing, but eventually they both noticed each other.

Hint: The VHID needs to be different on each node when setting up the failover interface, and make sure the Failover group ID is not already in use on another unit. Mine seems to have collided with an OPNsense firewall cluster. This would not have happened if I had a separate, dedicated interface for CARP.
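
To illustrate why a shared VHID bites: CARP derives its virtual MAC address from the VHID (00:00:5e:00:01 followed by the VHID as one byte), so two clusters using the same VHID on one broadcast domain claim the same MAC. Below is a minimal sketch for checking a hand-written inventory for such collisions. The device names and VHID values are hypothetical, and the list has to be filled in manually from each device's configuration.

```python
from collections import defaultdict


def carp_virtual_mac(vhid):
    """Return the CARP virtual MAC address for a VHID (valid range 1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return "00:00:5e:00:01:%02x" % vhid


def find_vhid_collisions(assignments):
    """Return {vhid: [devices]} for every VHID used by more than one device.

    assignments: iterable of (device_name, vhid) pairs for devices that
    share the same layer-2 segment.
    """
    by_vhid = defaultdict(list)
    for device, vhid in assignments:
        by_vhid[vhid].append(device)
    return {vhid: devs for vhid, devs in by_vhid.items() if len(devs) > 1}


# Hypothetical inventory of CARP users on one VLAN:
inventory = [
    ("truenas-x10", 10),
    ("opnsense-cluster", 10),
    ("other-cluster", 20),
]
for vhid, devices in find_vhid_collisions(inventory).items():
    print("VHID %d (%s) is shared by: %s"
          % (vhid, carp_virtual_mac(vhid), ", ".join(devices)))
```

With this inventory the script flags VHID 10, which is the kind of overlap a dedicated CARP interface or VLAN avoids.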

I retried the upgrade to 13.0-U6.8 afterwards. It took a while and some patience. The main thing to watch here is the switching of the nodes. The GUI just gives you a small box telling you it is still performing an upgrade. You need to confirm with Continue, wait a very long while (which I could see now thanks to the VGA card) and then repeat the exact same action: confirm with Continue.

Note: Even after all these upgrades, the terminal connection using the USB/MiniJack cable still does not switch over from SES to the IPMI console using $%^0.
