I have become the owner of a pair of X20 systems. The inventory of each system: 2 x controllers, a single PCIe 10Gbit Ethernet card with 2 ports, 64 GB memory (4 x 16 GB DDR4 ECC SODIMMs), and dual SAS-3 external connectors on each controller. Dual power supplies and 9 x 12Gb drives make up the rest of the system. They look great and all that. The controllers appear to be Xyratex-based.
My problem: I cannot communicate with any of the ports on any controller. I've tried the two different RS232 ports (on 3.5mm connectors) - the console port at 115200 baud 8N1 and the OOBM port at 38400 baud 8N1 - per the startup manual. I've tried the MGMT Ethernet port, both with DHCP and with the static 192.168.100.100/101 addresses. I've tried pulling the coin CMOS battery on each controller and letting the controllers sit for half an hour before restoring it. I've also looked for LLDP traffic on both the MGMT port and the two 10Gb ports, and checked the switch's ARP table for resolved addresses on the ports (1Gb and 10Gb). Nothing has yielded any hints or useful results.
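For anyone else hunting for a silent MGMT port: a crude sweep of the factory-default subnet can at least confirm whether anything answers. The 192.168.100.0/24 range and the ping flags here are assumptions (Linux iputils syntax; BSD ping differs slightly), so treat this as a sketch rather than a recipe:

```shell
# Ping-sweep the assumed factory subnet and print anything that answers.
# -c 1: one probe per host; -W 1: one-second timeout (Linux iputils syntax).
for i in $(seq 1 254); do
    ping -c 1 -W 1 "192.168.100.$i" >/dev/null 2>&1 \
        && echo "192.168.100.$i is up"
done
```

Checking the switch's MAC address table on the port the MGMT cable lands on is often faster than sweeping, since it narrows the search to a single MAC.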
The console port is most curious - it emits data that suggests a mismatched baud rate. I’ve tried the “$%^0” sequence and get back random garbage. I’ve also tried monitoring the data coming through the monitor port after rebooting the system - still nothing but random garbage.
I’ve observed that the system appears to actually be loading up - it takes the 10Gb ports about a minute or so to activate after the system reboots, and the hard drives show minimal “scanning” activity every couple of minutes (as observed on the drive activity lights). I just can’t find any way to configure ports so I can erase everything on the system.
Not yet tried: plugging a monitor and keyboard into the controller's VGA and USB2 ports so I can attempt a BIOS takeover of the controller/BIOS/BMC/IPMI, or loading a new OS image onto the 128 GB M.2 SATA drive on the controller.
Resolved - in a way. I found the IP address of the management ports and was able to get into the web UI. Now I have a different problem that I’ll post about in a new topic shortly.
SmallBarky suggested that I would be better to post the second problem here, so here it goes…
I can access the new system via the WebUI now, but at every assigned IP address (found via nmap) the WebUI reports that the node is a "Standby Node" and I can't log in.
I can SSH into the system and run various commands like midclt and /etc/netcli, but nothing resolves the problem (getting the node out of "standby").
If it matters, the system is running TrueNAS-11.2-U8.1 - FreeBSD based.
The error I get when trying to disable failover via midclt:
root@X20-1A[~]# midclt call failover.update '{"disabled": true}'
[ENOMETHOD] Method "update" not found in "failover"
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1066, in _method_lookup
    methodobj = getattr(serviceobj, method_name)
AttributeError: 'FailoverService' object has no attribute 'update'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 166, in call_method
    result = await self.middleware.call_method(self, message)
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1087, in call_method
    serviceobj, methodobj = self._method_lookup(message['method'])
  File "/usr/local/lib/python3.6/site-packages/middlewared/main.py", line 1068, in _method_lookup
    raise CallError(f'Method "{method_name}" not found in "{service}"', CallError.ENOMETHOD)
middlewared.service_exception.CallError: [ENOMETHOD] Method "update" not found in "failover"
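That ENOMETHOD suggests `failover.update` may simply not exist in 11.2's middleware - method names changed across versions. Rather than guessing names, you can ask middlewared what it actually exposes: `core.get_methods` returns a JSON object keyed by "service.method" names. The quick-and-dirty filter below is a sketch (run on the controller itself):

```shell
# List the failover.* methods this middleware build actually exposes.
# core.get_methods returns a JSON object keyed by "service.method" names.
midclt call core.get_methods \
    | tr ',' '\n' \
    | grep -o '"failover\.[A-Za-z_.]*"' \
    | sort -u
```

Whatever shows up in that list is what this build will accept; anything else gets the same ENOMETHOD error.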
Do you know what version of TrueNAS is installed? HA features are only in the Enterprise versions, and most of us on the forums won't have experience with those. It may take a bit to hear from someone else or a TrueNAS employee.
I am the one who did the wipe of the system - my employer was disposing of the systems.
Nonetheless, the problem I am hitting now is that the controllers both think they are the standby node and won't present the login UI. I can SSH into both nodes, and as far as I can see the basic problem is with the CARP states. Because of the CARP state mess, I can't disable HA with hactl - it reports "Node status: Faulted", failover status unavailable, with the secondary message "nodes CARP states do not agree". At this point, I would be fine with completely disabling HA just to get to a functional UI.
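Since the complaint is "nodes CARP states do not agree", it may help to dump the raw CARP state on both controllers and compare. On FreeBSD the state shows up in ifconfig output; a rough one-liner (interface names here are made up - yours will differ):

```shell
# Print "interface: STATE" for every CARP-enabled interface.
# Lines like "  carp: BACKUP vhid 1 ..." follow their parent interface line.
ifconfig | awk '
    /^[a-z]/ { iface = $1 }          # remember the current interface
    /carp:/  { print iface, $2 }     # e.g. "igb0: BACKUP"
'
```

If both nodes report BACKUP (or both MASTER) on the same vhid, that matches the "do not agree" message and explains why each node decides it is standby.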
Okay, I guess my disconnect is why you need to get into the current UI. Is the current configuration preventing you from reinstalling using a USB stick?
If the configuration is stored at least partially on the controllers themselves (?), perhaps that can be modified using the BIOS or IPMI.
My initial hope was to use the X20s (I got 2 of them) as they are - create a new ZFS pool on each of them and then share the volumes out as NFS and maybe S3. That is why I was trying to find their IP addresses. It took a while, but I finally found them and was briefly able to access the UI. That quickly went to crap when all 4 controllers marked themselves as standby units and I could not get them to shake that idea.
Plan B was to get IPMI access to the controllers via the management Ethernet port (or even terminal access via the RS232 port). That is not going well either - even though I have SSH access to all 4 controllers, none of them detects the IPMI devices, so ipmitool doesn't do anything useful.
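One thing worth checking before writing off ipmitool: on FreeBSD the BMC is only visible once the ipmi(4) kernel driver is loaded, and I am not sure TrueNAS 11.2 loads it by default. A hedged sketch, run as root on the controller:

```shell
# Try to load the FreeBSD ipmi(4) driver, then see whether a BMC appeared.
kldload ipmi 2>/dev/null || echo "ipmi driver not loaded (already in, or not FreeBSD)"
if [ -e /dev/ipmi0 ]; then
    ipmitool lan print 1        # dump the BMC's network configuration
else
    echo "no /dev/ipmi0 - BMC still not detected"
fi
```

If /dev/ipmi0 appears after the kldload, `ipmitool lan print` should show whether the BMC has its own IP configured on the management port.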
Plan C is to get an actual USB-RS232 cable with a 3.5mm connector on the end and see if I can get that to work for console access. If that happens, I will reload each controller with CentOS and reuse the hard drives elsewhere - using the controllers as KVM hosts. For now, I await the delivery of the RS232 cable (Amazon is your friend). Previous attempts at a home-brewed RS232 cable yielded problems like scrambled data - possibly due to a flaky PL2303-based USB-RS232 converter. We'll see how this ends up…
When you do get past your blocking issues, consider restarting with a new install of the OS.
Manually updating from a relatively ancient TrueNAS 11.2 version to current seems like a time sink with little payoff.