wg0 link going up/down repeatedly caused reboot?

Hi,
I was wondering if anyone had any insight on a certain running theory.
We have multiple custom-built Supermicro TrueNAS systems that rebooted around the same time this morning, across two different locations. We also recently started using TrueCommand and WireGuard on our SANs. Looking at /var/log/messages, I noticed the wg0 link going up and down repeatedly right before the SANs rebooted, and the Supermicro IPMI event log showed a Watchdog 2 Timer Interrupt and a Power Cycle assertion. Is it possible that the link flapping caused the watchdog to force a reboot, or is it just another symptom of the same underlying problem?

Appreciate any knowledge in the matter.

That is definitely an odd failure, and seems a bit crazy. While waiting for an answer, maybe file a bug report. Multiple machines in different locations failing at once is odd. Out of curiosity, were they in the same timezone? Just trying to tie this together.

Thanks for the reply.

They are actually not in the same timezone. On the topic of time, I also noticed that the SANs were not set to the correct time for their timezones until after they rebooted. I’m not the one who set up TrueCommand, so I’m not sure whether it was TrueCommand that changed the time. All in all, I’m not sure if the bad time settings could have thrown something off as well.

Watchdog doing watchdog things is entirely within the realm of possibility.

Paste the output of ipmitool mc watchdog get, but I suspect you’ll need to disable the watchdog in the BIOS, and possibly also with the JWD1 jumper (I believe that’s what Supermicro labels it), to tell it to stop trying to be helpful.

root@ARKStor0[~]# ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval: 34 seconds
Timer Expiration Flags: 0x00
Initial Countdown: 137 sec
Present Countdown: 127

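So the watchdog is definitely armed and counting down: it shows Started/Running, and your Present Countdown is less than the Initial Countdown value. If nothing on the OS side resets the timer before it reaches zero, the BMC takes the configured action, which here is a Power Cycle.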
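
If you want to check this across several boxes without eyeballing it, here is a minimal Python sketch that parses the text of ipmitool mc watchdog get and reports whether the timer is running and how far it has counted down. The function names (parse_watchdog, leading_int, summarize) are made up for illustration, and the field labels are assumed to match the output pasted above; other BMC firmware or ipmitool versions may print them differently.

```python
import re

def parse_watchdog(text):
    """Split 'Key: value' lines from `ipmitool mc watchdog get` into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            fields[key.strip()] = value.strip()
    return fields

def leading_int(value):
    """Return the leading integer of a value like '137 sec', or None."""
    match = re.match(r"\d+", value)
    return int(match.group()) if match else None

def summarize(text):
    """Report running state, configured action, and countdown progress."""
    fields = parse_watchdog(text)
    initial = leading_int(fields.get("Initial Countdown", ""))
    present = leading_int(fields.get("Present Countdown", ""))
    elapsed = None
    if initial is not None and present is not None:
        elapsed = initial - present
    return {
        # Assumption: a running timer is reported as 'Started/Running'.
        "running": "Running" in fields.get("Watchdog Timer Is", ""),
        "action": fields.get("Watchdog Timer Actions", "unknown"),
        "initial_sec": initial,
        "present_sec": present,
        "elapsed_sec": elapsed,
    }

# The output pasted earlier in this thread, used as sample input.
SAMPLE = """\
Watchdog Timer Use: SMS/OS (0x44)
Watchdog Timer Is: Started/Running
Watchdog Timer Actions: Power Cycle (0x03)
Pre-timeout interval: 34 seconds
Timer Expiration Flags: 0x00
Initial Countdown: 137 sec
Present Countdown: 127
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A single reading only tells you the timer is armed. Running it a few times in a row and watching whether present_sec jumps back up toward initial_sec would show whether something on the OS side is still resetting it.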

I believe ipmitool mc watchdog off should be the command to tell it to stop, but you’ll more likely also want to disable it in your BIOS, and potentially with the physical jumper on the board as well.
