Hi all, new member here - thanks to the admins for approving my membership. I’ve been reading lots of forum content since I started my TrueNAS journey a couple of weeks ago, so many thanks to all those contributors. I have some IT experience but am a relative beginner in the Linux space.
My system:
TrueNAS SCALE Dragonfish-24.04.2.2 running on bare metal (not virtualised)
Motherboard - Supermicro X11SSH-LN4F
Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz
32GB matched DDR4 ECC 2400 RAM in 4 DIMMs
2 x 8TB WD Red NAS drives in mirrored pool
1 x 290GB drive as single drive pool for temp use
My problem:
After a few weeks of happily using networked storage and SMB shares for a separate stand-alone Plex Media Server without any issues, I’ve recently been experiencing lock-ups of TrueNAS and the underlying Debian OS: the SMB shares drop, the web GUI is unreachable and the console stops responding, although the IPMI web GUI is still reachable through its own NIC.
Checking /var/log/messages for TrueNAS and /var/log/syslog for Debian doesn’t give any clues about the failures. If anyone can point me at other useful logs, I can continue my fault finding. As my system is about as vanilla as you can get, I’ll run some memory and CPU soak tests, and in the meantime I was interested in getting the IPMI Watchdog working, as the Supermicro website/manual suggests it should.
My understanding is that there are two parts: a hardware Watchdog component, enabled in the BIOS and which should respect a board jumper (JWD1), and a software component in the Debian OS, ipmitool.
From the descriptions I’ve read on here and in other IPMI-related posts, I understood that enabling the Watchdog in the BIOS results in the system restarting when the Watchdog timer runs down after 5 mins, which matches what I see after booting up.
I also understood that ipmitool commands run in the Debian shell, either at the console or from the TrueNAS system options, could query and report on this BIOS Watchdog timer and reset it. From the command help, I can see three main functions that I could get working:
ipmitool mc watchdog off - turns off the timer
ipmitool mc watchdog reset - resets it to the default or custom time setting
ipmitool mc watchdog get - reports on the status of the timer
I think there’s also an ipmitool mc watchdog set command with multiple options, but I couldn’t get the syntax to work. The get command does give useful output, e.g.:
user@truenas[~]$ ipmitool mc watchdog get
Watchdog Timer Use: SMS/OS (0x04)
Watchdog Timer Is: Stopped
Watchdog Timer Logging: On
Watchdog Timer Action: Power Cycle (0x03)
Pre-timeout interrupt: None
Pre-timeout interval: 0 seconds
Timer Expiration Flags: (0x10)
* SMS/OS
Initial Countdown: 300.0 sec
Present Countdown: 0.0 sec
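To check the timer state from a script rather than by eye, a small filter over the get output can help. This is only a sketch, assuming the field labels shown above ("Watchdog Timer Is:" and "Present Countdown:"); the function name watchdog_summary is hypothetical, and other ipmitool versions may word or space the output differently.

```shell
# Hypothetical helper: summarise `ipmitool mc watchdog get` output.
# Reads the text on stdin so it can be tried without a BMC.
# Assumes the "Watchdog Timer Is:" / "Present Countdown:" labels seen above.
watchdog_summary() {
  awk -F': *' '
    # Timer state, e.g. "Stopped"
    /^[[:space:]]*Watchdog Timer Is:/ { state = $2 }
    # Remaining countdown, keep only the number (drop "sec")
    /^[[:space:]]*Present Countdown:/ { split($2, a, " "); countdown = a[1] }
    END {
      # No state line found: probably not watchdog output at all
      if (state == "") { print "unknown"; exit 1 }
      printf "%s %s\n", state, countdown
    }'
}
```

Usage: `ipmitool mc watchdog get | watchdog_summary`. For the output above it would print `Stopped 0.0`.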
However, the get report above does not reflect the BIOS timer when the Watchdog is enabled in the BIOS, and the BIOS Watchdog happily restarts the system 5 mins after boot-up regardless of what is set with ipmitool.
I was expecting the get report to show the BIOS timer’s remaining countdown, i.e. the default 5 mins less the boot-up time, so about 150 secs, as my system takes around 150 secs to boot. Instead, the ipmitool get command only shows the timer as set/reset by ipmitool.
It also seems that the BIOS timer takes priority and there’s no way to extend or reset it from ipmitool. Likewise, if the Watchdog is disabled in the BIOS and I use ipmitool to set a timer, the action in the Watchdog settings is carried out when that timer runs out, regardless of the BIOS setting. In my case I’ve kept the default of 300 secs with a power cycle action.
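For the ipmitool (SMS/OS) timer to act as a hang detector, something on the host has to keep resetting it before the countdown expires. If the OS locks up badly enough that the resets stop, the BMC carries out the configured action. Below is a minimal sketch of such a loop, assuming it is started as root after the timer has been configured; WDT_RESET_CMD, WDT_INTERVAL and pet_watchdog are hypothetical names, and this is not a TrueNAS-provided service.

```shell
# Hypothetical settings: the command that resets the BMC watchdog, and how often.
WDT_RESET_CMD=${WDT_RESET_CMD:-"ipmitool mc watchdog reset"}
WDT_INTERVAL=${WDT_INTERVAL:-60}   # seconds; kept well under the 300 s countdown

# pet_watchdog [max]: reset the timer every WDT_INTERVAL seconds.
# With no argument (or 0) it loops forever; a count is only useful for testing.
pet_watchdog() {
  local count=0 max=${1:-0}
  while :; do
    # If the host hangs hard enough that this loop stops, the countdown
    # reaches zero and the BMC performs the configured action.
    $WDT_RESET_CMD || echo "watchdog reset failed" >&2
    count=$((count + 1))
    if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
      return 0
    fi
    sleep "$WDT_INTERVAL"
  done
}
```

Usage: launch `pet_watchdog &` in the background at boot. Resetting every 60 secs against a 300 sec countdown means a few missed resets (e.g. a slow ipmitool call) won't trigger a power cycle on their own.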
I used the ipmitool raw command to select SMS/OS for the Timer Use and Power Cycle for the action, as I couldn’t get the ipmitool mc watchdog set command to work.
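For reference, one way such a raw request can be assembled is from the IPMI "Set Watchdog Timer" command (NetFn 0x06, command 0x24), whose countdown is given in 100 ms units with the low byte first. This is a sketch based on the IPMI v2.0 byte layout as I understand it, not necessarily the exact raw command used above; watchdog_set_args is a hypothetical helper, and the bytes are worth checking against the IPMI/Supermicro documentation before sending them to a BMC.

```shell
# Hypothetical helper: print the raw bytes for IPMI "Set Watchdog Timer"
# (NetFn 0x06 App, cmd 0x24), SMS/OS timer use, for use with `ipmitool raw`.
#   byte 1: timer use   0x04 = SMS/OS (bit 7 clear = logging on,
#                       bit 6 clear = this Set command stops the timer)
#   byte 2: action      0x00 none, 0x01 hard reset, 0x02 power down, 0x03 power cycle
#   byte 3: pre-timeout interval in seconds (0 = none)
#   byte 4: expiration flags to clear (0x10 = SMS/OS flag)
#   bytes 5-6: initial countdown in 100 ms units, LS byte first
watchdog_set_args() {
  local seconds=$1 action=${2:-0x03}
  local ticks=$((seconds * 10))
  if [ "$ticks" -lt 1 ] || [ "$ticks" -gt 65535 ]; then
    echo "countdown must be 1-6553 seconds" >&2
    return 1
  fi
  printf '0x06 0x24 0x04 %s 0x00 0x10 0x%02x 0x%02x\n' \
    "$action" $((ticks & 0xff)) $((ticks >> 8))
}
```

Usage: `ipmitool raw $(watchdog_set_args 300)`, then `ipmitool mc watchdog reset` to start the countdown. For 300 secs the helper prints `0x06 0x24 0x04 0x03 0x00 0x10 0xb8 0x0b` (3000 ticks = 0x0BB8).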
I think the Watchdog and IPMI features on Supermicro boards are fantastic, and being able to reach the IPMI web GUI independently of the host OS is really useful, but I can’t work out why the Watchdog doesn’t behave the way I understand it should.
I’ve searched this forum for Watchdog queries and other fora for IPMI/Watchdog questions, but haven’t found anyone describing the same understanding of how it should work, so I’m wondering if my understanding is wrong?
If anyone can help, it would also help me better understand the capabilities and limitations of Debian and the IPMI features.
Very many thanks