I have several VMs running on TrueNAS-13.0-U6.1, and I keep seeing watchdog soft lockups on CPU cores inside the VMs, similar to this:
Oct 23 06:09:51 homeassistant kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 17646s! [runc:[2:INIT]:70901]
However, it is not always this task. Sometimes the stuck task is related to Docker and sometimes it is not. Sometimes it is journald (I can't recall the exact process name).
For information: TrueNAS is running on an i7-12700K with 64 GB of RAM. It is running 10 virtual machines. Each VM runs two Docker containers: Watchtower and one service (Speedtracker, UptimeKuma, HomeAssistant, etc.).
The VMs are all up to date and running Ubuntu 22.04.4 LTS.
They are all allocated 2 cores, 4 threads, and 2 GB of RAM. The drives are all clean, and none is using more than 40% of the available space. I have checked RAM usage and CPU usage, and everything seems fine, until it isn't.
The issue with the soft lockup is that I cannot SSH into the machine, and even using it via VNC is next to impossible, so it is very difficult to see what is going on.
I have attempted to log what is happening, but the log only contains the lockup message; it doesn't give any reason why it happened. If there is a way to get more detailed logs, let me know and I will set it up.
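Since the lockup message is all I have to go on, one thing that could be done with it is to tally which tasks and CPUs show up across the VMs. The following is only a minimal sketch: the function names `parse_soft_lockup` and `summarize` are made up for illustration, and the regular expression assumes every message has the same format as the log line quoted above.

```python
import re
from collections import Counter

# Assumed format, taken from the quoted log line:
#   watchdog: BUG: soft lockup - CPU#4 stuck for 17646s! [runc:[2:INIT]:70901]
LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup message, or None if the line is not one."""
    match = LOCKUP_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "seconds": int(match["secs"]),
        "task": match["task"],  # the task name may itself contain ':' and '[]'
        "pid": int(match["pid"]),
    }


def summarize(lines):
    """Count soft lockup messages per (task, cpu) pair, ignoring other lines."""
    counts = Counter()
    for line in lines:
        info = parse_soft_lockup(line)
        if info is not None:
            counts[(info["task"], info["cpu"])] += 1
    return counts
```

The lines could come from a saved copy of each VM's kernel log, for example `summarize(open("kernel.log"))`, where `kernel.log` is a hypothetical file name. This only shows which tasks were reported as stuck, not why they were stuck.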
I am posting here because this isn't specific to a single VM. If one VM were doing this repeatedly, I would put it down to that Docker container or that service being the problem. But there are 10 VMs, and every one of them has had this watchdog lockup at one time or another, so I suspect it has something to do with the way I created the VMs or the way they are set up: a step I failed to take, or some specific detail of TrueNAS that says you must do X or must not do Y. I am hoping someone can assist me with that.
I searched the forum and found nothing related to this, which did surprise me somewhat.