How to debug TrueNAS SCALE when it becomes unresponsive from time to time

Hi there,

I have been using TrueNAS SCALE for the last few months and have gotten used to it. Normally everything runs smoothly and I have no problems.

Lately, and to be honest maybe ever since I started using it, TrueNAS SCALE becomes unresponsive from time to time. I can’t connect to the web GUI anymore, the wired Ethernet link to the router is no longer shown as active, and even when I connect a monitor and keyboard the screen shows nothing (it doesn’t get a signal).

This happens irregularly: sometimes I get an uptime of 3 days, sometimes as much as 14 days. My longest uptime streak was around 2 weeks.

At first I thought it was something with my router, but since I don’t get any response from the monitor and keyboard either, I now think it has to do with the system or the hardware itself.

My question now: how can I debug this and see “older” log entries from before the system froze? Are they saved somewhere? I found some commands, but they only show the most recent log.
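On Debian-based systems, older log data is commonly kept next to the live file as rotated copies (for example syslog.1, syslog.2.gz). Whether a given TrueNAS SCALE release keeps such copies, and under which directory, is an assumption here and not something this thread confirms. Under that assumption, a minimal Python sketch for reading a log together with its rotated copies, oldest first, could look like this (the helper names are hypothetical):

```python
import gzip
from pathlib import Path
from typing import Iterator


def rotation_index(path: Path) -> int:
    """Return 0 for the live file and N for name.N or name.N.gz."""
    for part in path.name.split(".")[1:]:
        if part.isdigit():
            return int(part)
    return 0


def read_rotated(log_dir: str, base: str) -> Iterator[str]:
    """Yield lines from base, base.1, base.2.gz, ... with the oldest file first."""
    files = [p for p in Path(log_dir).glob(base + "*") if p.is_file()]
    # A higher rotation index means an older file, so sort in descending order.
    for path in sorted(files, key=rotation_index, reverse=True):
        # Compressed rotations need gzip; everything else is plain text.
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

Calling read_rotated("/var/log", "syslog") and printing each line would then show the whole history that is still on disk; both the directory and the file name are assumptions and may differ on your system.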

I am really a bit out of ideas about what to do, and only a power cycle gets the computer running again. It’s all new hardware and it runs smoothly until it freezes. Temperatures are all okay; I check them frequently.

Thanks a lot if you have any idea how to debug a problem like that.

What version are you running?

TrueNAS-SCALE-23.10.2

Have you checked the logs in the debug file? You can download it under Settings > Advanced > Save debug.

Thanks a lot, that is what I was looking for.

I opened the error log and it shows errors from around the time the system froze.

Now the question is what happened:

May 29 08:22:14 truenas kernel: ixgbe 0000:04:00.0: Adapter removed
May 29 08:22:15 truenas kernel: ixgbe 0000:04:00.1: Adapter removed
May 29 09:10:00 truenas kernel: hid-generic 0003:1532:028D.0004: No inputs registered, leaving
May 29 09:10:01 truenas kernel: Error: Driver 'pcspkr' is already registered, aborting...
May 29 09:10:01 truenas kernel:
May 29 09:10:01 truenas kernel: NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
NVRM: supported through the NVIDIA 470.xx Legacy drivers. Please
NVRM: visit Unix Drivers | NVIDIA for more
NVRM: information. The 535.54.03 NVIDIA driver will ignore
NVRM: this GPU. Continuing probe...
May 29 09:10:29 truenas blkmapd[2602]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
May 29 09:10:29 truenas systemd[1]: Failed to start nslcd.service - LSB: LDAP connection daemon.
May 29 09:10:32 truenas libvirtd[2934]: invalid argument: cannot find architecture arm
May 29 09:10:32 truenas haproxy[4946]: backend be_20 has no server available!
May 29 09:10:33 truenas haproxy[4946]: backend be_32 has no server available!
May 29 09:10:33 truenas haproxy[5267]: backend be_20 has no server available!
May 29 09:11:00 truenas kernel: NVRM: The NVIDIA GeForce GT 710 GPU installed in this system is
NVRM: supported through the NVIDIA 470.xx Legacy drivers. Please
NVRM: visit Unix Drivers | NVIDIA for more
NVRM: information. The 535.54.03 NVIDIA driver will ignore
NVRM: this GPU. Continuing probe...
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:22 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:24 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:24 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available
May 29 09:11:27 truenas kernel: IPVS: rr: UDP 172.17.0.10:53 - no destination available

I think the relevant entry is this one; the rest is from my attempt to connect a display to the server.

May 29 08:22:14 - truenas kernel: ixgbe 0000:04:00.0: Adapter removed

So it seems this is related to the NIC I installed. I guess I have to start looking there, or go back to the onboard LAN port, which worked without problems. I bought an Intel X520 dual-port NIC, but I guess it is either faulty or one of the cheap Chinese variants everyone warns about.
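To check whether the same event shows up before earlier freezes, the log can be scanned for every "Adapter removed" line. The following is a minimal Python sketch that assumes the line layout seen in the excerpt above (timestamp, hostname, "kernel:", driver name, PCI address); the function name and the regular expression are illustrative and not part of any TrueNAS tool.

```python
import re

# Matches kernel lines shaped like the ones in the excerpt, for example:
# "May 29 08:22:14 truenas kernel: ixgbe 0000:04:00.0: Adapter removed"
REMOVED = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): Adapter removed"
)


def find_adapter_removals(lines):
    """Return (timestamp, driver, pci_address) for each 'Adapter removed' line."""
    hits = []
    for line in lines:
        match = REMOVED.match(line)
        if match:
            hits.append((match["stamp"], match["driver"], match["pci"]))
    return hits


# Sample input copied from the error log quoted in this thread.
sample = [
    "May 29 08:22:14 truenas kernel: ixgbe 0000:04:00.0: Adapter removed",
    "May 29 08:22:15 truenas kernel: ixgbe 0000:04:00.1: Adapter removed",
    "May 29 09:10:00 truenas kernel: hid-generic 0003:1532:028D.0004: No inputs registered, leaving",
]

for stamp, driver, pci in find_adapter_removals(sample):
    print(f"{stamp}: {driver} lost the device at PCI address {pci}")
```

If both ports of the card (functions .0 and .1 of the same PCI device) disappear within a second of each other before every freeze, that would point at the card or its slot as a whole and not at a single port or cable.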

@ABain, thanks again. That error log was exactly what I needed to debug this and find some clues.

Hi everyone,

I had the same symptoms for the second time a couple of days ago: the server goes dark, with no network activity and no output on the DisplayPort, it is not reachable, and when I look back afterwards all the reporting stops at that point. The only way to get it back is a hard reset.
As I said, this was the second time; the first time was a couple of weeks earlier. Aside from that, the server has been running smoothly since mid-July.

Unlike @ChristBKK, I did not find anything in the logs: the last message in the messages log is from 15 hours earlier, and kern.log has nothing at all for that day.
In the error log, the last entry is from over nine hours before the system went dark at approximately 15:30:

Sep  6 02:26:42 wutzi systemd[1]: Failed to start apt-daily.service - Daily apt download activities.
Sep  6 06:00:43 wutzi systemd[1]: Failed to start apt-daily-upgrade.service - Daily apt upgrade and clean activities.

In syslog, the last messages before it went dark are:

Sep  6 15:17:01 wutzi CRON[1702789]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep  6 15:20:07 wutzi systemd[1]: Starting sysstat-collect.service - system activity accounting 

So, I don’t actually see anything wrong there.
Does anyone have an idea where to look next?
Did you find the cause of your problem, @ChristBKK?
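When a hard freeze leaves no error message, the most reliable trace is often the silence itself: the last line written before the gap and the first line after the reset. The Python sketch below looks for such gaps in syslog-style lines. It assumes the "Mon DD HH:MM:SS" timestamp prefix seen in the excerpts above; because those timestamps carry no year, the year has to be supplied by the caller. The function names and the two-hour default threshold are illustrative choices.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line, or return None."""
    try:
        # The first 15 characters hold the timestamp, e.g. "Sep  6 15:17:01".
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        # Continuation lines (such as the NVRM ones) have no timestamp.
        return None


def find_gaps(lines, year, threshold=timedelta(hours=2)):
    """Return (last_line_before, first_line_after, gap) for each silent period."""
    gaps = []
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if prev_time is not None and stamp - prev_time > threshold:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps
```

The threshold needs to be longer than the normal quiet periods of the log in question. The hourly cron entry quoted above suggests that syslog on this machine is written at least once an hour, while the error log can legitimately stay quiet for many hours, so gap detection is more useful on syslog.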

The system is a repurposed Lenovo ThinkServer with a Xeon E3-1225 v6, 64 GB ECC RAM, an LSI 9211-8i flashed to IT mode (originally a Dell PERC H310), 3 SSDs (2 in a mirror for apps and scratch, 1 as the boot disk), and 7 2.5-inch HDDs (5 in a RAIDZ2, 2 in a mirror).

Kind regards,
ht

Hi again,

It happened again, not even a week after the last time.
Once again I could not find anything in the logs that explains what happened. I have set the log level to “debug”, but only after the fact. Hopefully that will capture something of interest if it happens again.

Has anyone else seen something like this, the system just going completely dark but without turning off or logging any errors?

Kind regards,
ht