TrueNAS SCALE - UPS service spams logs in "netclient" mode

vom · April 19, 2024, 6:41pm

I am running 23.10.2. My UPS setup uses NUT in “netclient” (slave) mode, so no UPS is connected to TrueNAS via USB, serial, etc. TrueNAS is simply a network client of the machine that is connected to the UPS via USB.

upsmon seems to be running correctly, the logs look okay there.

Running ‘upsc -c ups’ on the master shows the TrueNAS IP as a client.

However, on TrueNAS the nut-driver service constantly tries to restart. I believe this is part of the NUT logic that ‘scans’ for attached UPSes (nut-driver-enumerator). The result is massive journal log spam.

Here is a snippet:

Apr 19 13:49:26 truenas systemd[1]: Starting nut-driver@ups.service - Network UPS Tools - device driver for ups...
░░ Subject: A start job for unit nut-driver@ups.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit nut-driver@ups.service has begun execution.
░░ 
░░ The job identifier is 11337276.
Apr 19 13:49:26 truenas nut-driver@ups[3920850]: FATAL: The '/etc/nut/ups.conf' file does not exist or is not readable
Apr 19 13:49:26 truenas nut-driver@ups[3920849]: FATAL: Could not find a NUT device section for service unit ups
Apr 19 13:49:26 truenas systemd[1]: nut-driver@ups.service: Control process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ An ExecStart= process belonging to unit nut-driver@ups.service has exited.
░░ 
░░ The process' exit code is 'exited' and its exit status is 1.
Apr 19 13:49:26 truenas systemd[1]: nut-driver@ups.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ The unit nut-driver@ups.service has entered the 'failed' state with result 'exit-code'.

Also I notice the job restart counter:

Apr 19 13:49:26 truenas systemd[1]: nut-driver@ups.service: Scheduled restart job, restart counter is at 153341.

For now I’ve manually stopped the nut-driver@ups.service - but I imagine that won’t survive a reboot.

I’m guessing the middleware needs some way to tell from the config options that this is a pure netclient, and then disable or stop this service?

Also let me know if this warrants opening a Jira bug.

Thanks.

Davvo · April 20, 2024, 8:38am

You can set up a simple post-boot script.

vom · April 20, 2024, 3:01pm

Actually just disabling this in systemd survived a reboot so I guess this is the workaround for now.
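
For anyone following along, the workaround can be sketched roughly as below. This is only a sketch: it assumes the failing unit is nut-driver@ups.service, as in the journal output earlier in the thread, and the function name disable_nut_driver is made up for illustration.

```shell
# Sketch only: stop the failing driver unit and keep it from starting at boot.
# Assumption: the unit name matches the one shown in the journal output
# (nut-driver@ups.service); pass a different unit name as the first argument.
disable_nut_driver() {
    unit="${1:-nut-driver@ups.service}"
    # --now stops the unit in addition to disabling it.
    systemctl disable --now "$unit"
}

# Usage (as root on the netclient machine):
#   disable_nut_driver
```

Wrapping the command in a function is just a convenience so the same line could be reused from a post-boot script if disabling alone ever turns out not to stick.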

Setarcos · May 6, 2024, 8:41pm

+1: I’m still seeing this with 24.04.0

The netdata error.log has a set of these added every second:

 --- BEGIN TRACE ---
Error: Connection failure: Connection refused
 --- END TRACE ---
2024-05-06 09:33:32: charts.d: : nut_ups: command 'upsc -l ' failed with code 1:
 --- BEGIN TRACE ---
Error: Connection failure: Connection refused
 --- END TRACE ---
2024-05-06 09:33:32: charts.d: : nut_ups: command 'upsc ix-dummy-ups ' failed with code 1:
 --- BEGIN TRACE ---
Error: Connection failure: Connection refused
 --- END TRACE ---
2024-05-06 09:33:32: charts.d: : nut_ups: command 'upsc ix-dummy-ups ' failed with code 1:
 --- BEGIN TRACE ---
Error: Connection failure: Connection refused
 --- END TRACE ---

BobBenton · May 8, 2024, 4:34pm

Same issue here on two machines running 23.10.2. One master, one slave. Disabled the nut-driver service as above. Stopped the log spam even after reboot.
I haven’t tested a UPS power loss to see if it’s still actually working as expected, but will try to get that done today.

BobBenton · May 22, 2024, 6:23pm

Finally got it tested with a power outage. Both server and client are working as they should.

MilhouseVH · December 6, 2024, 2:55am

The log spam, which continues on 24.10 at the rate of one new error logged per second, is from the UPS graph module, and the failure explains why the UPS graphs are all blank for a remote UPS.

/usr/lib/netdata/charts.d/nut_ups.chart.sh doesn’t support a UPS in netclient (slave) mode. It attempts to query for the UPS name using upsc -l, which fails because the host is not specified; in netclient mode it needs to use upsc -l host, where host is the remote UPS master.
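
To illustrate the difference, here is a minimal sketch of the two invocations. It is not the actual code from nut_ups.chart.sh; the function name list_ups_names and the way the host gets passed in are assumptions made up for this example.

```shell
# Sketch only: list UPS names, pointing upsc at the remote master when a
# host is given. Not taken from nut_ups.chart.sh.
# Assumption: the caller knows the master's hostname or IP and passes it
# as the first argument when running in netclient mode.
list_ups_names() {
    host="${1:-}"
    if [ -n "$host" ]; then
        # netclient (slave): ask the upsd running on the remote master
        upsc -l "$host"
    else
        # no host given: upsc talks to localhost, which is refused on a
        # pure netclient because no local upsd is listening
        upsc -l
    fi
}

# Usage (the address is a placeholder):
#   list_ups_names 192.0.2.10
```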

Hopefully this can be fixed so that a slave UPS can be graphed, or at least the upsc -l spam needs to be stopped - my 24.10 system has only been up for 6 hours and there are already over 30,000 upsc -l errors in /var/log/netdata/error.log.
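
If you want to check how bad it is on your own system, something like the following sketch counts the failed upsc -l calls. It assumes the log lines look like the ones quoted earlier in this thread (command 'upsc -l ' failed with code 1); the function name count_upsc_list_errors is made up for illustration.

```shell
# Sketch only: count failed `upsc -l` invocations recorded in a netdata
# error log. Assumes the log format quoted earlier in this thread.
count_upsc_list_errors() {
    log="${1:-/var/log/netdata/error.log}"
    # -F: fixed-string match, -c: print the number of matching lines
    grep -cF "command 'upsc -l" "$log"
}

# Usage:
#   count_upsc_list_errors
#   count_upsc_list_errors /path/to/other/error.log
```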

MilhouseVH · December 6, 2024, 3:15am

I would file a bug report because I’m pretty sure I can point out the file and line where the problem is, but every attempt to open a Jira account has unfortunately failed. \o/

Anyway, if a developer could take a look at /usr/lib/netdata/charts.d/nut_ups.chart.sh and adapt it to also support netclient mode that would be super.

I think I’ll have to disable UPS monitoring because all this spam is going to trash /var/log…

MilhouseVH · December 14, 2024, 9:08pm

For anyone that is interested, I’ve documented a fix for Slave UPS in 24.10 - this should work until an upstream fix arrives, if it ever arrives.

https://ixsystems.atlassian.net/browse/NAS-132924?focusedCommentId=289352