Tried to add the nvidia_smi conf file to /etc/netdata/python.d but can’t seem to get the Nvidia monitoring to show up.
Anyone know how to get it to work?
Got it working finally. For anyone running into the same issue:
Edit the file /usr/lib/netdata/conf.d/python.d.conf and add:
nvidia_smi: yes
For some reason a lot of directories were read-only, including /usr, so I had to run:
zfs set readonly=off boot-pool/ROOT/24.04.0/usr
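The release number in that dataset path changes with every TrueNAS update (24.04.0 here, 24.04.2 and 24.10.2 further down this thread). Here is a rough sketch that looks it up instead of typing it. It assumes /usr is mounted as its own dataset under boot-pool/ROOT, as these paths suggest. usr_dataset is a made-up name, and findmnt is the standard util-linux tool:

```shell
# Hypothetical helper: print the ZFS dataset mounted at /usr
# (for example boot-pool/ROOT/24.04.0/usr), so the release number
# does not have to be typed by hand.
usr_dataset() {
    src=$(findmnt -n -o SOURCE --mountpoint /usr) || return 1
    case $src in
        */ROOT/*/usr) printf '%s\n' "$src" ;;
        *) echo "unexpected source for /usr: $src" >&2; return 1 ;;
    esac
}

# Usage (as root):
# ds=$(usr_dataset) && zfs set readonly=off "$ds"
```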
I then ran the plugin in debug mode to check that it worked (the plugin is located in /usr/lib/netdata/plugins.d):
./python.d.plugin nvidia_smi debug
That worked, so I just had to restart the Netdata service and it started showing up.
I had to use nvidia_smi: yes in /usr/lib/netdata/conf.d/python.d.conf for it to work.
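If you'd rather not open an editor, a small shell function can add that line without duplicating it if you run it twice. This is only a sketch. set_conf_key is a made-up name, and it assumes the plain "key: value" layout python.d.conf uses. It also rewrites a commented-out "# key: ..." line in place, and it relies on GNU sed's -i option:

```shell
# Hypothetical helper: make sure FILE has a "KEY: VALUE" line.
# An existing "KEY: ..." or commented "# KEY: ..." line is rewritten in place;
# if neither exists, the line is appended at the end of the file.
set_conf_key() {
    conf_file=$1 conf_key=$2 conf_value=$3
    if grep -Eq "^[#[:space:]]*${conf_key}:" "$conf_file"; then
        sed -i -E "s|^[#[:space:]]*${conf_key}:.*|${conf_key}: ${conf_value}|" "$conf_file"
    else
        printf '%s: %s\n' "$conf_key" "$conf_value" >> "$conf_file"
    fi
}

# Usage (as root, once /usr is writable):
# set_conf_key /usr/lib/netdata/conf.d/python.d.conf nvidia_smi yes
```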
Did you test whether it survives a reboot?
Yes, it did.
So first I need to turn off “readonly”,
edit the python.d.conf file, and then put readonly back?
And then restart the netdata service? Via the CLI?
Yes
Not really needed, just leave it as is.
Correct
Thanks to clear instructions from @duderuud and the observation from @Trexx, I was able to get it working too, clearly proving that it’s idiot-proof.
To add a little more detail for people who (like me) might be nervous about making a mistake: I SSH’d into TrueNAS as root and then ran the following
(where xx.xx.x is 24.04.2 on the latest TrueNAS SCALE release):
# zfs set readonly=off boot-pool/ROOT/xx.xx.x/usr
# nano /usr/lib/netdata/conf.d/python.d.conf
which in my case said this (yours will probably be the same, or at least similar):
enabled: yes
default_run: no
cputemp: yes
smart_log: yes
k3s_stats: yes
Then I added nvidia_smi: yes at the end and saved it (note the underscore rather than a hyphen between nvidia and smi).
I then set the file system back to read-only (better safe than sorry, although I appreciate that it might not be needed):
# zfs set readonly=on boot-pool/ROOT/xx.xx.x/usr
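The off / edit / on sequence can also be wrapped so that the previous readonly value is put back even if the edit fails. The same applies if zfs set itself errors out after changing the property, which the 24.10.2 error further down suggests can happen. This is a sketch only. with_writable_dataset is a made-up name, and it uses nothing but zfs get and zfs set. It restores the value, but not whether that value was inherited:

```shell
# Hypothetical wrapper: run a command with DATASET temporarily writable,
# then set the readonly property back to the value it had before.
with_writable_dataset() {
    wd_ds=$1; shift
    wd_old=$(zfs get -H -o value readonly "$wd_ds") || return 1
    if ! zfs set readonly=off "$wd_ds"; then
        # zfs may have changed the property even though it reported an error
        zfs set readonly="$wd_old" "$wd_ds"
        return 1
    fi
    wd_rc=0
    "$@" || wd_rc=$?
    # Restore the previous value whether or not the command succeeded.
    zfs set readonly="$wd_old" "$wd_ds"
    return "$wd_rc"
}

# Usage (as root, replacing xx.xx.x with your release):
# with_writable_dataset boot-pool/ROOT/xx.xx.x/usr nano /usr/lib/netdata/conf.d/python.d.conf
```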
Then I ran the aforementioned debug command from /usr/lib/netdata/plugins.d:
# ./python.d.plugin nvidia_smi debug
which filled the terminal with many pages of output referring to nvidia and so on. I pressed Ctrl-C to stop it.
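Instead of stopping the debug run with Ctrl-C, coreutils timeout can end it after a fixed time. This is a sketch. debug_module is a made-up name and the 15-second default is arbitrary. It changes into the plugin directory first so the plugin is started exactly as above (./python.d.plugin):

```shell
# Hypothetical helper: run one python.d module in debug mode for a limited time.
# PLUGIN_DIR is where python.d.plugin lives (/usr/lib/netdata/plugins.d in this thread).
debug_module() {
    dm_dir=$1 dm_module=$2 dm_secs=${3:-15}
    dm_rc=0
    ( cd "$dm_dir" && timeout "$dm_secs" ./python.d.plugin "$dm_module" debug ) || dm_rc=$?
    # timeout exits with 124 when it had to stop the plugin, which is expected here.
    [ "$dm_rc" -eq 0 ] || [ "$dm_rc" -eq 124 ]
}

# Usage (as root):
# debug_module /usr/lib/netdata/plugins.d nvidia_smi 15 | less
```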
Then I restarted the netdata service as mentioned, like this:
# systemctl restart netdata
At this point I visited the netdata screen “reporting|netdata” (button at the top right of the reporting screen) and saw an “nvidia smi” entry in the right-hand column, which lists the modules. Selecting it showed me the nvidia smi charts (screenshots not preserved), which meant it had worked.
Relief!
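The same check can be done from the shell by asking Netdata's HTTP API for its chart list. This rests on a few assumptions. It uses the /api/v1/charts endpoint on Netdata's usual default port 19999, which TrueNAS may not expose the same way, so the URL is a parameter. It also just looks for the module name anywhere in the returned JSON. module_has_charts is a made-up name:

```shell
# Hypothetical check: does Netdata's chart list mention MODULE anywhere?
# BASE_URL defaults to Netdata's usual local port; TrueNAS may serve it elsewhere.
module_has_charts() {
    mc_module=$1 mc_url=${2:-http://127.0.0.1:19999}
    # curl -f makes HTTP errors fail, so grep then sees no input and returns false.
    curl -fsS "$mc_url/api/v1/charts" | grep -q "$mc_module"
}

# Usage:
# module_has_charts nvidia_smi && echo "nvidia_smi charts found"
```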
Hi. Apologies for reopening this thread, but I am trying this solution on 24.10.2 and I am getting stuck on zfs set readonly=off boot-pool/ROOT/24.10.2/usr. TrueNAS SCALE refuses with:
cannot mount ‘boot-pool/ROOT/24.10.2/usr’: Read-only file system
property may be set but unable to remount filesystem
Two questions:
Hello, I am currently experiencing the same issue. When I run the command to set readonly to off, I get the following message, and I am still unable to edit the python.d.conf file.
admin@truenas[~]$ sudo zfs set readonly=off boot-pool/ROOT/24.10.2/usr
cannot mount 'boot-pool/ROOT/24.10.2/usr': Read-only file system
property may be set but unable to remount filesystem
I would truly appreciate any kind of assistance.
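The message says the property may be set even though the remount failed. It might help to compare the ZFS property with how /usr is actually mounted right now. This is a sketch using zfs get and findmnt, and usr_ro_status is a made-up name:

```shell
# Hypothetical diagnostic: print the dataset's readonly property next to the
# mode /usr is currently mounted with. "off" together with "ro" would fit the
# error above: the property changed but the live mount stayed read-only.
usr_ro_status() {
    rs_ds=$1
    rs_prop=$(zfs get -H -o value readonly "$rs_ds") || return 1
    rs_opts=$(findmnt -n -o OPTIONS --mountpoint /usr) || return 1
    case ",$rs_opts," in
        *,ro,*) rs_mount=ro ;;
        *) rs_mount=rw ;;
    esac
    printf 'readonly property: %s; /usr mounted: %s\n' "$rs_prop" "$rs_mount"
}

# Usage (as root):
# usr_ro_status boot-pool/ROOT/24.10.2/usr
```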
A post in the Netdata community forums helped me. (I can’t post a link, maybe because I’m new, but you can search for: "Nvidia-smi use to monitor quadro gpus" site:netdata.cloud)
It took some futzing, so these steps might not be exact, but I think this is what worked:
cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config python.d.conf
Add: nvidia_smi: yes
Restart the service (I think this created the default conf file):
sudo systemctl restart netdata
Modify the configuration file:
sudo ./edit-config python.d/nvidia_smi.conf
Add/uncomment and modify these lines:
loop_mode: yes
poll_seconds: 1
Restart the service (again):
sudo systemctl restart netdata
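As far as I understand it, edit-config copies the stock file into /etc/netdata if no copy exists there yet, and then opens that copy in your editor. That would explain why this route sidesteps the read-only /usr. Below is a sketch of just that copy step. It assumes the stock files live in /usr/lib/netdata/conf.d, as earlier in this thread, and copy_stock_conf is a made-up name:

```shell
# Hypothetical helper mirroring edit-config's first step: copy a stock config
# file into the user config directory unless an override already exists there.
copy_stock_conf() {
    cs_name=$1
    cs_stock=${2:-/usr/lib/netdata/conf.d}
    cs_user=${3:-/etc/netdata}
    if [ -e "$cs_user/$cs_name" ]; then
        return 0    # keep the existing override untouched
    fi
    mkdir -p "$(dirname "$cs_user/$cs_name")"
    cp "$cs_stock/$cs_name" "$cs_user/$cs_name"
}

# Usage (as root), then edit the copies under /etc/netdata:
# copy_stock_conf python.d.conf
# copy_stock_conf python.d/nvidia_smi.conf
```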