How do I enable the Nvidia monitoring in Netdata?

Tried to add the nvidia_smi conf file to /etc/netdata/python.d but can’t seem to get the Nvidia monitoring to show up.

Anyone know how to get it to work?

Got it working finally. For anyone running into the same issue:

Edit the file /usr/lib/netdata/conf.d/python.d.conf and add:
nvidia-smi: yes

For some reason a lot of directories were read-only, including /usr, so I had to run:
zfs set readonly=off boot-pool/ROOT/24.04.0/usr

Ran debugging to see if it worked (plugin located in /usr/lib/netdata/plugins.d):
./python.d.plugin nvidia_smi debug

And that worked so just had to restart the Netdata service and it started showing up.

I had to use

nvidia_smi: yes

(with an underscore, not a hyphen) in /usr/lib/netdata/conf.d/python.d.conf for it to work.
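In case it helps anyone tell the two spellings apart, here is a minimal sketch of a check that reports which form a python.d.conf-style file contains. The function name check_nvidia_entry is made up for illustration, and it only looks for lines set to "yes"; the posts in this thread report that the underscore form was the one that worked.

```shell
#!/bin/sh
# Hypothetical helper (not part of Netdata): report which spelling of the
# module name is enabled in a python.d.conf-style file.
# Prints "underscore", "hyphen", or "missing".
check_nvidia_entry() {
    conf_file=$1
    if grep -Eq '^[[:space:]]*nvidia_smi:[[:space:]]*yes' "$conf_file"; then
        echo "underscore"
    elif grep -Eq '^[[:space:]]*nvidia-smi:[[:space:]]*yes' "$conf_file"; then
        echo "hyphen"
    else
        echo "missing"
    fi
}
```

After sourcing or pasting the function, it could be run as: check_nvidia_entry /usr/lib/netdata/conf.d/python.d.conf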

Did you test if it survives a reboot?

Yes it did


So first I need to turn off “readonly”,

edit the python.d.conf file, and then put readonly back?

And then restart the netdata service? Via CLI?

Yes, turn off readonly first.
Putting readonly back is not really needed, just leave it as is.
Correct, then restart the netdata service.

Thanks to clear instructions from @duderuud and the observation from @Trexx I was able to get it working too, clearly proving that it’s idiot proof.

To add a little more detail for people who (like me) might be nervous about making a mistake: I SSH’d into TrueNAS as root and then ran the following (xx.xx.x is 24.04.2 in the case of the latest TrueNAS SCALE):

# zfs set readonly=off boot-pool/ROOT/xx.xx.x/usr

# nano /usr/lib/netdata/conf.d/python.d.conf

In my case the file said this (yours will probably be the same, or at least similar):

enabled: yes
default_run: no
cputemp: yes
smart_log: yes
k3s_stats: yes

Then I added nvidia_smi: yes at the end and saved it (note the underscore rather than the hyphen between nvidia and smi).
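For anyone who would rather not edit the file by hand, the same change can be sketched as a small shell function that appends the line only if the module is not already listed. This is just an illustration: enable_module is a made-up name, and it leaves any existing line for the module untouched (even one set to "no").

```shell
#!/bin/sh
# Hypothetical helper (not part of Netdata): append "<module>: yes" to a
# python.d.conf-style file unless a line for that module already exists.
enable_module() {
    conf_file=$1
    module=$2
    if grep -Eq "^[[:space:]]*${module}:" "$conf_file"; then
        echo "already present: ${module}"
        return 0
    fi
    # If the file does not end with a newline, add one so the new entry
    # starts on its own line.
    if [ -n "$(tail -c 1 "$conf_file")" ]; then
        echo >> "$conf_file"
    fi
    printf '%s: yes\n' "$module" >> "$conf_file"
    echo "added: ${module}"
}
```

Usage would be something like: enable_module /usr/lib/netdata/conf.d/python.d.conf nvidia_smi (run as root, and only while /usr is writable).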

I then set the file system back to read-only (better safe than sorry, although I appreciate that it might not be needed):

# zfs set readonly=on boot-pool/ROOT/xx.xx.x/usr
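If you want to confirm the current state before or after toggling it, a rough sketch like the one below may help. The helper names usr_dataset and show_readonly are invented for illustration, and the dataset layout is simply the one quoted in this thread; zfs get with -H -o value should print just the property value (on or off).

```shell
#!/bin/sh
# Hypothetical helpers: build the /usr dataset name used in this thread
# from a TrueNAS version string, and query its readonly property.
usr_dataset() {
    printf 'boot-pool/ROOT/%s/usr\n' "$1"
}

# Prints the readonly property of that dataset.
# Needs the zfs command and root privileges, so it is defined but not run here.
show_readonly() {
    zfs get -H -o value readonly "$(usr_dataset "$1")"
}
```

For example, show_readonly 24.04.2 would query boot-pool/ROOT/24.04.2/usr.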

Then I tried the aforementioned script (run from /usr/lib/netdata/plugins.d, where the plugin lives):

# ./python.d.plugin nvidia_smi debug

which filled the terminal with many pages of information referring to nvidia and so on. I pressed Ctrl-C to kill it.

Then I restarted the netdata service as mentioned and this is how I did it:

# systemctl restart netdata

At this point I visited the netdata screen “reporting|netdata” (button at the top right-hand side of the reporting screen) and saw there was an “nvidia smi” entry in the right-hand column, which lists the modules.

That entry then showed me the Nvidia monitoring screen, which meant it had worked.

Relief!
