How do I enable Nvidia monitoring in Netdata?

Tried to add the nvidia_smi conf file to /etc/netdata/python.d but can’t seem to get the Nvidia monitoring to show up.

Anyone know how to get it to work?

Got it working finally. For anyone running into the same issue:

Edit the file /usr/lib/netdata/conf.d/python.d.conf and add:
nvidia_smi: yes

For some reason a lot of directories were read-only, including /usr, so I had to run:
zfs set readonly=off boot-pool/ROOT/24.04.0/usr

Ran debugging to see if it worked (plugin located in /usr/lib/netdata/plugins.d):
./python.d.plugin nvidia_smi debug

And that worked so just had to restart the Netdata service and it started showing up.
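
For anyone who would rather script the edit than open an editor, here is a minimal sketch of an idempotent append. The function name enable_nvidia_smi is made up for illustration, and it assumes the file keeps one "module: yes" entry per line, as in the line added above.

```shell
#!/bin/sh
# Hypothetical helper: add "nvidia_smi: yes" to a python.d.conf-style
# file unless an uncommented "nvidia_smi: yes" line is already present.
enable_nvidia_smi() {
    conf="$1"
    if grep -Eq '^[[:space:]]*nvidia_smi:[[:space:]]*yes' "$conf"; then
        echo "nvidia_smi already enabled in $conf"
        return 0
    fi
    # If the file does not end with a newline, add one first so the new
    # entry does not get glued onto the last existing line.
    if [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi
    printf 'nvidia_smi: yes\n' >> "$conf"
    echo "added nvidia_smi: yes to $conf"
}
```

Run as root once /usr is writable, for example: enable_nvidia_smi /usr/lib/netdata/conf.d/python.d.conf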

I had to use

nvidia_smi: yes

in /usr/lib/netdata/conf.d/python.d.conf for it to work.

Did you test if it survives a reboot?

Yes it did


so first I need to turn off “readonly”

edit the python.d.conf file and then put readonly back?

And then restart the netdata service? via cli?

Turn off readonly first: yes.
Put readonly back: not really needed, just leave it as is.
Restart the netdata service via CLI: correct.

Thanks to clear instructions from @duderuud and the observation from @Trexx I was able to get it working too, clearly proving that it’s idiot proof.

To add a little more detail for people who (like me) might be nervous about making a mistake: I SSH’d into TrueNAS as root and then ran the following (xx.xx.x is 24.04.2 in the case of the latest TrueNAS SCALE):

# zfs set readonly=off boot-pool/ROOT/xx.xx.x/usr

# nano /usr/lib/netdata/conf.d/python.d.conf

which in my case said this (yours will probably be the same, or at least similar):

enabled: yes
default_run: no
cputemp: yes
smart_log: yes
k3s_stats: yes

Then I added nvidia_smi: yes at the end and saved it (note the underscore rather than the hyphen between nvidia and smi).

I then set the file system back to read-only (better safe than sorry, although I appreciate that it might not be needed):

# zfs set readonly=on boot-pool/ROOT/xx.xx.x/usr

Then I tried the aforementioned script (run from /usr/lib/netdata/plugins.d):

# ./python.d.plugin nvidia_smi debug

which filled the terminal with many pages of information referring to nvidia and so on. I pressed Ctrl-C to stop it.

Then I restarted the netdata service as mentioned and this is how I did it:

# systemctl restart netdata

At this point I visited the Netdata screen “reporting|netdata” (the button at the top right of the reporting screen) and saw there was an “nvidia smi” entry in the right-hand column, which lists the modules. That then showed me a screen for the module, which meant it had worked.

Relief!
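
The readonly=off, edit, readonly=on sequence above could be wrapped in a small helper so the dataset always gets set back afterwards. This is only a sketch: usr_dataset and with_writable_usr are hypothetical names, and the boot-pool/ROOT/<version>/usr layout is simply taken from the commands in this thread.

```shell
#!/bin/sh
# Build the dataset name used in this thread from a version string.
usr_dataset() {
    printf 'boot-pool/ROOT/%s/usr\n' "$1"
}

# Hypothetical wrapper: make /usr writable, run the given command,
# then set the dataset back to read-only whatever the command returned.
# Must be run as root.
with_writable_usr() {
    dataset=$(usr_dataset "$1")
    shift
    zfs set readonly=off "$dataset" || return 1
    "$@"
    status=$?
    zfs set readonly=on "$dataset"
    return "$status"
}
```

Usage would look something like this (substitute your own version): with_writable_usr 24.04.2 nano /usr/lib/netdata/conf.d/python.d.conf && systemctl restart netdata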


Hi. Apologies for reviving this thread, but I am looking at this solution while running 24.10.2, and I am getting stuck on zfs set readonly=off boot-pool/ROOT/24.10.2/usr. TrueNAS SCALE refuses with:

cannot mount ‘boot-pool/ROOT/24.10.2/usr’: Read-only file system
property may be set but unable to remount filesystem

Two questions:

  • Has the solution presented here persisted across upgrades? (Both minor and train switches)
  • Has anyone had success with this method on Electric Eel?

Hello, I am currently experiencing the same issue. When I run the command to set read-only to off, I get the following message, and I am still unable to edit the python.d.conf file.

admin@truenas[~]$ sudo zfs set readonly=off boot-pool/ROOT/24.10.2/usr
 
cannot mount 'boot-pool/ROOT/24.10.2/usr': Read-only file system
property may be set but unable to remount filesystem

I will truly appreciate any kind of assistance.
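
One way to tell whether /usr is really still mounted read-only after that error is to look at the live mount options rather than the ZFS property. Below is a rough sketch, assuming findmnt (from util-linux) is installed; both function names are invented for this example. It only reports the state and does not fix the error.

```shell
#!/bin/sh
# Succeed if a comma-separated mount option string contains the "ro" flag.
# Options such as "errors=remount-ro" do not count as a match.
opts_are_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "read-only" or "read-write" for the filesystem holding a path.
report_mount_state() {
    opts=$(findmnt -no OPTIONS -T "$1") || return 2
    if opts_are_readonly "$opts"; then
        echo "read-only"
    else
        echo "read-write"
    fi
}
```

For example: report_mount_state /usr/lib/netdata/conf.d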

A post in the Netdata community forums helped me. (I can’t post a link, maybe because I’m new. Search for "Nvidia-smi use to monitor quadro gpus" site:netdata.cloud.)

It took some futzing, so these steps might not be exact, but I think this is what worked:

cd /etc/netdata 2>/dev/null || cd /opt/netdata/etc/netdata
sudo ./edit-config python.d.conf

Add: nvidia_smi: yes

Restart the service (I think this created the default conf file):

sudo systemctl restart netdata

Modify the configuration file:

sudo ./edit-config python.d/nvidia_smi.conf

Add/uncomment and modify these lines:

loop_mode: yes
poll_seconds: 1

Restart the service (again):

sudo systemctl restart netdata
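
The "cd /etc/netdata || cd /opt/netdata/etc/netdata" step above can also be written as a small lookup that picks whichever directory actually contains edit-config. This is just a sketch, and find_netdata_dir is a made-up name; the two candidate paths are the ones from the steps above.

```shell
#!/bin/sh
# Print the first directory among the arguments that contains an
# executable edit-config script; fail if none of them does.
find_netdata_dir() {
    for dir in "$@"; do
        if [ -x "$dir/edit-config" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

Usage might look like: dir=$(find_netdata_dir /etc/netdata /opt/netdata/etc/netdata) && cd "$dir" && sudo ./edit-config python.d.conf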