Step-by-Step Guide: Modifying `replicas` in the NVIDIA Device Plugin Configuration

REDUNDANT · October 3, 2024, 1:00pm

This guide will help you change the replicas setting in the NVIDIA device plugin configuration for Kubernetes, specifically on a TrueNAS Scale system using k3s. We’ll walk you through locating and editing the configuration step by step.

Step 1: Locate the NVIDIA Device Plugin ConfigMap

The NVIDIA device plugin uses a configuration stored in a Kubernetes ConfigMap. You need to retrieve the contents of this ConfigMap first.

Open a terminal session and run the following command to fetch the configuration of the NVIDIA device plugin:
```
k3s kubectl get configmap -n kube-system nvidia-device-plugin-config -o yaml
```

The output should show you the nvdefault.yaml file embedded inside the ConfigMap. Look for this section in the output:

```
data:
  nvdefault.yaml: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
        - name: nvidia.com/gpu
          replicas: 5
```
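
If you only want the embedded file or the current value rather than the whole ConfigMap, a jsonpath query can narrow the output. This is a minimal sketch: the helper names `get_plugin_config` and `current_replicas` are made up for illustration, and it assumes the data key is `nvdefault.yaml` as shown above.

```shell
# Hypothetical helpers, not part of k3s or the NVIDIA plugin.

# Print only the embedded nvdefault.yaml from the ConfigMap.
# The dot in the key name is escaped so jsonpath reads it as one key.
get_plugin_config() {
  k3s kubectl get configmap -n kube-system nvidia-device-plugin-config \
    -o 'jsonpath={.data.nvdefault\.yaml}'
}

# Read a plugin config on stdin and print the first replicas value found.
current_replicas() {
  sed -n 's/^[[:space:]]*replicas:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Usage:
#   get_plugin_config | current_replicas
```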

Step 2: Edit the ConfigMap to Modify the `replicas` Setting

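Now that you’ve found the configuration, you need to change the replicas value from 5 to 1. With time-slicing, replicas is the number of schedulable copies the plugin advertises for each physical GPU, so a value of 1 means each GPU is offered exactly once.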

To edit the ConfigMap, use the following command:

```
k3s kubectl edit configmap -n kube-system nvidia-device-plugin-config
```

This will open the ConfigMap in the editor named by the KUBE_EDITOR or EDITOR environment variable, falling back to vi (usually vim) if neither is set. Inside the editor, find this section:
```
replicas: 5
```
Change the value of replicas from 5 to 1:
```
replicas: 1
```

Step 3: Save and Exit the Editor

After you’ve modified the replicas value, save and close the file.

If you’re using vim, do the following:
1. Press Esc to enter command mode.
2. Type :wq and hit Enter to write the changes and quit the editor.
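
If you would rather not use an interactive editor, Steps 2 and 3 can be approximated with a pipeline. This is a sketch only: `set_replicas` is a hypothetical helper, and it rewrites every line of the form `replicas: <number>`, so it assumes the ConfigMap contains no other such lines you want to keep.

```shell
# Hypothetical helper: rewrite each "replicas: <n>" line to the given value,
# preserving indentation. Reads YAML on stdin, writes YAML on stdout.
set_replicas() {
  sed 's/^\([[:space:]]*replicas:\)[[:space:]]*[0-9][0-9]*[[:space:]]*$/\1 '"$1"'/'
}

# Usage (ConfigMap name and namespace as in Step 1):
#   k3s kubectl get configmap -n kube-system nvidia-device-plugin-config -o yaml \
#     | set_replicas 1 \
#     | k3s kubectl replace -f -
```

Running the first two stages without the final `replace` lets you inspect the rewritten YAML before anything is changed in the cluster.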

Step 4: Restart the NVIDIA Device Plugin DaemonSet

To apply the new configuration, you need to restart the NVIDIA device plugin. The easiest way to do this is by deleting the existing pod(s) associated with the DaemonSet. Kubernetes will automatically recreate them with the updated configuration.

Run this command to delete the existing NVIDIA device plugin pods:

```
k3s kubectl delete pod -n kube-system -l name=nvidia-device-plugin-ds
```

The DaemonSet will automatically recreate the pod(s) with the updated configuration, which will now use replicas: 1.
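
Rather than polling by hand, you can block until the recreated pod reports Ready. The sketch below wraps `kubectl wait` in a hypothetical helper named `wait_for_plugin`; it assumes the same pod label as the delete command, and it may return an error if it runs before the replacement pod has been created.

```shell
# Hypothetical helper: wait until the plugin pod(s) report Ready,
# or give up after the timeout (default 120s).
wait_for_plugin() {
  k3s kubectl wait pod -n kube-system -l name=nvidia-device-plugin-ds \
    --for=condition=Ready --timeout="${1:-120s}"
}

# Usage:
#   wait_for_plugin        # default timeout
#   wait_for_plugin 30s    # shorter timeout
```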

Step 5: Verify the Change

Once the new pod is running, you can check the logs to ensure that the new configuration with replicas: 1 is being applied.

Get the name of the newly created pod:

```
k3s kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
```

Check the logs of the new pod:

```
k3s kubectl logs -n kube-system <new-nvidia-pod-name>
```

In the logs, you should see the updated configuration reflecting replicas: 1.
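
As a second check besides the logs, you can look at how many nvidia.com/gpu resources each node advertises. With time-slicing, that number should be the physical GPU count multiplied by replicas, so a single-GPU node with replicas: 1 should show 1. The helper names below are hypothetical and the snippet is a sketch, not an official procedure.

```shell
# Hypothetical helpers, not part of k3s or the NVIDIA plugin.

# List each node with the number of nvidia.com/gpu resources it advertises.
# The dot in the resource name is escaped so it is read as a single key.
gpu_allocatable() {
  k3s kubectl get nodes \
    -o 'custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}

# Expected advertised count: physical GPUs multiplied by replicas.
expected_gpu_count() {
  echo $(( $1 * $2 ))
}

# Usage:
#   gpu_allocatable
#   expected_gpu_count 1 1
```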

If Changes Don’t Work: Disable Time-Slicing

If the pod starts crashing after you change replicas, try disabling time-slicing altogether. The plugin should then advertise each physical GPU once, with no shared replicas to conflict over.

Steps to Disable Time-Slicing

Edit the ConfigMap:

Open the ConfigMap for editing again:

```
k3s kubectl edit configmap -n kube-system nvidia-device-plugin-config
```

Locate the Time-Slicing Section:

Find the timeSlicing section within the nvdefault.yaml data. It should look similar to this:

```
sharing:
  timeSlicing:
    renameByDefault: false
    failRequestsGreaterThanOne: false
    resources:
    - name: nvidia.com/gpu
      replicas: 1
```

Modify the Time-Slicing Configuration:

Either remove the timeSlicing section entirely, or replace it with an empty mapping. The renameByDefault, failRequestsGreaterThanOne, and resources keys belong to timeSlicing, so remove them along with it rather than leaving them directly under sharing. For example:

```
sharing:
  # timeSlicing and everything nested under it has been removed.
  # An empty section should effectively disable time-slicing:
  timeSlicing: {}
```

Save and Exit:

Save your changes and exit the editor.

Restart the NVIDIA Device Plugin Pods:

Delete the current pod(s) again to ensure they pick up the new configuration:

```
k3s kubectl delete pod -n kube-system -l name=nvidia-device-plugin-ds
```

Verify the Changes:

After the new pod is up, check the logs to ensure it’s running properly:

```
k3s kubectl logs -n kube-system <new-nvidia-pod-name>
```
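
If the pod is still restarting, the current container's logs may be empty or cut short. A rough sketch for spotting a crash loop and reading the logs of the container that crashed follows; `crashing_pods` and `previous_logs` are hypothetical helper names, and the filter assumes the default `kubectl get pods` table layout, where STATUS is the third column.

```shell
# Hypothetical helpers, not part of k3s or the NVIDIA plugin.

# Read `kubectl get pods` table output on stdin and print the names of
# pods whose STATUS column is CrashLoopBackOff.
crashing_pods() {
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Show logs from the previous (crashed) container of the given pod.
previous_logs() {
  k3s kubectl logs -n kube-system "$1" --previous
}

# Usage:
#   k3s kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds | crashing_pods
#   previous_logs <new-nvidia-pod-name>
```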
