Linux Jails (containers/VMs) with Incus

Oh, good point. So yes, create the dataset, then drop into the TrueNAS CLI and run sudo chown -R 2147000001:2147000001 /mnt/<pool>/<dataset> if you need it owned by root inside Incus, for a Docker data-root for example.

Or sudo chown -R 2147001001:2147001001 /mnt/<pool>/<dataset> if you want it owned by uid 1000, the default first user in Debian or Ubuntu.

Adjust the offset for whatever uid you need inside Incus.
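In other words, the on-host owner is just the idmap base plus the in-container uid. A tiny sketch of that arithmetic, assuming the base from the examples above (2147000001 maps to container root):

```shell
# Sketch: compute the on-host uid for a given in-container uid.
# The base 2147000001 is taken from the chown examples above.
map_uid() {
  echo $((2147000001 + $1))
}

map_uid 0      # root in the container -> 2147000001
map_uid 1000   # first user in Debian/Ubuntu -> 2147001001
```

The resulting number is what you feed to chown on the TrueNAS side.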

This assumes the dataset is fresh and really for Incus and Incus alone.

If you need it in Incus but also accessible via a share, then I’d guess the best bet is an advanced ACL: owner and group set with the offset to make Incus happy, and the ACL handles access rights from the TrueNAS side.

If you’re asking about type: disk, I’m doing that in the OP in the cloud-init config. I’m just creating a separate, dedicated dataset for jails rather than using .ix-virt. I also addressed the reasons for breaking it out earlier in this thread.

Yes, I could leave the OS dataset in .ix-virt and continue to break out the data disks like I do in the OP. It’s just a preference to keep everything in one central location: K.I.S.S. and clean. It also separates unsupported VMs/containers into their own dataset.

In the OP, the security change is there because I’m currently running mine in a VM, which requires nesting.

I’m going to work on migrating my bare-metal box to TNS 25.04 this week, since the BETA is out now, and start migrating all my jailmaker machines over to Incus. See what we get with Nvidia… Wheee.

Thanks! Looking at it, you do:

  data:
    path: /mnt/data
    source: /mnt/pool/data/apps
    shift: true
    type: disk

I’m using the TrueNAS CE UI instead, and that doesn’t use shift. I put a fresh dataset at /mnt/docker, chown it on the TrueNAS side so root owns it on the Incus side, and set the Docker data-root to use it.
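For reference, that last step is a one-line daemon.json entry (the /mnt/docker path is the mount point from my setup; adjust to yours):

```json
{
    "data-root": "/mnt/docker"
}
```

Docker needs a restart after changing it, and it only affects data written from then on.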

Basically, I’m staying in the UI and middleware instead of handling Incus from the CLI.

I see the entry where you talk about easier snapshots and such. I think my requirements are simpler, so for me KISS means sticking with what TrueNAS does and mounting in a dataset where the Docker data-root will live. Almost 100% UI, and it will definitely survive TrueNAS upgrades.

I can see the merit of what you’re doing for more complex use cases.

That’s ONLY for the data datasets, NOT the OS datasets. Since there are multiple apps running on a single host, each with its own permission requirements, it’s just easier to set shift and then set the permissions down the tree on those datasets. It’s up to you how you want to handle it, and I get it.

I may look at expanding this into a script à la jailmaker, or into a collection of cloud-init configs. I may also add a config to incorporate the Incus management UI. So I’m going in a slightly different direction than IX on this one.

Hmm, I’m assuming those apps run under systemd then, or use Docker bind mounts?

I use Docker volumes, and permissions are handled correctly for different containers with different users, without shift, with Docker managing the contents of the dataset at /mnt/docker.

Different approaches, I think. I avoid bind mounts for the most part because permissions get so ugly. I only run LXC at all because this one particular app is a little more complex and doesn’t lend itself well to the Custom App flow in TrueNAS.

Yeah, I don’t use Docker volumes. I try to keep data simple as well: rather than having to go searching for my data, it’s right where I left it. :joy:

Meh, I started the update to TNS 25.04… I couldn’t wait… :slight_smile:

I managed to persuade my boss to let me keep a decommissioned desktop PC from work for a few weeks before they threw it in the trash, so I could play around with Fangtooth. Playtime starts Saturday.

Outside of having to pass through an extra Nvidia library and fix DNS on some of my jails, the update worked without issue.

Scratch that: I just needed to update to the latest nvidia-container-toolkit, and everything worked fine.

BTW, I found a recommendation from stgraber to use block-based file systems for Docker in Incus.

But the benefit of passing in a dataset is that you have full visibility of it from outside the container, and the neat thing is that with Incus you can actually delegate ZFS to the container, so it has access to ZFS features on its own dataset.

Interesting, so, a zvol. Hmm.

Will take a look as we move forward. Added it to the OP’s TODO list to investigate.

Must be Debian not backporting the OCI changes?

incus remote add docker https://docker.io --protocol=oci
Error: Invalid protocol: oci

EDIT: Argh… OCI has been backported to LXC 6.0.3, but NOT to Incus 6.0.3 as of yet… Oh well. Soon™…

That’s a start:

devices:
  gpu0:
    gputype: physical
    pci: 0000:2b:00.0
    type: gpu

I’m keen to hear how that goes, although I’m using an Intel Arc A380, which does make things a little easier.

With the change to Incus for VMs and LXCs, I hope this means we eventually get an easy, supported method for backup and restore. In the interim, I guess it’s simple enough to script backups when the need arises. I know I can snapshot, but I still like to have a couple of archived backups that aren’t necessarily on systems running TrueNAS.

incus stop <instance_name>
incus export <instance_name> [<file_path>]
incus start <instance_name>

Something like that should work, right? I’ve never used Incus before, so does 'incus stop' wait for the instance to shut down gracefully, or would I need to add an arbitrary 'sleep 60' line in a bash script?
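For what it’s worth, a minimal sketch of that stop/export/start flow; the destination path is my own assumption, and as I understand the docs, incus stop blocks until the instance is actually down, so no sleep should be needed:

```shell
#!/bin/sh
# Sketch of a stop/export/start backup. DRY_RUN=1 just prints the commands,
# useful for testing on a machine without incus installed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

backup_instance() {
  name=$1
  dest=$2
  # incus stop requests a clean shutdown and waits for it to complete
  # (add --force to kill the instance if the guest ignores the request).
  run incus stop "$name" &&
  run incus export "$name" "$dest/$name.tar.gz" &&
  run incus start "$name"
}
```

Usage would be something like backup_instance docker-gpu /mnt/pool/backups.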

So using the default store for the OS disks is easy enough for bringing up machines, and having custom disks attached to the local datasets works fine. Machines should persist this way as well, since we’re working within the IX default configuration. Bringing up a new LXC container was pretty seamless. You can see below how the root disk works using default, which drops the OS disks under /mnt/pool/.ix-virt. This should be fine for now. I’m still hoping that IX will support custom datasets outside the ones managed by middlewared.

incus launch images:debian/bookworm/cloud docker-gpu < docker-init.yaml
incus config show docker-gpu
---
  root:
    path: /
    pool: default
    type: disk
incus storage show default
config:
  source: sol/.ix-virt
  zfs.pool_name: sol/.ix-virt
description: ""
name: default
driver: zfs
used_by:
- /1.0/images/90c2420799a6692fe6031df29d6bca195278bf70a038184f68a892b6ca7857e2
- /1.0/instances/docker-gpu
- /1.0/profiles/default
status: Created
locations:
- none
incus exec docker-gpu bash
root@docker-gpu:~# ls -la /mnt/
total 27
drwxr-xr-x  6 root   root    6 Feb 21 12:30 .
drwxr-xr-x 17 root   root   21 Feb 21 00:28 ..
drwxr-xr-x  3 root   root    3 May  3  2024 containers
drwxr-xr-x 35 debian docker 35 Feb  5 16:31 data
drwxr-xr-x  8 root   root    8 Jan 21 13:44 db
drwxr-xr-x  4 root   root    4 Nov 22 13:19 media
root@docker-gpu:~# ls -la /opt/
total 19
drwxr-xr-x  5 root root  5 Feb 21 12:31 .
drwxr-xr-x 17 root root 21 Feb 21 00:28 ..
drwx--x--x  4 root root  4 Feb 21 12:31 containerd
drwxr-xr-x  3 root root  3 Feb 21 12:32 dockge
drwxr-xr-x 33 root root 33 Jan 20 10:57 stacks
root@docker-gpu:~# df -h
Filesystem                                 Size  Used Avail Use% Mounted on
sol/.ix-virt/containers/docker-gpu         1.1T  940M  1.1T   1% /
none                                       492K  4.0K  488K   1% /dev
udev                                        32G     0   32G   0% /dev/zfs
efivarfs                                   128K   34K   90K  28% /sys/firmware/efi/efivars
tmpfs                                      100K     0  100K   0% /dev/incus
boot-pool/ROOT/25.04-BETA.1/var/lib/incus   85G  132M   85G   1% /dev/nvidia0
tmpfs                                      100K     0  100K   0% /dev/.incus-mounts
sol/containers                             1.1T  128K  1.1T   1% /mnt/containers
sol/data/apps                              1.1T  3.3M  1.1T   1% /mnt/data
sol/database                               1.1T  128K  1.1T   1% /mnt/db
sol/media                                  1.1T  128K  1.1T   1% /mnt/media
sol/data/stacks                            1.1T  512K  1.1T   1% /opt/stacks
tmpfs                                       32G     0   32G   0% /dev/shm
tmpfs                                       13G  228K   13G   1% /run
tmpfs                                      5.0M     0  5.0M   0% /run/lock
overlay                                    1.1T  940M  1.1T   1% /var/lib/docker/overlay2/33f18859e5f907f93def753056fa6e57c78e339a94994d9d797ec0bc64c05f4b/merged


I’ll work on the GPU next… automating that part is going to be a little more involved for NVIDIA-based GPUs; AMD and Intel will likely just be a matter of adding the GPU to the config and going to town. Basically, I’ll need a script to do the following:

  1. Call the initial cloud-init to install everything, including the NVIDIA drivers
  2. Enable the NVIDIA config:
     incus config set docker-gpu nvidia.runtime=true
  3. Start the container again
  4. Run nvidia-ctk runtime configure --runtime=docker
  5. Restart Docker: systemctl restart docker
  6. Profit?

I’m open to suggestions for creating machines as well. I’m thinking a simple Bash script that lets you pick a GPU or non-GPU instance, creates it, and runs all the needed commands. Something simple like this, then let it go to town:

incus-create -n docker-gpu -g nvidia -c docker-init-template.yaml
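As a starting point, a sketch of what that hypothetical incus-create wrapper could look like; the tool, its flags, and the image choice are all my assumptions, not an existing command:

```shell
#!/bin/sh
# Hypothetical incus-create wrapper: pick a name, optional GPU vendor, and a
# cloud-init config, then launch and configure the instance.
# DRY_RUN=1 prints the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$@"; else "$@"; fi
}

incus_create() {
  name='' gpu='' config=''
  OPTIND=1
  while getopts "n:g:c:" opt; do
    case $opt in
      n) name=$OPTARG ;;
      g) gpu=$OPTARG ;;
      c) config=$OPTARG ;;
      *) return 1 ;;
    esac
  done
  if [ -z "$name" ] || [ -z "$config" ]; then
    echo "usage: incus-create -n <name> [-g nvidia] -c <cloud-init.yaml>" >&2
    return 1
  fi

  # Launch from the cloud image with the supplied cloud-init config.
  run incus launch images:debian/bookworm/cloud "$name" < "$config"
  if [ "$gpu" = "nvidia" ]; then
    # nvidia.runtime needs a restart of the container to take effect.
    run incus config set "$name" nvidia.runtime=true
    run incus restart "$name"
  fi
}
```

AMD/Intel handling would slot in as extra branches on the -g flag.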

So far though, this has basically replaced jailmaker with a single cloud-init config.

Success!

root@docker-gpu:~# nvidia-container-cli list
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
/dev/nvidia-modeset
/dev/nvidia0
/usr/bin/nvidia-smi
/usr/bin/nvidia-debugdump
/usr/bin/nvidia-persistenced
/usr/bin/nvidia-cuda-mps-control
/usr/bin/nvidia-cuda-mps-server
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.550.142
/usr/lib/x86_64-linux-gnu/libcuda.so.550.142
/usr/lib/x86_64-linux-gnu/libcudadebugger.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.550.142
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.550.142
/lib/firmware/nvidia/550.142/gsp_ga10x.bin
/lib/firmware/nvidia/550.142/gsp_tu10x.bin
root@docker-gpu:~# nvidia-smi 
Fri Feb 21 14:45:28 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.142                Driver Version: 550.142        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     Off |   00000000:2B:00.0 Off |                  N/A |
|  0%   48C    P0             62W /  280W |       0MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
root@docker-gpu:~# cat /etc/docker/daemon.json 
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
root@docker-gpu:~# docker info
Client: Docker Engine - Community
 Version:    28.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.21.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.33.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 1
  Running: 1
  Paused: 0
  Stopped: 0
 Images: 1
 Server Version: 28.0.0
 Storage Driver: overlay2
  Backing Filesystem: zfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
 runc version: v1.2.4-0-g6c52b3f
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.12.9-production+truenas
 Operating System: Debian GNU/Linux 12 (bookworm)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.73GiB
 Name: docker-gpu
 ID: 690cc0ce-0eda-484c-b2e3-b9cc2a1742c7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false

Now I need to script and document it.

Ran into some additional issues with the GPU and starting GPU Docker containers:

nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown

Setting the following in /etc/nvidia-container-runtime/config.toml seems to work around the issue in LXC. I’m not 100% sure this is a proper fix at this time…

no-cgroups = true

and mounting the GPU path:

  nvidia-gpu:
    path: /proc/driver/nvidia/gpus/0000:2b:00.0
    source: /proc/driver/nvidia/gpus/0000:2b:00.0
    type: disk
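If it helps, that config.toml tweak can be scripted. This sketch assumes the stock file ships the setting commented out as "#no-cgroups = false" under [nvidia-container-cli], which is worth verifying against your toolkit version:

```shell
# Sketch: flip no-cgroups on in nvidia-container-runtime's config.
# Assumes the stock commented default "#no-cgroups = false" is present.
enable_no_cgroups() {
  cfg=${1:-/etc/nvidia-container-runtime/config.toml}
  # Matches the line whether or not it is still commented out.
  sed -i 's/^#\{0,1\}no-cgroups = false/no-cgroups = true/' "$cfg"
}
```

Run it inside the container, then restart Docker for it to take effect.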

Also working through some additional issues with storage not passing through properly in BETA-1; these setups were working during the nightly phase.

Adding the following mounted the extra nested datasets that were missing from the tree:

recursive: true
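In context, that option goes on the disk device itself; reusing the device from my earlier example:

```yaml
  data:
    path: /mnt/data
    source: /mnt/pool/data/apps
    recursive: true
    shift: true
    type: disk
```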

This brings up an issue in the TrueNAS Web UI… it doesn’t like passing through that directory for the nvidia-gpu disk, but as far as I can tell it’s a necessary evil until either LXC is fixed or NVIDIA fixes their stuff…

 Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py", line 310, in process_method_call
    result = await method.call(app, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 49, in call
    return self._dump_result(app, methodobj, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 52, in _dump_result
    return self.middleware.dump_result(self.serviceobj, methodobj, app, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 785, in dump_result
    return serialize_result(new_style_returns_model, result, expose_secrets)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/handler/result.py", line 13, in serialize_result
    return model(result=result).model_dump(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for VirtInstanceDeviceListResult
result.7.DISK.source
  Value error, Only pool paths are allowed [type=value_error, input_value='/proc/driver/nvidia/gpus/0000:2b:00.0', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/value_error