Linux Jails (containers/vms) with Incus

Maybe my question will seem stupid, but what is “OP”?

“OP” = original post, so the very first post that started this thread.

Also, “Original Poster”

Something funky with the stable update. Getting userns_idmap issues. Had to roll back. Will try to roll forward tomorrow and debug…

OK, here are the logs for documentation's sake; I will keep looking:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py", line 323, in process_method_call
    result = await method.call(app, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 49, in call
    return await self._dump_result(app, methodobj, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 52, in _dump_result
    return self.middleware.dump_result(self.serviceobj, methodobj, app, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 791, in dump_result
    return serialize_result(new_style_returns_model, result, expose_secrets)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/handler/result.py", line 13, in serialize_result
    return model(result=result).model_dump(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 4 validation errors for VirtInstanceQueryResult
result.list[VirtInstanceQueryResultItem].2.userns_idmap.uid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.list[VirtInstanceQueryResultItem].2.userns_idmap.gid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.VirtInstanceQueryResultItem
  Input should be a valid dictionary or instance of VirtInstanceQueryResultItem [type=model_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.int
  Input should be a valid integer [type=int_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/int_type
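
If it helps anyone reproduce this, the failing call can be made directly from the shell, assuming (from VirtInstanceQueryResult in the traceback) that the method behind it is virt.instance.query:

# reproduce the failing middleware call directly (method name inferred from the traceback)
midclt call virt.instance.query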

The only thing added userns_idmap-wise is this raw.idmap config:

incus config show apps1|grep -A 8 raw.idmap
  raw.idmap: |-
    uid 1000 1000
    uid 568 568
    uid 373 373
    uid 1001 1001
    gid 1000 1000
    gid 568 568
    gid 373 373
    gid 1001 1001

It works on the initial boot as long as I don’t shut that instance down or reboot the host; after that, not so much, because Incus doesn’t start on reboot and the Web UI shows the error.
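
For reference, entries like that are normally put in place (or backed out) with plain incus config commands; a rough sketch using apps1 from above:

# edit the instance config interactively and adjust the raw.idmap block
incus config edit apps1

# or set it non-interactively (the shell keeps the newlines inside the quotes)
incus config set apps1 raw.idmap "uid 1000 1000
gid 1000 1000"

# and to back it out if the mapping is what trips the middleware
incus config unset apps1 raw.idmap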

Also, attempting to restart an instance gets the following:

midclt call virt.instance.restart incus-ui -j
Status: (none)
Total Progress: [________________________________________] 0.00%
[EINVAL] ALL: Value error, Timeout should be set if force is disabled

Retrying with the [-t TIMEOUT] option:
midclt -t 20 call virt.instance.restart incus-ui -j
Total Progress: [________________________________________] 0.00%Status: (none)
Total Progress: [________________________________________] 0.00%
[EINVAL] ALL: Value error, Timeout should be set if force is disabled
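
Note: -t on midclt looks like the client-side call timeout, not the method's own timeout option. Going by the error text, the restart method probably wants a timeout (or force) passed in its options payload, so something like this might satisfy the validator; the option names are my guess, not confirmed:

# pass restart options as a JSON argument instead of the -t client flag (field names assumed)
midclt call virt.instance.restart incus-ui '{"timeout": 30}' -j

# or force it, which the error message implies skips the timeout requirement
midclt call virt.instance.restart incus-ui '{"force": true}' -j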

Attempting to kick the instance with incus directly:

incus restart dns   
Error: Failed to run: /usr/libexec/incus/incusd forkstart dns /var/lib/incus/containers /run/incus/dns/lxc.conf: exit status 1
Try `incus info --show-log dns` for more info

incus info --show-log dns
Name: dns
Status: ERROR
Type: container
Architecture: x86_64
Created: 2025/03/13 14:12 EDT
Last Used: 2025/04/16 12:35 EDT

Log:

lxc dns 20250416163557.661 ERROR    conf - ../src/lxc/conf.c:lxc_map_ids:3708 - newuidmap failed to write mapping "newuidmap: uid range [373-374) -> [373-374) not allowed": newuidmap 757189 0 2147000001 373 373 373 1 374 2147000375 194 568 568 1 569 2147000570 431 1000 1000 1 1001 1001 1 1002 2147001003 457750
lxc dns 20250416163557.662 ERROR    start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc dns 20250416163557.663 ERROR    lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc dns 20250416163557.666 ERROR    start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "dns"
lxc dns 20250416163557.666 WARN     start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 757189
lxc dns 20250416163627.836 ERROR    conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 1
lxc dns 20250416163627.837 ERROR    start - ../src/lxc/start.c:lxc_end:944 - Failed to run "lxc.hook.stop" hook
lxc 20250416163627.126 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR    af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250416163627.126 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250416163627.126 ERROR    commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"

And a second attempt results in nothing useful:

incus start dns
Error: Failed to run: /usr/libexec/incus/incusd forkstart dns /var/lib/incus/containers /run/incus/dns/lxc.conf: exit status 1
Try `incus info --show-log dns` for more info

incus info --show-log dns
Error: stat /proc/-1: no such file or directory
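
For my own notes: that newuidmap error ("uid range [373-374) -> [373-374) not allowed") usually means the host's /etc/subuid and /etc/subgid don't delegate those particular ids to the user running the containers, so the identity mappings from raw.idmap get rejected. Worth checking what is actually delegated:

# which uid/gid ranges is root (which runs incusd) allowed to map?
grep -H . /etc/subuid /etc/subgid

# compare against the identity maps requested in raw.idmap: 373, 568, 1000, 1001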

Side note: this is also why I reported the Web UI bug. The UI shouldn’t crash and block access to all instances when something isn’t quite right with one of them; it should keep functioning, show the error on the affected instance, and leave the rest alone. Not very reliable. That ticket got closed out…

Catching up on this after moving to the Fangtooth Release yesterday. I’ve created an instance named “virtualmin” using the Debian (Bookworm) image from the GUI. The disk size was set to 500 GB on creation, but the size of the disk in the running VM is about 10 GB. I remember briefly seeing an error on the console during install that the root partition failed to grow, but it vanished before I could capture it.

I did try to google how to expand the root partition, but I’m running into a roadblock: when I browse to /mnt/data/.ix-virt, I get an error that the directory does not exist. When I do a zfs list, I clearly see data/.ix-virt and its child datasets, but I can’t seem to move into them to run the commands to expand my partition.

So, two questions:

  1. Is there an easy way to expand the root volume of my Debian VM, either from within the VM, or from the Host’s command line?

  2. Why can’t I browse / move into the /mnt/data/.ix-virt folder on my system, even with root permissions, when I can clearly see it in zfs list?

That’s because it’s mounted in a different location and basically owned by the incus process.
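
If you want to see where it actually lives, zfs and findmnt will tell you:

# where is the .ix-virt dataset tree really mounted?
zfs get -r mountpoint data/.ix-virt

# or ask the kernel directly which zfs mounts mention ix-virt
findmnt -t zfs | grep ix-virt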

What does the Root Disk size say for this instance in the disks widget? And does the Change button work for expanding it?

https://www.truenas.com/docs/scale/25.04/scaleuireference/instancesscreens/#disks-widget

The size is listed as 500 GB. The issue is not the size of the volume, it’s the size that the OS sees. I was able to capture the message when I created a new instance, using the Debian image from linuximages.org:

Booting `Debian GNU/Linux'

Loading Linux 6.1.0-33-amd64 ...
Loading initial ramdisk ...
[    2.806283] ima: Can not allocate sha384 (reason: -2)
[    2.833677] ima: Can not allocate sha512 (reason: -2)
[    7.621761] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[    7.625668] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[    7.631301] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[FAILED] Failed to start incus-grow… - Incus - grow root partition.

Debian GNU/Linux 12 virtualmin ttyS0

virtualmin login: 

I was able to expand the root volume following this guide:

https://www.reddit.com/r/linuxadmin/comments/1f34twg/how_to_increase_root_filesystem_standard_partition/
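
For anyone who hits the same thing, roughly what that boils down to, run inside the VM; the device and partition names below are my assumption, so check lsblk first:

# see how big the disk really is versus the root filesystem
lsblk
df -h /

# grow the root partition to fill the disk, then grow the filesystem
# (growpart is in the cloud-guest-utils package)
apt install cloud-guest-utils
growpart /dev/sda 2      # assumes root is /dev/sda2
resize2fs /dev/sda2      # ext4; for XFS use xfs_growfs /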

I’m still having issues; not sure what is up currently. I nixed apps1, but the next instance down the line started having the same issue.

Trying to build a VM on Incus to test out what might be happening and not having much luck.

incus rm truenas-vm -f
Error: Failed deleting instance "truenas-vm" in project "default": Failed to create instance delete operation: Instance is busy running a "start" operation

incus stop truenas-vm -f
Error: The instance is already stopped

incus rm truenas-vm -f  
Error: Failed deleting instance "truenas-vm" in project "default": Failed to create instance delete operation: Instance is busy running a "start" operation

The job is stuck, so I’m trying to kick middlewared… the service isn’t restarting properly.
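
In case it helps anyone else stuck here: incus can list and cancel its own background operations, which may clear the phantom "start" operation, and middlewared can be kicked from systemd. Sketch from memory, so double-check the subcommands:

# show what incus thinks is still running
incus operation list

# cancel the stuck operation by its UUID (only works if it's cancelable)
incus operation delete <UUID>

# restart the TrueNAS middleware
systemctl restart middlewared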

Ughh… why was support for cloud variants removed in Stable? Not just for Debian, but for all images. This is the breaking change… it removes the ability to customize easily, at least until these features are implemented in the UI.

Stable: (screenshot of the image selection)

RC-1: (screenshot of the image selection)

Log:

pydantic_core._pydantic_core.ValidationError: 4 validation errors for VirtInstanceQueryResult
result.list[VirtInstanceQueryResultItem].2.userns_idmap.uid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.list[VirtInstanceQueryResultItem].2.userns_idmap.gid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.VirtInstanceQueryResultItem
  Input should be a valid dictionary or instance of VirtInstanceQueryResultItem [type=model_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.int
  Input should be a valid integer [type=int_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/int_type

At least I know it’s not a hardware issue now… I started going down that trail.

@awalkerix @kris @HoneyBadger ^^

Maybe this is a change upstream, outside of TrueNAS’s control?

I don’t think so. I’m pretty sure that filtering is done in middlewared. I can pull cloud images just fine manually, but booting one up breaks everything and I can’t even reboot in Stable. This one is a big deal, whether intentional or not.

incus image ls images: debian/bookworm cloud architecture=x86_64
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
|          ALIAS           | FINGERPRINT  | PUBLIC |              DESCRIPTION               | ARCHITECTURE |      TYPE       |   SIZE    |     UPLOAD DATE      |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
| debian/12/cloud (3 more) | 439dc746737f | yes    | Debian bookworm amd64 (20250417_05:24) | x86_64       | VIRTUAL-MACHINE | 382.87MiB | 2025/04/16 20:00 EDT |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
| debian/12/cloud (3 more) | c76f0591077a | yes    | Debian bookworm amd64 (20250417_05:24) | x86_64       | CONTAINER       | 130.79MiB | 2025/04/16 20:00 EDT |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+

What is the incus config for this instance? It looks like the configuration was changed to something our backend isn’t expecting (possibly via incus shell commands).

The only issue I see is the variant: only default is available when searching in the UI, not cloud. I’m using versions of the cloud-init scripts from the OP.

Everything I’ve created uses the cloud images, because that is what the cloud-init scripts need: the cloud variants come preconfigured with cloud-init, whereas default does not:

incus launch images:debian/bookworm/cloud docker1 < docker-init.yaml

Here is one that I’m using which has been scrubbed:

description: Docker Nvidia
devices:
  gpu0:
    gputype: physical
    pci: 0000:2b:00.0
    type: gpu
  eth0:
    name: eth0
    nictype: bridged
    parent: br5
    type: nic
  root:
    path: /
    pool: default
    type: disk
  containers:
    path: /mnt/containers
    source: /mnt/sol/containers
    recursive: true
    type: disk
  data:
    path: /mnt/data
    source: /mnt/sol/data/apps
    recursive: true
    type: disk
  database:
    path: /mnt/db
    source: /mnt/sol/database
    recursive: true
    type: disk
  media:
    path: /mnt/media
    source: /mnt/sol/media
    recursive: true
    type: disk
  stacks:
    path: /opt/stacks
    source: /mnt/sol/data/stacks
    recursive: true
    type: disk
config:
  # Start instances on boot
  boot.autostart: "true"

  # Load needed kernel modules
  linux.kernel_modules: br_netfilter,nvidia_uvm

  # Enable required security settings
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"

  # Nvidia configs
  nvidia.driver.capabilities: "compute,graphics,utility,video"
  nvidia.runtime: "true"

  cloud-init.network-config: |
    #cloud-config

    network:
      version: 2
      ethernets:
        eth0:
          addresses:
            - 192.168.x.x/24
          gateway4: 192.168.x.x
          nameservers:
            addresses: [192.168.x.x]
            search:
              - domain.lan
              - domain.co
              - domain.cc
              - apps.domain.cc

  cloud-init.user-data: |
    #cloud-config

    # Enable docker sysctl values
    write_files:
      - path: /etc/sysctl.d/20-docker.conf
        content: |
          net.ipv4.conf.all.forwarding=1
          net.bridge.bridge-nf-call-iptables=1
          net.bridge.bridge-nf-call-ip6tables=1
      - path: /etc/systemd/system/fix-gpu-pass.service
        owner: root:root
        permissions: '0755'
        content: |
          [Unit]
          Description=Symlink for LXC/Nvidia to Docker passthrough
          Before=docker.service

          [Service]
          User=root
          Group=root
          ExecStart=/bin/bash -c 'mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:2b:00.0'
          Type=oneshot

          [Install]
          WantedBy=multi-user.target
      - path: /etc/systemd/system/docker.service.d/override.conf
        owner: root:root
        permissions: '0755'
        content: |
          [Service]
          ExecStart=
          ExecStart=/usr/bin/dockerd -H fd:// -H tcp://192.168.x.x:2375

    # Set timezone
    timezone: US/Eastern

    # apt update and apt upgrade
    package_update: true
    package_upgrade: true

    # Install apt repos and packages needed
    apt:
      preserve_sources_list: true
      sources:
        docker.list:
          source: deb [arch=amd64] https://download.docker.com/linux/debian $RELEASE stable
          keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
          filename: docker.list
        nvidia-container-toolkit.list:
          source: deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
          keyid: C95B321B61E88C1809C4F759DDCAE044F796ECB0
          filename: nvidia-container-toolkit.list
    packages:
      - apt-transport-https
      - apt-utils
      - ca-certificates
      - curl
      - gpg
      - wget
      - host
      - netcat-openbsd
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin
      - docker-compose-plugin
      - nvidia-container-toolkit
      - needrestart

    # create groups
    groups:
      - docker
      - apps: [root]
      - pgadmin

    # create users
    users:
      - default
      - name: apps
        primary_group: apps
        uid: 568
        groups: docker
        lock_passwd: true
      - name: pgadmin
        primary_group: pgadmin
        uid: 5050
        groups: apps,docker
        lock_passwd: true

    # Add default auto created user to docker group
    system_info:
      default_user:
        groups: [docker,apps]

    # additional configuration
    runcmd:
      - 'echo "-----------------------------"'
      - 'echo "Configuring system uid/gid..."'
      - 'echo "-----------------------------"'
      - 'groupmod -g 568 apps'
      - 'groupmod -g 500 docker'
      - 'usermod -u 1000 debian'
      - 'groupmod -g 1000 debian'
      - 'groupmod -g 5050 pgadmin'
      - 'echo "-----------------------------"'
      - 'echo " Configuring fix-gpu-pass... "'
      - 'echo "-----------------------------"'
      - 'systemctl daemon-reload'
      - 'systemctl enable fix-gpu-pass'
      - 'systemctl start fix-gpu-pass'
      - 'echo "-----------------------------"'
      - 'echo "    Configuring nvidia...    "'
      - 'echo "-----------------------------"'
      - 'nvidia-ctk runtime configure --runtime=docker'
      - 'nvidia-ctk config --set nvidia-container-cli.no-cgroups -i'
      - 'echo "-----------------------------"'
      - 'echo "    Restarting docker...     "'
      - 'echo "-----------------------------"'
      - 'systemctl restart docker'
      - 'echo "    Installing dockge...     "'
      - 'echo "-----------------------------"'
      - 'mkdir -p /opt/dockge'
      - 'cd /opt/stacks/dockge'
      - 'docker compose up -d'
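
One more note on the cloud images: after launching with a config like the one above, it's easy to confirm from inside the instance whether cloud-init actually picked everything up (docker1 being the instance from the launch command earlier):

# wait for cloud-init to finish and report its status
incus exec docker1 -- cloud-init status --wait

# if something didn't apply, the log usually says why
incus exec docker1 -- cat /var/log/cloud-init-output.log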

I also started testing setting up a new instance in my VM on Stable, and it looks like the storage pool used to be named default in RC-1, whereas now it’s the actual ZFS pool name. Testing that out now…
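
A quick way to double-check what the pool is actually called on a given build:

# list the storage pools incus knows about
incus storage list

# and see which pool the default profile's root disk points at
incus profile show default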