Maybe my question will seem stupid, but what is “OP”?
“OP” = original post, i.e. the very first post that started this thread.
Also, “Original Poster”
Something funky with the stable update. Getting userns_idmap issues. Had to roll back. Will try to roll forward tomorrow and debug…
OK, here are the logs for documentation’s sake; I’ll keep looking:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py", line 323, in process_method_call
    result = await method.call(app, params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 49, in call
    return await self._dump_result(app, methodobj, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py", line 52, in _dump_result
    return self.middleware.dump_result(self.serviceobj, methodobj, app, result)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 791, in dump_result
    return serialize_result(new_style_returns_model, result, expose_secrets)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/api/base/handler/result.py", line 13, in serialize_result
    return model(result=result).model_dump(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 4 validation errors for VirtInstanceQueryResult
result.list[VirtInstanceQueryResultItem].2.userns_idmap.uid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.list[VirtInstanceQueryResultItem].2.userns_idmap.gid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.VirtInstanceQueryResultItem
  Input should be a valid dictionary or instance of VirtInstanceQueryResultItem [type=model_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.int
  Input should be a valid integer [type=int_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/int_type
The only things added, userns_idmap-wise, are:
incus config show apps1|grep -A 8 raw.idmap
raw.idmap: |-
uid 1000 1000
uid 568 568
uid 373 373
uid 1001 1001
gid 1000 1000
gid 568 568
gid 373 373
gid 1001 1001
It works on initial boot as long as I don’t shut that instance down or reboot the host. After that, not so much: Incus doesn’t start on reboot and shows the error in the Web UI.
Also, attempting to restart an instance gets the following:
midclt call virt.instance.restart incus-ui -j
Status: (none)
Total Progress: [________________________________________] 0.00%
[EINVAL] ALL: Value error, Timeout should be set if force is disabled
Trying again with the [-t TIMEOUT] option:
midclt -t 20 call virt.instance.restart incus-ui -j
Total Progress: [________________________________________] 0.00%
Status: (none)
Total Progress: [________________________________________] 0.00%
[EINVAL] ALL: Value error, Timeout should be set if force is disabled
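Worth noting: midclt’s -t flag appears to be the client-side call timeout, while the validation error is complaining about the restart job’s own timeout parameter. Assuming the method takes an options object with timeout/force fields (my reading of the error text, not something I’ve verified against the API docs), it would presumably be passed as a JSON argument instead:
# Hypothetical: supply the job's timeout in the call itself rather than via -t
midclt call virt.instance.restart incus-ui '{"timeout": 10}' -j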
Attempting to kick the instance with incus:
incus restart dns
Error: Failed to run: /usr/libexec/incus/incusd forkstart dns /var/lib/incus/containers /run/incus/dns/lxc.conf: exit status 1
Try `incus info --show-log dns` for more info
incus info --show-log dns
Name: dns
Status: ERROR
Type: container
Architecture: x86_64
Created: 2025/03/13 14:12 EDT
Last Used: 2025/04/16 12:35 EDT
Log:
lxc dns 20250416163557.661 ERROR conf - ../src/lxc/conf.c:lxc_map_ids:3708 - newuidmap failed to write mapping "newuidmap: uid range [373-374) -> [373-374) not allowed": newuidmap 757189 0 2147000001 373 373 373 1 374 2147000375 194 568 568 1 569 2147000570 431 1000 1000 1 1001 1001 1 1002 2147001003 457750
lxc dns 20250416163557.662 ERROR start - ../src/lxc/start.c:lxc_spawn:1788 - Failed to set up id mapping.
lxc dns 20250416163557.663 ERROR lxccontainer - ../src/lxc/lxccontainer.c:wait_on_daemonized_start:878 - Received container state "ABORTING" instead of "RUNNING"
lxc dns 20250416163557.666 ERROR start - ../src/lxc/start.c:__lxc_start:2107 - Failed to spawn container "dns"
lxc dns 20250416163557.666 WARN start - ../src/lxc/start.c:lxc_abort:1036 - No such process - Failed to send SIGKILL via pidfd 17 for process 757189
lxc dns 20250416163627.836 ERROR conf - ../src/lxc/conf.c:run_buffer:322 - Script exited with status 1
lxc dns 20250416163627.837 ERROR start - ../src/lxc/start.c:lxc_end:944 - Failed to run "lxc.hook.stop" hook
lxc 20250416163627.126 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR af_unix - ../src/lxc/af_unix.c:lxc_abstract_unix_recv_fds_iov:218 - Connection reset by peer - Failed to receive response
lxc 20250416163627.126 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250416163627.126 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_init_pid"
lxc 20250416163627.126 ERROR commands - ../src/lxc/commands.c:lxc_cmd_rsp_recv_fds:128 - Failed to receive file descriptors for command "get_state"
And a second attempt results in nothing useful:
incus start dns
Error: Failed to run: /usr/libexec/incus/incusd forkstart dns /var/lib/incus/containers /run/incus/dns/lxc.conf: exit status 1
Try `incus info --show-log dns` for more info
incus info --show-log dns
Error: stat /proc/-1: no such file or directory
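If I’m reading that newuidmap error right, the helper is refusing to map host uid 373 through to the container, and raw.idmap can only pass through host IDs that are delegated to the daemon’s user in /etc/subuid and /etc/subgid. A read-only sanity check to compare what’s requested against what’s allowed (my interpretation, not an official fix):
# What the instance requests
incus config show apps1 | grep -A 8 raw.idmap
# What the daemon's user (root here) is allowed to map
grep root /etc/subuid /etc/subgid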
Side note: this is also why I reported the Web UI bug. The UI shouldn’t crash and block access to all instances when something isn’t quite right with one of them; it should keep functioning, show the error on the affected instance, and leave everything else alone. Not very reliable. That ticket got closed out…
Catching up on this after moving to the Fangtooth release yesterday. I’ve created an instance named “virtualmin” using the Debian (Bookworm) image from the GUI. The disk size was set to 500 GB on creation, but the disk the running VM sees is about 10 GB. I remember briefly seeing an error on the console during install that the root partition failed to grow, but it vanished before I could capture it.
I did try to google how to expand the root partition, but I’m running into a roadblock: when I browse to /mnt/data/.ix-virt, I get an error that the directory does not exist. When I run zfs list, I clearly see data/.ix-virt and its child datasets, but I can’t seem to navigate into them to run the commands to expand my partition.
So, two questions:
- Is there an easy way to expand the root volume of my Debian VM, either from within the VM or from the host’s command line?
- Why can’t I browse / move into the /mnt/data/.ix-virt folder on my system, even with root permissions, when I can clearly see it in zfs list?
That’s because it’s mounted in a different location and basically owned by the incus process.
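If you want to see where it actually lives, asking ZFS directly is more reliable than guessing under /mnt; both of these are read-only:
# Is the dataset mounted, and where? The mountpoint may be "legacy"
# or a path outside /mnt that incus manages itself
zfs get mounted,mountpoint data/.ix-virt
zfs list -r -o name,mountpoint data/.ix-virt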
What does the Root Disk size say for this instance in the disks widget? And does the Change button work for expanding it?
https://www.truenas.com/docs/scale/25.04/scaleuireference/instancesscreens/#disks-widget
The size is listed as 500 GB. The issue is not the size of the volume; it’s the size that the OS sees. I was able to capture the message when I created a new instance using the Debian image from images.linuxcontainers.org:
Booting `Debian GNU/Linux'
Loading Linux 6.1.0-33-amd64 ...
Loading initial ramdisk ...
[ 2.806283] ima: Can not allocate sha384 (reason: -2)
[ 2.833677] ima: Can not allocate sha512 (reason: -2)
[ 7.621761] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[ 7.625668] firmware_class: See https://wiki.debian.org/Firmware for information about missing firmware
[ 7.631301] platform regulatory.0: firmware: failed to load regulatory.db (-2)
[FAILED] Failed to start incus-grow… - Incus - grow root partition.
Debian GNU/Linux 12 virtualmin ttyS0
virtualmin login:
I was able to expand the root volume following this guide:
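For anyone who can’t reach the guide: the usual in-guest fix is to grow the partition first and then the filesystem. A minimal sketch, assuming root is ext4 on partition 2 of /dev/sda (adjust the disk, partition number, and filesystem tool to your own layout):
# Inside the VM, once the virtual disk itself is already 500 GB:
apt install cloud-guest-utils    # provides growpart
growpart /dev/sda 2              # grow partition 2 to fill the disk
resize2fs /dev/sda2              # grow the ext4 filesystem to match
df -h /                          # confirm the new size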
I’m still having issues and not sure what’s going on currently. I nixed apps1, but the next instance down the line started showing the same issue.
Trying to build a VM on Incus to test out what might be happening and not having much luck.
incus rm truenas-vm -f
Error: Failed deleting instance "truenas-vm" in project "default": Failed to create instance delete operation: Instance is busy running a "start" operation
incus stop truenas-vm -f
Error: The instance is already stopped
incus rm truenas-vm -f
Error: Failed deleting instance "truenas-vm" in project "default": Failed to create instance delete operation: Instance is busy running a "start" operation
The job is stuck; trying to kick middlewared… the service isn’t restarting properly.
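For anyone else who gets wedged here: incus tracks its own operations, so cancelling the stale start operation may be worth trying before restarting the middleware. I haven’t confirmed this clears the TrueNAS-side job, but the commands themselves are standard incus:
# Find the stuck "start" operation and delete it by UUID
incus operation list
incus operation delete <uuid-from-the-list>
# If the middleware job is still stuck, restart the service
systemctl restart middlewared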
Ugh… why was support for cloud variants removed in Stable? Not just for Debian, but for all images. This is the breaking change… it removes the ability to customize easily, at least until these features are implemented in the UI.
Stable:
RC-1:
Log:
pydantic_core._pydantic_core.ValidationError: 4 validation errors for VirtInstanceQueryResult
result.list[VirtInstanceQueryResultItem].2.userns_idmap.uid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.list[VirtInstanceQueryResultItem].2.userns_idmap.gid
  Input should be a valid dictionary or instance of IdmapUserNsEntry [type=model_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.VirtInstanceQueryResultItem
  Input should be a valid dictionary or instance of VirtInstanceQueryResultItem [type=model_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/model_type
result.int
  Input should be a valid integer [type=int_type, input_value=[{'id': 'apps1', 'name': ...': 0}}, 'memory': None}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.9/v/int_type
At least I know it’s not a hardware issue now… I started going down that trail.
Maybe this is a change upstream, outside of TrueNAS’s control?
I don’t think so. I’m pretty sure that filtering is done in middlewared. I can pull cloud images just fine manually, but booting one up breaks everything and I can’t even reboot in Stable, etc. This one is a big deal, whether intentional or not.
incus image ls images: debian/bookworm cloud architecture=x86_64
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
| ALIAS | FINGERPRINT | PUBLIC | DESCRIPTION | ARCHITECTURE | TYPE | SIZE | UPLOAD DATE |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
| debian/12/cloud (3 more) | 439dc746737f | yes | Debian bookworm amd64 (20250417_05:24) | x86_64 | VIRTUAL-MACHINE | 382.87MiB | 2025/04/16 20:00 EDT |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
| debian/12/cloud (3 more) | c76f0591077a | yes | Debian bookworm amd64 (20250417_05:24) | x86_64 | CONTAINER | 130.79MiB | 2025/04/16 20:00 EDT |
+--------------------------+--------------+--------+----------------------------------------+--------------+-----------------+-----------+----------------------+
What is the incus config for this instance? It looks like the configuration was changed to something our backend isn’t expecting (possibly via incus shell commands).
The only issue I see is the variant. Only default is available when searching in the UI, not cloud. I’m using variants of the cloud-init scripts in the OP.
Everything I’ve created uses the cloud images because those are required for the cloud-init scripts; they come preconfigured with cloud-init, whereas default does not:
incus launch images:debian/bookworm/cloud docker1 < docker-init.yaml
Here is one that I’m using which has been scrubbed:
description: Docker Nvidia
devices:
  gpu0:
    gputype: physical
    pci: 0000:2b:00.0
    type: gpu
  eth0:
    name: eth0
    nictype: bridged
    parent: br5
    type: nic
  root:
    path: /
    pool: default
    type: disk
  containers:
    path: /mnt/containers
    source: /mnt/sol/containers
    recursive: true
    type: disk
  data:
    path: /mnt/data
    source: /mnt/sol/data/apps
    recursive: true
    type: disk
  database:
    path: /mnt/db
    source: /mnt/sol/database
    recursive: true
    type: disk
  media:
    path: /mnt/media
    source: /mnt/sol/media
    recursive: true
    type: disk
  stacks:
    path: /opt/stacks
    source: /mnt/sol/data/stacks
    recursive: true
    type: disk
config:
  # Start instances on boot
  boot.autostart: "true"
  # Load needed kernel modules
  linux.kernel_modules: br_netfilter,nvidia_uvm
  # Enable required security settings
  security.nesting: "true"
  security.syscalls.intercept.mknod: "true"
  security.syscalls.intercept.setxattr: "true"
  # Nvidia configs
  nvidia.driver.capabilities: "compute,graphics,utility,video"
  nvidia.runtime: "true"
  cloud-init.network-config: |
    #cloud-config
    network:
      version: 2
      ethernets:
        eth0:
          addresses:
            - 192.168.x.x/24
          gateway4: 192.168.x.x
          nameservers:
            addresses: [192.168.x.x]
            search:
              - domain.lan
              - domain.co
              - domain.cc
              - apps.domain.cc
  cloud-init.user-data: |
    #cloud-config
    # Enable docker sysctl values
    write_files:
      - path: /etc/sysctl.d/20-docker.conf
        content: |
          net.ipv4.conf.all.forwarding=1
          net.bridge.bridge-nf-call-iptables=1
          net.bridge.bridge-nf-call-ip6tables=1
      - path: /etc/systemd/system/fix-gpu-pass.service
        owner: root:root
        permissions: '0755'
        content: |
          [Unit]
          Description=Symlink for LXC/Nvidia to Docker passthrough
          Before=docker.service
          [Service]
          User=root
          Group=root
          ExecStart=/bin/bash -c 'mkdir -p /proc/driver/nvidia/gpus && ln -s /dev/nvidia0 /proc/driver/nvidia/gpus/0000:2b:00.0'
          Type=oneshot
          [Install]
          WantedBy=multi-user.target
      - path: /etc/systemd/system/docker.service.d/override.conf
        owner: root:root
        permissions: '0755'
        content: |
          [Service]
          ExecStart=
          ExecStart=/usr/bin/dockerd -H fd:// -H tcp://192.168.x.x:2375
    # Set timezone
    timezone: US/Eastern
    # apt update and apt upgrade
    package_update: true
    package_upgrade: true
    # Install apt repos and packages needed
    apt:
      preserve_sources_list: true
      sources:
        docker.list:
          source: deb [arch=amd64] https://download.docker.com/linux/debian $RELEASE stable
          keyid: 9DC858229FC7DD38854AE2D88D81803C0EBFCD88
          filename: docker.list
        nvidia-container-toolkit.list:
          source: deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
          keyid: C95B321B61E88C1809C4F759DDCAE044F796ECB0
          filename: nvidia-container-toolkit.list
    packages:
      - apt-transport-https
      - apt-utils
      - ca-certificates
      - curl
      - gpg
      - wget
      - host
      - netcat-openbsd
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-buildx-plugin
      - docker-compose-plugin
      - nvidia-container-toolkit
      - needrestart
    # create groups
    groups:
      - docker
      - apps: [root]
      - pgadmin
    # create users
    users:
      - default
      - name: apps
        primary_group: apps
        uid: 568
        groups: docker
        lock_passwd: true
      - name: pgadmin
        primary_group: pgadmin
        uid: 5050
        groups: apps,docker
        lock_passwd: true
    # Add default auto created user to docker group
    system_info:
      default_user:
        groups: [docker,apps]
    # additional configuration
    runcmd:
      - 'echo "-----------------------------"'
      - 'echo "Configuring system uid/gid..."'
      - 'echo "-----------------------------"'
      - 'groupmod -g 568 apps'
      - 'groupmod -g 500 docker'
      - 'usermod -u 1000 debian'
      - 'groupmod -g 1000 debian'
      - 'groupmod -g 5050 pgadmin'
      - 'echo "-----------------------------"'
      - 'echo " Configuring fix-gpu-pass... "'
      - 'echo "-----------------------------"'
      - 'systemctl daemon-reload'
      - 'systemctl enable fix-gpu-pass'
      - 'systemctl start fix-gpu-pass'
      - 'echo "-----------------------------"'
      - 'echo " Configuring nvidia... "'
      - 'echo "-----------------------------"'
      - 'nvidia-ctk runtime configure --runtime=docker'
      - 'nvidia-ctk config --set nvidia-container-cli.no-cgroups -i'
      - 'echo "-----------------------------"'
      - 'echo " Restarting docker... "'
      - 'echo "-----------------------------"'
      - 'systemctl restart docker'
      - 'echo " Installing dockge... "'
      - 'echo "-----------------------------"'
      - 'mkdir -p /opt/dockge'
      - 'cd /opt/stacks/dockge'
      - 'docker compose up -d'
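Once an instance launches from a config like this, it’s easy to confirm cloud-init actually consumed it:
# Block until cloud-init finishes, then print its overall status
incus exec docker1 -- cloud-init status --wait
# Review what actually ran (package installs, runcmd output, errors)
incus exec docker1 -- cat /var/log/cloud-init-output.log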
I also started testing setting up a new instance in my VM on Stable, and it looks like the storage pool used to be named default in RC-1, whereas now it’s actually the pool name. Testing that out now…
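To check what the pool is actually called on a given build, these read-only commands should show it:
# Storage pools incus knows about
incus storage list
# Which pool the default profile's root disk points at
incus profile show default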