Dragonfish 24.04.1.1 in a VM, hangs on boot, can stall on shutdown

24.04.0 should also work i think

Sorry, what is the remaining specific issue here?

Having a mock-up Ubuntu VM stops Truenas Scale from shutting-down!?

If the VM is not configured to Auto Start, libvirt will NOT touch it.

Can’t be more clear when you do see in your video during shutdown:
“libvirt-guests.sh[…]”: NO RUNNING GUESTS.

FYI, in my secondary/test Scale server, where I tested the nightly build, I do have several other VMs, including ubuntu, that do start/run, but as testers, none of them auto-start (similar to that condition?).

Ah, there it is…

It’s unfortunate that some people simply cannot take the time to check/gather logs.

Guess a video is good enough for everybody else to waste their time (better than yours, I guess), instead of doing themselves a favor and distil/transcribe that precious information for everybody else to see.

From the video, which should be in the logs that you continue to ignore:

“Failed to unmount /usr: Device or resource busy”.

Post your config file…

clean install. make a vm. shutdown.

as per video. as per description of video, as per debug ticket issue description.

for people who don’t believe how simple the repro is, I made a video proving it

Yes. I didn’t include the “how to clean install truenas” bit… but if you want, you can refer to my previous video, or I can provide the clip to that.

I got it.

(You could have saved all of us a lot of time by being a bit more professional when you expect free help).

Issue:
TrueNAS Scale VM shutdown issue with:
Fresh install of TrueNAS Scale 24.x running as a VM under Fresh install TrueNAS Scale 24.x.

Setup:
In the VM, configure any VM (example below, but seems that the VM definition itself is the issue), does not need to be a valid VM config, does not need to be started.

Issue:
The VM does not shutdown quickly/properly (hangs).
VM would shutdown quickly, AFTER I delete the “mock-up” VM.

VM config:
All Defaults, except:
Linux Guest OS.
No auto-start.
CPU Mode: Host Model.
Create new disk image - virtio - zvol-created-earlier - 10 GiB.
Attach NIC: only available (ens3).

Tests done:

  • Deleted Disk, deleted NIC. Shutdown hangs.
  • Deleted Display. VM with zero devices. Shutdown hangs.

I tested with TrueNAS Dragonfish-24.04.2-MASTER-20240607-013916 as a VM under Dragonfish-24.04.2-MASTER-20240607-013916 as Hypervisor.

Logs:

middlewared.log:
[2024/06/09 20:47:51] (DEBUG) middlewared._event_system_shutdown():145 - VM(s) stopped successfully
[2024/06/09 20:47:51] (DEBUG) EtcService.generate():429 - No new changes for /etc/nginx/nginx.conf
[2024/06/09 20:47:51] (INFO) middlewared.terminate():2039 - Terminating
[2024/06/09 20:47:51] (WARNING) middlewared.plugins.service
.services.base.systemd_unit():171 - nginx reload failed with code 4: “Failed to reload
nginx.service: Transaction for nginx.service/reload is destructive (nginx.service has ‘stop’ job queued, but ‘reload’ is included in transaction)
.\nSee system logs and ‘systemctl status nginx.service’ for details.\n”
[2024/06/09 20:48:01] (ERROR) ServiceService.stop():271 - Service ‘libvirtd’ running after stop
[2024/06/09 20:48:01] (DEBUG) EtcService.generate():429 - No new changes for /etc/nginx/nginx.conf
[2024/06/09 20:49:21] (ERROR) middlewared.__terminate():2065 - Failed to terminate vm
Traceback (most recent call last):
File “/usr/lib/python3.11/asyncio/tasks.py”, line 490, in wait_for
return fut.result()
^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/vm/lifecycle.py”, line 104, in terminate
await self.middleware.call(‘vm.deinitialize_vms’, {‘stop_libvirt’: False})
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1564, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1417, in _call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/schema/processor.py”, line 187, in nf
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/vm/lifecycle.py”, line 82, in deinitialize_vms
await self.middleware.call(‘service.reload’, ‘http’)
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1564, in call
return await self.call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1417, in call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/schema/processor.py”, line 187, in nf
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/schema/processor.py”, line 47, in nf
res = await f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/service.py”, line 359, in reload
await service_object.reload()
File "/usr/lib/python3/dist-packages/middlewared/plugins/service
/services/pseudo/misc.py", line 68, in reload
await systemd_unit(“nginx”, “reload”)
File "/usr/lib/python3/dist-packages/middlewared/plugins/service
/services/base.py", line 169, in systemd_unit
result = await run(“systemctl”, verb, unit, check=False, encoding=“utf-8”, stderr=subprocess.STDOUT)

File “/usr/lib/python3/dist-packages/middlewared/utils/init.py”, line 68, in run
stdout, stderr = await proc.communicate(input_)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/asyncio/subprocess.py”, line 195, in communicate
stdin, stdout, stderr = await tasks.gather(stdin, stdout, stderr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/asyncio/subprocess.py”, line 175, in _read_stream
output = await stream.read()
^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/asyncio/streams.py”, line 682, in read
block = await self.read(self._limit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/asyncio/streams.py”, line 689, in read
await self._wait_for_data(‘read’)
File “/usr/lib/python3.11/asyncio/streams.py”, line 522, in _wait_for_data
await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 2063, in __terminate
await asyncio.wait_for(self.create_task(service.terminate()), timeout)
File “/usr/lib/python3.11/asyncio/tasks.py”, line 492, in wait_for
raise exceptions.TimeoutError() from exc
TimeoutError

====================
error:
Jun 9 20:52:59 truenas blkmapd[2139]: exit on signal(15)
Jun 9 20:53:36 truenas libvirtd[2626]: internal error: connection closed due to keepalive timeout
Jun 9 20:53:36 truenas libvirtd[2626]: internal error: connection closed due to keepalive timeout
Jun 9 20:53:39 truenas libvirtd[2626]: internal error: connection closed due to keepalive timeout
Jun 9 20:53:39 truenas libvirtd[2626]: internal error: connection closed due to keepalive timeout
Jun 9 20:54:29 truenas systemd[1]: Failed unmounting var-log.mount.
Jun 9 20:54:29 truenas systemd[1]: Failed unmounting var.mount.
Jun 9 20:54:54 truenas kernel: Error: Driver ‘pcspkr’ is already registered, aborting…
Jun 9 20:55:17 truenas smartd[2139]: Configuration file /etc/smartd.conf parsed but has no entries
Jun 9 20:55:17 truenas blkmapd[2173]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
Jun 9 20:55:18 truenas libvirtd[2596]: invalid argument: cannot find architecture arm
Jun 9 20:56:00 truenas blkmapd[2173]: exit on signal(15)
Jun 9 20:56:38 truenas libvirtd[2596]: internal error: connection closed due to keepalive timeout
Jun 9 20:56:38 truenas libvirtd[2596]: internal error: connection closed due to keepalive timeout
Jun 9 20:56:40 truenas libvirtd[2596]: internal error: connection closed due to keepalive timeout
Jun 9 20:56:40 truenas libvirtd[2596]: internal error: connection closed due to keepalive timeout
Jun 9 20:57:30 truenas systemd[1]: Failed unmounting var-log.mount.
Jun 9 20:57:30 truenas systemd[1]: Failed unmounting var.mount.

====================
syslog:
Jun 9 20:54:29 truenas systemd[1]: opt.mount: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Unmounted opt.mount - /opt.
Jun 9 20:54:29 truenas systemd[1]: root.mount: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Unmounted root.mount - /root.
Jun 9 20:54:29 truenas systemd[1]: var-local-ca\x2dcertificates.mount: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Unmounted var-local-ca\x2dcertificates.mount - /var/local/ca-certificates.
Jun 9 20:54:29 truenas systemd[1]: var-db-system.mount: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Unmounted var-db-system.mount - /var/db/system.
Jun 9 20:54:29 truenas systemd[1]: Unmounting var.mount…
Jun 9 20:54:29 truenas umount[8874]: umount: /var: target is busy.
Jun 9 20:54:29 truenas systemd[1]: var.mount: Mount process exited, code=exited, status=32/n/a
Jun 9 20:54:29 truenas systemd[1]: Failed unmounting var.mount.
Jun 9 20:54:29 truenas systemd[1]: mnt.mount: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Unmounted mnt.mount - /mnt.
Jun 9 20:54:29 truenas systemd[1]: Stopped target local-fs-pre.target - Preparation for Local File Systems.
Jun 9 20:54:29 truenas systemd[1]: Reached target umount.target - Unmount All Filesystems.
Jun 9 20:54:29 truenas systemd[1]: systemd-tmpfiles-setup-dev.service: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Stopped systemd-tmpfiles-setup-dev.service - Create Static Device Nodes in /dev.
Jun 9 20:54:29 truenas systemd[1]: systemd-sysusers.service: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Stopped systemd-sysusers.service - Create System Users.
Jun 9 20:54:29 truenas systemd[1]: systemd-remount-fs.service: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Stopped systemd-remount-fs.service - Remount Root and Kernel File Systems.
Jun 9 20:54:29 truenas systemd[1]: Reached target shutdown.target - System Shutdown.
Jun 9 20:54:29 truenas systemd[1]: Reached target final.target - Late Shutdown Services.
Jun 9 20:54:29 truenas systemd[1]: systemd-poweroff.service: Deactivated successfully.
Jun 9 20:54:29 truenas systemd[1]: Finished systemd-poweroff.service - System Power Off.
Jun 9 20:54:29 truenas systemd[1]: Reached target poweroff.target - System Power Off.
Jun 9 20:54:29 truenas systemd[1]: Shutting down.
Jun 9 20:54:30 truenas systemd-shutdown: Syncing filesystems and block devices.
Jun 9 20:54:30 truenas systemd-shutdown: Sending SIGTERM to remaining processes…
Jun 9 20:54:30 truenas systemd-journald[543]: Received SIGTERM from PID 1 (systemd-shutdow).
Jun 9 20:54:30 truenas systemd-journald[543]: Journal stopped

(Look like all those “Failed unmounting” are “normal” as I see them when the VM shuts down quickly, AFTER I delete the “mock-up” VM).

Thankyou for confirming, that really helps.

I’m not expecting free help.

I’m reporting the issue. With a repro, workaround and video. I’m expecting the bug to be fixed.

2 Likes

Sigh.

This follow on bug has been closed as “unable to reproduce”

I just confirmed it again on a clean install of Dragonfish-24.04.3-MASTER-20240620-013913 hosted on Dragonfish 24.04.1.1

For anyone else who’d like to play at home…

on any version of TrueNAS Scale, create a TrueNAS VM, for the VM you need to use 24.04.0, or something with a more recent kernel than 24.04.1.1 (ie a nightly), because 24.04.1.x hangs in a vm

In that truenas VM, create a vm… don’t even install an os, connect an iso or start it… just create it.

now shutdown your vm (the one that has a non-started vm created)

You’ll find it times out waiting for vms which aren’t started to shutdown.

restart your vm…

delete the dummy vm you created…

shutdown…

and now it shuts down fast.

Truenas-fan has very clear repro instructions here:

I’ve pushed a fix for this here: NAS-129481 / 24.10 / dont hang shutdown for vms that arent started by yocalebo · Pull Request #13909 · truenas/middleware · GitHub

1 Like

Can confirm this is fixed in Dragonfish-24.04.3-MASTER-20240623-013913

With the new version, shutdown occurs within a couple of seconds, instead of after a 90s timeout!

:champagne:

Looks like this has been backported to 24.04.2 too. which will be good.

It makes a huge difference when starting/stopping TrueNAS VMs to change devices etc for testing.