Dragonfish 24.04.1.1 in a VM, hangs on boot, can stall on shutdown

Dumb question, and I'm not sure I have an idea, but what does "setting root" mean? Set a password, set a shell? …I really don't have much of an idea, but I'll look something up and see if I can pull on that thread a little bit to see what happens and get back to you.

You think your config has something to do with it? Have you tried viewing it with the DB viewer?

I mean setting a “root user” account password in the installer.

I had been testing by minimizing the changes from default.

The problem is it doesn't fail 100% of the time, leading to false positives, so I had begun to think that using a root account vs. an admin account makes a difference.

The latest results are that I can do a clean install, with a boot device and no pool, set either an admin password or a root password, and it will hang, typically while generating some keys.

When set to 8 GiB of RAM.

But not 100% of the time.

It works 100% of the time with 24.04.0.

Okay, my possibly dumb idea was 'run0' and systemd 256. I'm going off my previous thought about "read only".

This could be a time suck so I’d verify the versions first before going down this path.

Also, typing on my phone so not at a laptop.


Well, you don’t really need dracut in the initrd at all anymore to have a working initrd. Except for certain kinds of exotic storage

Quoting Poettering…

Wonder if ZFS counts as exotic :wink:

I shall be ordering more memory so I can embark on this journey also! I'll see if I have the same issue, if it's not resolved by the time I get it and install it, probably a couple of weeks. I'll probably use 16 GB of RAM for the VM, though.

I won't be updating like you, though, so it will be different from yours. I guess I could briefly try the same version as you before continuing on to Electric Eel once the Docker stuff makes it to the Beta.


Here's another example of something that sounds very similar, but with Proxmox as the hypervisor.

Here is a reproduction video I made.

It would be good if anyone else could reproduce…

All you need to do is create an 8GB VM with a 32GB boot zvol…

A clean install of TrueNAS will hang when running in a VM with 8GB of RAM. It works fine on all previous versions of TrueNAS (SCALE, CORE and FreeNAS), including 24.04.0.

In the video, on a TrueNAS 23.10.2 system with 128GB of RAM and a Xeon E5-2699A v4 CPU, I set up a VM with 8GB of RAM, 4 cores, a VirtIO NIC on a bridged interface, and a VirtIO 32GB boot zvol.
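For anyone wanting to match that configuration outside the TrueNAS UI: TrueNAS manages its VMs through libvirt/KVM, so the exact invocation it generates differs, but a roughly equivalent raw QEMU command line (pool/zvol names and bridge name are placeholders, not values from the video) would look something like this:

```shell
# Sketch of a KVM guest matching the test VM: 8 GB RAM, 4 cores,
# VirtIO disk backed by a 32 GB zvol, VirtIO NIC on a bridge,
# booting the installer ISO. Paths and names are illustrative.
qemu-system-x86_64 \
  -enable-kvm \
  -m 8192 -smp 4 \
  -drive file=/dev/zvol/tank/vm-boot,if=virtio,format=raw \
  -netdev bridge,id=net0,br=br0 \
  -device virtio-net-pci,netdev=net0 \
  -cdrom TrueNAS-SCALE-24.04.1.1.iso \
  -boot d
```

Reproducing with the same VirtIO disk and NIC models should matter more than the exact hypervisor frontend, since the hang shows up under both the TrueNAS UI and Proxmox.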

I then clean install 24.04.1.1 from a verified ISO and reboot it 6 times. It hangs on startup 3 out of 6 times.

I then clean install 24.04.0 and reboot it 5 times; it starts correctly 5x in a row.

The ISOs are shasum OK. I have also confirmed this behaviour on a Dragonfish 24.04.1.1 system running on a Xeon D-1541 with 32GB of RAM.
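The checksum verification pattern, for anyone repeating the test: a dummy file stands in for the real ISO here so the snippet is self-contained; substitute the actual ISO filename and the published sum from the download page.

```shell
# Demo of verifying an image against a recorded SHA-256 sum.
# (demo.iso is a stand-in for the real installer ISO.)
printf 'dummy iso contents\n' > demo.iso
sha256sum demo.iso > SHA256SUMS   # in real use, paste the published sum instead
sha256sum -c SHA256SUMS           # prints "demo.iso: OK" when the image is intact
```

A corrupted download would report `FAILED` and exit non-zero, so this is worth scripting into any repro loop.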

Bug Report:

NAS-129406 : Clean Install of Dragonfish 24.04.1.1 in a VM fails to boot 50% of the time, works with 24.04.0

I may have missed the really important details among everything else, but I need to ask:

Which specific hypervisor vendor/version, and any other details not to be assumed?

I am testing on TrueNAS 23.10.2 (128GB RAM) and TrueNAS 24.04.1.1 (32GB RAM)

Those are the hypervisors (i.e., Linux KVM).

The video was recorded on 23.10.2.

Never mind. Had to watch the video to get the important details.

Will test with raw files, as with all my other VMs.



Which details did I miss?

This is why I recorded the video, to catch any details I missed.

Ok, I can confirm that TrueNAS-SCALE-24.04.1.1.iso installs under TrueNAS-SCALE-24.04.1.1 as a VM (8 GB RAM, 4 cores, 10 GB raw disk, 1 VirtIO NIC, 1 display), but in my quick testing it hangs/freezes during boot after 6-9 seconds, so I assume this can be an issue on other versions as well.

(Boot never went past either:

middlewared: setting up plugins

or:

Configure swap filesystem on boot pool.
Generating DH parameters, 2048 bit long safe prime
)


Wait!

I did get TrueNAS-SCALE-24.04.1.1 to boot under TrueNAS-SCALE-24.04.1.1 by changing the RAM from 8 GB to 10 GB.

After that, I changed it to 9 GB and then down to 8 GB.

It booted OK a few times with 8 GB RAM, but it did freeze while booting (as above) when I tried for the 10th or 12th time.

Oh wait, it also froze after a few boots with 9 GB RAM.

Not trying any more with 10 GB; there is definitely something going on with this very specific combination of hypervisor under hypervisor.


Meanwhile, 24.04.0 will boot fine.

Thanks for confirming that it's not just my system.

There's a new 24.04.2 nightly with an updated 6.6.32 kernel.

I just tested the new nightly; I was able to boot 6x in a row without hanging.

:champagne:

Will now test upgrading my demo VMs (the ones I use for making videos…).

Nice, I may get to test it in my secondary server.

FYI, I did a reboot today under 24.04.1.1 and it took a looong time (8-9 min in some ZFS step that never took more than 1-2 min).

Sorry, I was not expecting it, so I did not document it; I am sure others have seen it and already reported it.

Check how many snapshots you have…

Thanks, yes, I heard the platter HD thrashing, so I knew it was data related.

Funny thing is that I actually purged thousands of snapshots a few days before the reboot (i.e. the system has had a similar snapshot count before, and never took that long to boot).

Hmmm, 10k gives us a warning. I have run several systems with 15-25k snapshots without the boot delay I mentioned.

(Ok, the platter HD has 27k + 9k snapshots, more than I usually keep, but with the exact same "Data Protection" tasks over the last 6 months. So boot times double when you have more than 25k snapshots?)

LOCAL_HDD_1_1a@LOCAL_NVME_1_1a-hourly-snap-task- = 27036
LOCAL_HDD_1_1a@LOCAL_NVME_2_1a-hourly-snap-task- = 9865
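For counting snapshots per dataset like this, the usual pattern is `zfs list -H -t snapshot -o name` piped through `grep -c`. A zpool isn't available everywhere, so the sketch below mocks the snapshot listing with a file; dataset names are made up.

```shell
# Mocked output of: zfs list -H -t snapshot -o name
# (one "dataset@snapname" per line; real use pipes zfs list directly)
printf '%s\n' \
  'tank/data@hourly-snap-0001' \
  'tank/data@hourly-snap-0002' \
  'tank/other@daily-snap-0001' > snaps.txt

# Count snapshots belonging to one dataset
grep -c '^tank/data@' snaps.txt   # prints 2
```

On a live system, `zfs list -H -t snapshot -o name | grep -c '^LOCAL_HDD_1_1a@'` would give the per-dataset totals quoted above.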

I know this is not your dog in this fight (ZFS), but at some point we all need clear limits, beyond the 'unlimited' promises, right?

It's a bug.