Which version of Ubuntu are you using? The 24.04 LTS or newer? Red Hat/Oracle Linux 9 are on a pretty old kernel, 5.15. I wonder if that’s contributing to my issues?
I assume by this you mean you didn't run into any issues?
root@intersect:/opt/gravwell/etc# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 24.10
Release: 24.10
Codename: oracular
root@intersect:/opt/gravwell/etc# uname -r
6.11.0-9-generic
Indeed, much better behaved right out of the gate. I was just following the instructions for migrating an existing VM, that’s why I’ve been attaching drives this whole time.
I tried Ubuntu 24.10, and the performance is somehow worse. The overall system load is even higher, though it doesn't seem to be throwing NVMe errors like the Oracle Linux 9 VM did.
The problem you’ve run into appears to be due to how Incus handles the cache behavior for “disks” versus “Volumes”. In the next release the UI will encourage users to import their existing ZVOLs into an Incus-managed “Volume”, where the proper ZFS driver optimizations and caching mechanisms will be in play automatically.
Then I’m happy my suffering was not in vain.
So look for this in the next point release? (ie, not the 25.04 release in a few days)
The PR looks to be in 25.04.0
Is this the PR?
That reminds me, I should probably retest things on the release code.
Found this thread just searching for OOM errors as my Windows VM crashes daily after moving to Fangtooth.
It’s a Windows VM, and any heavy IO bogs down the Windows UI to the point of being pretty much unusable. swtpm kept me from being able to start the crashed VM again, so I removed the TPM device thinking it might somehow be the issue. At least now I can restart the VM when it crashes, but it didn’t help at all with the OOM crashing.
While OS is different, symptoms look very similar to OP…
edit: can’t add link to my JIRA ticket for some reason… # 135499 btw.
Changing from NVMe to virtio-scsi for my Windows VM seems to have improved things drastically. Going on 24 hours now, where usually my VM crashes with an OOM every day or so since moving to Fangtooth… I’ll report back if things are stable for a few more days, but it’s looking promising.
How large is the memory footprint of your VM and your system as a whole? I’ve been chasing down a problem I can’t quite figure out still. Seems like it might be related to memory fragmentation, but I’m not sure yet.
I did some retesting, and my system as a whole is a lot better than before, with virtio-blk devices and all my zvols imported into the Incus volume management. However, if I load up the IO, things still seem less than awesome. For example, I was noticing CPU steal time on my other VMs, despite what seemed to be a not-fully-loaded CPU. When I looked at it closer, it seemed to come in waves: every 20 seconds or so there was a blip of steal time. The host OS seemed to show blips of zfs/zvol CPU usage too. ‘top’ will show 2-3 running processes, then a blip of 20-30 for a split second. This IO pipeline just seems poorly optimized, but I’m not sure how to articulate it in a way I could present to the devs as something actionable.
The sheer number of zfs threads I see in top makes me think that maybe it’s just been written for a more modern server with tons of cores, and that my 6-core system is left in context-switch hell as a result. When I’m not leaning on it hard in one of these test cases, I never notice a problem.
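Those blips can be caught numerically instead of eyeballing top. A minimal sketch using only stock Linux /proc interfaces (nothing TrueNAS- or Incus-specific): field 9 of the aggregate “cpu” line in /proc/stat is cumulative steal jiffies, and “procs_running” is the instantaneous run-queue length.

```shell
#!/bin/sh
# Sample steal time and run-queue length once per second for a few seconds.
prev=$(awk '/^cpu /{print $9}' /proc/stat)
for i in 1 2 3; do
    sleep 1
    cur=$(awk '/^cpu /{print $9}' /proc/stat)
    run=$(awk '/^procs_running/{print $2}' /proc/stat)
    echo "steal jiffies: $((cur - prev))  runnable tasks: $run"
    prev=$cur
done
```

Left running for a minute or two, a spike in runnable tasks every ~20 seconds would line up with the waves of zfs threads described above.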
I was running a 24 GiB Windows VM on a 128 GB system. It is my largest VM by far (my main desktop); the rest are marginal. I run a 5900X, so 12 cores… which should be way overkill for what I’m doing, so all considered, there seems to be something wonky.
Right now things are still looking ok with virtio … so fingers crossed.
If it’s memory fragmentation related, it took about a week to show up for me. I was running a 32 GiB VM that I increased to 48 GiB, because zvols in the Incus datastore don’t get data cached to the ARC (other than metadata, anyway). Are you running your volumes from the Incus volume manager, or are they attached from another location? The latter will cause a lot of churn in the ARC that would make a memory fragmentation problem worse… in theory.
When I changed it to 48GiB the other day and tried to fire it up, it almost immediately OOM’ed, despite 60+GiB free. I had manually forced the kernel to run memory compaction first as an experiment, so I’m not filing a ticket just yet. I tried again a few minutes later and it didn’t OOM, but ZFS went crazy, the CPU went to 100% and the system load just went into orbit. I force quit the VM when the system load hit 200+ after a few minutes and then rebooted. On a clean boot all was fine.
Anyway, that’s why I’m poking at memory fragmentation, since it seems to happen after the system has been up for a while, and only seems to be a problem with these large memory VMs.
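For anyone who wants to poke at the fragmentation theory themselves, here is a rough sketch of the probe-and-compact cycle mentioned above, using the stock kernel interfaces. Each column of /proc/buddyinfo counts free blocks of a given order (order 0 through 10, i.e. 4 KiB up to 4 MiB on x86-64); the write to compact_memory needs root, hence the guard.

```shell
#!/bin/sh
# Snapshot free-block counts per order. Sparse right-hand (high-order) columns
# alongside plenty of "free" memory is the signature of external fragmentation.
cat /proc/buddyinfo

# Manually trigger kernel memory compaction (root only).
if [ -w /proc/sys/vm/compact_memory ]; then
    echo 1 > /proc/sys/vm/compact_memory
    # Re-check: the high-order counts should rise if compaction helped.
    cat /proc/buddyinfo
else
    echo "need root to trigger compaction"
fi
```

If a large VM still OOMs right after the high-order counts recover, fragmentation probably isn’t the whole story.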
Interesting… I am using a zvol from outside of Incus, as this was a VM I migrated over from pre-Fangtooth, and that was how I read I was supposed to do it…
Are you saying the ARC issue would get resolved if I copy/migrate “into” incus as opposed to referencing external zvol?
Possibly. In 25.04.0 you are required to either clone or move the zvol into an incus-managed dataset when creating a new VM based off of it. Performance was one of the benefits I heard cited for why this decision was made.
May want to investigate this option
AFAICT, the only difference is the cache properties mentioned.
It’s a bit more than that. If your VM’s drive is mounted as an “external ZVOL”, Incus treats it no differently than an external hard drive. If it’s imported into the Incus “pool”, it uses Incus’s ZFS driver, and so you benefit from those additional optimizations.
I’ll find some time to convert my zvol into an Incus one later on. For now, I wanted to report that moving from NVMe to virtio-scsi now has me at 3 days (and counting) of uptime on the VM, where previously it would crash almost daily. Definitely seems to have improved things on my end…
This is my point.
The only practical difference is the zfs driver disables the cache when activating the zvol device before attaching it to the vm.
(I’m familiar enough with the internal workings of the zfs driver to have submitted bug fixes on it ;))
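For what it’s worth, the cache properties being discussed here are visible from the shell. A sketch with a hypothetical zvol path (substitute your own; on a host without the zfs tools this just prints a notice):

```shell
#!/bin/sh
# Hypothetical dataset name -- replace with the path to your own zvol.
ZVOL="tank/vms/win11-disk0"

if command -v zfs >/dev/null 2>&1; then
    # primarycache/secondarycache control whether zvol reads are cached in
    # the ARC/L2ARC. Comparing an external zvol against an Incus-managed
    # volume shows what the driver changes when activating the device.
    zfs get -H -o property,value primarycache,secondarycache "$ZVOL"
else
    echo "zfs tools not available on this host"
fi
```

Running the same `zfs get` against both an external zvol and an Incus-managed volume makes the difference concrete.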