TrueNAS 25.04-RC.1 is Now Available!

Let’s start a new thread in General. Post the link here.

Document what the VMs and hardware are and whether there were any signs before the freeze. After that we can submit a ticket if you have the diagnostics.


Is anyone else having IO performance issues with Incus VMs that overwrite data? I’m getting some very poor results if I fill up a sparse zvol with random data and then try to write over that data again. The write performance is at least halved.
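Roughly the kind of test I mean, for anyone who wants to reproduce it (this is the direct-to-zvol version on the host; the pool/zvol names, size, and fio settings are just placeholders):

```
# Sparse zvol used as the target (names and size are placeholders)
zfs create -s -V 100G tank/testvol

# Pass 1: fill the zvol end-to-end with (pseudo-random) data
fio --name=fill --filename=/dev/zvol/tank/testvol \
    --rw=write --bs=1M --iodepth=16 --ioengine=libaio --direct=1 --size=100%

# Pass 2: overwrite the same space and compare the reported bandwidth
fio --name=overwrite --filename=/dev/zvol/tank/testvol \
    --rw=write --bs=1M --iodepth=16 --ioengine=libaio --direct=1 --size=100%
```

The second pass is where the slowdown shows up for me.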

I don’t have an answer to your question, but I do wonder: is a sparse volume really the best pick for a workload that write-intensive?

Well, I wouldn’t think it would matter, since ZFS is a copy-on-write filesystem. I use sparse for space efficiency, but this example is just the test case I’m exploring right now.
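For context, the only difference between my sparse and thick zvols is whether the full reservation is taken up front (names are just examples):

```
# Thick zvol: refreservation covers the full volsize up front
zfs create -V 100G tank/vm-thick

# Sparse zvol: -s skips the reservation, space is only consumed as data is written
zfs create -s -V 100G tank/vm-sparse

# Compare the two
zfs get volsize,refreservation,used tank/vm-thick tank/vm-sparse
```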

The data I’m gathering seems to indicate that the IO code path with an Incus VM is…not good. I want to know if anyone else is seeing these issues.

There is no such thing as a free lunch…

When data is overwritten between snapshots, the blocks are written to new space as you expect. BUT the old blocks and their space are then reclaimed. This requires metadata work. If your metadata is on HDDs, it can have a performance impact, and it is also slower on RAIDZ.

The ZFS log spacemap is the mechanism doing all the work to keep track of the free space.

https://sdimitro.github.io/post/zfs-lsm-flushing/
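If you want to see how much of that bookkeeping your pool is doing, a couple of read-only checks (pool name is an example):

```
# Pool capacity and fragmentation at a glance (CAP and FRAG columns)
zpool list -v tank

# Per-metaslab detail, including spacemap sizes; read-only,
# but it can take a while on large pools
zdb -mm tank | head -n 50
```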

With thick-provisioned zvols there is more space allocated, so my guess is the performance degradation might be less severe or take longer to reach a steady state. Perhaps you can validate that.
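One way to validate without recreating the zvol might be to give the existing one a full reservation and re-run the same fill/overwrite passes (dataset name is a placeholder):

```
# 'auto' sets refreservation to the amount needed to guarantee volsize
# (only valid for volumes)
zfs set refreservation=auto tank/testvol
zfs get volsize,refreservation tank/testvol
```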

There should be little performance difference from Electric Eel to Fangtooth… if there is, then it’s worth looking at.

Thanks for this explanation of how ZFS works…

I did some testing yesterday in Electric Eel and confirmed I’m seeing notable performance degradation. At best there’s a notable increase in CPU processing overhead; at worst there’s a disk IO drop-off and stability issues with the VMs.
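In case anyone wants to watch both sides of that during the overwrite pass, something like this works on the host (pool name is a placeholder; Incus VMs run under QEMU, so that’s the process to watch):

```
# Per-vdev throughput and latency, refreshed every 5 seconds
# (-y skips the since-boot summary)
zpool iostat -vly tank 5

# CPU usage of the QEMU processes backing the Incus VMs
top -p "$(pgrep -d, -f qemu)"
```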

I think we’re at the point where this won’t be fixed in this release; I’m just trying to get it confirmed as an issue for future point releases.

The next question is whether it’s related to Incus or just a general issue with iSCSI zvols on small RAID-Z HDD pools.

If it’s Incus-related, it’s an issue.
If it’s a degradation from 24.10… it’s an issue.

If it’s just RAID-Z HDD performance, we know how to solve that: more vdevs, an sVDEV (special vdev), or flash.
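For reference, the sVDEV route looks something like this (device names are placeholders; note that a special vdev can’t be removed again from a pool whose data vdevs are raidz, so size it with care):

```
# Add a mirrored special vdev for metadata (and optionally small blocks
# via the special_small_blocks property). Only newly written data benefits.
zpool add tank special mirror /dev/nvme0n1 /dev/nvme1n1
zpool status tank
```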

We test iSCSI zvols extensively… on HDDs and NVMe. These are all working fine.

What CPU are you running? It’s possible there were some changes to speculative execution mitigations that negatively impacted your system.
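If you want to rule that out, the kernel reports what it’s applying (and /proc/cmdline will show any overrides):

```
# Which mitigations are active on this CPU
grep . /sys/devices/system/cpu/vulnerabilities/*

# Any mitigation-related overrides passed at boot
cat /proc/cmdline
```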

AMD Ryzen 5 5600, and no, I don’t think that was any part of the issue.

My issues seem to have mostly been something a little weird about how Incus was handling a disk it managed vs a disk that was attached from somewhere else. More in this thread, but we found some CLI tweaks that made it work a lot better for me, and there are a few code changes coming in the release tomorrow that may help as well.
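To be clear, these aren’t the specific tweaks from that thread, but if anyone wants to compare how Incus presents a managed disk vs an attached one, the device config is visible with (instance name is an example):

```
# Full effective config, including profile-inherited devices
incus config show myvm --expanded

# Just the instance's own device entries
incus config device show myvm
```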
