NVME Datastore - SLOG Device?

Good evening,

I have a big performance problem. We use TrueNAS CORE with an iSCSI MPIO volume as an ESXi datastore. It's connected with dual 25 Gb cards to our nodes. The datastore consists of 6x NVMe in two-way mirrors. There are about 40 VMs in that datastore. Every few minutes we have a huge datastore latency spike of about 100 ms.

The main problem is when a user saves huge files into a big SQL database; then the latency spikes instantly.

My question: Would it benefit from a SLOG device? Right now it's sync=standard.

Thank you.

I could be wrong here, but I don't think it will, because your datastore is already a very fast storage device. SLOG devices are typically leveraged when the main storage pool is made up of much slower devices (i.e. HDDs).

The only thing I could think of that could speed it up is making a SLOG device out of a slice of your RAM. I'm not even sure that's possible, and it's likely not a good idea because your RAM is not protected against power loss.


Is it possible to test it? I can disable sync for a few minutes and test 4K random writes. I have tested this with CrystalDiskMark on a VM. 4K random writes are at 16 MB/s, which is way too slow for that setup.

Or am I wrong?

Full hardware details may be helpful to others replying. What do you have set for your Dataset Record Size? How full is the pool? Just as much info as you can provide.

Here are some details:

TrueNAS CORE with an iSCSI volume of 16 TiB, 9 TiB used. About 50 VMs running, a lot of small database servers and RDS.

Sync: Standard

Compression: Off

Enable Atime: Off

Deduplication: Off

Record Size: 16K

Primarycache=all

Hardware:

2x 25 Gigabit Broadcom P225P SFP28

1x AMD EPYC 7313P (3.00 GHz, 16-Core, 128 MB)

6x 7.68 TB Kioxia CD8-R U.2 NVMe SSD

4x 64 GB (1x 65536 MB) ECC Registered RAM

Thx

Did I understand it correctly that you have only one zvol (shared via iSCSI)?

And on the ESXi side, do you mount it as a local drive and then just store VM disks as files (VMDK? I don't have experience with ESXi)?

That's right. It's mounted with MPIO round-robin to the ESXi nodes. The 4K random writes are at 20 MB/s, which is very slow for that setup.

Do you have auto TRIM enabled on your pool? This smells like the NVMes have reached steady state for write amplification, and are struggling to GC and recover free blocks. You may need to destroy your pool, recreate it with 50% overprovisioning, and reload the data from backup.

SLOG won’t help in this case, as it’s intended for slower storage than what you have. Turning SYNC off will be better, if your NVMes have PLP.
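
If you want to check from the TrueNAS shell, you can query the pool property directly. A minimal sketch, assuming your pool is named tank (substitute your actual pool name):

zpool get autotrim tank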

Auto TRIM is not enabled. I have to add that other speeds are perfect. My NVMes have PLP. CrystalDiskMark gives me 3700 MB/s on a VM; it only lacks in 4K random write performance.

Can I test it somehow? Disable sync, make another test with CrystalDiskMark?

Thank you!

I can remember that the 4K random write speed has always been that low, even at the beginning when I created the volume.

I don't know if that helps.

Yes, try disabling sync and enabling auto TRIM. You’ll see a write performance hit at the initial auto TRIM, but this should stabilize over time.
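
From the shell, that would look roughly like the following; tank and tank/vmware-zvol are placeholder names for your pool and zvol:

zpool set autotrim=on tank
zfs set sync=disabled tank/vmware-zvol

Remember to set it back with zfs set sync=standard tank/vmware-zvol once you're done testing.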

Hey @Noise

Reviewing some of this, including your system specs. I’m going to jump in with one important thing and then we’ll look at performance questions.


Important bit here right off the bat - sync=standard is effectively equivalent to sync=disabled on iSCSI for ESXi/VMware, because ESXi does not request synchronous writes over iSCSI. Without manually setting sync=always for this zvol, you potentially have data that could be at risk in a sudden-shutdown scenario (e.g. hardware component failure) - so you'll want to set sync=always.
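
As a sketch, assuming your zvol lives at tank/vmware-zvol (substitute your actual path):

zfs set sync=always tank/vmware-zvol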

Will this result in the requirement for a SLOG device? Possibly. You have CD8-R drives which should have PLP, but they are the -R (read-intensive) models … still, what you're seeing should be within reason for sync=always.

With that out of the way, let’s get to the questions and thoughts.

Number one - can I ask why compression is disabled on the ZVOL? You've got more than enough CPU horsepower on the TrueNAS side to benefit from it, and the default LZ4 will early-abort on incompressible data. You'd have to be able to hit a 1.33x ratio to make this effective for space-saving on disk given your volblocksize and ashift, but it could still save memory in active ARC (since that's compressed as well), meaning more cache hits.
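
Turning it on only affects newly written blocks, but as a sketch (again with tank/vmware-zvol standing in for your zvol):

zfs set compression=lz4 tank/vmware-zvol
zfs get compression,compressratio tank/vmware-zvol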

Generally when 4K write speed is low, what you're fighting with is end-to-end latency. It might require a check in the BIOS unless your IPMI menu allows you to look, but investigate any “power savings” settings for PCIe link speeds and PCIe link ASPM (active state power management), and disable them. If your board is set to defaults it may be putting your Kioxia drives or the link to them into a lower-power, lower-bandwidth state - and then it’s got to “wake” them for the writes. Each wake-up only costs a fraction of a second, but that adds up to something nontrivial when you’re at dual 25Gbps speeds. That said, you’re running sync=standard, which should mean the tiny writes are hitting RAM, but it’s worth looking into this.

Network wise, you mentioned this is iSCSI MPIO - which is the correct way to do it - but did you set up the VMW_PSP_RR round-robin rule on the ESXi servers?

Since you’re on CORE, it should be:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "TrueNAS" -M "iSCSI Disk" -P "VMW_PSP_RR" -O "iops=1" -e "TrueNAS iSCSI Claim Rule"

This will tell your servers to flip links every I/O.
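
You can verify that the policy took effect by listing the devices on each host and checking that Path Selection Policy shows VMW_PSP_RR with iops=1 in its device config:

esxcli storage nmp device list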

On the subject of networking - can you briefly describe the IP layout? The correct IP SAN topology for TrueNAS is two non-overlapping subnets, and Gandalf standing in the middle shouting YOU SHALL NOT PASS (traffic between them) - so for example:

VMware                         TrueNAS
192.168.1.101 --- Switch A --- 192.168.1.100

192.168.2.101 --- Switch B --- 192.168.2.100

You don’t want 192.168.1.101 to be able to see 192.168.2.100 in this layout. A vmkping -I from the first interface to the second target should fail with a no route to host message.
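
For example, using the addresses from the sketch above (vmk1 here is a placeholder for whichever vmkernel port sits on the first subnet):

vmkping -I vmk1 192.168.2.100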

Welp, VM drives as files is already subpar (performance-wise), and using a single zvol for all VMs probably makes things even worse…

If my guess is correct, you should try creating one more zvol just for that database VM with the big files. Latency on all the other VMs should be OK after that (during a big file commit).

If that helps, the next-level play would be creating separate zvols for each VM and mounting them inside the VMs themselves (or maybe ESXi can use iSCSI directly for a single VM, I don't know). That would remove the overhead you have now: block storage (zvol) translated to file storage (ESXi datastore) translated to block storage (VM disk file) translated to file storage (the filesystem inside the VM).
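
Creating such a zvol from the TrueNAS shell could look roughly like this; the name, size, and sparse flag are illustrative, and you'd still need to export it as a new iSCSI extent afterwards:

zfs create -s -V 200G -o volblocksize=16K tank/sql-vm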

I have started the trim (Auto Trim) and will disable the sync tomorrow for testing purposes. I will let you know!

Tomorrow I will try disabling sync and run some performance tests. If the latency disappears, does that mean I need a SLOG device?

Compression is disabled on the zvol because I read that in a whitepaper back in 2023 when I created it. I can still turn it on.

The iSCSI MPIO is configured on the vCenter side; it's already set to round-robin with IOPS=1. The traffic is separated with VLANs; they can't ping/see each other.

In my experience, using raw files for VMs and only one zvol works as well as using separate zvols per VM.

I did see a write-up saying that raw files could be faster, but I can't remember the specific details.
I currently have several W10, W11, Ubuntu, FreeBSD and HomeAssistant VMs, and they all boot just as fast, do updates, and are just as responsive under both raw files and zvols.

Do you have a specific explanation, tests, or facts for your claims?

At the end of the day, everything is a file…

Regarding the single zvol, I'm not 100% sure, but I suspect that sync writes to one file (VM disk) will slow down sync writes to other files. It's pretty easy to test (just create a separate zvol for the sync-write-intensive VM).

Regarding the raw files: over a decade ago, my colleagues migrated VMs from those to LVM, and IOPS increased 3-fold. The drives were HDDs, though.

The auto TRIM is not starting. Should I start it manually?

Yes, you should perform a manual TRIM. Auto TRIM only applies to new blocks after you enable it for the pool.
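
A manual pass from the shell would be something like the following, with tank standing in for your pool name; you can watch its progress with zpool status -t tank:

zpool trim tank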

Good morning,

I have trimmed the pool and it's now 100% complete. I have now also disabled sync on the volume.

Unfortunately, the latency spikes are still there (vCenter):

This is a random virtual machine from the vmdatastore.

And this is the current situation with the trimmed pool and sync disabled:

After turning sync back to standard:

And the disk operations:

Disk busy:

Thank you