High iowait on data transfer

Hi all,
I have a TerraMaster F4-424 running TrueNAS SCALE ElectricEel 24.10.0.2
Intel N95, 4 cores / 4 threads
32 GB DDR5 RAM, 8 GB assigned to a VM
2x 256 GB NVMe = boot pool
2x 10 TB HGST Ultrastar He10 + 2x 10 TB Seagate Exos = RAIDZ2 pool

I use this system as backup storage for another TrueNAS SCALE system using replication tasks.
I also run a Proxmox Backup Server VM to back up from a Proxmox server.

During either task I get a lot of iowait, and it makes the UI unusable.
Is there anything I could do to mitigate this?
Would adding a cache drive to the pool help?

A good read explaining what iowait is and what you can do about it: What is iowait and how does it affect Linux performance?

Tbh, we'd need a lot more information to help find the cause, but most likely something is happening that means the job is waiting on storage to complete.

High IOWait means your CPU is waiting for data. In simpler terms, it means it is waiting on your storage. A lot of those operations are probably also sync writes, which exacerbates the problem.
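If you want to see the waiting as it happens, watch iostat while a replication or backup runs (a minimal sketch; iostat comes from the sysstat package, which may need installing first):

# extended per-device stats every 2 seconds, skipping idle devices;
# high %iowait in the CPU line plus near-100 %util on the pool disks = storage-bound
iostat -xz 2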

Thanks for your reply.
What information would you need? I tried to add pictures, but since I'm a new member I can't do that yet.

Can you detail exactly which is Proxmox and which is TrueNAS? Are they the same machine? Which ones are VMs and which are bare metal? It's not really clear from your OP.

The Proxmox server is a Lenovo P520:
Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz
190 GB DDR4 RAM
NVMe boot drive
NVMe for VM and CT data
Contains a TrueNAS SCALE VM - this is my main storage
2 RAIDZ1 pools of SSDs

The TerraMaster F4-424 is used for backup only and runs TrueNAS SCALE with a Proxmox Backup Server VM (4 host cores and 8 GB RAM assigned).
I run replication tasks from the TrueNAS SCALE main storage and backups from the Proxmox server, but never at the same time.

Which storage are the backups stored on? I'm guessing the TerraMaster storage?

So you are backing up from Proxmox NVMe to Terramaster 2x 10TB?

Also, the VM/CT storage, what kind of NVMe drives are those?

Backups are stored on the 2x 10 TB HGST Ultrastar He10 + 2x 10 TB Seagate Exos in one RAIDZ2 pool.

VM/CT storage is an Ediloca EN600 PRO 2 TB SSD, PCIe 3.0 x4.

I'm not familiar with those NVMe drives, but from the price they look like consumer SSDs, which are typically very poor drives for Proxmox workloads: without PLP (power-loss protection), they have terrible sustained fsync performance once the SLC cache runs out. Many people don't realize this, but consumer SSDs can actually have worse fsync performance than even a good-quality spinning HDD.

You can do a quick test of the drives on Proxmox with the following command:

pveperf /path-to-your-vm-storage/

Post your output here.
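If you want a second data point beyond pveperf, fio can exercise the same sync-write pattern directly (a sketch under assumptions: fio may need installing first, and --directory must point at your actual VM store):

# 4k writes with an fsync after every write, which is roughly the pattern
# that makes consumer SSDs fall over once their SLC cache is exhausted
fio --name=fsync-test --ioengine=sync --rw=write --bs=4k --fsync=1 --size=256M --directory=/path-to-your-vm-storage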

Here you go

pveperf /dev/nvme1n1
CPU BOGOMIPS: 147994.00
REGEX/SECOND: 4857922
HD SIZE: 94.11 GB (udev)
open failed: Not a directory

I did the same for the other NVMe drives and the result is the same.

You’re using it wrong and you’re not getting the disk performance numbers. Don’t use /dev/nvme1n1. Point it to the actual mount point.
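For example (path hypothetical; use whatever Proxmox shows under /mnt/pve for your storage):

pveperf /mnt/pve/your-vm-storage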

My bad, I never used this command before :smile:

root@:~# pveperf /mnt/pve/vms_data
CPU BOGOMIPS: 147994.00
REGEX/SECOND: 4841542
HD SIZE: 1876.71 GB (/dev/nvme1n1p1)
BUFFERED READS: 1271.43 MB/sec
AVERAGE SEEK TIME: 0.13 ms
FSYNCS/SECOND: 754.70
DNS EXT: 39.89 ms
DNS INT: 37.12 ms (*.com)

This is why you have high iowait. That drive is slow. For reference, here's how it looks on my enterprise SATA SSD:

root@pve1:~# pveperf /tank2/vm
CPU BOGOMIPS: 92000.00
REGEX/SECOND: 2836746
HD SIZE: 203.39 GB (tank2/vm)
FSYNCS/SECOND: 6788.17
DNS EXT: 92.81 ms
DNS INT: 120.90 ms (*.org)

As you can see, my fsync performance is about 900% of your drive's value.

How do you fix it, you ask? Well, there are several ways.

The most correct way? Use enterprise SSDs for your VM store, OR add an SLOG device backed by an enterprise SSD; even a SATA one will smoke that NVMe drive. A sketch of the SLOG route is below.
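Attaching the SLOG is a one-liner once the disk is in the box (a minimal sketch; the pool name tank2 and the device path are hypothetical, so substitute your own, and address the disk by-id rather than /dev/sdX):

# add an enterprise SSD as a separate ZFS intent log (SLOG) for pool tank2
zpool add tank2 log /dev/disk/by-id/ata-YOUR_ENTERPRISE_SSD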

The cheapo poor man's way? Disable sync on the dataset that hosts your VMs, like the following. Replace tank2/vm with your ZFS dataset path (pool_name/path-to-dataset/zvol):

zfs set sync=disabled tank2/vm

Note: Disabling sync is NOT recommended because it can lead to data loss; writes that were acknowledged but not yet flushed to disk are gone if the box loses power. Whether or not the risk is worth it is up to you.
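If you try it and change your mind, reverting and checking is straightforward (using the same tank2/vm example path as above):

# restore the default sync behaviour, then confirm the property took effect
zfs set sync=standard tank2/vm
zfs get sync tank2/vm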

Thanks again for your reply.

I added a SanDisk X300DC SSD, but the result is almost the same.
I believe it's an enterprise SSD, with under 20k hours and under 50k LBAs.

Do you have recommendations for an ssd?

root@:~# pveperf /mnt/pve/testpve/
CPU BOGOMIPS: 147994.00
REGEX/SECOND: 4825500
HD SIZE: 879.14 GB (/dev/sda1)
BUFFERED READS: 511.76 MB/sec
AVERAGE SEEK TIME: 0.03 ms
FSYNCS/SECOND: 631.29
DNS EXT: 40.24 ms
DNS INT: 38.15 ms (*.com)

I use any of the Intel S3500/3600/3700 series. Obviously, the higher the number, the better (faster). Just pick one that fits your budget. You can probably get them used for cheap, and they'll likely still have a good service life left in them.

Thanks for the recommendation.
I've saved a 960 GB Samsung NVMe PM983 MZ1LB960HAJQ on eBay.
Performance:
SSD endurance: 1,366 TBW
Internal data rate: 3,000 MB/s (read) / 1,100 MB/s (write)
Maximum 4 KB random write: 38,000 IOPS
Maximum 4 KB random read: 400,000 IOPS
MTBF: 2,000,000 hours

Would this also be suitable?

I don't have personal experience with the PM983, but I've had good experience with the 863. Since the 983 is a higher model number, it's likely better, so it's probably OK.
