Hi all,
I have a TerraMaster F4-424 running TrueNAS SCALE ElectricEel 24.10.0.2:
Intel N95, 4 cores / 4 threads
32 GB DDR5 RAM, 8 GB assigned to a VM
2x 256 GB NVMe = boot pool
2x 10TB HGST Ultrastar He10 + 2x 10TB Seagate Exos = RAIDZ2 pool
I use this system as backup storage for another TrueNAS SCALE machine via replication tasks.
I also run a Proxmox Backup Server VM to back up a Proxmox server.
During either task I get a lot of iowait, which makes the UI unusable.
Is there anything I could do to mitigate this?
Would adding a cache drive to the pool help?
High iowait means your CPU is waiting for data; in simpler terms, it is waiting on your storage. A lot of those operations are probably also sync writes, which exacerbates the problem.
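If you want to see where the wait is coming from, a quick check (assuming the sysstat package is installed) is to run iostat during a replication and watch %iowait on the CPU line and %util per device:

root@:~# iostat -x 2

If %util on the spinning disks sits near 100% while the CPU is otherwise idle, the pool simply can't absorb the write load any faster.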
Can you detail exactly which is Proxmox and which is TrueNAS? Are they the same machine? Which ones are VMs and which ones are bare metal? It's not really clear from your OP.
Proxmox server is a Lenovo P520
Intel(R) Xeon(R) W-2255 CPU @ 3.70GHz
190 GB DDR4 RAM
NVMe boot drive
NVMe for VM and CT data
Contains a TrueNAS SCALE VM - this is my main storage
2 pools of RAIDZ1 SSDs
The TerraMaster F4-424 is used for backup only and runs TrueNAS SCALE with a Proxmox Backup Server VM - 4 host cores and 8 GB RAM assigned
I run replication tasks from the TrueNAS SCALE main storage and backups from the Proxmox server, but never at the same time.
I’m not familiar with those NVMe drives, but from the price they look like consumer SSDs, which are typically very poor drives for Proxmox workloads: sustained fsync performance becomes terrible once the SLC cache runs out, because they lack PLP (power-loss protection). Many people don’t realize this, but consumer SSDs can actually have worse fsync performance than even a good quality spinning HDD.
You can do a quick test of the drives on Proxmox with the following command:
root@:~# pveperf /mnt/pve/vms_data
CPU BOGOMIPS: 147994.00
REGEX/SECOND: 4841542
HD SIZE: 1876.71 GB (/dev/nvme1n1p1)
BUFFERED READS: 1271.43 MB/sec
AVERAGE SEEK TIME: 0.13 ms
FSYNCS/SECOND: 754.70
DNS EXT: 39.89 ms
DNS INT: 37.12 ms (*.com)
This is why you have high iowait: that drive is slow. For reference, here’s how it looks on my enterprise SATA SSD:
root@pve1:~# pveperf /tank2/vm
CPU BOGOMIPS: 92000.00
REGEX/SECOND: 2836746
HD SIZE: 203.39 GB (tank2/vm)
FSYNCS/SECOND: 6788.17
DNS EXT: 92.81 ms
DNS INT: 120.90 ms (*.org)
As you can see, my fsync performance is roughly 9x your drive’s value (6788 vs. 755 fsyncs/second).
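If you want a second data point beyond pveperf, fio can measure the same thing. A minimal sketch, assuming fio is installed and using the same mount point tested above (adjust the directory to your dataset):

root@:~# fio --name=synctest --directory=/mnt/pve/vms_data --rw=write --bs=4k --size=256m --fsync=1

With --fsync=1, fio issues an fsync after every 4k write, so the reported write IOPS should land in the same ballpark as pveperf’s FSYNCS/SECOND number.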
How do you fix it, you ask? Well, there are several ways.
The most correct way? Use enterprise SSDs for your VM store, OR add an SLOG device backed by an enterprise SSD. Even SATA ones will smoke that NVMe drive.
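If you go the SLOG route, attaching it is a one-liner. A sketch, assuming a pool named tank2 as in my examples; the by-id path is a placeholder for your actual device:

zpool add tank2 log /dev/disk/by-id/ata-YOUR_ENTERPRISE_SSD

A log device can also be removed later with zpool remove if you change your mind, so it’s a low-risk experiment.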
The cheapo poor man’s way? Disable sync on the dataset that hosts your VMs, like the following. Replace tank2/vm with your ZFS dataset path (pool_name/path-to-dataset/zvol):
zfs set sync=disabled tank2/vm
Note: Disabling sync is NOT recommended because it could lead to some data loss. Whether or not the risk is worth it is up to you.
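If you do try it, checking the current value and reverting later is easy; same placeholder dataset path as above:

zfs get sync tank2/vm
zfs set sync=standard tank2/vm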
I added a SanDisk X300DC SSD, but the result is almost the same.
I believe it’s an enterprise SSD with under 20k power-on hours and under 50k LBAs.
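For what it’s worth, those numbers can be confirmed with smartctl (assuming smartmontools is installed; /dev/sda matches the pveperf output below):

root@:~# smartctl -a /dev/sda

Power_On_Hours and Total_LBAs_Written are the relevant attributes, though attribute names vary by vendor.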
Do you have recommendations for an SSD?
root@:~# pveperf /mnt/pve/testpve/
CPU BOGOMIPS: 147994.00
REGEX/SECOND: 4825500
HD SIZE: 879.14 GB (/dev/sda1)
BUFFERED READS: 511.76 MB/sec
AVERAGE SEEK TIME: 0.03 ms
FSYNCS/SECOND: 631.29
DNS EXT: 40.24 ms
DNS INT: 38.15 ms (*.com)
I use any of the Intel S3500/S3600/S3700 series. Obviously, the higher the number, the better (faster). Just pick one that fits your budget. You can probably get them used for cheaper, and they’ll likely still have a good amount of service life left in them.
Thanks for the recommendation.
I found a 960 GB Samsung NVMe PM983 MZ1LB960HAJQ on eBay.
Performance:
SSD endurance: 1366 TBW
Internal data rate: 3000 MB/s (read) / 1100 MB/s (write)
Maximum 4KB random write: 38,000 IOPS
Maximum 4KB random read: 400,000 IOPS
MTBF: 2,000,000 hours
I don’t have personal experience with the 983, but I’ve had good experience with the 863. I’d say since the 983 seems like a higher model number, it’s likely better, so it’s probably OK.