Hi all. I’m building a new server to replace my trusty Dell T620.
New server is based on ElectricEel-24.10-RC.1:
- Supermicro X11SPW-TF LGA 3647 Socket P Motherboard
- Intel Xeon Gold 6230 2.1GHz 27.5MB 20-Core 125W
- 4 x 64GB 4DRX4 PC4-2666V LRDIMM Samsung
- 2 x Intel Optane 128GB DDR4 PC4-2666 288p DCPMM Persistent Memory NMA1XXD128GPS
- LSI SAS9300-8i HBA
- LSI SAS9300-8e HBA
Am I wrong in thinking that Intel DCPMM should give a SLOG near-RAM write speeds?
I have a test pool made up of 7 x 2-way SAS SSD mirrors:
root@truenas[/mnt/flash]# zpool status flash
  pool: flash
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        flash                                     ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            dfb03ef0-460e-487c-947e-b98912d2dfab  ONLINE       0     0     0
            fdd03cb4-1337-4c7c-8aff-5069c8a556a0  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            565b77a8-7225-4213-9591-07cd7e5ca576  ONLINE       0     0     0
            8342d232-3534-4118-9b6c-6040ab3f979b  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            fd50dd19-1339-43b9-bc58-e6cbc66fcdac  ONLINE       0     0     0
            6860f72f-5257-4b06-bdc3-b6455f1d5366  ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            99cfae17-c447-4f08-832d-687e508ad740  ONLINE       0     0     0
            5a08d706-36cc-402f-9ea2-c65c14ee5139  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            34aaef9a-f31d-446a-ae67-a55e1beabef9  ONLINE       0     0     0
            b12d1d67-da9b-4258-a293-ef6104940927  ONLINE       0     0     0
          mirror-5                                ONLINE       0     0     0
            b7cd53dc-4453-4dfe-83ea-0e0249734e1c  ONLINE       0     0     0
            7536085a-7243-4f43-902a-58406403d872  ONLINE       0     0     0
          mirror-6                                ONLINE       0     0     0
            abf484d8-d5d2-4ba6-97d7-26aa99589e3d  ONLINE       0     0     0
            f999e01f-1a6f-4b3e-b49d-8af9cfe142a1  ONLINE       0     0     0
        logs
          mirror-10                               ONLINE       0     0     0
            88103c20-9249-446e-8d64-2857d761bf3c  ONLINE       0     0     0
            6c532345-1d3f-4926-8ab2-8d4de414a52b  ONLINE       0     0     0
        spares
          b5b1ab22-b4c0-4b84-ba43-d41aed0c35e3    AVAIL
Let’s get a baseline of the pool with sync off:
root@truenas[/mnt/flash/test]# fio --name=write --rw=write -direct=1 --ioengine=libaio --bs=4k --numjobs=16 --size=32G --runtime=600 --group_reporting
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 16 processes
Jobs: 5 (f=2): [_(6),W(1),f(1),_(3),f(1),W(1),_(1),f(1),_(1)][100.0%][w=3844MiB/s][w=984k IOPS][eta 00m:00s]
write: (groupid=0, jobs=16): err= 0: pid=13929: Fri Oct 4 05:42:19 2024
write: IOPS=1088k, BW=4250MiB/s (4456MB/s)(512GiB/123369msec); 0 zone resets
slat (usec): min=2, max=29640, avg=13.42, stdev=43.49
clat (nsec): min=330, max=21491k, avg=738.77, stdev=6810.97
lat (usec): min=2, max=29645, avg=14.16, stdev=44.14
clat percentiles (nsec):
| 1.00th=[ 394], 5.00th=[ 426], 10.00th=[ 454], 20.00th=[ 470],
| 30.00th=[ 490], 40.00th=[ 532], 50.00th=[ 708], 60.00th=[ 836],
| 70.00th=[ 908], 80.00th=[ 980], 90.00th=[ 1080], 95.00th=[ 1144],
| 99.00th=[ 1304], 99.50th=[ 1384], 99.90th=[ 1624], 99.95th=[ 3952],
| 99.99th=[13632]
bw ( MiB/s): min= 3473, max= 5333, per=100.00%, avg=4262.05, stdev=19.91, samples=3912
iops : min=889155, max=1365259, avg=1091084.89, stdev=5096.65, samples=3912
lat (nsec) : 500=34.49%, 750=18.05%, 1000=30.32%
lat (usec) : 2=17.07%, 4=0.02%, 10=0.03%, 20=0.02%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=7.89%, sys=89.95%, ctx=93267, majf=0, minf=179
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,134217728,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=4250MiB/s (4456MB/s), 4250MiB/s-4250MiB/s (4456MB/s-4456MB/s), io=512GiB (550GB), run=123369-123369msec
Now with sync on:
root@truenas[/mnt/flash/test]# fio --name=write --rw=write -direct=1 --ioengine=libaio --bs=4k --numjobs=16 --size=32G --runtime=600 --group_reporting
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 16 processes
Jobs: 16 (f=16): [W(16)][100.0%][w=200MiB/s][w=51.2k IOPS][eta 00m:00s]
write: (groupid=0, jobs=16): err= 0: pid=15747: Fri Oct 4 05:55:35 2024
write: IOPS=51.9k, BW=203MiB/s (213MB/s)(119GiB/600002msec); 0 zone resets
slat (usec): min=80, max=96997, avg=301.53, stdev=608.30
clat (nsec): min=776, max=40122k, avg=3883.38, stdev=8117.72
lat (usec): min=81, max=97007, avg=305.41, stdev=608.44
clat percentiles (nsec):
| 1.00th=[ 1880], 5.00th=[ 2992], 10.00th=[ 3056], 20.00th=[ 3184],
| 30.00th=[ 3312], 40.00th=[ 3536], 50.00th=[ 3792], 60.00th=[ 4016],
| 70.00th=[ 4256], 80.00th=[ 4512], 90.00th=[ 4768], 95.00th=[ 4960],
| 99.00th=[ 5856], 99.50th=[ 7200], 99.90th=[15936], 99.95th=[19840],
| 99.99th=[29568]
bw ( KiB/s): min=164940, max=303768, per=100.00%, avg=207883.21, stdev=781.43, samples=19184
iops : min=41235, max=75942, avg=51969.94, stdev=195.34, samples=19184
lat (nsec) : 1000=0.01%
lat (usec) : 2=1.10%, 4=58.00%, 10=40.63%, 20=0.22%, 50=0.05%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 50=0.01%
cpu : usr=2.29%, sys=29.88%, ctx=59251534, majf=0, minf=164
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,31167037,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=119GiB (128GB), run=600002-600002msec
Sync OFF 1088k IOPS vs Sync ON 51.9k IOPS
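Since these runs are queue depth 1, IOPS translates directly into average per-write latency (latency ≈ numjobs / IOPS). A quick sanity check of the numbers above:

```shell
# qd=1, 16 jobs: average per-write latency ~= numjobs / IOPS
awk 'BEGIN { printf "sync off: %.1f us\n", 16 / 1088000 * 1e6 }'   # ~14.7 us
awk 'BEGIN { printf "sync on:  %.1f us\n", 16 / 51900 * 1e6 }'     # ~308.3 us
```

That ~308 µs per sync write matches the slat average fio reports (301 µs), and seems far above what the PMEM media itself should need.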
Can anyone help me understand why there is such a performance drop even with PMEM as the SLOG? It has to be a config issue.
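In case it helps anyone reproduce this: per-vdev activity can be watched during the sync run to confirm the writes are actually landing on the log mirror rather than the data vdevs:

```shell
# Print per-vdev IOPS/bandwidth for the pool every second while fio runs.
# Under sync=always, nearly all write traffic should hit the log vdev.
zpool iostat -v flash 1
```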
PMEM set up as a 1x2 mirror, sync off:
root@truenas[/mnt/pmem/test]# fio --name=write --rw=write -direct=1 --ioengine=libaio --bs=4k --numjobs=16 --size=32G --runtime=600 --group_reporting
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 16 processes
Jobs: 5 (f=4): [_(2),W(2),_(3),f(1),W(1),_(1),W(1),_(5)][98.5%][w=2435MiB/s][w=623k IOPS][eta 00m:03s]
write: (groupid=0, jobs=16): err= 0: pid=19381: Fri Oct 4 06:04:15 2024
write: IOPS=687k, BW=2684MiB/s (2814MB/s)(512GiB/195331msec); 0 zone resets
slat (usec): min=2, max=75317, avg=21.84, stdev=114.52
clat (nsec): min=344, max=35528k, avg=765.58, stdev=9771.59
lat (usec): min=2, max=75361, avg=22.60, stdev=115.47
clat percentiles (nsec):
| 1.00th=[ 406], 5.00th=[ 438], 10.00th=[ 466], 20.00th=[ 486],
| 30.00th=[ 524], 40.00th=[ 548], 50.00th=[ 588], 60.00th=[ 732],
| 70.00th=[ 836], 80.00th=[ 908], 90.00th=[ 1004], 95.00th=[ 1096],
| 99.00th=[ 2736], 99.50th=[ 4256], 99.90th=[10432], 99.95th=[14784],
| 99.99th=[47360]
bw ( MiB/s): min= 1959, max= 5115, per=100.00%, avg=2701.48, stdev=23.47, samples=6173
iops : min=501516, max=1309535, avg=691576.89, stdev=6009.59, samples=6173
lat (nsec) : 500=24.44%, 750=36.69%, 1000=28.48%
lat (usec) : 2=8.88%, 4=0.95%, 10=0.44%, 20=0.08%, 50=0.02%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=5.06%, sys=89.37%, ctx=125523, majf=0, minf=181
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,134217728,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=2684MiB/s (2814MB/s), 2684MiB/s-2684MiB/s (2814MB/s-2814MB/s), io=512GiB (550GB), run=195331-195331msec
PMEM set up as a 2-wide stripe, sync off:
root@truenas[/mnt/pmem/test]# fio --name=write --rw=write -direct=1 --ioengine=libaio --bs=4k --numjobs=16 --size=32G --runtime=600 --group_reporting
write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 16 processes
Jobs: 6 (f=4): [f(1),_(2),W(1),_(2),f(1),W(2),_(5),W(1),_(1)][99.4%][w=3198MiB/s][w=819k IOPS][eta 00m:01s]
write: (groupid=0, jobs=16): err= 0: pid=21751: Fri Oct 4 06:10:32 2024
write: IOPS=861k, BW=3362MiB/s (3526MB/s)(512GiB/155931msec); 0 zone resets
slat (usec): min=2, max=68208, avg=17.25, stdev=139.49
clat (nsec): min=352, max=33229k, avg=742.20, stdev=9781.68
lat (usec): min=3, max=68219, avg=17.99, stdev=140.40
clat percentiles (nsec):
| 1.00th=[ 442], 5.00th=[ 478], 10.00th=[ 498], 20.00th=[ 524],
| 30.00th=[ 540], 40.00th=[ 556], 50.00th=[ 596], 60.00th=[ 716],
| 70.00th=[ 828], 80.00th=[ 900], 90.00th=[ 988], 95.00th=[ 1064],
| 99.00th=[ 1400], 99.50th=[ 2352], 99.90th=[ 7840], 99.95th=[12480],
| 99.99th=[36608]
bw ( MiB/s): min= 2702, max= 4681, per=100.00%, avg=3377.29, stdev=17.98, samples=4940
iops : min=691777, max=1198349, avg=864584.21, stdev=4603.01, samples=4940
lat (nsec) : 500=10.05%, 750=52.11%, 1000=28.93%
lat (usec) : 2=8.29%, 4=0.38%, 10=0.17%, 20=0.05%, 50=0.01%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=6.31%, sys=84.74%, ctx=125002, majf=0, minf=176
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,134217728,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=3362MiB/s (3526MB/s), 3362MiB/s-3362MiB/s (3526MB/s-3526MB/s), io=512GiB (550GB), run=155931-155931msec
It’s clear that the Intel PMEM modules can perform well. But they are only performing at a fraction of that capability as a SLOG, which makes me think the default tuning parameters are too conservative.
If I try to change any of the ZFS tunables, I get permission denied errors:
root@truenas[~]# echo 25 >> /sys/module/zfs/parameters/zfs_dirty_data_max_percent
zsh: permission denied: /sys/module/zfs/parameters/zfs_dirty_data_max_percent
Any help would be appreciated. How can I change these tunables in TrueNAS SCALE?
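For what it’s worth, it looks like SCALE expects tunables to be set through the middleware rather than written to sysfs directly. I think it’s something along these lines, though I haven’t verified the exact payload:

```shell
# Guess: create a ZFS-type tunable via the TrueNAS middleware, which should
# apply it to /sys/module/zfs/parameters. Payload shape is not verified.
midclt call tunable.create '{"type": "ZFS", "var": "zfs_dirty_data_max_percent", "value": "25"}'
```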
Thanks,
Simon