Happy consumer NVMe upgrade

This is just a happy report to say I went with two NVMe drives in a mirror (for VMs/apps) and everything just worked on default settings. Replicated the zvol, updated the VM, turned it on… very easy, even with encryption.
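If you're doing the replication step from the shell rather than the UI, it amounts to something like the below. This is only a rough sketch: the pool and dataset names are placeholders rather than my actual layout, and -w (raw send) is what lets the encrypted blocks transfer as-is.

# snapshot the VM's zvol on the old pool (placeholder names)
zfs snapshot tank/vms/myvm-disk0@migrate

# raw-send it to the new NVMe mirror so encryption is preserved
zfs send -w tank/vms/myvm-disk0@migrate | zfs recv nvme/vms/myvm-disk0

After that it's just a matter of pointing the VM's disk device at the new zvol and booting it.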

I was in the market for faster VM/app response. NVMe and SATA SSD pricing is now roughly on par, and I didn’t see any point in spending thousands on a faster pool of spinning rust.

** I’m not an expert, so the following may not be the most representative tests. These were run from the console (host shell), not inside the VMs. **

NVMe 1 x MIRROR | 2 wide

1MB write

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
random-write: Laying out IO file (1 file / 16384MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=2172MiB/s][w=2172 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=2021800: Thu Jun 13 12:48:20 2024
write: IOPS=2107, BW=2107MiB/s (2209MB/s)(124GiB/60346msec); 0 zone resets
slat (usec): min=11, max=1241, avg=43.96, stdev=17.99
clat (nsec): min=783, max=55954k, avg=424456.14, stdev=194368.81
lat (usec): min=326, max=56012, avg=468.41, stdev=196.35
clat percentiles (usec):
| 1.00th=[ 343], 5.00th=[ 359], 10.00th=[ 371], 20.00th=[ 383],
| 30.00th=[ 392], 40.00th=[ 404], 50.00th=[ 412], 60.00th=[ 424],
| 70.00th=[ 437], 80.00th=[ 453], 90.00th=[ 490], 95.00th=[ 523],
| 99.00th=[ 635], 99.50th=[ 693], 99.90th=[ 1319], 99.95th=[ 1582],
| 99.99th=[ 2278]
bw ( MiB/s): min= 1332, max= 2264, per=100.00%, avg=2120.34, stdev=104.83, samples=119
iops : min= 1332, max= 2264, avg=2120.34, stdev=104.83, samples=119
lat (nsec) : 1000=0.01%
lat (usec) : 2=0.01%, 250=0.01%, 500=92.04%, 750=7.79%, 1000=0.04%
lat (msec) : 2=0.10%, 4=0.01%, 10=0.01%, 50=0.01%, 100=0.01%
cpu : usr=11.07%, sys=1.60%, ctx=128141, majf=3, minf=2967
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,127154,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=2107MiB/s (2209MB/s), 2107MiB/s-2107MiB/s (2209MB/s-2209MB/s), io=124GiB (133GB), run=60346-60346msec

128k write

fio --bs=128k --direct=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=write --numjobs=8 --ramp_time=5 --runtime=30 --rw=write --size=10G --time_based
write: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32


4k write

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=118MiB/s][w=30.2k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=2974296: Thu Jun 13 21:23:43 2024
write: IOPS=29.1k, BW=114MiB/s (119MB/s)(6826MiB/60017msec); 0 zone resets
slat (nsec): min=1209, max=1374.8k, avg=2701.52, stdev=3603.58
clat (nsec): min=327, max=11684k, avg=30663.31, stdev=27824.84
lat (usec): min=16, max=11687, avg=33.36, stdev=28.46
clat percentiles (usec):
| 1.00th=[ 17], 5.00th=[ 20], 10.00th=[ 21], 20.00th=[ 21],
| 30.00th=[ 23], 40.00th=[ 25], 50.00th=[ 26], 60.00th=[ 28],
| 70.00th=[ 32], 80.00th=[ 37], 90.00th=[ 43], 95.00th=[ 57],
| 99.00th=[ 101], 99.50th=[ 123], 99.90th=[ 206], 99.95th=[ 285],
| 99.99th=[ 799]
bw ( KiB/s): min=55392, max=141464, per=100.00%, avg=116487.60, stdev=16070.24, samples=119
iops : min=13848, max=35366, avg=29121.93, stdev=4017.55, samples=119
lat (nsec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 20=9.78%, 50=83.81%, 100=5.37%
lat (usec) : 250=0.97%, 500=0.04%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=14.77%, sys=11.04%, ctx=1748912, majf=0, minf=767
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,1747419,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

In comparison to my 3 x MIRROR | 2 wide spinning rust:

128k write

fio --bs=128k --direct=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=write --numjobs=8 --ramp_time=5 --runtime=30 --rw=write --size=10G --time_based
write: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
fio-3.33
Starting 8 processes
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
write: Laying out IO file (1 file / 10240MiB)
Jobs: 8 (f=8): [W(8)][7.1%][w=193MiB/s][w=1545 IOPS][eta 07m:51s]
write: (groupid=0, jobs=8): err= 0: pid=2953199: Thu Jun 13 21:12:11 2024
write: IOPS=1342, BW=169MiB/s (177MB/s)(5093MiB/30158msec); 0 zone resets
slat (nsec): min=1420, max=340041, avg=9574.50, stdev=5100.27
clat (msec): min=115, max=313, avg=189.76, stdev=17.75
lat (msec): min=115, max=313, avg=189.77, stdev=17.75
clat percentiles (msec):
| 1.00th=[ 161], 5.00th=[ 165], 10.00th=[ 167], 20.00th=[ 171],
| 30.00th=[ 176], 40.00th=[ 186], 50.00th=[ 192], 60.00th=[ 199],
| 70.00th=[ 203], 80.00th=[ 207], 90.00th=[ 209], 95.00th=[ 215],
| 99.00th=[ 232], 99.50th=[ 234], 99.90th=[ 243], 99.95th=[ 249],
| 99.99th=[ 305]
bw ( KiB/s): min=131360, max=205621, per=99.93%, avg=172805.67, stdev=2663.41, samples=480
iops : min= 1026, max= 1606, avg=1349.83, stdev=20.81, samples=480
lat (msec) : 250=100.48%, 500=0.04%
cpu : usr=0.28%, sys=0.04%, ctx=10225, majf=0, minf=1760
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=2.3%, 16=72.7%, 32=25.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=93.7%, 8=3.4%, 16=2.2%, 32=0.6%, 64=0.0%, >=64=0.0%
issued rwts: total=0,40497,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
WRITE: bw=169MiB/s (177MB/s), 169MiB/s-169MiB/s (177MB/s-177MB/s), io=5093MiB (5341MB), run=30158-30158msec

4k write

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=2970410: Thu Jun 13 21:21:43 2024
write: IOPS=11.3k, BW=44.2MiB/s (46.4MB/s)(2858MiB/64624msec); 0 zone resets
slat (nsec): min=1511, max=1243.7k, avg=3526.70, stdev=3967.72
clat (nsec): min=558, max=31869k, avg=77175.20, stdev=82787.54
lat (usec): min=18, max=31872, avg=80.70, stdev=83.09
clat percentiles (usec):
| 1.00th=[ 25], 5.00th=[ 30], 10.00th=[ 32], 20.00th=[ 35],
| 30.00th=[ 38], 40.00th=[ 41], 50.00th=[ 46], 60.00th=[ 62],
| 70.00th=[ 102], 80.00th=[ 127], 90.00th=[ 159], 95.00th=[ 176],
| 99.00th=[ 237], 99.50th=[ 277], 99.90th=[ 545], 99.95th=[ 717],
| 99.99th=[ 1319]
bw ( KiB/s): min=17440, max=90616, per=100.00%, avg=48880.01, stdev=18046.76, samples=119
iops : min= 4360, max=22654, avg=12220.00, stdev=4511.69, samples=119
lat (nsec) : 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 20=0.09%, 50=54.58%, 100=14.42%
lat (usec) : 250=30.06%, 500=0.72%, 750=0.08%, 1000=0.02%
lat (msec) : 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=7.50%, sys=5.46%, ctx=733113, majf=0, minf=572
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,731549,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=44.2MiB/s (46.4MB/s), 44.2MiB/s-44.2MiB/s (46.4MB/s-46.4MB/s), io=2858MiB (2996MB), run=64624-64624msec

And then a regular 2 x RAIDZ2 | 6 wide spinning rust:

4k write

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=5868KiB/s][w=1467 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=2981435: Thu Jun 13 21:27:37 2024
write: IOPS=4870, BW=19.0MiB/s (19.9MB/s)(1170MiB/61498msec); 0 zone resets
slat (nsec): min=1300, max=1915.9k, avg=4056.40, stdev=5852.87
clat (nsec): min=645, max=25377k, avg=194289.09, stdev=246153.30
lat (usec): min=16, max=25386, avg=198.35, stdev=246.88
clat percentiles (usec):
| 1.00th=[ 21], 5.00th=[ 24], 10.00th=[ 26], 20.00th=[ 31],
| 30.00th=[ 38], 40.00th=[ 131], 50.00th=[ 151], 60.00th=[ 176],
| 70.00th=[ 243], 80.00th=[ 334], 90.00th=[ 412], 95.00th=[ 469],
| 99.00th=[ 1237], 99.50th=[ 1909], 99.90th=[ 2376], 99.95th=[ 2507],
| 99.99th=[ 3294]
bw ( KiB/s): min= 7288, max=34976, per=100.00%, avg=19936.61, stdev=7481.82, samples=119
iops : min= 1822, max= 8744, avg=4984.15, stdev=1870.45, samples=119
lat (nsec) : 750=0.01%, 1000=0.01%
lat (usec) : 2=0.01%, 4=0.01%, 20=0.67%, 50=36.69%, 100=1.80%
lat (usec) : 250=31.62%, 500=25.80%, 750=2.27%, 1000=0.08%
lat (msec) : 2=0.65%, 4=0.40%, 10=0.01%, 20=0.01%, 50=0.01%
cpu : usr=4.10%, sys=3.08%, ctx=300162, majf=0, minf=548
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,299530,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=19.0MiB/s (19.9MB/s), 19.0MiB/s-19.0MiB/s (19.9MB/s-19.9MB/s), io=1170MiB (1227MB), run=61498-61498msec

For PCIe 3.0 and Samsung 980s, I expected speeds well below the rated maximums, given the many negative reports about ZFS on NVMe, but I was pleasantly surprised. It’s fast enough for our usage.

I know the NVMe drives are consumer grade. A five-year warranty, Amazon next-business-day delivery (for a spare) and replication to spinning rust (on top of normal backups) will help with that. Worst case, I point the VM back at the spinning rust. If I need more space or IOPS, I can still add a few more NVMe mirrors; if I just want more storage, I can switch to larger NVMe drives or move to bifurcation.
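Adding another mirror vdev later is a one-liner from the shell; roughly like the below (the pool name and device names are placeholders, and the UI does the same thing):

# stripe another two-way NVMe mirror into the existing pool
zpool add nvme mirror /dev/disk/by-id/nvme-DRIVE3 /dev/disk/by-id/nvme-DRIVE4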

With 384GB of RAM (so most reads are already served from ARC), it’s arguable whether there’s much practical benefit, but it does make full reboots faster :slight_smile:
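One quick way to see how much the ARC is already absorbing (on SCALE) is something like:

# rough check of how often reads come from RAM rather than disk
arc_summary | grep -i "hit ratio"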

So if you’re on the fence about NVMe for your apps/VMs, make sure you understand your hardware risk tolerance and do your NVMe research (google an SSD comparison/tester site), i.e. don’t just buy the cheapest NVMe drives, which are usually the slowest. Plan for component failure just like you would with rust. If it fits your purpose, go for it.
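Planning for failure looks the same as it does with rust: a dead drive in the mirror gets swapped out with something like the below (placeholder pool and device names again, and the UI offers the same replace flow):

# see which disk faulted
zpool status nvme

# resilver the cold spare in place of the failed drive
zpool replace nvme /dev/disk/by-id/nvme-FAILED /dev/disk/by-id/nvme-SPARE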

Warranty is important to me. In Australia, the only two manufacturers with local warranty support (which generally means they’re required by law to cover shipping on warranty claims) are Samsung and Lexar. At the 1TB mark (current pricing), it’s not worth paying international shipping on a warranty claim with any overseas manufacturer. So it’s either pay the Samsung tax (~1.5x) for a known-good warranty process, be a guinea pig for Lexar’s warranty, or don’t expect any warranty beyond 90 days. I’m happy to pay the Samsung tax for a proper 5-year warranty. For those in the States, you have plenty of other manufacturer options with domestic warranty.

Many thanks to those who paved my NAS journey with so much helpful information on the forums, and of course to iX for their ongoing development.


Can you provide the model number of the PCIe to M.2 interface?

I believe they only have one model of PCIe/M.2 adapter, so if you search Amazon in your region the same one should come up; otherwise they don’t have a presence there.

Direct AU Amazon link

Or do you want the exact model printed on the board? Or something I can grab from the console?
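If console output helps, note that a passive PCIe-to-M.2 adapter usually has no controller of its own, so only the drives show up; for example:

# the NVMe drives appear as PCI devices; the passive riser itself won't
lspci | grep -i "non-volatile"

# or with nvme-cli, model/firmware per drive
nvme list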