Striped SLOG has same IOPS as mirror or single drive

I’ve been trying to track down an issue where I don’t think I’m getting the expected performance when doing sync writes.

System:
HPE DL380 Gen9
2x Xeon E5-2680 v4
384GB DDR4 RAM
Pool drives: 21x 1.92TB Ultrastar SSD1600MR arranged as 3x RAIDZ2 vdevs (7 drives in each vdev)
SLOG drives: 2x Optane 900P 280GB
TrueNAS 13.0-U6.2

I have the Optane drives added as a striped SLOG and have been under the impression that this would give me higher IOPS than a single drive would. (I’m aware of the risks involved in having no redundancy on the SLOG. I have backups, and for my use case it’s a risk I’m prepared to take.)
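For clarity, the two layouts I’m comparing were added along these lines (the pool and device names here are placeholders, not my actual ones):

# striped SLOG: two separate top-level log vdevs, writes round-robin across them
zpool add tank log nvd0 nvd1

# mirrored SLOG: a single mirrored log vdev
zpool add tank log mirror nvd0 nvd1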

If I create a dataset with sync=always and run this fio benchmark:

fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=16 --runtime=30 --time_based

I get these results:

No SLOG: 19k IOPS, 76 MB/s
1x Optane SLOG: 30k IOPS, 117 MB/s
2x Optane mirror SLOG: 29k IOPS, 115 MB/s
2x Optane stripe SLOG: 30k IOPS, 118 MB/s
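For completeness, the sync=always test dataset was created with something like this (the pool/dataset name is a placeholder):

# new dataset with sync forced on for every write
zfs create -o sync=always tank/synctest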

I understand why the mirror performs the same as the single drive, but I don’t get why the stripe also has the same performance.

I decided to do some more testing and removed the Optane drives from the pool. I created some new test pools which contained only the Optane drives, ran the same fio test, and got these results:

1x Optane pool:
sync=disabled: 49k IOPS, 197 MB/s
sync=always: 29k IOPS, 117 MB/s
2x Optane mirror pool:
sync=disabled: 47k IOPS, 190 MB/s
sync=always: 28k IOPS, 113 MB/s
2x Optane stripe pool:
sync=disabled: 66k IOPS, 264 MB/s
sync=always: 29k IOPS, 116 MB/s
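The temporary test pools were created roughly like this (pool and device names are placeholders):

# single Optane
zpool create opt-single nvd0
# two Optanes mirrored
zpool create opt-mirror mirror nvd0 nvd1
# two Optanes striped
zpool create opt-stripe nvd0 nvd1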

I see the same result here, where if I’m doing sync writes, the stripe has identical performance to a mirror or a single disk. It does have higher performance when doing async writes, but not sync writes.

Is this expected behaviour? I’ve always been under the impression that a stripe significantly increases your IOPS. If this is expected, does it mean that a striped SLOG is totally pointless and there’s literally zero reason to do it?

I’m guessing your bottleneck is not actually the Optanes.

Try using a RAM disk as a SLOG.

(Note: for experiments, not production!)
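On TrueNAS CORE / FreeBSD you can set up a throwaway RAM-backed log device with something like this (size, pool name and md unit number are just examples):

# create a swap-backed memory disk; prints the device name, e.g. md0
mdconfig -a -t swap -s 8g
# add it to the pool as a log vdev
zpool add tank log md0
# cleanup when finished
zpool remove tank md0
mdconfig -d -u 0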


Likely unrelated: the posixaio engine produces asynchronous I/O using aio_read(3) and aio_write(3). Reason tells me you should test synchronous writes, although I don’t know if this matters with the dataset’s property sync=always.

Good spot. I’m not overly familiar with fio and copied that command from somewhere. I’ve run it again with the default psync engine, which gives different raw numbers but the same issue: performance-wise, stripe == mirror.
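For reference, the retest was essentially the original command with the engine swapped out, roughly like this (I dropped --iodepth since psync is a synchronous engine and only queues one I/O at a time):

fio --name=random-write --ioengine=psync --rw=randwrite --bs=4k --numjobs=1 --size=4g --runtime=30 --time_based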

Interesting. You’re right, I do have the same bottleneck with a RAM disk slog. How would I determine where the bottleneck is? I’m not maxing out the CPU during writes. Using the original posixaio fio command the CPU sits at around 70% on all cores during writes. With the psync fio engine that I retested it with it only gets to around 25-30%, so still heaps of headroom left.