Unexpectedly large downside, small upside to bs=1M?

I’m in the process of setting up TrueNAS for multipurpose use. The majority of the content will be media, though I’ll also host SMB shares (with various/unknown file sizes), several jails, apps, and potentially a VM on the same system. I know SSDs are preferred for the higher IOPS that apps and VMs need, but I’m already way over budget and will have to make do until I can build a separate SSD pool later. As a stop-gap, I got three SSDs to use for metadata, setting up a “fusion” pool to try to increase IOPS for smaller files.

CPU: Xeon E-2334
OS: NVMe
RAM: 64GB
Pool:

  • Data: (5) Exos 20TB in RaidZ2 (53.49 TiB)
  • Metadata: (3) Samsung 870 EVO in a 3-way mirror

I’ve read online that VMs, Apps, and databases should have bs=128k, while SMB and Media should have bs=1M, which will improve performance for large files at a slight cost to space (which can be partially negated through file compression). Since I’m setting up a fusion pool with special metadata, my expectation was that I would configure various datasets as follows:

| Use case | Block Size | Special Metadata Small Block Size |
|---|---|---|
| VMs, Apps | bs=128k | bs=64k |
| SMB, Media | bs=1M | bs=512k |
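
For reference, a rough sketch of how those datasets might be created. The pool and dataset names below are placeholders; “block size” maps to the ZFS recordsize property, and the special-metadata cutoff is special_small_blocks, which routes blocks at or below that size to the special (SSD) vdev:

# Placeholder pool/dataset names -- adjust to your layout
# recordsize = "block size"; special_small_blocks = cutoff for what lands on the special vdev
zfs create -o recordsize=128K -o special_small_blocks=64K tank/apps
zfs create -o recordsize=1M -o special_small_blocks=512K tank/media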

Given that this system will have both large and small files, I wanted to use fio to compare performance across the different configurations. I ran four fio tests (random reads and writes at both bs=128k and bs=1M) against each dataset configuration:

# Test random reads of smaller files (bs=128k)
fio --name=random-read128k --ioengine=posixaio --rw=randread --bs=128k --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based --end_fsync=1
# Test random reads of larger files (bs=1M)
fio --name=random-read1M --ioengine=posixaio --rw=randread --bs=1M --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based --end_fsync=1
# Test random writes of smaller files (bs=128k)
fio --name=random-write128 --ioengine=posixaio --rw=randwrite --bs=128k --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based --end_fsync=1
# Test random writes of larger files (bs=1M)
fio --name=random-write1M --ioengine=posixaio --rw=randwrite --bs=1M --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based --end_fsync=1

While I did see small performance improvements between bs=128k and bs=1M, I was surprised by the huge performance decrease for small files with bs=1M. Results are below, with best speeds in bold and second best italicized.

| Dataset Block Size | Metadata | Special Small Block Size | fio randread bs=128k | fio randread bs=1M | fio randwrite bs=128k | fio randwrite bs=1M |
|---|---|---|---|---|---|---|
| 128k | None | N/A | 6085MiB/s | 5992MiB/s | **602MiB/s** | 558MiB/s |
| 128k | Yes | 0k | *8794MiB/s* | 8349MiB/s | *575MiB/s* | 557MiB/s |
| 128k | Yes | 64k | **9791MiB/s** | 8347MiB/s | 565MiB/s | 573MiB/s |
| 1M | None | N/A | 1768MiB/s | *9025MiB/s* | 253MiB/s | 633MiB/s |
| 1M | Yes | 128k | 1973MiB/s | 8876MiB/s | 291MiB/s | **659MiB/s** |
| 1M | Yes | 512k | 1994MiB/s | **9050MiB/s** | 249MiB/s | *653MiB/s* |

It’s possible that I need to tweak my tests to get a more accurate picture of how the different pools will perform with different file sizes. But it looks to me like I’d be better off setting all datasets to bs=128k with special metadata at 64k. That’s a minor performance drop for large files, but a significant improvement for smaller ones. (The media dataset is mostly video, but will also have some small files: photos, video metadata, etc.)
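
One thing that makes me less worried about getting this wrong up front (my understanding of ZFS behaviour, worth double-checking): recordsize can be changed on a dataset at any time, but it only applies to blocks written after the change, so existing files keep their current layout until rewritten. Hypothetical dataset name below:

# Hypothetical dataset name; only newly written data picks up the new recordsize
zfs set recordsize=128K tank/media
zfs get recordsize,special_small_blocks tank/media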

Does anyone see anything I’ve done wrong (bad fio tests, or misreading data)? Any other suggestions?

Thanks for your help!
~Dean

Media would normally be read sequentially.

If you’re reading 128KB from a random offset on a dataset with a 1MB block size, ZFS has to read a full, random 1MB record just to extract that 128KB (or two 1MB records, if the read straddles a record boundary).

Likewise, an unaligned random 1MB read actually has to read 2MB to get the first and second halves.

This is going to slow down your overall read speed since you’ll be reading much more from disk.

So, if your use case is randomly reading frames from your media, then yeah, maybe don’t use 1MB block sizes.

Alternatively, try some streaming copies to benchmark your media speed.
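
If it helps, a rough sketch of sequential equivalents of the tests above (same size/runtime flags as the original random runs; adjust to taste):

# Sequential read at 1M block size, mirroring the random-test parameters
fio --name=seq-read1M --ioengine=posixaio --rw=read --bs=1M --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based
# Sequential write at 1M block size
fio --name=seq-write1M --ioengine=posixaio --rw=write --bs=1M --size=16g --numjobs=4 --iodepth=8 --runtime=60 --time_based --end_fsync=1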


Thanks @Stux - that’s exactly what I needed! I re-ran the tests with sequential reads/writes for both 128k/64k and 1M/512k and saw the speed improvements I was expecting!

It’s good to feel more confident in the dataset configuration(s) before I start transferring data!