This is one reason why a 16M recordsize (or anything above 4M) should probably be avoided. It’s still unclear if this poses an issue for modern versions of Linux and/or FreeBSD.
But I suspect the reason someone might see diminishing returns (or even degraded performance) with recordsizes above 4M is simpler: parallel processing.
As far as I understand, in ZFS, computing checksums and (de)compression is a single-threaded, per-block operation. However, ZFS can process multiple blocks in parallel, which effectively gives you the same performance benefit as outright multithreading.
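To make that concrete, here's a rough Python sketch (nothing ZFS-specific; sha256 and a thread pool just stand in for whatever checksum your pool uses and for ZFS's own block pipeline): each checksum call is single-threaded, but independent blocks can be handed to multiple workers at once.

```python
# Illustrative only -- user-space Python, not ZFS internals.
import hashlib
from concurrent.futures import ThreadPoolExecutor

RECORDSIZE = 1 * 1024 * 1024      # pretend the recordsize is 1M
data = bytes(16 * 1024 * 1024)    # a 16 MiB "file" of zeros

# Split the "file" into record-sized blocks.
blocks = [data[i:i + RECORDSIZE] for i in range(0, len(data), RECORDSIZE)]

# Each checksum call is single-threaded, but the blocks are independent,
# so they can be dispatched to a pool of workers. (CPython's hashlib
# releases the GIL for large buffers, so these really do run on
# multiple cores.)
with ThreadPoolExecutor() as pool:
    checksums = list(pool.map(lambda b: hashlib.sha256(b).hexdigest(), blocks))

print(f"{len(blocks)} blocks -> {len(checksums)} checksum operations")
```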
Compare these scenarios.
Scenario A, 1M recordsize:
To read or write a 16 MiB file, you have 16 checksum operations.
Scenario B, 2M recordsize:
To read or write a 16 MiB file, you have 8 checksum operations.
Scenario C, 4M recordsize:
To read or write a 16 MiB file, you have 4 checksum operations.
Scenario D, 16M recordsize:
To read or write a 16 MiB file, you have 1 checksum operation.
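For what it's worth, the scenario math above is just file size divided by recordsize; here's a throwaway snippet if you want to try other sizes:

```python
# Blocks (and thus per-block checksum operations) in a 16 MiB file.
FILE_SIZE = 16 * 1024 * 1024
for label in ("1M", "2M", "4M", "16M"):
    recordsize = int(label[:-1]) * 1024 * 1024
    print(f"{label} recordsize -> {FILE_SIZE // recordsize} checksum operations")
```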
The “sweet spot” for modern multi-core (and/or “hyperthreaded”) CPUs may indeed fall somewhere between a 1M and 4M recordsize, so that each file yields enough parallel per-block operations to use the CPU efficiently.
*The above would also apply to (de)compression and encryption.
If you have a 16 MiB file, it might sound “better” for it to be a single 16 MiB block, and hence you set the recordsize to 16M. However, it’s likely better to have it split into 4 or 8 blocks, so that your CPU’s multiple cores (and hyperthreading) can work on the checksums, compression, and/or encryption for several blocks at once.
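If you want to see the effect for yourself, here's a rough comparison in plain Python: one 16 MiB checksum versus four 4 MiB checksums run concurrently. This is user-space hashlib, not ZFS's actual pipeline, so the absolute numbers mean nothing; the only point is that the split workload can use more than one core.

```python
# Rough timing sketch -- illustrative, not a benchmark of ZFS itself.
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

data = bytes(16 * 1024 * 1024)  # 16 MiB of zeros stands in for a file

def checksum(buf: bytes) -> str:
    return hashlib.sha256(buf).hexdigest()

# One big block (the hypothetical 16M recordsize): strictly serial.
t0 = time.perf_counter()
checksum(data)
single = time.perf_counter() - t0

# Four 4 MiB blocks (4M recordsize): each checksum is still
# single-threaded, but the four blocks can run on separate cores.
CHUNK = 4 * 1024 * 1024
quarters = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(checksum, quarters))
parallel = time.perf_counter() - t0

print(f"1 x 16 MiB (serial):   {single * 1000:.1f} ms")
print(f"4 x  4 MiB (parallel): {parallel * 1000:.1f} ms")
```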
Someone like @HoneyBadger is going to jump in here and tell me how wrong I am. I can take it. I’m not insecure…