In TrueNAS Tech Talk Episode 12, Fangtooth BETA1 is code-frozen and preparing for February, Chris Compares Compression, and Kris gets on his soapbox to talk about the world of open source licensing, BSD vs GPL, and sheds a little more light on why TrueNAS went even more open source than before.
Kris, you did a great job explaining the difference between the BSD and GPL licenses.
I also appreciate the honesty about the “churn” that is taking place with Scale. I’m still holding off for a bit, waiting for the dust to settle before I make the inevitable switch from Core to Scale, so please continue to provide updates on the progress.
What if it never does?
Oh, it’ll settle for sure. The “core” NAS functionality has been stable for some time now; it’s mostly Apps/VMs/LXC that’s been “churny”, and that will be settling down nicely with the release of FT. Any changes coming to those areas will be very targeted and incremental in future releases for a good long while.
I would like to add some nuance to what @HoneyBadger said about inline ZFS compression, which was a nice overview and really did shine a light on how LZ4 is taken for granted… being so fast that it’s used as a near zero-cost heuristic for ZSTD’s early abort!
1. Showing the different decompression speeds is only half the story. It’s quite possible that using ZSTD compression on a dataset with highly compressible files on spinning HDDs can increase performance. If smaller blocks need to be pulled from disk, you’re addressing the main bottleneck. (Having the CPU and RAM handle decompression is worlds faster than retrieving larger amounts of uncompressed data from the spinning HDDs.) This is possibly a rare use-case, but it illustrates that with HDDs, ZSTD can theoretically go beyond “space saving” benefits.
2. Even with a dataset that stores incompressible data (e.g., video files, an example given by @HoneyBadger), you’ll still want to enable at minimum ZLE or LZ4 compression[1]. This is especially important for datasets with large recordsizes. Any form of compression will squash the extraneous “padding” of null bytes in the last block of a file that does not fill the recordsize. (A rough illustration follows at the end of this post.)
3. For @HoneyBadger to say “Zstandard” is inefficient and time-consuming. When speaking, it’s better to say “ZSTD”. “Zstandard” is 9 letters long. “ZSTD” is only 4 letters.
Preferably LZ4, for the added benefit that you still might store compressible files on the same dataset at a later date. ↩︎
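Not ZFS itself, but here’s a rough illustration of the padding point from item 2 above, using the third-party lz4 Python package (assumed installed, purely for demonstration): a mostly incompressible “record” with a zero-filled tail compresses down to roughly the size of the real data.

```python
# Rough illustration (not ZFS): any compressor squashes trailing zero padding.
# Assumes the third-party "lz4" package is installed (pip install lz4).
import os
import lz4.frame

RECORDSIZE = 1024 * 1024                    # pretend recordsize=1M
payload = os.urandom(256 * 1024)            # 256K of incompressible "real" data
padding = bytes(RECORDSIZE - len(payload))  # zero-filled tail of the last record

record = payload + padding
compressed = lz4.frame.compress(record)

print(f"raw record: {len(record):>9,} bytes")
print(f"lz4 output: {len(compressed):>9,} bytes")
# Expected: the compressed record is only a hair larger than the 256K payload,
# because the 768K of null bytes compresses to almost nothing.
```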
I don’t know about compression, but this post could definitely stand to deduplicate its @ tags.
-
It’s really a matter of “test with your data” - but there’s a significant difference in decompression speed between LZ4 and ZSTD, reflected not just in bandwidth but in latency.
-
For a dataset that stores incompressible data (like the video files in question), ZSTD is significantly worse unless conditions are such that the early-abort functionality is properly triggering (zstd >= 3); otherwise it will just blindly try to compress the file to no effect and burn CPU cycles.
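A quick way to see that “burned CPU cycles” point outside of ZFS - just a sketch, assuming the third-party lz4 and zstandard Python packages are available - is to time how long each compressor spends on data that can’t shrink:

```python
# Quick illustration of wasted effort: compressing data that won't compress.
# Assumes the third-party "lz4" and "zstandard" packages are installed.
import os
import time
import lz4.frame
import zstandard as zstd

data = os.urandom(64 * 1024 * 1024)  # 64 MiB of incompressible bytes

def timed(label, compress_fn):
    start = time.perf_counter()
    out = compress_fn(data)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s, ratio {len(data) / len(out):.3f}")

timed("lz4   ", lz4.frame.compress)
timed("zstd-3", zstd.ZstdCompressor(level=3).compress)
timed("zstd-9", zstd.ZstdCompressor(level=9).compress)
# None of these shrink random data; the difference is how much CPU time each
# spends finding that out. OpenZFS's zstd early abort exists to skip exactly
# that wasted work.
```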
-
I actually said ZSTD; you just transparently decompressed it to “Zstandard”.
You could mirror what the C++ folks do and pronounce it “stid”.
Hehe.
Zed-stid.
To be fair, Facebook’s own benchmarks show that with decent multicore CPUs, ZSTD’s decompression speeds are faster than spinning HDD read speeds. This at least adds credence to “decompressing with CPU/RAM is faster than reading more data from HDDs”; see the rough sketch at the end of this post. (Maybe not for the higher ZSTD levels? The neat thing with ZSTD is that there’s less of a spread between its “levels” when it comes to decompression speeds, unlike with compression.)
However, I agree that unless your files compress substantially into smaller blocks, you won’t reap much benefit from ZSTD compression.
To add another spin, at least with ZSTD’s “early abort”, such incompressible files will not even be saved as compressed blocks on the storage media.
That’s why I named LZ4 as the best option to use, for two reasons: “null byte padding” and “you might store compressible files on the same dataset in the future”. I agree that ZSTD should be outright avoided for multimedia datasets.
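For what it’s worth, here’s the rough sketch mentioned above: a back-of-the-envelope comparison assuming the third-party zstandard Python package, single-threaded decompression, and a ballpark ~200 MB/s sequential read speed for a single spinning HDD (all assumptions on my part, not numbers from the podcast).

```python
# Back-of-the-envelope check (not a rigorous benchmark): is single-threaded
# ZSTD decompression faster than a typical spinning HDD's sequential reads?
# Assumes the third-party "zstandard" package is installed.
import time
import zstandard as zstd

HDD_READ_MBPS = 200  # assumed ballpark for a single 7200 RPM HDD

# Highly compressible sample data, roughly like text or logs.
data = b"timestamp=2025-01-01 level=INFO msg=hello world\n" * 2_000_000

compressed = zstd.ZstdCompressor(level=3).compress(data)

start = time.perf_counter()
out = zstd.ZstdDecompressor().decompress(compressed)
elapsed = time.perf_counter() - start

mbps = len(out) / elapsed / 1_000_000
print(f"decompressed {len(out) / 1e6:.0f} MB in {elapsed:.3f}s "
      f"(~{mbps:.0f} MB/s vs ~{HDD_READ_MBPS} MB/s HDD sequential read)")
```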
I.e., zed-stid?
I want to thank @HoneyBadger and @Stux for replying to me, @winnielinnie, on this important topic covered by @HoneyBadger on the T3 podcast hosted by @kris and @HoneyBadger.