New Pool and Data Compression Issues

While migrating data to a brand-new pool on the latest version of TrueNAS Electric Eel, I found that the data on the receiving end is not always written to the pool as well compressed as expected, despite the settings being configured appropriately.

The destination pool has been built under both TrueNAS CORE 13.0-U6.4 and Scale (ElectricEel-24.10.1), with the same results on both.
The destination system has a second-gen 32-core AMD EPYC and 2x raidz2 vdevs of 6 disks each.
The settings used to create the new dataset in the webGUI under Scale are:
Dataset Preset: SMB
Quotas are left at default.
Encryption: On (Inherit)
Sync: Inherit (Standard)
Compression Level: Inherit (ZSTD-19)
Enable Atime: Inherit (Off)
ZFS Dedupe: Inherit (Off)
Case Sensitivity: Insensitive
Checksum: Inherit (On)
Read-Only: Inherit (Off)
Exec: Inherit (On)
Snapshot Directory:
Snapdev: Inherit (Hidden)
Copies: 1
Record Size: Inherit (1MiB) – I have tried 1MiB, 4MiB, and 16MiB with the same results
ACL Mode: Restricted
Metadata (Special) Small Block Size: Inherit (0)

The origin pool was originally built on TrueNAS 11.x or early 12.x with settings similar to those above. It is currently a 4-disk raidz1 on ElectricEel-24.10.1, and its ZFS feature flags have not been upgraded since before the migration to Scale. The method of transfer is a TeraCopy file copy on Windows over SMB via a 100Gb Ethernet link. One of the hopes of doing it this way was to obtain better overall compression for all of the data. I am open to using other tools (incl. zfs send/recv) to achieve the desired result.
The data in question is a mix of audio, video, text files, office docs, executables, archives, etc… Data on the origin pool manages to achieve better compression than on the new pool with the same settings in the webGUI, and the exact same data is often larger on the destination pool than on the origin pool.
An example directory of 24 files totaling 9.41GB is written to the origin pool as 9.13GB and to the destination pool as 9.40GB.
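
For reference, here is roughly how I have been comparing the two sides from the shell (dataset and directory names below are generic stand-ins for my actual ones):

# Compression setting and achieved ratio on each dataset:
zfs get compression,compressratio originpool/dataset
zfs get compression,compressratio newpool/dataset

# Logical size vs. space actually allocated for the sample directory (GNU du on Scale):
du -sh --apparent-size /mnt/newpool/dataset/sample-dir
du -sh /mnt/newpool/dataset/sample-dir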

In an effort to mitigate this issue, several settings have been applied, to no avail:
Under “Init/Shutdown Scripts” > Pre-Init Command
echo 0 > /sys/module/zfs/parameters/zstd_earlyabort_pass
and/or
echo 1048576 > /sys/module/zfs/parameters/zstd_abort_size
or
echo 0 > /sys/module/zfs/parameters/zstd_abort_size

CPU usage would ramp up, but the data would still be written with, at best, minimal space savings, and sometimes worse.
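
For what it is worth, the values currently in effect can be checked as below; removing the pre-init commands and rebooting restores the module defaults:

cat /sys/module/zfs/parameters/zstd_earlyabort_pass
cat /sys/module/zfs/parameters/zstd_abort_size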

  1. What is your evidence that compression is not being applied?
  2. What made you choose ZSTD-19 compression over the default LZ4?
  3. What makes you think that the majority of your media files are compressible?
  4. Changing ZFS tunables is IMO not a good idea unless you are an expert and understand exactly what you are doing, along with the consequences and side effects.
  5. IMO you don’t want 1MB or larger record sizes for smaller files.
  6. I don’t think the method of sending the data to the pool should make a difference to compression - SMB from Explorer or TeraCopy shouldn’t matter. ZFS replication works at a block level, though, so I am not sure whether source compression overrides destination compression.

Most of that is not really compressible, and what’s compressible likely represents a small part of the total.
It is to be expected that the same data will take up more space on raidz2 than raidz1, due to padding for the higher parity level.
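
As a rough worked example of that padding, assuming 4KiB sectors (ashift=12): raidz pads every allocation up to a multiple of (parity + 1) sectors. Take a record that compresses down to 56KiB, i.e. 14 sectors. On the 4-wide raidz1 origin that is 14 data + 5 parity = 19 sectors, padded to 20; on a 6-wide raidz2 destination vdev it is 14 data + 8 parity = 22 sectors, padded to 24. The same logical data costs roughly 20% more raw space, and odd-sized compressed records run into this rounding far more often than full 1MiB records do.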

With ZSTD-19? You bet!!!


Don’t go around fiddling with module parameters without good reason.

This is why your CPU usage spiked up.

@Protopia

  1. Hashes match and the whole nine yards…
    1a. Origin File: [screenshot]
    1b. Destination File: [screenshot]
  2. Space savings are much more important than speed. Ironically, while both datasets are set to ZSTD-19 and use a 1MiB record size, the origin dataset gets better compression and faster writes.
  3. See the images below:
    [screenshots: origin dataset MP4, FLAC, and MKV compression results]

Data written to the new DS is either larger than the original or nowhere close compression-wise:
[screenshots: destination dataset MP4, FLAC, and MKV compression results]

  1. Ideally, better compression is the goal, without any of the “smart ZSTD compression” (early abort) behavior introduced in ZFS 2.2.
  2. I’ll be on the lookout for that.
  3. There are options to modify zfs send/recv behavior with regard to how data is handled in transit – something I am still investigating. My concerns about compression still apply once the data is written to disk.

That is helpful. The above images better illustrate my issue.

I am aware, and this is intended. I am mentioning it because the CPU legwork is being done but the fruits of those cycles are not being written to disk.

How would one force compression to occur? In previous versions of TrueNAS (pre-Scale, as far as I can recall), this was as simple as configuring the pool/dataset option from the webGUI.

I am fine with the increased utilization – my issue is that files I would like to be compressed are taking more space on the new pool than they did on the original.

Indeed. Typical Z2 vs. Z1 (Z3 would be even worse). With these poorly compressible files, padding actually tilts you the wrong way…

Compression IS taking place, but it’s inefficient here (except maybe on some large office documents or executables). If you have a conveniently small sample dataset to test (or just enough space), replicate it to a new dataset with no compression (untick the box for full file system replication, which would override the destination settings) and compare again.
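
If you do that from the CLI instead of the replication GUI, a minimal sketch would be a plain send into a dataset whose compression you set yourself (pool, dataset, and snapshot names here are only placeholders):

# One-off snapshot of the sample dataset:
zfs snapshot originpool/sample@compare

# Plain send (no -R, no -c), so blocks are rewritten according to the destination's settings:
zfs send originpool/sample@compare | zfs recv -o compression=off newpool/compare-test

# Then compare both sides:
zfs get compressratio,used,logicalused originpool/sample newpool/compare-test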


TIL that ZFS padding screws around with file size and adds icing to the compression ratio cake…
Never did I realize that it would be like this lol :sweat_smile: :joy:
Thank you for the help and the pointer.

Tried something similar… I created a new 4-disk raidz1 pool with the same settings as the origin pool and copied files to it. Now I am getting the same file sizes as on the origin pool. So padding it is, eh…?
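
To double-check that it is padding rather than the compression itself, I am comparing the same file and the dataset totals on both pools along these lines (paths are placeholders for my own):

# Allocated blocks vs. file length for the same file on each pool:
ls -ls /mnt/originpool/dataset/sample.mkv
ls -ls /mnt/testpool/dataset/sample.mkv

# Dataset-level view: allocated space vs. logical (pre-compression) data:
zfs get used,logicalused,compressratio originpool/dataset testpool/dataset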


I would undo the changes you made to the module parameters and let ZSTD early abort do its thing for the datasets where you set ZSTD-9, -19, or whatever.

If you have a dataset that you know will mainly store multimedia files? There’s no point, even with ZSTD early abort, in using any ZSTD compression level. Just leave it at the default of LZ4[1] or use ZLE.


  1. The reason to use at least LZ4 or ZLE is to squish the null bytes at the end of the last block of a file. LZ4 is so fast anyway that it’s not really a cost to keep it enabled, for the chance of saving space on any compressible files in that dataset.


@winnielinnie It is generally known that attempting to compress already-compressed files, like video files, will make little difference or even make the files larger. Data in video and some image formats is already highly compressed. Compression only works because there is redundant data in most uncompressed files; since video and many image formats are already compressed, there is little or no redundant data left to compress.

Would any compression, even LZ4, be advisable if the data being saved in TrueNAS is primarily video and/or compressed (not raw) image files?

Yes. LZ4 will not compress anything that is incompressible, due to its early-abort heuristic. (Practically all blocks of a video file will be written to disk uncompressed anyway.) Yet you still benefit from setting (or leaving the default) LZ4 compression for a “multimedia” dataset because:

  1. It will remove the “padding” of null bytes at the end of the last block in a file. (Especially important when using a large recordsize.)
  2. There’s still a chance you might later save compressible files to this dataset, in which you don’t need to worry about changing a dataset property back and forth.
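
A quick way to see point 1 in practice is the sketch below; the pool and dataset names are hypothetical, and recordsize=1M matches this thread:

# Two throwaway datasets, identical except for compression:
zfs create -o recordsize=1M -o compression=off tank/pad-off
zfs create -o recordsize=1M -o compression=lz4 tank/pad-lz4

# A ~1.1 MiB random file spans two 1 MiB records; the second record is mostly null padding:
head -c 1150000 /dev/urandom > /mnt/tank/pad-off/sample.bin
cp /mnt/tank/pad-off/sample.bin /mnt/tank/pad-lz4/sample.bin
sync

# The uncompressed copy allocates roughly two full records; the LZ4 copy stays close to the actual file size:
du -h /mnt/tank/pad-off/sample.bin /mnt/tank/pad-lz4/sample.bin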