I am currently running several tests on a TrueNAS 25.04.2.6 system to evaluate the behavior and efficiency of ZFS deduplication and compression in a scenario where the same data is shared via both SMB and NFS.
System configuration
Platform: TrueNAS
ZFS Pool: pool_main (approximately 191 GiB usable capacity, as shown in the GUI and by zfs list)
Datasets:
pool_main/smb_client (shared via SMB to Windows Server 2019)
pool_main/nfslinux (shared via NFS to Rocky Linux)
Both datasets contain exactly the same files.
ZFS settings
Deduplication: enabled (dedup=on)
Compression: enabled (compression=lz4)
Recordsize tested: 128K, 64K, 4K
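For reference, these are the property changes I apply per dataset between test runs; a minimal sketch (shown for pool_main/smb_client, the same commands are repeated for pool_main/nfslinux):

    # enable deduplication and lz4 compression on the dataset
    zfs set dedup=on compression=lz4 pool_main/smb_client
    # recordsize is varied between test runs (128K, 64K, 4K);
    # it only affects blocks written after the change
    zfs set recordsize=64K pool_main/smb_client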
Using CLI tools, I can clearly see that both deduplication and compression are working correctly. For example:
zfs list (with -o name,used,logicalused) shows a clear difference between used and logicalused
zdb -DD pool_main reports values such as:
dedup = 1.96
compress = 1.59
dedup * compress = 3.12
This confirms that ZFS is effectively saving space thanks to deduplication and compression.
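For reference, the numbers above come from commands like these:

    # per-dataset logical vs. physical usage and compression ratio
    zfs list -o name,used,logicalused,compressratio -r pool_main
    # pool-wide dedup ratio as a property
    zpool get dedupratio pool_main
    # detailed DDT statistics, including the dedup/compress summary line
    zdb -DD pool_main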
Question / clarification
From the TrueNAS GUI, however, I can only see:
Total used space
Available space
Compression enabled/disabled
But I cannot find any dashboard, metric, or report that shows:
The actual deduplication ratio
The real space saved by deduplication
The combined effect of deduplication + compression
All this information is only visible via CLI (zdb, zfs get, etc.).
My questions are:
Is this expected behavior (i.e. the TrueNAS GUI does not expose deduplication statistics by design)?
Is there any recommended way (plugin, chart, reporting feature, API, etc.) to visualize deduplication efficiency directly from the GUI?
Are there technical reasons (performance, RAM usage, DDT scanning cost) why deduplication metrics are not exposed in real time?
In short, I would like to understand whether the lack of deduplication visibility in the GUI is a limitation of the current TrueNAS interface or an intentional design choice.
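In the meantime, my workaround is to read these values from the shell (e.g. via a cron job); a minimal sketch, assuming the pool name pool_main and parsable byte counts from zfs get -p:

    #!/bin/sh
    # rough report of the savings the GUI does not show
    POOL=pool_main
    DEDUP=$(zpool get -H -o value dedupratio "$POOL")
    USED=$(zfs get -Hp -o value used "$POOL")
    LOGICAL=$(zfs get -Hp -o value logicalused "$POOL")
    echo "dedup ratio:    $DEDUP"
    echo "physical used:  $USED bytes"
    echo "logical used:   $LOGICAL bytes"
    # combined dedup + compression effect, as a rounded percentage saved
    echo "overall saving: $(( 100 - USED * 100 / LOGICAL ))%"

But this is exactly the kind of information I would expect the GUI to surface.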
Obviously yes, though this is possibly an oversight rather than a purposeful design choice.
A key point is that dedup is a resource hog, and generally best avoided.
We’ve seen some pretty bad cases, including one where a damaged pool could not be imported for repair because ZFS needed more RAM for that than the platform could support.
If this ratio is representative of actual data, it is too low to justify dedup.
Why do you need two protocols for sharing? Linux can use SMB.
Or why don’t you use a multi-protocol share?
I have three server systems that generate files, each larger than 64 KB and averaging around 500 KB. Each server currently holds about 10 million files in its local directory.
What I didn’t mention before is that these three servers (two Linux and one Windows) actually generate the same files, as they are part of an application cluster.
I was looking for a way to centralize storage, and ZFS was recommended to me mainly for its compression and deduplication features. That’s how I ended up with TrueNAS as a solution, since it’s relatively simple to manage.
My idea was to create a single pool with three datasets (one per server) to share out, so that each server writes its files to its own dataset.
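As a sketch of the intended layout (the dataset names here are placeholders, not the final ones):

    # dedup and compression set once at the pool root, inherited by children
    zfs set dedup=on compression=lz4 pool_main
    zfs create pool_main/node1_windows   # shared via SMB
    zfs create pool_main/node2_linux     # shared via NFS
    zfs create pool_main/node3_linux     # shared via NFS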
This is where I do not understand what’s going on.
Three servers independently generating fully identical files? Similar files?
Can’t they all share a common repository?
Different servers accessing different shares is NOT “the same data” in my book.
Anyway, your dedup ratio is too low to justify using dedup. And the amount of data does not seem large enough to bother:
10 M files * 500 kB ≈ 5 TB per server, so ~15 TB logical in total.
Did I miss some zeroes? That is not a huge amount by modern standards, so why try to save space with dedup? (Compression is free, dedup is NOT.)
These are 3 application servers in a cluster.
Each node independently generates the same attachment objects as part of the application logic. The files are bit-identical across nodes.
The goal is to move from three local storages (~5 TB logical total) to a centralized ZFS pool where each node writes to its own dataset, and let ZFS deduplicate identical blocks.
This is not about “similar data” or user shares: it is deterministic identical output from a clustered application, which is precisely a valid use-case for block-level deduplication.
Also, just to clarify the scale: the current dataset is ~5 TB per node, so ~15 TB logical in total.
In my actual tests, zdb -DD reports a dedup ratio close to 3:1, so the real physical footprint is ~5 TB versus ~15 TB logical.
This suggests that the application output is almost perfectly identical at block level, making this workload very well suited for deduplication.
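For anyone who wants to verify the same thing on their own pool, the reference-count histogram of the dedup table can also be checked without zdb:

    # DDT summary: entry count, in-core/on-disk size, and a histogram of
    # how many times each unique block is referenced
    zpool status -D pool_main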
I’m aware of the RAM and DDT implications of ZFS dedup, and this is why I’m testing it in a controlled environment. The main point of the thread, however, was about observability of dedup/compression metrics in TrueNAS GUI vs CLI, not about whether dedup should be used generically.
I totally do not get the logic of generating the same data thrice to end up storing everything as a single copy in a single place…
Anyway, the old rule of thumb says you need an extra 5 GB of RAM per TB of deduplicated data: 5 GB/TB × 5 TB = 25 GB of RAM, which costs about as much as an extra 10 TB of HDD storage.
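Back-of-the-envelope, assuming a 64 KiB average block size and roughly 300 bytes of RAM per DDT entry (both assumptions, not measurements):

    # ~5 TiB of unique (post-dedup) data, 64 KiB average block size
    UNIQUE=$((5 * 1024 * 1024 * 1024 * 1024))    # bytes
    ENTRIES=$((UNIQUE / (64 * 1024)))            # ~84 million unique blocks
    echo "DDT RAM ~ $((ENTRIES * 300 / 1024 / 1024 / 1024)) GiB"   # ~23 GiB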
Whatever. It’s your system, your data, your money—and your time to set it up.
Interesting conversation, but this is the one point I do not get:
@Daniele_Grillo mentioned a dedup factor of 1.96. That is, for all practical purposes, 2.
So with dedup the storage requirements are cut in half.
A factor of 2 is one (binary) order of magnitude to a computer scientist, and a considerable saving: being able to store twice as much X while paying for only one X…
Are the extra memory resources and delayed writes really worth it, when you can achieve decent space savings for (almost) free with good compression and block-cloning?
If someone has to spend more money on RAM (or even add a special vdev) to cover the resource requirements of their growing DDT, could they not have simply upgraded their storage capacity instead and combined it with the (almost) free space savings from compression + block cloning?
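To illustrate what I mean by (almost) free: with OpenZFS 2.2+ and the block_cloning pool feature enabled, a copy inside the same pool can share the original's blocks instead of allocating new ones. A minimal sketch run on the TrueNAS host itself (the paths are placeholders based on the datasets mentioned above):

    # the second "copy" references the same blocks, so it consumes almost no new space
    cp --reflink=always /mnt/pool_main/smb_client/bigfile /mnt/pool_main/smb_client/bigfile.clone
    # logicalused grows, used barely moves
    zfs list -o name,used,logicalused pool_main/smb_client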
OK, but there might be a non-trivial number of situations where you have CPU and memory galore but the chassis only takes N drives. A factor of 2 looks pretty relevant to me.
Seriously (ignoring the trends of the last couple of weeks), CPU and memory are what have become exponentially cheaper, or more powerful for the same money, while storage capacity per dollar improves far more linearly.
My Xeon-D main workhorse FreeBSD TrueNAS is still just powerful beyond any real challenge. Any new jail - yeah, spin it up. The system is twiddling its thumbs, anyway.
You’re not missing anything. From a pure storage-efficiency point of view, a 2:1 dedup ratio is already a significant gain (that was my first test environment).
The controversy usually comes from the operational side: in ZFS, dedup has non-trivial costs (RAM for the DDT, performance impact, and recovery complexity), so many admins recommend using it only when ratios are much higher (5:1, 10:1, etc.).
In my case the data is produced by a clustered application that generates identical attachments on multiple nodes, and real-world tests showed ratios closer to 3:1. Given that, and with a special vdev for metadata/DDT, the trade-off is acceptable for our environment.
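For completeness, the kind of layout I am testing looks like this (device names are placeholders); the point is to keep metadata and the DDT off the spinning disks:

    # mirrored special vdev for metadata (the DDT lands here if no dedup vdev exists)
    zpool add pool_main special mirror /dev/sdX /dev/sdY
    # or a dedicated dedup vdev class that holds only the dedup table
    zpool add pool_main dedup mirror /dev/sdX /dev/sdY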
That’s exactly the trade-off, and I agree in general.
For many workloads, compression + block cloning already give most of the benefit at almost zero cost.
In my case, however, the data is produced independently by multiple nodes of a clustered application. The files are identical in content but not cloned or shared at filesystem level, so block cloning cannot help. Attachments are also already compressed, so lz4 has limited effect.
This is one of the few scenarios where ZFS dedup actually addresses a real problem that cannot be solved by other mechanisms.
Yes, I understand it may seem odd, but we have an application system cluster where each node generates the same attachments independently. Centralizing storage with ZFS deduplication allows us to save space without rewriting the application logic. The dedup ratio has been measured at ~3:1, so the RAM investment is justified for our scenario. Thanks for your input.