First, I'd like to say up front that I know dedup is a niche feature and should be used in moderation.
Our scenario is as follows: we have a server that hosts pretty much every user's home directory. In total we have around 1000-3000 users, of whom maybe 500 are semi-active at any given time.
I think we could save quite a bit of storage with deduplication on this dataset, as a lot of config and setup files would be the same (imagine every user installing VSCode and similar).
Our array is built only with HDDs, and I'm worried that the performance impact from the added dedup-table lookups and array reads could be too big.
Do you think the performance impact could be worth the actual storage saving in this scenario, which would let us allocate more space to each user? We don't currently use ZFS on this dataset, so evaluating whether it would be OK is harder to do on the spot.
On the old forum I wrote something that explains a bit about ZFS' De-Dup and what can make it successful. (Even though I try to discourage people from using De-Dup, the info is not meant to stop people, just to get them thinking about the details.)
There is an enhancement of ZFS De-Dup coming out, Fast De-Dup. I am not sure when it will be available, or whether it makes ZFS' De-Dup a better fit for some use cases, but you can keep an eye out for ZFS Fast De-Dup.
ZFS dedup matches whole blocks, so just one changed bit makes a different block and prevents it from deduplicating. It's not obvious you'll get savings from config files…
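To make that concrete, here is a small sketch of why one bit matters: ZFS dedup matches records by checksum, so two blocks that differ in a single byte produce different digests and neither dedups against the other. The record size and use of SHA-256 mirror common ZFS defaults; the script itself is just an illustration, not anything ZFS-specific.

```python
import hashlib

# ZFS dedup matches whole records by checksum (e.g. SHA-256).
# Two 128 KiB blocks differing in one byte hash differently,
# so neither can be deduplicated against the other.
RECORD_SIZE = 128 * 1024  # common ZFS default recordsize

block_a = bytearray(RECORD_SIZE)  # all zeroes
block_b = bytearray(RECORD_SIZE)
block_b[1000] = 0x01              # a single changed byte

digest_a = hashlib.sha256(block_a).digest()
digest_b = hashlib.sha256(block_b).digest()

print(digest_a == digest_b)  # False: no dedup match for these blocks
```

User config files that embed usernames, paths, timestamps, or window positions will differ in exactly this way, which is why per-user dotfiles often dedup worse than you'd hope.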
Dedup requires lots of RAM. You can alleviate some of this load (though probably not avoid a RAM upgrade entirely) with a dedup/special vdev (a 3-way or 4-way mirror of NVMe drives).
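For a rough sense of scale, a commonly cited rule of thumb is on the order of ~320 bytes of in-core DDT per unique block; the exact figure varies by OpenZFS version and platform, so treat this back-of-envelope calculator (the function name and constant are mine, for illustration) as an order-of-magnitude estimate only:

```python
# Back-of-envelope DDT RAM estimate. The ~320 bytes per in-core
# DDT entry is a commonly cited rule of thumb, not an exact figure.
BYTES_PER_DDT_ENTRY = 320

def ddt_ram_gib(data_tib: float, avg_block_kib: int = 128) -> float:
    """Estimate in-core DDT size for data_tib TiB of unique data."""
    blocks = data_tib * 2**40 / (avg_block_kib * 1024)
    return blocks * BYTES_PER_DDT_ENTRY / 2**30

# e.g. 10 TiB of home directories at a 128 KiB average block size:
print(f"{ddt_ram_gib(10):.1f} GiB")  # 25.0 GiB just for the DDT
```

Home directories full of small files push the average block size well below 128 KiB, which multiplies the entry count; that is exactly why the dedup/special vdev advice above matters on an HDD pool.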
Budget allowing, I'd suggest you build a test server with ZFS, load a copy of your user data (or at least a representative sample), and use zdb to see what gains you could expect.
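On the test pool, `zdb -S poolname` will simulate dedup and print a DDT histogram with the expected ratio. If you want a very rough pre-check before the data is even on ZFS, something like the following sketch hashes files in fixed-size blocks and reports total vs. unique blocks (the function name is mine; this ignores compression and ZFS's variable last-block sizing, so read the result as an optimistic approximation):

```python
import hashlib
import os

def estimate_dedup_ratio(root: str, block_size: int = 128 * 1024) -> float:
    """Hash files under `root` in fixed-size blocks and return
    total_blocks / unique_blocks, a crude stand-in for block-level
    dedup potential on a filesystem that isn't ZFS yet."""
    seen: set[bytes] = set()
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while chunk := f.read(block_size):
                        total += 1
                        seen.add(hashlib.sha256(chunk).digest())
            except OSError:
                continue  # skip unreadable files
    return total / len(seen) if seen else 1.0
```

Run it over a representative sample of home directories; if the ratio comes back close to 1.0, the RAM and latency cost of dedup almost certainly won't pay off, and a commonly repeated guideline is that ratios below roughly 2x rarely do.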
You should probably have at least a special metadata vdev or a DDT vdev (with the appropriate redundancy) if you want to do dedup on an HDD pool. The FDT (fast dedup) log improved write performance, but it's still a lot for spinning disks to handle, and 1.6TB enterprise NVMe drives aren't prohibitively expensive.