Should I consider dedup for our server storing home directories?

First, I'd like to say up front that I do know dedup is a niche feature and should be used in moderation.

Our scenario is as follows. We have a server that hosts pretty much every user's home directory. In total we have around 1,000-3,000 users, of whom maybe 500 are semi-active at any given time.

I think we can save quite a bit of storage with deduplication on this data set, as a lot of config and setup files would be the same (imagine every user running VSCode and similar tools).
Our array is built entirely from HDDs, and I'm worried that the performance impact from the extra dedup table lookups and reads could be too big.

Do you think that in this scenario the performance impact could be worth the actual storage savings, which would let us allocate more space to each user? We don't currently use ZFS on this data set, so evaluating on the spot whether it would be OK is more difficult.

I can’t comment on using De-Dup for your purpose.

In the old forum I wrote something that explains a bit about ZFS' De-Dup and what can make it successful. (Even though I try to discourage people from using De-Dup, the info is not meant to stop people… just to get them thinking about the details.)

There is an enhancement to ZFS De-Dup coming out, Fast DeDup. I am not sure when it will be available, or whether it will make De-Dup a better fit for some use cases. But you can keep an eye out for ZFS Fast DeDup now.

ZFS dedup works per block, not per file: just one differing bit makes a different block and prevents dedup of that block, so it's not obvious you'll get savings from config files…
Dedup requires lots of RAM. You can alleviate some of this load (but probably not avoid a RAM upgrade entirely) with a dedup/special vdev (3-way or 4-way mirror of NVMe drives).
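
For reference, adding one to an existing pool is a one-liner; "tank" and the device paths below are placeholders for your own pool and NVMe drives:

    # Add a 3-way NVMe mirror as a dedicated dedup vdev (it will hold the DDT).
    # "tank" and the /dev paths are placeholders for your setup.
    zpool add tank dedup mirror /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1

Note that the redundancy matters: allocation-class vdevs are pool-critical, so losing the dedup vdev loses the pool with it.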

Budget allowing, I'd suggest you build a test server with ZFS, load a copy of your user data (or at least a representative sample), and use zdb to see what gains you could expect.
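
Something like this on the test pool (pool name is a placeholder); zdb simulates dedup without actually enabling it and prints the would-be DDT histogram, with the estimated dedup ratio in the summary line at the bottom:

    # Simulate dedup on an existing pool and display the resulting DDT
    # statistics; "tank" is a placeholder for the test pool's name.
    zdb -S tank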


Would BRT/Block Clone work in this case? (just talking crap… hehe)
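
My understanding is that block cloning only helps with explicit copy operations (cp --reflink, copy_file_range), not with identical data users write independently, so it wouldn't catch two users' separately created config files. A rough sketch, with pool and paths as placeholders:

    # Block cloning (OpenZFS 2.2+) kicks in on explicit copies; on some Linux
    # setups the zfs_bclone_enabled module parameter must be turned on first.
    cp --reflink=always /tank/home/alice/app.conf /tank/home/bob/app.conf
    # Pool-wide read-only properties report how much space cloning saves:
    zpool get bcloneused,bclonesaved,bcloneratio tank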

Wondering if it'd be worth reaching out to iX for a support call so they could advise on the implementation, since this would be production.

That’s definitely a good idea if this is an enterprise-licensed system.

Out of interest, do you quota each user to stop them from running away with all the space?
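
If you don't already, ZFS makes it easy either way; pool, dataset, and user names below are placeholders:

    # Per-dataset quota, if each user gets their own dataset:
    zfs set quota=50G tank/home/alice
    # Or a per-user quota on a single shared dataset:
    zfs set userquota@alice=50G tank/home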

You should probably at least have a special metadata vdev or a DDT vdev (with the appropriate redundancy) if you want to do dedup on an HDD pool. The FDT log improved write performance, but dedup is still a lot for spinning disks to handle, and it's not like 1.6 TB enterprise NVMe drives are prohibitively expensive.
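
And once dedup is on, it's worth keeping an eye on how large the DDT actually grows; a couple of stock commands for that ("tank" is a placeholder):

    # Summary of dedup table entry counts and on-disk/in-core size:
    zpool status -D tank
    # More detailed DDT statistics:
    zdb -DD tank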