Fast Dedup uses... magic?

If you’re reading this thread because of my impeccable click-bait game… that just proves I’m good at it! Spare me your jealousy.


The iXsystems announcement page for Fast Dedup describes a new “feature” that supposedly makes the Fast Dedup Table (FDT) notably more efficient:

“Favor”?
“Potential”?

That wording implies some sort of automation or prediction, such as entirely skipping the step of writing a new block’s hash to the table.

But the only reference to pruning I could find is a new command that will be available in OpenZFS: zpool ddtprune

This implies a manually invoked command that simply removes “single-hit” hashes from the table.
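
For reference, and if I’m reading the zpool-ddtprune(8) manpage right, you prune either by age or by a percentage of the single-reference entries; something like this (tank is just a placeholder pool name):

    # prune single-reference DDT entries that are 90+ days old
    zpool ddtprune -d 90 tank

    # or prune 25% of the single-reference entries
    zpool ddtprune -p 25 tank

Check the manpage on your own build for the exact flags.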


Is there anything happening under the hood with Fast Dedup, where it will actually skip new blocks that don’t have “dedup potential”?


This raises another question:

What if someone “prunes” their Fast Dedup Table, removing all single-hit hashes, and those blocks later turn out to be “dedup-able” after all? “Too bad, so sad?” Will the existing blocks (whose hashes were previously removed from the table) forever consume extra space that could have been “zero” had they remained part of the dedup table?

1 Like

Yeah, your analysis matches my overall understanding. A lot rests on the hypothesis that dedupable data will be meaningfully correlated in time. For instance, it’s not likely that OS bits for various VMs will magically be the same after a year, but very likely that VMs deployed on the same week will have some dedup potential.

3 Likes

Long video, but worth a watch. Doesn’t explain everything, but demystifies some of the new features of fast dedup.

2 Likes

Is there a corresponding written article or blog post?

I watched the entire thing, and while I get confused by some of the technical details, it did clear up this question:

Apparently, it doesn’t simply remove “all” single-hit entries from the table. It only prunes single-hit entries that are older than 90 days. Allan Jude himself said that this is based on “intuition”, and there’s the possibility in the future to gather metrics with “ghost entries” to determine if this might result in too much pruning of blocks that would have been “de-dupable”.

So it’s a trade-off: Keep the table small, which allows more room in the ARC for actual data cache, and prioritize already “established” deduped blocks. (At the small risk that you might have lost some de-dupable blocks in the table.)
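
If you want to see what that trade-off actually looks like on your own pool, the dedup summary from zpool status shows the table’s on-disk and in-core footprint (pool name is a placeholder):

    # prints a "dedup: DDT entries N, size X on disk, Y in core" line plus a histogram
    zpool status -D tank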


There were some other questions raised after I watched the presentation, but I’ll save those for later.


Going to slip my personal opinion in here: I think dedup (and even fast dedup) requires a really specific use-case to justify its benefits over its costs. Seriously. Watch the video. :dizzy_face: Even fast dedup has to do A LOT, with added levels of complexity and performance hits, requiring more resources and RAM from your system…

…all to possibly save some storage space…

…and are those savings even that much greater than what inline compression already provides…

…even with the advent of block-cloning?

(Yes, yes, I know block-cloning is still disabled by default for precautionary reasons, but in principle it can save tons of space without requiring special vdevs or a massive dedup table, or any extra complexity. You just copy a file, anywhere in the pool, and you’re done.)
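
As a rough illustration of what I mean, assuming block-cloning is actually enabled on your system (on Linux builds where it’s off by default, there’s a zfs_bclone_enabled module parameter, if I recall correctly), and with a placeholder pool name:

    # copy with reflinks where supported; recent coreutils tries this by default
    cp --reflink=auto bigfile.iso bigfile-copy.iso

    # see how much space cloning has saved at the pool level
    zpool get bcloneused,bclonesaved,bcloneratio tank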

As for dedup and fast dedup, too many things need to line up in order for it to be justified. I would even guess that 95%+ of TrueNAS home users would only be harmed by using deduplication.

Yes, it’s nice that deduplication is getting an overhaul, but I don’t see the appeal or the excitement for it. Better handling of inline ZSTD compression and (safe) block-cloning are something to celebrate. Fast Dedup? I’d say “that’s neat” and never touch it.

1 Like

@etorix

Not from us. We will likely publish one in the lead-up to 24.10.

1 Like

It is a good watch, but yes, very complicated.

I look forward to getting some empirical data on fast dedup, I am optimistic that it will have more real world use cases than standard ZFS dedup, but do expect home lab use to be limited. I hope to be surprised though.

Imho fast dedup will make dedup way more accessible to home users, but we will see. SCALE has other issues to address right now.

OpenZFS’s videos are always a great learning experience, love them.

I don’t think home users should even play around with Fast Dedup. Even those who “think” they need it, probably don’t.

They would still need to be diligent and assess whether they even need deduplication in the first place.

This is in light of:

  • we already have inline compression (fast, can save space; see the snippet after this list)
  • we already (“not yet”) have block-cloning (no special setup needed, can save space)
  • additional, redundant vdev to hold FDT (fast dedup table)
  • additional RAM requirements for deduplication in general
  • and so on…
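
To illustrate the first point, inline compression is a one-liner per dataset, and you can check what it’s already saving you (pool and dataset names are placeholders):

    # enable zstd compression; only newly written blocks are affected
    zfs set compression=zstd tank/mydata

    # see the achieved ratio for the data already on disk
    zfs get compressratio tank/mydata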
1 Like

Looks like my warning Resource on not using De-Dup was not migrated over, so I can’t update it for Fast De-Dup. (And I am too lazy to re-write it here in the new forums…)

But I will put a link to it here for anyone who runs across this thread and thinks about using ZFS De-Dup:

2 Likes

Thanks for the heads-up. I’ll get those migrated over and credit accordingly.

2 Likes

From a use case standpoint, and as it relates to home users, I can see a lot of potential here. Particularly with virtual machines and applications, where we’ll probably get better ratios than with file shares.

Many (most?) home users are storing media files for later consumption (Plex). It would be entirely wasteful to run dedupe for that.

But if you can get even a 1.5x reduction on a small 1 TiB all-flash pool for your KVM homelab VMs? Heck yeah! Granted, there will be a trade-off in RAM available for VMs versus ARC.

2 Likes

My NAS carries mirrors for ALL of the Ubuntu repositories, ports, releases, old-releases, changelogs, and many others. There are dozens of terabytes of duplicated packages across these datasets. I also mirror CentOS, Fedora, Debian, Slackware, CPAN, Apache and dozens of others.

I recently upgraded to 25.04.2.3 so I could finally enable fast dedup on each dataset, and get some storage back.

Initially, I was using an externally attached secondary ZFS pool as a trampoline: creating a new dataset there, rsync’ing my existing dataset off of the main pool, destroying the original, re-creating it with dedup=sha256,verify, and then rsync’ing the data back.
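
In rough shell terms, the loop looked something like this (pool and dataset names are made up, and this is just a sketch of what I described, not a copy-paste recipe):

    # stage a copy on the external pool
    zfs create scratch/ubuntu-mirror.staging
    rsync -aHAX /mnt/tank/ubuntu-mirror/ /mnt/scratch/ubuntu-mirror.staging/

    # destroy and re-create the original dataset with dedup enabled
    zfs destroy -r tank/ubuntu-mirror
    zfs create -o dedup=sha256,verify tank/ubuntu-mirror

    # copy the data back so it gets written through the dedup path
    rsync -aHAX /mnt/scratch/ubuntu-mirror.staging/ /mnt/tank/ubuntu-mirror/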

At some point during the rsync, the whole system would lock up, which I found was due to oom-kill terminating init, systemd, zsh, nfsd, and other core processes that run services on the NAS, all at the same time.

This resulted in corrupting a dozen VM disks, my containers, other datasets, and more. I attributed this to the external enclosure causing a +5V PD bus reset, but since oom-kill was killing systemd and thus systemd-journald, there were no logs or forensics to prove this out, only my visual inspection of the console output, which scrolled by too fast, with hundreds of serialized kernel crash traces, for me to see which came first.

So I went back, repaired everything that arc_evict corrupted when it wedged the box, and did a MOUNTAIN of ARC/sysctl/other tuning to get it to a state where it’s no longer crashing, and the ARC is now behaving.

After about 300 crashes/hard lockups, after having had 0 in the last 3 years, the NAS has been up for 1 day, 19 hours, 57 minutes without a single crash.

But now I’m realizing I only have “dedup” enabled, not fast dedup, and nowhere can I find any instructions on how to enable fast dedup instead of plain dedup, including in the linked video.

Does anyone actually have it running, on a machine that is NOT crashing because arc_evict causes the storage to wedge into an uninterruptible state?

1 Like

Adding to my previous post: with dedup enabled on my datasets, I’m seeing roughly 14% savings, according to zdb. It’s definitely worth enabling if you know how to tune the TrueNAS ARC accordingly.
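
For anyone wanting to check their own numbers, this is the kind of query I mean (pool name is a placeholder):

    # summarizes the DDT and prints a combined "dedup * compress / copies" ratio
    zdb -D tank

    # add a second -D for the full reference-count histogram
    zdb -DD tank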

I don’t think it can replace an existing dedup table.

You can check if it’s enabled or active with zpool get feature@fast_dedup mypool

It can be enabled with zpool set feature@fast_dedup=enabled mypool or zpool upgrade mypool
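
Putting those together, a minimal check-then-enable sequence would look like this (mypool as above):

    # shows disabled / enabled / active
    zpool get feature@fast_dedup mypool

    # enable just this one feature...
    zpool set feature@fast_dedup=enabled mypool

    # ...or enable every feature your OpenZFS version supports
    zpool upgrade mypool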

:warning: Upgrading a pool or enabling a feature is a one-way process.

If the feature is already enabled, but not active, I would defer to others who are more versed on how to have it replace an existing dedup table.

1 Like

It wasn’t enabled (dedup was, not fast_dedup), and my DDT is consuming about 27GB, with 85 million entries.

I’ve enabled it, will meticulously recreate my datasets now that the feature flag is on, destroy the previous versions, and see whether I get any performance or capacity improvements.

Thank you!

1 Like

How much RAM does your server have?

The board is locked at 64GB, and there’s a total of 56TB of storage in the chassis. About half of that would be deduplicated. Sadly, I cannot change the board for one that takes more memory, or swap out the chassis for a larger version; it’s dedicated to a very compact space in a dedicated rack.

Looking around, I don’t see any Mini-ITX boards that support 128GB of RAM, 4+ SATA ports, M.2 connectors (for the Optane SLOG), and a Ryzen CPU.

There are lots of Intel i3/i5 and N100 boards in that form factor; some support 128GB but lack the SATA ports, or have the SATA ports but have underpowered CPUs.

I’ll keep looking.

If you’ll be starting over again with repopulated datasets, consider switching to BLAKE3 instead of SHA256 for the checksum. It’s faster and it’s cryptographically strong enough for the requirements of deduplication.

This can and should be chosen when creating a new dataset that will be using deduplication.
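
A sketch of what that would look like at dataset-creation time, assuming your OpenZFS version lists blake3 among the supported dedup checksums (pool and dataset names are placeholders):

    # create the new dataset with BLAKE3-based dedup (verify is optional but cautious)
    zfs create -o checksum=blake3 -o dedup=blake3,verify tank/mirrors

    # confirm what was actually set
    zfs get checksum,dedup tank/mirrors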