While it was all interesting, it doesn’t actually answer my question. Perhaps I should read some ZFS dev docs.
Thanks for the info anyway.
No, I’m not really confused by the term. And I did understand, before I started the topic, that it works the way you described in your ginormous acronym.
According to you, ZFS removes the block as soon as it is discarded. Whatever that means.
According to the other source (which I found thanks to your TXG mention), “There are tens of thousands and sometimes millions of old versions of each object set in any given pool. Many, if not most, of these versions are partially damaged”.
I’ll try to investigate it further by myself.
Again, thanks for your time and support. Really appreciated.
If it’s not part of the live filesystem (“white sticker”) and it’s not protected by a snapshot (“color sticker”), then it is free space. It doesn’t linger there. What once was allocated space (used by the blocks) is now free space.
That’s in the context of last-resort emergency recovery. You should never find yourself in a situation where you have to resort to this.
It is possible to do an emergency import by using a (hopefully) intact TXG from shortly before your entire pool became corrupted or was damaged. This is not guaranteed, and relies on a bit of luck.[1]
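For illustration, such an attempt looks roughly like this (the pool name “tank” is a placeholder, and exact behavior depends on your OpenZFS version):

```sh
# Dry run first: -n, used with -F, only checks whether rewinding to an
# earlier TXG could make the pool importable; it modifies nothing on disk.
zpool import -F -n tank

# If that looks promising, import read-only so nothing gets overwritten
# while you copy data off. -F discards the last few transactions and
# rewinds to an earlier, hopefully intact TXG.
zpool import -o readonly=on -f -F tank
```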
That’s why I said “for all intents and purposes”. You might as well treat discarded blocks as having their space freed up immediately. Pretend that no such emergency recovery is possible, since you shouldn’t use your pool (and the data within) with the mindset of “I can use a recovery tool or emergency import if I accidentally deleted data.”
If you delete the data, and you don’t have any snapshots, then consider the data is gone forever. Don’t get caught in the weeds about “potentially recovering the data”. Use snapshots.
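If it helps, the snapshot workflow is only a couple of commands (the dataset, snapshot, and file names below are made up):

```sh
# Take a snapshot before doing anything risky.
zfs snapshot tank/data@before-cleanup

# Recover a single file from the read-only snapshot directory...
cp /tank/data/.zfs/snapshot/before-cleanup/important.txt /tank/data/

# ...or roll the whole dataset back to that point in time.
# (rollback only goes to the most recent snapshot unless you use -r,
# which destroys any newer snapshots)
zfs rollback tank/data@before-cleanup
```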
You can see some examples of such recovery attempts on these very forums. There was a notable one that was thankfully a success, as dire as their situation looked.
Yeah, it is considered free by ZFS, but the drive LBAs that a ZFS block (apparently) consists of should still be “intact” (let’s forget about TRIM).
I get it. I’m trying to understand why ZFS was designed so that data can’t be recovered without snapshots, because given the CoW prerequisites it seems like a natural feature.
I get it. That’s why I set up the snapshot schedule mentioned in the first post.
In the context of forensics and potential emergency recovery. But that’s beyond the scope of ZFS or snapshots. It also applies to non-CoW filesystems such as NTFS, Ext4, and XFS. (This is why data recovery is possible even after “permanently deleting” files on an Ext4 filesystem on an HDD.)
No filesystem is. Nor should they be. If you want granular rollback to any point in time, just create a task that takes snapshots every 5 seconds.
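Taken literally, that could be as crude as a loop like this (the dataset name is a placeholder, and you’d obviously also need pruning/retention; a proper periodic snapshot task is the sane way to do it):

```sh
# Snapshot a hypothetical dataset every 5 seconds with a timestamped name.
while true; do
    zfs snapshot "tank/data@auto-$(date +%Y%m%d-%H%M%S)"
    sleep 5
done
```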
Otherwise, you’re asking for ZFS to allow “rolling back to whatever point-in-time without snapshots”. Which basically means it has to keep every single block from every single TXG it ever created… forever…
Let’s say your available capacity is at 30%. Which previously discarded blocks should keep their TXG pointers intact? You can’t save them all, right? So which ones?
Without user purpose (intentionally creating snapshot tasks), how would ZFS or any filesystem handle this?
If it’s not in the context of forensics or emergency data recovery, then it’s not worth worrying about for normal ZFS usage.
I’m not sure what you are trying to get at here. Are you asking for a versioning filesystem, where each write to a file creates a new version of the file (Apple’s APFS does some amount of this)? There have been versioning filesystems in the past; I believe that the default filesystem on Digital Equipment Corp (DEC) VAX computers was one. It kept a certain number of old versions of each file.
I assume you are referring to recovering a specific file to a specific point in time and not overall recovery of an entire filesystem.
But ZFS has put that particular block on the list of free/available blocks, so it will be overwritten by a future CoW operation.
Only blocks that are referenced by either the active state of the FS or a snapshot won’t be.
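You can watch that accounting directly: delete files that a snapshot still references and their space shows up under usedbysnapshots instead of being freed (the dataset name here is just an example):

```sh
# Space still pinned by snapshots is reported separately from the
# space used by the live dataset.
zfs list -o name,used,usedbydataset,usedbysnapshots tank/data
```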
You are in some way correct that in theory a rollback could be possible without explicit snapshots. But that implies that a block once written will never be reused. And that is not the case even with CoW because you do not have an infinite number of blocks for future writes.
One of the first CoW file systems, LFS (the log-structured file system) implemented in 4.4BSD by Margo Seltzer et al., suffered exactly from that assumption. In a research project context they implemented this FS without considering “recycling” of blocks at all. And they got impressive performance figures for writes.
For practical operation, though, you need some form of “recycling”. So they introduced a background garbage collection service.
Result: unless you had sufficient free disk space to schedule those garbage collection runs outside work hours, performance dropped significantly because of all the locking of metadata structures involved.
Historically very interesting, including a sometimes quite heated debate between Margo Seltzer and Kirk McKusick.
But in the end “just” a research project, although I think it is safe to consider it a major influence on the design of ZFS.
No, it doesn’t. I understand that GC is necessary. I was speaking about restoring versions of blocks/files before their LBAs are disposed of.
Let’s assume GC marked some block free. That doesn’t mean it will be overwritten right away. Moreover, I saw a statement that the ZFS scheduler (I don’t remember the exact term) prefers writing to big “free” ranges. So in the case of a single block change, the old version of that very block can effectively persist for a long time before it is actually overwritten.
To make things clear: I understand that enterprise use doesn’t need such a feature. Depending on exactly which LBAs were disposed of by the GC/scheduler during the last run is generally not a good idea. I was mostly daydreaming.