Yeah, it's me again.
I am still puzzling over this topic.
Snapshots are fine, but they become useless the moment the originating pool is lost or destroyed for whatever reason. Replication tasks are based on snapshots.
Rsyncing the pool/dataset comes to mind, but, depending on the schedule, if I have to restore the complete pool, I will have to clear all snapshots and start again on the freshly restored pool and its data.
Snapshots on ZFS are based on the changes made on the pool. So snapshots that are younger than the last complete rsync reference changes that haven't happened... that's why, IMHO, those snapshots no longer work.
But I am willing to learn.
A snapshot is a complete record of the content of a dataset at the instant the snapshot was taken. Snapshot creation is an atomic event: Either the snapshot exists, and all the data it refers to exists in the pool, or it doesn't exist; there's no such thing as a partial snapshot. (And no "snapshot referencing changes that haven't happened".)
The snapshot itself is metadata, hence it uses little to no space on disk. But "snapshot replication" copies the snapshot (i.e. metadata) and the data it refers to; until all the data has been received by the destination, the snapshot does not exist on the destination. (And you cannot "rsync" a snapshot: You may rsync the data content of a snapshot, but only ZFS replication can copy the snapshot itself, i.e. metadata.)
Data referred to by a snapshot is immutable. It cannot be removed from the pool as long as the snapshot exists. (The active dataset can be modified, but its previous content is retained in the pool as long as the snapshot exists.)
For your own good, leave rsync to non-ZFS systems, and use only ZFS replication when both source and destination are ZFS. Replication is always (way) faster than rsync.
Moreover, snapshots are incredibly efficient for backup purposes because they encapsulate all the necessary changes. One of the reasons that rsync can take as long as it does is that it has to exhaustively walk every directory and file to see what's different between dataset A and B.
Snapshots basically avoid all that traversal work because they have already captured every change. Thus, the only data transferred by snapshot replication is the changes to the respective datasets. All things being equal, snapshots are the way to go re: backup.
The only reason to use rsync is if you want a backup in whatever native file system your home computer OS uses, if it doesn't understand ZFS. For example, I have used rsync to back up my files to an HFS+ formatted volume in the past. That said, where exhaustive rsyncs can take hours, the replication will take minutes.
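To make the "no traversal" point concrete, here is a toy Python sketch of how an incremental replication can pick out what changed. It is only a model under assumptions: the `Block` class, the `birth_txg` field and the `incremental_stream` helper are made up for illustration, and real ZFS walks a tree of block pointers rather than a flat list. The idea it shows is that every block records when it was written, so anything written after the older snapshot is, by definition, the delta.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    birth_txg: int  # transaction group in which the block was written


def incremental_stream(newer_snapshot_blocks, older_snapshot_txg):
    """Return the blocks of the newer snapshot written after the older one."""
    return [b for b in newer_snapshot_blocks if b.birth_txg > older_snapshot_txg]


# Hypothetical snapshot A, taken at txg 100, referencing blocks 1..5.
snap_a_txg = 100
snap_a = [Block(i, birth_txg=50) for i in range(1, 6)]

# Before snapshot B, block 3 was rewritten as block 6 and block 7 was added.
snap_b = [b for b in snap_a if b.block_id != 3] + [Block(6, 150), Block(7, 160)]

to_send = incremental_stream(snap_b, snap_a_txg)
sent_ids = [b.block_id for b in to_send]
print(sent_ids)  # [6, 7]
```

No file names are compared and no directory is walked: the unchanged blocks 1, 2, 4 and 5 are skipped purely because they are older than snapshot A.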
hm... so what you are telling me is: if I create a snapshot, replicate it to my TrueNAS backup server via VPN, my house burns down, and all hard drives are destroyed... I take a new PC with blank disks, create a pool, and then I am able to restore all the data from a 170 kB file?!
Sounds a bit... impossible?!
Impossible indeed... because a snapshot is not a file.
A snapshot is a collection of metadata. You cannot "save" or "download" a snapshot as a 170 kB item: It only exists within a ZFS pool, together with all the 2 TB of data it refers to; you can't have the first without the second.
By the way, today the snapshot may be 170 kB on disk, but tomorrow it might use 2.3 MB because it referred to an earlier snapshot which has been deleted, and so the newer snapshot has taken over the relevant metadata. The space used by a snapshot may change at any time; the content referred to by a snapshot, however, is immutable: the 2 TB do not change.
If you have "replicated the snapshot" to external backup, you have copied the full 2 TB of data alongside the metadata. The full data can be restored to a NAS by replicating from external backup. Likely time consuming but very possible.
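Here is a rough Python sketch of why the size reported for a snapshot can change while its content does not. It assumes a simplified accounting in which a snapshot is charged only for the blocks nobody else references (roughly what the `used` property reports for a snapshot); the `unique_space` helper, the snapshot names, the block numbers and the 128 kB block size are all invented for the example.

```python
def unique_space(name, snapshots, live_blocks, block_size_kb=128):
    """Space freed by destroying `name`: blocks that nothing else references."""
    others = set(live_blocks)
    for other_name, blocks in snapshots.items():
        if other_name != name:
            others |= blocks
    return len(snapshots[name] - others) * block_size_kb


# Hypothetical pool state: block ids referenced by the live dataset
# and by two snapshots of that dataset.
live = {1, 2, 11, 12}
snapshots = {
    "monday": {1, 2, 3, 4},
    "tuesday": {1, 2, 3, 11},
}

before = unique_space("tuesday", snapshots, live)
del snapshots["monday"]  # destroy the earlier snapshot
after = unique_space("tuesday", snapshots, live)
print(before, after)  # 0 128
```

Block 3 was shared by both snapshots, so neither was charged for it. Once "monday" is gone, "tuesday" is the only holder left and its reported size grows, yet the set of blocks it refers to is exactly the same as before.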
Think of the "truck" as the available capacity (or pool).
Think of each "box" as a data block or file.
Think of any box with a "white sticker" as included in the active dataset/filesystem.
Think of a "color tag set" as a snapshot, which can also be viewed as a dataset/filesystem itself.
Think of a "faded" box as hidden/unavailable, and only exists in whatever snapshot(s) it is tagged with.
I've only skimmed this thread as I'm incredibly busy atm, so apologies if this has already been stated, but zpool checkpoints can protect against accidental/malicious deletion of datasets. @winnielinnie has made a great post regarding this.
Even the right terminology may not be entirely helpful here.
"A snapshot", in itself, is strictly metadata; BUT
"Mounting/Browsing a snapshot" pertains to the data referred to by the snapshot, presented as a read-only file system;
"Replicating a snapshot" copies both data and metadata.
I hope you understand that I cannot understand what you understand or don't understand when your language appears to suggest that you think that there is, somewhere, a tank-YYMMDD-HHmm.snapshot file with a defined size, and that this snapshot file could be copied on its own...
Simply put, snapshots are a brilliant way to be able to almost treat the timeline for data on a NAS like the timeline inside a video editor. I have not had to use it often, but it's super helpful when it's needed.
I back up one copy to a Synology, and another copy to an instance of Debian, because that is all I have to back up to. ZFS replication is not an option for me. I use rsync and it works flawlessly. I still take snapshots of selected datasets, but not really as a backup function.
So a snapshot is a reflection of the data in a pool in total? Snapshots aren't related to one another except that they represent the data as it was at the time of snapshot creation? If they only contain metadata, how do you recover data that no longer exists?
Snapshots are at dataset level, not necessarily pool level, but they can cover the whole pool if you snapshot the root dataset recursively.
Snapshots of the same dataset are related to each other in that they know how to share metadata. Leave the details to ZFS.
Once again: There's no such thing as a snapshot without the corresponding data.
As long as a block is referenced by at least one snapshot this block cannot be deleted or modified. (Remember: ZFS is Copy-on-Write.)
So if I create a dataset, put data on it, and take an initial snapshot, it would be a complete copy of the data. Then, as I modify the data and take more snapshots over time, each snapshot would only have the blocks changed since the previous snapshot. Am I right? Snapshots have always confused me.
The snapshot is metadata, not "a copy of the data". Consider that the snapshot puts a little lock on each and every block of the data.
myfile.dat consists of blocks 1 to 10. You request to modify block 3. The block is locked by a snapshot. ZFS records a new block 11 with the modified data, and records that the current version of myfile.dat is 1-2-11-4-5-6-7-8-9-10. The snapshot records that myfile.dat at snapshot time was and shall forever remain 1-2-3-4-5-6-7-8-9-10. Copy-on-Write. Look this up.
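The same example can be played through in a few lines of Python. This is a toy model, not how ZFS is implemented: the `Dataset` class and its methods are invented for illustration, and a file is reduced to a plain list of block ids. It shows the write going to a fresh block 11, the snapshot keeping its 1-to-10 view, and block 3 staying allocated until the snapshot is destroyed.

```python
class Dataset:
    """Toy copy-on-write dataset holding one file as a list of block ids."""

    def __init__(self, block_ids):
        self.live = list(block_ids)      # current layout of myfile.dat
        self.snapshots = {}              # snapshot name -> frozen layout
        self.allocated = set(self.live)  # blocks occupying pool space
        self._next_id = max(self.live) + 1

    def snapshot(self, name):
        # A snapshot only records which blocks made up the file; no data is copied.
        self.snapshots[name] = tuple(self.live)

    def modify(self, position):
        """Rewrite the block at `position` (0-based) into a newly allocated block."""
        old = self.live[position]
        new = self._next_id
        self._next_id += 1
        self.allocated.add(new)
        self.live[position] = new
        self._free_if_unreferenced(old)
        return new

    def destroy_snapshot(self, name):
        for block in self.snapshots.pop(name):
            self._free_if_unreferenced(block)

    def _free_if_unreferenced(self, block):
        in_snapshot = any(block in s for s in self.snapshots.values())
        if block not in self.live and not in_snapshot:
            self.allocated.discard(block)


ds = Dataset(range(1, 11))
ds.snapshot("snap1")
ds.modify(2)  # the third block, i.e. block 3

live_after_write = list(ds.live)
snap_view = list(ds.snapshots["snap1"])
block3_kept = 3 in ds.allocated

ds.destroy_snapshot("snap1")
block3_after_destroy = 3 in ds.allocated

print(live_after_write)  # [1, 2, 11, 4, 5, 6, 7, 8, 9, 10]
print(snap_view)         # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(block3_kept, block3_after_destroy)  # True False
```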
Thanks, it's getting a little clearer, I appreciate it. So let's say I create a dataset and put myfile.dat on it, with its data in blocks 1-2-3-4-5-6-7-8-9-10. Then I take a snapshot, which locks blocks 1-10. I then modify block 4, which makes a new block 11 with the modified data. When I take another snapshot, it will lock block 11, etc... Am I close to how it operates?