Deleting old snapshots from source and replica (replica from replication task)?

I ran my backups but the space usage on my pool is still too high (81%). I have a bunch of very, very old snapshots that I would like to delete.

This pool is occasionally replicated manually to another pool (an HDD that I plug in manually) as a backup mechanism: I plug in the HDD, start the “Replication Task” manually (in the UI), verify with an rsync dry run that the files have been copied properly, and run a scrub on the backup pool.
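The verification step is roughly something like this (the mount points are placeholders for my pool and wherever the backup pool ends up mounted):

rsync -avn --checksum /mnt/MARS/ /mnt/BACKUP/    # -n = dry run, --checksum = compare file contents, not just timestamps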

My questions are:

  • Is there a script to delete old snapshots? I’d be happy to delete any snapshot older than 3 days (a rough sketch of what I have in mind is below, after this list). Some of these snapshots were created manually, so they don’t expire
  • What should I do with the “replicated” pool: should I delete the snapshots there too, or will the replication task delete them the next time I replicate? Note that these snapshots have no expiration date since, as I said, I created them manually
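For the first question, a rough sketch of what I have in mind (assuming zfs list -p gives the creation time in epoch seconds; the echo stays until the list looks right, and there is deliberately no -r/-R on the destroy):

cutoff=$(( $(date +%s) - 3*24*3600 ))                 # 3 days ago, in epoch seconds
zfs list -H -p -t snapshot -r -o name,creation MARS |
  awk -v c="$cutoff" '$2 < c { print $1 }' |
  while read -r snap; do
    echo zfs destroy "$snap"                          # drop the echo once the output looks sane
  done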

EDIT: I found this script, but I don’t know if it’s still functional GitHub - bahamas10/zfs-prune-snapshots: Remove snapshots from one or more zpools that match given criteria

EDIT 2: the dry run makes me hopeful
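If I’m reading the script’s README right, the dry run is something like this (-n for dry run, 3d for “older than 3 days”; double-check the flags against the README before running it for real):

./zfs-prune-snapshots -n 3d MARS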

Things partially worked. It properly deleted the snapshots, but for some reason 3 of my jails lost everything inside /root. I can’t seem to recover this data from the previous snapshot either, which is really annoying.

/root in the jail? or in your truenas?

/root on TrueNAS is not backed up as part of a config; it is only carried forward through an in-place upgrade.

Ergo, if you upgrade from CORE → SCALE via the installer, your /root will be reset.

Sorry, I meant root in the jail.
I don’t understand why it SELECTIVELY did that.

So I have /mnt/MARS/iocage/jails/sftp_1/root that’s GONE, but /mnt/MARS/iocage/jails/plexmediaserver_1/root is there and completely functional.

The weirdest thing is that for a while, the data was there, but /etc/rc.conf was missing. Then I pressed EDIT in the UI for the sftp_1 jail, without saving, and suddenly the root of those 3 jails is gone.

I did not migrate to SCALE; I’m still on CORE. All I really wanted was to free up a bit of space on my HDD, because I had deleted files but the space was still allocated (I was at 82%). The space did free up, but I lost a bunch of files I really wasn’t looking forward to losing.

Deleting snapshots shouldn’t do that :-\

If I had to guess, it’s related to the fact that I said “yes, delete recursive” and it deleted some clones, maybe? I have no idea what ZFS snapshot clones are, so this is completely outside my knowledge.

To my understanding, deleting old snapshots shouldn’t have caused a data loss, but there it is, I lost data.

I have no idea what else I lost, but based on some sampling I shouldn’t have lost any of the important stuff.

I have a wild guess: it looks like zfs destroy can also be used to destroy filesystems/volumes. I wonder if it can be used to destroy a whole dataset.

If that’s the case, I suspect a bug in the script that, for some reason, deleted those 3 datasets.
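As far as I can tell, the same command does both, which is part of why this is so easy to get wrong (dataset names here are made up):

zfs destroy MARS/some/dataset@old-snap    # removes only that snapshot
zfs destroy MARS/some/dataset             # removes the whole dataset and everything in it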

I think I found what happened. I was lucky with the other 4 jails.
I think it’s caused by this concept of “clone jails”: when I used “recursive”, it tried to delete the snapshots of the old RELEASEs, but for some reason the 4 jails that survived were busy, so those snapshots were not deleted:

[dry-run] [1/4] removing MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1: 5 years old (104 KB)
[dry-run] [2/4] removing MARS/iocage/releases/11.2-RELEASE/root@nextcloud_1: 5 years old (104 KB)
[dry-run] [3/4] removing MARS/iocage/releases/11.4-RELEASE/root@seedbox_3: 3 years old (104 KB)
[dry-run] [4/4] removing MARS/iocage/releases/11.4-RELEASE/root@znc_1: 3 years old (104 KB)

These are the 4 still alive; the other 2 are missing. On the backup pool, where no application was running, pruning the snapshots deleted the jails’ root datasets too.

How do I secure the data that was preserved? I’d like to make it re-appear in the backup. Is my only option a full replication task from scratch?

The same thing happened to me: I have a retention of 2 weeks, but I see much older snaps (some about 5 months old!)… So I started deleting them and lost 2 stopped jails (the others couldn’t be deleted; if I understand right, that’s because those jails are in use?).

I’m so worried about this that I’ve stopped “cleaning” snaps, but sooner or later I will run out of space…

Keeping up with snapshots and pool available space can be tricky sometimes.
I would say that if you need to recover pool space by deleting snapshots, the first thing you need to understand is what content each snapshot still references.
From my point of view, the safer approach is to never destroy every snapshot that exists. Some years ago, on a replication FreeNAS server, I saw that destroying all the snapshots of some datasets deleted every file in those datasets.
The key is to keep the most recent snapshot (i.e. create a recursive snapshot of the entire pool), and then you can explore options for deleting older ones.
If the 80% space limit is a concern, don’t worry about that limit as far as the integrity of your data goes: you can still safely reach near 100% capacity and ZFS will still be able to safeguard your data.
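To see what the snapshots are actually holding before touching anything, something like this helps (using MARS from this thread as the example pool):

zfs list -r -o space MARS                                      # per-dataset breakdown, USEDSNAP = space held by snapshots
zfs list -r -t snapshot -o name,used,creation -s used MARS     # individual snapshots, biggest at the bottom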

The best approach is to create a manual recursive snapshot of the entire pool.
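Something along these lines, taken right before any pruning (the snapshot name is just an example):

zfs snapshot -r MARS@manual-$(date +%Y%m%d)    # one recursive snapshot of the whole pool, named with today's date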


The problem for me started when I tried to install an application via ports in a jail, which resulted in something like 30 GB of “installation data”, and I tried it 2-3 times… This made the snapshots become very big while still inside the retention window.
Otherwise I never had the need to manually delete anything…

This is the current situation… if you consider that the “caddy” jail is really just a jail running Caddy to render an HTML page… 6 GB

From my point of view, the safer approach is to never destroy every snapshot that exists. Some years ago, on a replication FreeNAS server, I saw that destroying all the snapshots of some datasets deleted every file in those datasets.

Yeah, for sure I won’t delete anything from now on… I was really lucky that those 2 jails weren’t used anymore.

The best approach is to create a manual recursive snapshot of the entire pool.

Is there a difference between the automatic recursive snapshots and a manual one?

That’s what I did, but the data is still missing, even though I have a snapshot that was literally created yesterday (before the snapshot pruning).

I think I understand what’s going on.

zfs destroy -R recursively destroys everything that depends on the destroyed snapshot, including clones, even outside the “tree”. So that’s what killed my jails. The only reason the other 4 were still there is that those jails were in use and holding their datasets busy.
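For anyone else hitting this, here is the difference as I now understand it, using one of the release snapshots from my earlier dry-run output as the example:

zfs list -t snapshot -o name,clones MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1    # shows which clones (jails) depend on this snapshot
zfs destroy MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1                            # fails if a clone depends on it
zfs destroy -R MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1                         # also destroys the dependent clone, i.e. the jail's root dataset
zfs destroy -nv -R MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1                     # -n -v: only prints what WOULD be destroyed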

Now my jail has ended up in a weird state: when I try to replicate, the task says it’s skipping every snapshot created after the one where the problem happened.

Not sure how to address that.
I was able to “force” a replication by erasing the backup pool, creating a snapshot with a new naming scheme, and replicating ONLY the snapshots with the new naming scheme (so, just 1 snapshot). This worked and recreated the 2 TB of data.
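Something in this spirit is roughly the CLI equivalent, I think (BACKUP/MARS is just a stand-in for the target dataset on the backup pool, and zfs send -R will also carry any older snapshots that still exist, so it’s not an exact match for the naming-scheme filter I used in the UI):

zfs snapshot -r MARS@fresh-202408
zfs send -R MARS@fresh-202408 | zfs receive -F BACKUP/MARS    # full recursive stream into an emptied destination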
Is there any way for me to fix the “source” now?

Summary of the errors:

skipping snapshot MARS/mymedia@francesco-202408021221 because it was created after the destination snapshot (francesco-202407311813)
skipping snapshot MARS/iocage/jails/znc_1@francesco-202408021221 because it was created after the destination snapshot (francesco-202407311813)
cannot send MARS@francesco-202407311813 recursively: snapshot MARS/iocage/jails/znc_1/root@francesco-202407311813 does not exist
warning: cannot send 'MARS@francesco-202407311813': backup failed
cannot receive: failed to read from stream

Reading the documentation a bit, it turns out the problem is snapshot clones.

The positive thing is that I didn’t lose any actual data, because I don’t use clones myself. The downside is that it killed the jails, because by default those are clone jails (which I had no idea existed; it’s just the default).

Recreating and reconfiguring jails requires a serious time investment on my part, so it is frustrating. I did not realize how dangerous the “Recursive” option for zfs destroy is, since it can delete datasets “out of the tree” too; so when I deleted the snapshots for the jail release (11.something), it deleted my jails.

In SCALE, when deleting snapshots from the GUI, there is an option to not delete dependent clones, or to stop if there is a child snapshot, etc.

That would be nice. I did use a script to prune the snapshots, so I’m not sure it would have helped.

I learned about snapshot clones, though, and I understand that I definitely want to promote all my existing jails.
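Something like this should do it, if I’ve understood promotion correctly (paths follow my iocage layout; the origin property shows which datasets are actually clones):

zfs get -r -t filesystem origin MARS/iocage/jails              # anything whose origin is not “-” is a clone of a release snapshot
zfs promote MARS/iocage/jails/plexmediaserver_1/root           # swaps the clone/origin relationship so the release snapshot no longer owns the jail's data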

That being said, I will need to migrate to SCALE this year, so I’m not sure it matters that much, since I’ll need to rebuild all the jails either way.