I ran my backups but the space usage on my pool is still too high (81%). I have a bunch of very, very old snapshots that I would like to delete.
This pool is occasionally replicated to another pool (an HDD that I plug in manually) as a backup mechanism: I plug in the HDD, start the “Replication Task” manually in the UI, verify with an rsync dry run that the files have been copied properly, and then run a scrub on the backup pool.
My questions are:
Is there a script to delete old snapshots? I’d be happy to delete any snapshot older than 3 days (roughly what the sketch below does). Some of these snapshots were created manually, so they never expire.
What should I do with the “replicated” pool: should I delete the snapshots there too, or will the replication task delete them the next time I replicate? Note that these snapshots have no expiration date; as I said, I created them manually.
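To be concrete, something along these lines is roughly what I have in mind (just a sketch, not tested; MARS is my pool, and I’d leave the actual destroy commented out until the list looks right):

#!/bin/sh
# Sketch: list every snapshot on the pool and pick the ones older than 3 days.
# With -p, 'creation' is printed as seconds since the epoch, so it can be compared.
POOL="MARS"
CUTOFF=$(( $(date +%s) - 3 * 86400 ))

zfs list -H -p -t snapshot -o name,creation -r "$POOL" |
while read -r snap created; do
    if [ "$created" -lt "$CUTOFF" ]; then
        echo "would destroy: $snap"
        # zfs destroy "$snap"   # plain destroy, no -R; uncomment once the list is reviewed
    fi
done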
Things partially worked. It deleted the snapshots properly, but for some reason 3 of my jails lost everything inside /root. I can’t seem to recover this data from the previous snapshot either, which is really annoying.
Sorry, I meant root in the jail.
I don’t understand why it SELECTIVELY did that.
So I have /mnt/MARS/iocage/jails/sftp_1/root that’s GONE, but /mnt/MARS/iocage/jails/plexmediaserver_1/root is there and completely functional.
The weirdest thing is that for a while the data was there, but /etc/rc.conf was missing. Then I pressed EDIT in the UI for the sftp_1 jail, without saving, and suddenly the root of those 3 jails was gone.
I did not migrate to SCALE; I’m still on CORE. All I really wanted was to free up a bit of space on my HDD, because I had deleted files but the space was still allocated (I was at 82%). The space did free up, but I lost a bunch of files I really wasn’t looking forward to losing.
If I had to guess, it’s related to the fact that I said “yes, delete recursively” and it deleted some clones, maybe? I have no idea what ZFS snapshot clones are, so this is completely outside my knowledge.
To my understanding, deleting old snapshots shouldn’t have caused data loss, but there it is: I lost data.
I have no idea what else I lost, but based on some sampling I don’t think I lost any of the important stuff.
I think I found what happened. I was lucky with the other 4 jails.
I think it’s caused by this concept of “clone jails”: when I used “recursive”, it tried to delete the old RELEASE snapshots, but for some reason the 4 jails that are safe were busy, so their snapshots were not deleted:
[dry-run] [1/4] removing MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1: 5 years old (104 KB)
[dry-run] [2/4] removing MARS/iocage/releases/11.2-RELEASE/root@nextcloud_1: 5 years old (104 KB)
[dry-run] [3/4] removing MARS/iocage/releases/11.4-RELEASE/root@seedbox_3: 3 years old (104 KB)
[dry-run] [4/4] removing MARS/iocage/releases/11.4-RELEASE/root@znc_1: 3 years old (104 KB)
These are the 4 that are alive; the other 2 are missing. On the backup pool, where no application was running, pruning the snapshots did delete the jail roots too.
How do I secure the data that was preserved? I’d like to make it re-appear in the backup. Is my only option a full replication task from scratch?
The same thing happens to me: I have a retention of 2 weeks, but I see older snaps (like about 5 months old!)… So I started deleting them and lost 2 stopped jails (the others can’t be deleted; if I understand right, it’s because those jails are in use?).
I’m so worried about this that I stopped “cleaning” snaps, but sooner or later I will run out of space…
Keeping up with snapshots and available pool space can be tricky sometimes.
I would say that if you need to recover pool space by deleting snapshots, the first thing you need to understand is what content each snapshot still references.
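For example (a minimal sketch, assuming your pool is the MARS pool from above), you can look at the per-snapshot space before deleting anything:

# Show each snapshot's unique space ('used') and creation time, oldest first.
zfs list -t snapshot -r -o name,used,refer,creation -s creation MARS
# 'used' only counts blocks unique to that snapshot; blocks shared by several
# snapshots are freed only when the last snapshot referencing them is destroyed.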
From my point of view, the safer approach is to never destroy every snapshot that exists. Some years ago, on a replication FreeNAS server, I had an experience where destroying all the snapshots of a dataset deleted every file from the dataset.
The key is to keep the most recent snapshot (i.e., create a recursive snapshot of the entire pool), and then you can explore options for deleting older snapshots.
If the 80% space limit is the concern, don’t worry about that limit where the integrity of your data is concerned: you can still safely get close to 100% capacity and ZFS will still be able to safeguard your data.
The best approach is to create a manual recursive snapshot of the entire pool.
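Something like this, as a sketch (adjust the pool name and label to your setup):

# One manual recursive snapshot of the whole pool as a safety net before pruning.
zfs snapshot -r MARS@keep-$(date +%Y%m%d-%H%M)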
The problem for me started when I tried to install an application from ports in a jail, which resulted in something like 30 GB of build data, and I tried that 2-3 times… This made the snapshots taken during the retention window very big.
Otherwise I never had the need to delete anything manually…
This is the current situation… and consider that the “caddy” jail is really just a jail with Caddy used to serve an HTML page… 6 GB.
From my point of view, the safer approach is to never destroy every snapshot that exists. Some years ago, on a replication FreeNAS server, I had an experience where destroying all the snapshots of a dataset deleted every file from the dataset.
Yeah, for sure I won’t delete anything from now on… I was really lucky that those 2 jails weren’t used anymore.
The best approach is to create a manual recursive snapshot of the entire pool.
Is there a difference between the automatic recursive snapshots and a manual one?
zfs destroy -R doesn’t just destroy snapshots recursively; it also destroys any clones of the destroyed snapshot, even outside the dataset “tree”. So that’s what killed my jails. The only reason the other 4 were still there is that those jails were in use and holding their datasets busy.
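In hindsight, a dry run would have shown this before anything was destroyed. A sketch, using one of the release snapshots from my dry-run list above (-n and -v only print what would go):

# Dry run: list everything a recursive destroy would take with it,
# including clones that live outside this dataset's own tree.
zfs destroy -nv -R MARS/iocage/releases/11.2-RELEASE/root@plexmediaserver_1

# Check whether a jail root is a clone of a release snapshot:
zfs get origin MARS/iocage/jails/plexmediaserver_1/root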
Now my jail has ended up in a weird state: when I try to replicate it, it says it’s skipping every snapshot after the one where the problem happened.
Not sure how to address that.
I was able to “force” a replication by erasing the backup pool, creating a snapshot with a new naming scheme, and replicating ONLY the snapshots with the new naming scheme (so, just 1 snapshot). This worked and recreated the 2 TB of data.
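As far as I understand, this is roughly the CLI equivalent of what the replication task did (a sketch; the snapshot label and the BACKUP pool name are placeholders for my actual ones):

# One fresh recursive snapshot with the new naming scheme...
zfs snapshot -r MARS@newscheme-20240805
# ...then send the whole pool from that single snapshot, overwriting the target pool.
zfs send -R MARS@newscheme-20240805 | zfs recv -F BACKUP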
Is there any way for me to fix the “source” now?
Summary of the errors:
skipping snapshot MARS/mymedia@francesco-202408021221 because it was created after the destination snapshot (francesco-202407311813)
skipping snapshot MARS/iocage/jails/znc_1@francesco-202408021221 because it was created after the destination snapshot (francesco-202407311813)
cannot send MARS@francesco-202407311813 recursively: snapshot MARS/iocage/jails/znc_1/root@francesco-202407311813 does not exist
warning: cannot send 'MARS@francesco-202407311813': backup failed
cannot receive: failed to read from stream
Reading the documentation a bit, it turns out the problem is clones (datasets cloned from snapshots).
The positive thing is that I didn’t lose any actual data, because I don’t use clones myself. The downside is that it killed the jails, because iocage jails are “clone jails” by default (I had no idea what those were; it’s just the default).
Recreating and reconfiguring jails requires a serious time investment on my part, so it is frustrating. I did not realize how dangerous the “Recursive” option for zfs destroy is, since it can delete datasets “out of the tree” too: when I deleted the snapshots for the jail RELEASE (11.something), it deleted my jails along with them.
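For anyone else hitting this: before pruning RELEASE snapshots it should be possible to check which jails are clones and detach them first (a sketch; zfs promote makes the clone independent of its origin snapshot, though I haven’t verified how happy iocage is with a promoted jail afterwards):

# See which jail roots are clones and which snapshot they originate from.
zfs list -H -r -o name,origin MARS/iocage/jails
# Promote a clone so it no longer depends on the RELEASE snapshot; after this,
# destroying that snapshot can no longer take the jail's data with it.
zfs promote MARS/iocage/jails/plexmediaserver_1/root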