Single-drive ZFS pool for cold storage? (With offline backups)

In short, the script does the following (a rough sketch is included after the list):

- Mounts the pool
- Unlocks the dataset
- Starts the backup of each requested folder
- Deletes files no longer present on the source disks
- Checks when the last scrub happened (it creates a file in the disk's root folder to track that)
- If more than 27 days have passed, starts a scrub
- Unmounts the pool
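For reference, a rough sketch of that flow (pool name, dataset, paths, and the marker file below are placeholders, not my exact script):

```sh
#!/bin/sh
# Rough sketch only -- pool name, dataset, source paths, and the
# scrub-marker file are placeholders.
POOL="coldpool"
DATASET="coldpool/backup"
MARKER="/mnt/coldpool/.last_scrub"

zpool import "$POOL"                 # mount the pool
zfs load-key "$DATASET"              # unlock the encrypted dataset
zfs mount "$DATASET"

# Back up each requested folder; --delete removes files that are
# no longer present on the source disks.
for SRC in /mnt/tank/photos /mnt/tank/documents; do
    rsync -a --delete "$SRC" "/mnt/$DATASET/"
done

# Start a scrub if the marker file is missing or older than 27 days.
if [ -z "$(find "$MARKER" -mtime -27 2>/dev/null)" ]; then
    zpool scrub "$POOL"
    zpool wait -t scrub "$POOL"      # wait for the scrub to finish
    touch "$MARKER"
fi

zpool export "$POOL"                 # unmount / disconnect the pool
```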

When the script is combined with cron in the TrueNAS advanced settings, it sends a notification with the important info.

Another quick question, if it isn’t a problem.

What’s up with data fragmentation in the ZFS filesystem? I noticed that TrueNAS doesn’t include a disk defragmentation tool.

Why is that? Is it because ZFS works differently than NTFS and doesn’t leave empty blocks behind when deleting data, which would cause fragmentation?

Fragmentation is a “legacy” issue with NTFS and FAT. Modern *nix filesystems, including ZFS, are smarter about the way they write and delete data.[1]

A defrag tool for ZFS would violate the “write once” policy for data blocks, and it wouldn’t even be useful.

The more free space a pool has, the less likely that your writes will be saved in a heavily fragmented manner.

“Pool fragmentation” reported by ZFS is not the same as fragmentation reported on NTFS. It’s based on the size of “contiguous free space chunks”, and it uses its own metrics to determine what constitutes “fragmented free space”.
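You can check that metric yourself from the CLI, e.g. (the pool name here is a placeholder):

```sh
# FRAG reports free-space fragmentation, not file fragmentation.
# "tank" is a placeholder pool name.
zpool list -o name,size,alloc,free,frag,cap tank
```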


  1. This is even less relevant for SSDs, which don’t require seeking with mechanical arms.


Thanks a lot. I’ve been reading in the meantime and understood this.

Just one more detail… I’ve set the system up now with two 18TB Toshibas and am doing file transfers and testing. I’ve set up both a gigabit connection to the router and a 2.5GbE wired connection. Works amazingly well.

One last thing to be cleared up.

If I want to be able to do an incremental replication task, where the differences from source to destination are quickly synchronized, do I absolutely depend on the snapshot pipeline?

For example: the data on source and destination is identical, but the destination doesn’t have any snapshots while the source has snapshot 1. Is there no way to synchronize them without rewriting the destination data, because there are no matching snapshots to compare?

This means I always need a situation where both source and destination contain the matching “snapshot 1”. I then take another recursive snapshot 2 on the source, and the sync task hooks into the naming schema and syncs the differences?

So when I am manually cleaning up snapshots, I can delete all of them except the last one on both drives, because if I delete the last one on either drive, I have destroyed the pipeline. That means I haven’t lost data on either drive, but I have lost the ability to do an incremental sync without rewriting everything.

Am I correct?

Not sure how the destination dataset would end up with no snapshots, unless you intentionally remove them.

Correct.


Why prune all but the latest? Unless you’re deleting a lot of large files, your old snapshots won’t be retaining much space that would otherwise be freed. Remember, if you only ever write new files, all your snapshots combined will consume no additional space on the pool.

Correct.

You can use the “Hold” feature to protect the latest snapshot on either side, which will prevent automatic or inadvertent deletion.
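For the CLI-curious, the pipeline you described looks roughly like this (dataset names are placeholders; a TrueNAS replication task does the equivalent under the hood):

```sh
# Placeholders: tank/data is the source, backup/data the destination.

# Initial full replication -- establishes the common "snap1" on both sides:
zfs snapshot -r tank/data@snap1
zfs send -R tank/data@snap1 | zfs recv backup/data

# Incremental: only the blocks changed since snap1 get sent.
zfs snapshot -r tank/data@snap2
zfs send -R -i tank/data@snap1 tank/data@snap2 | zfs recv -F backup/data

# Hold the latest common snapshot so it can't be deleted on either side
# (release it with "zfs release" once the next one has replicated):
zfs hold keepme tank/data@snap2
zfs hold keepme backup/data@snap2
```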

I intentionally removed them while testing stuff out. I was thinking: let’s see if I can clean all the snapshots out and then create a new one on the main drive, to see if the sync task will recognize the differences.

I soon realized that I’m an idiot and will have to re-copy the backup again now.

So: manual snapshots with a specific naming schema, a hold on the last snapshot, and never delete it until the next one has been created and the backup is done.

So far everything seems to be running great. I’m just having a dilemma over whether I should keep my cold backups 3 and 4 (2x 3TB drives) in NTFS and occasionally clone to them using SyncBack Pro, or convert them to ZFS too. My problem is that I don’t fully trust this system yet because I don’t understand it that well, and I know I can always restore from the NTFS drives and I know what I’m doing with them.

The main backup system with the 2x 18TB Toshibas is, however, running great.

If you’re happy with your main backup (ZFS), then it doesn’t hurt to have the secondary and tertiary backups as NTFS or a non-ZFS Linux filesystem that you can do traditional file-based syncs to. Just be aware of permissions that don’t translate from ZFS → NTFS.

Diversify.

Correct. This is what it’s going to be. Thanks.

One other nice thing about using rsync for remote backups is that once you have done an initial backup of your source to a specific destination, on subsequent backups you only transfer the data which has changed on the local source since the last backup occurred. That makes the source system, network, and target system loads significantly lighter.

Furthermore, you can specify files or directories to exclude which you don’t wish to back up (e.g., log files, temp files, etc.).
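A typical invocation looks something like this (host, paths, and exclude patterns are just examples):

```sh
# -a preserves permissions and timestamps, -v lists transfers,
# -z compresses in transit, --delete mirrors deletions on the target,
# --exclude skips what you don't want backed up.
rsync -avz --delete \
    --exclude='*.log' \
    --exclude='tmp/' \
    /mnt/tank/data/ backupbox:/mnt/backup/data/
```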

All the best,

Bill


Another real quick one.

For Time Machine-compatible SMB shares, my best option is to create a “Time Machine” sub-dataset under the main dataset, which will have an SMB share of the time-machine type, correct?

The other option would be making my main dataset the “time-machine” type, which doesn’t seem like a good idea to me?

I don’t believe that’s even possible? Aren’t there safeguards in recent versions of TrueNAS that prevent the user from changing the root dataset options and permissions?

I’m on Core, so I can’t test it out.

Well, I haven’t tested it, to be honest. But I went with creating a sub-dataset within my main dataset instead of creating a second dataset on the main drive.

It works fine, and the snapshot of the main dataset also backs up the Time Machine sub-dataset to the backup drive, so I think I’m good. Hope I didn’t do anything dangerous.
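In case it helps anyone, I believe that behavior comes from my snapshots being recursive; a quick sketch with placeholder names:

```sh
# "-r" snapshots the dataset and every child, so a hypothetical
# tank/main/timemachine is captured along with tank/main:
zfs snapshot -r tank/main@backup-2024-01-01
zfs list -t snapshot -r tank/main   # shows @backup-2024-01-01 on both
```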

I’ll quibble a little here.

The rsync I use for backups here has to exhaustively traverse every directory in search of changed files. That can take a long time, even if little needs to be updated.

By contrast, a replication captures all changes by keeping track of everything that changed since the last snapshot: Permissions, deletions, additions, modifications, you name it. It is the most efficient way to transmit changes from one file system to another on an incremental basis that I know of.

Compared to rsync, replications positively fly re: speed for incremental backups, especially for large data sets with little data turnover. And by limiting network traffic to just the changes since the last replication, said traffic is also minimized compared to rsync traversals.

The same amount of data may need to be exchanged in both cases, but rsync has to traverse the entire metadata of the source and receiver to compare and decide which files need to be transferred. In total, that’s way more network traffic than a replication would have needed.
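You can even preview how little an incremental replication needs to send with a dry run (dataset and snapshot names are placeholders):

```sh
# -n = dry run (send nothing), -v = print the estimated stream size
# for the incremental from snap1 to snap2:
zfs send -nv -i tank/data@snap1 tank/data@snap2
```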
