Fragmentation is a “legacy” issue with NTFS and FAT. Modern *nix filesystems, including ZFS, are smarter about the way they write and delete data.[1]
A defrag tool for ZFS would violate the “write once” policy for data blocks, and it wouldn’t even be useful.
The more free space a pool has, the less likely it is that your writes will be saved in a heavily fragmented manner.
“Pool fragmentation” reported by ZFS is not the same as fragmentation reported on NTFS. It’s based on the size of contiguous free-space chunks, and it uses its own metrics to determine what constitutes “fragmented free space”.
This is even less relevant for SSDs, which don’t require seeking with mechanical arms.
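You can see that metric for yourself: `zpool list` exposes free-space fragmentation as the FRAG column. A quick look (run on your own system; no pool names are assumed here):

```shell
# Show per-pool free-space fragmentation (FRAG) and capacity (CAP).
# FRAG describes how chopped-up the pool's *free* space is, as a
# percentage -- it says nothing about fragmentation of your files.
zpool list -o name,size,alloc,free,frag,cap
```

A high FRAG on a nearly full pool mainly means new writes have to be scattered into small free gaps; it is not something a defrag tool could or should “fix”.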
Thanks a lot. I’ve been reading in the meantime and understood this.
Just one more detail… I’ve set the system up now with two 18TB Toshibas and am doing file transfers and testing. I’ve set up both a gigabit connection to the router and a 2.5GbE wired connection. Works amazingly.
One last thing to be cleared up.
If I want to be able to do an incremental replication task, where the differences from source to destination are just quickly synchronized, do I absolutely depend on the snapshot pipeline?
For example, if the data on source and destination is identical, but the destination doesn’t have any snapshots while the source has snapshot 1, there is no way to synchronize them without rewriting the destination data, because there is no matching snapshot to compare?
This means that I always have to have a situation where both source and destination contain the matching “snapshot 1”. I then take another recursive snapshot 2 on the source, and the sync task hooks up to the naming schema and syncs the differences?
So when I am manually cleaning up snapshots, I must take care that, if I delete them, I keep the last one on both of the drives. If I delete the last common snapshot on either drive, I have destroyed the pipeline. That means I haven’t lost data on either drive, but I have lost the ability to do an incremental sync without rewriting everything.
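That mental model matches what replication does under the hood. A minimal manual sketch of the pipeline, with hypothetical pool, dataset, and snapshot names (a TrueNAS replication task automates exactly this):

```shell
# Both sides already hold the common snapshot @snap1.
# 1. Take a new recursive snapshot on the source:
zfs snapshot -r tank/data@snap2

# 2. Send only the blocks that changed between snap1 and snap2.
#    The destination must still have @snap1 for this to succeed;
#    -F rolls the destination back to that common snapshot first.
zfs send -R -i tank/data@snap1 tank/data@snap2 | zfs recv -F backup/data

# If no common snapshot exists on both sides, only a full send is
# possible, i.e. everything gets rewritten:
#   zfs send -R tank/data@snap2 | zfs recv -F backup/data
```

The incremental stream contains just the changed blocks plus metadata, which is why step 2 is so much cheaper than re-copying.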
Not sure how the destination dataset would end up with no snapshots, unless you intentionally remove them.
Correct.
Why prune all but the latest? Unless youâre deleting a lot of large files, your old snapshots wonât be retaining much space that would otherwise be freed. Remember, if you only ever write new files, all your snapshots combined will consume no additional space on the pool.
Correct.
You can use the “Hold” feature to protect the latest snapshot on either side, which will prevent automatic or inadvertent deletion.
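From the CLI, that feature looks like this (the hold tag and snapshot names below are hypothetical):

```shell
# Place a hold tagged "keep" on the latest common snapshot.
# A held snapshot cannot be destroyed until every hold is released.
zfs hold keep tank/data@snap2

# Inspect the holds on a snapshot:
zfs holds tank/data@snap2

# Once the next replicated snapshot exists on both sides, release it:
zfs release keep tank/data@snap2
```

In the TrueNAS UI the same thing is a checkbox on the snapshot, but the effect is identical: `zfs destroy` on a held snapshot fails.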
I intentionally removed them while testing stuff out. I was thinking, let’s see if I can clean all the snapshots out and then create a new one on the main drive, to see if the sync task will recognize the differences.
I soon realized that I’m an idiot and will have to re-copy the backup again now.
So: manual snapshots with a specific naming schema, a Hold on the last snapshot, and never delete it until the next one has been created and the backup has completed.
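That workflow can be sketched in a couple of lines. The naming schema below is purely an example (sortable timestamps make it easy for a replication task to find the latest common snapshot by name):

```shell
# Hypothetical schema: manual-YYYY-MM-DD_HH-MM
snap_name="manual-$(date +%Y-%m-%d_%H-%M)"
echo "$snap_name"

# On the live system you would then (pool/dataset names hypothetical):
#   zfs snapshot -r tank/data@"$snap_name"
#   zfs hold keep tank/data@"$snap_name"
# ...and only release the hold on the previous snapshot after the
# replication of this one has succeeded on both sides.
```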
So far everything seems to be running great. I’m just having a dilemma over whether I should keep my cold backups 3 and 4 (2x 3TB drives) in NTFS and occasionally clone to them using SyncBack Pro, or whether I should convert them to ZFS too. My problem is that I don’t fully trust this system yet because I don’t understand it that well, whereas I know I can always restore from the NTFS drives and I know what I’m doing with them.
The main backup system with the 2x 18TB Toshibas is, however, running great.
If you’re happy with your main backup (ZFS), then it doesn’t hurt to have the secondary and tertiary backups as NTFS or a non-ZFS Linux filesystem that you can do traditional file-based syncs to. Just be aware of permissions that don’t translate from ZFS to NTFS.
One other nice thing about using rsync for remote backups is that once you have done an initial backup of your source to a specific destination, on subsequent backups you only transfer the data which has changed on the local source since the last backup occurred. That makes the source system, network, and target system loads significantly lighter.
Furthermore, you can specify particular files or directories to exclude which you don’t wish to back up (e.g. log files, temp files, etc.).
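A small local demonstration of both points, using throwaway directories (for a real remote backup the destination would be something like `user@host:/path/`, and the exclude patterns are just examples):

```shell
# Set up a hypothetical source with one data file and one log file.
src=$(mktemp -d); dst=$(mktemp -d)
echo "keep me" > "$src/data.txt"
echo "skip me" > "$src/debug.log"

# First run copies everything except the excluded patterns;
# --delete mirrors removals, -a preserves permissions and times.
rsync -a --delete --exclude='*.log' "$src"/ "$dst"/

# A second identical run transfers almost nothing: rsync compares
# size and mtime and skips files that haven't changed.
rsync -a --delete --exclude='*.log' "$src"/ "$dst"/
```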
For Time Machine-compatible SMB shares, my best option is to create a “Time Machine” sub-dataset under the main dataset, which will have an SMB share of the Time Machine type, correct?
The other option would be making my main dataset the Time Machine type, which doesn’t seem like a good idea to me?
I don’t believe that’s even possible? Aren’t there safeguards in recent versions of TrueNAS that prevent the user from changing the root dataset options and permissions?
Well, I haven’t tested it, to be honest. But I went with creating a sub-dataset within my main dataset instead of creating a second dataset on the main drive.
It works fine, and the snapshot of the main dataset also backs up the Time Machine sub-dataset to the backup drive, so I think I’m good. Hope I didn’t do anything dangerous.
The rsync I use for backups here has to exhaustively traverse every directory in search of changed files. That can take a long time, even if little needs to be updated.
By contrast, a replication captures all changes by keeping track of everything that changed since the last snapshot: Permissions, deletions, additions, modifications, you name it. It is the most efficient way to transmit changes from one file system to another on an incremental basis that I know of.
Compared to rsync, replications positively fly re: speed for incremental backups, especially for large data sets with little data turnover. And by limiting network traffic to just the changes since the last replication, said traffic is also minimized compared to rsync traversals.
The same amount of data may need to be exchanged in both cases, but rsync has to traverse the entire metadata of the source and receiver to compare and decide which files need to get transferred. In total, that’s way more network traffic than a replication would have needed.