Concept Validation - XFS ZVOL Mounted in Jail vs ZFS Dataset Passthrough for Veeam Direct Attached Storage Backup Repository with Synthetic Full Backups (FastClone)

digilur · September 16, 2024, 7:01pm

I’m putting together a Veeam Backup and Recovery solution to back up some VMs and computers, and I would like to use my TrueNAS Core system as the repository. The underlying storage is 8x 8TB 7200 RPM HDDs in a pool of 4x 2-way mirror vdevs, so about 32 TB after mirroring, or roughly 29 TiB usable. That pool also acts as a fileserver, with a handful of datasets exposed as Samba shares. I’m the only user of these shares. I also have a second pool consisting of a single 14TB backup drive that I attach via eSATA, to which I push periodic snapshots of my important datasets.
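For anyone checking the capacity figure, here is the quick arithmetic as a rough sketch. It assumes the drives are sold in decimal terabytes and ignores ZFS metadata and reserved-space overhead, so the space actually reported by the pool will be somewhat lower.

```python
# Rough usable capacity of a pool built from 2-way mirror vdevs.
# Assumes decimal-TB drives; ignores ZFS metadata and reserved space.
DRIVE_TB = 8        # size of each drive in decimal terabytes
DRIVES = 8          # total drives in the pool
MIRROR_WIDTH = 2    # drives per mirror vdev


def usable_bytes(drives: int, drive_tb: int, mirror_width: int) -> int:
    """Each mirror vdev contributes the capacity of one drive."""
    vdevs = drives // mirror_width
    return vdevs * drive_tb * 10**12


raw = usable_bytes(DRIVES, DRIVE_TB, MIRROR_WIDTH)
usable_tb = raw / 10**12    # decimal terabytes
usable_tib = raw / 2**40    # binary tebibytes, closer to what ZFS tools report
print(f"{usable_tb:.0f} TB = {usable_tib:.1f} TiB")
```

The 32 TB vs. 29 TiB gap is just the decimal/binary unit difference, not lost space.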

Ideally, I’d like the image backups I create with Veeam to be immutable, incremental backups with synthetic fulls. I figure reflinked incremental backups will minimize the delta between my snapshots, so this should be the most efficient way to fold my image backups into my existing approach to file backups. What I’m wondering is whether the best way to reach that goal is to create a zvol on the pool that also stores my file share datasets, format it as XFS with reflink enabled, mount it, and expose it to Veeam as direct attached storage. From my research, it looks like there are a few caveats with this approach (Reddit - Dive into anything).
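To make the snapshot-delta reasoning concrete, here is a toy model of what I have in mind. The block counts are made-up illustrative numbers, not measurements, and the model ignores the small amount of metadata that a cloned file still has to write.

```python
# Toy model: new blocks landing on disk between two snapshots when a
# synthetic full is built by physically copying vs. by reflinking.
# All numbers are hypothetical; clone metadata overhead is ignored.

def new_blocks_written(full_blocks: int, changed_blocks: int, reflink: bool) -> int:
    """Blocks newly allocated by one incremental plus one synthetic full."""
    incremental = changed_blocks        # the incremental always writes its changes
    if reflink:
        synthetic_full = 0              # full only references blocks already on disk
    else:
        synthetic_full = full_blocks    # full is a physical copy of every block
    return incremental + synthetic_full


FULL = 1_000_000    # hypothetical blocks in one full backup
CHANGED = 20_000    # hypothetical blocks changed since the last backup

copy_delta = new_blocks_written(FULL, CHANGED, reflink=False)
clone_delta = new_blocks_written(FULL, CHANGED, reflink=True)
print(copy_delta, clone_delta)
```

In this simplified picture, the data a snapshot has to capture (and that I later push to the backup drive) shrinks from roughly a full backup's worth to just the changed blocks.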

With reflink/block cloning now in OpenZFS (apparently stable since 2.2.3) and experimental Veeam support for synthetic fulls using Fast Clone (OpenZFS 2.2 support for reflinks now available - R&D Forums), perhaps I’m better off with one of these instead: a jail that exposes a mounted zvol formatted for ZFS, a dataset passed through to a jail, or the TrueNAS host itself providing the dataset to Veeam directly. I have it in my head that I should at least have a jail for some isolation, so that only the one dataset is exposed and I can’t clobber my other datasets by accident. I assume there’s no performance penalty for passing a dataset through like that? I also assume there’s a performance gain from using a dataset instead of a zvol, because the backups are file-based operations anyway and we would be removing a layer of block device management. I also wouldn’t have to worry about whether snapshots capture a consistent XFS filesystem. Maybe native ZFS datasets offer other benefits as well?
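Before relying on the native dataset route, I would want to confirm that the pool even has the block cloning feature turned on. Below is a minimal sketch of that check. It assumes the `zpool` binary is on the PATH and uses `tank` as a placeholder pool name. On an OpenZFS version that predates the feature, the command should fail and the script raises an error. Some OpenZFS releases also gate cloning behind a runtime tunable, which this sketch does not check.

```python
import subprocess

VALID_STATES = {"disabled", "enabled", "active"}


def parse_feature_state(output: str) -> str:
    """Validate the single value printed by `zpool get -H -o value ...`."""
    state = output.strip()
    if state not in VALID_STATES:
        raise ValueError(f"unexpected feature state: {state!r}")
    return state


def cloning_possible(state: str) -> bool:
    """'enabled' means usable; 'active' means cloned blocks already exist."""
    return state in {"enabled", "active"}


def block_cloning_state(pool: str) -> str:
    """Ask zpool for the state of the block_cloning pool feature."""
    result = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "feature@block_cloning", pool],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if zpool reports an error
    )
    return parse_feature_state(result.stdout)


if __name__ == "__main__":
    pool_name = "tank"  # placeholder; substitute the real pool name
    state = block_cloning_state(pool_name)
    print(f"{pool_name}: block_cloning is {state}, usable={cloning_possible(state)}")
```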

When it comes to ZFS, I’m in the “know just enough to be dangerous” category, so I’d appreciate it if anyone with more experience could recommend the best approach and explain why. Thanks!