Unexpected Replication Task Behaviors

I’m running TrueNAS Scale ElectricEel-24.10.2.1 and am trying to understand some unexpected data replication behaviors.

Background

I have multiple disks that I rotate in and out of my system as part of my backup strategy. Every day, a snapshot is taken of each of my datasets, and that data is backed up to these disks via a Replication Task.

Yesterday, I rotated one of the disks and ran the Replication Task. TrueNAS threw an error when replicating one of the datasets in the Task:

[EFAULT] No incremental base dataset ‘[REDACTED]’ and replication from scratch is not allowed

Unless I’m misunderstanding something, this isn’t surprising: it had been a long time since I’d run the Replication Task against this particular backup disk, and a lot of data and snapshots have come and gone since its last backup. I enabled “Replication from scratch” and ran the Replication Task again, and it completed successfully.
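As I understand it, the incremental path needs at least one snapshot that still exists on both the source and the backup disk to serve as the base; if retention has already pruned everything the disk last received, only a full send is possible. A quick way to check for a common base (the dataset names below are placeholders):

```python
# Sketch: find the newest snapshot that exists on both sides.
# "tank/photos" and "backup/photos" are placeholder dataset names.
import subprocess

def snapshots(dataset: str) -> list[str]:
    """Return snapshot names for a dataset, oldest first, via `zfs list`."""
    out = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-s", "creation", dataset],
        capture_output=True, text=True, check=True,
    ).stdout
    # Drop the "pool/dataset@" prefix so names are comparable across pools.
    return [line.split("@", 1)[1] for line in out.splitlines() if line]

src = snapshots("tank/photos")          # source dataset
dst = set(snapshots("backup/photos"))   # dataset on the rotated backup disk

common = [name for name in src if name in dst]
if common:
    print("incremental base:", common[-1])  # newest shared snapshot
else:
    print("no common snapshot -> only a full (from-scratch) send is possible")
```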

Issue 1

With “Replication from scratch” enabled, one undesirable thing I noticed was that while the task was replicating my largest dataset, TrueNAS appeared to copy all of the data/snapshots to the destination first and only delete the old data/snapshots afterward. I can see that being helpful if the task fails or is interrupted, but one side effect is that it almost maxed out my disk (~92% full). The task did delete the old data afterward, which brought storage utilization back down to 63%. Is this expected behavior?
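If this copy-before-delete ordering is the expected behavior, a quick free-space check on the backup pool before a from-scratch run seems prudent. A minimal sketch, assuming the source dataset’s “used” figure roughly matches what will be written (pool and dataset names are placeholders):

```python
# Sketch: check that the backup pool has room for a from-scratch copy
# *before* the old copy gets deleted. Names are placeholders.
import subprocess

def prop_bytes(target: str, prop: str) -> int:
    """Read a single ZFS property as a raw byte count (-p = parseable)."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-p", "-o", "value", prop, target],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out)

needed = prop_bytes("tank/photos", "used")   # source dataset incl. snapshots/children
free = prop_bytes("backup", "available")     # free space on the backup pool
print(f"need ~{needed / 2**40:.2f} TiB, have {free / 2**40:.2f} TiB free")
```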

Issue 2

Since a full run with “Replication from scratch” had completed, everything should now be in sync, so I disabled “Replication from scratch”. The Replication Task is running again now, but I’m seeing something that doesn’t make sense to me.

The Task is replicating a dataset that is 5.81TiB in size*. These are images and videos from my NVR. Every day, the NVR adds today’s recordings and deletes the oldest day’s recordings; a day of recordings is ~700GB of data. Since the task runs every day and only ~700GB of new data has been written to the dataset since the last completed run, I would expect the Replication Task to write roughly 700GB.

However, TrueNAS indicates this Task is replicating 4.71TiB of data for this NVR dataset. I don’t understand how this is possible. What would cause TrueNAS to think the backup drive needs 4.71TiB written to it, with “Replication from scratch” disabled, when nowhere near that much data has changed or been added since the last Replication Task ran?
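One way to see what ZFS itself estimates for the increment, independent of what the UI reports, is a dry-run send between the newest snapshot the backup disk actually has and the newest snapshot on the source. A minimal sketch with placeholder snapshot names:

```python
# Sketch: ask ZFS how big the incremental stream would be, without sending
# anything. Snapshot names are placeholders; use the newest snapshot the
# backup disk actually has as the base.
import subprocess

base = "tank/nvr@auto-2025-01-01_00-00"  # last snapshot present on the backup disk
new = "tank/nvr@auto-2025-01-02_00-00"   # newest snapshot on the source

# -n = dry run, -v = print the size estimate, -i = incremental from base to new
result = subprocess.run(
    ["zfs", "send", "-n", "-v", "-i", base, new],
    capture_output=True, text=True, check=True,
)
print(result.stderr or result.stdout)  # zfs prints the estimate to stderr
```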

I also note that the overall amount of data being written by the Replication Task is 8.54TiB, which doesn’t make sense: that leaves 3.83TiB (8.54 - 4.71) for the other datasets. Including the NVR, less than 1TiB of data changed between these two Replication Task runs, so TrueNAS is writing more than 8x the amount of data it actually needs to.

It seems to me that TrueNAS is replicating a lot of this data from scratch even though I’ve disabled “Replication from scratch”.

*This is the figure reported in the Datasets tab in the TrueNAS web UI. Am I correct in assuming this is the size of the data + the size of all snapshots for this dataset?
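I’m not certain which property the Datasets tab reports, but ZFS itself can break a dataset’s space usage down into live data, snapshots, and child datasets, which should make that assumption easy to check. Sketch, with a placeholder dataset name:

```python
# Sketch: break a dataset's "used" space down into live data, snapshots,
# and child datasets. "tank/nvr" is a placeholder name.
import subprocess

props = "name,used,usedbydataset,usedbysnapshots,usedbychildren"
print(subprocess.run(
    ["zfs", "list", "-o", props, "tank/nvr"],
    capture_output=True, text=True, check=True,
).stdout)
```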

Disregard. There were several issues and misunderstandings here that contributed to some or all of this.

I might open a separate topic for this if I can reproduce it, but I have twice seen a Replication Task issue where, with multiple source datasets (e.g. a/x, a/y, a/z) going to a single destination, the task usually maps them correctly on the destination but will sometimes nest them instead (e.g. the destination already has a/x, a/y, and a/z, but the task starts copying data to a/a/x, a/a/y, a/a/z). Let me know if anyone has an explanation for that. Last time I encountered it, I deleted the nested data on the destination and recreated the replication task (with the same settings) to work around it.

Also, it’s annoying that there is no way I can find to stop a running replication task from the GUI. In my case, I could see it replicating terabytes of data incorrectly (the nesting issue described above) and had no way to stop it other than restarting the system.

How did you configure the Replication Task?

The GUI changes what actually happens under the hood when multiple source (root) datasets are selected.

It might not seem like it, but multiple independent zfs send/recv operations are issued, even though the GUI presents it as a “single” task.
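Roughly speaking (placeholder names, and not the exact commands TrueNAS issues under the hood), selecting several roots behaves more like one pipeline per source dataset than one combined transfer:

```python
# Rough sketch of the point above: one independent send/recv pipeline per
# source dataset. Names are placeholders and these are not the exact
# commands TrueNAS runs.
import subprocess

pairs = [
    ("tank/a/x@auto-2025-01-02", "backup/a/x"),
    ("tank/a/y@auto-2025-01-02", "backup/a/y"),
    ("tank/a/z@auto-2025-01-02", "backup/a/z"),
]

for src_snap, dst_dataset in pairs:
    send = subprocess.Popen(["zfs", "send", src_snap], stdout=subprocess.PIPE)
    subprocess.run(["zfs", "recv", dst_dataset], stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()
    # Because each pair is handled independently, a wrongly computed
    # destination path (e.g. backup/a/a/x) only affects that one dataset.
```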