How to properly read the Replication Task dialog?

emsicz · April 10, 2024, 11:32pm

I have a main TrueNAS box running SCALE, which holds the data and takes snapshots every day. I have a secondary TrueNAS box running CORE, which runs a replication task that pulls data from the main NAS weekly. When the replication task runs on the secondary NAS (CORE), this is the dialog that is shown:

[Image: 20240411 - TRUENAS - Task Dialog]

1. It says "sending", but it should be a pull task. So what is being sent?
2. "1 of 15": does this refer to the number of snapshots being received?
3. "76.45 GiB / 2.31 TiB": what do these numbers represent?
4. "total 4.12 TiB of 4.12 TiB": what do these numbers represent?

Thanks.

dan · April 10, 2024, 11:35pm

Data is still being sent from one machine to the other, regardless of which machine initiated the connection.

I believe "1 of 15" does refer to the number of snapshots.

I’ll have to defer on the other questions.

winnielinnie · April 11, 2024, 2:52am

I believe #3 is for the current snapshot’s progress.^[1]

I believe #4 is for the totality of the replication task. I’m guessing that if this is the first time the replication has run, it’s telling you that the entire task is going to send 4.12 TiB out of a total used space of 4.12 TiB (represented by all snapshots).^[2]

So next time the task is run, you might see something like:
[total 20 GiB of 4.14 TiB]^[3]

[1] I’m probably wrong
[2] I’m probably wrong
[3] I’m probably wrong

emsicz · April 11, 2024, 8:35am

I might not understand how snapshots work, but this replication task has been running for about 6 months now, and this dialog has never made sense to me when it runs. Since the last replication, the source data has changed only slightly; we’re talking maybe 100 GiB added. Hence it is confusing to me why 2.31 TiB is being displayed and/or transferred.

And even if that were true, no. 3 would make no sense next to no. 4: if I’m 76 GiB in out of 2.31 TiB, then why does the total for the replication task show that it’s basically done, at 4.12 out of 4.12?

chuck32 · April 11, 2024, 8:48am

Run zfs list -t snapshot for the dataset to confirm the snapshot size, or check it in the GUI.
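To make the suggestion above easier to act on, here is a minimal Python sketch that parses snapshot listings into per-snapshot sizes. It assumes the listing was produced with `zfs list -Hp -t snapshot -o name,used,referenced` (tab-separated, no header, exact byte counts). The dataset and snapshot names in `SAMPLE` are made up for illustration and are not from this thread. Keep in mind that a snapshot's "used" value is the space unique to that snapshot, which is not necessarily what a replication would transfer.

```python
# Hypothetical sample of `zfs list -Hp -t snapshot -o name,used,referenced`
# output. Names and sizes are invented for illustration only.
SAMPLE = (
    "tank/data@auto-2024-04-10_00-00\t1073741824\t2199023255552\n"
    "tank/data@auto-2024-04-11_00-00\t0\t2199023255552\n"
)


def parse_snapshot_list(text):
    """Turn tab-separated `zfs list -Hp` lines into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, used, referenced = line.split("\t")
        dataset, _, snapshot = name.partition("@")
        rows.append(
            {
                "dataset": dataset,
                "snapshot": snapshot,
                "used": int(used),              # bytes unique to this snapshot
                "referenced": int(referenced),  # bytes reachable from it
            }
        )
    return rows


def human(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(num_bytes)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


if __name__ == "__main__":
    for row in parse_snapshot_list(SAMPLE):
        print(row["snapshot"], human(row["used"]), human(row["referenced"]))
```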

Stux · April 11, 2024, 9:35am

The total increases as the process progresses.

Basically, I think the replication task doesn’t look ahead. It only tells you what it knows, i.e. what it’s doing and what it’s done so far.

emsicz · April 11, 2024, 10:38am

Here is the output of zfs list -t snapshot.

[Image: 20240411 - TRUENAS - Snapshots]

This raises the question: since I have quite a short retention policy for snapshots on the source NAS, is it possible that the target NAS has to pull the entire dataset because there is no continuation from where the target NAS last synced? The screenshot shows that the source NAS keeps 2 weeks’ worth of snapshots, but the last sync was maybe a month ago. Would that explain why all of the data needs to be transferred?

Furthermore, here is what the dialog looks like today, after a full night of syncing:

[Image: 20240411 - TRUENAS - Task Dialog 2]

I think the numbers are starting to make more sense. The 1.86 TiB / 2.31 TiB means that it somehow figured out it needs to transfer 2.31 TiB and is doing so, regardless of what actually changed since the last sync. I still don’t understand the total figure [5.9 TiB of 5.9 TiB], though. And the 2.31 TiB figure is neither the size of the snapshot nor the size of the referenced data, so I also don’t get how it arrived at that number.
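One way to test the earlier suggestion that the total simply grows as the task progresses is to compare the two screenshots numerically. The Python sketch below uses only the figures quoted in this thread and assumes that both screenshots show the same snapshot in progress and that the dialog rounds its values. It is a plausibility check, not a description of how TrueNAS computes the figures.

```python
GIB_PER_TIB = 1024

# Figures as quoted from the two dialog screenshots in this thread.
first = {"current_sent_tib": 76.45 / GIB_PER_TIB, "total_tib": 4.12}
second = {"current_sent_tib": 1.86, "total_tib": 5.9}

# How much more of the current snapshot was sent between the screenshots.
delta_current = second["current_sent_tib"] - first["current_sent_tib"]

# How much the "total" figure grew over the same period.
delta_total = second["total_tib"] - first["total_tib"]

# If the total is a running counter, the part of it that does not come
# from the current snapshot should be the same in both screenshots.
baseline_first = first["total_tib"] - first["current_sent_tib"]
baseline_second = second["total_tib"] - second["current_sent_tib"]

if __name__ == "__main__":
    print(f"current snapshot grew by {delta_current:.2f} TiB")
    print(f"total grew by            {delta_total:.2f} TiB")
    print(f"implied baseline: {baseline_first:.2f} vs {baseline_second:.2f} TiB")
```

Both deltas come out at roughly 1.78 to 1.79 TiB, and the implied baseline is about 4.04 TiB in both screenshots. That is consistent with the total being a running count of data handled so far rather than a forecast, although the rounding in the dialog (5.9 has only one decimal place) leaves some slack.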

Davvo · April 11, 2024, 2:21pm

… if incremental replication is enabled.

Constantin · April 11, 2024, 8:13pm

As I understand it, the incremental snapshots being sent basically consist of all the changes, in one shape or form, bundled into one compact beastie. That in turn eliminates all the issues associated with rsync (having to exhaustively traverse every directory structure in search of changes), and it also allows the system to roll back changes on a snapshot-by-snapshot basis, which is really cool.

It is also the reason why this form of updating is so efficient. It is truly a joy to watch two machines sitting next to each other exchanging snapshots over 10GbE at transfer speeds you never see under normal use.

That said, this dialog box has room for improvement, for the simple reason that it is unintuitive enough that most of us have to guess at what the NAS is telling us. Even consumer OSes typically do a decent job of letting the user know what percentage of a transfer has completed and what the estimated time remaining is.

emsicz · April 12, 2024, 11:54am

So would I be correct in assuming that, because the time between replications was longer than the snapshot retention period on the source NAS, all data up to the oldest snapshot still available on the source NAS must be transferred, compared, and stored?
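The scenario in this question can be sketched in Python. A ZFS incremental send needs a base snapshot that exists on both sides, so the sketch looks for the newest snapshot name present on both source and target and falls back to a full transfer when there is none. The function names and snapshot names are hypothetical, and the sketch does not model what TrueNAS itself does in the no-common-snapshot case, which may depend on the task's settings.

```python
def latest_common_snapshot(source_snaps, target_snaps):
    """Return the newest source snapshot that also exists on the target.

    source_snaps must be ordered oldest to newest. Returns None when the
    two sides share no snapshot at all.
    """
    on_target = set(target_snaps)
    for name in reversed(source_snaps):
        if name in on_target:
            return name
    return None


def plan_replication(source_snaps, target_snaps):
    """Decide between an incremental and a full transfer (illustrative)."""
    base = latest_common_snapshot(source_snaps, target_snaps)
    if base is None:
        # No shared base: an incremental stream cannot be built.
        return {"mode": "full", "base": None, "to_send": list(source_snaps)}
    start = source_snaps.index(base) + 1
    return {"mode": "incremental", "base": base, "to_send": source_snaps[start:]}


if __name__ == "__main__":
    # Hypothetical names: the source kept only recent snapshots, while the
    # target's newest snapshot is older than anything left on the source.
    source = ["daily-04-08", "daily-04-09", "daily-04-10"]
    stale_target = ["daily-03-10"]
    fresh_target = ["daily-04-08", "daily-04-09"]

    print(plan_replication(source, stale_target)["mode"])
    print(plan_replication(source, fresh_target)["mode"])
```

With the stale target the plan falls back to a full transfer, and with the fresh target only the snapshots after the shared base are sent. Keeping source snapshots for longer than the interval between replication runs is one way to make sure a shared base is still around.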