Should I switch to rsync?

I have three SCALE servers going: my main server (A), a backup server (B) next to it, and a third server (C) on another continent. Server A replicates about 6 TB of datasets to B and C. The location with A and B has 10 Mbps upload, so replication to C from scratch would take 100 days or so; C has symmetric 1 Gbps. The files do not change much over time, and replication usually takes less than a minute or so.

I’ve now physically relocated to C for the next year, and it would be more convenient to work off of C, but any changes I make to a C dataset are not going to transfer to A or B in my current setup. I’m wondering if I should change the connection between A and C to rsync (leaving A to continue replicating to B for backup), or change the direction of replication from A→C to C→A. But I don’t want to risk a full filesystem rebuild by tinkering with something that works really well.

What I have now:

```mermaid
graph LR;
    A--> B & C;
```

Change to

```mermaid
graph LR;
  C-->A;
  A-->B;
```

or

```mermaid
graph LR;
C<-->A;
A-->B;
```

Suggestions?

Rsync is great at replicating data in general, but it is not nearly as efficient as ZFS’s block-based snapshot replication (zfs send). I was amazed at how much faster zfs send is than rsync, especially since rsync has to traverse every directory and every file individually.
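To make that concrete, here is roughly what the two approaches look like on the command line. This is only a sketch; the pool/dataset names, snapshot names, and the `backuphost` SSH target are made up.

```sh
# Incremental ZFS replication: only the blocks that changed between the two
# snapshots are sent, with no per-file scanning on either side.
zfs snapshot tank/data@auto-today
zfs send -i tank/data@auto-yesterday tank/data@auto-today | \
    ssh backuphost zfs receive backup/data

# rsync equivalent: walks every directory and stats every file on both ends
# before it even starts deciding what to transfer.
rsync -a --delete /mnt/tank/data/ backuphost:/mnt/backup/data/
```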

I wonder if there is a way to bless the C dataset to make it the master. Maybe the gods of zfs wandering these halls like @kris know of an approach to do that?

Yes, by changing the direction of replication I meant:

  1. Doing one final replication of A to C, then disabling that replication.
  2. Snapshotting the datasets on C, then creating a C → A replication task.
  3. Letting A continue to replicate to B.

But I don’t want to risk having #2 turn into a full replication. In that direction it might only take a week, but it would mean the A → B replication would also become a full replication, and during that time I would not have any redundancy.

It is possible to reverse a replication if you are super careful and ensure you do no writes beyond the last snapshot that was replicated and is common between the hosts. But it’s not done very often.
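Roughly, the underlying idea looks like this (pool, snapshot, and host names are all placeholders, and the GUI replication tasks would normally handle the mechanics for you):

```sh
# On C: confirm the last snapshot received from A is still present.
zfs list -t snapshot -o name -s creation tank/data

# On C: create a new snapshot on top of that common one...
zfs snapshot -r tank/data@reversed-1

# ...and send only the difference since the common snapshot back to A.
# Note: -F on the receive rolls A back to the common snapshot, discarding
# anything written on A after it.
zfs send -R -I tank/data@last-from-A tank/data@reversed-1 | \
    ssh serverA zfs receive -F tank/data
```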

Personally, I treat ZFS replication as one-way:

A → B → C

However, if you only need file-level replication and want the flexibility to reverse the stream arbitrarily, you can use rsync, though it is heavy because it has to re-scan everything each time. Alternatively, I’d recommend Syncthing, which is rsync-like but can do three-way syncs (or more) without needing to re-scan every file on disk each time. It monitors for changes and only sends the blocks which differ.

Syncthing:

    A
   / \
  B - C

I’d probably reverse the replication: your offsite becomes the primary… and the current primary and backup become the offsite and offsite backup…

Right?

Yeah, this is equal parts doable and massively painful to get right (thanks zfs send/recv CLI /s). It shouldn’t be dangerous, though, unless you use the option that rolls back to the last snapshot.

Another way to understand the “rules” of ZFS replication:

  1. If a dataset receives snapshots from another source dataset, then it can never create its own snapshots, and…
  2. A dataset cannot receive from multiple source datasets

Both conditions must be true. Otherwise, you break the replication process.

If both conditions are true, you can have any number or series of datasets in your backup/replication plan.
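As a command-line sketch of a chain that satisfies both rules (all pool, dataset, and host names here are placeholders; TrueNAS snapshot and replication tasks do the equivalent for you):

```sh
# Snapshots are only ever created on the original source dataset.
zfs snapshot tank/data@auto-2024-01-02

# Hop 1: source -> office backup, incremental from the previous snapshot.
zfs send -i tank/data@auto-2024-01-01 tank/data@auto-2024-01-02 | \
    ssh office zfs receive backup/data

# Hop 2: office backup -> remote backup, forwarding the same snapshots.
ssh office "zfs send -i backup/data@auto-2024-01-01 backup/data@auto-2024-01-02" | \
    ssh remote zfs receive backup/data

# Optional safety net on each receiving box: keep the received dataset
# read-only so nothing can be written to it between replications.
zfs set readonly=on backup/data
```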


EDIT: This is one of those things that makes more sense with a chart or diagram.

I wonder if Discourse has a plugin to enable simple flowcharts in a forum post? :grin:

Apparently so, but let’s try to contain the feature requests before you burn out iX’s IT department.


But it would be so useful in the context of a tech forum, especially one that involves NAS, file transfers, backups, storage, snapshots, etc.

I’ll be ~~harassing~~ suggesting this feature request in the other thread.


So in my scenario, if a data set has in the past received snapshots from another source dataset, but no longer does, can it thereafter create its own snapshots? Is “never” here “never” or “never while it receives”?

Never while it receives. The datasets are idempotent if and only if they’re on the exact same snapshot with no additional data.


It’s not so much new snapshots, but rather new blocks being written.

A destination NAS can make additional snapshots, but in order to replicate to a destination there needs to be a base snapshot in common, and the replication needs to reset/roll back the destination to that base before proceeding.

Thus, if you write to the destination after a snapshot is replicated, then to continue replicating you need to erase all the writes made since that common snapshot.

This is fine if you are going to reverse the replication. And possibly reverse it again in the future.
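In command-line terms, that “erase all the writes” step is what an explicit rollback, or a forced receive, does. A sketch with placeholder names; note that -F is destructive and throws away everything on the destination newer than the common snapshot:

```sh
# On the destination: roll the dataset back to the last snapshot it shares
# with the source, discarding any local writes made since then...
zfs rollback -r backup/data@common-snap

# ...or let the next incremental receive do the same thing implicitly
# with -F while it applies the stream from the source.
zfs send -i tank/data@common-snap tank/data@newer | \
    ssh dest zfs receive -F backup/data
```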

If I had a flowchart, I could explain it better. If only… :pleading_face:

You mean the flowchart I just enabled about 20 minutes ago?

```mermaid
graph LR;
    A--> B & C & D;
    B--> A & E;
    C--> A & E;
    D--> A & E;
    E--> B & C & D;
```

In the following examples, the cylinders represent datasets. (They do not represent pools or servers.)




:white_check_mark: This can work :point_down:
Because in the “chain of replications”, there is only one source dataset that has snapshots explicitly created on it. The other datasets (along the “chain”) only ever have snapshots “created” by receiving them from a replication. (No snapshots are explicitly created on the datasets themselves.)

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0

linkStyle 0,1,2,3,4 stroke:green
```




:white_check_mark: This can also work :point_down:
Because of the reasons explained above, and…
…even though there’s a “fork” in the chain of replication, these incoming snapshots all originate from the original source dataset. No datasets (other than the original source) have snapshots explicitly created on them.

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#CCFFCC,color:black
style ds3 fill:#CCFFCC,color:black
style ds4 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]
ds1 -->|replication| ds3[(Friend's\nHouse)]
ds0 -->|replication| ds4[(Backup\nIn-House)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0

linkStyle 0,1,2,3,4,5,6 stroke:green
```



:x: This will NOT work :point_down:
Because there is a dataset that tries to receive a replication from more than one source dataset. (It is the equivalent of receiving replications from elsewhere and writing new files on the dataset itself.)

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#FFCCCC,color:black
style ds3 fill:#CCFFCC,color:black
style ds4 fill:#CCFFCC,color:black
style ds5 fill:#CCFFFF,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black
style us4 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]
ds1 -->|replication| ds3[(Friend's\nHouse)]
ds0 -->|replication| ds4[(Backup\nIn-House)]
ds5[(Other\nDataset)] -->|replication| ds2

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0
us4(user or\n system) -..->|create snapshots| ds5

linkStyle 1,4 stroke:red,stroke-width:6px
linkStyle 0,2,3,5,6,7,8 stroke:green
```



:x: This will also NOT work :point_down:
Because there is a dataset that receives replications from elsewhere and also has its own snapshots explicitly created on it after new files were written to the dataset itself.

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#FFCCCC,color:black
style ds2 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black
style us4 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)]
ds0 -->|replication| ds2[(Backup\nIn-House)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0
us4(user or\n system) -..->|write files and\ncreate snapshots| ds1

linkStyle 1,2,3,4 stroke:green
linkStyle 0,5 stroke:red,stroke-width:6px
```




I’m still getting a feel for this “mermaid” plugin. It’s not as seamless as you’d think. :sweat_smile:

I don’t know why it needlessly “crops” the charts. It makes it more difficult to view. :face_with_diagonal_mouth:


Actually, I think you’ll find this works fine.

I.e., I have a backup pool which receives replicated datasets and periodically takes a 10-year retention snapshot. The source datasets do not have such a long retention.
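Roughly like this, on the backup system (dataset and snapshot names made up); the long-retention snapshot uses its own naming scheme so it never collides with the replicated ones:

```sh
# On the backup pool: snapshot the received dataset under a separate name.
zfs snapshot backup/data@longterm-2024

# Optionally add a hold so a pruning script can't destroy it by accident.
zfs hold longterm backup/data@longterm-2024
```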

So the “mermaid” plugin is a bit rough around the edges. Quoting my post yields a massive scramble of text. :laughing:

On the same dataset? If so, it means the incoming replication will either fail or roll back the target dataset, essentially destroying your “10-year retention, explicitly created” snapshots.

You can’t have it both ways on the same dataset. (Different datasets on the same pool is unrelated to the above situations.)

Incoming snapshots still have a common base, as the most recent snapshot on the destination is still present on the source system.

Even if there is an extra long-retention snap at the same point which is not present on the source.

The destination diverges if source-originated snapshots are deleted on the destination while still present on the source, OR if new blocks are written there; not because additional snapshots are taken on the destination.
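A quick way to check for that kind of divergence on the destination (a sketch; the dataset and snapshot names are placeholders):

```sh
# How much data has been written to the destination dataset since the
# snapshot it shares with the source? 0 means it has not diverged.
zfs get written@auto-2024-01-01 backup/data

# Or list exactly what changed between that snapshot and the live dataset.
zfs diff backup/data@auto-2024-01-01 backup/data
```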

You are right! I made an assumption in my example: I thought you meant that you wrote files to the dataset and then created these “10-year lifespan” snapshots.

To be clear: You’re not actually “using” or writing files to this dataset? Only creating the 10-year snapshots?