Should I switch to rsync?

I have three SCALE servers going: my main server (A), a backup server (B) next to it, and a third server (C) on another continent. Server A replicates about 6 TB of datasets to B and C. The location with A and B has 10 Mbps upload, so replication to C from scratch would take 100 days or so; C has symmetric 1 Gbps. The files do not change much over time, and replication usually takes less than a minute or so.

I’ve now physically relocated to C for the next year, and it would be more convenient to work off of C, but any changes I make to a C dataset are not going to transfer to A or B in my current setup. I’m wondering if I should change the connection between A and C to rsync (leaving A to continue replicating to B for backup), or change the direction of replication from A→C to C→A. But I don’t want to risk a full filesystem rebuild by tinkering with something that works really well.

What I have now:

```mermaid
graph LR;
    A--> B & C;
```

Change to

```mermaid
graph LR;
  C-->A;
  A-->B;
```

or

```mermaid
graph LR;
C<-->A;
A-->B;
```

Suggestions?

Rsync is great at replicating data in general, but it is not nearly as efficient as ZFS’s block-based snapshot replication (zfs send). I was amazed at how much faster zfs send is than rsync, especially since rsync has to traverse every directory and every file individually.
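To make that concrete, here is roughly what the two approaches look like on the command line. This is only a sketch; the pool/dataset names, snapshot names, and the `backuphost` SSH target are made up.

```sh
# Incremental ZFS replication: only the blocks that changed between the two
# snapshots are sent, with no per-file scanning on either side.
zfs snapshot tank/data@auto-today
zfs send -i tank/data@auto-yesterday tank/data@auto-today | \
    ssh backuphost zfs receive backup/data

# rsync equivalent: walks every directory and stats every file on both ends
# before it even starts deciding what to transfer.
rsync -a --delete /mnt/tank/data/ backuphost:/mnt/backup/data/
```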

I wonder if there is a way to bless the C dataset to make it the master. Maybe the gods of zfs wandering these halls like @kris know of an approach to do that?

Yes, by changing the direction of replication I meant:

  1. Doing one final replication of A to C, then disabling that replication.
  2. Snapshotting the datasets on C, then creating a C → A replication task.
  3. Letting A continue to replicate to B.

But I don’t want to risk having #2 turn into a full replication. In that direction it might only take a week, but it would mean the A → B replication would also become a full replication, and during that time I would not have any redundancy.

It is possible to reverse a replication if you are super careful and ensure you do no writes beyond the last snapshot that was replicated and is common between the hosts. But it’s not done very often.
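Roughly, the underlying idea looks like this (pool, snapshot, and host names are all placeholders, and the GUI replication tasks would normally handle the mechanics for you):

```sh
# On C: confirm the last snapshot received from A is still present.
zfs list -t snapshot -o name -s creation tank/data

# On C: create a new snapshot on top of that common one...
zfs snapshot -r tank/data@reversed-1

# ...and send only the difference since the common snapshot back to A.
# Note: -F on the receive rolls A back to the common snapshot, discarding
# anything written on A after it.
zfs send -R -I tank/data@last-from-A tank/data@reversed-1 | \
    ssh serverA zfs receive -F tank/data
```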

Personally, I treat ZFS replication as one-way:

A → B → C

However, if you only need file-level replication and want the flexibility to reverse the stream arbitrarily, you can use rsync, though it is heavy because it has to re-scan everything each time. Alternatively, I’d recommend Syncthing, which is rsync-like but can do three-way syncs (or more) without needing to re-scan every file on disk each time. It monitors for changes and only sends the blocks which differ.

Syncthing:

    A
   / \
  B - C

I’d probably reverse the replication: your offsite becomes the primary… and the current primary and backup become the offsite and offsite backup…

Right?

Yeah, this is equal parts doable and massively painful to get right (thanks zfs send/recv CLI /s). It shouldn’t be dangerous, though, unless you use the option that rolls back to the last snapshot.

Another way to understand the “rules” of ZFS replication:

  1. If a dataset receives snapshots from another source dataset, then it can never create its own snapshots, and…
  2. A dataset cannot receive from multiple source datasets

Both conditions must be true. Otherwise, you break the replication process.

If both conditions are true, you can have any number or series of datasets in your backup/replication plan.
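As a command-line sketch of a chain that satisfies both rules (all pool, dataset, and host names here are placeholders; TrueNAS snapshot and replication tasks do the equivalent for you):

```sh
# Snapshots are only ever created on the original source dataset.
zfs snapshot tank/data@auto-2024-01-02

# Hop 1: source -> office backup, incremental from the previous snapshot.
zfs send -i tank/data@auto-2024-01-01 tank/data@auto-2024-01-02 | \
    ssh office zfs receive backup/data

# Hop 2: office backup -> remote backup, forwarding the same snapshots.
ssh office "zfs send -i backup/data@auto-2024-01-01 backup/data@auto-2024-01-02" | \
    ssh remote zfs receive backup/data

# Optional safety net on each receiving box: keep the received dataset
# read-only so nothing can be written to it between replications.
zfs set readonly=on backup/data
```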


EDIT: This is one of those things that makes more sense with a chart or diagram.

I wonder if Discourse has a plugin to enable simple flowcharts in a forum post? :grin:

Apparently so, but let’s try to contain the feature requests before you burn out iX’s IT department.


But it would be so useful in the context of a tech forum, especially one that involves NAS, file transfers, backups, storage, snapshots, etc.

I’ll be ~~harassing~~ suggesting this feature request in the other thread.


So in my scenario, if a data set has in the past received snapshots from another source dataset, but no longer does, can it thereafter create its own snapshots? Is “never” here “never” or “never while it receives”?

Never while it receives. The datasets are idempotent if and only if they’re on the exact same snapshot with no additional data.


It’s not so much new snapshots, but rather new blocks being written.

A destination NAS can make additional snapshots, but in order to replicate to a destination there needs to be a base snapshot in common, and the replication needs to reset/roll back the destination to that base before proceeding.

Thus, if you write to the destination after a snapshot is replicated, then to continue replicating you need to erase all the writes made since that common snapshot.

This is fine if you are going to reverse the replication. And possibly reverse it again in the future.
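In command-line terms, that “erase all the writes” step is what an explicit rollback, or a forced receive, does. A sketch with placeholder names; note that -F is destructive and throws away everything on the destination newer than the common snapshot:

```sh
# On the destination: roll the dataset back to the last snapshot it shares
# with the source, discarding any local writes made since then...
zfs rollback -r backup/data@common-snap

# ...or let the next incremental receive do the same thing implicitly
# with -F while it applies the stream from the source.
zfs send -i tank/data@common-snap tank/data@newer | \
    ssh dest zfs receive -F backup/data
```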

If I had a flowchart, I could explain it better. If only… :pleading_face:

You mean the flowchart I just enabled about 20 minutes ago?

```mermaid
graph LR;
    A--> B & C & D;
    B--> A & E;
    C--> A & E;
    D--> A & E;
    E--> B & C & D;
```

In the following examples, the cylinders represent datasets. (They do not represent pools or servers.)




:white_check_mark: This can work :point_down:
Because in the “chain of replications”, there is only one source dataset that has snapshots explicitly created on it. The other datasets (along the “chain”) only ever have snapshots “created” by receiving them from a replication. (No snapshots are explicitly created on the datasets themselves.)

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0

linkStyle 0,1,2,3,4 stroke:green
```




:white_check_mark: This can also work :point_down:
Because of the reasons explained above, and…
…even though there’s a “fork” in the chain of replication, these incoming snapshots all originate from the original source dataset. No datasets (other than the original source) have snapshots explicitly created on them.

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#CCFFCC,color:black
style ds3 fill:#CCFFCC,color:black
style ds4 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]
ds1 -->|replication| ds3[(Friend's\nHouse)]
ds0 -->|replication| ds4[(Backup\nIn-House)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0

linkStyle 0,1,2,3,4,5,6 stroke:green
```



:x: This will NOT work :point_down:
Because there is a dataset that tries to receive a replication from more than one source dataset. (It is the equivalent of receiving replications from elsewhere and writing new files on the dataset itself.)

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#CCFFCC,color:black
style ds2 fill:#FFCCCC,color:black
style ds3 fill:#CCFFCC,color:black
style ds4 fill:#CCFFCC,color:black
style ds5 fill:#CCFFFF,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black
style us4 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)] -->|replication| ds2[(Backup\nRemote)]
ds1 -->|replication| ds3[(Friend's\nHouse)]
ds0 -->|replication| ds4[(Backup\nIn-House)]
ds5[(Other\nDataset)] -->|replication| ds2

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0
us4(user or\n system) -..->|create snapshots| ds5

linkStyle 1,4 stroke:red,stroke-width:6px
linkStyle 0,2,3,5,6,7,8 stroke:green
```



:x: This will also NOT work :point_down:
Because there is a dataset that receives replications from elsewhere and also has its own snapshots explicitly created on it after new files were written to the dataset itself.

```mermaid
flowchart LR

style ds0 fill:#CCFFCC,color:black
style ds1 fill:#FFCCCC,color:black
style ds2 fill:#CCFFCC,color:black
style us1 fill:white,color:black
style us2 fill:white,color:black
style us3 fill:white,color:black
style us4 fill:white,color:black

ds0[(Main\nDataset)] -->|replication| ds1[(Backup\nOffice)]
ds0 -->|replication| ds2[(Backup\nIn-House)]

us1(user or\n system) -..->|create snapshots| ds0
us2(user or\n system) -..->|create snapshots| ds0
us3(user or\n system) -..->|create snapshots| ds0
us4(user or\n system) -..->|write files and\ncreate snapshots| ds1

linkStyle 1,2,3,4 stroke:green
linkStyle 0,5 stroke:red,stroke-width:6px
```




I’m still getting a feel for this “mermaid” plugin. It’s not as seamless as you’d think. :sweat_smile:

I don’t know why it needlessly “crops” the charts. It makes it more difficult to view. :face_with_diagonal_mouth:


Actually, I think you’ll find this works fine.

I.e., I have a backup pool which receives replicated datasets and periodically takes a 10-year retention snapshot. The source datasets do not have such a long retention.
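Roughly like this, on the backup system (dataset and snapshot names made up); the long-retention snapshot uses its own naming scheme so it never collides with the replicated ones:

```sh
# On the backup pool: snapshot the received dataset under a separate name.
zfs snapshot backup/data@longterm-2024

# Optionally add a hold so a pruning script can't destroy it by accident.
zfs hold longterm backup/data@longterm-2024
```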

So the “mermaid” plugin is a bit rough around the edges. Quoting my post yields a massive scramble of text. :laughing:

On the same dataset? If so, it means the incoming replication will either fail or roll back the target dataset, essentially destroying your “10-year retention, explicitly created” snapshots.

You can’t have it both ways on the same dataset. (Different datasets on the same pool is unrelated to the above situations.)

Incoming snapshots still have a common base, as the most recent snapshot on the destination is still present on the source system.

Even if there is an extra long-retention snap at the same point which is not present on the source.

The destination diverges if source-originated snapshots are deleted on the destination while still present on the source, OR if new blocks are written there; not because additional snapshots are taken on the destination.
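A quick way to check for that kind of divergence on the destination (a sketch; the dataset and snapshot names are placeholders):

```sh
# How much data has been written to the destination dataset since the
# snapshot it shares with the source? 0 means it has not diverged.
zfs get written@auto-2024-01-01 backup/data

# Or list exactly what changed between that snapshot and the live dataset.
zfs diff backup/data@auto-2024-01-01 backup/data
```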

You are right! I made an assumption in my example: I thought you meant that you wrote files to the dataset and then created these “10-year lifespan” snapshots.

To be clear: You’re not actually “using” or writing files to this dataset? Only creating the 10-year snapshots?