Confused by a Sanoid Replication Problem

Hi there,

Here is my environment:

  • My backup server: TrueNAS SCALE, ElectricEel-24.10.1
  • My other servers: Debian 12 bookworm, Linux 6.1.0-29-amd64 with OpenZFS 2.2.7

I take snapshots on my Debian servers using Sanoid, a Perl-based tool developed by Jim Salter.

I’ve been having some issues lately. When I replicate the snapshots from my Debian server using TrueNAS SCALE’s Data Protection > Replication Tasks, the task sometimes fails because zettarepl decides that certain snapshots should be earlier than others when they are not. zfs list -t snapshot already gives the exact chronological order of the snapshots, so this failure should be avoidable.
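As a rough sketch of that idea (not how zettarepl actually works), the Python below orders each dataset's snapshots by their createtxg property, parsing the tab-separated output of zfs list -H -p -t snapshot -o name,createtxg. The function name snapshots_by_txg is hypothetical, and the sample input reuses the two TXG values that appear later in this thread.

```python
from collections import defaultdict


def snapshots_by_txg(zfs_list_output: str) -> dict:
    """Group snapshots per dataset, ordered oldest first by createtxg.

    Expects the output of (hypothetical usage):
        zfs list -H -p -t snapshot -o name,createtxg
    where -H gives tab-separated lines without a header.
    """
    per_dataset = defaultdict(list)
    for line in zfs_list_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, txg = line.split("\t")
        dataset, snap = name.split("@", 1)
        per_dataset[dataset].append((int(txg), snap))
    # Sorting the (txg, name) tuples puts the lowest TXG, i.e. the oldest snapshot, first.
    return {ds: [snap for _, snap in sorted(items)] for ds, items in per_dataset.items()}


# Sample input built from the createtxg values reported in this thread.
sample = (
    "zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly\t286216\n"
    "zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly\t286215\n"
)
print(snapshots_by_txg(sample))
```

Ordering by TXG sidesteps the question of what the names mean: two snapshots with identical timestamps in their names still get a strict order.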

This is the task log:

[2025/01/12 14:01:56] INFO [Thread-67] [zettarepl.paramiko.replication_task__task_25] Connected (version 2.0, client OpenSSH_9.2p1)
[2025/01/12 14:01:56] INFO [Thread-67] [zettarepl.paramiko.replication_task__task_25] Authentication (publickey) successful!
[2025/01/12 14:01:57] INFO [replication_task__task_25] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots: []
[2025/01/12 14:01:57] INFO [replication_task__task_25] [zettarepl.replication.run] For replication task 'task_25': doing pull from 'zroot' to 'DapuStor_R5100_RAID-Z1/machines/r740-debian' of snapshot='autosnap_2025-01-12_06:00:01_hourly' incremental_base='autosnap_2025-01-12_05:00:04_hourly' include_intermediate=False receive_resume_token=None encryption=False
[2025/01/12 14:01:57] INFO [replication_task__task_25] [zettarepl.paramiko.replication_task__task_25.sftp] [chan 5] Opened sftp connection (server version 3)
[2025/01/12 14:01:57] INFO [replication_task__task_25] [zettarepl.transport.ssh_netcat] Automatically chose connect address '10.2.1.33'
[2025/01/12 14:02:02] INFO [replication_task__task_25] [zettarepl.replication.run] For replication task 'task_25': doing pull from 'zroot' to 'DapuStor_R5100_RAID-Z1/machines/r740-debian' of snapshot='autosnap_2025-01-12_06:00:01_monthly' incremental_base='autosnap_2025-01-12_06:00:01_hourly' include_intermediate=False receive_resume_token=None encryption=False
[2025/01/12 14:02:02] INFO [replication_task__task_25] [zettarepl.transport.ssh_netcat] Automatically chose connect address '10.2.1.33'
[2025/01/12 14:02:05] ERROR [replication_task__task_25] [zettarepl.replication.run] For task 'task_25' unhandled replication error SshNetcatExecException(ExecException(1, 'WARNING: could not send zroot@autosnap_2025-01-12_06:00:01_monthly:\nincremental source (zroot@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it\nWARNING: could not send zroot/home@autosnap_2025-01-12_06:00:01_monthly:\nincremental source (zroot/home@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it\nWARNING: could not send zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly:\nincremental source (zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it\nWARNING: could not send zroot/ROOT/debian@autosnap_2025-01-12_06:00:01_monthly:\nincremental source (zroot/ROOT/debian@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it\nno error\n'), ExecException(1, 'cannot receive: failed to read from stream\n'))
Traceback (most recent call last):
... 29 more lines ...
zroot@autosnap_2025-01-12_06:00:01_hourly or delete snapshot
zroot@autosnap_2025-01-12_06:00:01_monthly from both the source and destination.
WARNING: could not send zroot/home@autosnap_2025-01-12_06:00:01_monthly:
incremental source (zroot/home@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it
WARNING: could not send zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly:
incremental source (zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it
WARNING: could not send zroot/ROOT/debian@autosnap_2025-01-12_06:00:01_monthly:
incremental source (zroot/ROOT/debian@autosnap_2025-01-12_06:00:01_hourly) is not earlier than it
no error
Active side: cannot receive: failed to read from stream

Obviously, the monthly snapshot is earlier than the hourly one, yet zettarepl used the hourly snapshot as the incremental source for the monthly one. That is definitely wrong!

cat /etc/sanoid/sanoid.conf

# My sanoid config slice
[zroot]
	recursive=zfs
	hourly=48
	daily=14
	autosnap=yes
	autoprune=yes

Is there a way to adjust zettarepl’s sending policy?


Here is another unpleasant experience I had with Sanoid and TrueNAS, for which I already have a solution:

  • Creating recursive snapshots with recursive = yes in the Sanoid config leaves TrueNAS unable to handle them. Use recursive = zfs in Sanoid’s config instead; it makes Sanoid take a single atomic zfs snapshot -r, so every child dataset gets a snapshot with the same name.

Now I have found an interesting way to solve it:

Switch the replication task from a naming schema to a naming regular expression:

autosnap_\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}_(hourly|daily|monthly)

It worked.
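A small Python sketch of which names that regular expression selects: one pattern covers the hourly, daily and monthly suffixes at once. Whether zettarepl anchors the match at both ends is an assumption here, so the sketch uses re.fullmatch; the helper name matching_snapshots and the "manual-before-upgrade" name are hypothetical. This only shows the selection step, not how zettarepl then orders the matches.

```python
import re

# The regular expression from the post.
SANOID_RE = re.compile(
    r"autosnap_\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}_(hourly|daily|monthly)"
)


def matching_snapshots(names):
    """Keep only snapshot names that match the whole pattern (fullmatch is an assumption)."""
    return [n for n in names if SANOID_RE.fullmatch(n)]


names = [
    "autosnap_2025-01-12_05:00:04_hourly",
    "autosnap_2025-01-12_06:00:01_monthly",
    "autosnap_2025-01-12_06:00:01_hourly",
    "manual-before-upgrade",  # hypothetical non-Sanoid snapshot
]
print(matching_snapshots(names))
```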

zettarepl (TrueNAS’ snapshot and replication tool) seems designed to be used in a very specific manner. In fact, I would go as far as to say that it’s really only meant to be used between TrueNAS servers, but it just happens to also “work” with other servers that run ZFS.

Unlike simple replication (which is what I do on the command line), zettarepl seems to use a “passing the baton” method: it issues multiple sends/receives that the user “sees” as a single replication run.[1]

Instead of this, as a single run:

  • send A → F | recv
    Everything in between (B, m1, C, m2, D, m3, E) is also sent over.

TrueNAS does this, as multiple “back to back” runs:

  • send A → B | recv
  • send B → C | recv
  • send C → D | recv
  • send D → E | recv
  • send E → F | recv
    The "mN" snapshots will not be included in the destination.
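The difference can be modelled in a few lines of Python. This is a simplified sketch, not zettarepl's code: it assumes A already exists on the destination, and the is_wanted filter is a hypothetical stand-in for a naming schema that skips the "mN" snapshots. A single zfs send -I A F would instead transfer the whole history.

```python
def baton_runs(snapshots, wanted):
    """Chain incremental sends between consecutive *wanted* snapshots only.

    Returns the (base, target) pairs that get sent, plus the snapshots
    present on the destination afterwards (assuming the first is already there).
    """
    chain = [s for s in snapshots if wanted(s)]
    pairs = list(zip(chain, chain[1:]))
    return pairs, chain


def is_wanted(snapshot):
    # Hypothetical filter standing in for a naming schema.
    return not snapshot.startswith("m")


# Source history from the example above, oldest first.
history = ["A", "B", "m1", "C", "m2", "D", "m3", "E", "F"]
pairs, received = baton_runs(history, is_wanted)
print(pairs)     # [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E'), ('E', 'F')]
print(received)  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Note that the pairs are only correct if the chain is in true chronological order; feed it a wrongly ordered list and one of the hops will ask ZFS for an incremental whose source is newer than its target.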

This is why a Replication Task requires you to specify a “Snapshot Task”, “naming schema”, or “pattern”; it is mandatory.

For vanilla ZFS command-line replications, you can arbitrarily choose a first and last snapshot to send incrementally, and whether you want to include all intermediary snapshots in between, regardless of their “names”.
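For comparison, a Python sketch that only builds (does not run) the command-line form: zfs send -I sends every snapshot between the two named ones, while zfs send -i sends just the difference between them. The helper name zfs_send_argv is hypothetical; its output would be piped into zfs receive on the destination.

```python
def zfs_send_argv(dataset, first, last, include_intermediate):
    """Build argv for an incremental zfs send from dataset@first to dataset@last.

    -I includes all intermediate snapshots; -i sends only first -> last.
    """
    flag = "-I" if include_intermediate else "-i"
    return ["zfs", "send", flag, f"{dataset}@{first}", f"{dataset}@{last}"]


print(" ".join(zfs_send_argv("zroot", "A", "F", include_intermediate=True)))
# zfs send -I zroot@A zroot@F
```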

zettarepl, however, really prefers if you treat “Periodic Snapshots” and “Replication Tasks” as an integrated solution. “You have these automatically created snapshots with a certain timestamped name on this pool? Then please use our replication tool to send these specific snapshots, and only these snapshots, to another destination.”


  1. For incremental replications.

I’m curious, do these snapshots share the same TXG?

zfs get createtxg zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly

zfs get createtxg zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly

Not quite: they were created in consecutive TXGs, with the monthly snapshot first.

zfs get createtxg zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly; zfs get createtxg zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly
NAME                                             PROPERTY   VALUE      SOURCE
zroot/ROOT@autosnap_2025-01-12_06:00:01_monthly  createtxg  286215     -
NAME                                            PROPERTY   VALUE      SOURCE
zroot/ROOT@autosnap_2025-01-12_06:00:01_hourly  createtxg  286216     -
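A simplified Python model of the check behind the "is not earlier than it" error, assuming ZFS compares createtxg values (consistent with the numbers above). The names valid_incremental and TXG are hypothetical; the values are the ones just shown.

```python
def valid_incremental(base_txg: int, target_txg: int) -> bool:
    """An incremental send needs a source strictly older (lower TXG) than its target."""
    return base_txg < target_txg


TXG = {
    "autosnap_2025-01-12_06:00:01_monthly": 286215,
    "autosnap_2025-01-12_06:00:01_hourly": 286216,
}

# What zettarepl attempted: hourly as the base, monthly as the target.
print(valid_incremental(TXG["autosnap_2025-01-12_06:00:01_hourly"],
                        TXG["autosnap_2025-01-12_06:00:01_monthly"]))  # False
# The direction the TXGs allow.
print(valid_incremental(TXG["autosnap_2025-01-12_06:00:01_monthly"],
                        TXG["autosnap_2025-01-12_06:00:01_hourly"]))  # True
```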

Thanks for this detailed explanation! It helps me understand the backup logic.

The “passing the baton” method is friendlier to users who want to send only a subset of the snapshots on the sending system. The problem is more likely in the implementation. If I create three naming schemas, such as

  • autosnap_%Y-%m-%d_%H:%M:%S_hourly,
  • autosnap_%Y-%m-%d_%H:%M:%S_daily,
  • autosnap_%Y-%m-%d_%H:%M:%S_monthly,

TrueNAS zettarepl should build an ordered list of the matching snapshots from the output of zfs list -t snapshot. Instead, it produces an incorrect order, or imposes unnecessary restrictions, because it relies on the literal snapshot names.
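A Python sketch of why name-based ordering is ambiguous here: parsed with the three schemas above, the hourly and monthly snapshots carry exactly the same timestamp. If ties were broken alphabetically (an assumption for illustration, not confirmed zettarepl behavior), hourly would sort before monthly, which matches the order in the log but is the reverse of their createtxg order. parse_name and by_name are hypothetical names.

```python
from datetime import datetime
from typing import Optional

# The three naming schemas from the post (strftime-style placeholders).
SCHEMAS = [
    "autosnap_%Y-%m-%d_%H:%M:%S_hourly",
    "autosnap_%Y-%m-%d_%H:%M:%S_daily",
    "autosnap_%Y-%m-%d_%H:%M:%S_monthly",
]


def parse_name(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a snapshot name, or None if no schema fits."""
    for schema in SCHEMAS:
        try:
            return datetime.strptime(name, schema)
        except ValueError:
            continue  # the suffix (or format) did not match this schema
    return None


hourly = "autosnap_2025-01-12_06:00:01_hourly"
monthly = "autosnap_2025-01-12_06:00:01_monthly"
print(parse_name(hourly) == parse_name(monthly))  # True: the names alone cannot order them

# Hypothetical tie-breaker: timestamp first, then the name itself.
by_name = sorted([monthly, hourly], key=lambda n: (parse_name(n), n))
print(by_name)  # hourly first, because "h" < "m"
```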

I believe the ordering algorithm could be simple. Either way, Sanoid, zettarepl, or both would need to change for them to work together seamlessly.


I wouldn’t say it’s a “bug” in either TrueNAS or Sanoid. They simply have different (incompatible) ways that they approach snapshotting and replication.


I believe it is my fault 🥲. I shouldn’t have titled it “[Bug] TrueNAS Refuse to Receive Snapshots created by Sanoid.” Maybe the word “bug” made Jim, the developer, uncomfortable, since this is an incompatibility rather than a bug.


Jim also says this:

This is true. Another property is the TXG, which follows the same chronological order as the creation date.

The “problem” is that zettarepl (TrueNAS) intentionally ignores any snapshot metadata.

zettarepl is strictly a name-based tool that requires “parseable” names in order to find and work with patterns.

iXsystems’ reasoning for this is that parsing snapshot names is much faster than reading and comparing snapshot metadata, especially if it’s a very long list of numerous snapshots.

I suppose an additional reason for this is that it is more intuitive (for the users) if a snapshot’s name contains a timestamp that coincides with its creation, and to just go ahead and use the pattern in the name for any scripting or background tasks. (Rather than rely on the “invisible” metadata that might not always be obvious or seen by the user.)
