Replication Error with recursive dataset

Hello

I have an issue with replication of a recursive dataset.

The problem happens every time in the same scenario; otherwise everything’s fine.

My setup is two TrueNAS Core servers, one replicating to the other each night.

I have a dataset named “EN COURS” containing multiple datasets (one level only), one for each project we work on.

All of this is snapshotted locally each morning at 4 am and replicated to the other server.

Yesterday we deleted some datasets as those projects were over, but “BUSSANG” cannot be deleted.
First of all it gives an error: “Error deleting dataset BUSSANG.

[EFAULT] Failed to delete dataset: cannot unmount ‘/mnt/RZ_11X3_1S/EN COURS/BUSSANG’: pool or dataset is busy

Some other datasets were deleted without issue yesterday, but this one is impossible to delete. Maybe some file inside is busy. It’s always a big amount of data, so we delete the dataset directly without emptying it, as deleting the files manually would take ages.

Meanwhile, all snapshots and shares seem to have been deleted: there is no snapshot left for this dataset, and the share has disappeared.

First: is there a way to force-delete a dataset? And what is this error about?

After that, when the replication of “EN COURS” starts with all its sub-datasets, it fails and nothing is replicated. The other datasets, which have no errors, are not replicated to the other server either, even though the local snapshots were taken correctly (so I have a snapshot for each dataset at “Feb 18 4am” locally, but on the replicated server they stop at “Feb 17 4am”).

Error in the notification tray:

CRITICAL

Replication “RZ_11X3_1S/EN COURS - RZ2_1X14_1S/MAGENTA_BACKUP/EN COURS” failed: No incremental base on dataset ‘RZ_11X3_1S/EN COURS/BUSSANG’ and replication from scratch is not allowed..

2026-02-18 11:14:13 (Europe/Paris)

If I try manually now:

Log:

[2026/02/18 11:14:09] INFO [Thread-1949] [zettarepl.paramiko.replication_task__task_2] Connected (version 2.0, client OpenSSH_8.8-hpn14v15)
[2026/02/18 11:14:09] INFO [Thread-1949] [zettarepl.paramiko.replication_task__task_2] Authentication (publickey) successful!
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.retention.calculate] Not destroying ‘auto-2025-05-14_04-00’ as it is the only snapshot left for naming schema ‘auto-%Y-%m-%d_%H-%M’
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.retention.calculate] Not destroying ‘auto-2025-05-14_04-00’ as it is the only snapshot left for naming schema ‘auto-%Y-%m-%d_%H-%M’
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.retention.calculate] Not destroying ‘auto-2025-05-14_04-00’ as it is the only snapshot left for naming schema ‘auto-%Y-%m-%d_%H-%M’
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.retention.calculate] Not destroying ‘auto-2025-05-14_04-00’ as it is the only snapshot left for naming schema ‘auto-%Y-%m-%d_%H-%M’
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.retention.calculate] Not destroying ‘auto-2025-05-14_04-00’ as it is the only snapshot left for naming schema ‘auto-%Y-%m-%d_%H-%M’
[2026/02/18 11:14:13] INFO [replication_task__task_2] [zettarepl.replication.pre_retention] Pre-retention destroying snapshots:
[2026/02/18 11:14:13] ERROR [replication_task__task_2] [zettarepl.replication.run] For task ‘task_2’ non-recoverable replication error NoIncrementalBaseReplicationError(“No incremental base on dataset ‘RZ_11X3_1S/EN COURS/BUSSANG’ and replication from scratch is not allowed”)

The problem now is that replication is broken. Last time this happened I had to modify the replication task to allow a backup from scratch, but that is hundreds of terabytes in one run…

Does that mean that as long as we have this undeletable dataset around, replication is broken?

Is there a way to force-delete a dataset?

Thanks

Bonjour Nicolas,

Your replication seems to fail because it only sees one snapshot, and I would think this particular snapshot doesn’t exist on the remote system. As such, it is trying to perform a full replication of the dataset, but it is not allowed to do that. By the way, for incremental replication to take place you need at least two snapshots, one of which is present on both the source and the remote side.

".. and replication from scratch is not allowed”
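For illustration, here is roughly how such an incremental replication maps onto plain `zfs send`/`zfs recv`. This is only a sketch: the snapshot names and the `backup` host alias are assumptions, and the command is printed rather than executed, since it would need the live pools.

```shell
# Dry-run sketch: the pipeline is echoed, not run. Snapshot names and
# the "backup" host alias are hypothetical.
SRC='RZ_11X3_1S/EN COURS/BUSSANG'
DST='RZ2_1X14_1S/MAGENTA_BACKUP/EN COURS/BUSSANG'
BASE='auto-2026-02-17_04-00'   # incremental base: must exist on BOTH sides
NEW='auto-2026-02-18_04-00'    # the new snapshot to replicate
# An incremental stream only carries the delta between BASE and NEW:
echo "zfs send -i '${SRC}@${BASE}' '${SRC}@${NEW}' | ssh backup zfs recv '${DST}'"
# If BASE was destroyed on the source, no -i stream can be built; the
# only option left is a full send, which the task forbids ("from scratch").
```

That is why deleting every snapshot of BUSSANG on the source left zettarepl with no common base and made the whole recursive task fail.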

I think the reason you are not able to destroy the dataset is that the replication task placed a flag (a snapshot hold) preventing deletion of snapshots from a particular point in time.

Because the issue is related to the replication task, can you edit the task itself and see if you can exclude the “EN COURS” dataset?

I am wondering if the replication didn’t have time to finish and you deleted the dataset while it was still running.

Hello

In fact, what I see is that all the snapshots have been deleted on the source, but not on the destination, of course (thankfully).

All the actual files of the BUSSANG dataset are still on the main server, though.

I will try to manually delete the files in the BUSSANG dataset with WinSCP to check whether I can then delete the dataset. Once the dataset is gone from the main server, any reference to BUSSANG should be removed from the replication task.

But not being able to force-delete a dataset is very painful.

Thanks

I will try to manually delete the files in the BUSSANG dataset with WinSCP to check whether I can then delete the dataset. Once the dataset is gone from the main server, any reference to BUSSANG should be removed from the replication task.

You don’t need to delete any files in a dataset to be able to delete it.

See if you have any “hold” on the snapshot.

https://docs.oracle.com/cd/E19253-01/819-5461/gjdfk/index.html
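From the CLI, holds can be checked per snapshot. A sketch with the dataset name from this thread (the commands are printed rather than executed, since they need the live pool; `<tag>` and `<snapshot>` are placeholders):

```shell
# Dry-run sketch: list all snapshots of the dataset, then query holds
# on each. Echoed, not executed, since it requires the live pool.
DS='RZ_11X3_1S/EN COURS/BUSSANG'
echo "zfs list -H -t snapshot -o name -r '${DS}' | xargs zfs holds"
# A non-empty result shows the tag keeping a snapshot alive (zettarepl
# sets one while replicating); it can be released with:
echo "zfs release <tag> '${DS}@<snapshot>'"
```

A snapshot with an active hold cannot be destroyed, which would in turn block destroying the dataset.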

Even if there are no snapshots on this server for this dataset?

(Storage/Snapshots shows no snapshots left for the BUSSANG dataset)

Thanks

Even if there are no snapshots on this server for this dataset?

(Storage/Snapshots shows no snapshots left for the BUSSANG dataset)

If there are no snapshots, then you can’t check for “holds”.

I don’t use the “replication” feature in TrueNAS Core, as I handle my replication with a script I created, which I run manually from the CLI.

I think we should focus on the reason why the dataset cannot be destroyed.

You mentioned this:

Meanwhile, all snapshots and shares seem to have been deleted: there is no snapshot left for this dataset, and the share has disappeared.

Can you elaborate on whether the shares disappeared as a result of manual action, or whether something made them disappear without reason?

What kind of share were you using: NFS, SMB…?

Are you using iocage jails or applications that required access to the dataset?

Are you using SSH to access the CLI on your main server? If so, is it possible your shell is sitting inside the BUSSANG folder?
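If it is an open file rather than a hold, FreeBSD (which TrueNAS Core runs on) can show which processes keep the mount point busy. A sketch using the path from the error message (printed, not executed, since it needs the live server):

```shell
# Dry-run sketch: on FreeBSD, fstat -f lists processes with files open
# on a given mount point. Echoed here rather than executed.
MNT='/mnt/RZ_11X3_1S/EN COURS/BUSSANG'
echo "fstat -f '${MNT}'"
# Even a shell simply cd'ed into the dataset counts as "busy" and is
# enough to make the unmount (and thus the GUI delete) fail.
```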

I’ve just tried again to delete the dataset (without manually deleting anything) and it worked fine this time…

Access to those datasets is done over SMB, so it might be that a file was “locked” by something for a certain time after being disconnected.
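The SMB side can be checked too: Samba keeps its own lock table, so a lingering client lock would show up there. A sketch (the command is printed, not executed, since it needs the live server):

```shell
# Dry-run sketch: smbstatus reports current Samba sessions and locks;
# a client connection that was never cleanly closed would appear here.
LOCK_CMD='smbstatus -L'   # -L: show file locks only
echo "$LOCK_CMD"
```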

I have a few other datasets to clean up, and then I’ll try the replication again.

What is great with recursive dataset replication is that once I’ve set up the task, everything is automatic when I add or remove datasets. I don’t have to reconfigure it for each new dataset.

Thanks

I did some testing on my end on my TrueNAS Core 13.3-U1.2:

  • Created some nested datasets.
  • Created a few recursive snapshots.
  • Enabled SMB on one of the datasets.
  • Mapped the SMB share in Windows 11 as a network drive.
  • Copied one file to it with Explorer, and kept its contents displayed in Explorer.

Then I deleted the dataset using the GUI. It took a bit of time but it was successful.

This is what TrueNAS did:

  • Under “Windows Shares (SMB)”, the share has been removed from the list.
  • The dataset has been deleted and no longer shows under the Pools list.
  • In Windows 11, File Explorer was still pointing to the share; trying to refresh its contents returned an error (expected behaviour). The share is no longer listed in Windows.

So, my take on this: TrueNAS is able to delete a dataset even when an SMB share is enabled on it. This is not blocking.

As you have stated, a file lock could have been the cause, though I suspect trying to delete a dataset while its snapshots are being deleted could also be a cause.

I let the snapshot and replication process run during the night; I’ll see if anything went wrong in the morning.

Yes, normally when you delete a dataset it deletes the snapshots and the related SMB share.

Our SMB clients are Macs, and I know that the Finder is a bit of a __ and creates a lot of problems. I spent so much time getting everything to work over SMB because of that. We’ve spent some time with the guys from iX to help achieve good performance.

I might upgrade to SCALE one day, but no time for now.

Thanks for your time

Hi

So it was the failed dataset that was causing the replication error. After removing it properly yesterday, the replication task went fine last night.

In fact, when you try to delete the dataset and it fails, all the snapshots are deleted (but not the actual data). So when the replication task runs, it should be able to create an incremental snapshot, but as there is no previous snapshot left it fails, and so the whole task fails.

The issue is that we should be able to force-delete a dataset even if it has locked files or whatever.
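For what it’s worth, the CLI does expose a forced destroy that the GUI does not. A sketch with the dataset name from this thread (printed rather than executed, since it is destructive and irreversible):

```shell
# Dry-run sketch: `zfs destroy -r -f` recurses into children and
# snapshots (-r) and force-unmounts a busy filesystem (-f). Echoed only:
DS='RZ_11X3_1S/EN COURS/BUSSANG'
echo "zfs destroy -r -f '${DS}'"
# This bypasses the "pool or dataset is busy" failure by forcing the
# unmount first; use with care, the data is gone for good.
```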

Thanks