[Moved to Roadmap] Ability to cancel in-progress replication via GUI

Problem/Justification

Sometimes it’s desirable to cancel/abort a replication that is currently in progress. AFAIK, there is no way to do this in the GUI.

Additionally, it can be quite difficult to do in the shell, requiring killing zettarepl/middlewared processes, potentially multiple times.

Today this happened to me again, just as when the original feature request/bug was reported.

Impact

The reason I wanted to cancel the replication was that the wizard mistakenly included many more datasets than selected when cloning a replication task (a separate issue). The once-off replication was started automatically and was going to replicate many TBs of data, when the intent was to replicate only GBs.

User Story

Ideally, there would be a way to cancel/abort a running replication, perhaps an :cross_mark: button in a suitable location.


Original feature request post:

Tagging some fans of the original request… who might want to vote for this one :wink:
@sfatula @awasb @Johnny_Fartpants @neofusion


Do you vote by liking the post?

Blue “Vote” box at the top of the thread. You have a limited number of votes (based on forum status) that you can retract at a later time, or that will be returned when a feature request is closed (whether as won’t-implement or will-implement), etc.


Got it thanks.

I am just sadly looking at a replication task I want to abort but can’t.
Well, I will try the shell or something.

I guess my own feature request is very similar. Every task/job should have an option to minimize and force-cancel.
We both want to cancel a running task. This thread is only about replication, while my thread is more general.

I agree that “ideally” it should be allowed.

The first step is to work out how it would be done from the CLI/API…

Is there an abort capability in zfs send?

How would the receiving system clean up the mess?
(it would need to be safe for all users)
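For what it’s worth, a hedged sketch of how an abort might look at the CLI/API level today, assuming the middleware tracks the replication run as a job (method names are from the TrueNAS middleware API and worth verifying on your release):

```shell
# Sketch only: abort a running replication via the middleware job system.
# Assumes the replication run shows up as a middleware job.

# List running jobs and note the "id" of the replication one
midclt call core.get_jobs '[["state", "=", "RUNNING"]]'

# Abort it by job id (123 is a placeholder)
midclt call core.job_abort 123
```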

Kill/interrupt the process, or even just the ssh (or similar transport) connection, and it stops just fine.

The trick is to know the process id.

Is there any cleanup needed for the data… can the same task be run again later?

Does the middleware know about the process ID?
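If it has to be done by hand today, a rough sketch of finding and interrupting the worker processes (the name patterns are guesses; inspect what is actually running on your box first):

```shell
# Rough sketch: locate the processes driving the transfer.
# Process name patterns are assumptions; verify with ps/pgrep first.
pgrep -af 'zettarepl|zfs send|zfs recv'

# Then terminate the transfer process by pid
# (SIGTERM first; escalate to SIGKILL only if it lingers)
kill <pid>
```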

I don’t believe any cleanup is required. It’s the same thing that happens when an internet link fails during a replication, or there is a sudden power loss on either end.

When this happens, the replication task errors out… and has a red error flag.

Re-running the task will continue from the snapshot it was transferring when it failed.


I do occasionally get this error albeit on TN CORE systems.

cannot receive resume stream: destination … contains partially-complete state
from “zfs receive -s”

The fix is to jump on the receiving system and fire off:

zfs recv -A pool/dataset

and that seems to fix things.
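For reference, a hedged sketch of that cleanup dance on the receiving side (dataset names are illustrative):

```shell
# On the receiving system, check whether a partially received stream is
# pending; a value other than "-" means resumable state exists.
zfs get -H -o value receive_resume_token pool/dataset

# Either resume the send from that token on the sending side, or discard
# the partial state so a normal receive can proceed:
zfs receive -A pool/dataset
```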

Perhaps this is where TrueNAS Connect could come into play having access to both systems.

You wouldn’t even need that. If you really wanted to gate killing a zettarepl job behind an arbitrary requirement, you could just expose the abort option only on jobs to a TrueNAS destination.

That would be absolutely useless for those of us in mixed environments. Personally, I have only one TrueNAS system among a small fleet of ZFS hosts, replicating between each other.

In terms of “keeping it safe for all users” (“safe” is doing a lot of heavy lifting there) — you might need to require some kind of permissions check for non-TrueNAS destinations. I’m going to experiment with this because I don’t know the answer off the top of my head: is it possible to do “zfs receive -A” if the job credential’s user on the target does not have ZFS destroy permission? That permission doesn’t only apply to “zfs destroy” literals but also to some other destructive subcommands within ZFS. Replications to targets that don’t give TrueNAS blanket sudo access could be impacted.
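A sketch of the experiment I have in mind (user and dataset names are illustrative; permission names should be checked against zfs-allow on the target’s ZFS version):

```shell
# On the (non-TrueNAS) target: delegate receive-side permissions
# WITHOUT the 'destroy' permission
zfs allow backupuser create,mount,receive tank/backups

# Interrupt a replication mid-stream, then as backupuser try to discard
# the partial state; the question is whether this fails without 'destroy':
zfs receive -A tank/backups/somedataset
```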

I feel like the days of default sudo access for replication are nearly over and using ZFS allow is a much more secure option. Perhaps a different feature request for another day.


Great news! This feature request has been transferred to our internal roadmap for detailed scoping and assessment.

What this means:

  • Our product and technical documentation team will now evaluate technical feasibility and resource requirements

  • This request is now closed for voting, and your votes have been released back to use on other requests

  • Please note that transfer to the roadmap doesn’t guarantee implementation; some features may not proceed based on our assessment findings. However, we will update this thread once a decision has been made.

Thank you to everyone who voted and contributed to the discussion. Your input has been invaluable in helping us understand the community’s needs and use cases for this feature.
