Anyone have experience with healing option of zfs send/receive

Kevo · October 8, 2024, 7:29pm

I have a singular backup drive attached to my Core 13.3 system. It is setup as a clone of the main array for backup purposes. The drive is over 6 years old and recently started producing some checksum errors during scrubs.

I am planning to replace it, but I wanted to see if I could repair the damaged files on it before removing it.

I have seen mentioned a relatively new feature that is supposed to heal damaged clones or snapshots on receive when a good source is available. Since my main pool is fine I though I would try it, but I could not find an example that worked for me.

If anyone has successfully used that option could you give me some pointers? Right now I have two pools, tank and backup. To clone tank to backup I just setup a replication task from tank to backup with the recursive option.

Ideally I’d like to figure out how to tell backup to receive any damaged files from tank to heal the replica, but my efforts so far have all resulted in error messages and I think I must have some fundamental misunderstanding about how the command should operate.

Stux · October 8, 2024, 9:54pm

I think at this stage, you would have to send the entire replication stream anyway as there is no way to request just the bad blocks.

Ergo, if it’s a backup, you may as well just redo the backup.

winnielinnie · October 8, 2024, 10:05pm

It might be possible with some sort of ZFS “devtools”. (Referencing the other thread.)

The new option you’re referring to is -c, but as @Stux said, there’s no feasible way (that we know of) to only curate the bad blocks from the sender’s side.

Kevo · October 8, 2024, 10:19pm

So as best I can tell, the errors are the result of one bad file which happens to exist in the full two weeks worth of snapshots I retain. Would it then be possible to send just the oldest snapshot, which presumably would have the actual file in it, and have the heal option correct that file which would also correct the subsequent snapshots as well since they should reference the same exact file?

winnielinnie · October 8, 2024, 10:35pm

The “corrective receive” isn’t very clear in the documentation (which can be said about many things related to ZFS).

I’m not even sure if invoking -c on the receiver’s side will prevent it from “rolling back” to the specified snapshot.

For example, if you force the destination to receive a replication stream (even incremental), it will destroy all newer snapshots. Will this be irrelevant if you invoke -c? Will it, in a sense, “insert” the stream of blocks, without affecting newer snapshots on the destination?

Who knows. They developers are vague, which is no surprise.

Super helpful, huh?

EDIT: You also have to love this part:

As if we keep a ledger of what files/blocks specifically exist between which snapshots. ^[1]

When the feature was first teased (before Open ZFS 2.2.0 was released), I thought it would be truly fleshed-out. I was under the assumption that it would know which blocks are corrupted, and only send the correct versions of those blocks from a known good source. (Sort of like how a device in a mirror vdev does that within the pool itself.)

But nope, they just started with a good idea and forgot to actually follow through with a practical usage.

Remember, it’s not good enough to know that a block exists in a snapshot. You need to know between which snapshots it was first created. Otherwise, you’ll have to send a full replication stream, instead of a small, incremental one. ↩︎

Kevo · October 8, 2024, 10:53pm

Well, after playing around with the command and trying to figure out what would actually execute, it seems you have to give it a snapshot that exactly matches the snapshot you are trying to repair. After checking it appears that the damaged file is older than all the snapshots as they all are showing almost no space used, so I’m not sure if this is going to do anything at all. I did go ahead and send a snapshot that shows up with the error in the pool status and am receiving it with the -c flag. It will be interesting to see what things look like when it’s done. Given it’s already been running for quite a few minutes I think it’s sending more than just the snapshot changes. Maybe it’s sending everything for that filesystem.

Here’s the command I am trying.

zfs send tank/tmbackup/mbp@auto-20241001.0000-2w | zfs recv -c backup/tmbackup/mbp@auto-20241001.0000-2w

winnielinnie · October 8, 2024, 11:25pm

You just issued a send for a full stream, not an incremental one.

To see how much is being sent for this stream:

zfs list -t snap -o name,refer tank/tmbackup/mbp@auto-20241001.0000-2w

The output will tell you how much data is referenced by that snapshot. So a full stream (of that snapshot), is how much data is being sent.

You can also invoke the -vn flags on the sender’s side (without a companion recv command), and it will tell you how large the sending stream would be. (It’s too late for that, since you already committed to an actual send.)

Kevo · October 9, 2024, 2:07am

Thanks for the hint on using -vn on the send only. I did try it on the recv after reading the docs. I think it would have only returned the info at the end of the dry run. I didn’t want to wait so I killed that and just went for it.

As for the results…

The errors are no longer showing in the status for the backup pool. So it looks like it worked. I am running a scrub now to see if any other errors turn up. If everything is good I’ll probably wait until the weekend to go ahead and replace this backup drive.

winnielinnie · October 9, 2024, 2:32am

Do the snapshots newer than auto-20241001 still exist on the destination?

Kevo · October 9, 2024, 3:29am

All the snapshots are still there. The only change I can see is it now reports no known data errors in status. I won’t know until late tomorrow if scrub returns clean so there could still be issues I don’t know about, but so far so good.

winnielinnie · October 9, 2024, 3:38am

Thank you for ~~serving as a guinea pig~~ giving this a try and sharing your experience.

It’s interesting to see this “working”.

Kevo · October 11, 2024, 5:37pm

Just to follow up with the end result. After finishing another scrub on the “healed” backup drive, I did get 2 checksum errors and status did report there were permanent errors in some files. However, there were no files listed and I could not find any evidence that there were any damaged files with any of the various things I tried to get more info on the reported error. So, I do believe the healing option worked and I think it’s a useful tool to have.

Since I was already pretty sure this drive was having issues and was planning to replace it anyway, I went ahead and replaced it. I plan on keeping it on the shelf for a week or two before I reformat it and throw it in another enclosure and run it through some testing. I might still use it for temporary transfer purposes on non-critical data if it’s not too bad off or if the problem turns out to be the enclosure and not the drive itself.