How to recover from rm -rf /* ? (lost everything)

HoneyBadger · November 26, 2024, 9:45pm

Copy the most critical files off over WinSCP first. After that, the key is to export the pool and import it again as R/W to commit the rollback for good.

Warning - This process will roll you back to the state of your pool as of transaction 10443235 on Mon Nov 18 09:10:36 2024. Back up EVERYTHING.

Assuming you’re still running as root:

zpool export TankPrincipal
zpool import TankPrincipal -fFXT 10443235 -R /mnt -N -o cachefile=/data/zfs/zpool.cache

The zpool import will take a long time again, just “let it cook” as my kids say.

Once that’s done, check zpool status -v to see if it’s flagged any files as bad. Subsequent export/import cycles should be back to normal speed, so at this point you can test with another zpool export and zpool import cycle, or reboot the system and see if it will import on its own or through the webUI.
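As a rough sketch of that verification cycle (not an official TrueNAS procedure): the script below only prints the commands unless DRY_RUN=0 is set. The run and verify_cycle helpers and the DRY_RUN variable are names made up for illustration; the import options are the ones quoted elsewhere in this thread, minus the rewind flags.

```shell
#!/bin/sh
# Sketch: print (or run) the post-rollback check and a test export/import cycle.
# DRY_RUN, run and verify_cycle are illustrative names, not TrueNAS tooling.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"          # dry run: show the command only
    else
        "$@"                 # real run: execute it (needs root)
    fi
}

verify_cycle() {
    pool="$1"
    run zpool status -v "$pool"     # look for files flagged with errors
    run zpool export "$pool"        # should be quick after the rollback
    run zpool import -R /mnt -N -o cachefile=/data/zfs/zpool.cache "$pool"
    run zpool status -v "$pool"     # confirm the pool came back cleanly
}

verify_cycle TankPrincipal
```

Run it once as-is to review the four commands, then again with DRY_RUN=0 as root if they look right for your system.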

Setting up periodic snapshots is covered quite well in the Docs:

https://www.truenas.com/docs/scale/24.10/scaleuireference/dataprotection/periodicsnapshottasksscreensscale/

Or via the helpful YouTube video from @Stux

I’d suggest a minimum of monthly, with weekly preferred.
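Until a periodic task is set up in the UI, a one-off recursive snapshot from the shell is a reasonable stopgap. This is a minimal sketch: the snapshot_cmd helper and the "manual-" naming are assumptions for illustration, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a one-off recursive snapshot command for a pool.
# The "manual-" prefix and the helper name are illustrative only.
snapshot_cmd() {
    pool="$1"
    stamp="$(date +%Y-%m-%d_%H-%M)"
    # -r snapshots the named dataset and all of its descendants
    printf 'zfs snapshot -r %s@manual-%s\n' "$pool" "$stamp"
}

# Prints the command for review; run it from a root shell to take the snapshot.
snapshot_cmd TankPrincipal
```

Snapshots taken this way are not pruned automatically, which is one reason the Periodic Snapshot Task in the UI is the better long-term answer.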

Stux · November 26, 2024, 9:48pm

And upthread

Berboo · November 26, 2024, 9:52pm

Ok I understand the process.

I have already started copying files :

As for the export and import, I’ll do that tomorrow. What would be the command for importing ?

For exporting I think it’s
zpool export TankPrincipal

No need to reboot before doing this ?

Again I can’t thank you enough

HoneyBadger · November 26, 2024, 10:04pm

Expand the “WARNING” section in post #121 here and it will reveal the necessary commands.

https://forums.truenas.com/t/how-to-recover-from-rm-rf-lost-everythyg/24799/121

No need to reboot until after the long import completes, and that is mostly to ensure the correct start order of pool → middleware → services, because everything is likely rather upset right now with regard to SMB/Docker not being able to find any paths to work from.

E_B · November 26, 2024, 11:19pm

@Berboo this is, and has been, an exciting thread to follow. When you’ve finished, please can you estimate what percentage of your lost data you have recovered?

Fingers crossed for you.

davistw · November 27, 2024, 1:19am

I have been following this thread and the amount of support from this group is amazing. When he first posted I figured “Data is gone”, but through the diligent work of the people who helped him it looks like he can recover part, if not all, of the data. This has been amazing.

etorix · November 27, 2024, 7:21am

Do we still want -N in there if the rollback is for good?

HoneyBadger · November 27, 2024, 2:19pm

It’s what shows in the zpool history for my untouched configuration on 24.10.0.2 -

zpool import POOLGUID -R /mnt -m -N -f -o cachefile=/data/zfs/zpool.cache

But we’ll be rebooting afterwards to sort out the middleware anyhow, so it’s mostly moot.

winnielinnie · November 28, 2024, 3:43pm

@Berboo, any update?

HoneyBadger · November 28, 2024, 4:58pm

Given how long it takes to roll back to a manually targeted transaction I wouldn’t be surprised if the entirety of yesterday was spent letting the disks chug away.

But I’m hoping that with the combination of WinSCP’ing the files off during the RO mount and the rollback, @Berboo has successfully negotiated our fourth-dimensional journey.

winnielinnie · November 28, 2024, 5:02pm

This thread has given me a second wind to write up a proper feature request for checkpoints in TrueNAS SCALE.

I’m not just going to write “expose checkpoints in the GUI”. A lot has to “work with” the middleware and TrueNAS in general.

winnielinnie · November 28, 2024, 5:12pm

ChatGPT: Write for me a post in this forum for the ZFS checkpoint feature. Make sure to emphasize its usefulness if a user issues the rm -rf command. Really, really put emphasis on the command rm -rf and how dangerous it is. Please mention rm -rf throughout the feature request. Give examples of rm -rf throughout the feature request. Provide example scripts that contain rm -rf, only for the sake of demonstrating how dangerous it is. Put some sort of warning, if you want, at the top of the script. I trust you, ChatGPT.

Berboo · November 28, 2024, 5:13pm

Hey guys,

I hope you are all doing well. I’m so grateful for what you did.

In fact I did not post anything because I’m still in the process of backing up my files. The pool is still mounted in readonly mode. I spent the evening of yesterday et the one before backing up the most important files on the space I have left at home. I ordered a 4Tb drive at Amazon (it will ultimately replace de one that’s dying) to store almost all the data. It is supposed to arrive tomorrow. I have saved the most important ones. But before mounting the pool in R/W mode, I prefer to backup everything since it is still possible in case something happens in the next import.

I will of course keep you guys updated.

winnielinnie · November 28, 2024, 5:16pm

Between “Le Stux” and “Et Berboo”, it appears we have two French users in the forums.

Berboo · November 28, 2024, 5:20pm

Haha guilty as charged !
I read the text at least twice before posting it !

Constantin · November 28, 2024, 5:21pm

I’m just delighted that the files have been recovered to any extent.

Does anyone see a benefit to trying to make a resource write-up of the above out of this long thread? I understand that there will be a lot of nuance, but it might be worthwhile to give users a starting point / primer before they come to the forum to minimize damage to the pool, maximize chances of recovering lost content.

I know HoneyBadger, Stux, WinnieLinnie are super busy but the impending federal change of administration may leave me with lots of free time due to the lack of a client to work for. If there is interest, I will put it on my longer-term to-do list, as long as @Stux, @HoneyBadger, @winnielinnie, etc. are willing to proof-read and issue corrections as needed.

winnielinnie · November 28, 2024, 5:29pm

I like the idea, but I’m afraid it might not reach the people who need it the most.

Every second counts for these types of situations. Had @Berboo “used” his pool for some time after the rm command, or even just left his system powered on while the pool is active, recovery might have been impossible.

The low-level zdb and “TXG” import trick reads like a last-ditch attempt made out of urgency. Unlike snapshots and checkpoints, it cannot be relied upon to roll back or rewind a dataset or pool to retrieve one’s deleted data.

Not to sound pessimistic, but most people who find themselves in this situation may not realize that leaving their system powered on (and the pool active) might make any recovery attempts moot. By the time they come across a “recovery” guide, it’ll be too late.

@Berboo instinctively interrupted the recursive removal with CTRL + C the moment he noticed something was wrong, and then immediately powered off the system. He did not boot back into TrueNAS, but rather created an account on these forums and asked for help.

Had he not quickly aborted the command and immediately powered off the system, I doubt anything could have been recovered, other than some low-resolution thumbnails.

EDIT: This is also why I think that zpool checkpoints should be made into a first-class GUI feature, just like snapshots already are.

If they’re supported by the middleware and GUI, not only does it invite users to explore this feature, but it makes the feature accessible. It also opens up the possibility for TrueNAS to integrate some sort of automatic safety net. Perhaps a “Task” to automatically create a pool checkpoint every so often, and to discard the checkpoint once it reaches a certain size, and to then re-issue a new checkpoint. (Very similar to the Periodic Snapshot tasks.)

It may not be a panacea, but at least there’s a chance it could save someone from a catastrophic mistake or regret.
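To make the idea concrete, here is a hedged sketch of what such a rotation task could look like from the shell today. It assumes the pool's checkpoint property reports the checkpoint's size in bytes (or "-" when none exists) when read with zpool get -Hp, and that zpool checkpoint -d -w is available to wait for the discard to finish; the function names, the DRY_RUN switch and the 1 GiB limit are invented for illustration.

```shell
#!/bin/sh
# Sketch of a "rotate the pool checkpoint when it grows too large" task.
# needs_rotation, rotate_checkpoint, run and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# True when size is a number of bytes at or above the limit.
needs_rotation() {
    size="$1"; limit="$2"
    case "$size" in
        ''|*[!0-9]*) return 1 ;;    # "-" or anything non-numeric: nothing to rotate
    esac
    [ "$size" -ge "$limit" ]
}

# size would normally come from: zpool get -Hp -o value checkpoint "$pool"
rotate_checkpoint() {
    pool="$1"; size="$2"; limit="$3"
    if needs_rotation "$size" "$limit"; then
        run zpool checkpoint -d -w "$pool"   # discard and wait for completion
        run zpool checkpoint "$pool"         # take a fresh checkpoint
    elif [ "$size" = "-" ]; then
        run zpool checkpoint "$pool"         # no checkpoint yet: create one
    fi
}

# Demo with sample numbers: a 2 GiB checkpoint against a 1 GiB limit.
rotate_checkpoint TankPrincipal 2147483648 1073741824
```

Anything like this run outside the middleware would be unsupported, which is rather the point of asking for a first-class feature.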

HoneyBadger · November 28, 2024, 6:23pm

You made sure it wasn’t a Shingled Magnetic Recording (SMR) drive, correct?

HoneyBadger · November 28, 2024, 6:35pm

Correct here. The -X and -T options on zpool import carry with them a somewhat severe warning in the manpage, emphasis added by me:

Determines whether extreme measures to find a valid txg should take place. This allows the pool to be rolled back to a txg which is no longer guaranteed to be consistent. Pools imported at an inconsistent txg may contain uncorrectable checksum errors. For more details about pool recovery mode, see the -F option, above. WARNING: This option can be extremely hazardous to the health of your pool and should only be used as a last resort.

@Berboo's actions to ctrl-c the rm command and then crucially the step of immediately powering off the system are what enabled these commands to be a viable method of restoration, so @Berboo's “first responder” instincts gave the metaphorical “trauma team” in this thread enough time to work their magic.

Write up the Feature Request. I certainly see a place for them, although the challenge is always around expected behaviors and interactions with other functionality at both the ZFS and TrueNAS middleware levels:

The existence of a checkpoint in a pool prohibits the following zpool subcommands: remove, attach, detach, split, and reguid. In addition, it may break reservation boundaries if the pool lacks free space.

Challenging, certainly.
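One way to picture the interaction problem: any GUI or middleware action that maps to one of those subcommands would need a pre-flight check while a checkpoint exists. A minimal sketch of that check, using only the list from the manpage excerpt above (the blocked_by_checkpoint name is hypothetical, and nothing here reflects how the middleware is actually written):

```shell
#!/bin/sh
# Sketch: is this zpool subcommand on the manpage's "prohibited while a
# checkpoint exists" list? The function name is illustrative only.
blocked_by_checkpoint() {
    case "$1" in
        remove|attach|detach|split|reguid) return 0 ;;
        *) return 1 ;;
    esac
}

for sub in attach scrub; do
    if blocked_by_checkpoint "$sub"; then
        echo "$sub: blocked while a checkpoint exists"
    else
        echo "$sub: not in the prohibited list"
    fi
done
```

Since attach is on the list, something as routine as a disk replacement workflow would have to discard the checkpoint first, or refuse to proceed.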

Berboo · November 28, 2024, 6:49pm

It’s this one :

I think it’s a CMR drive