Something really weird/scary keeps happening to my system. I first saw it after updating to 25.10.0.1, so I reverted to 25.10.0, which didn’t fix it. I then rebuilt from scratch on 25.04.2.6, and it behaved fine while I configured everything.
At some point during the later stages of configuration, while rsync’ing data into a couple of folders to replace files that had disappeared from them, something deleted a load of files from /etc, causing PAM to error (→ no SSH, no web UI shell, all jobs failing). Rebooting threw up further errors from nginx (→ no web UI) and multiple other boot errors, all relating to missing configs in /etc.
I’ve just rolled back to an earlier snapshot of all the boot-pool datasets, rebooted and it’s back up and running.
I can pin down the most recent occurrence of the /etc problem to within about 10 minutes. There’s nothing obvious in syslog, just a raft of complaints from processes that can’t open a shell. /var/log/error matches that with a load of PAM errors. The journal shows nothing interesting.
With the files disappearing from other datasets, it’s a bit random: never folders, only lowest-level files, and not all files either; one folder still has an XLS but is missing a load of DOCs.
This is a real pain: a major issue, with real potential for data loss.
Does anyone have any ideas what could be causing this or where else to look for clues? Thanks in advance!
System: 10th-gen i5, 32GB RAM. Boot is an NVMe drive; the data volume that also has disappearing files is a mirror of two 2TB SATA SSDs.
This is utterly bizarre. I’ve just come back to the system after leaving it last night with about 700GB of data on my main mirror. There is now less than 200GB of data there. Whole folder trees are empty of files, with just the folder structure left. In the boot-pool, a load of files have gone missing from /etc again, and the PAM errors etc. are all back. Checking my other disks, I’m also missing a ton of files from the simple 1-disk pool; again, the folders are there, the files are not, and ‘du’ reports much less than ought to be there. That volume doesn’t even have snapshots scheduled yet; it’s the simplest pool possible.
Looking at the datasets, those where I took a manual snapshot after re-loading the data yesterday look like this: USED 5.4G, USEDSNAP 5.4G, USEDDS 8M, REFER 8M, WRITTEN 5.4G (they should have 5.4G of data). I’m not very knowledgeable about ZFS, but that looks to me like a massive deletion of files happened after the snapshot: nearly all the space is now held only by the snapshot, with just 8M of live data still referenced. /etc is similar, going from 7M of data on an early snap to 5M in snapshots + 2M of live data now. Restoring the snapshots recovers all the missing files from those datasets.
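For anyone wanting to check the same accounting on their own system, ZFS exposes those columns directly (the pool name ‘tank’ below is just a placeholder, not from my setup):

```shell
# Per-dataset space accounting. USED ≈ USEDDS (live data) + USEDSNAP
# (space held only by snapshots), so a dataset whose USEDSNAP suddenly
# balloons while USEDDS shrinks is consistent with a mass deletion of
# files after the last snapshot was taken.
zfs list -r -o name,used,usedbysnapshots,usedbydataset,referenced,written tank
```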
The only job scheduled to run overnight was an rsync out from an unaffected pool on a different disk. Scrubs are switched off. At first sight, nothing informative in the logs. I’ve downloaded the whole of /var/log to my PC for closer review. ‘zpool status -v’ shows no errors anywhere.
I’ve never heard of TrueNAS behaving like that, and I’m afraid I can’t offer much direct help here.
But maybe you could share some more details about your setup (is TrueNAS running on bare metal, etc.?).
With a bit more information, other users here might be able to piece together what’s going on.
OMG. I think I’ve found it, thanks to the job logs. A cron job that’s supposed to clear out old backups from a particular folder ran at about 23:45. Its log is full of errors about being unable to delete read-only files from /usr… which it shouldn’t be looking in at all!
Looks like its environment went gaga somehow and the variable holding the folder to delete from wasn’t defined. The effect of running ‘find -mtime +30 -exec rm -f {} \;’ without a path, I dread to think, but I can guess!! ‘Dear Linux, please remove all files over 30 days old from my entire system’?
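For what it’s worth, the usual belt-and-braces fix is to make the script refuse to run when the directory variable is empty, rather than letting ‘find’ fall through to the current directory. A minimal sketch, with made-up names (not the actual TrueNAS job):

```shell
#!/bin/sh
# Hypothetical hardened version of a backup-cleanup cron job.
# cleanup_old_backups DIR deletes regular files older than 30 days
# under DIR, and aborts loudly if DIR is missing or empty instead of
# letting find default to the current working directory.
cleanup_old_backups() {
    dir="${1:?cleanup_old_backups: no backup directory given - refusing to run}"
    find "$dir" -type f -mtime +30 -exec rm -f {} +
}
```

The ‘${1:?message}’ expansion is standard POSIX shell: it prints the message to stderr and aborts a non-interactive script when the parameter is unset or empty. Adding ‘set -u’ at the top of the script would make any other unset variable fail the same way.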
You know what, I’ve never seen that happen in all my years with Linux. I would’ve thought that ‘find’ would error if it’s not given a path, but apparently not so, it just helpfully assumes the CWD I guess!
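That’s exactly it, at least with GNU findutils (which is what a Debian-based system like TrueNAS SCALE ships): POSIX ‘find’ requires a starting path, but GNU ‘find’ documents that when none is given it assumes ‘.’. Easy to demonstrate safely in a scratch directory:

```shell
# Demo: GNU find with no starting point silently assumes '.' (the CWD).
tmp=$(mktemp -d)
cd "$tmp"
touch -d "40 days ago" ancient.txt   # GNU touch: backdate the mtime
touch fresh.txt

# No path given, so find walks the current directory instead of erroring:
find -type f -mtime +30              # prints ./ancient.txt
```

Swap that last line’s predicates for ‘-exec rm -f {} +’ and you get exactly the overnight carnage described above, rooted at wherever cron happened to start the job.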
Once I finish laughing, I have some very long rsync jobs to run! At least I know my backups work, they’ve had a good test the last few days.