Corruption of file on boot pool

I updated from 24.10.2 to 24.10.2.2 a couple of days ago, and all seemed fine. This morning I received an email alert (triggered by a scrub run, I believe) containing:

Boot pool status is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected…

$ sudo zpool status -v boot-pool
  pool: boot-pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:15 with 1 errors on Mon Jul 14 03:45:16 2025
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0    26

errors: Permanent errors have been detected in the following files:

        boot-pool/ROOT/24.10.2.2/usr@pristine:/lib/x86_64-linux-gnu/libclang-14.so.14.0.6
        /usr/lib/x86_64-linux-gnu/libclang-14.so.14.0.6
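For anyone wanting to pull just the non-zero checksum counts out of output like the above, here is a minimal sketch. It assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown in this post, and the function name cksum_errors is made up for illustration; it is not a ZFS tool.

```shell
# Print "device count" for every config line whose CKSUM column is non-zero.
# Reads `zpool status` text on stdin; assumes the 5-column layout shown above.
cksum_errors() {
    awk '
        # The header row marks the start of the device table.
        $1 == "NAME" && $5 == "CKSUM"      { in_config = 1; next }
        # A blank line ends the device table.
        in_config && NF == 0               { in_config = 0 }
        # Report any device row with a non-zero CKSUM value.
        in_config && NF >= 5 && $5 != "0"  { print $1, $5 }
    '
}
```

Usage would be something like sudo zpool status boot-pool | cksum_errors, which for the output above should list only nvme0n1p3.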

My boot pool/TrueNAS installation is running on a single 128GB SSD M.2 Patriot P300.

A long/extended SMART test of the disk, run today, succeeded with no errors.

I don’t have an ECC RAM system. The RAM passed memtest before installation.

My system:
ASUS Prime B660-PLUS D4 Motherboard
Intel Core i3-12100
Lexar Thor DDR4 RAM 64GB (16GB x 4) 3200 MHz
Patriot P300 M.2 PCIe Gen 3x4 128 GB SSD
CORSAIR RM650 ATX 650W Fully Modular Power Supply


I’ve had a conversation with Perplexity and done a little exploration, and I’ve gathered that:

  • this file may not be essential to the every-day running of TrueNAS, but, for all I know, it may cause an issue if an update or some script requires use of it;
  • it is corrupt in the active system, not just in a snapshot. The snapshot was taken a minute after first booting into the updated installation (which is still the current boot), and $ sha256sum libclang-14.so.14.0.6 fails with sha256sum: libclang-14.so.14.0.6: Input/output error;
  • I would rather not reinstall TrueNAS on a new system disk and restore settings/import pools at the moment (though I have taken a new config backup), so perhaps I could try to restore the corrupted file;
  • the most straightforward way of restoring the file seems to be reinstalling the package that provides it (apt install --reinstall libclang1-14 according to Perplexity, but I’ll check). That requires enabling developer mode in order to use apt, which is “unsupported”, but I figure I may as well go ahead; hopefully merely reinstalling an already-installed package would not mess anything up.

Before I take any action, I thought I’d ask for reassurance/guidance. Given 26 checksum errors, should I be looking to reinstall on a new M.2 SSD soon anyway (it only cost about €14), or could the errors be down to something other than the disk?

Perhaps simply doing nothing until I can set aside the time to reinstall and restore the config (and subsequently update to 25.04 and see about a mirrored boot pool…) would be safest?

There haven’t been scrubs of the two data pools since the update. (Should I run scrubs now?) The current status:

  pool: platter01
 state: ONLINE
  scan: scrub repaired 0B in 03:34:53 with 0 errors on Sun Jun 29 03:34:54 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        platter01                                 ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            2989ca31-7275-4d5a-92dc-f99ba81bd144  ONLINE       0     0     0
            049cfda9-8cd1-4c18-a916-3af939946f16  ONLINE       0     0     0
            7c66c77a-d815-4571-b997-5172084958ac  ONLINE       0     0     0

errors: No known data errors

  pool: platter02
 state: ONLINE
  scan: scrub repaired 0B in 06:43:15 with 0 errors on Sun Jun 22 06:43:17 2025
config:

        NAME                                    STATE     READ WRITE CKSUM
        platter02                               ONLINE       0     0     0
          2d30e5ec-9723-4331-86b6-ef08aae69bdc  ONLINE       0     0     0

errors: No known data errors

(platter02 has only unimportant data.)
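To see at a glance when each pool was last scrubbed, the scan: lines can be filtered out of zpool status. This is only a sketch that assumes the pool:/scan: layout shown above; last_scrubs is a made-up helper name.

```shell
# Print "<pool>: <scan line>" for each pool found in `zpool status` text on stdin.
last_scrubs() {
    awk '
        # Remember the most recent pool name.
        $1 == "pool:" { pool = $2 }
        # Strip the "scan:" label and print the rest next to the pool name.
        $1 == "scan:" { sub(/^[ \t]*scan:[ \t]*/, ""); print pool ": " $0 }
    '
}
```

Usage would be sudo zpool status | last_scrubs. A scrub of a data pool can then be started by hand with sudo zpool scrub platter01.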

Many thanks in advance!

Could be a failing NVMe or your temps are too high.

Does this NVMe have a heatsink or heat spreader?

Since it’s the boot-pool, and you don’t mind replacement, no need to spend too much time figuring out if it’s worth replacing the boot device.

Download a copy of 24.10.2.2 and then use it for a manual update (from 24.10.2.2 to 24.10.2.2).

This will create a new boot environment where this file will hopefully NOT be corrupt.
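Once booted into the new environment, a quick way to check whether the file is healthy is simply to read it end to end, since ZFS returns an I/O error when a block fails its checksum. A minimal sketch (reads_cleanly is a made-up helper name, not a TrueNAS command):

```shell
# Return 0 if the whole file can be read, non-zero otherwise.
# On ZFS, a block that fails its checksum shows up as an I/O error on read.
reads_cleanly() {
    cat -- "$1" > /dev/null 2>&1
}
```

For example: reads_cleanly /usr/lib/x86_64-linux-gnu/libclang-14.so.14.0.6 && echo OK || echo "still unreadable". Running sudo zpool scrub boot-pool and then sudo zpool status -v boot-pool afterwards shows whether the pool still reports errors.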

I don’t think [it has a heatsink/spreader]. It’s supposed to have thermal throttling.

I’ll look into doing that.

Thanks.

The GUI isn’t always the greatest at reporting the temps for NVMe devices.

Use the nvme command instead.

nvme smart-log /dev/nvmeX

Thanks.

Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 28°C (301 Kelvin)
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 2%
endurance group critical warning summary: 0
Data Units Read                         : 308135 (157.77 GB)
Data Units Written                      : 655231 (335.48 GB)
host_read_commands                      : 1745426
host_write_commands                     : 34978451
controller_busy_time                    : 46
power_cycles                            : 15
power_on_hours                          : 5005
unsafe_shutdowns                        : 6
media_errors                            : 0
num_err_log_entries                     : 0
Warning Temperature Time                : 0
Critical Composite Temperature Time     : 0
Temperature Sensor 1           : 42°C (315 Kelvin)
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 0
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 0
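If the full log is too noisy, the health-related counters can be filtered out of it. A small sketch, assuming the field names exactly as printed above (other nvme-cli versions may label them differently); nvme_health_fields is a made-up name.

```shell
# Keep only the main health counters from `nvme smart-log` text on stdin.
# Field names are taken from the output above and may differ between versions.
nvme_health_fields() {
    grep -E '^(critical_warning|available_spare|percentage_used|media_errors|num_err_log_entries) *:'
}
```

Usage would be sudo nvme smart-log /dev/nvme0 | nvme_health_fields.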

I gather that Perplexity is one of 'em new fangled AI (Artificial Idiot) things - so I am not surprised that what it told you was crap - for example as you pointed out you cannot use apt with TrueNAS without enabling developer mode (which you should not do).

Temps look fine, but the NVMe already had to tap into using 2% of its spare cells, with only 335 GB written in its life. Combined with the 26 checksum errors from a scrub, I would consider replacing the boot drive.

EDIT: This is a good time to export your TrueNAS config file, in case you need to use it after a fresh installation. You should be backing up your config regularly anyways.

Apologies - I now have two files, but I’m not sure how to go about doing a manual update.

1.8G TrueNAS-SCALE-24.10.2.2.iso
1.5G TrueNAS-SCALE-24.10.2.2.update

The System > Update > Manual Update page in the web UI has the following tooltip for the update file field:

The file used to manually update the system. Browse to the update file stored on the system logged into the web interface to upload and apply. Update file names end with -manual-update-unsigned.tar

Despite that, should I upload the ‘TrueNAS-SCALE-24.10.2.2.update’ file? (All 1.5GB of it through the web UI?)

Edit: I got the ‘update’ file from this blue box:

Yes, that is the file: TrueNAS-SCALE-24.10.2.2.update

Go to the System > Update screen and click the Install Manual Update File button. Save a backup of your configuration file with the password secret seed included (just in case), then choose the file and click Apply Update on the Manual Update screen.

I don’t know where you got the quote about update file names ending with -manual-update-unsigned.tar from, but I suspect that it is well out of date.

I wonder whether autotrim is set on the boot-pool.

Okay - thanks for removing doubt about the process. I have a long SMART test running on another drive at the moment, but plan to attempt the “update” one day this week.

Understood. As mentioned, it’s the tooltip help which appears on hover of the question mark in a circle next to the file field, at least in this version.

autotrim is off. Perhaps I should add a weekly cron task to trim?

You could, but 26 checksum errors are unlikely to be caused by a lack of trimming.

Apologies for not quoting what I was responding to.

I was wondering whether the use of spare cells that you mentioned could have been caused by a lack of trimming.

I don’t think so. As far as I know, trimming is about performance and potential long-term health. At only 335 GB of total writes, this NVMe hasn’t really seen enough I/O to already start manifesting problems.

FWIW, I do have a weekly cron task that issues zpool trim to my SSD pool and boot-pool. I keep “auto trim” disabled on all pools.
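A weekly task along those lines could look roughly like this. It is only a sketch: the pool names passed in are examples, trim_pools and the DRY_RUN switch are made up for illustration, and the only real command used is zpool trim.

```shell
# Issue a manual TRIM on each pool named on the command line.
# With DRY_RUN=1 the commands are printed instead of executed (illustrative switch).
trim_pools() {
    for pool in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "zpool trim $pool"
        else
            zpool trim "$pool"
        fi
    done
}
```

Calling trim_pools boot-pool from a weekly cron task (as root) would start the trim, and zpool status -t boot-pool shows its progress.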

@DavidO You said earlier about the low cost of replacing the NVMe. If you keep getting checksum errors after completing boot-pool scrubs, it might be worth it to just replace the drive, install fresh, and upload your TrueNAS config.

You won’t even “lose” your original boot-pool, since you’ll still physically have it, should anything go wrong with the replacement drive.

Thanks, both. I’ve ordered a Lexar NM620 256GB M.2 SSD. I plan to do a fresh install on it and restore my config before doing anything to the Patriot that’s already running TrueNAS. If the installation goes okay, I’ll probably temporarily swap back to the Patriot and try the “manual update” to fix the corrupted file, and then hopefully keep the Patriot as a spare boot drive.

Quick question regarding the ASUS Prime B660-Plus D4.

Does TrueNAS work with the 2.5Gbps Realtek adapter on the ASUS Prime B660-Plus D4?

Yes. I’m getting around 275MiB/s (Samba) and 280MiB/s (NFS) when copying from the NAS to my computer.