Help - is it safe to delete corrupted files?

Hi

I recently had some hardware issues with a new PSU and ended up hard-shutting down my machine a few times :man_facepalming:

So obviously I now have some corrupted files on my machine. I'm just looking for advice on whether it's safe to delete the ones in /var/db/system

root@truenas[~]# zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:24 with 0 errors on Mon Jul 28 03:45:30 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdg3      ONLINE       0     0     0

errors: No known data errors

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 04:53:52 with 32012 errors on Mon Jul 14 08:00:57 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            e47eeee8-c63e-4d27-ada0-36cb5c334bc4  ONLINE       0     0 2.20M
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0 2.20M
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0 2.20M
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0 2.20M

errors: Permanent errors have been detected in the following files:

        /mnt/.ix-apps/app_mounts/komodo/pg_data/collection-55-15321617546263701855.wt
        /mnt/.ix-apps/app_mounts/komodo/pg_data/index-56-15321617546263701855.wt
        /mnt/.ix-apps/app_mounts/komodo/pg_data/index-57-15321617546263701855.wt
        /mnt/.ix-apps/app_mounts/komodo/pg_data/index-58-15321617546263701855.wt
        /var/db/system/configs-ae32c386e13840b2bf9c0083275e7941/TrueNAS-25.04.1/20250709.db
        /var/db/system/netdata/dbengine/datafile-1-0000000256.ndf
        /var/db/system/netdata/dbengine/journalfile-1-0000000256.njf
        /var/db/system/netdata/dbengine/datafile-1-0000000255.ndf
        /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
        /var/db/system/netdata/dbengine-tier1/datafile-1-0000000026.ndf
        /var/db/system/netdata/dbengine-tier1/journalfile-1-0000000026.njf
        /mnt/.ix-apps/docker/volumes/ix-wger_redis-data/_data/appendonlydir/appendonly.aof.4.incr.aof
        /mnt/.ix-apps/docker/containers/123c2904c782d6e701b95d9b350fd574827225191d4c337a97519bb67f90438f/123c2904c782d6e701b95d9b350fd574827225191d4c337a97519bb67f90438f-json.log
        /mnt/.ix-apps/docker/containers/5e63511b4e7bb2bae7fd037973de3395fbc1beff4a9f7cc55a99d9cf81f14927/5e63511b4e7bb2bae7fd037973de3395fbc1beff4a9f7cc55a99d9cf81f14927-json.log
        /mnt/.ix-apps/app_mounts/wger/pg_data/base/16384/2619
        /mnt/.ix-apps/app_mounts/wger/pg_data/base/16384/17197
        /mnt/.ix-apps/app_mounts/wger/pg_data/base/16384/2696

Not sure how to edit the post, but before anyone asks: no, I don't have backups. Now I understand why they are important though :smiley:

2.2 million checksum errors - that's impressive (in a bad way).

I think you need to fix that issue before worrying about data.

What's your hardware, and how specifically are the disks attached to the motherboard - in particular for the second zvol?

vdev

I had assumed those checksum errors were to do with the corrupted files - guess not.

So: 8 SAS drives connected via an HBA. I don't have 8 SATA power cables, so I used two splitters - one handles 5 drives (raidz1-0 plus one of the raidz1-1 drives) and the other handles the rest of raidz1-1 plus the OS drive. The issue that made me go through the power cycling was that some of the drives weren't starting up, I assume from lack of power from an overloaded connector, so I tried different configurations until I got to this one where all the drives are running. But it sounds like that's what's causing the checksum errors?

Oops

Are you splitting SATA power connectors?

If yes then they really can’t deliver much power. I suggest you look for another way of powering those drives.

Also - your HBA - how are you cooling that HBA?

In fact, please post your hardware in detail.

AFAIK, a single SATA connector can deliver up to 54 W. HDDs eat up to 10 W during activity, and 20+ W at startup when they are all spinning up.

So, using a power splitter is kinda OK as long as it's a 1-to-2 splitter (I've seen those kinds of splitters in low-end server cases). Maybe, just maybe, you can be OK with a 1-to-3, but that's too risky for my taste.

IIUIC, you are using 1-to-5 splitters. Those are asking for trouble. Perhaps they'd be OK with all flash, since consumer SATA SSDs usually eat under 10 W, but they are not OK with HDDs - I'm 99% sure.
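
To put rough numbers on it: 5 HDDs at ~20 W each during spin-up is around 100 W of startup demand on a connector rated for roughly 54 W, which is exactly the kind of overload that stops some of the drives from starting.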

So given the consensus here I'm getting a new SATA PSU cable, but to hopefully get back on topic: is it safe for me to delete these files?

    /var/db/system/configs-ae32c386e13840b2bf9c0083275e7941/TrueNAS-25.04.1/20250709.db
    /var/db/system/netdata/dbengine/datafile-1-0000000256.ndf
    /var/db/system/netdata/dbengine/journalfile-1-0000000256.njf
    /var/db/system/netdata/dbengine/datafile-1-0000000255.ndf
    /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
    /var/db/system/netdata/dbengine-tier1/datafile-1-0000000026.ndf
    /var/db/system/netdata/dbengine-tier1/journalfile-1-0000000026.njf

So I've got an update: I've added a Molex-to-SATA adapter, so the 8 SAS HDDs are now split between 4 PSU cables (2 drives each), with one more cable for the boot SSD and another for the fans.

I had assumed the issue might be one of my drives failing, so I offlined that drive for about an hour. When I re-enabled it, the number of corrupted files had reduced:

root@truenas[~]# zpool status pool -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 4.79M in 00:00:02 with 0 errors on Sun Sep  7 03:48:55 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0 4.27K
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0 4.26K
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0 4.26K
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0 4.27K

errors: Permanent errors have been detected in the following files:

        /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf

Also, looking in the dbengine folder, it appears that 256 is the latest file - is 255 even needed anymore?

root@truenas[/var/db/system/netdata/dbengine]# ls
datafile-1-0000000239.ndf  datafile-1-0000000251.ndf     journalfile-1-0000000245.njf
datafile-1-0000000240.ndf  datafile-1-0000000252.ndf     journalfile-1-0000000246.njf
datafile-1-0000000241.ndf  datafile-1-0000000253.ndf     journalfile-1-0000000247.njf
datafile-1-0000000242.ndf  datafile-1-0000000254.ndf     journalfile-1-0000000248.njf
datafile-1-0000000243.ndf  datafile-1-0000000255.ndf     journalfile-1-0000000249.njf
datafile-1-0000000244.ndf  datafile-1-0000000256.ndf     journalfile-1-0000000250.njf
datafile-1-0000000245.ndf  journalfile-1-0000000239.njf  journalfile-1-0000000251.njf
datafile-1-0000000246.ndf  journalfile-1-0000000240.njf  journalfile-1-0000000252.njf
datafile-1-0000000247.ndf  journalfile-1-0000000241.njf  journalfile-1-0000000253.njf
datafile-1-0000000248.ndf  journalfile-1-0000000242.njf  journalfile-1-0000000254.njf
datafile-1-0000000249.ndf  journalfile-1-0000000243.njf  journalfile-1-0000000255.njf
datafile-1-0000000250.ndf  journalfile-1-0000000244.njf  journalfile-1-0000000256.njf
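
A quick way to sanity-check which file is still the live one (I'm assuming the most recently modified datafile is the one netdata is actively writing to) would be to sort by modification time:

ls -lt /var/db/system/netdata/dbengine | head -5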

How are you cooling the HBA? They need a lot more direct cooling (airflow) than you would think. If the HBA overheats it will cause issues. Just being in a case is not normally enough, and even if the case is a real server chassis it may still not get enough airflow, depending on the design and where the card sits in the chassis.

So no direct cooling, and it's in a Fractal Design Define R4 case, not a server chassis. However, this isn't the issue I'm trying to solve tbh - the checksum errors started after the file corruption, and I want to know how to fix that first.

So please, if you have an idea: is it safe for me to just delete the file /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf?

Or, if there is a safe way for me to attempt to delete it, that would also work.

Yes, you can (in theory) delete them. Ideally you would stop netdata first (I don't know the exact way to do that off-hand).
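
Something along these lines should do it - untested on my side, and the service name is an assumption, so verify it on your box before running:

# assumption: netdata runs as a systemd service called "netdata" on SCALE
systemctl stop netdata
rm /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
systemctl start netdata
# re-scrub so ZFS re-checks the pool; it can take a scrub (or two) for the error list to clear
zpool scrub pool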

So I stopped netdata, deleted that file, and then ran a scrub. It appears to have helped, since I now only have checksum errors in the hundreds rather than in the millions.

This time there is a config file showing up as corrupted, so once again I'm asking for advice on whether it's safe to delete:

zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:21 with 0 errors on Sun Sep  7 03:45:25 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdi3      ONLINE       0     0     0

errors: No known data errors

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 328K in 07:40:06 with 56 errors on Mon Sep  8 18:13:25 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0   173
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0   112
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0   112
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0   112

errors: Permanent errors have been detected in the following files:

        /var/db/system/configs-ae32c386e13840b2bf9c0083275e7941/TrueNAS-25.04.1/20250709.db
        /var/db/system/netdata/dbengine/datafile-1-0000000256.ndf
        /var/db/system/netdata/dbengine-tier1/datafile-1-0000000026.ndf
        /var/db/system/netdata/dbengine-tier1/journalfile-1-0000000026.njf

Two things. First, the checksum errors are not the same thing as the permanent errors. These are two separate issues, albeit one may be related to the other.

You need to fix the hardware issue causing the checksum errors.

Secondly, you haven't posted your hardware, despite two requests. Please do so - it matters. We only know you have a Fractal Design Define R4.

Now, I understand you changed some cables. Please run a zpool clear on the pool and then check whether the checksum errors reappear.
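
i.e. something like this, using the pool name from your zpool status output:

# reset the READ/WRITE/CKSUM counters
zpool clear pool
# optional: re-verify everything after the cabling change, then watch whether CKSUM stays at zero
zpool scrub pool
zpool status -v pool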

Lastly, any of the netdata files can be deleted. As for the first file - what version of TrueNAS are you running? (See, we need to know your hardware, which includes the TN version.)
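
If you're not sure where to look, this should print it - I'm assuming SCALE keeps the version string in /etc/version the way CORE does:

cat /etc/version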