I had assumed those checksum errors were related to the corrupted files; I guess not.
So: 8 SAS drives connected via an HBA. I don't have 8 SATA power cables, so I used two splitters: one handles 5 drives (the four in raidz1-0 plus one of the raidz1-1 drives), and the other handles the rest of raidz1-1 and the OS drive. The issue that made me go through power cycling was that some of the drives weren't starting up, I assume from lack of power on an overloaded connector, so I tried different configurations until I reached this one where all the drives are running. But it sounds like that's what's causing the checksum errors?
AFAIK, a single SATA power connector can deliver up to about 54 W. HDDs eat up to 10 W during activity, and 20+ W at startup when they spin up.
So using a power splitter is kind of OK, as long as it's a 1-to-2 splitter (I've seen that kind of splitter in low-end server cases). Maybe, just maybe, you can get away with 1-to-3, but that's too risky for my taste.
IIUIC, you are using 1-to-5 splitters. Those are asking for trouble. Perhaps they can be OK with all-flash, since consumer SATA SSDs usually eat under 10 W, but they are not OK with HDDs; I'm 99% sure.
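To put rough numbers on it (using the ~10 W running / ~20 W spin-up figures above, which vary by drive model): five HDDs spinning up together on one splitter ask for roughly 5 × 20 W = 100 W from a connector specified for about 54 W, and even the steady-state draw of 5 × 10 W = 50 W leaves essentially no headroom.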
So I've got an update: I've added a Molex-to-SATA adapter, so the 8 SAS HDDs are now split between 4 PSU cables, 2 drives each, with one further cable for the SSD boot drive and another for the fans.
I had assumed the issue might be one of my drives failing, so I offlined that drive for about an hour; when I re-enabled it, the number of corrupted files had gone down.
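(For reference, offlining and re-onlining a single member from the CLI looks roughly like this, with <disk-label> as a placeholder for one of the partition UUIDs in the status output below; ZFS resilvers the disk automatically once it comes back online.)

zpool offline pool <disk-label>
zpool online pool <disk-label>
zpool status pool -v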
root@truenas[~]# zpool status pool -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 4.79M in 00:00:02 with 0 errors on Sun Sep 7 03:48:55 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0 4.27K
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0 4.26K
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0 4.26K
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0 4.27K

errors: Permanent errors have been detected in the following files:

        /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
Also, looking in the dbengine folder, it appears that 256 is the latest file, so is 255 even needed anymore?
How are you cooling the HBA? They need a lot more direct cooling (airflow) than you might think, and an overheating HBA will cause issues. Just being in a case is not normally enough, and even in a real server chassis it may still not get enough airflow, depending on the design and where the card sits in the chassis.
So, no direct cooling, and it's in a Fractal Design Define R4 case, not a server chassis. However, that isn't the issue I'm trying to solve right now; the checksum errors started after the file corruption, and I want to know how to fix that first.
So please, if you have an idea: is it safe for me to just delete the file /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf, or is there a safe way for me to attempt to delete it?
So I stopped netdata and deleted that file, then ran a scrub. It appears to have helped, since I now only have checksum errors of over 100 rather than over 1,000,000.
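(In case it helps anyone else, the rough sequence is sketched below; the netdata service name and how it is stopped may differ on TrueNAS, so treat this as an outline rather than exact commands.)

systemctl stop netdata
rm /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
zpool scrub pool
zpool status -v pool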
This time a config file is showing up as corrupted, so once again I'm asking for advice on whether it's safe to delete.
zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:21 with 0 errors on Sun Sep 7 03:45:25 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdi3      ONLINE       0     0     0

errors: No known data errors

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 328K in 07:40:06 with 56 errors on Mon Sep 8 18:13:25 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0   173
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0   112
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0   112
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0   112

errors: Permanent errors have been detected in the following files:

        /var/db/system/configs-ae32c386e13840b2bf9c0083275e7941/TrueNAS-25.04.1/20250709.db
        /var/db/system/netdata/dbengine/datafile-1-0000000256.ndf
        /var/db/system/netdata/dbengine-tier1/datafile-1-0000000026.ndf
        /var/db/system/netdata/dbengine-tier1/journalfile-1-0000000026.njf
Two things. First, the checksum errors are not the same thing as the permanent errors. These are two separate issues, albeit one may be related to the other.
You need to fix the hardware issue causing the checksum errors.
Secondly, you haven't posted your hardware, despite two requests. Please do so; it matters. All we know is that you have a Fractal Design Define R4.
Now, I understand you changed some cables. Please run a zpool clear on the pool and then check whether the checksum errors reappear.
Lastly, any of the netdata files can be deleted. As for the first file: what version of TrueNAS are you running? (See, we need to know your hardware, which includes the TrueNAS version.)
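A minimal sketch of that check, assuming the pool is named pool as in the status output above:

zpool clear pool
zpool status -v pool

zpool clear only resets the READ/WRITE/CKSUM counters; it does not repair anything, so if the counts climb again you still have a live hardware problem (power, cabling, HBA, or cooling) to track down.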