I had assumed those checksum errors were related to the corrupted files; I guess not.
So: 8 SAS drives connected via an HBA. I don't have 8 SATA power cables, so I used two splitters: one handles 5 drives (the four in raidz1-0 plus one of the raidz1-1 drives), and the other handles the rest of raidz1-1 and the OS drive. The issue that made me go through power cycling was that some of the drives weren't starting up, I assume from lack of power on an overloaded connector, so I tried different configurations until I reached this one where all the drives are running. But it sounds like that's what's causing the checksum errors?
AFAIK, a single SATA power connector can deliver up to about 54 W. HDDs eat up to 10 W during activity, and 20+ W at startup when they spin up.
So using a power splitter is kind of OK, as long as it's a 1-to-2 splitter (I've seen that kind of splitter in low-end server cases). Maybe, just maybe, you can get away with 1-to-3, but that's too risky for my taste.
IIUIC, you are using 1-to-5 splitters. Those are asking for trouble. Perhaps they can be OK with all-flash, since consumer SATA SSDs usually eat under 10 W, but they are not OK with HDDs; I'm 99% sure.
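To put rough numbers on it (using the ~10 W running / ~20 W spin-up figures above, which vary by drive model): five HDDs spinning up together on one splitter ask for roughly 5 × 20 W = 100 W from a connector specified for about 54 W, and even the steady-state draw of 5 × 10 W = 50 W leaves essentially no headroom.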
So I've got an update: I've added a Molex-to-SATA adapter, so the 8 SAS HDDs are now split between 4 PSU cables, 2 drives each, with one further cable for the SSD boot drive and another for the fans.
I had assumed the issue might be one of my drives failing, so I offlined that drive for about an hour; when I re-enabled it, the number of corrupted files had gone down.
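(For reference, offlining and re-onlining a single member from the CLI looks roughly like this, with <disk-label> as a placeholder for one of the partition UUIDs in the status output below; ZFS resilvers the disk automatically once it comes back online.)

zpool offline pool <disk-label>
zpool online pool <disk-label>
zpool status pool -v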
root@truenas[~]# zpool status pool -v
  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 4.79M in 00:00:02 with 0 errors on Sun Sep 7 03:48:55 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0 4.27K
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0 4.26K
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0 4.26K
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0 4.27K

errors: Permanent errors have been detected in the following files:

        /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
Also, looking in the dbengine folder, it appears that 256 is the latest file, so is 255 even needed anymore?
How are you cooling the HBA? They need a lot more direct cooling (airflow) than you might think, and an overheating HBA will cause issues. Just being in a case is not normally enough, and even in a real server chassis it may still not get enough airflow, depending on the design and where the card sits in the chassis.
So, no direct cooling, and it's in a Fractal Design Define R4 case, not a server chassis. However, that isn't the issue I'm trying to solve right now; the checksum errors started after the file corruption, and I want to know how to fix that first.
So please, if you have an idea: is it safe for me to just delete the file /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf, or is there a safe way for me to attempt to delete it?
So I stopped netdata and deleted that file, then ran a scrub. It appears to have helped, since I now only have checksum errors of over 100 rather than over 1,000,000.
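(In case it helps anyone else, the rough sequence is sketched below; the netdata service name and how it is stopped may differ on TrueNAS, so treat this as an outline rather than exact commands.)

systemctl stop netdata
rm /var/db/system/netdata/dbengine/journalfile-1-0000000255.njf
zpool scrub pool
zpool status -v pool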
This time a config file is showing up as corrupted, so once again I'm asking for advice on whether it's safe to delete.
zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:21 with 0 errors on Sun Sep 7 03:45:25 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdi3      ONLINE       0     0     0

errors: No known data errors

  pool: pool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 328K in 07:40:06 with 56 errors on Mon Sep 8 18:13:25 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            f9f1bf24-3c24-401c-b88d-724361ba7a26  ONLINE       0     0     0
            ab767165-11ca-446d-ace9-54bc11064b98  ONLINE       0     0     0
            9392fcf5-a61a-4619-b307-2cc26cf80d1b  ONLINE       0     0     0
            ee87669a-4bf3-4c24-b31e-5ff365a4cd61  ONLINE       0     0     0
          raidz1-1                                ONLINE       0     0     0
            0a392fe1-fa97-4d25-b5eb-48baad4f29d1  ONLINE       0     0   173
            132be53c-fb0e-416a-b0d0-37577850555b  ONLINE       0     0   112
            20c345eb-61b9-417b-bdc4-feb70e5f80b1  ONLINE       0     0   112
            2f3ccef9-dc86-4ac0-97c9-859862f56f29  ONLINE       0     0   112

errors: Permanent errors have been detected in the following files:

        /var/db/system/configs-ae32c386e13840b2bf9c0083275e7941/TrueNAS-25.04.1/20250709.db
        /var/db/system/netdata/dbengine/datafile-1-0000000256.ndf
        /var/db/system/netdata/dbengine-tier1/datafile-1-0000000026.ndf
        /var/db/system/netdata/dbengine-tier1/journalfile-1-0000000026.njf
Two things. First, the checksum errors are not the same thing as the permanent errors. These are two separate issues, albeit one may be related to the other.
You need to fix the hardware issue causing the checksum errors.
Secondly, you haven't posted your hardware, despite two requests. Please do so; it matters. All we know is that you have a Fractal Design Define R4.
Now, I understand you changed some cables. Please run a zpool clear on the pool and then check whether the checksum errors reappear.
Lastly, any of the netdata files can be deleted. As for the first file: what version of TrueNAS are you running? (See, we need to know your hardware, which includes the TrueNAS version.)
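A minimal sketch of that check, assuming the pool is named pool as in the status output above:

zpool clear pool
zpool status -v pool

zpool clear only resets the READ/WRITE/CKSUM counters; it does not repair anything, so if the counts climb again you still have a live hardware problem (power, cabling, HBA, or cooling) to track down.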