I have a 3-drive setup, and I usually keep it powered off because I don't access it often.
Recently, HexOS forwarded a warning that one of my drives was REMOVED… but nope, I go to the web UI and it says everything is normal. So the drive apparently just got disconnected for a moment.
This same drive also gave me tons of warnings two months ago, but scanning it found nothing actually wrong, so now it looks like the problem is outside the drive itself.
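(If it helps anyone answering, I can re-run the drive checks from the shell with something like this; /dev/sdX is just a placeholder for whichever drive:)

sudo smartctl -t long /dev/sdX    # kick off a long SMART self-test
sudo smartctl -a /dev/sdX         # read back attributes / test results once it finishes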
I moved the drives out of the case and into an external shelf, swapped in new 4-to-4 SATA cables, switched the power over to a 1-to-4 splitter, then powered on…
There were LOADS of problems.
Some of them (Docker refusing to start, the drives sounding like they were constantly and repeatedly shutting down, a scrub that was going to take months) never came back once I connected the SATA power back to the PSU's own cable.
Some others did not go away, namely: Reporting doesn't work, and checksum errors keep piling up endlessly. Not sure if the two are related.
The reporting issue is just that: the Reports page is empty, and the home page doesn't show any status either.
I’m not sure exactly WHEN this became a thing since I don’t check it often, but I’m sure it used to work.
The checksum errors are the interesting part… because the count is identical between 2 of the drives.
THE 2 DRIVES THAT NEVER HAD AN ISSUE. The troublemaker drive now shows no errors at all, what.
Basically, the longer the system runs, the more errors it racks up, something like 400 of them within 10 minutes.
SMB is still up, and randomly picking a file to open usually works, but the checksum errors keep coming.
0 read errors, 0 write errors, 400 checksum errors, identical on 2 drives, with the 3rd drive clean.
Tried switching ports and switching SATA cables; it did nothing.
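One thing I haven't really dug through is the kernel log. I'm assuming something like this would show whether the controller keeps resetting links, though I'm not sure exactly what to look for:

sudo dmesg | grep -iE 'ata|reset|link' | tail -n 50    # recent SATA / link-reset messages, if any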
Ran a 6-hour scrub; it's still spitting errors. Here's the zpool status afterwards:
truenas_admin@HexOS[~]$ sudo zpool status -v
  pool: HDDs
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 18.6M in 06:49:35 with 108 errors on Mon Nov 3 08:35:52 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        HDDs                                      ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            00f9e94d-2c50-4903-b984-501286530311  ONLINE       0     0   364
            974f1184-6c2d-49a4-b19f-f675387f7bbb  ONLINE       0     0     0
            98f8ae81-f736-414d-8afa-ada57c274e28  ONLINE       0     0   364

errors: Permanent errors have been detected in the following files:

        /var/db/system/netdata/context-meta.db-wal
        /var/db/system/netdata/netdata-meta.db
        /var/db/system/netdata/netdata-meta.db-wal
        /var/db/system/netdata/dbengine/journalfile-1-0000000057.njf
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.69260.1762103223000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.71272.1762103613000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.74868.1762104259000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.67888.1762102978000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.69885.1762103368000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.71963.1762103742000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.64957.1762102445000000.zst
        /var/db/system/cores/core.netdata.999.71c07f17d3fd46deb40c2629586f2aa5.73343.1762104014000000.zst

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:21 with 0 errors on Mon Nov 3 03:45:23 2025
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdb3      ONLINE       0     0     0

errors: No known data errors
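One thing that stands out to me from that list: every "permanent error" path is under /var/db/system (netdata databases and core dumps), nothing from my actual shares. If I'm reading it right, that's the system dataset, which should be checkable with something like:

df -h /var/db/system    # show which dataset/mount those paths actually live on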
So… the same error count on 2 drives suggests it's not a drive problem. But then what is it?
The SATA cables aren't a factor, the SATA ports aren't a factor; my only remaining idea at this point is to just get a new rig.
But then, do checksum errors actually need to be dealt with? Is my data dying bit by bit, or is ZFS just freaking out over insignificant mismatches?
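Unless that's a bad idea, my tentative plan was to clear the error counters and scrub once more, so anything that shows up afterwards is definitely new (and maybe just delete those netdata files, since they look regenerable):

sudo zpool clear HDDs    # reset the READ/WRITE/CKSUM counters and the error log
sudo zpool scrub HDDs    # full re-read; errors after this point would be fresh ones

Does that make sense, or would I just be hiding the real problem?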
Thanks for reading.
