Constant State of Disk Degradation

RuberDuck9 · August 25, 2025, 12:02am

Hi everyone. I posted about this on Reddit a while ago, when this problem first started. Essentially, one of my drives showed up one morning as degraded. I first tried replacing the SATA cable for the drive - no luck. Then I tried replacing the drive, and the issue seemed to go away. However, a few days later the replacement drive also appeared as degraded. I then replaced that drive and the motherboard, since I had had issues with the motherboard in the past. Now a third drive is showing up as degraded. This seems like too much to simply be bad luck, unless all the resilvering has somehow triggered a bad chain of events. So to recap: I have replaced bad drives, the motherboard, and SATA cables, but no luck. Also, when I was resilvering the last drive, I saw sky-high ZFS errors (almost exactly 2,678,731 checksum errors on each drive) and am wondering what to even do. I’ve also seen some posts suggesting this could be a PSU issue. Does anyone have experience with a bad PSU causing something similar?

System Info:

AMD Ryzen 5 3600
2x16 GB DDR4-3200 consumer RAM
1 TB Samsung M.2 boot drive
256 GB SATA SSD (log device)
4x 8 TB drives in RAIDZ2. Originally 4 Seagate Barracudas (not ideal, I know); I have since replaced one with a WD Red Plus.

PS: I’m currently running a pool scrub, but I’m not hopeful, since scrubs have not solved this issue in the past. Also, I still have not RMAed the original Seagate drive that first showed as degraded, so I could theoretically swap it back in if I needed to. One last thing: all the data so far seems to be good (I can play videos just fine), but is there any way to check whether data has been corrupted irreversibly?

NugentS · August 25, 2025, 3:11am

Seagate Barracudas are almost certainly SMR - and not recommended for ZFS.

How are the drives connected to the motherboard?

RuberDuck9 · August 25, 2025, 11:18am

3 of the HDDs and the log SSD are connected to 4 SATA ports on the motherboard. The other 2 HDDs are connected to a PCIe-to-SATA card. I don’t think the card is the issue though, because this was a problem before I even had that card, although I can’t rule it out.

NugentS · August 25, 2025, 4:46pm

What make and model is that PCIe card? Some are a disaster from the start, some turn into a disaster at some point in the future, and a few are OK.

On balance - most are not good.

It’s important to know which drives are connected to which device so you can see if there is any correlation.
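
As a rough illustration of the correlation check suggested here, the sketch below sums checksum errors per controller. The drive labels, the drive-to-controller mapping, and the error counts are all made-up placeholders, not values from this system; you would fill them in from your own cabling and `zpool status` output.

```python
from collections import defaultdict

# Hypothetical mapping of drive -> controller. Replace with your own layout,
# for example by matching drive serial numbers against the cabling.
DRIVE_TO_CONTROLLER = {
    "disk-a": "motherboard SATA",
    "disk-b": "motherboard SATA",
    "disk-c": "motherboard SATA",
    "disk-d": "PCIe SATA card",
    "disk-e": "PCIe SATA card",
}

# Hypothetical checksum-error counts per drive, as read from `zpool status`.
CKSUM_ERRORS = {"disk-a": 0, "disk-b": 0, "disk-c": 0, "disk-d": 37, "disk-e": 41}


def errors_by_controller(drive_to_controller, cksum_errors):
    """Sum checksum errors per controller so any pattern stands out."""
    totals = defaultdict(int)
    for drive, controller in drive_to_controller.items():
        totals[controller] += cksum_errors.get(drive, 0)
    return dict(totals)


if __name__ == "__main__":
    totals = errors_by_controller(DRIVE_TO_CONTROLLER, CKSUM_ERRORS)
    for controller, total in totals.items():
        print(f"{controller}: {total} checksum errors")
```

If the errors cluster on one controller, that controller (or its cables or slot) becomes the first suspect. If they are spread evenly across controllers, something shared by all drives is more likely.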

RuberDuck9 · August 26, 2025, 2:26am

It’s a generic card from Amazon, the ACTIMED PCI-E X1 to SATA 3.0 Controller Card (the one with 667 ratings). The card is maybe 3 weeks old. Do you have recommendations for any higher-quality ones?

NugentS · August 26, 2025, 9:42am

That rings alarm bells for me. It’s a Marvell 88SE9215 chipset, but the guff about the card mentions port multipliers - which is a really bad sign.

This forum recommends LSI HBA cards - but they are PCIe x8 cards. With a few exceptions (that I don’t remember enough about), we recommend against Chinesium SATA controller cards in PCIe x1 slots. There is an ASM chipset that’s OK - but I don’t remember which one.

RuberDuck9 · August 26, 2025, 3:55pm

Yeah, I’m beginning to think that card might be the cause of all my issues. I just ordered an LSI 9300 that I’m hoping will resolve the issue. It arrives in a few days and I’ll let you know if that works.

NugentS · August 26, 2025, 4:01pm

Before you use that card:

Make sure it’s running the correct firmware version.
Put a fan on it - they are designed for servers with lots of airflow in data centers, not for a tower case in someone’s house. An overheating LSI card (very easy to do) will produce all sorts of weird errors before dying completely.

RuberDuck9 · August 26, 2025, 7:01pm

Thanks for the quick response. I heard about the overheating issues on that card, so I bought a dual PCI-mounted fan bracket so I can angle fans directly at it. How do I know if the firmware version I’m using is correct, though?

RuberDuck9 · August 31, 2025, 5:59am

OK, so I have another update. I installed the LSI card, and I believe the firmware is set up correctly (and I think it is being properly cooled, but I don’t know how to check for sure). I replaced the PSU just in case. I also resilvered another drive which appeared as degraded, and I thought all was well, but no. After running zpool status -v I found some files and snapshots that were permanently corrupted (only about 5, though), so I deleted them. I ran zpool clear and started another scrub, but I can already see checksum errors popping up again. Any ideas?
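
For anyone repeating the `zpool status -v` step described above, here is a minimal sketch of pulling the list of permanently damaged files out of that command's text output. The sample text and both paths in it are invented for illustration. The parser assumes the usual layout, where the paths follow an `errors: Permanent errors have been detected in the following files:` line.

```python
# Hypothetical excerpt of `zpool status -v` output; the paths are placeholders.
SAMPLE = """\
errors: Permanent errors have been detected in the following files:

        /mnt/main/media/example-video.mkv
        main/media@auto-2025-08-30:/example-video.mkv
"""


def permanent_error_paths(status_text):
    """Return the paths listed under the 'Permanent errors' line."""
    paths = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors: Permanent errors"):
            in_list = True
            continue
        if not in_list:
            continue
        if stripped.startswith("pool:"):
            break  # the next pool's section has started
        if not stripped:
            if paths:
                break  # a blank line after the paths ends the list
            continue  # skip the blank line right after the header
        paths.append(stripped)
    return paths


if __name__ == "__main__":
    for path in permanent_error_paths(SAMPLE):
        print(path)
```

An empty result after a completed scrub means ZFS found nothing it could not repair. Entries of the form `dataset@snapshot:/path` refer to snapshots rather than live files.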

Zpool status:

pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:03 with 0 errors on Sun Aug 24 06:45:04 2025
config:

NAME         STATE     READ WRITE CKSUM
boot-pool    ONLINE       0     0     0
  nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

pool: main
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: Message ID: ZFS-8000-8A — OpenZFS documentation
scan: scrub in progress since Sun Aug 31 01:50:30 2025
3.12T / 19.0T scanned at 6.99G/s, 266G / 19.0T issued at 596M/s
0B repaired, 1.37% done, 09:09:07 to go
config:

NAME                                      STATE     READ WRITE CKSUM
main                                      ONLINE       0     0     0
  raidz2-0                                ONLINE       0     0     0
    5410ae8c-6787-4878-b0b8-365ff4197a80  ONLINE       0     0    20
    2e4c5163-3141-48d1-8952-78d72f5dca31  ONLINE       0     0    20
    c6456c3c-7c3f-423a-ac60-e2999ef28709  ONLINE       0     0    20
    407cdefc-d380-4090-9c97-0e4cd6af8264  ONLINE       0     0    20
logs
  451616c7-2051-4f06-b890-7271ecaa78b6    ONLINE       0     0     0

errors: List of errors unavailable: no such pool or dataset
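
One detail worth noticing in the output above is that all four data drives show the same CKSUM count (20), which often points at something the drives share rather than four disks failing in the same way. The sketch below is one way to pull those counts out of the config table and compare them. The regular expression and function names are my own, and it only handles the plain column layout shown here.

```python
import re

# Config table copied from the `zpool status` output in this post.
STATUS = """\
NAME                                      STATE     READ WRITE CKSUM
main                                      ONLINE       0     0     0
  raidz2-0                                ONLINE       0     0     0
    5410ae8c-6787-4878-b0b8-365ff4197a80  ONLINE       0     0    20
    2e4c5163-3141-48d1-8952-78d72f5dca31  ONLINE       0     0    20
    c6456c3c-7c3f-423a-ac60-e2999ef28709  ONLINE       0     0    20
    407cdefc-d380-4090-9c97-0e4cd6af8264  ONLINE       0     0    20
logs
  451616c7-2051-4f06-b890-7271ecaa78b6    ONLINE       0     0     0
"""

# name, state, then the READ / WRITE / CKSUM columns
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)"
)


def cksum_by_device(status_text):
    """Map each device/vdev name to its CKSUM column (kept as a string)."""
    counts = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match:
            counts[match.group(1)] = match.group(5)
    return counts


def devices_with_errors(counts):
    """Keep only the entries whose CKSUM column is not zero."""
    return {name: c for name, c in counts.items() if c != "0"}


if __name__ == "__main__":
    bad = devices_with_errors(cksum_by_device(STATUS))
    print(f"{len(bad)} devices with checksum errors")
    if len(set(bad.values())) == 1:
        print("All affected devices report the same count")
```

Counts are kept as strings because `zpool status` can abbreviate large numbers, so the comparison is textual rather than numeric.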
