Hi everyone. I posted about this on Reddit a while ago, when the problem first started. Essentially, one of my drives showed up one morning as degraded. I first tried replacing the SATA cable for that drive - no luck. Then I replaced the drive itself, and the issue seemed to go away. However, a few days later the replacement drive also showed up as degraded. I then replaced that drive and the motherboard, since I had had issues with the motherboard in the past. Now a third drive is showing up as degraded. This seems like too much to simply be bad luck, unless all the resilvering has somehow triggered a bad chain of events.
So to recap: I have replaced bad drives, the motherboard, and SATA cables, but no luck. Also, when I was resilvering the last drive, I saw sky-high ZFS errors (almost exactly 2,678,731 checksum errors on each drive), and I’m wondering what to even do. I’ve also seen it suggested in some places that this could be a PSU issue. Does anyone have experience with a bad PSU causing something similar?
System Info:
AMD Ryzen 5 3600
2x16 GB DDR4-3200 consumer RAM
1 TB Samsung M.2 boot drive
256 GB SATA SSD (log device)
4x 8 TB drives in RAIDZ2. Originally 4 Seagate Barracudas (not ideal, I know); I have since replaced one with a WD Red Plus.
PS: I’m currently running a pool scrub, but I’m not hopeful, since scrubs have not fixed this in the past. Also, I still have not RMAed the original Seagate drive that first showed as degraded, so I could theoretically swap it back in if I needed to. One last thing: all the data seems fine so far (I can play videos without problems), but is there any way to check whether data has been irreversibly corrupted?
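On the question of checking for irreversible corruption: as far as I know, `zpool status -v` prints a list of files with permanent errors once a scrub has found damage it could not repair. Below is a rough Python sketch that pulls that list out of the command's text output. The function names `read_status` and `permanent_errors` are my own, and the parsing assumes the usual "Permanent errors have been detected in the following files:" wording, so treat it as a starting point rather than a definitive tool.

```python
import re
import subprocess

# Wording zpool status -v normally uses to introduce the damaged-file list
# (assumed here; adjust if your OpenZFS version words it differently).
MARKER = "Permanent errors have been detected in the following files:"


def read_status(pool: str) -> str:
    """Return the text of `zpool status -v <pool>` (may need root)."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def permanent_errors(status_text: str) -> list[str]:
    """Collect the entries listed under the permanent-errors marker."""
    damaged = []
    in_list = False
    for line in status_text.splitlines():
        if MARKER in line:
            in_list = True
            continue
        if not in_list:
            continue
        # A new "pool: name" header means the list for this pool is over.
        if re.match(r"\s*pool:\s", line):
            in_list = False
            continue
        entry = line.strip()
        if entry:
            damaged.append(entry)
    return damaged
```

Something like `permanent_errors(read_status("main"))` would then give the affected paths (or snapshot and object references) after the scrub finishes. An empty list only means ZFS has not recorded any unrecoverable blocks, which is a better signal than videos happening to play.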
3 of the HDDs and the log SSD are connected to 4 SATA ports on the motherboard. The other 2 HDDs are connected to a PCIe-to-SATA card. I don’t think the card is the issue though, because this was a problem before I even had that card, although I can’t rule it out.
It’s a generic card from Amazon, the ACTIMED PCI-E X1 to SATA 3.0 Controller Card (the one with 667 ratings). The card is maybe 3 weeks old. Do you have recommendations for any higher quality ones?
That rings alarm bells for me. It’s a Marvell 88SE9215 chipset, but the guff about the card mentions multipliers, which is a really bad sign.
This forum recommends LSI HBA cards, but they are PCIe x8 cards. With a few exceptions (that I don’t remember enough about), we recommend against Chinesium SATA controller cards in PCIe x1 slots. There is an ASM chipset that’s OK, but I don’t remember which one.
Yeah, I’m beginning to think that card might be the cause of all my issues. I just ordered an LSI 9300 that I’m hoping will resolve the issue. It arrives in a few days and I’ll let you know if that works.
Make sure it’s the correct firmware version before you use it.
Put a fan on it. They are designed for servers with lots of airflow in DCs, not for a tower case in someone’s house. An overheating LSI card (very easy to do) will produce all sorts of weird errors before dying completely.
Thanks for the quick response. I heard about the overheating issues on that card, so I bought a dual PCI-mounted fan bracket so I can angle fans directly at it. How do I know if the firmware version I’m using is correct, though?
OK, so I have another update. I installed the LSI card, and I believe the firmware is set up correctly (and I think it is being properly cooled, but I don’t know how to check for sure). I replaced the PSU just in case. I also resilvered another drive which appeared as degraded, and I thought all was well, but no. After running zpool status -v I found some files and snapshots that were permanently corrupted (only about 5, though), so I deleted them. I ran zpool clear and started another scrub, but I can already see checksum errors popping up again. Any ideas?
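One thing that might help narrow this down is tracking which devices the new checksum errors land on as the scrub runs. If they are spread evenly across every drive, that would point more toward something shared (RAM, controller, power) than toward individual disks. Here is a small Python sketch that scans `zpool status` text for rows with non-zero READ/WRITE/CKSUM counters. The function name `devices_with_errors` is my own, and it assumes the standard five-column NAME STATE READ WRITE CKSUM table. Counters are kept as strings because `zpool status` can abbreviate large numbers.

```python
HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def devices_with_errors(status_text: str) -> list[tuple[str, str, str, str, str]]:
    """Return (pool, device, read, write, cksum) for rows with any non-zero counter."""
    flagged = []
    pool = ""
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) >= 2:
            # Start of a new pool section.
            pool = fields[1]
            in_config = False
        elif fields[:5] == HEADER:
            # Column header: device rows follow.
            in_config = True
        elif fields[0] == "errors:":
            # The errors line closes the config table.
            in_config = False
        elif in_config and len(fields) >= 5:
            # Short lines such as "logs" or spare entries are skipped above.
            name, _state, read, write, cksum = fields[:5]
            if (read, write, cksum) != ("0", "0", "0"):
                flagged.append((pool, name, read, write, cksum))
    return flagged
```

Feeding it the saved output of `zpool status` every so often during the scrub, and comparing the results, would show whether the counts climb on all drives together or only on some of them.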
Zpool status:
pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:03 with 0 errors on Sun Aug 24 06:45:04 2025
config:
NAME STATE READ WRITE CKSUM
boot-pool ONLINE 0 0 0
nvme0n1p3 ONLINE 0 0 0
errors: No known data errors
pool: main
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: Message ID: ZFS-8000-8A — OpenZFS documentation
scan: scrub in progress since Sun Aug 31 01:50:30 2025
3.12T / 19.0T scanned at 6.99G/s, 266G / 19.0T issued at 596M/s
0B repaired, 1.37% done, 09:09:07 to go
config: