I guess that was the main worry here and the reason why everybody was asking about the hardware.
The Node 804 is a nice case, I have one myself. It has 8 bays for 3.5" disks and another 2 for 2.5" or 3.5" drives. How did you fit all the drives into it, or is the backup pool somewhere else?
When I tried to add some 2.5" drives with a small adapter, the bigger HDD screws would not fit and I had to buy screws and washers in a DIY store.
The OP should set up email alerts via the alert and settings area in Scale. Then, when the problem occurs, he will get an email and can capture relevant info.
In theory, you’d have to have colocated block issues, i.e. more than one failure while reading the same block.
This could happen if a crappy SATA adapter glitched out, which can happen when a port multiplier hits a read error: that one error then triggers errors on other reads.
Or on a write.
I.e., the insidious thing is that it works fine until something goes wrong.
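To make the "colocated" point concrete, here is a toy sketch. It is not how ZFS tracks errors internally; the block numbers, the per-disk sets, and the `unrecoverable_blocks` helper are all made up for illustration. The idea is only that scattered errors are repairable from redundancy, while errors that land on the same block on more disks than the layout tolerates are not.

```python
from collections import Counter

def unrecoverable_blocks(bad_blocks_per_disk, tolerated_failures):
    """Return block numbers that failed on more disks than redundancy tolerates.

    bad_blocks_per_disk: one set of failing block numbers per disk (hypothetical data).
    tolerated_failures: 1 for a 2-way mirror or RAIDZ1, 2 for RAIDZ2, and so on.
    """
    hits = Counter()
    for bad in bad_blocks_per_disk:
        hits.update(set(bad))
    return sorted(block for block, n in hits.items() if n > tolerated_failures)

# Independent errors: every disk has bad blocks, but never the same one.
independent = [{10, 57}, {203}, {88, 941}]

# Correlated glitch: an adapter hiccup hits block 57 on all three disks at once.
correlated = [{10, 57}, {57, 203}, {57, 941}]

print(unrecoverable_blocks(independent, tolerated_failures=1))
print(unrecoverable_blocks(correlated, tolerated_failures=1))
```

The first case has more bad blocks in total but loses nothing; the second loses block 57 because the failures line up.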
Hope it doesn’t come across like that too much. Oddly enough, I notice that used enterprise hardware is way further from being ‘fancy’ than people assume & is instead commonly available for prices at or below those of consumer hardware made in the last 5 years.
I think the teasing was because the things originally suspected are common points of failure that people are repeatedly told to stay away from, yet folks keep using them & then get defensive when they fail. Maybe it was unfair to have this attitude towards you before getting it confirmed beyond doubt.
Glad nothing of significant value has been lost yet & that you have other backups for mitigation.
ECC, SMR, Port Multiplier; I’m not sure what caused the issue but take your pick. However, one thing I’m very concerned about is that your signature lists a 1TB Kingston NVMe as ‘cache’ on your backup pool… Would you mind confirming if you set it up as L2ARC, SLOG, or a special vdev?
We’ve also had folks set up special vdevs on single-point-of-failure drives & then lose the entire pool when the drive fails…
I was just worried he has a metadata vdev & risks losing his entire backup pool more than anything. Was gonna gloss over (likely) no performance gains until that was confirmed.
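For anyone wanting to check this on their own pool: `zpool status` lists auxiliary devices under their own headings, `cache` for L2ARC, `logs` for SLOG, and `special` for a special (metadata) vdev. Below is a rough sketch that pulls those sections out of pasted `zpool status` text. The pool name and device names in `SAMPLE` are invented, and the parser only handles the plain default layout.

```python
# Section headings that zpool status prints for non-data vdev classes.
AUX_CLASSES = {"special", "dedup", "logs", "cache", "spares"}

def vdev_classes(status_text):
    """Map each auxiliary class heading to the device names listed under it."""
    classes = {}
    current = None
    in_config = False
    for line in status_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "config:":
            in_config = True
            continue
        if tokens[0] == "errors:":
            break
        if not in_config or tokens[0] == "NAME":
            continue
        if len(tokens) == 1 and tokens[0] in AUX_CLASSES:
            # A bare heading such as "cache" starts a new section.
            current = tokens[0]
            classes[current] = []
        elif current is not None:
            classes[current].append(tokens[0])
    return classes

# Hypothetical output, typed by hand for illustration only.
SAMPLE = """\
  pool: backup
 state: ONLINE
config:

    NAME        STATE     READ WRITE CKSUM
    backup      ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
    cache
      nvme0n1   ONLINE       0     0     0

errors: No known data errors
"""

print(vdev_classes(SAMPLE))
```

If the NVMe shows up under `cache`, losing it is harmless to the pool. If it shows up alone under `special`, that is the single-point-of-failure case described above.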
OK, then it is not strictly a “backup” but also an active pool.
“Cache” is proper terminology for L2ARC (but SLOG is NOT a “write cache”), and 128 GB RAM may support 1 TB L2ARC.
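As a back-of-the-envelope check on the RAM side: every block held in L2ARC needs a small header kept in ARC. The sketch below assumes roughly 70 bytes per cached block, which is a commonly quoted figure for recent OpenZFS and not something I have verified for this system, so treat `header_bytes` as an adjustable assumption. The average block size is also a guess that depends on the datasets’ recordsize and the actual data.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

def l2arc_header_ram_bytes(l2arc_bytes, avg_block_bytes, header_bytes=70):
    """Estimate ARC memory used just to index a completely full L2ARC.

    header_bytes is an assumed per-block header size, not a measured value.
    """
    blocks = l2arc_bytes // avg_block_bytes
    return blocks * header_bytes

# Compare large (128 KiB) and small (16 KiB) average block sizes for a 1 TiB L2ARC.
for block_kib in (128, 16):
    ram = l2arc_header_ram_bytes(1 * TIB, block_kib * KIB)
    print(f"{block_kib} KiB blocks: {ram / GIB:.2f} GiB of RAM for headers")
```

Under these assumptions the overhead is around half a GiB for 128 KiB blocks and a bit over 4 GiB for 16 KiB blocks, which is small next to 128 GB of RAM.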
B550 x1 slot → ASM1064
If so, this is a waste of precious CPU lanes on a boot device which probably does not need to be mirrored in the first place. (Just keep a recent copy of the configuration file in case you need to reinstall on a new boot device.)