Pool unhealthy, all disks and cables working fine

prez02 · June 14, 2024, 6:25pm

I guess that was the main worry here and the reason why everybody was asking about the hardware.

The Node 804 is a nice case, I have one myself. It has 8 places for 3.5" disks and another 2 for 2.5" or 3.5". How did you fit all the drives into it, or is the backup pool somewhere else?

When I tried to add some 2.5" drives with a small adapter, the bigger HDD screws would not fit and I had to buy screws and washers in a DIY store.

sfatula · June 14, 2024, 6:42pm

The OP should set up email alerts via the alert and settings area in Scale. Then, when the problem occurs, he will get an email and can capture relevant info.

rdcustom · June 14, 2024, 6:58pm

For >6TB drives there are some adapters in the accessories box.
On the mobo side I used just 2 screws per drive, and I placed the other 2 drives near the PSU.

It can handle a total of 10x 3.5" + 2x 2.5" (inside the front panel) out of the box.

rdcustom · June 14, 2024, 6:59pm

Done, but I never had any notification.

sfatula · June 14, 2024, 7:02pm

I have gotten notifications of every minor glitch myself. Love it.

winnielinnie · June 14, 2024, 7:40pm

If you try to send a “test” email from the GUI, does it go through?

Stux · June 14, 2024, 10:19pm

Non ECC memory could be responsible.

Did you try a rescrub? After maybe a shutdown?

In theory, you’d have to have colocated block issues, i.e. more than one failure while reading the same block.

This could happen if a crappy SATA adapter glitched out, which can happen when a port multiplier hits an error on a read. That error then triggers errors on other reads.

Or on a write.

I.e., the insidious thing is it works fine, until something goes wrong.

Fleshmauler · June 15, 2024, 2:19am

Hope it doesn’t come across like that too much. Oddly enough, I notice that used enterprise hardware is way further from being ‘fancy’ & is instead commonly available for prices at or below consumer hardware made in the last 5 years.

I think the teasing was that the things originally suspected are common points of failure that are repeatedly mentioned to stay away from, yet folks keep using them & then getting defensive when they fail. Maybe it was unfair to have this attitude towards you before getting it confirmed beyond doubt.

Glad nothing significant of value was lost yet & that you got other backups for mitigation.

ECC, SMR, Port Multiplier; I’m not sure what caused the issue but take your pick. However, one thing I’m very concerned about is that your signature mentions a 1TB Kingston NVMe as ‘cache’ on your backup pool… Would you mind confirming whether you set it up as L2ARC, SLOG, or a special vdev?

We’ve also had folks set up special vdevs on single-point-of-failure drives & then lose the entire pool when the drive fails…

etorix · June 15, 2024, 7:14am

Either way, there’s no real use for “cache” on a backup pool…

Fleshmauler · June 15, 2024, 7:29am

I was just worried he has a metadata vdev & risks losing his entire backup pool more than anything. Was gonna gloss over (likely) no performance gains until that was confirmed.

rdcustom · June 15, 2024, 8:44am

The backup pool also hosts Final Cut Pro libraries. I added a read cache vdev, not a write cache.

I noticed some differences in speed when browsing folders and loading projects into FCPX.

etorix · June 15, 2024, 8:52am

OK, then it is not strictly a “backup” but also an active pool.
“Cache” is proper terminology for L2ARC (but SLOG is NOT a “write cache”), and 128 GB RAM may support 1 TB L2ARC.
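
For a rough feel of why RAM size matters for L2ARC: every block held in L2ARC needs a small header kept in RAM, so the overhead scales with the number of cached blocks rather than with the raw size of the device. The sketch below is only back-of-the-envelope arithmetic. The 96-byte header size is an assumed figure for illustration (the real per-block cost depends on the OpenZFS version), and the function name is made up for this example.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def l2arc_header_ram_bytes(l2arc_bytes, avg_block_bytes, header_bytes=96):
    """Estimate RAM used by the in-memory headers that track L2ARC blocks.

    header_bytes is an ASSUMPTION for illustration, not a documented constant.
    """
    blocks = l2arc_bytes // avg_block_bytes  # number of blocks the device can hold
    return blocks * header_bytes


if __name__ == "__main__":
    # Compare large blocks (e.g. video files) with small blocks on a 1 TiB device.
    for block_kib in (128, 16):
        ram = l2arc_header_ram_bytes(1 * TIB, block_kib * KIB)
        print(f"{block_kib:>4} KiB blocks: ~{ram / GIB:.2f} GiB of RAM for headers")
```

Under these assumptions, large blocks keep the overhead well below 1 GiB, while small blocks push it to several GiB, which is where a generous amount of RAM starts to matter.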

rdcustom · June 15, 2024, 8:55am

thanks for clarifying

Fleshmauler · June 15, 2024, 4:51pm

L2ARC on a single NVMe is fine then - was worried for nothing

etorix · June 16, 2024, 2:29pm

@rdcustom With so many NVMe drives in actual use, B550 was probably not the best platform.

If I get the lanes right (Ryzen is a mess here…) you have:

CPU x16 → HyperM2 → 2x 2 TB (app) + 1 TB (L2ARC) + 1 free M.2 slot
CPU x4 → boot M.2 #1
B550 x4 → boot M.2 #2
B550 x1 slot → i350 NIC
B550 x1 slot → ASM1064
If so, this is a waste of precious CPU lanes on a boot device which probably does not need to be mirrored in the first place. (Just keep a recent copy of the configuration file in case you need to reinstall on a new boot device.)
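
To make the lane budget easier to see, here is a small tally of the layout listed above. The lane counts are copied from the list (which is itself a best guess at the board's wiring), and the variable and function names are made up for this sketch.

```python
from collections import defaultdict

# (lane source, lanes, device), copied from the layout guessed above
current_layout = [
    ("CPU x16", 16, "HyperM2: 2x 2 TB (apps) + 1 TB (L2ARC) + 1 free M.2 slot"),
    ("CPU x4", 4, "boot M.2 #1"),
    ("B550", 4, "boot M.2 #2"),
    ("B550", 1, "i350 NIC"),
    ("B550", 1, "ASM1064"),
]


def lanes_by_source(layout):
    """Sum the lanes consumed from each source (CPU links or chipset)."""
    totals = defaultdict(int)
    for source, lanes, _device in layout:
        totals[source] += lanes
    return dict(totals)


def lanes_on_boot(layout):
    """Sum the lanes that end up serving boot devices."""
    return sum(lanes for _source, lanes, device in layout if "boot" in device)


if __name__ == "__main__":
    print(lanes_by_source(current_layout))
    print("lanes spent on boot devices:", lanes_on_boot(current_layout))
```

The point of the tally is that the mirrored boot pair takes two x4 links, one of them straight from the CPU, while the SATA controller is left with a single chipset lane.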

Possible improvements:

CPU x16 → x8x4x4 riser → HBA + 2x 2 TB (apps)
CPU x4 → 1 TB L2ARC

OR

CPU x4 → M.2 to PCIe adapter → HBA

In either case, use a single boot device, or a mirror with an M.2 drive in a PCIe x1 slot.