Need help with a degraded pool, at my wits’ end on this

Helpppppp

I was running TrueNAS 11 smoothly with two pools: pool 1 with two 18 TB Seagate drives, and pool 2 with four 8 TB WD drives in RAIDZ2. This was on an old PC with a 4-core AMD processor, 10 GB of ECC memory, and mirrored USB boot disks. One of the 8 TB drives went into degraded mode and eventually offline; within hours a second 8 TB drive went degraded, and a day later it went offline too. I tried to replace both drives with new, burned-in 18 TB drives so I could eventually expand the pool. No matter what I tried, replacing the drives and resilvering failed. I won’t go into those details, as they are long past.

I bought a SuperMicro 6028U X10DRU-I+ 12-Bay 2x E5-2690v4 128Gb 2U Server FREENAS/UNRAID, backed up all my important information, and set about trying the grown-up world with a true server. I exported the pools and created a new 256 GB SSD boot disk with TrueNAS 13 Core. I took all of the old drives out and put them into the new server, and got it to boot after many attempts at the new learning curve. I imported the pools and downloaded my configuration, and was at the same point I had been at with my old PC setup.

I replaced one of the failed, offline drives with an 18 TB Seagate and resilvered, then replaced the second failed drive with another and resilvered. I replaced the other two WD drives with 18 TB drives and resilvered one at a time until all was done. After the resilver process TrueNAS reported 17 errors, and 3 of the 4 drives had 32 checksum errors. Thinking a pool scrub should heal everything, I ran a manual scrub. The pool still reports errors, and now I have over 2k checksum errors and the pool is running degraded. I have cleared the checksum errors, only to have them appear again in a few days, and the errors are never corrected.

I ran zpool status -v and here are the results. They all seem to be related to Plex in the jail and iocage. Short of wiping everything and starting over, I hope someone can give me a quick fix to get the pool healthy again.
I still have all the data backed up, so I can start from scratch, but I really don’t want to take that path if I don’t have to. Please see the attached pictures; I hope I can get this solved. Thanks in advance.




Is there a way to delete the affected files without messing up the pool or Plex? These are all cache files that seem redundant. Again, thanks.

I can’t copy them off the system’s iocage, nor can I delete them, as I don’t have Unix permission. Again, I don’t want to blow away everything just because these files generate errors and degrade a system that is otherwise running great.
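For anyone who wants the list of affected files as plain text rather than a screenshot, here is a minimal Python sketch that pulls the entries out of saved `zpool status -v` output. The function name `damaged_entries`, the pool name, and the file paths in the sample are all made up for illustration; only the "Permanent errors" heading is the wording ZFS itself prints. It only lists the entries and does not delete anything.

```python
def damaged_entries(status_text):
    """Return the entries listed under 'Permanent errors' in `zpool status -v` output."""
    entries = []
    in_list = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            # The list of damaged files follows this heading.
            in_list = "Permanent errors" in stripped
        elif stripped.startswith("pool:"):
            # A new pool section starts; the previous list is over.
            in_list = False
        elif in_list and stripped:
            entries.append(stripped)
    return entries


# Hypothetical sample; the real paths are in the attached screenshots.
SAMPLE = """\
  pool: pool2
 state: DEGRADED
errors: Permanent errors have been detected in the following files:

        /mnt/pool2/iocage/jails/plex/root/cache/example-1.jpg
        /mnt/pool2/iocage/jails/plex/root/cache/example-2.ppm
"""

for entry in damaged_entries(SAMPLE):
    print(entry)
```

On a real system the text would come from running `zpool status -v` as root and saving the output to a file first.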

Looks like an HBA issue.

Should I swap cables around and see if the problems follow to other drives? I have 3 pools running, and this pool is the only one that has issues. Everything still points to the Plex cache. I tried to copy the Plex iocage so I could mess with the cache safely; everything copied except the problem files (.JPG and .PPM).

thanks again

Check everything. HBA seating. Cables. Connections. Temperature and airflow.

Sudden checksum errors like these across multiple disks usually point to an HBA problem.

OK, I will check cables and seating. Airflow is great: all fans are running and CPU temps are around 29. When taxing the system, CPU temps can get to 33 degrees, well below limits. I don’t know how to check HDD temps; I have searched for how to but came up empty. There used to be CPU and HDD temps in the FreeNAS GUI, but that seems to be gone. I have tried smartctl -h and -a to no avail.
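On reading drive temperatures from the command line: a rough Python sketch, assuming smartmontools 7.0 or newer (which added JSON output via `-j`) and assuming the drives show up under names like `/dev/da0`. The device name and the helper names are illustrative, not taken from this system. The JSON report normally carries the reading under `temperature.current`, in degrees Celsius.

```python
import json
import subprocess


def temperature_from_json(json_text):
    """Return the current drive temperature in Celsius, or None if not reported."""
    report = json.loads(json_text)
    return report.get("temperature", {}).get("current")


def read_drive_temperature(device):
    """Run smartctl against one device and return its reported temperature."""
    # smartctl uses its exit status as a bit mask, so a non-zero status does
    # not necessarily mean the command failed; parse the output regardless.
    result = subprocess.run(
        ["smartctl", "-a", "-j", device],
        capture_output=True, text=True, check=False,
    )
    return temperature_from_json(result.stdout)


# Hypothetical, heavily trimmed sample of smartctl's JSON output.
SAMPLE = '{"device": {"name": "/dev/da0"}, "temperature": {"current": 25}}'
print(temperature_from_json(SAMPLE))
```

Calling `read_drive_temperature("/dev/da0")` as root would query a real drive. Note that plain `smartctl -a` with no device argument prints only usage help, so the device path has to be given.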

thanks again,

*Airflow and temperature in regards to the HBA.

Again, how do I do that?

thanks

From the outside, either by touch, with an IR thermometer, or just with a visual check. (Is the heatsink loose? Is there a fan attached to it? Is it possible to direct airflow over the heatsink?)

Others who use HBAs might be able to share more tips about managing their temperatures.

Temps on the HDDs are at 25; the HBA is at 90 on the heat sink. The two other pools are running fine off the same HBA. Again, all my issues are with corrupted files in the jail, centered around the Plex cache.

It’s the same number of checksum errors, across different drives. This is usually HBA related. It’s unlikely that all three drives started failing at the same time.

(It could also be due to bad RAM.)
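To make the "same number of checksum errors across different drives" pattern easy to spot, here is a small Python sketch that reads the device table from `zpool status` output and groups devices that share a non-zero CKSUM count. The device names `da0` to `da3` and the sample table are hypothetical, built only from the "3 of the 4 drives had 32 checksum errors" description earlier in the thread. Large counters are printed abbreviated (for example with a `K` suffix), so the conversion here is approximate.

```python
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def to_count(text):
    """Convert a counter such as '32' or '2.1K' to an (approximate) integer."""
    if text[-1] in SUFFIXES:
        return int(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def device_errors(status_text):
    """Map each row name to its (read, write, cksum) counters."""
    errors = {}
    for line in status_text.splitlines():
        parts = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        if len(parts) >= 5 and parts[1] in STATES:
            errors[parts[0]] = tuple(to_count(p) for p in parts[2:5])
    return errors


def shared_checksum_counts(errors):
    """Return non-zero CKSUM counts that appear on more than one row."""
    by_count = {}
    for name, (_read, _write, cksum) in errors.items():
        if cksum:
            by_count.setdefault(cksum, []).append(name)
    return {count: names for count, names in by_count.items() if len(names) > 1}


# Hypothetical sample table.
SAMPLE = """\
  pool: pool2
 state: DEGRADED
config:

\tNAME        STATE     READ WRITE CKSUM
\tpool2       DEGRADED     0     0     0
\t  raidz2-0  DEGRADED     0     0     0
\t    da0     DEGRADED     0     0    32
\t    da1     DEGRADED     0     0    32
\t    da2     ONLINE       0     0     0
\t    da3     DEGRADED     0     0    32
"""

print(shared_checksum_counts(device_errors(SAMPLE)))
```

Identical counts on several disks at once suggest the corruption happened upstream of the disks (HBA, cabling, RAM) rather than on each disk independently, which is the reasoning given above.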

I guess my next question is: do I have to destroy the pool, wipe the drives, and start over, or is there a way to delete the corrupt files that are causing the problem? If I delete the Plex plugin, will it clear the iocage and the cache so I can reinstall it and start clean? When I built this up, that pool had 2 of 4 drives failed and offline. I believe the errors transferred over with the resilver. After the resilver I wiped the old drives and tested them for errors; all were clean, and I’m reusing them in other PCs for uncritical data storage, where all have been working for two months without problems. SMART runs show nothing wrong. I was looking for the easiest way to clear the errors rather than for what caused them. I’ve now relocated all the data off that pool and can destroy it and start over; it’s just the week’s worth of data transfer back that I’m looking to avoid. Thanks again for your thoughts and advice.

I want to say this is solved, but there is still a problem. I upgraded to Dragonfish 24, deleted iocage, and ran a scrub. All the errors are gone; the disks have been running for over 4 days now without any read, write, or checksum errors. The pool still flags as unhealthy. Do I have to live with the flag? Is there a CLI command to reset it? I’ve tried everything that I have read and found elsewhere, again to no avail.

It’s taken me the same four days to try to set up Plex through the app page, again reading everything I can find and watching every video that is out there. I now have it up and running; God only knows how I stumbled on the correct combination, as I could never repeat it if I tried. Again, I thank you for your responses and your attempts to help. I really do feel sorry for anyone trying to do this as a newbie, as I’ve had sleepless nights and blurred vision from staring at this screen for hours on end. Next time I’ll buy a NAS out of the box before I go through this again. FreeNAS worked, was simple to use, and was indestructible. TrueNAS took the rule book and tossed it; now this team is like Android: every month, let’s make this obsolete and have something new to work your way through without much assistance. Again, thanks for being the only one who responded.
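On the lingering "unhealthy" flag, one thing worth comparing is what ZFS itself reports against what the GUI shows. Below is a minimal Python sketch that picks the `state:`, `status:`, and `errors:` lines out of `zpool status` output for a single pool. The pool name and the sample text are assumptions, and only the first line of a wrapped `status:` message is kept.

```python
def pool_summary(status_text):
    """Collect the pool, state, status and errors lines from `zpool status` output."""
    summary = {}
    for line in status_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ("pool", "state", "status", "errors"):
            summary[key] = value.strip()
    return summary


# Hypothetical sample for a pool that ZFS itself considers clean.
SAMPLE = """\
  pool: pool2
 state: ONLINE
errors: No known data errors
"""

print(pool_summary(SAMPLE))
```

If ZFS still shows a `status:` line or a state other than ONLINE, that text usually names the remaining complaint. `zpool clear pool2` (pool name assumed) resets the error counters; whether that also clears the TrueNAS flag in this case is not something this thread confirms.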