Hi All,
I’m most of the way through migrating from a Synology system to a new TrueNAS Scale 24 (latest) install on a 12-bay Supermicro dual-Xeon server I picked up. I have 8 drives in it now as mirror VDEVs, all the same model. One of them is faulted and I’m not sure what to do. Here’s how I got here:
I started with 4 x 16 TB new (refurbished) Seagate Exos HDDs, set up as a stripe of mirror VDEVs to make a pool.
After I had migrated my data to the new pool from my Synology, which also had 4 x 16 TB drives in Btrfs, I pulled those drives out and set them aside ‘just in case’, then loaded my old 4 x 4 TB drives from way back into the Synology.
I then backed up key data from the TrueNAS system onto the old drives in the Synology, set up as JBOD, which will be my third backup (first is cloud, second is the NAS, and now third is the Synology).
At that point I felt good enough to take the 4 x 16 TB drives I had set aside, wipe them, and add them as more mirrored VDEVs to the same pool. The size increased. Looks great.
When I migrated the data from the Synology, it all went onto the first 4 drives. People talk about a ‘reflow’ to redistribute data across the VDEVs, which is what I wanted to achieve by rewriting the data within the pool. I didn’t find an instant way to move the data within the datasets, so I set up a copy in Midnight Commander and let it rip. I had previously moved some of the data from the Synology backup to target dataset folders with rsync, which worked well, but I wanted to try Midnight Commander. This one is a lot of data.
About 9 hours into the 15-hour copy, I got an alert that one drive had an increased error count, which is now at 40 read errors. A corresponding alert also says "Pool EXOS-Pairs state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
The following devices are not healthy:
- Disk ST16000NM001G-2KK103 xxxxxx is FAULTED" (do you guys keep serial numbers private?)
zpool status for the pool says:
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use ‘zpool clear’ to mark the device
repaired.
It also shows “FAULTED too many errors” next to the drive.
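In case it helps anyone reading along, here is a rough Python sketch of how the faulted device and its error counters could be pulled out of zpool status text. The sample text, the device names (sda through sdd), and the faulted_devices helper are all made up for illustration; only the pool name and the 40 read errors come from my system. It also assumes the counters are plain integers, which may not hold once counts get large enough to be abbreviated.

```python
import re

# Hypothetical sample modeled on typical `zpool status` output.
# Device names and the layout of this excerpt are invented for illustration.
SAMPLE_STATUS = """\
  pool: EXOS-Pairs
 state: DEGRADED
config:

    NAME          STATE     READ WRITE CKSUM
    EXOS-Pairs    DEGRADED     0     0     0
      mirror-0    ONLINE       0     0     0
        sda       ONLINE       0     0     0
        sdb       ONLINE       0     0     0
      mirror-1    DEGRADED     0     0     0
        sdc       FAULTED     40     0     0  too many errors
        sdd       ONLINE       0     0     0

errors: No known data errors
"""

# One config row: name, state, then READ / WRITE / CKSUM counters.
# Assumes plain integer counters (no abbreviated values such as 1.2K).
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def faulted_devices(status_text):
    """Return (name, read, write, cksum) for every row in state FAULTED."""
    found = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) == "FAULTED":
            name = match.group(1)
            read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
            found.append((name, read, write, cksum))
    return found


if __name__ == "__main__":
    for name, read, write, cksum in faulted_devices(SAMPLE_STATUS):
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

On a live system the text would presumably come from running zpool status for the pool (for example through subprocess.run with capture_output=True and text=True) instead of the hard-coded sample.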
I am running a SMART test on the faulty drive right now. The big Midnight Commander copy finished without issue. I am backing up non-essential stuff to the Synology in the meantime, because the faulted drive isn’t showing any activity in its current state.
Should I wait until the SMART test is done tomorrow and then run a scrub? And will the drive, which appears to be neither offline nor online, be included in that scrub?
Should I use that interesting function to remove the whole mirror VDEV, which, I think, would migrate a few TB of data to the remaining 3 VDEVs (6 drives)?
Can the faulty drive be salvaged? I had a funny bug like this years ago when I first got my Synology going; a simple system repair fixed it and I never saw a problem again.
I have already ordered a replacement drive, which should be here in 2 days. I just don’t want to get screwed over in the 3-4 days it will take to get past this. Key data is in at least 3 places, and data I really like is still on the new system and the Synology JBOD. Other stuff I kinda like, but don’t need, is all that is at risk at this point. It would sure be a big bummer if it were gone; it’s just not worth a ton of expensive cloud storage.
Sorry for the noob style of this. It’s my first post here. I have read various forum posts across the internet about this awesome system, and I hope I’m not breaking too many noob rules. Haha. I was on many forums back in the day and I know people have their way of doing things.
Please help me make a good choice. I kinda trust the other refurb Exos drives to hold out.