Drive errors during resilvering

TL;DR: Is it safe to shut down the PC and remove a drive that is being resilvered, or will this break the pool?


I just plugged in a new drive and immediately put it to work replacing a failed drive. Within 30 minutes I received a notice of 16 uncorrectable sectors on the new drive. Checking the SMART info confirmed the bad sectors and a nonzero read error count.

Since then it has changed its story, now claiming re-allocated sectors with a much higher error rate.
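
For anyone who wants to watch those counters without clicking through the UI, here is a rough sketch that pulls the sector-related attributes out of a SMART report. It assumes the report was produced by `smartctl --json -A` on an ATA drive and that the JSON has the `ata_smart_attributes.table` layout that smartctl 7.x emits; the sample report and the helper name `failing_attributes` are made up for illustration.

```python
import json

# SMART attribute IDs that track bad sectors on ATA drives.
WATCHED_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# Made-up sample in the assumed `smartctl --json -A /dev/sdX` layout.
SAMPLE = json.dumps({
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 0}},
            {"id": 9, "name": "Power_On_Hours", "raw": {"value": 3}},
            {"id": 198, "name": "Offline_Uncorrectable", "raw": {"value": 16}},
        ]
    }
})


def failing_attributes(smart_json: str) -> dict:
    """Return {attribute name: raw value} for watched attributes above zero."""
    report = json.loads(smart_json)
    table = report.get("ata_smart_attributes", {}).get("table", [])
    return {
        row["name"]: row["raw"]["value"]
        for row in table
        if row["id"] in WATCHED_IDS and row["raw"]["value"] > 0
    }


print(failing_attributes(SAMPLE))
```

Any nonzero value here on a brand-new drive is a reasonable trigger to pull it and start an RMA rather than trust it with a resilver.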

The resilver has been taking an excessive amount of time, much longer than my only previous resilver. After a full 24 hours, it is only 1.62% done, and the amount scanned has been stuck at 12.4T since two hours after the start.

At this rate, TrueNAS estimates it will take months to complete.

Likely because the drive is writing at a max of 11MB/s…
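
As a sanity check on the "months" estimate, the numbers above can be extrapolated by hand. This is only a back-of-the-envelope sketch: it assumes progress stays linear, and the second function assumes the entire 12 TB drive would have to be written, which overstates a real resilver since only the drive's share of the used data is written. Both function names are made up for illustration.

```python
def resilver_days_remaining(percent_done: float, hours_elapsed: float) -> float:
    """Linear extrapolation of the remaining resilver time, in days."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = hours_elapsed * 100 / percent_done
    return (total_hours - hours_elapsed) / 24


def full_write_days(capacity_tb: float, mb_per_s: float) -> float:
    """Days needed to write the whole drive once at a constant rate."""
    seconds = capacity_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 86400


# 1.62% done after 24 hours, as reported above.
print(round(resilver_days_remaining(1.62, 24), 1))  # 60.7
# A 12 TB drive written end to end at 11 MB/s.
print(round(full_write_days(12, 11), 1))  # 12.6
```

The linear estimate comes out at roughly two months, which matches what TrueNAS is showing. Even a full-drive write at a steady 11 MB/s would take under two weeks, so the drive is evidently averaging well below its observed peak.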

So, lesson learned on not testing a drive before dumping it into a pool. That said, I would be quite happy to start an RMA with Seagate for this drive and remove it from the system, but resilvers are not cancellable.

Can anyone advise whether it is safe or worth shutting down the PC and removing the drive, or if this will completely destroy my pool somehow? I don’t really feel like having my machine sit around for potentially months attempting to resilver a faulty drive…

My advice is just what I would do, and “I don’t think” anything will harm your pool, since you have a RAIDZ2 and only one failed drive.

I would tell TrueNAS to shut down. Once it is powered off, replace the failed drive with another drive and power on. I’m assuming you have a good drive to use. You could just power on and your pool “should” come back online even if you do not install a replacement drive. But I’m not the expert. I do know that if my resilver was taking this long, I would consider it a failure.

You might be better off waiting for someone else to chime in.

I’d be happy for some other chimes, for sure, but I appreciate your input as this is uncharted territory for me.

I would agree with Joe. With RAIDZ2 you should be alright to just do a regular shutdown, swap the drive for a known good one and power back on. JUST MAKE SURE YOU ARE REPLACING THE CORRECT DRIVE. While losing a second drive accidentally on a RAIDZ2 won’t kill it, don’t tempt fate.

100% correct; the sda/sdb/sdc/etc. device names like to change at random and without warning during a reboot. Triple-check the serial numbers and which device names (sda, etc.) they are assigned to after you boot.
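
One way to do that check is to build a serial-to-device map after every boot and compare it against the label on the physical drive. The sketch below assumes a Linux-based system (for example TrueNAS SCALE) where `lsblk -J -d -o NAME,SERIAL` is available; the serial numbers in the sample are made up, and `parse_lsblk` and `current_serial_map` are hypothetical helper names.

```python
import json
import subprocess


def parse_lsblk(json_text: str) -> dict:
    """Map serial number -> device path from `lsblk -J -d -o NAME,SERIAL` output."""
    data = json.loads(json_text)
    return {
        dev["serial"]: "/dev/" + dev["name"]
        for dev in data.get("blockdevices", [])
        if dev.get("serial")  # skip loop devices and others without a serial
    }


def current_serial_map() -> dict:
    """Run lsblk on this machine and return the current serial -> device map."""
    result = subprocess.run(
        ["lsblk", "-J", "-d", "-o", "NAME,SERIAL"],
        check=True, capture_output=True, text=True,
    )
    return parse_lsblk(result.stdout)


# Made-up sample output, so the parsing can be tried without real disks.
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "serial": "ZR5EXAMPLE1"},
  {"name": "sdb", "serial": "ZR5EXAMPLE2"},
  {"name": "loop0", "serial": null}
]}
"""

for serial, device in parse_lsblk(SAMPLE).items():
    print(serial, "->", device)
```

On a live system, call `current_serial_map()` before shutting down and again after booting; if a serial now points at a different sdX name, go by the serial, not the letter.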

After a manual backup to be safe, I have rebooted and my pool is still working lol. I have yet to plug in a replacement drive, as this was sadly my only 12TB lying around.

I am not quite sure what actions I will have to take to get the resilver working again once I put in a new drive, but I am hoping it goes smoothly. Right now it gives me the option to detach or replace the drive, both of which sound like ways to get where I want to be. I will figure that out when I get there.