Help Determining If Drive Is Bad

Scrub is per pool.

Not sure how to see the status in the GUI in SCALE, but from the shell, zpool status will show the status of the last scrub or resilver operation.
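For example (a minimal sketch; the pool name tank is just a placeholder):

zpool status tank
# the "scan:" line shows the last scrub or the resilver currently in progress,
# e.g. "scan: scrub repaired 0B in ... with 0 errors on ..."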


Is there a way to show past scrubs? Since it is currently resilvering it just shows that instead.

No, but if your notifications are set up properly, you’d have gotten warning messages about it if something had come up.

The SMART test is there to warn you of potential trouble, i.e. common statistics suggesting that a drive might be about to go bad. The severity of each error varies, so it’s good to peruse a good guide like the one from @joeschmuck over in the old forum.

(There may be a more recent one over here in the new forum as well.)
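If you want to look at those statistics yourself from the shell, something along these lines works (the device name is just an example):

smartctl -a /dev/ada0        # dump SMART attributes, error log and self-test log
smartctl -t long /dev/ada0   # start a long self-test; check the result later with -a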

Depending on the error, I offline, pull and replace the drive ASAP, then resilver. The drive is usually still 100% responsive but there is no reason to tempt fate. That’s why it is also important to have qualified spare drives - give yourself the luxury of a fully protected NAS while the RMA process takes its time.
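From the shell, that roughly corresponds to the following sketch (pool and device names are placeholders):

zpool offline tank ada3        # take the failing disk out of service
# physically swap in the qualified spare, then:
zpool replace tank ada3 ada4   # resilver onto the new disk
zpool status tank              # watch the resilver progress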


I don’t remember seeing any scrub errors, just that the drive had unreadable sectors.

I’ll take a look at the guide. What do you mean by a qualified replacement? In my case I had a spare of the same drive (although untested).

Qualified means I have put the drive through multiple tests to weed out infant deaths before the drive becomes part of the pool. Usually I do this as a short SMART test, then a long SMART test, followed by badblocks and another long SMART test. See here.
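A rough sketch of that sequence from the shell (badblocks -w is destructive and wipes the disk, so only run it on an empty spare; the device name is a placeholder):

smartctl -t short /dev/ada4        # quick sanity check, a couple of minutes
smartctl -t long /dev/ada4         # full surface read test, several hours
badblocks -ws -b 4096 /dev/ada4    # write/read every block, can take days on large drives
smartctl -t long /dev/ada4         # another long test to confirm nothing degraded
smartctl -a /dev/ada4              # review attributes and the self-test log afterwards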

@spearfoot wrote a nice script to automate it all. I have yet to use it but I reckon it works.

Running these tests is not a guarantee that the spare will function as intended under actual use, but every sector / byte / block has been tested at least once. If the drive doesn’t freak out after many hours of hot, grinding badblocks work, then chances are it’ll take to a nicely cooled NAS with largely dormant data like mine and call it a vacation.


Why do you assume there’s a necessary relationship between the two things?

Your drive hasn’t passed a single SMART self-test in 500 hours of runtime; not one self-test in the log has passed. And since the log only keeps the last 21 tests, we can’t tell whether the drive has ever passed one. It’s long since time to RMA the drive.
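You can pull up that log yourself with something like this (device name is an example):

smartctl -l selftest /dev/ada0   # lists the last 21 self-test results and whether each completed without error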

And thoroughly test a drive before putting it into service. The method I generally follow is here:
https://www.familybrown.org/dokuwiki/doku.php?id=fester112:hvalid_hdd


Definitely going to test out the replacement drive. I am planning to build another NAS in the near future and will definitely test those drives as well. Do the burn-in tests use a lot of CPU resources? I have about 6 or 7 drives I will need to test and would love to do them all at once.


I am still fairly new to the NAS world, let alone TrueNAS, so I figured if a drive was bad, the pool would throw some kind of warning as well.

Definitely going to stress test drives as I receive them from now on to make sure they are good drives, or at least have a better chance of being good.

“Pool health” is logical: Data is valid, and has the required level of redundancy.
“Drive health” is physical: Drive is working without defect.

ZFS will strive to keep data valid even if drives are partially failing.

Burn-in is not CPU intensive. You can test as many drives in parallel as you want, using a dedicated tmux session for each drive if using badblocks, or test the entire array with solnet-array (read-only, non-destructive).
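A minimal sketch of the parallel badblocks approach (device and session names are placeholders; again, badblocks -w destroys whatever is on the disks):

for d in da0 da1 da2 da3; do
  tmux new-session -d -s "burnin-$d" "badblocks -ws -b 4096 /dev/$d"
done
tmux ls                      # list the running burn-in sessions
tmux attach -t burnin-da0    # check on one drive's progress (detach with Ctrl-b d)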

The first burn-in guide I listed shows you how to use tmux to run the badblocks test in parallel. It is disk intensive, i.e. 190 MB/s per disk. My CPUs seemed to be OK (even the C2750 in my Mini XL).

Practically nothing.

They can saturate your I/O resources though. I’ve run burnins simultaneously on 8 disks before.

And just use spearfoot’s script. It does the badblocks and long testing, etc. It skips the initial long test these days.

I noticed I have the same drive and the same issues!
I opened a topic here: Critical sector errors for one drive - #11 by Nikotine
Looks like this type of drive is sh*t…

Usually I hear a lot of good things about Seagate drives. Maybe a bad batch? When was your drive manufactured?

There are two manufacturers left (plus Toshiba, I guess, they seem to have had a resurgence) and they’ve both made more than their fair share of dodgy HDDs. Treat HDDs as a commodity and never trust them.
Definitely don’t waste your energy trying to determine which manufacturer is better.


Never buy SMR.

I always buy new NAS drives. Whichever is cheapest when I need one.

And not to contradict @stux, I happen to buy used Helium drives from a reseller that I have had good experiences with. Goharddrive.com will warranty their drives for up to five years.

I have had to return drives and the process was easier than with an OEM, which will warranty a new drive only up to 3 years, depending on what drive tier you went for. Plus, it’s really difficult / expensive to get genuine NOS helium HGST He10 drives now.

I just bought a batch at ~$80 / drive, qualified all of them, and have them sitting ready to start replacing the 45k+ hour drives in the NAS. Those all still seem to be very happy, though: no issues with SMART, etc.

Pretty much what I read online when I first started looking for drives. But that brought about caution on WD drives, as they quietly changed some CMR drives to SMR, so you had to look at the fine print to determine what you are getting (if it even said). That included the Red drives as well.

Might look into that if I ever need drives in the future.