tmux: how do I make sense of what is going on with badblocks?

I am following instructions to burn in disks: a tmux session with 3 panes running ‘badblocks’ commands for sda, sdb, and sdc.

One of the disks started producing errors, and I am not sure how to make sense of what is going on right now.

  1. Only the 3rd pane shows a correct “progress” %. The first 2 panes are filled with those errors, and I can’t see progress or tell whether they are for sda or sdb.
  2. Maybe some advice on a suitable SSH client so I get better formatting (split lines, etc.)? I’m on Windows.

I can move the cursor between panes using Ctrl-B and the up/down arrows, but that doesn’t reveal any progress. Just trying to understand what is going on right now…
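For the question above about telling the panes apart, here is a minimal sketch, assuming a Linux system with tmux and the procps version of `ps`. The helper names `device_of` and `list_pane_devices` are made up for illustration. It asks tmux for each pane's terminal, then looks up the badblocks command line running on that terminal. It would need to be run from a spare pane or window in the same tmux session.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of tmux or badblocks.

# Pull the /dev/... argument out of a command line such as
# "badblocks -b 4096 -ws /dev/sdb". Prints nothing if there is none.
device_of() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep '^/dev/' | head -n 1
}

# For every pane in the current tmux window, print "pane <index>: <device>".
# Looks up the badblocks process attached to each pane's tty.
list_pane_devices() {
    tmux list-panes -F '#{pane_index} #{pane_tty}' |
    while read -r index tty; do
        # '[b]adblocks' keeps grep from matching its own command line
        args=$(ps -o args= -t "$tty" | grep '[b]adblocks' | head -n 1)
        printf 'pane %s: %s\n' "$index" "$(device_of "$args")"
    done
}
```

Calling `list_pane_devices` then prints one line per pane. Separately, with tmux's default key bindings, Ctrl-B q briefly overlays each pane's number, and Ctrl-B [ enters scroll mode (q to leave) so you can page back through the errors.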

Were the earlier tests using non-destructive badblocks?
And is the affected disk a WD Red SMR by chance?

It is always helpful to list hardware, like disk model in this case.

OK. Some background.
Yes, they are WD Red 4TB drives: WD40EFRX, purchased in 2015. They were used part-time in a RAIDZ mirror with no problems. All SMART tests look good.

I just bought 4 more of the same kind (WD40EFPX) and ran SMART/badblocks on those.
The idea is to build a RAIDZ2 with 6 drives (the 4 new ones plus the 2 old ones).

  • “Exported”/disconnected the pool (after taking backups somewhere else)
  • Ran destructive badblocks on both of the old drives: badblocks -b 4096 -ws /dev/sdj

I have a hard time believing that both of those drives, which were working perfectly, are now bad for some reason. Yet shortly after badblocks starts (a minute or so), they go into a loop of:

badblocks: Invalid argument during seek
46141370
badblocks: Invalid argument during seek
46141371
badblocks: Invalid argument during seek
46141372
badblocks: Invalid argument during seek
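As a side note on reading those numbers: badblocks reports block numbers in units of the block size given with -b, so a reported block can be turned into a byte offset by multiplying the two. A minimal sketch using the first number shown above and the -b 4096 from the command; the function name `block_to_bytes` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: block_to_bytes is a hypothetical helper name.

# Convert a badblocks block number to a byte offset.
# $1 = block number, $2 = block size in bytes (the -b value)
block_to_bytes() {
    echo $(( $1 * $2 ))
}

offset=$(block_to_bytes 46141370 4096)
echo "$offset bytes"                                          # 188995051520 bytes
echo "$(( offset / 1024 / 1024 / 1024 )) GiB (rounded down)"  # 176 GiB (rounded down)
```

So the block numbers in this excerpt sit roughly 176 GiB into the disk.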

And when they get to the end, something else interesting happens:
TrueNAS “freaks out” and changes the device letters for those drives.
They were sda/sdb, then they became sdg/sdj, and then they changed again.

I feel like it has something to do with ZFS having been on the disks prior to this? SMART tests look good. There are a couple of questionable entries, but according to my research they’re OK.
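On the changing drive letters: names like sda or sdj are handed out by the kernel as devices are detected, so they can change if a disk is re-detected. On Linux, /dev/disk/by-id holds symlinks named after the model and serial number that point at whatever the current kernel name is. A minimal sketch for listing that mapping, assuming a Linux-based TrueNAS and a `readlink` that supports -f; `list_stable_ids` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: list_stable_ids is a hypothetical helper name.

# Print "<kernel name> <stable id>" for each whole-disk symlink in a
# by-id style directory (default: /dev/disk/by-id). Partition entries
# (names containing -partN) are skipped.
list_stable_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        case $link in *-part[0-9]*) continue ;; esac
        printf '%s %s\n' "$(basename "$(readlink -f "$link")")" "$(basename "$link")"
    done
}
```

Running `list_stable_ids` before and after the letters change should show the same IDs pointing at different sdX names, which makes it easier to be sure which physical disk a given badblocks run is hitting.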

badblocks seems like the perfect workload to show why you don’t use SMR disks.

My advice would be: firstly, don’t use SMR disks in a RAID; and secondly, if you must, don’t do a whole-disk write, which is what badblocks does.

Pardon, what is SMR? And how about the 4 new ones I got? They checked out OK. Should I go with something else instead?

Those are not SMR (shingled magnetic recording). They are CMR (conventional magnetic recording). :+1:

Yes. CMR. False alarm.

Not false alarm.

The previous disks listed are in fact SMR.

Never mind.

That was a false alarm about your false alarm not being a false alarm.

This is a good example of why I hate what WD did.

Pretty safe to buy Seagate IronWolf instead, which is what I do.

Why not just buy WD Red Plus? Why reward them.

Anyway, back to OP.

The point of burning in is to find issues. You’ve found issues. Now you need to work out if your issues are real.

I’d wait for the other drives to finish, and then take it from there.

The only other “test” still running is just an old 1TB drive I had, and it’s almost done. The 4 new drives tested OK (took like 50 hours).

The old drives are also “NAS” drives and, according to the table, CMR. From memory (it’s been a while), they burned in just fine. SMART tests show some questionable items but are otherwise OK.
Very, very strange how they behave. There were no issues with data and no issues with scrubs. Weird.

Unbelievable. How? Both disks. I ran a SMART test on both of those disks; they failed, and this is what I see for them:

How is it even possible? I guess the data was stored in the “good parts”, and when I ran badblocks it hit the bad ones and the SMART test then found them? But prior to badblocks the SMART test was good. Does that make sense?


Sh!t happens. Especially with nine year old hardware.

Not sure you can read too much into the error “rate” values.

The drives self-correct everything and expect errors.