Cryptsetup for burn-in test of large HDDs?

Hi,
I have just bought two 20 TB HDDs (one Hitachi, one Seagate) for my system, and this time I wanted to “do it right”, especially considering the expense.

So I started to follow the recommendations from the old forums for burn-in, running the SMART tests: short, then conveyance (not supported by the Hitachi, btw), and long (will finish sometime tomorrow).

After that, the recommendation is badblocks. From what I read, just specifying -b 4096 will not suffice for 20 TB, as the limit there is 16 TB. From what I understood, setting an even bigger block size could lead to false negatives. So I searched around, and on the Arch Linux wiki (badblocks - ArchWiki) there is mention of using cryptsetup to write to the whole disk instead.
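For reference, here is a rough sketch of that dm-crypt approach as I understand it from the wiki; I have not run it on these exact drives, /dev/sdX is only a placeholder for the new disk, and the whole thing destroys any data on it:

# map the raw disk through dm-crypt with a throwaway passphrase (it will prompt for one)
cryptsetup open /dev/sdX burnin_test --type plain --cipher aes-xts-plain64 --key-size 512
# write zeros through the mapping; on the platters this becomes pseudo-random data
shred -v -n 0 -z /dev/mapper/burnin_test
# read the whole disk back and verify it still decrypts to zeros
cmp -b /dev/zero /dev/mapper/burnin_test   # a final "EOF on /dev/mapper/burnin_test" is expected
cryptsetup close burnin_test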

So here is my question: What do the sages here think about that approach?
I reckon that it is (only) equivalent to one pass of badblocks?
Is cryptsetup even available on TNS?
What is your go-to routine for burn-in of large HDDs?

Thanks in advance!
ht

Honestly, I’m not too concerned with the output of badblocks; I expect any true drive errors would be shown in the SMART data. The main thing I want badblocks to do is to (1) read/write every block on the drive, so that the SMART system will recognize any issues; and (2) stress the drives for a bit, in the hopes that if they’re going to fail, they’ll fail then rather than after I commit data to them.


Thank you for the quick answer. I see. So you would recommend just going with a bigger block size? Which one though?
The linked alternative suggestion also says it would be faster. This would definitely be a plus, considering that my back-of-the-envelope calculations got me to over a week for testing 20 TB with badblocks.
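For what it's worth, the rough numbers behind that estimate (assuming something like 200 MB/s average sequential throughput, which is just a guess for these drives):

echo $(( 20 * 10**12 / (200 * 10**6) / 3600 ))   # ~27 hours for one full-disk pass
# badblocks -w writes and reads back four patterns, i.e. eight full-disk passes
echo $(( 8 * 27 / 24 ))                           # ~9 days in total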

That’s what I’d do.

That’s a feature IMO, not a bug.

I don’t understand the purpose of using cryptsetup to write nothing but “zeros”?

If a write is going to fail, it would fail without encryption anyways.

If you dd zeros across the disk, even without a layer of encryption, then you can still check if all the writes were in fact “zero” using cmp.
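A minimal sketch of that, assuming /dev/sdX is the new (empty) disk; this overwrites everything on it:

# write zeros across the whole disk (destructive)
dd if=/dev/zero of=/dev/sdX bs=1M status=progress
# read everything back and confirm it is still all zeros
cmp -b /dev/zero /dev/sdX   # a trailing "cmp: EOF on /dev/sdX" only means the disk ended first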

I have an Ironwolf Pro 20 TB going through badblocks -wvs -b 8192 -c 64 right now. It took a little over 51 hours for the first pass, so, yes, over one week for the whole thing. You may, however, abort anytime you feel confident enough… or do a one-month burn-in, as jgreco did.
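As a rough sanity check on the block size (treating the drive as exactly 20 × 10^12 bytes): as I understand it, badblocks stores block numbers in 32-bit variables, so the block count has to stay below 2^32:

echo $(( 20 * 10**12 / 4096 ))   # 4882812500 -> above 2^32 (4294967296), so -b 4096 fails
echo $(( 20 * 10**12 / 8192 ))   # 2441406250 -> below 2^32, so -b 8192 is fine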

What’s more annoying is that the long SMART test takes over one day to complete. I’ll have to revise my “daily short, weekly long” scheme…

I have a script for that.


Thank you all for your input! I really appreciate it.

(I think) I see why you think so. But it is very, very long for big disks. This is my only system and my production pool has to be offline while I test, so quicker would definitely please me. Honestly I don’t expect my disks to be stressed for more than an hour at most in the future, so I’m trying to test thoroughly but not excessively. I really only want to find whether the disks come with defects that should warrant a replacement. If the established knowledge is that that can only be said after four passes of badblocks I will probably go ahead with it. I would however be happy if someone has a well-grounded different opinion on this…

Yeah, it will be daily/monthly for me as well…

Thank you!

I don’t either. I thought there was an idea behind it. But I don’t know. Does anyone here?

Btw, cryptsetup is a no-go anyway; it appears dm_mod is not part of TNS (anymore), see Upgrade Scale with legacy LUKS pool.

So, what do you think, will dd and cmp take as long as one pass of badblocks? What would I miss besides stressing the drives for longer?

Thanks again, everyone!

It’s up to you if it’s really worth the extra passes and time spent on “burning in” the drive(s).

I, personally, would be fine with a single pass of zeros and checking to see if any writes failed. As @dan mentioned above, SMART will pick up any read/write errors along the way, which can be reviewed with smartctl.
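Something along those lines, for example (again with /dev/sdX as a placeholder for the drive):

smartctl -a /dev/sdX          # full report: attributes, self-test log, error counters
smartctl -l error /dev/sdX    # just the ATA error log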

Some form of burn-in is recommended, but not necessary, and you fully get to define what to do and what you deem to qualify as “enough”.
Burn-in can be done anywhere, from any system. External USB enclosure? Yep.
You do not have to do it on your main NAS, and certainly not by taking down the pool.
Triple-check any destructive command before hitting “Enter”, it’s a good habit…

You’re right of course, it’s my decision how much I want to stress them.

Well, my problem is not enough power (on the 5 V line, which is why I’m switching from five 2.5-inch drives to a mirror of 3.5-inch drives, but that is a different story), so I had to take down the pool to test the new drives.

Anyway, I settled on one pass of badblocks. But now I find that I can’t run tmux. I get the error mentioned in this thread (with zsh, but the same on any shell). Now I’m hesitant to do it without a way to reattach in case the ssh session is closed…

Always good advice!

Well, for anyone who is stupid like me and comes across this:
As long as the bug in sudo mentioned in the linked thread is not fixed in Debian/TNS, you cannot run tmux as root, and it doesn’t help to set tmux as the default shell for root either. In both cases you get the argv[0] mismatch, expected "zsh/bash" got "-zsh/bash" error.
But you can run tmux as truenas_admin (or probably any other user) and then su to root inside tmux, roughly as sketched below.
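A minimal sketch of that, assuming truenas_admin is the admin account, burnin is just a session name I picked, and /dev/sdX stands in for the disk under test:

# SSH in as truenas_admin, then:
tmux new -s burnin                         # named session, started as the unprivileged user
su -                                       # become root inside the tmux pane
badblocks -wvs -b 8192 -c 64 /dev/sdX      # the long-running job (destructive!)
# detach with Ctrl-b d; if the SSH session drops, log back in and run: tmux attach -t burnin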
Hope this helps somebody (before having a broken pipe, like I had…)