Definitely run a very thorough disk test before putting disks into production, especially used disks. Scrubs only look at used space, and “new” disks have no used space.
Badblocks is a useful (if tedious) way of doing this. You can test all the disks at the same time, but this will take the better part of a week.
Remember, do this under tmux and change the block size to 4096.
Possibly even 8k blocks: -b 8192
A side effect is that you’ll be testing your cooling through one week of thermal stress…
Speaking of which:
The Mirrored Boot Overkill Monster has struck again!
May I suggest that you switch to a (single) small NVMe drive for boot, move the six HDDs to the six motherboard ports (perfect match), and do without the HBA? One less part to cool.
I forgot to mention - my drives are outside of RMA (I bought them back in Nov when prices were starting to rise, but I had issues building my first NAS and didn’t get it running until this week).
If that’s the case, should I still run badblocks?
Yes, you want to know the condition of the drives even if you will be unable to get them replaced under warranty. A scrub will only look at parts in use; unused sectors will not see any testing.
But shouldn’t you still have some form of warranty? I thought ServerPartDeals offered warranties and the drives were, by your description, bought less than 6 months ago.
The primary purpose of running it is to detect drive failure before committing data to the drive; whether you can return it really is irrelevant (though obviously it’s nicer if you can).
Thanks - you’re right about the warranty I believe. I will test with badblocks.
First going to increase the baseline fan speed to constantly run higher, as the drives went from 35C idle to 43C - 46C (depending on the drive) during the long SMART test.
I believe I can’t tie the HDD temps to specific fans, but I’m all ears if you guys have any ideas regarding the fans, besides just increasing the baseline speed.
PS - Would a simple full write pass, followed by another long SMART test, be sufficient in my case? Or is badblocks much more strongly recommended?
Your drives may get hot. Try not to let them exceed 50C. I don’t know off the top of my head what the max temp for your drives is, but 60C is a typical value. At 57C I’d likely shut it down until I could get better cooling. This is just my opinion, so take it as one data point.
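If you want something to watch temps from the shell during the burn-in, here is a minimal sketch. The 50C and 57C thresholds are the ones from the post above. The function names are made up for illustration, and `drive_temp` assumes a SATA drive that reports SMART attribute 194 with the raw value in column 10 of `smartctl -A` output, which may not hold for every drive.

```shell
# Classify a drive temperature (whole degrees C) using the thresholds
# from the post above: try to stay at or below 50C, shut down at 57C.
classify_temp() {
    if [ "$1" -ge 57 ]; then
        echo "shutdown"
    elif [ "$1" -gt 50 ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# Read SMART attribute 194 (Temperature_Celsius) for one drive.
# Assumption: SATA drive, raw value in column 10; needs root and smartmontools.
drive_temp() {
    smartctl -A "$1" | awk '$1 == 194 { print $10; exit }'
}

# Example loop, left commented out so nothing touches the drives by accident:
# for d in /dev/sd[a-f]; do echo "$d: $(classify_temp "$(drive_temp "$d")")"; done
```

Running the commented loop every few minutes (for example in a spare tmux window) would give a quick per-drive status line.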
Your questions make you sound very reluctant to run badblocks. Why is this?
It’s not that badblocks is a perfect tool–it isn’t really even designed to do what we use it for. But your apparent reluctance is curious, to say the least. Run the test, all drives at once, turn up the fan speed if you need to.
One of the many problems with very large capacity drives is also their best feature: they store a lot of data. To test one completely, badblocks should run all 4 passes, but on those high-capacity drives that will take a very long time. If you do not have a UPS, you might consider getting a good quality UPS with reasonable runtime. This will help with temporary outages (less than 10 minutes), but if a power issue causes the computer to reset while you are testing these drives, the test needs to start all over again.
If your data is not very important to you, you could run a single pattern and then cross your fingers that all is actually good. This kind of thing has been suggested before. I personally prefer the entire 4 test patterns, and there is a reason for using these test patterns: if it could be done with 2 test patterns, we would all do it.
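For a rough sense of what “a very long time” means, here is a back-of-the-envelope sketch. In write mode (-w) badblocks writes each of the 4 patterns across the whole disk and then reads it back, so the disk is traversed 8 times. The function name and the 200 MB/s sustained throughput are assumptions for illustration only; real drives are faster on the outer tracks and slower on the inner ones.

```shell
# Estimate wall-clock hours for a full badblocks -w run.
# Args: capacity in TB (decimal), assumed sustained throughput in MB/s.
estimate_hours() {
    capacity_tb=$1
    mb_per_s=$2
    passes=8   # 4 patterns, each written once and read back once
    # 1 TB = 1,000,000 MB; integer division rounds down.
    echo $(( capacity_tb * 1000000 * passes / mb_per_s / 3600 ))
}

# 24 TB drive at an assumed 200 MB/s average
estimate_hours 24 200
```

Swap in whatever throughput your drives actually sustain; the point is only that the run is measured in days, which is why the UPS matters.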
We are not trying to force your hand, we are just trying to give you the best advice we can. Ultimately it is your decision. That is like someone choosing between ECC RAM or Non-ECC RAM. We will all tell you ECC is better and there are reasons, but it is up to you.
And I will reiterate what @NugentS said: remember to do it under tmux. If you don’t know how to use tmux, Google is your friend here. I used it last week for tmux, because I have used tmux only twice in several years and I needed to know the commands and what they did.
Thank you all for your tips, I will hook up the UPS I got for it, and then run the full badblocks test, using tmux.
A few more questions:
I’m not sure if the fan curves I set inside BIOS are actually kicking in. Is there a way to monitor the speed of all 4 of my fans, from within truenas?
The sensors command wasn’t picking up fan speeds in the shell.
(And I haven’t installed my NVMe I will use for apps yet… not sure if I could install docker apps on the boot pool? Confused on how to go about this).
Is it ok to run badblocks on all 6 of my drives, simultaneously? (24TB WD Ultrastar HC580s)
Commands I will run for badblocks (please let me know if this is not ideal / incorrect)
SSH into TrueNAS
start tmux: tmux new -s burnin
create 6 panes for my 6 HDDs
Ctrl + B, then %
Ctrl + B, then "
repeat until 6 panes
move between panes: Ctrl + B, then arrow keys
for each pane, run this command: badblocks -b 8192 -c 8192 -wsv /dev/sda | tee /root/sda.log
(replace sda with sdb, sdc, etc, for each HDD)
detach: Ctrl + B, then D
come back later: tmux attach -t burnin
I will be periodically monitoring temps via TrueNAS GUI
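The steps above can also be scripted. This is only a sketch: it prints the tmux commands instead of running them, and it uses one tmux window per drive rather than six panes, which is a simplification on my part. The function name is hypothetical; the session name, badblocks options, and log paths are the ones from the list above. Remember that badblocks -w destroys all data on the target drive, so check every device name before executing anything it prints.

```shell
# Print (do not run) the tmux commands for one badblocks window per drive.
# Args: session name, then device names without the /dev/ prefix.
burnin_commands() {
    session=$1
    shift
    echo "tmux new-session -d -s $session"
    for dev in "$@"; do
        # Trailing colon: create the window at the next free index in $session.
        echo "tmux new-window -t $session: -n $dev"
        echo "tmux send-keys -t $session:$dev 'badblocks -b 8192 -c 8192 -wsv /dev/$dev | tee /root/$dev.log' Enter"
    done
}

burnin_commands burnin sda sdb sdc sdd sde sdf
```

Once the printed commands look right, they can be pasted into the SSH session; `tmux attach -t burnin` then works as described, with Ctrl + B, then N to cycle through the windows.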
Short answer is “you can’t install anything else on the boot pool”, long answer is “you shouldn’t install anything else on the boot pool, but if you understand the risks & know your cli, then it is possible”.
If your board does not have IPMI, and I think you do not have a server board, then I don’t think you will be able to monitor the system fans, but I don’t know everything. As for the BIOS fan curve, you typically pick a sensor to monitor and the fans respond based on that. On my ASRock Rack B650D I can pick each fan individually in the BIOS and tie it to a different sensor. I can also set it to full speed if I desire. For the hard drive burn-in, I would set your case fans to full speed. The drives will heat up, especially if they are close together. Monitor the temps of the ones in the center in particular. After 2-3 hours, they probably will not get any hotter.
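One thing that might be worth checking before giving up on fan monitoring: the sensors command reads its data from the kernel's hwmon interface, so you can look there directly. A minimal sketch, assuming a Linux-based install with sysfs mounted at /sys; the function name and the optional directory argument are mine, added only so it can be pointed at another path.

```shell
# List fan tachometer readings exposed through the kernel hwmon interface.
list_fans() {
    base=${1:-/sys/class/hwmon}
    found=0
    for f in "$base"/hwmon*/fan*_input; do
        # An unmatched glob stays literal and is skipped here.
        [ -r "$f" ] || continue
        echo "$f: $(cat "$f") RPM"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no fan inputs found under $base"
}

list_fans
```

If this finds nothing, the kernel has no driver bound to the board's fan controller, which would match what sensors showed, and the BIOS hardware monitor page is probably the only place to see RPM.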
Update: I had to increase block sizes to 8192 to get it to work. Will that make the test less accurate?
Ran this command to get it to work:
badblocks -b 8192 -c 8192 -wsv /dev/sdX | tee /root/sdX.log
When using badblocks -b 4096 -c 8192 -wsv /dev/sda | tee /root/sda.log I was getting the error: Value too large for defined data type … must be 32-bit value
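That error comes from badblocks keeping block numbers in a 32-bit value, so the drive size divided by the block size (-b) has to stay below 2^32. A quick sketch of the arithmetic, assuming “24TB” means 24 * 10^12 bytes (the exact size of a given drive will differ slightly); the function names are just for illustration.

```shell
# Number of blocks badblocks would have to address for a given -b value.
block_count() {
    echo $(( $1 / $2 ))
}

# Succeeds when the block count stays below 2^32 (4294967296).
fits_32bit() {
    [ $(( $1 / $2 < 4294967296 )) -eq 1 ]
}

size=24000000000000   # assumed: 24 * 10^12 bytes
for bs in 4096 8192; do
    if fits_32bit "$size" "$bs"; then
        echo "-b $bs: $(block_count "$size" "$bs") blocks, fits"
    else
        echo "-b $bs: $(block_count "$size" "$bs") blocks, too many"
    fi
done
```

As far as I know, the larger block size only changes the unit badblocks reads, writes, and reports in. The whole surface is still written and read back, so the test should not be less thorough.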