I have had a bit of trouble with 10 Ironwolf 10TB HDDs. When they were about 4 years old, I transferred them into a TrueNAS system built from an old PC and then started getting a couple of checksum errors.
Over the next couple of months 6 of the drives degraded with errors.
Two of the HDDs were clicking and powering up and down, so I know those drives are dead. The rest of them seemed to be fine according to SMART data.
I was using long (60cm) SAS cables to connect them to my HBA in IT mode. Could that be the cause of the checksum errors?
I ask because, since I moved the disks to a new purpose-built Mini ITX TrueNAS system, none of the remaining drives have had any issues at all, and it's been a couple of years. The only differences are much shorter SAS cables and not using an HBA (I did need to add an NVMe to SATA adapter to get 2 extra SATA ports).
I am currently running long SMART on the errored drives.
The 4 drives that were not clicking have been sitting on a shelf for 2 years waiting to be disposed of, but given the current price of replacement drives I want to see if they are OK and keep them as cold spares.
I purchased 2 more HDDs a few years ago, used 1 to replace an errored drive, and kept the other as a new, unused 10TB Ironwolf spare.
UDMA CRC errors are usually associated with bad cabling, connector oxidation, or like issues.
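One quick way to see whether a drive has logged any of these is to read SMART attribute 199 before and after swapping cables. A minimal sketch, assuming smartmontools is installed and the drive reports the attribute in the usual ten-column `smartctl -A` table; `crc_count` is just a helper name made up for this example:

```shell
#!/bin/sh
# Print the raw UDMA CRC error count (SMART attribute 199) from
# `smartctl -A` output supplied on stdin.
# Assumes the standard attribute table, where the raw value is column 10.
# Returns non-zero if attribute 199 is not present in the input.
crc_count() {
    awk '$1 == 199 { print $10; found = 1 } END { if (!found) exit 1 }'
}

# Usage (needs root; /dev/sdX is a placeholder for the real device):
#   smartctl -A /dev/sdX | crc_count
```

A count that stays put after moving to known-good cabling points at the old cable or connector rather than the drive; a count that keeps climbing is more worrying.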
I would suggest you take the drives in question and, using known-good cabling, put them through a SMART long test, a full badblocks pass, and another SMART long test. Then make a decision.
Look for threads here that describe how to test a drive prior to using it in a NAS. Some folk call it provisioning, but searching for "badblocks" and "tmux" commands should do the trick. Good luck.
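Roughly, that sequence looks like the following. This is only a sketch that prints the commands instead of running them; `burnin_plan` is a made-up helper name, `/dev/sdX` is a placeholder, and the `-b 4096` block size is an assumption suited to large drives. Note that `badblocks -w` is a destructive write test that wipes the whole disk.

```shell
#!/bin/sh
# Print (do not run) the commands for a SMART long / badblocks / SMART long pass.
# WARNING: badblocks -w overwrites every block on the target disk.
burnin_plan() {
    dev=$1
    echo "smartctl -t long $dev"        # start first extended self-test
    echo "badblocks -wsv -b 4096 $dev"  # destructive write/verify pass
    echo "smartctl -t long $dev"        # start second extended self-test
    echo "smartctl -a $dev"             # review attributes and self-test log
}

burnin_plan /dev/sdX
```

`smartctl -t long` returns straight away while the test carries on inside the drive, so wait for each self-test to finish (check with `smartctl -l selftest`) before starting the next step. Running the lot inside tmux keeps it alive if the SSH session drops.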
60 cm should be within spec even for SATA drives (which I assume is the case since you could move the drives to another system without a SAS HBA), but shorter is better. And you may have a bad cable—or insufficient cooling on the HBA.
It is sensible to re-test the drives with different hardware. The error flag is correct, but the fault quite possibly does NOT lie with the drives.
Here is a shell script I was not aware of that rolls all the usual disk burn-in tests into one script, courtesy of @dan (who made me aware) and @dak180, who wrote it. I have never used it but it looks comprehensive. It should be part of the GUI for TrueNAS CE.
Yes, your disk has too many blocks. Badblocks is an old tool, and can only use 32-bit numbers to count them. The workaround is to add the -b flag to tell it to use a larger “block” size: badblocks -wsv -b 16384 /dev/sda.
-b 4096 (4k blocks) is the minimum to handle large drives.
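To see why, here is the arithmetic for a nominal 10 TB disk (10^13 bytes, an assumed round figure; the real capacity differs slightly). The sketch assumes the limit is an unsigned 32-bit block number, and the function names are made up for the example. badblocks defaults to 1024-byte blocks when `-b` is not given.

```shell
#!/bin/sh
# Number of blocks badblocks has to address for a given disk size.
# usage: block_count <disk_bytes> <block_size>
block_count() {
    echo $(( $1 / $2 ))
}

# Succeeds if the block count fits in an unsigned 32-bit number
# (assumed limit: 4294967295).
fits_32bit() {
    [ "$(block_count "$1" "$2")" -le 4294967295 ]
}

DISK_BYTES=10000000000000   # nominal 10 TB, assumption for illustration

for bs in 1024 4096 16384; do
    if fits_32bit "$DISK_BYTES" "$bs"; then verdict="fits"; else verdict="too many"; fi
    echo "-b $bs: $(block_count "$DISK_BYTES" "$bs") blocks ($verdict)"
done
```

With 1024-byte blocks the count is well over the 32-bit ceiling, while 4096-byte blocks bring it back under.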
Testing multiple blocks at once (the -c option) speeds things up… but only as far as the drive can actually handle so much in one go. My limited testing suggests that anything between -b 4096 -c 1024 and -b 4096 -c 2048 is the practical plateau; going higher might even be slightly detrimental.
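For a sense of scale, the amount of data handled per batch is simply block size times block count. A small sketch (`batch_bytes` is a made-up helper; badblocks itself may allocate more than one buffer of this size depending on the test mode):

```shell
#!/bin/sh
# Bytes covered by one badblocks batch: <block_size> * <blocks_at_once>.
# usage: batch_bytes <block_size> <blocks_at_once>
batch_bytes() {
    echo $(( $1 * $2 ))
}

for c in 64 1024 2048; do
    echo "-b 4096 -c $c: $(batch_bytes 4096 "$c") bytes per batch"
done
```

So the plateau described above sits at about 4 to 8 MiB per batch, compared with 256 KiB at 4096-byte blocks with the default -c of 64.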
I ran badblocks for 220+ hours and it found no errors. Strangely, after badblocks finished, both drives made a lot of seeking noises for about 45 minutes. AI suggests the disks were performing some internal housekeeping after such a long period of activity.
After the disks settled down I tried to run conveyance and short SMART tests, but they got stuck at 90% on both disks.
I think it was my USB docking station throwing a wobbly, since both disks were experiencing the same problem. I know using a USB docking station is not ideal, but it was the easiest way of doing the testing with the hardware I have available.
Anyway, after a quick reboot and a power cycle of my USB hub, the short, conveyance, and extended SMART tests completed without error.
So after over a week of testing, both disks are just fine. To think I was going to destroy them! I am going to test the last two errored disks and then do the same for my old cold spare.
I now understand the value of rigorous testing on disks.
As it happens I have 64GB of DDR4 sitting in a drawer unused and another 32GB from another unused system (the one that caused all this trouble to begin with), all made up of 16GB DIMMs.
I do already have a backup: a USB HDD to which I take a robocopy mirror, plus a Macrium Reflect files-and-directories image of the NAS every Monday.