I am fairly convinced that it is a hardware problem. I have tried a couple of different versions of TrueNAS, and even tried OMV, and still see the same issue.
When I purchased the 9300-8i HBA, it came with a FW version lower than 16.00.12.00. Before updating the firmware, I was seeing the same issue, but the error rate was lower - maybe a few errors per day. After updating the HBA firmware, I see far more errors - thousands per day.
The SAS drives are refurb. I definitely understand that it is possible to get a bad drive, but have a hard time believing that all drives are bad.
I did a fair amount of research before the build, and landed on the 9300-8i as it was one of the HBAs recommended. I have swapped the HBA with a different one (same model), and still see the same issue.
I don’t have any other drives to try, and really can’t stomach the thought of several hundred $$ just to find out it may not be the drives.
If you have swapped the HBA but kept the same drives, have you tried moving the cables, or are they breakouts? Is there any specific pattern to the issues?
I have 4 drives on one set of cables, and the 5th drive on another. I have tried swapping the 2 cables. No pattern that I see - I see failures on multiple drives in either case.
I added info on my case to the initial post (SilverStone CS382). The case has a backplane that supports both SAS/SATA drives.
I would suggest investigating the temperature of your HBA. Those cards are designed for high-airflow servers, and desktop-style chassis often lack adequate airflow. There are a number of PCI-slot-mounted or 3D-printed fan brackets available, but the HBA likely needs some direct airflow over the heatsink.
Overheating HBAs often present as false-positive drive errors.
I have an open PCI slot, so I have ordered a fan that will install in the open slot. In the meantime, I have installed additional temporary cooling - i.e. a fan clamped onto the inside of the case, pointed at the HBA card. (It is also worth noting that I currently have the side panel removed from the case.)
I did a zpool clear last night with a clean reboot. Below is the current state of the drives this morning.
It has been a few days, and I have completed quite a bit of additional testing. I have made progress, but am still having issues.
Temperature of the HBA definitely appears to have been part of the issue. I have added a PCI slot mounted dual fan, and that has addressed the HBA temp issue.
I also continue to see SMART failures for different devices.
I have tried running the LONG SMART test with the following results:
sda: Aborted 4 times
sdb: Aborted 2 times, then Passed
sdc: Aborted 5 times
sdd: Passed first time
sde: Aborted 1 time, then Passed
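For anyone who wants the tally at a glance, here is a minimal Python sketch that just restates the results listed above. The names `results` and `summarize` are made up for illustration, nothing here talks to smartctl, and it assumes that sda and sdc never completed a pass, since no pass is reported for them.

```python
# Long SMART test outcomes copied from the list above:
# drive -> (number of aborted attempts, whether a later attempt passed).
# Assumption: sda and sdc never passed, since no pass is reported for them.
results = {
    "sda": (4, False),
    "sdb": (2, True),
    "sdc": (5, False),
    "sdd": (0, True),
    "sde": (1, True),
}


def summarize(results):
    """Return (drives that never passed, drives that passed, total aborted runs)."""
    never_passed = sorted(d for d, (_, ok) in results.items() if not ok)
    passed = sorted(d for d, (_, ok) in results.items() if ok)
    total_aborts = sum(aborts for aborts, _ in results.values())
    return never_passed, passed, total_aborts


never_passed, passed, total_aborts = summarize(results)
print("never passed:", never_passed)
print("eventually passed:", passed)
print("aborted runs in total:", total_aborts)
```

Four of the five drives aborted at least once, which is why a single bad drive does not look like the whole story.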
@Craig_L - you may very well be correct. The drives were fairly inexpensive.
This is the actual model # I purchased: MD8TSAS12872E
I see 5 stars here, but only 2 reviews:
I see this drive listed here as ‘top 5’ with a 9.3 rating
I am almost at the point of pulling the trigger to purchase new drives. I just hate to spend several hundred dollars without being able to prove fairly definitively that the drives are the problem.
If I purchase new drives, I will go SATA rather than SAS. My backplane supports either. I was hoping for some benefit from 12 Gb/s vs 6 Gb/s. As mentioned, the price point was also attractive.
Any recommendations on good 8TB SATA drives for a NAS?
I am pretty sure that MDD/SexPanther/WhateverOtherRandomName drives are just used Seagate/WD/Toshiba drives with a new label and the SMART stats wiped. This is likely why you’re getting errors about retrieving SMART data.
I’ve bought a bunch of questionably-sourced HBAs from ebay over the years with a 100% success rate. Temps are important to keep down, but it sounds like you took care of that.
For cheap drives, I recommend manufacturer recertified from serverpartdeals or gohdd. I’ve bought 100+ drives from those vendors, and over the last 2-3 years I’ve only had a couple of drives spit out any concerning errors, at which point the vendors were responsive about replacing them. They aren’t quite as cheap as the ones you bought, but you know what you’re getting.
Between those two, even if the Barracudas were CMR, I would still choose the Exos. I have no experience with buying ‘renewed’ drives and have only bought ‘manufacturer recertified,’ but they are enterprise drives with a longer warranty from gohdd.
Of course, plan for any drive to fail, no matter which brand, whether it’s new or used.
I purchased 5x of the 8TB Seagate Exos drives linked to above.
The drives are due to be delivered tomorrow. I will plan to post an update in a couple of days once I am able to get the drives installed and everything configured and up and running.
Why SATA? You have a SAS controller. When I was looking at drive prices on eBay for NOS drives, the SAS drives were about 2/3 the cost of the same capacity SATA drives. And SAS drives are almost always Enterprise grade.
What are NOS drives? Whenever I’ve shopped for manufacturer recertified or new, SATA has been cheaper than SAS. I guess on the used market, SAS could be cheaper since there are fewer people that can use them.
New Old Stock … in other words, new drives that were never sold but that are not current product. While they typically carry a manufacturer’s warranty, depending on how old they are, you may already be beyond the warranty period.
I bought a bunch (40) of NOS WD RE series 2TB drives a while back and have only had one failure in over 5 years of use. I have had very good luck with WD RE3 and RE4 drives.
In terms of price, I was looking at NOS drives on eBay and 6TB SATA drives were running about $140-$150 while 6TB SAS were running $100 - $120.
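To put numbers on that, here is a quick Python sketch of the SAS-to-SATA price ratio using only the eBay ranges quoted above. The variable names are made up for illustration, and the ranges are rough observations, not a price survey.

```python
# Rough 6TB NOS price ranges quoted above, in USD (low, high).
sata_range = (140, 150)
sas_range = (100, 120)

# Best case for SAS: cheapest SAS drive against the priciest SATA drive.
lowest_ratio = sas_range[0] / sata_range[1]
# Worst case for SAS: priciest SAS drive against the cheapest SATA drive.
highest_ratio = sas_range[1] / sata_range[0]
# Midpoint of each range, as a single representative figure.
midpoint_ratio = (sum(sas_range) / 2) / (sum(sata_range) / 2)

print(f"SAS/SATA price ratio, best case:  {lowest_ratio:.2f}")
print(f"SAS/SATA price ratio, worst case: {highest_ratio:.2f}")
print(f"SAS/SATA price ratio, midpoints:  {midpoint_ratio:.2f}")
```

With these ranges the ratio runs from roughly 0.67 to 0.86, so "about 2/3" is the favorable end of the range, not the average.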
I am not comfortable with ‘reconditioned’ drives as I have no idea what that really means.
Side note: all 8 of the Seagate ES.2 series drives I bought failed during the 5-year warranty; none of the ES.3 drives they were replaced with have failed in well over 5 years of use.
I don’t buy reconditioned either. I buy manufacturer recertified. I still don’t know what that entails, but they are backed with a 3-5 year warranty depending on the vendor at a much lower price.