I have 3 x 10TB WD Red Plus drives. Should I consider them consumer grade junk?
I had a situation where I managed to bump the power cord of my TrueNAS box and it went down… not a power surge, not an off/instant-on… it went down completely. Restarted, and there was some ZFS activity in the startup screens to recover everything. All was fine after the restart.
Left the system running overnight. Next morning I came in and heard one of the drives spinning up, then there was a grating sound, a click, and the console would throw a mean-looking error, and the process would repeat… it looked like it had been going on for hours and hours.
It’s just a test system, so I took it down and booted up again. Same behavior.
Took it down again, took the drives out of the case, and put each drive into a USB external enclosure. Connected them to another PC and tested the drives under Windows with WD tools:
WD Drive Utilities didn’t see the drives
WD Discovery didn’t either … even though Windows 11 Disk Management could see the disks
Eventually I blew away the partition on one and formatted it with Win11 Disk Management; then WDD could see the drive, but none of the utilities appeared.
Finally found WD Kitfox and it could see the drives. Ran tests on the two drives… both passed all the Quick tests without a single abnormal sound.
Does that sound like a false positive on “all green” statuses with Kitfox? Is there some other utility to test with?
Some tools can’t recognize drives in USB enclosures because the USB bridge chip and its firmware may not have been designed to pass through everything SMART needs.
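On Linux, smartctl can often work around this by being told what kind of bridge it is talking to. A dry-run sketch (it echoes the commands rather than running them, since there is no drive attached here; /dev/sdX is a placeholder):

```shell
#!/bin/sh
# Sketch: probing a USB-enclosed drive for SMART passthrough with smartctl.
# -d test asks smartctl to report which device type it would guess for the
# bridge; -d sat requests SAT (SCSI-to-ATA translation), the most common
# passthrough for USB-SATA bridges. Remove the echoes to run for real.
probe_usb_smart() {
  dev="$1"
  echo "smartctl -d test $dev"   # identify the bridge / device type
  echo "smartctl -d sat -a $dev" # full SMART report via SAT passthrough
}
probe_usb_smart /dev/sdX
```

If the bridge doesn’t support SAT at all, no tool will get SMART data through it; that is consistent with the WD utilities not seeing the drives in the enclosure.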
Drives can fail early on, thus the common consensus is to test before use in production. These forums have various opinions on how much testing is needed, so search around.
Sudden power-offs can damage disks. It’s rare, but it happens.
It is not clear from your message whether you tested the noisy drive. At one point you said:
I have 3x10TB WD Red Plus drives. In this iteration of testing, I only put two of the drives into the server and set them up as a mirror.
I tested the noisy drive and the other… no difference in the WD Kitfox tests… and no noise from the noisy drive at all.
I have a StarTech drive dock on order. Wondered what software tools I should use once it arrives.
p.s. I also wondered if I encountered a SATA port issue. The motherboard is a used iBuyPower/ASRock OEM pull, new to me but a 2018-vintage release… it exhibits no issues as far as I can tell.
I haven’t had clicking/noise due to a port, but I have had CRC errors and drives dropping out as they endlessly reset from SATA 3.0 to SATA 2.0 speeds before failing out. Those are what I’d qualify as a faulty SATA data port.
Otherwise: a SMART long test, then check the results, then a full pass of badblocks (followed by another round of SMART long tests). That takes about a full week for a 10TB drive and tests every single sector multiple times. The best way to burn in and test a new HDD, as far as I’m concerned.
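The sequence above can be sketched as a dry-run script (it echoes each command rather than executing it, since badblocks -w destroys everything on the target; /dev/sdX is a placeholder, and the exact badblocks flags are one reasonable choice, not the only one):

```shell
#!/bin/sh
# Burn-in sequence sketch: SMART long test, destructive badblocks pass,
# then SMART long again to compare attributes. Remove the echoes to use.
burn_in() {
  dev="$1"
  echo "smartctl -t long $dev"       # 1. start extended self-test (hours)
  echo "smartctl -a $dev"            # 2. record baseline attribute values
  echo "badblocks -b 4096 -ws $dev"  # 3. DESTRUCTIVE 4-pattern write+verify
  echo "smartctl -t long $dev"       # 4. second extended self-test
  echo "smartctl -a $dev"            # 5. compare pending/reallocated counts
}
burn_in /dev/sdX
```

The -b 4096 matters on large drives: badblocks’ default 1024-byte block size can overflow its 32-bit block counter on a 10TB disk.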
What kind of interface does it use to connect to the system? Is this dock for the TrueNAS system?
I was testing the drives on a Windows system separate from the TrueNAS system. Maybe this goes back to the question above - are there Linux utilities I should be testing the drives with? I also wanted to see whether this dock could work as a temporary/manual backup option: plug a drive in, back things up, take the drive offsite, then swap in another drive for the next backup. (I realize this is complicated since TrueNAS wants fixed backup targets - I’m looking into options for a longer-term solution, but for now I was just going to use a manual process.)
no worries. thanks for the heads up. I can RTF(ine)M. Good to know about burn in.
At this point I had blown away one of the partitions while trying to figure out which WD utilities would work with these drives. Just in test mode with these newly purchased drives and hoping I haven’t encountered a failure at this point.
Both drives are running badblocks -w tests right now. Interesting how it slows down: based on the first 2 hours of each drive’s run it looked like it would finish in a few hours, but it gets significantly slower as time progresses - only about 25% done after 16 hours.
When badblocks runs overnight and I don’t check it via tmux, it seems to slow down significantly… as mentioned, in a couple of hours it was 10% done on each drive, but by morning (at 16 hours) it was only 25% done. Now, 4 hours later, it’s over 50% done (with me checking in every once in a while via tmux in the TrueNAS shell).
What determines the priority badblocks runs at? I saw some articles on niceness settings, but they indicated that a tmux session attach/detach doesn’t bump niceness, so… what causes the faster processing when the session is viewed occasionally versus the slowness overnight when no one is looking at the sessions?
There was a reply in that link that says something close to “badblocks alone cannot be trusted”, and I think that says it best.
SMART tests before and after badblocks are still necessary for comparison - you never know when the drive silently reallocates sectors to prevent failures (which, to be fair, is a good thing and well designed). But badblocks forces SMART to update the pending/reallocated sector counts, even if the runs pass.
Nothing in life is guaranteed, but if a new drive passes a SMART long test, badblocks, and then another SMART long test without anything at all popping up? It gets some trust in my books.
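The before/after comparison boils down to a handful of raw values. A small sketch that filters them out of `smartctl -A` output so two runs are easy to diff (the attribute names match the WD table below; which attributes you watch is a judgment call):

```shell
#!/bin/sh
# Extract the attribute name ($2) and raw value ($NF, the last field) for
# the counters that matter when comparing SMART output before and after a
# badblocks pass. Feed it smartctl -A output on stdin.
check_counts() {
  awk '/Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count/ { print $2, $NF }'
}
# usage: smartctl -A /dev/sdX | check_counts
```

Any of those raw values moving from 0 between runs is the signal that the drive quietly remapped or pended sectors, even though every individual test “passed”.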
Anything to be concerned with? The errors were thrown when the drive was configured under TrueNAS, but all the subsequent testing with badblocks and long smartctl tests has only reported the same numbers I saw when I started testing after the TrueNAS errors.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0024   100   100   000    Old_age   Offline      -       0
  3 Spin_Up_Time            0x0027   225   144   021    Pre-fail  Always       -       9725
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       174
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   100   100   000    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       406
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       88
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       76
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       104
194 Temperature_Celsius     0x0022   110   101   000    Old_age   Always       -       42
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   100   000    Old_age   Offline      -       0

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error        00%              397  -
# 2  Short offline       Completed without error        00%              245  -
# 3  Short offline       Completed without error        00%              243  -
Looks like it’s power supply related. I started getting the click-then-attempted-spin-up again on the same drive this was happening to before the badblocks/smartctl checks… the drive runs for a few seconds after the click, then it repeats… while just sitting there idle. Kept getting “ata5 not available” on the console (it was writing the error right over the “enter an option from 1-10:” prompt on the actual console screen; this is new - it was not doing that in the last go-around).
Changed everything out related to the drive: swapped the SATA cable, and moved it to the port where the working drive of the mirrored pair had been (that drive moved to the next SATA port without issue). The problem continued. Finally got a MOLEX → SATA power splitter and powered the affected drive from that (still the same power cable from the PSU, just a different connector). It has thrown one error since I did that late last night, right at startup… after that, no further errors for 12 hours.
Will replace the PSU, as the current one is a no-name 500W $24.99 Amazon special from 2016. I’d only ever run 1 SSD + 1 spindle HDD on that power supply (and with a different, much older motherboard; the replacement has an AMD Ryzen 3200G, so the power budget has crept up on all sides of this old PSU)… now I have 1 SSD + 2 spindle HDDs, and they’re 7200 rpm, so I assume either that connector is bad or the combined spin-up power requirement is not quite met by this PSU. Will put a Corsair in there and see if it clears the problem permanently.
p.s. A lot of learning for a simple NAS backup. The reason I was going minimal on this hardware is that I plan to put this TrueNAS PC in a relative’s house across the street and set up a VPN link that comes up once a day to sync my QNAP NAS to this TrueNAS setup. Rsync is connected and working from the QNAP to this machine, but I’m still not 100% convinced it’s worth it.
I haven’t yet been able to get the QNAP to do only new/changed-file replication once an initial rsync is completed. It’s copying everything on every rsync initiated by the QNAP.
It’s horribly slow even on a LAN (I realize that’s my infrastructure). Kind of wondering: if I dump a 128GB phone full of photos onto it, will that sync take days to finish when connecting across a VPN to another (consumer-grade) cable modem setup?