So late last year when RAM prices were going crazy, I realized SSDs were next to go & luckily picked up more than a full set of cold redundant spare SSDs & NVMe drives. Frankly, I’ve recouped most of the “investment” by selling a few at a ‘discount’ (compared to current prices). I’m not proud of it.
However, I failed to imagine that HDDs would also be impacted.
I have 1 cold spare for my raidz2 pool of eight 8TB drives; it’s been rotting away on a shelf for a while now, and I’m wondering if I should grab a second one before things get worse. The WD Red+ drives I use are already out of stock anywhere near me, so I’m starting to sweat a worst-case scenario where I won’t be able to keep my pool alive long enough to find a 2nd spare at any cost.
My drives are anywhere from 20k to 45k power-on hours. I’ve currently put all thoughts of a secondary off-site NAS on hold & am now just in preservation mode.
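For reference, this is roughly how I pull the power-on hours for the pool members. It’s a quick sketch assuming ATA drives that report SMART attribute 9, and sd[a-h] is just my device layout, so adjust for yours:

# run as root: print power-on hours (attribute 9, RAW_VALUE column) per pool member
for d in /dev/sd[a-h]; do
  hours=$(smartctl -A "$d" | awk '$1 == 9 {print $10}')
  echo "$d: ${hours:-n/a} power-on hours"
done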
I had years of people telling me RAID-Z3 was overkill, and then Covid struck and for a while I wasn’t allowed in our DCs to replace disks; I was pretty glad I had Z3 then. Once again I’m being told by Seagate and WD that they won’t have replacement drives till May at best, and again I’m so pleased I picked Z3. When I build a system, I’ve always assumed I’m never going to be able to touch it ever again. In reality that’s never the case, but the assumption has served me well over the years.
I have the same issue right now. For me the calculation is based on experience with similar drives. If I “know” that my drives live for X years before they start failing, I keep spares according to how long I think I’ll need to “survive” without new ones. That used to be one cold spare per pool of < 6 drives. It’s 2 now, and I might even consider 3. But yes: it is costly and might be a bad investment IF the situation gets better. Seeing as WD is already sold out of drives for 2026 according to their investor call, though, I’d err on the side of caution and invest now in slightly overpriced drives instead of not being able to get any in a month or two. Without panic buying.
Well, I went ahead & picked up a second cold spare. I honestly think this still counts as panic buying, because I am in a bit of a panic.
Hopefully two cold spares will keep my pool alive through the latest tech availability crisis. It feels like these get more common every year…
Now I’m debating whether to open it & burn it in, or leave it sealed. Leaning towards a burn-in, because who knows when I’ll need it & it’s better to know it isn’t DOA sooner rather than later.
Oh yes: definitely burn it in! Never just shelve a cold spare. It’s like with backups: an untested backup (cold spare) is a corrupt (faulty) backup (cold spare).
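For a concrete recipe, something like the below is what I’d run on a new spare. /dev/sdX is a placeholder for the drive under test, and note that the badblocks -w pass destroys everything on it:

smartctl -t short /dev/sdX                # quick self-test first, catches DOA drives early
smartctl -A /dev/sdX                      # note the baseline Reallocated/Pending counts
badblocks -c 2048 -b 4096 -wvs /dev/sdX   # destructive write/verify of every block (takes days on big drives)
smartctl -t long /dev/sdX                 # extended surface self-test afterwards
smartctl -A /dev/sdX                      # the counters should not have moved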
One for every drive providing redundancy. That matches the safety decision you originally made for the pool, and it provides you with a significant stopgap for failures without actually duplicating an entire pool’s worth of drives, which is what your backup is for, of course.
Unless you’re already at Z3, or you’ve got multiple vdevs in a pool with fewer spares than vdevs, hot spares are useless IMO: you may as well move from Z2 to Z3, or to a 3-way mirror instead of a 2-way one, and have the drive be useful all the time, instead of running Z2 or a single mirror pair plus a hot spare. The hot spare still gets most of the wear but doesn’t contribute to data safety. For that reason, I turned my hot spares into cold ones a while back (see the zpool sketch below).
Or you could engage in some reliability-centred maintenance statistics of your own and base your decision on that, but I’d suggest most people don’t have a big enough dataset of drive/enclosure etc. survivability to make that useful.
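To put that hot-spare point in concrete zpool terms, here’s a minimal sketch; the pool name tank and the daN device names are made up:

# 2-way mirror plus a hot spare: the third drive spins but adds no redundancy until a failure
zpool create tank mirror da0 da1 spare da2
# 3-way mirror: the same third drive contributes to data safety all the time
zpool create tank mirror da0 da1 da2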
Yeah, that is what I’ve now grown into as far as cold spare count goes: 2 cold spares for the 2 redundant drives.
I haven’t bothered with hot spares because I’m never away from my NAS for long enough, & if I were, I have a few people I could trust to perform a swap.
It’s all based on your design layout and the level of integrity you wish to maintain. HDD pricing is indeed going up, though it’s not as drastic as for some other components. Pricing can be related to many factors, such as supply/demand, market stability, and the stability of your regional currency.
I spec my pools for a rough life of around 5-7 years, relying on enterprise-rated HDDs and past brand reliability. In my case, 1-2 hot spares have been more than sufficient with brands like Western Digital (or the earlier HGST HelioSeal lines) for HDDs, and Samsung, Western Digital, and Crucial for SSDs. For SSDs I tend to 2x my capacity requirements: if I need 1TB, I buy 2TB.
Right around the 4-5 year mark I start pricing new drives, which are usually 2x the capacity for roughly the same cost as the old drives, and start building a replacement pool.
As for cold spares? There’s no such thing as a sane amount generally. It’s as much as you are willing to purchase.
All of our TrueNAS Enterprise and CORE/SCALE systems are RAID-Z2. We tend to keep at least 2 hot spares in the head units of our enterprise appliances with multiple 60-bay JBODs. We’ve not yet had a flash failure, and none of our NVMe-based TrueNAS appliances have hot spares.
When my pool drives aged out of warranty, I fortuitously elected to buy a new set just as prices were dipping below $8/TB for 10TB helium-filled drives with a 5-year warranty. So I have a lot of spares. But then again, a bunch of my drives are from 2017 and have likely been spinning almost continuously since then (except when they were on the shelf at goharddrive.com for a few days).
Presently, I’d only replace a cold spare used up by a busted drive on a “must” basis, with the benefit of a functional backup; i.e., only one spare at a time, with a two-cold-spare minimum (pool size = 8-drive Z3; adjust as appropriate). For now, I have plenty of spares.
Unlike the Thai floods, there is no constraint on production; it’s just demand shooting prices up. Once used-but-certified/warrantied prices settle back down to sub-$10/TB levels, I’d pounce and buy more replacement drives. When that bubble will pop is anyone’s guess, though.
root@prox:~# badblocks -c 2048 -b 4096 -wvs /dev/sda
Checking for bad blocks in read-write mode
From block 0 to 1953506645
Testing with pattern 0xaa: 2.22% done, 11:25 elapsed. (0/0/0 errors)
Wish me luck - hopefully it’s all clean after the ~7-day-long process.
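Assuming it finishes clean, the plan is to follow up with an extended SMART self-test and check that the counters haven’t moved; same drive as above:

smartctl -t long /dev/sda     # extended self-test; smartctl -a shows progress and the result
smartctl -A /dev/sda | grep -E 'Reallocated|Current_Pending|Offline_Uncorrectable'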
Yes, that was what I was referring to earlier in the thread. On a side note: can we stop calling LLMs AI already? I slip up sometimes myself, but it’s pure marketing and overhype of a statistical model. One that eats all the RAM, flash, storage, power, and water. And our sanity.
Oh, and of course I bought 2 disks today as well. With the ones I already have, it’s now 2 cold spares per system, except for the flash-only one. Hopefully we’ll all regret our purchases in a few months because storage and everything else is available again…