What is (currently) a sane amount of cold spares?

Late last year, when RAM prices were going crazy, I realized SSDs were next to go & luckily picked up more than a full set of cold spare SSDs & NVMe drives. Frankly, I’ve recouped most of the “investment” by selling a few at a ‘discount’ (compared to current prices). I’m not proud of it.

However, I failed to imagine that HDDs would also be impacted.

I have 1 cold spare for my raidz2 pool of eight 8TB drives, and it has been sitting on a shelf for a while now. I’m wondering if I should grab a second one before things get worse. WD Red Plus drives are already out of stock everywhere near me, so I’m starting to sweat a worst-case scenario where I won’t be able to keep my pool alive long enough to find a 2nd spare at any cost.

My drives are anywhere from 20k-45k power-on hours. I’ve put all thoughts of a secondary off-site NAS on hold & am now just in preservation mode.

Thoughts?

Truth is… no one knows exactly.

Fact is, this is a “costly” question to ask, and a “costlier” one for anyone to answer. :cry:

This is a fun tool I found many years ago. It doesn’t support ZFS but you can create a very similar config. I’m not saying it’s accurate but it’s fun.

I had years of people telling me RAID-Z3 was overkill, and then Covid struck and for a while I wasn’t allowed in our DCs to replace disks. I was pretty glad I had Z3 then. Now I’m being told by Seagate and WD that they won’t have replacement drives till May at best, and once again I’m so pleased I picked Z3. When I build a system, I always assume I’m never going to be able to touch it again. In reality that’s never the case, but the assumption has served me well over the years.

I have the same issue right now. For me the calculation is based on experience with similar drives. If I “know” that my drives live for X years before they start failing, I keep spares according to how long I think I will need to be able to “survive” without new ones. That was one cold spare per pool < 6. It changed to 2 now. Might even consider 3. But yes: it is costly and might be a bad investment IF the situation gets better. But seeing as WD is already sold out of drives for 2026 according to their investor call, I’d err on the side of caution and invest now in slightly overpriced drives rather than risk not being able to get any in a month or two. Without panic buying.

Well - I went ahead & picked up a second cold spare. I honestly think this is still panic buying because I am in a bit of a panic :stuck_out_tongue:

Hopefully two cold spares keep my pool alive through the latest tech availability crisis. It feels like these get more common every year…

Now I’m debating whether to open it & burn it in or leave it sealed. Leaning towards a burn-in, because who knows when I’ll need it & better to find out sooner rather than later if it is DOA.

Time for a vacation?

Oh yes: Definitely burn in! Never just shelve a cold spare. It’s like with backups: An untested backup (cold spare) is a corrupt (faulty) backup (cold spare).

One for every drive providing redundancy. That was the decision you originally made for the pool’s safety, and it gives you a significant stopgap against failures without duplicating the entire pool’s worth of drives, which is what your backup is for, of course.

Unless you’re already at z3, or have multiple vdevs in a pool with fewer spares than vdevs, imo hot spares are useless. You may as well move from z2 to z3, or from a 2-way to a 3-way mirror, and have the drive be useful all the time instead of running z2 or a single mirror pair plus a hot spare. A hot spare still gets most of the wear but doesn’t contribute to data safety. For that reason, I turned my hot spares into cold ones a while back.

Or you could engage in some reliability-centred maintenance statistics of your own and base your decision on that, but I’d suggest most don’t have a big enough dataset of drive/enclosure etc. survivability to make that useful.
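
For anyone who wants to play with the numbers anyway, here is a minimal sketch of that kind of estimate. It is not anyone's actual method from this thread: it assumes drive failures are independent and follow a Poisson process, and the 5% annualized failure rate (AFR) is a made-up placeholder you would replace with your own experience. The function names are hypothetical too.

```python
import math


def expected_failures(n_drives, afr, months):
    """Expected number of drive failures over the window.

    afr is an ASSUMED annualized failure rate (0.05 = 5% per year).
    """
    return n_drives * afr * months / 12.0


def prob_at_most(k, lam):
    """Poisson probability of seeing k or fewer failures when lam are expected."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def spares_needed(n_drives, afr, months, confidence=0.95):
    """Smallest spare count that covers all failures with the given confidence."""
    lam = expected_failures(n_drives, afr, months)
    k = 0
    while prob_at_most(k, lam) < confidence:
        k += 1
    return k


if __name__ == "__main__":
    # 8-drive pool, assumed 5% AFR, 12 months without being able to buy drives.
    print(spares_needed(8, 0.05, 12))
```

With those placeholder inputs the sketch lands on 2 spares, but older drives or a longer shortage push the expected failure count up, so treat the result as a sanity check rather than an answer.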

Yeah, that is what I’ve now settled on for my cold spare count: 2 cold spares for the 2 redundant drives.

Haven’t bothered with hot spares because I’m never far away from my NAS for long enough & if I was I have a few people I could trust to perform a swap.

I agree; however, we were discussing cold spares. We might all be on the same page here after all.

It’s all based on your design layout and the level of integrity you wish to maintain. HDD pricing is indeed going up, though it’s not as drastic as some other components. Pricing can be related to many factors, such as supply/demand, market stability, and stability of your regional currency.

I spec my pools for a rough life of around 5-7 years and rely on enterprise-rated HDDs and past brand reliability. In my case, 1-2 hot spares have been more than sufficient with Western Digital (or previous HGST Helios brands) for HDDs, and with Samsung, Western Digital, and Crucial for SSDs. For SSDs I tend to 2x my capacity requirements. If I need 1TB, I buy 2TB.

Right around 4-5 years in, I start pricing new drives, which are usually 2x the capacity for roughly the same cost as the old drives, and start building a replacement pool.

As for cold spares? There’s no such thing as a sane amount generally. It’s as much as you are willing to purchase.

Yeah - see I was doing similar, but turns out that this cycle hit the AI craze. Usually I’m luckier with my timings.

:frowning:

Anyway, yeah I’m gonna hope that with my recently purchased additional cold spare I can ride through the current crisis.

All of our TrueNAS Enterprise and CORE/SCALE systems are RAIDZ2. We tend to keep at least 2 hot spares in the head units of our enterprise appliances with multiple 60-bay JBODs. We’ve not yet had a flash failure, and none of our NVMe-based TrueNAS appliances have hot spares.

I dunno - maybe I’m less lucky with flash. I blame Samsung’s firmware.

When my pool drives aged out of warranty, I fortuitously elected to buy a new set just as prices were dipping below $8 per TB for 10TB He-filled drives with a 5-year warranty. So I have a lot of spares. But then again, a bunch of my drives are from 2017 and have likely been spinning almost continuously since then (except when they were on the shelf at goharddrive.com for a few days).

Assuming you’d consider a Seagate Exos an equivalent to my HGST He10’s, they’re now selling for 3x that there.

Presently, I’d only buy a replacement cold spare for a busted drive on a “must” basis, given the benefit of a functional backup: i.e. only one spare at a time, with a two-cold-spare minimum (pool size = 8-drive Z3, adjust as appropriate). For now, I have plenty of spares.

Unlike the Thai floods, there is no constraint on production, just demand shooting up prices. Once prices for used but certified / warrantied drives settle back down to sub-$10/TB levels, I’d pounce and buy more replacement drives. When that bubble will pop is anyone’s guess though.

root@prox:~# badblocks -c 2048 -b 4096 -wvs /dev/sda
Checking for bad blocks in read-write mode
From block 0 to 1953506645
Testing with pattern 0xaa: 2.22% done, 11:25 elapsed. (0/0/0 errors)

Wish me luck - hopefully all clean after the ~7 day long process :frowning:
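
For a rough feel of how long a run like this takes, here is a small sketch that extrapolates from the progress line above. It assumes constant throughput (real drives slow down towards the inner tracks, so this is a lower bound) and that the default `-w` test uses 4 patterns, each written and then read back. The function name is hypothetical.

```python
def badblocks_eta_hours(percent_done, elapsed_seconds, patterns=4, passes_per_pattern=2):
    """Extrapolate total badblocks -w runtime from one progress line.

    ASSUMES constant throughput over the whole disk, which is optimistic:
    HDDs are fastest on the outer tracks where the test starts.
    """
    one_pass_seconds = elapsed_seconds / (percent_done / 100.0)
    # Each pattern is one full write pass plus one full read-back pass.
    total_seconds = one_pass_seconds * patterns * passes_per_pattern
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Progress line from the run above: 2.22% done, 11:25 elapsed.
    elapsed = 11 * 60 + 25
    hours = badblocks_eta_hours(2.22, elapsed)
    print(f"at least {hours:.0f} hours ({hours / 24:.1f} days)")
```

The real run will take longer than this figure because of the slowdown on inner tracks, so plan for extra days on top of it.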

Increasing the -c value can improve the speed.

Never mind. You’re already using a high value.

Seems like AI has moved into the HDD space for long term storage, otherwise known as long term profiling. Western Digital is already sold out of hard drives for all of 2026 — chief says some long-term agreements for 2027 and 2028 already in place | Tom's Hardware

Yes, that was what I was referring to earlier in the thread. On a side note: Can we stop calling LLMs AI already? I slip up sometimes myself but this is just pure marketing and overhype of a statistical model. That eats all the RAM, Flash, storage, power and water. And our sanity.

Oh, and of course I bought 2 disks today as well. With the ones I have already it is now 2 cold spares per system except for the Flash-only. Hopefully, we’ll all regret our purchases in a few months due to storage and everything being available again…