SCALE + LSI HBA can (potentially) shorten the lifespan of your hard drives

Warning to folks out there: SCALE (all versions, AFAIK) ships with a default setting for LSI HBAs that may shorten the lifespan of your hard drives!
I am not sure about other LSI HBAs, but a while ago I discovered that my array of Toshiba MG09 18TB drives connected to an LSI 9217-8i (IT mode) never spun down gracefully, even when the NAS was shut down or rebooted properly, and instead used the drives’ emergency retract feature.
You can verify whether this is happening by running sudo smartctl -a /dev/sdX, where sdX is the drive’s device name, which you can find in the Web UI under Storage → Disks. The attribute to look for is Power-Off_Retract_Count. Reboot, and you may see its value go up by 1.
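If you want to check every disk at once, a quick loop along these lines works (the /dev/sd? glob and output format are just examples; adjust for your system):
# print the retract counter for every SATA/SAS disk
for d in /dev/sd?; do
  echo "== $d =="
  sudo smartctl -A "$d" | grep Power-Off_Retract_Count
done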
Before I found a workaround, my relatively new drive, which never suffered a power loss and was used exclusively under SCALE, had a Power-Off_Retract_Count of over 200, which is a lot!
According to the user manual of my disks:

Emergency Unload is mechanically much more stressful to this drive than a
controlled Unload.

(Power-Off_Retract_Count represents the number of Emergency Unload events that happened in the lifetime of the drive)
So while it may not cause immediate failure, according to various resources and forums online it can still prematurely wear out the drive’s mechanics, permanently damaging it, shortening its lifespan, and possibly degrading its performance. I can also see how data integrity could be compromised by this.
Thankfully, you can fix this by creating two Init Scripts.
The command for each is the following:
for i in /sys/class/scsi_disk/*/manage_system_start_stop; do echo 1 > $i; done
and
for i in /sys/class/scsi_disk/*/manage_runtime_start_stop; do echo 1 > $i; done
Since adding those, the Power-Off_Retract_Count value stopped increasing.
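If you want to double-check that the settings actually took effect after a boot, you can read the flags back; every line should end in 1 (this is just a sanity check, not part of the fix):
grep . /sys/class/scsi_disk/*/manage_system_start_stop /sys/class/scsi_disk/*/manage_runtime_start_stop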

I raised this on Jira with iX and suggested that they change this default behavior under SCALE, but, despite the manufacturer explicitly stating in the user manual:

The minimum number of Emergency Unloads that can be successfully performed
is 50 000. Emergency Unload should only be performed when it is not possible to perform a
controlled Unload.

the reply from iX was

However, it says that the recommend emergency unload operations is 50,000. That’s an insane amount and I don’t ever think one would ever seriously hit that before having to replace the drives altogether.

I do agree with you that it’s an issue, but it’s a low-risk issue since the manufacturers documentation states 50K emergency unloads is the maximum.

which doesn’t really square with Toshiba’s statement: the manual makes clear that this should not be the default behavior, and that 50,000 is the minimum number of Emergency Unloads the drive can perform and still function fine and last as long as it’s meant to. On top of that, the 50,000 figure is for this particular enterprise-rated model only and may be lower for other, cheaper models. And since not all drives are made equally, the number of Emergency Unloads you can safely do in real life may be lower still.
You can read the full bug report here: TrueNAS - Issues - iXsystems TrueNAS Jira
Personally, I think any reasonable user will want to adhere to the manufacturer’s recommendations and run their hard drives in optimal conditions so they last as long as possible, both to reduce operational costs and their personal environmental impact, but this default behavior prevents that unless they are aware of it.

3 Likes

Seriously?

A very important point that you’ve failed to substantiate amid your hysteria is that a “power-off retract” is the same thing as an “Emergency Unload”, because only if that is the case is any of the rest of this relevant.

And then you have the issue that this field increases by one on each reboot, and the drives are spec’d to handle tens of thousands of these. Even if the spec were only 10k instead of 50k, and you were rebooting twice a day, you’d hit that 10k in 13 years.
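(10,000 retracts ÷ 2 reboots per day = 5,000 days, roughly 13.7 years.)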

But points for one thing: this is the most hysterical FUD I’ve read in a long time.

5 Likes

I agree with @dan that this is not the most urgent of bugs to fix; nevertheless, I do agree with @tannisroot that this is not something that should be happening.

  1. The failure of a drive is IMO typically going to be due to a combination of stress factors, e.g. thermal stress, unexpected accelerations (i.e. knocks), head wear and tear, etc. Failing to fix a known stress factor is IMO not a good idea.

  2. When a manufacturer says 50,000 Emergency Unloads, that is likely to be a Mean Number Before Failure rather than a Minimum Number Before Failure. The key word here is Mean, and in all likelihood it is the mean of a normal distribution curve - the implication being that some drives will survive 100,000 Emergency Unloads, whilst some drives will fail at (say) 10,000 Emergency Unloads (and indeed some drives - albeit a VERY VERY small number - will fail after only 100 Emergency Unloads). And of course, some manufacturers might have a much lower failure figure than 50,000.

So IMO whilst this is NOT an urgent thing to fix, it is something that should be fixed at some point and not simply disregarded.

I do wonder exactly who made this judgement that 50,000 is so high that doing Emergency Unloads as standard is acceptable, despite it being against the manufacturer’s recommendation, and whether they are an inexpert 1st-line support guy or someone sufficiently technically knowledgeable to understand the importance of following manufacturer recommendations and the actual meaning of the 50,000 figure.

I kind of forgot to explain that both are the same thing; I’ve made the correction.
I did not mean for it to sound like fear-mongering. I just hate the idea of hardware that people paid hard-earned money for being treated in a subpar way when the fix is so simple, so I tried to make the post more noticeable in the hope that more people would see it and apply the fix.

3 Likes

Also, I really don’t appreciate being labeled a troll for expressing my genuine concerns.
I did not imply that this is a critical bug that needs to be fixed ASAP, but it is really not something that should just be ignored and labeled a non-issue, trolling, and “FUD”.

2 Likes

“SCALE unloads drives using method X, while manufacturers recommend method Y” is perfectly fair. If the limits are in the tens of thousands, as the only example you’ve shown is, it’s fair to question whether this needs to be a priority, but it’s nonetheless a valid criticism.

What’s not a valid criticism–and is FUD IMO–is the “damage your hard drives,” “atrocious default behavior,” “sabotage,” and the other inflammatory rhetoric you decided to use. No user’s going to get anywhere close to those specs with anything like sensible use of the system[1], and you know it.

It’s also unlikely that this behavior is tied in any way to an LSI HBA. My UGREEN NAS doesn’t have one, and the drives in it report a non-zero value for that SMART attribute, though as it’s in the middle of a long replication task right now, I’m not going to try rebooting it.

All in all, it reminds me a lot of the guy in the Dragonfish release thread who was hyperventilating about the fact that the boot pool was created with kernel device identifiers rather than UUIDs–a valid point as far as it went, but blown enormously out of proportion.

It’s explicitly stated in the documentation OP quotes to be a minimum.


  1. The longest-running drive in my NAS as of right now has 71,164 power-on hours, or a bit over 8 years. It shows a power-off retract count of 78. Admittedly, much of that time was under CORE, and FreeNAS before that, though I’d be surprised if they behaved any differently. ↩︎

4 Likes

What’s not a valid criticism–and is FUD IMO–is the “damage your hard drives,” “atrocious default behavior,” “sabotage,” and the other inflammatory rhetoric you decided to use

I’ve edited my post to have less exciting language in order to more accurately represent the problem.

This is a known default of the kernel driver that manages LSI HBAs under Linux, and I did testing to confirm that it is tied to the LSI HBA. Without touching the managed start/stop settings, the drives even make a sound on shutdown as if the power cord had been pulled out. And it’s solved by turning on the managed start/stop parameters; with them enabled I can hear the drives parking one by one before the NAS powers off.
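If anyone wants to repeat the test, the rough procedure is to note the raw value, shut down or reboot cleanly, and read it again (sda here is just an example device):
# note the raw value before rebooting
sudo smartctl -A /dev/sda | awk '/Power-Off_Retract_Count/ {print $NF}'
# reboot cleanly, then run the same command again and compare:
# with the defaults the value goes up by 1; with the init scripts in place it stays the same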

and the drives in it report a non-zero value for that SMART attribute

It’s normal for it to have a non-zero value if the NAS has ever encountered a power loss or was shut down before the OS could actually boot, but it’s not normal for it to be at 200 like in my case.

No user’s going to get anywhere close to those specs with anything like sensible use of the system

I haven’t yet started looking through user manuals of popular hard drive models (which are surprisingly hard to actually find online), but this page, for instance, makes an (albeit unsubstantiated) claim that even several hundred emergency retract cycles can kill a drive. That may just be someone’s anecdotal experience, though.
https://romaco.ca/blog/2017/08/06/how-to-spin-down-hard-disks-at-shutdown-on-lsi-hbas-on-linux/

2 Likes

@tannisroot I agree with Dan - you have used emotional language to make a mountain out of a molehill.

Of all the risks to your data on your NAS, I suspect this is possibly the least of them.

Care to share your ZFS layout with us so that we can confirm to you that it is risk-free (or point out its risks if there are any - and hopefully we won’t go over-the-top in describing these risks)?

1 Like

My power-off retract count after 566 days with Seagate Exos drives and an LSI HBA is 18, probably exactly the number of times I’ve shut the system down.

Wait, you people power off your systems?

Jokes aside, another example of the superiority of CORE! :sunglasses:

For real now, it’s an interesting find and I wonder what the reason for this behaviour is; that being said, I would have appreciated the topic more if it had been written in a different tone… less as an outraged cry against iX and more as an occasion for discussion and feedback.

Something along the lines of “Hey, I found this; iX responds this way, what is your opinion?”.

Also, I would like to point out that people should not use desktop-grade HDDs in a NAS since the workloads and operating conditions are vastly different.

As such, I’m not overly worried about this issue (especially since I run CORE and use no HBA, lol), but I would appreciate an official-ish response from iX outside Jira, since we all know that information sharing and accessibility are not the platform’s strongest points.

Thank you for bringing this issue to our attention.

4 Likes

I do, for cleaning, adding new hardware, updates, and one time moving the system. No other reason. That’s why my count is 18 after 1.5 years!

2 Likes

Toshiba MG09 drives are CMR (rather than SMR) Enterprise drives and although they are not specifically targeted at NAS systems, they are targeted for 24/7 operation in servers - so they are definitely NOT “desktop-grade HDDs”.

3 Likes

I know, I was referring to the following part.

Or similar.

Point is, there shouldn’t be that big a difference between the drives used in a TrueNAS system (whether an Enterprise drive or a NAS drive).

I hope it’s clear now.

As a side note, the Toshiba MG line is amazing.

1 Like

Heh. I always thought that was the Power Off Retract Count and was a proxy for how many times I had rebooted :wink:

1 Like

I don’t think that’s correct. I don’t think it implies “Emergency Unload”.

Many drives increment that counter whenever the drive is powered down normally, and I think some even increment that counter when they sleep or idle deep enough.

How are Start_Stop_Count and Power_Cycle_Count?
How’s Load_Cycle_Count - does it match Power-Off_Retract_Count?
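For reference, something like this pulls all four counters in one go (replace sdX with your device):
sudo smartctl -A /dev/sdX | grep -E 'Start_Stop_Count|Power_Cycle_Count|Power-Off_Retract_Count|Load_Cycle_Count'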

I think you are right, but in fairness I did bring it up 3 times before - on Jira, the old forum, and Discord - and in all instances it just went silent, which really didn’t feel nice.
I had honestly kind of forgotten about it, but someone recently messaged me on the old forum thanking me for my old blog post about it, and I felt like I needed to bring it up in a way that would catch people’s attention so that they at least apply the workaround on their end, which I failed to do in a nice way :frowning:

3 Likes

Here are current values of another identical drive model that I have in my array:
Start_Stop_Count - 83
Power-Off_Retract_Count - 39
Load_Cycle_Count - 83
Before I applied the managed start/stop change, Power-Off_Retract_Count was identical to Start_Stop_Count; after that, it stopped incrementing on shutdowns and reboots. The sound the drives made on shutdown changed too, as I previously mentioned.

1 Like

Kinda interesting, but I never worried about it and you don’t have to IMO.

I have a bunch of used drives of various manufacturers in an enclosure that came from other systems over the years, which I consolidated, so I can’t compare apples to apples in the current system. The systems ranged from proprietary QNAP systems, to QNAP JBOD systems with Marvell (I think that was the brand) controllers, to now server (TrueNAS SCALE) systems with LSI controllers. The numbers run all over the place, and as you can see most of these drives are quite old and all have been powered on 24/7 for most of their life. Some of these drives were in 24-hour surveillance recording use on a QNAP system which literally tried to beat them to death. On all of these systems I have never attempted to do anything special to make the drives last longer or not.

A sample:
10TB Exos
Power On Time: 35286
Start Stop Count: 29
Load Cycle Count: 10318

6TB HGST
Power On Time: 69646
Start Stop Count: 96
Load Cycle Count: 2935

6TB Seagate Exos
Power On Time: 11774
Start Stop Count: 167
Load Cycle Count: 14394

The Seagate Exos drives tend to park more aggressively when not actively reading/writing, while the HGST drives are not as aggressive.

TrueNAS has settings available for parking and spindown, but since manufacturers may use different values for each setting (e.g. Exos), the settings may not work, or may not work as expected. Both 6TB drives (HGST, Exos) came out of the same QNAP system, but the Exos drives are newer and you can see they are more aggressive at parking.

I have always gone by this general info below.

The Start Stop Count is the number of times the hard disk spindle has started to spin and then stopped spinning. The spindle will power up and start spinning whenever the system power is turned on or the system is resumed from sleep.

The Load Unload Cycle Count is how often the actuator arm (Head) is parked in the loading zone.

What is the difference? Your hard disks might be spinning, but if nothing has been written or read for a while, the drive firmware might park the head without stopping the drive from spinning. The head-parking behavior is not usually controlled by the operating system; it is controlled by the disk drive firmware. Each disk brand, and each model within a brand, has its own method of choosing when to park the heads. Manufacturers generally provide command-line tools and commands for changing the park and spindown parameters from their default settings. Obviously, hard drives will always park the heads before powering down; if power is abruptly cut off, they will, or will try to, emergency-park so as not to crash the heads into the platters. I say try because I had a backplane “let the smoke out” failure in a QNAP server that did skip the heads on 4 10TB drives during the event.
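If you want to compare how aggressively your own drives park, a rough loop like this (the /dev/sd? glob is just an example) prints both counters side by side:
for d in /dev/sd?; do
  ss=$(sudo smartctl -A "$d" | awk '/Start_Stop_Count/ {print $NF}')
  lc=$(sudo smartctl -A "$d" | awk '/Load_Cycle_Count/ {print $NF}')
  echo "$d  start/stop=$ss  load/unload=$lc"
done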

So changing the default behavior of the drives installed in a system is not really the responsibility, or the problem, of TrueNAS, Debian, LSI, etc. to solve.

2 Likes

IMO this is not at all obvious - indeed, I cannot see how hard drives would know to park the heads before the power is shut off unless the O/S tells them that this is about to happen.

IMO this is incorrect.

  1. It is quite likely that the O/S or ZFS will write something to the drives in the seconds or tenths of a second before the O/S shuts down - and if so, the heads will all be active at the time. The assumption that the disks will have been idle long enough for the microcode to park the heads is IMO erroneous.

  2. IMO it is the responsibility of the O/S to quiesce hardware properly prior to power-down, because it is the O/S that actually performs the orderly shutdown / power-down. I do not think that higher layers in the software stack (like TrueNAS) should have the responsibility of doing so; however, if iX are genuine about calling TrueNAS an appliance, then if the O/S doesn’t do it, TrueNAS needs to.

  3. However, all that being said, as far as I can see from the analysis above, the issue is NOT that the O/S lacks the functionality to tell drives that it is about to power down, but rather that when drives are attached to an LSI HBA, the default settings are such that the O/S doesn’t do it for those drives. I suspect these defaults are somehow the responsibility of the LSI drivers rather than Linux, and that this is a shortcoming of the LSI drivers included in TrueNAS as standard.

1 Like

Officialishly… It’s an interesting thread.

@tannisroot did the right thing by reporting the issue and then documenting it with a workaround on the forums. That is what they are for. Thank you.

Should TrueNAS SCALE change its defaults?
We’re always wary of doing that due to unintended consequences and lack of consensus.
Without some examples of real problems, the risk of change is probably higher than the reward.

Until we have seen premature drive failures, I’d suggest the headline be:

“SCALE + LSI HBA may not have the best defaults for drive lifespan.”

4 Likes