Warning to folks out there: SCALE (all versions, AFAIK) ships with a default setting for LSI HBAs that may shorten the lifespan of your hard drives!
I am not sure about other LSI HBAs, but a while ago I discovered that my array of Toshiba MG09 18TB drives, connected to an LSI 9217-8i (IT mode), never spun down, even when the NAS was shut down or rebooted properly. Instead, the drives used their emergency retract feature.
You can verify this is happening by running sudo smartctl -a /dev/sdX, where X is the drive's letter (you can look it up in the Web UI under Storage → Disks). The attribute you are looking for is Power-Off_Retract_Count. Reboot, and you may see its value go up by 1.
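To check several disks at once and compare the values before and after a reboot, the check can be scripted. This is a minimal sketch, assuming SATA drives that report the ATA attribute table through smartctl -A (where Power-Off_Retract_Count is attribute 192) and single-letter /dev/sdX device names. The function names retract_count and snapshot_retract_counts are hypothetical:

```shell
#!/bin/sh
# Sketch: read Power-Off_Retract_Count (SMART attribute 192) for each disk.

# Print the RAW_VALUE column of Power-Off_Retract_Count from `smartctl -A`
# output on stdin. Exits non-zero if the attribute is not present
# (e.g. SAS drives, which report SMART data in a different format).
retract_count() {
  awk '$2 == "Power-Off_Retract_Count" { print $10; found = 1 }
       END { if (!found) exit 1 }'
}

# Print "<device> <count>" for every single-letter /dev/sd? disk.
# smartctl needs root, so run this with sudo.
snapshot_retract_counts() {
  for dev in /dev/sd?; do
    [ -b "$dev" ] || continue          # skip if the glob matched nothing
    count=$(smartctl -A "$dev" | retract_count) || count="n/a"
    echo "$dev $count"
  done
}
```

Run snapshot_retract_counts as root and save its output to a file (e.g. before.txt), reboot cleanly, run it again into a second file, and compare the two with diff. A count that rises by 1 per clean reboot matches the behavior described above.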
Before I found a workaround, one of my relatively new drives, which had never suffered a power loss and had been used exclusively under SCALE, had already reached a Power-Off_Retract_Count of over 200, which is a lot!
According to the user manual of my disks:
Emergency Unload is mechanically much more stressful to this drive than a controlled Unload.
(Power-Off_Retract_Count counts the Emergency Unload events that have happened over the drive's lifetime.)
So while it may not cause an immediate failure, various resources and forums online suggest it can still prematurely wear out the drive's mechanics, permanently damaging it, shortening its lifespan, and possibly degrading its performance. I can also see how data integrity could be compromised by this.
Thankfully, you can fix this by creating two Init Scripts, one for each of the following commands:
for i in /sys/class/scsi_disk/*/manage_system_start_stop; do echo 1 > "$i"; done
and
for i in /sys/class/scsi_disk/*/manage_runtime_start_stop; do echo 1 > "$i"; done
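The two commands can also be combined into one script that skips disks where an attribute is missing (for example, on a kernel that does not expose it) and reports failed writes. This is a minimal sketch. The function name set_start_stop and its optional root-directory argument are hypothetical; the argument exists only so the loop can be tried against a scratch directory instead of the real /sys/class/scsi_disk:

```shell
#!/bin/sh
# Sketch: set both start/stop attributes to 1 for every SCSI disk.
# As I understand the kernel's sd driver, this makes it spin the disk down/up
# itself during system power transitions (manage_system_start_stop) and
# runtime power management (manage_runtime_start_stop).
# The optional argument overrides the sysfs root (hypothetical, for testing).
set_start_stop() {
  root=${1:-/sys/class/scsi_disk}
  for attr in manage_system_start_stop manage_runtime_start_stop; do
    for f in "$root"/*/"$attr"; do
      # Skip the unexpanded pattern when nothing matches.
      [ -e "$f" ] || continue
      echo 1 > "$f" || echo "failed to write $f" >&2
    done
  done
}
```

To use it as an Init Script, append a line that calls set_start_stop with no arguments. Afterwards, grep . /sys/class/scsi_disk/*/manage_*_start_stop prints each attribute file with its current value, so you can confirm they all read 1.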
Since adding them, the Power-Off_Retract_Count value has stopped increasing.
I reported this to iX on Jira and suggested they change this default behavior in SCALE, but despite the manufacturer explicitly stating in the user manual:
The minimum number of Emergency Unloads that can be successfully performed is 50 000. Emergency Unload should only be performed when it is not possible to perform a controlled Unload.
the reply from iX was:
However, it says that the recommend emergency unload operations is 50,000. That’s an insane amount and I don’t ever think one would ever seriously hit that before having to replace the drives altogether.
…
I do agree with you that it’s an issue, but it’s a low-risk issue since the manufacturers documentation states 50K emergency unloads is the maximum.
That doesn't really square with Toshiba's statement that emergency unloads should not be the default behavior, or with the fact that 50,000 is the number of emergency unloads the drive is rated to withstand while still functioning fine and lasting as long as it's meant to, not a recommended number. On top of that, the 50,000 figure applies to this particular enterprise-rated model only and may be lower for other, cheaper models. And since not all drives are made equal, the number of emergency unloads you can safely perform in real life may be lower still.
You can read the full bug report here: TrueNAS - Issues - iXsystems TrueNAS Jira
Personally, I think any reasonable user will want to follow the manufacturer's recommendations and run their hard drives in optimal conditions so they last as long as possible, reducing operating costs and their personal environmental impact. This default behavior prevents that unless they know about it.