Hi, I’m using CORE 12.0-U8.
I am trying to figure out a way to have an automatic shutdown based on drive temperature. We had an issue last week where the aircon unit failed and we had an alert come through for excess temperature. Luckily it was during the day and I was able to fix the issue.
My worry is if it happens during the night or when I’m away. I had a search and found a few others with the same issue but with no solution. I’m guessing I need a script to put into the init/shutdown task.
Would anyone have a solution?
This version is long since EOL and should be upgraded to a supported release.
Not into the init/shutdown tasks; those are things that run when the system is starting up (or shutting down), but don’t make it start up or shut down.
The best case would be if such a feature were built into the disk monitoring–the SMART service already monitors disk temps and will send a warning when thresholds are reached. If it could also run a specified command at that time, your problem would be solved. Unfortunately, CORE doesn’t have that feature and never will. SCALE also doesn’t have that feature, but as it’s still under active development, it’s possible it could be added if a feature request gains enough traction.
So what that means is that you’ll need to either (1) find a script that does what you want (I’m not aware of any such scripts, but that certainly doesn’t mean they don’t exist); (2) adapt an existing script to do what you want (in which case, Joe’s multi-report script would be a good starting point); or (3) write your own script.
The latter shouldn’t be too difficult if you can figure out bash syntax. You’d want to run smartctl -a on one or more of your disks, grep for 194 (which is the attribute for the temperature), then probably use awk to extract the relevant field from that line of text. If it’s higher than whatever threshold value you want to set, run poweroff to shut down the machine.
Having written that script, you’d set up a cron task to run it every 15 minutes or so.
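To make the steps above concrete, here is a rough, untested-on-CORE sketch of what such a script might look like. The 55C limit, the function names, and the idea of passing device paths as arguments are all assumptions for illustration. It also uses awk alone to both find the attribute 194 line and pull out the raw value (the 10th column of the smartctl attribute table), rather than a separate grep step.

```shell
#!/bin/sh
# Sketch only: power off if any listed disk reports a temperature at or
# above LIMIT. Device paths are passed as arguments, for example:
#   temp_shutdown.sh /dev/ada0 /dev/ada1   (hypothetical script name and devices)
LIMIT=55   # degrees Celsius; assumed threshold, adjust to suit your drives

# Read smartctl attribute output on stdin and print the raw value of
# attribute 194 (temperature), which is the 10th column of that line.
extract_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Succeed when temperature $1 is a plain integer greater than or equal to limit $2.
too_hot() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # no reading, or not a plain integer
    esac
    [ "$1" -ge "$2" ]
}

# With no arguments this loop does nothing.
for dev in "$@"; do
    temp=$(smartctl -a "$dev" | extract_temp)
    if too_hot "$temp" "$LIMIT"; then
        echo "$dev is at ${temp}C (limit ${LIMIT}C), shutting down"
        poweroff
        exit 0
    fi
done
```

For the cron task, a schedule of */15 * * * * corresponds to every 15 minutes. Before trusting it, it would be worth replacing poweroff with an echo and running the script by hand to check that the temperatures it reads look sane.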
I can give you a simple script but I have to ask you a few questions:
- Do you want any drive temperature that exceeds xxC to shut down the system, or one specific drive?
- If it is all drives, how many drives are we talking about maximum?
- Are there any drives that need to be excluded?
This will decide which path I choose. But the script will be easy. You will need to run it in a CRON Job periodically to check the temps.
I no longer have Version 12 on my system so I will not be able to test with that, but I do have CORE 13.3.
EDIT: Forgot to ask, are any of the drives to be temperature-checked NVMe?
That’s not actually that hard to do, I think, although @joeschmuck has already got all the tools you need.
Be careful of SSDs and NVMe drives, which run hotter, potentially at temps that would make HDDs appear to be “melting”, so you probably only want to test HDDs.
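One possible way to limit a check to spinning disks, sketched here as an assumption rather than something taken from any existing script: smartctl -i reports a "Rotation Rate" line that gives an rpm figure for HDDs and "Solid State Device" for SSDs, while NVMe output has no such line. The function name is_hdd is made up for this example.

```shell
# Read `smartctl -i` output on stdin; succeed only when the drive reports
# a rotation rate in rpm, i.e. it is a mechanical disk.
is_hdd() {
    grep -Eq '^Rotation Rate:[[:space:]]+[0-9]+ rpm'
}

# Example use (device path is a placeholder):
#   smartctl -i /dev/ada0 | is_hdd && echo "ada0 is a spinning disk"
```

Some older or unusual drives may not report a rotation rate at all, in which case this check would skip them, so it is worth confirming against your own hardware.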
Here is a simple script (simple because I already had the foundation).
This will scan every drive one time and check for a specific temperature (55C by default). If any drive hits 55C or greater, the system will send out a message that the system will power down in 2 minutes, and it should power the system down.
You can also exclude drives by serial number, just modify the Ignore_Drives_List variable.
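The attached script is not reproduced here, so the following is only an independent sketch of how a serial-number ignore list could be checked in shell. The comma-separated format, the helper name is_ignored, and the serial numbers shown are all assumptions made for illustration; check the comments in the attachment for the format it actually expects.

```shell
# Assumed format: serial numbers separated by commas (placeholder values).
Ignore_Drives_List="ZA1ABC23,ZA1DEF45"

# Succeed when serial $1 is an exact entry in the comma-separated list $2.
is_ignored() {
    [ -n "$1" ] || return 1          # an empty serial never matches
    case ",$2," in
        *",$1,"*) return 0 ;;        # whole-entry match, not a substring
        *)        return 1 ;;
    esac
}

# Example use:
#   is_ignored "ZA1ABC23" "$Ignore_Drives_List" && echo "skipping drive"
```

Wrapping both the list and the serial in commas avoids a partial serial matching a longer one.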
You will need to create a CRON Job and run it periodically, I suggest every 15 minutes as it takes a little time for drives to heat up.
If you must test NVMe drives, let me know; there is a different way to collect that data. Right now I’m using the TrueNAS API, which I’m trying to stick with.
Exceptions:
- NVMe drives and Virtual Drives will not be tested.
- It will not power down a Hypervisor unless you have it triggered by TrueNAS.
shutdown_on_temp.txt (2.1 KB)
We run 72 3.5" mechanical drives, so SSD/NVMe options shouldn’t be an issue.
Thank you for the input and I will apply that script now.
Glad I put up the scan all drives version. Manually adding 72 drives would suck.
In the Cron Job, if you do not click on “Hide Standard Output”, you will see all the drive temperatures. Of course you can eliminate that in the script by putting a hash # symbol at the beginning of line 12 to comment it out. I placed that line in there so you can at least see that the script is looking at all the temperature data.
Run the script, make sure there are no odd results.