Quick checks with smartctl: tests and temperatures

Hi all, in case you were wondering how to quickly check the results of your latest SMART self-tests without tediously clicking through the TrueNAS Community web UI, here is a script generated by ChatGPT. Please let me know whether you find it useful, or whether it is complete nonsense and there is an easier way to scan all drives and see the results in a human-readable format. The “Difference” column shows how many hours ago the last test ran (current power-on hours minus the hours recorded for the test). Here is the zsh script “smart_test_summary.zsh”:

#!/bin/zsh

# Print header
printf "%-8s %-20s %-20s %-10s %-12s %-12s %-10s %-20s\n" \
  "Disk" "Model" "Serial" "On_Hours" "Test_Hours" "Difference" "Type" "Result"
printf "%-8s %-20s %-20s %-10s %-12s %-12s %-10s %-20s\n" \
  "--------" "--------------------" "--------------------" "--------" "------------" "------------" "----------" "--------------------"

# Get all disk devices
drives=($(lsblk -dno NAME,TYPE | awk '$2=="disk" {print "/dev/" $1}'))

for drive in $drives; do
  # Skip if SMART not available
  if ! sudo smartctl -i "$drive" >/dev/null 2>&1; then
    continue
  fi

  # Get model and serial
  model=$(sudo smartctl -i "$drive" | awk -F: '/Device Model|Model Number/ {print $2}' | xargs)
  serial=$(sudo smartctl -i "$drive" | awk -F: '/Serial Number/ {print $2}' | xargs)
  [[ -z "$model" ]] && model="N/A"
  [[ -z "$serial" ]] && serial="N/A"

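  # Get power-on hours (ATA attribute table: raw value in column 10; NVMe health log: "Power On Hours: 1,234")
  power_on_hours=$(sudo smartctl -A "$drive" | awk '/Power_On_Hours|Power-On Hours/ {print $10; exit} /^Power On Hours:/ {gsub(",", "", $4); print $4; exit}')
  power_on_hours=${power_on_hours%%[^0-9]*}   # keep leading digits only, e.g. "12345h+30m" -> "12345"
  [[ -z "$power_on_hours" ]] && power_on_hours="N/A"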

  # Get last test entry (line #1)
  last_test=$(sudo smartctl -l selftest "$drive" | grep "^# 1")
  if [[ -z "$last_test" ]]; then
    continue
  fi

  # Parse test details
  # (printf gets an explicit "%s" format so the "%" in e.g. "00%" is not read as a format specifier)
  test_type=$(echo "$last_test" | awk '{print $3}')
  test_status=$(echo "$last_test" | awk '{for (i=5; i<=NF; i++) printf "%s ", $i; print ""}' | xargs)
  test_hours=$(echo "$last_test" | awk '{print $(NF-1)}')

  # Calculate difference only when both values are plain integers (<-> is the zsh numeric glob)
  if [[ "$power_on_hours" == <-> && "$test_hours" == <-> ]]; then
    difference=$((power_on_hours - test_hours))
  else
    difference="N/A"
  fi

  # Determine result: only "Completed without error" counts as a pass
  # (statuses such as "Completed: read failure" also start with "Completed")
  if [[ "$test_status" == *"in progress"* ]]; then
    result="In progress"
  elif [[ "$test_status" == *"Completed without error"* ]]; then
    result="PASSED"
  else
    result="FAILED ($test_status)"
  fi

  # Print row
  printf "%-8s %-20s %-20s %-10s %-12s %-12s %-10s %-20s\n" \
    "${drive##*/}" "$model" "$serial" "$power_on_hours" "$test_hours" "$difference" "$test_type" "$result"
done
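
A side note on the “Difference” column: as far as I know, the ATA self-test log stores the lifetime hours in a 16-bit field, so on drives with more than 65,535 power-on hours the logged number wraps around and a plain subtraction gives a misleading result. Below is a minimal sketch of a wrap-aware calculation. The function name `hours_since_test` is made up for illustration, and the sketch assumes the last test ran less than 65,536 hours ago.

```shell
# Hypothetical helper: hours elapsed since the last self-test.
#   $1 = current power-on hours
#   $2 = LifeTime(hours) value from the self-test log
# Assumption: the logged value is the power-on hours modulo 65536 (16-bit field).
hours_since_test() {
  # Reject empty or non-numeric input instead of failing inside $(( ))
  [ -n "$1" ] && [ -n "$2" ] || { echo "N/A"; return 0; }
  case "$1$2" in
    *[!0-9]*) echo "N/A"; return 0 ;;
  esac
  # Adding 65536 before the second modulo keeps the result non-negative
  echo $(( (($1 - $2) % 65536 + 65536) % 65536 ))
}

hours_since_test 12345 12000   # prints 345
hours_since_test 70000 4454    # prints 10 (logged value has wrapped once)
```

For drives that are nowhere near 65,535 hours this gives the same number as the plain subtraction in the script.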

The next one, “smart_drive_temp_info.zsh”, shows the drive temperatures in table format:

#!/bin/zsh

# Print header
printf "%-8s %-20s %-20s %-10s %-12s\n" "Disk" "Model" "Serial" "On_Hours" "Temperature (°C)"
printf "%-8s %-20s %-20s %-10s %-12s\n" "--------" "--------------------" "--------------------" "--------" "--------------"

# Get all disk devices
drives=($(lsblk -dno NAME,TYPE | awk '$2=="disk" {print "/dev/" $1}'))

for drive in $drives; do
  # Skip if SMART not available
  if ! sudo smartctl -i "$drive" >/dev/null 2>&1; then
    continue
  fi

  # Get model and serial
  model=$(sudo smartctl -i "$drive" | awk -F: '/Device Model|Model Number/ {print $2}' | xargs)
  serial=$(sudo smartctl -i "$drive" | awk -F: '/Serial Number/ {print $2}' | xargs)
  [[ -z "$model" ]] && model="N/A"
  [[ -z "$serial" ]] && serial="N/A"

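  # Get Power-On Hours (ATA attribute table: raw value in column 10; NVMe health log: "Power On Hours: 1,234")
  power_on_hours=$(sudo smartctl -A "$drive" | awk '/Power_On_Hours|Power-On Hours/ {print $10; exit} /^Power On Hours:/ {gsub(",", "", $4); print $4; exit}')
  power_on_hours=${power_on_hours%%[^0-9]*}   # keep leading digits only, e.g. "12345h+30m" -> "12345"
  [[ -z "$power_on_hours" ]] && power_on_hours="N/A"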

  # Get Temperature
  temperature=$(sudo smartctl -A "$drive" | \
    awk '/Temperature_Celsius|Temperature_Internal|Temperature Composite/ {print $10}' | head -n1)
  if [[ -z "$temperature" ]]; then
    temperature=$(sudo smartctl -A "$drive" | awk '/Temperature:/ {print $2}' | head -n1)
  fi
  [[ -z "$temperature" ]] && temperature="N/A"

  # Output row
  printf "%-8s %-20s %-20s %-10s %-12s\n" "${drive##*/}" "$model" "$serial" "$power_on_hours" "$temperature"
done
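
The temperature lookup above runs `smartctl -A` up to twice per drive. As a rough sketch, the same two layouts (ATA attribute table and NVMe health log) can be handled in one pass by a small filter that reads the output on stdin. The name `extract_temp` is hypothetical, and the sample lines in the comments are illustrative, not taken from a real drive.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin, print the temperature in °C.
# Layouts handled (illustrative sample lines):
#   ATA :  194 Temperature_Celsius  0x0022  118  098  000  Old_age  Always  -  32
#   NVMe:  Temperature:                        35 Celsius
extract_temp() {
  awk '
    /Temperature_Celsius|Temperature_Internal/ { print $10; found = 1; exit }
    /^Temperature:/                            { print $2;  found = 1; exit }
    END { if (!found) print "N/A" }   # END still runs after exit, hence the flag
  '
}
```

It would be used as `sudo smartctl -A /dev/sda | extract_temp`, where `/dev/sda` is only a placeholder device name.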

Update 1: added “smart_drive_temp_info.zsh”
Update 2: added Model and Serial columns to the output in “smart_test_summary.zsh”
Update 3: “smart_test_summary.zsh” was showing “FAILED (Self-test)” for a drive with a test currently running; it now shows “In progress” for such drives.
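
For reference, a stricter version of the status check can be sketched as a standalone function. It counts only “Completed without error” as a pass, because failure statuses such as “Completed: read failure” also begin with “Completed”. The function name `classify_selftest` is made up, the sample lines are illustrative rather than copied from a real drive, and anything the function does not recognize (including aborted tests) falls through to FAILED.

```shell
# Hypothetical helper: map one ATA self-test log line to a short result label.
classify_selftest() {
  case "$1" in
    *"in progress"*)             echo "In progress" ;;
    *"Completed without error"*) echo "PASSED" ;;
    *)                           echo "FAILED" ;;
  esac
}

# Illustrative lines in the layout printed by `smartctl -l selftest`
classify_selftest '# 1  Extended offline    Self-test routine in progress 90%     12345         -'
classify_selftest '# 1  Short offline       Completed without error       00%     12345         -'
classify_selftest '# 1  Extended offline    Completed: read failure       90%     12000         123456789'
```

The three calls print “In progress”, “PASSED” and “FAILED”, in that order.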

You could install Scrutiny instead and get a nice dashboard on top.

@pmh, this one? GitHub - AnalogJ/scrutiny: Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

Yes, it’s in the app catalog, just a handful of clicks to get up and running.

@pmh Scrutiny looks super advanced for my use case :thinking: I am not (yet) using Docker and have no idea how to run it on SCALE :innocent: Any hints for a proper fan control script on Supermicro? Otherwise I am going through the pain of Supermicro Fan Control - #29 by pvb after an upgrade from Core to SCALE. And yes, I tried hybrid_fan_control, but it randomly fails to read the CPU temperature for no good reason.

Update: figured out Docker. Menu Apps → Configuration - Choose Pool, then search for scrutiny, then install/deploy… works like a charm :tada:

Docker is built into SCALE/CE - just click on “Apps” in the menu, set the pool, click on “Discover Apps” in the upper right and fire away. No need to know anything about Docker at all - you have hundreds of apps at your disposal.

Hi @pmh, is it possible that Scrutiny reports a different temperature than TrueNAS's own web UI? Also, I can't find any controls to refresh the current temperatures in Scrutiny… does it run on its own schedule every few (how many?) minutes? There are no such parameters in Settings :frowning: at least not in the UI.

You could also give Multi-Report a try, or FreeNAS Report. Scrutiny is easy and has a nice GUI. The other two are Bash scripts and send you the data in an email.

I run Scrutiny and Multi-Report. Multi-Report has other features besides the reporting.

@joeschmuck, thank you for sharing. GitHub - JoeSchmuck/Multi-Report: FreeNAS/TrueNAS Script for emailed drive information. I sense this is gold for novices and pros alike, specifically the Drive Troubleshooting Flowcharts.

OK, it seems that this page allows only one message to be marked as the solution, while I see at least two:

:pray: @pmh – Apps → Configuration - Choose Pool, then search for scrutiny, then install/deploy, scrutiny on Github: GitHub - AnalogJ/scrutiny: Hard Drive S.M.A.R.T Monitoring, Historical Trends & Real World Failure Thresholds

:pray: @joeschmuck – Multi-Report, discussed here and source on Github: GitHub - JoeSchmuck/Multi-Report: FreeNAS/TrueNAS Script for emailed drive information.