Multi-Report

The issue for @oxyde was not actually an issue. His NVMe drives were in low power mode, so they were counting Power On Hours very slowly. I don’t know why some drives do this and some don’t; it just happens. Think of it as extending your warranty :rofl:
The script is working still, whew!

Yeah, thanks a lot for your time!

In the user guide, I see a version note that says “Added Email Report ONLY on Alert (any Error Message).” But I can’t find how to do that in the user guide, nor in the config interface.

I would like to get the email only if it’s not “all good”.

Yeah, the manual could be written more clearly. But it was clear in my mind :wink:

This should do it for you, sort of. It will not report a Warning level event.

In the Cron Job, add the -m switch to the command to get an email only during a Critical error. For example, this is my command for the Cron Job: cd /mnt/farm/scripts && ./multi_report.sh -m

While this is typically set by default, you should open the multi_report_config.txt file and edit the following lines, if they even need to be changed:

###### Alert Email Configuration - For Temperature and Critical Error monitoring when you suspect a problem.
### You must use the '-m' switch
AlertEmail="YourAlertEmail@Address.com"		# Alert email address used with the '-m' switch.
AlertOnWarningTemp="true"			# Send alert on Warning Temp. Default = "true"
AlertOnCriticalError="true"			# Send alert on Critical Error. Default = "true"
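
To double-check those values without opening an editor, something like the following could work. It is only a sketch: the helper name `get_setting` is made up for illustration, and it assumes each setting sits on its own line in the `Name="value"` form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Multi-Report): print the quoted value of
# one setting from a config file whose lines look like  Name="value"  # comment
get_setting() {
    # $1 = setting name, $2 = path to the config file
    sed -n "s/^$1=\"\([^\"]*\)\".*/\1/p" "$2"
}
```

For example, `get_setting AlertOnCriticalError multi_report_config.txt` prints whatever is between the quotes on that line, and prints nothing if the setting is missing.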

I would recommend testing this as well by setting the drive critical temperature threshold below the current drive temperature. This should send you an email when you run the script. If it does, then change the value back and you should be good.

Thanks Joe. Sounds like I better not do that, as I want to be made aware of anything that is not “all good”, for example too many days without a SMART test. I guess I’ll leave it as is and check the daily email subject before deleting it.

You could make a quick edit to the script to trigger on anything. Pretty easy, Ha Ha. Look in your email today and I will send you a line to change in the script.

Okay Joe. I added the -m to the CronJob command, verified that in the config file AlertOnWarningTemp and AlertOnCriticalError are both "true", and made that little secret change to the script. Thank you!

I ran the cron job and didn’t get an email, so it seems to be working. I’ll run that test later just to be sure.

I have grown to like the All Good emails I get from each server every morning. I know immediately, even before coffee, that something is wrong in the server room if one or both emails don’t show up. I give the emails a quick look and then I’m off to other things.

I was able to install and run this successfully on my server. It is working great; however, I do have a question about scrubbing the drives. If I’m not mistaken, no scrub schedule is recommended, and I would like to know what the best practices are.

My server was down for a few days and the last scrub was 38 days ago. I have it set to scrub every 7 days: each month on the 7th, 14th, 21st, and 28th.

Do you or anyone here have any best practice recommendations with the scrubs?

Please advise and thank you for your time and dedication to this community!

I generally figure that every 2-4 weeks is good.

I believe the default is Sunday, not to exceed 37 days between scrubs (or maybe that was 35 days).
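
For anyone who wants to check scrub age by hand, here is a rough sketch of the arithmetic. The function names are hypothetical, the timestamps are plain epoch seconds that you would have to supply yourself (for example from the scrub date that `zpool status` reports), and the threshold is passed in rather than assumed.

```shell
#!/bin/sh
# Hypothetical helper: whole days between two epoch timestamps.
days_between() {
    # $1 = earlier epoch seconds, $2 = later epoch seconds
    echo $(( ($2 - $1) / 86400 ))
}

# Hypothetical check: succeeds (exit status 0) only when the last scrub
# is more than the given number of days old.
scrub_overdue() {
    # $1 = last scrub epoch, $2 = current epoch, $3 = threshold in days
    [ "$(days_between "$1" "$2")" -gt "$3" ]
}
```

For example, `scrub_overdue "$last_scrub" "$(date +%s)" 35 && echo "Scrub overdue"` would print the message only when more than 35 days have passed.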

If Sunday is a bad day for you and your system is always up and running on a particular day, maybe Thursdays, then you can change to that schedule.

Myself, I think the iXsystems folks have it right: once a month should be good. A Scrub is like running a SMART Long test; it is drive intensive and takes time.

I was thinking about adding a feature to run a Scrub into Multi-Report, but because TrueNAS does a very good job of supporting it now, maybe not.

Version 3.1 is posted on GitHub (link in first post) if you would like to grab it. I have not set up the automatic update yet, as I would like the overzealous folks to give it a try first. Okay, the truth… The wife has us scheduled to go on a trip out of town for a week, and I will not have a means to troubleshoot any problems that may be discovered, so if you want to grab the needed files, that is perfectly fine. I expect to return home and be available to troubleshoot (if needed) on 19 December. If you have a problem, feel free to send me an email at joeschmuck2023@hotmail.com and run the script using the ‘-dump emailall’ switch. I will answer as soon as I can, but the grandkids have to come first until I get back home.

You will need the following files (place both scripts in your scripts directory):

  1. multi_report_v3.1_2024_12_11.txt >> rename to multi_report.sh
  2. drive_selftest_v1_2024_12_11.txt >> rename to drive_selftest.sh
  3. Drive_Selftest_User_Guide.pdf (This user guide is written better)
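
The renaming can be done from a shell. This is just a sketch under a few assumptions: the function name `install_v31` is made up, the directory you pass in is wherever you saved the downloaded files, and the chmod step is an assumption (the scripts need to be executable if you run them as ./multi_report.sh).

```shell
#!/bin/sh
# Hypothetical helper: rename the downloaded v3.1 files inside one directory.
install_v31() {
    # $1 = directory holding the downloaded .txt files
    dir=$1
    mv "$dir/multi_report_v3.1_2024_12_11.txt" "$dir/multi_report.sh" || return 1
    mv "$dir/drive_selftest_v1_2024_12_11.txt" "$dir/drive_selftest.sh" || return 1
    # Assumption: the scripts must be executable to be run as ./multi_report.sh
    chmod +x "$dir/multi_report.sh" "$dir/drive_selftest.sh"
}
```

Note that mv will replace an existing multi_report.sh in that directory, so copy the old script somewhere safe first if you may want to roll back.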

I have not updated the Multi-Report User Guide “yet”, however the Change Log (attached) has all the important stuff in it.

If you update to version 3.1 and need to roll back, you can use the multi_report_config.txt file, which remains backwards compatible, or use the old version that should have been emailed to you automatically.

Sorry, no GUI configuration yet. However, I am working on changing over to an INI file format, which should make the GUI significantly easier to work with.

While I am here, I have a question to ask of those folks who use this script. What features/functions do you not use, or do you feel could be tossed into the wood chipper? For example, the encryption of the already encrypted TrueNAS config file (line 5 of the script)? I was asked for it years ago, and I have never been asked about its use since. I’m looking to simplify where it makes sense.

Change_Log.txt (8.6 KB)

Hey @joeschmuck, were you able to address this in the upcoming release? I have not been able to follow the development since our last email exchange.

@Davvo Specifically what? There have been a lot of changes since version 3.07.

One thing I added, which I hope will be helpful: when a drive has a ZFS error (read/write/chksum), the text portion will specify exactly which drive it is, instead of someone having to manually cross-reference the data. Crap, I wanted to add a note that a ZFS error does not mean a physical drive error. So many people get these things confused.

Yes, that should have been fixed a long time ago. However, I think the fix was to delete the statistical data file, as it is likely not using the current format. If you want to retain the data you have, just rename it; a new file will be created. If that fails to work, reach out after trying the new version.
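
Renaming rather than deleting could look like this. The helper name is hypothetical, and the path you pass in is whatever your statistical data file is actually called; no file name is assumed here.

```shell
#!/bin/sh
# Hypothetical helper: set a file aside by appending a date stamp to its name.
set_aside() {
    # $1 = path to the existing statistical data file
    [ -f "$1" ] || { echo "No such file: $1" >&2; return 1; }
    mv "$1" "$1.$(date +%Y%m%d).bak"
}
```

The old data stays next to the new file under the date-stamped name, so it can be inspected or restored later.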

Yup I can confirm deleting the old data file fixed the issue, thank you.

Sorry if I didn’t pass that information out, I’m old and forgetful. That is my excuse and I’m sticking to it. :grin:

Nah, I remember you telling me to do so… and me forgetting about it :grimacing: :rofl:

@joeschmuck Seeing the following error from v3.1

Would you like to scan the drives and setup these offsets (y/n): 
 
Collecting data, Please wait...
 
AUTOMATIC DRIVE COMPENSATION - UDMA_CRC, MultiZone, and Reallocated Sectors
 
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[0] | values                                                       
jq: 1 compile error
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[1] | values                                                       
jq: 1 compile error
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[0] | values                                                       
jq: 1 compile error
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[1] | values                                                       
jq: 1 compile error
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[0] | values                                                       
jq: 1 compile error
jq: error: syntax error, unexpected '[', expecting FORMAT or QQSTRING_START (Unix shell quoting issues?) at <top-level>, line 1:
.nvme_smart_health_information_log.temperature_sensors.[1] | values                                                       
jq: 1 compile error
Scanning Results:
No UDMA_CRC Errors
No MultiZone Errors
No Reallocated Sectors
Bad Sectors Detected
No Media Errors
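
A possible explanation for the jq errors, offered as a guess rather than a confirmed diagnosis: the filter puts a dot directly before the bracket index (`.temperature_sensors.[0]`), which some jq releases reject as a syntax error. The form without the dot is accepted by old and new releases alike. The sketch below uses a made-up sample document, not real smartctl output, and the function name is hypothetical.

```shell
#!/bin/sh
# Made-up sample shaped like the NVMe health log the filter is reading.
sample='{"nvme_smart_health_information_log":{"temperature_sensors":[41,52]}}'

# Hypothetical helper: read the first sensor using a bracket index
# with no dot in front of it.
first_sensor() {
    # $1 = JSON text
    printf '%s' "$1" | jq '.nvme_smart_health_information_log.temperature_sensors[0] | values'
}
```

Running `jq --version` on the affected system would show which release is installed, which may help narrow this down.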