Multi-Report

Multi-Report is a script that was originally designed to monitor key Hard Drive and Solid State Drive data and generate a nice email with a chart depicting the drives and their SMART data, and of course to sound the warning when something is worth noting.

Features:

  • Easy To Run (depends on your level of knowledge)
  • Very Customizable
  • Sends an Email clearly defining the status of your system
  • Runs SMART Self-tests on NVMe (CORE cannot do this yet)
  • Online Updates (Manual or Automatic)
  • Has Human Support (when I am not living my life)
  • Saves Statistical Data in a CSV file that can be used with common spreadsheets
  • Sets NVMe to Low Power in CORE (SCALE does this automatically)
  • Sets TLER (see the sketch after this list)
  • And many other smaller things
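
For reference, on SATA drives TLER is the SCT Error Recovery Control setting, which can be inspected and set with smartctl. A minimal sketch, assuming a drive at /dev/ada0 that supports SCT ERC (the script's own handling may differ):

    # Show the current SCT Error Recovery Control (TLER) read/write timeouts
    smartctl -l scterc /dev/ada0

    # Set both timeouts to 7.0 seconds (values are in tenths of a second)
    smartctl -l scterc,70,70 /dev/ada0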

SMART was designed to attempt to provide up to 24 hours of warning before pending doom. It is very difficult to predict a failure; however, short of a few things, SMART works pretty well. "Up to 24 hours" means that by the time you find out about a problem, the failure could happen at any moment. Just heed the warning.

Multi-Report has been expanded to also run SMART Self-tests on NVMe drives for anyone running TrueNAS 13.0 or TrueNAS 24.04 and below, as these platforms do not have the ability to run and/or configure NVMe Self-tests. This script gives you the option to run these tests if your NVMe supports SMART Self-tests.
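
Under the hood, that kind of test can be started with nvme-cli. A minimal sketch, assuming nvme-cli is installed and your drive is /dev/nvme0 (the script's actual invocation may differ):

    # Start an NVMe device self-test: 1 = short, 2 = extended
    nvme device-self-test /dev/nvme0 --self-test-code=1

    # Review progress and results in the self-test log
    nvme self-test-log /dev/nvme0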

Another key feature is emailing you a copy of the TrueNAS configuration file weekly (the default). How many times have you seen someone lose that darn configuration file?

There is an Automatic Update feature that, by default, will notify you when an update to the script exists. Another option will apply the update automatically if you desire.

I have built in troubleshooting help: if you specifically command it, you can send me (joeschmuck) an email that contains your drive(s) SMART data. I can then figure out whether you need to make a small configuration change or I need to fix the script.

All the files are here on GitHub. I retain a few previous versions in case someone wants to roll back, and the files are all dated. Grab the multi_report_vXXXX.txt script and the Multi_Report_User_Guide.pdf; that should get you started.

There is a nice thread on the old TrueNAS Forums Here that gives a good history.

Download the script and take it for a spin. Sorry, this forum does not allow uploading of PDF files, so grab the user guide from GitHub, or run the script using the -update switch and it will grab the most current files.
multi_report_v3.0.1_2024_04_13.sh (407.7 KB)

An alternative to Multi-Report is FreeNAS-Report, also found in this resource section. Both are based on my code (and that of others who modified the script over the years; names are at the top of the script), and it is freeware to share.


Hi Joe,
I have multi-report configured to auto-update (running on the latest release of TrueNAS CORE). Whenever it performs an auto-update, I get an e-mail with the following message:

mkdir: /tmp/multi_report.lock: File exists
Script is already Running… Exiting
If this message is in error, remove '/tmp/multi_report.lock' directory or just reboot TrueNAS to recover.

The update succeeds, and when I log in to check, the directory /tmp/multi_report.lock does not exist.

It's not a major annoyance, but I thought you might like to know.

@unseen
You are the first person to report this problem since I had to introduce the lock feature, and I appreciate all feedback, good or not so good. Problems need to be fixed; call me OCD.

The purpose of the lock file is to ensure the script does not try to run multiple instances of itself; if it does, it aborts and gives you the message you saw. The update process is a key place where this is possible, although unlikely to happen.

It would annoy the hell out of me. There is something going on and I'd like to fix it. I'm hoping you just have something configured in a strange manner.

Do you get an email when the script runs?

The lock file should be deleted when the script exits, which explains why you can't locate it. Since the lock file is also located in the /tmp directory, a reboot erases everything in the /tmp directory.
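
The mechanism is essentially the common mkdir-plus-trap lock pattern; a simplified sketch of the idea (not the script's exact code):

    LOCKDIR="/tmp/multi_report.lock"

    # mkdir is atomic, so it fails if another instance already holds the lock
    if ! mkdir "$LOCKDIR"; then
        echo "Script is already running... exiting" >&2
        exit 1
    fi

    # Remove the lock on any exit (normal, error, or interrupt)
    trap 'rm -rf "$LOCKDIR"' EXIT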

Questions:

  1. How are you running the script? Please be specific, as I cracked my crystal ball last week.
  • Do you run it from the command line?
  • Are you using a CronJob?
  2. If using a CronJob, does the issue also happen when running from the command line?
  3. If it only happens from a CronJob, a screenshot of your CronJob settings would go a long way.
  4. Do you have more than one CronJob to run this script?

Lots of questions so we can hopefully solve this quickly.

Based on your responses, this will either be an easy fix or I will need to send you a modified script to pinpoint exactly what is going on.

Feel free to run the script using the '-dump emailextra' switch to send me (and yourself) an email containing some data, and then we can exchange emails directly. But only if you are comfortable doing that.
We can also do most of this through private messages if it looks to be getting complicated. I do not want to flood this forum topic with too many messages, but if we do take this offline, I will come back here and post what the problem was and the solution. If a fix is needed, v3.0.2 will be coming out :wink: .

I actually also had that same lock issue and came here to report it. It only started happening in the past two weeks for me, and rebooting the system did not help. I had to update from v2.5 to v3 and switch my cron job from running the old multi-report.sh to multi_report.sh (for some reason I guess I named it with a hyphen instead of an underscore at some point).

I'll be able to grab a dump in a few days if the root cause isn't found by then, but in any case the issue seems solved for me after switching to v3.

@masterjuggler So you still have the issue, correct? And you are running only a single CronJob and not several different ones? For example, someone may want to run a CronJob at 2AM, 2:15AM, 3PM, etc., and set up multiple jobs.

Try this to see if it works: append '-ignore_lock' to the end of the command line, so your CronJob or SSH command would look like this: multi_report.sh -ignore_lock. If you are using any other switches, put this one last.

If I switch back to using v2.5 I'll have the issue, yes. Using v3 I do not have the issue.

I only have a single cron job set to run once per week.

Setting the -ignore_lock flag allowed v2.5 to work as expected.

Yes, I get an e-mail from my NAS:

"mkdir: /tmp/multi_report.lock: File exists
Script is already Running… Exiting
If this message is in error, remove '/tmp/multi_report.lock' directory or just reboot TrueNAS to recover."

The script runs as a Cron Job via TrueNAS Tasks->Cron Jobs.
To answer the other three questions, I would need to downgrade the script to a previous version and run it from the command line to see the upgrade process working. If I run it from the command line, using exactly the same command as I have specified in the cron job configuration, the script runs just fine and sends me a report by e-mail when it is done.
The cron job runs the command: /mnt/freenas/Scripts/multi_report.sh and is set to trigger at 08:00 every day.

I will try to find some time to downgrade and run the script from the command line.

You can go to GitHub to grab one of the previous versions back to v2.4.2.

But you do not get the expected email with the charts and data, correct?

And please try the '-ignore_lock' switch.

@unseen @masterjuggler I forgot to ask, what version of TrueNAS are you running?

Like I said in my first post, I'm running the latest release of TrueNAS CORE.

I don't get a report when the script fails, just a message about it failing, so when that has happened, I simply logged in and ran the script from the command line so that it could produce the report for that day.

As the script saves the previous version when it does the update, all I need to do is rename some files to put the old version back.

Looking at the script, the lock create and check looks like this:

        if ! mkdir /tmp/multi_report.lock; then
                printf "Script is already Running... Exiting\n" >&2
                printf "If this message is in error, remove '/tmp/multi_report.lock' directory or just reboot TrueNAS to recover.\n" >&2
                exit 1
        fi

That should probably read:

if mkdir /tmp/multi_report.lock

As mkdir returns 0 on success and 1 on failure.

… edit

That's probably not correct; it works fine. The problem is that when you do an auto-update, the currently running script copies the new version of itself to 'temprunfile.sh' and then executes it. Problem is, the previous version of the script is still running, and the new version running as 'temprunfile.sh' is going to try to create the lock directory again. If you delete the current lock directory before running 'temprunfile.sh', I think it will fix the problem.
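
In other words, something along these lines in the updater, with the lock released before the hand-off (a sketch of the suggestion, not necessarily the fix that ships):

    # Release our lock before launching the updated copy,
    # so its own mkdir of /tmp/multi_report.lock can succeed
    rm -rf /tmp/multi_report.lock
    ./temprunfile.sh "$1"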

I will look into that when I return from work tonight. However, the odd thing is, I can't recreate the issue, and this is the first time I've had any lock complaints, only since v3.0.1 came out. Did I make a change to cause it? I really hope others are not "just making do" with the error; I actually want it to work for everyone.

I could also just get rid of the lock feature, which is why the '-ignore_lock' switch was added way back when. But you should not need to use it, and that may fail during an upgrade anyway. Hmm…

Thanks for the feedback.
-Joe

Keep the lock feature. Having two copies of the script trying to compete with each other would be less than optimal…

It would be much easier to see whether something you have done recently has provoked the problem if you used git properly, checked in the script file multi_report.sh as one of the controlled source files, and made releases on GitHub. Then it would be simple to diff the current release against the previous release and see the changes made between each release. As it stands now, you have to download each "multi_report-xxx.txt" file and manually diff them to see what changed between each release.

At the moment, you create a subprocess to run the updated script. That subprocess includes a three-second sleep:

    # How do we run the script when the script is the same name?  Temporarily copy the new script to a new name and run that name.
        (
        cd $SCRIPT_DIR"/"
        cp $SCRIPT_DIR"/"$runfilename $SCRIPT_DIR"/"temprunfile.sh ; chmod 755 $SCRIPT_DIR"/"temprunfile.sh
        ./temprunfile.sh $1
        sleep 3
        rm $SCRIPT_DIR"/"temprunfile.sh
        )
        exit 0

It seemingly expects the new script to take less than three seconds to run. You could skip the sleep and just immediately delete the copy of the script; it won't actually get deleted while a process still has it open.

As the first thing the temporary copy of the script does is create the lock directory (which will fail), a better strategy would be for the script to look at $0 when it starts. If it can see that it is running as "temprunfile.sh", then it should skip creating the lock directory, as you know it will already exist.
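
A sketch of that $0 check (names taken from the thread; the actual fix may look different):

    # If this is the relaunched copy (temprunfile.sh), the original instance
    # already holds /tmp/multi_report.lock, so skip creating it
    if [[ "$(basename "$0")" != "temprunfile.sh" ]]; then
        if ! mkdir /tmp/multi_report.lock; then
            printf "Script is already Running... Exiting\n" >&2
            exit 1
        fi
    fi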

Eventually, both copies will exit and the first copy to exit will delete the lock directory. As your exit trap deletes the lock directory with -rf, nothing will fail when the last copy to exit doesn't find the directory to delete.

I can't explain why this has started happening now. Personally, I only just recently turned on the auto-update feature (it's off by default) and it has failed each time the script updated itself (which is what I'd expect looking at how the script works).

That is true, but I've never used GitHub that way before. I do know how to use it to store data, so that is half the battle.

Welcome to my world. Was it the script or TrueNAS?

Exactly. I personally do not like the auto-update feature, I like to have some control over what software is updated. But someone asked for it and it was an easy thing to add.

I may not need the 3-second sleep; however, early in development there were people having problems with the timing of the files, and adding a sleep statement helped. Maybe it is some BASH craziness.

I will try a few things to recreate the problem, but to date I have not been able to. But never say never.

I was able to recreate it, and as soon as that happened I fixed it. It has been like this for a while, which I can only assume means not many people use the automatic update feature, which of course is fine. I need to run the test again on SCALE just to make 100% certain nothing else slips by.


I run the auto update, but I just reboot when asked.

It's fixed and tested; well, the fix was tested. Now to test to make sure I didn't break something else. I will release it before the weekend ends.


I would like to run Spencer.py as well, and I had to hunt this down.

Unfortunately, because the reporting in Cobia changed (possibly the Netdata change), it is currently incompatible with Cobia (and presumably Dragonfish).

Wasn't it integrated a while ago?

The issue is with Spencer, not Multi-Report, and I haven't tried it on my Cobia system; I am just going on what it says on the Spencer community page I linked to.

I do not use Spencer myself, but I worked with its creator to integrate it (call it) with multi_report. If you have any problems with it, I'd reach out to @NickF1227 (I hope this is the right Nick after the unfortunate lack of account migrations to this new forum). He should be able to help out. I will assist as well, should Nick reach out to me.

Yessir, it's me, @joeschmuck :slight_smile:

I have done a poor job of maintaining Spencer. I have been working on adding reporting for zpool iostat and integrating that into Spencer, but I do not have anything to share just yet.

@Protopia While the logging in Cobia is different, the kernel logs Spencer processes (/var/log/messages) still exist, so Spencer should still function in that capacity.


Tested on Cobia, and Dragonfish RC1


I wasn't sure whether that would continue to be the case when I wrote the statement about Cobia. I welcome you to try it out and provide me with any feedback; I haven't gotten much in the way of user feedback in general, so any information is helpful.
