Weekly drive maintenance

…just got me a

Critical
Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors.
2024-10-07 05:32:45 (America/New_York)

Critical
Device: /dev/sdb [SAT], 1 Offline uncorrectable sectors.
2024-10-07 05:32:45 (America/New_York)

And to quote @joeschmuck:

So I’ll have to SSH into SCALE, run sudo smartctl -a /dev/???, and keep a note of changes over time (as noted above).
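For reference, the two smartd warnings above track SMART attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). Below is a minimal Python sketch of pulling raw values out of the attribute table that smartctl -a prints for SATA drives, so they can be compared from one run to the next. It assumes the classic ATA table layout (NVMe drives print a different section), parse_ata_attributes is a hypothetical helper name, and the sample lines are illustrative, not taken from the poster’s drive. Newer smartmontools releases can also emit JSON with -j, which may be easier to consume.

```python
import re


def parse_ata_attributes(smartctl_text):
    """Map SMART attribute ID -> (name, raw value as int) from `smartctl -a` text."""
    attrs = {}
    in_table = False
    for line in smartctl_text.splitlines():
        if line.startswith("ID# ATTRIBUTE_NAME"):
            in_table = True  # header row of the ATA attribute table
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # the table ends at the first blank line
        # Split into at most 10 fields so a raw value like "34 (Min/Max 20/45)" stays whole.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        match = re.match(r"\d+", fields[9])  # leading integer of RAW_VALUE
        attrs[int(fields[0])] = (fields[1], int(match.group()) if match else None)
    return attrs


# Illustrative sample lines only (hypothetical values).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(parse_ata_attributes(SAMPLE).items()):
        print(attr_id, name, raw)
```

Splitting on at most ten fields keeps raw values that contain spaces intact; only the leading number is kept for comparison.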

Since I don’t know the first thing about scripting in Linux, I’ll SSH in from Windows and run a bunch of commands to find out how many drives there are and gather their SMART test results, then put everything in an SQLite DB to keep track of what happened when. (I’m already coding something for SNMP, from my Windows PC, to gather the “oops” a box may have; I’ve got 3 SCALE boxes and 1 CORE.)
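One way to keep that “what happened when” history is a single SQLite table keyed by drive serial, timestamp, and attribute ID. This is a minimal sketch using Python’s built-in sqlite3 module; the table name smart_raw, the helper names, and the serial number are hypothetical. Keying on the serial rather than /dev/sdX is a deliberate choice, since device names can shift after a reboot or a drive swap.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS smart_raw (
    serial   TEXT    NOT NULL,  -- drive serial (sdX names can change between boots)
    taken_at TEXT    NOT NULL,  -- ISO-8601 text sorts chronologically if the format is consistent
    attr_id  INTEGER NOT NULL,  -- SMART attribute ID, e.g. 197
    raw      INTEGER,           -- raw value parsed from smartctl output
    PRIMARY KEY (serial, taken_at, attr_id)
)
"""


def record_snapshot(conn, serial, taken_at, raw_by_id):
    """Store one reading of several attributes for one drive."""
    conn.executemany(
        "INSERT OR REPLACE INTO smart_raw VALUES (?, ?, ?, ?)",
        [(serial, taken_at, attr_id, raw) for attr_id, raw in raw_by_id.items()],
    )
    conn.commit()


def changes_since_previous(conn, serial):
    """Return [(attr_id, old_raw, new_raw)] between the two newest readings."""
    stamps = [row[0] for row in conn.execute(
        "SELECT DISTINCT taken_at FROM smart_raw WHERE serial = ? "
        "ORDER BY taken_at DESC LIMIT 2", (serial,))]
    if len(stamps) < 2:
        return []

    def load(ts):
        return dict(conn.execute(
            "SELECT attr_id, raw FROM smart_raw WHERE serial = ? AND taken_at = ?",
            (serial, ts)))

    new, old = load(stamps[0]), load(stamps[1])
    return [(a, old.get(a), new[a]) for a in sorted(new) if old.get(a) != new[a]]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_snapshot(conn, "HYPOTHETICAL-SERIAL", "2024-10-06T05:30:00", {197: 0, 198: 0})
    record_snapshot(conn, "HYPOTHETICAL-SERIAL", "2024-10-07T05:32:45", {197: 1, 198: 1})
    print(changes_since_previous(conn, "HYPOTHETICAL-SERIAL"))  # [(197, 0, 1), (198, 0, 1)]
```

Only attributes whose raw value moved are reported, which is the “before and after” view needed for a warranty log.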

What I need from a forum member is a list of commands to run over SSH (which I’ve never used before, but I guess I’ll learn) to gather all the relevant data.
If the output can be in some structured format, even better, but I can work with human-readable text too.
That’s my idea. If you think of something better, tell me that too.

Thanks

Writing your own script may be a learning experience, but why not just use Multi-Report?

Wow, over 9000 lines.
I would have to understand “the ways of bash” to use it. :scream_cat:

The way I see it, writing something that runs from my PC is easier, since I would not have to learn Linux or BSD; I’d just run the commands and get the data I need.
It would surely be a discovery process, but I’m up for it.
I was just looking at the “/api/v2.0” endpoint, but I think I’ll get better info over SSH.

Reading the .sh file… nicely done. Again, what I was aiming for is to catalog events over time to back a claim against the HDD warranty, and also to judge the “urgency level” for acting on it based on how fast the drive is degrading.
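The “speed of the degradation” can be estimated as the change in a raw counter per day between the oldest and newest readings. The sketch below uses hypothetical helper names, and the act_above cut-off is a placeholder, not a manufacturer warranty criterion.

```python
from datetime import datetime


def growth_per_day(readings):
    """readings: (ISO timestamp, raw count) pairs for one attribute of one drive."""
    points = sorted((datetime.fromisoformat(ts), raw) for ts, raw in readings)
    if len(points) < 2:
        return 0.0
    (t_first, v_first), (t_last, v_last) = points[0], points[-1]
    days = (t_last - t_first).total_seconds() / 86400
    return (v_last - v_first) / days if days > 0 else 0.0


def urgency(rate_per_day, act_above=1.0):
    """Hypothetical triage: any growth is worth watching, fast growth needs action."""
    if rate_per_day > act_above:
        return "act now"
    if rate_per_day > 0:
        return "watch"
    return "stable"


if __name__ == "__main__":
    # Hypothetical pending-sector readings six days apart.
    pending = [("2024-10-01T05:32:45", 0), ("2024-10-07T05:32:45", 1)]
    rate = growth_per_day(pending)
    print(f"{rate:.3f} sectors/day -> {urgency(rate)}")
```

Using only the first and last points keeps it simple; a windowed rate (say, the last 30 days) would react faster to a sudden acceleration.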

Some of my failing drives are 6 years old… the one I posted about is 305 days old, so I still have time on the warranty. Then again, I would guess the drive has to be pretty bad (say, X error count in field ???, not a clue) to make a warranty claim.

It’s not a big deal for me to code (in AutoIt), and SNMP is just a UDP message, which works just fine on a local network from what I’ve seen.

The script you gave me does the testing, but not all these other things.
But I have not read the whole code yet.

Unless, in your experience, you feel it’s not worth it because… experience!
Experience that I don’t have. Just from tinkering, I think what I want to do would be a good tool to have.

It’s a complete piece of software, you do not need to understand it to use it. Copy it to your NAS and invoke it regularly from a cron job. Done.

You don’t read the source of all other software you are using, do you? :wink:

No, I don’t… always :stuck_out_tongue:

The thing is, I couldn’t even put it there. :baby:
But let’s say that I do and set up the cron job. That would not give me the before-and-after comparison I’m looking for. Unless it does and I didn’t know :confused:

…reading these bash scripts, I found midclt call disk.query, which dumps a JSON list of the drives, just what I was looking for :smiley:
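A minimal sketch of consuming that JSON from a Windows PC, assuming the OpenSSH client that ships with recent Windows is on PATH and that each disk object carries name and serial keys (check these against the output of your TrueNAS version). The host string and helper names are hypothetical, and depending on the account the remote command may need sudo.

```python
import json
import subprocess


def fetch_disk_query(host):
    """Run `midclt call disk.query` on the NAS over SSH and return its stdout.

    `host` is a placeholder such as "admin@truenas.local"; key-based login
    avoids an interactive password prompt.
    """
    result = subprocess.run(
        ["ssh", host, "midclt call disk.query"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def list_drives(disk_query_json):
    """Return (name, serial) pairs sorted by device name.

    The "name" and "serial" keys are assumed; verify them against your own output.
    """
    disks = json.loads(disk_query_json)
    pairs = [(d.get("name"), d.get("serial")) for d in disks]
    return sorted(pairs, key=lambda p: p[0] or "")


if __name__ == "__main__":
    sample = '[{"name": "sdb", "serial": "SERIAL-B"}, {"name": "sda", "serial": "SERIAL-A"}]'
    print(list_drives(sample))
```

Usage against a real box would be list_drives(fetch_disk_query("admin@truenas.local")), with the host replaced by your own.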

This bash language is quite a thing. Me like it :nerd_face:

You can generate a daily or weekly email with that script to catch unexpected changes.

1 Like

BASH sucks to write a program with! But it has its uses.
9000+ lines, but it started out significantly smaller. Why so much larger? Features, and almost half of it is for the -config routine. Then you have to account for every possible non-standard message/report you get. It is not fun getting a report of a problem when you thought you had covered everything, and it still happens after this long. Part of those 9000 lines is also a built-in simulator: I can take the -dump data and run the script to experience, for the most part, what the end user is seeing. It exists for the sole purpose of troubleshooting a problem.

When you are writing a program just for yourself, it is easy and the code can be very small. When you give it to others and problems happen, those fixes add a lot.

The TrueNAS API is nice, and I have been using it in a small script (1229 lines right now, but I’m looking to make it smaller). The next version of Multi-Report will use it almost exclusively and will hopefully use smartctl only to issue testing commands. I have a script in testing right now, but it is just that: in testing. Once all looks safe, it will be released.

The purpose of Multi-Report (originally called FreeNAS Report or something like that, 10+ years ago) was to catch drives that were not being tested. FreeNAS 8.x and 9.x at the time had a habit where, if you replaced a drive, that drive (call it ada0) would drop off the SMART testing list. Why? Who knows. But you could go months without even realizing it. So a script was born, for my personal use. I offered it to others and it became sort of popular.

The script does a lot more these days, including running SMART self-tests on NVMe drives. TrueNAS will finally be supporting NVMe drive testing, but until that is in all products, I will still support it. Once I no longer need it, I will disable it and eventually remove it.

And Multi-Report does that as well, in the form of a comma-separated values (CSV) file, AKA a spreadsheet.

Hmm, other things it can do (by request): it can monitor drive temps and send you an email when one passes a threshold. It can log data each time you run it without sending out a full report. It does a lot that I would think most people would not use, but some have asked for it, so I rolled it in.
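A temperature alert usually wants to fire once when a drive crosses the limit, not on every run while it stays hot. This is a hypothetical sketch of that edge-triggered check; it does not reproduce Multi-Report’s own logic, and the 45 °C limit in the example is a placeholder.

```python
def crossed_threshold(previous_c, current_c, limit_c):
    """True only on the reading where the temperature first rises above limit_c."""
    was_ok = previous_c is None or previous_c <= limit_c
    return was_ok and current_c > limit_c


def alerts_for(readings_c, limit_c):
    """Return the readings that would trigger an alert, one per excursion above the limit."""
    alerts = []
    previous = None
    for current in readings_c:
        if crossed_threshold(previous, current, limit_c):
            alerts.append(current)
        previous = current
    return alerts


if __name__ == "__main__":
    # Hypothetical readings from a script run every 15 minutes, placeholder 45 °C limit.
    print(alerts_for([40, 44, 47, 48, 43, 46], 45))  # [47, 46]
```

The 48 reading does not alert again because the drive never dropped back under the limit in between.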

Right out of the box, Multi-Report should run with only one setup change: adding the email address to send the report to. The defaults should be fine for most people. Oh yeah, a big one… the script will attach a copy of your TrueNAS config file by default each Monday the script is run. This is a big CYA for many people.

Feel free to examine the script; hopefully something in it will help you out. FreeNAS-Report is now maintained by @dak180, who does a good job of keeping it up to date as issues occur. His version is significantly smaller.

And if you want to learn BASH or the TrueNAS API, finding a project you are passionate about is a good way to learn. And if you find an issue with Multi-Report, please toss me a message.

2 Likes

Then that’s that. I don’t have to reinvent the wheel. Thank you for the explanation!

Hmm, not really. I’m old and my brain is quite cooked. On top of that, I don’t have the time to jump into something like this headfirst (and being obsessive by nature, headfirst is how I find myself doing everything).

I’ll be happy without the dread of “will I break this thing by doing this or that?” that comes from inexperience. Hence my “I want to understand BASH and every line in the script.” I don’t even know how to… anything outside M$ Windows. But I’ll get there.

The only thing I’d like to have is the reports in an SMB share and not use the email.
And I understand that if the path just isn’t there because the pool, dataset, or whatnot broke, I will not have a report on what happened. But that is what I’d like to have, if it doesn’t take a big patch to get it done. No clue, maybe it’s just a configuration setting. Could you tell me how to go about that?

Warning: Do not try to do that with the current version of Multi-Report. It is a mess! I think that all the time, which is why a new version is in progress. I am creating more individual functions and trying to add good comments. Why? Because I too have to look at it all and, as I said, it is a mess. I end up trying to figure out why I did something, which was generally a fix for someone’s unique problem.

Oh yes, that is an easy change to make :wink: :wink:
If you just want the statistical data (CSV), run the script with the -s switch. No email, just the spreadsheet updated. You can then check out the drive values. If you run the script once every 15 minutes, you can watch the temperature of the drives change over time.

There is a variable you may want to change: how long the data remains before the script purges it. The default is ~2 years, as I recall. And of course you can change the path of the statistical data file, but by default it is saved in the directory the script runs from.
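The purge described above amounts to dropping rows older than a cut-off date. A minimal sketch, assuming a CSV whose first column is a YYYY-MM-DD date; Multi-Report’s real column layout may differ, purge_old_rows is a hypothetical name, and 730 days stands in for the ~2 year default.

```python
import csv
import io
from datetime import date, timedelta


def purge_old_rows(csv_text, today, keep_days=730):
    """Drop data rows whose first column (assumed YYYY-MM-DD) is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    kept = [row for row in data if date.fromisoformat(row[0]) >= cutoff]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows([header] + kept)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical rows: one from 2022, one recent.
    text = "Date,Drive,Temp\n2022-01-01,sda,34\n2024-09-30,sda,36\n"
    print(purge_old_rows(text, today=date(2024, 10, 7)), end="")
```

The header row is always kept, so the file stays readable as a spreadsheet after a purge.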

If you wanted a copy of all the data that would have been sent in the email, that would require a minor script modification and the end user would be responsible for cleaning up old data.
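For the SMB-share idea, here is a hypothetical sketch of saving each report into a shared dataset and falling back to a local directory when the share path is missing (for example, when the pool is offline). This is not an existing Multi-Report option, save_report is a made-up name, and both paths are placeholders.

```python
from datetime import datetime
from pathlib import Path


def save_report(report_text, share_dir, fallback_dir):
    """Write a timestamped report file into share_dir, or fallback_dir if share_dir is missing."""
    target = Path(share_dir)
    if not target.is_dir():
        # e.g. the pool/dataset behind the SMB share is not mounted
        target = Path(fallback_dir)
        target.mkdir(parents=True, exist_ok=True)
    path = target / datetime.now().strftime("report_%Y-%m-%d_%H%M%S.txt")
    path.write_text(report_text, encoding="utf-8")
    return path
```

Usage might look like save_report(text, "/mnt/tank/reports", "/root/reports"), where the pool name tank and both directories are placeholders. Old files are not cleaned up, which matches the point above that the end user would be responsible for that.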

1 Like

lol, it’s like you’re talking about me and my coding :laughing:

I marked this as solved, but if I have a question I’ll post it here (hopefully that is not strongly against this forum’s etiquette).

Feel free to toss me a message anytime. I’m more than happy to help.

1 Like

Can I still make a warranty claim if the drive ran too hot?


And would that 63 degrees be an exorbitant temperature?

Some manufacturers deny warranty claims if they see trip temps like that.

1 Like