Multi-Report

joeschmuck · June 9, 2024, 1:07pm

That depends on what version of CORE you are running. Version 13.3 Beta now has smartmontools version 7.4, which supports NVMe drives, and you would have that extra data if you were using it. Don’t confuse that with TrueNAS supporting NVMe self-testing; it does not, YET.

joeschmuck · June 9, 2024, 2:45pm

@oxyde I found the reason you did not get a warning message. One character is all it takes to throw a wrench into it. The fix will be in the next update, which should be posted in a few hours. I have a Honey Do list I’m working on, and this one takes me out of the house for about an hour. I just want to make sure the change did not affect how the script runs on SCALE. I doubt it, but I prefer to test what I can.

Actually, hold that thought. Looking at the script I do not have Compensation established for this value as I do for say UDMA_CRC errors. I will roll that in first and then release the script.

joeschmuck · June 9, 2024, 6:44pm

The script is posted. This will fix two issues:

Some NVMe drives reporting self-test results which include ‘white space’.
No alarm/warning for NVMe Media Errors.

I also added drive compensation for Media Errors because @oxyde will want this with all the media errors present. I don’t know if those eventually return back to zero but I suspect not.

moelassus · June 20, 2024, 5:28am

First of all, thank you for creating this script. You’ve clearly put a lot of work into it given how configurable it is and the documentation is fantastic.

When I first ran the cron job it attached config files to the email. Then I read about encrypting them, so I set the password using -config. At first the cron job would throw an error about 7zip not being installed. I ran the script manually and it installed 7zip. My cron jobs no longer show any errors, but I’m still not receiving any attachments since setting the password. Any ideas what might be causing this? I’m on Dragonfish.

joeschmuck · June 20, 2024, 9:45am

To be honest with you, I have not tested that feature on Dragonfish. However, I will when I return home from work tonight, while it is fresh in my head.

As for sending the attachments, by default it is once a week, when the script is run on Monday. You can verify your setup by editing the multi_report_config.txt file: look for the section titled “###### TrueNAS config backup settings” (about 160 lines down) and then the line TrueNASConfigEmailDay="Mon". There are some notes there which show you the options available. It can only be one of those, not multiple values. Make sure you keep the same formatting when you make the change, or the script may fail.
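
A quick way to check the current value without opening an editor is to grep for it. This is only a sketch: the file name and setting name come from the paragraph above, but the sample file written here is a made-up stand-in so the snippet runs anywhere. Point `config_file` at your real multi_report_config.txt instead.

```shell
#!/bin/bash
# Hypothetical stand-in for the real config file so this sketch is runnable.
config_file="$(mktemp)"
printf '%s\n' \
  '###### TrueNAS config backup settings' \
  'TrueNASConfigEmailDay="Mon"' > "$config_file"

# Take the first matching line and extract the value between the double quotes.
email_day="$(grep -m1 '^TrueNASConfigEmailDay=' "$config_file" | cut -d'"' -f2)"
echo "Config backup is attached on: $email_day"

rm -f "$config_file"
```

If the output is not the day you expect, that would explain a cron run that completes without errors and without an attachment.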

As for the encryption of this file, this feature only remains because it was requested. The TrueNAS Config file in itself is an encrypted file so does a person need to encrypt an encrypted file? Some folks still like to. When Dragonfish came out I decided to not automatically install 7Zip unless the encryption password was present. I will look at why it didn’t install as it should have but I do appreciate you letting me know. Whatever the fix is, I will have it in the next update.

Thanks for the kind words on the documentation, I know it needs to be changed for ease of use. Even I hate to read it. It will come, but since this is a labor done during my spare time, it could be a while. I’m a slow old man about to retire for the last time (I hope). 103 days and counting.

EDIT: User error, but this happens. Script works, Whew.

thomas-hn · June 22, 2024, 7:04am

Hello,

I really like the tool for monitoring my hard drives, and thanks a lot for the good documentation.
However, there is one topic I could not find details on: the column “Last Test Type (time conducted)”.

What exactly is this column reporting? I assume it is the type of the last test that was run, right? But what does the “time conducted” mean? It does not look like the runtime of the test, because 200 hours for a Short Test seems a bit too long.
Also, why is it called “Short OFFLINE”? What does the offline mean?
If a percentage and “remaining” are shown, does it mean that a test is running?
And last but not least, why does one of the cells in my report have a blue background color?

Thanks a lot,

Thomas

Davvo · June 22, 2024, 8:26am

It’s the time at which it was conducted in the hdd’s total-lifetime value (starting from 0 and adding up for each hour it is powered on).

Read here. TL;DR: it’s a very basic data collection.

Yes.
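
To make the “time conducted” answer concrete, here is a small sketch of the arithmetic, assuming the value is a lifetime-hours timestamp as described above. Both numbers are hypothetical; on a real system they would come from the drive’s SMART data.

```shell
#!/bin/bash
# Hypothetical values for illustration only.
power_on_hours=20450     # drive's current power-on hours counter
last_test_hours=20250    # lifetime hours recorded when the last self-test ran

# The difference is how long ago (in powered-on hours) the test was conducted.
hours_since_test=$(( power_on_hours - last_test_hours ))
days_since_test=$(( hours_since_test / 24 ))
echo "Last test ran ${hours_since_test} powered-on hours (~${days_since_test} days) ago"
```

Read this way, the figure is a point on the drive’s lifetime clock rather than the duration of the test itself.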

joeschmuck · June 22, 2024, 12:38pm

Take a look at this. I tried to make it as easy as possible to read. Anything which is not labeled, “should” be self-explanatory (I hope). If not, then I can update it.

Anatomy of Multi-Report Report

I will update this presentation when I make any changes to the Multi-Report output. The next version will have a few minor changes in the text section just to make things look nicer and cleaner, and add a tiny bit more data, but nothing significant. I’m working to reduce this script of over 400,000 characters into something more manageable for me. For you, it just works.

xeromist · June 30, 2024, 4:40am

Hi Joe,

I have a whole bunch of 4TB drives that for whatever reason have whitespace in the serial numbers. Actually appears as a line break in the report so the serial number is on 2 lines in the cell. Wreaks havoc with Multi-Report in several ways. Has anyone else reported this?

In 3.0 and below it seemed like this just messed up the csv data and made the report less useful, but with some of the more recent releases, it appears the CSV actually stops being written and the report ends up incomplete and distorted. I understand this might be enough of an edge case that it isn’t worth working around. I might just stay on 3.0 until all of these drives have been replaced.

dak180 · June 30, 2024, 11:32am

If you do not mind I would be very interested to know if you have similar issues with my report script.

joeschmuck · June 30, 2024, 2:18pm

If you would not mind, please run the script using the -dump emailextra switches to email me some data I can troubleshoot with.

But no, I’ve never had a single report of this issue. That is very odd. I’ve never seen a serial number have white space in it.

Also, what email client are you using? This can and has in the past caused problems, not saying that is the situation, just trying to isolate the problem.

xeromist · June 30, 2024, 6:00pm

Ran the dump. Will be interesting to see if the dump can even capture the whole s/n. There should be two of the drives in this dump, but I have a dozen just like them if you want to see more. I think these were some pulls from an appliance, so the builder might have done all sorts of unholy things with the firmware, including getting cheeky with the s/n.

I receive my reports in gmail. Always have. The default view of gmail visually cuts off part of the report, but opening the full message in a new window lets you see everything. It’s all there. In the case of the broken report, the rust summary stops before all disks were reported and it just starts the nvme summary. Very weird.

xeromist · June 30, 2024, 6:47pm

I’m on Scale, so it took me a beat to figure out the bc situation. Anyway, the report ran but I’m not sure what it should look like normally. There is no spinning rust summary, at all. It just goes from the zpool status directly to the nvme status. I’m guessing that’s not working as intended.

joeschmuck · June 30, 2024, 6:49pm

Received the dump and the serial number is recorded that way on the drive, with four spaces.

"serial_number": "Z1Z5ST67    0000W510842M"

This is the first time I’ve seen this. The format of the data I received also looks fine. You received the same email I did; open the file labeled “email_body.html” in the Chrome, Firefox, or Edge web browser. On my system all of them display it correctly. What surprised me the most was that a few of the files from the dump were not generated, so I do see a reason for an update.

I see 22 drives and one NVMe drive. Please tell me exactly what is missing, drive ID (sda) or the last 6 characters of the drive serial number are fine.

xeromist · June 30, 2024, 6:53pm

I downgraded to 3.0. The broken report was on a newer release than 3.0. This is the mild scenario.

Let me update again and see if I can replicate the weird report I saw.

xeromist · June 30, 2024, 7:44pm

OK, you should have received a broken report from 3.0.5. Takes 10x longer than usual too. Not sure if the dump gives you this, but I also see this on the console:

./multi_report.sh: line 3541: /tmp/${tempfilepath}${drive}_${serial}_a.txt: ambiguous redirect
./multi_report.sh: line 3542: /tmp/${tempfilepath}${drive}_${serial}_x.txt: ambiguous redirect
grep: 0000W510842M: No such file or directory
grep: 0000W510842M: No such file or directory
./multi_report.sh: line 4765: [[: ---: syntax error: operand expected (error token is "-")
./multi_report.sh: line 4766: ( 19264000 - --- ): syntax error: operand expected (error token is ")")
./multi_report.sh: line 4765: [[: ---: syntax error: operand expected (error token is "-")
./multi_report.sh: line 4766: ( 19262203 - --- ): syntax error: operand expected (error token is ")")
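
The “operand expected” messages are what bash prints when a placeholder such as `---` reaches an arithmetic expression. One general way to guard against that is sketched below. This is not taken from multi_report.sh and is not the fix that was applied there; `is_number` and `safe_subtract` are hypothetical helper names.

```shell
#!/bin/bash
# Return success only when the argument is a non-negative integer.
is_number() {
  [[ "$1" =~ ^[0-9]+$ ]]
}

# Hypothetical helper: subtract only when both inputs are numeric,
# otherwise print the "---" placeholder instead of triggering a syntax error.
safe_subtract() {
  if is_number "$1" && is_number "$2"; then
    echo $(( $1 - $2 ))
  else
    echo "---"
  fi
}

safe_subtract 19264000 19263000   # both numeric, so the subtraction runs
safe_subtract 19264000 "---"      # placeholder input, so no arithmetic is attempted
```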

joeschmuck · June 30, 2024, 8:14pm

Well yes, that is very broken. I can reproduce the results. Working on a fix now, if this new kitten will leave me alone.

xeromist · June 30, 2024, 8:23pm

Don’t rush. I understand I’m one guy with a batch of D-tier drives. My inconvenience shouldn’t wreck your weekend.

But also, thank you for your interest in my issue.

joeschmuck · June 30, 2024, 10:49pm

Too late. The fix was two minor changes, adding some quotation marks. However, since I do not have an actual drive with a serial number with whitespace, you get to verify that it works. I will send you version 3.0.7a in a few minutes. I have also rolled it into 3.0.8, which I’m already working on but which is not ready for release.
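
For anyone curious why quotation marks matter here, the sketch below reproduces the general problem with the serial number from this thread. It is an illustration of bash word splitting, not the actual lines from multi_report.sh; `drive` and `workdir` are hypothetical stand-ins.

```shell
#!/bin/bash
# Serial number as reported in the thread, including the four embedded spaces.
serial="Z1Z5ST67    0000W510842M"
drive="sda"                # hypothetical drive ID
workdir="$(mktemp -d)"     # hypothetical stand-in for the script's temp path

# Unquoted, bash splits the expanded path at the spaces and reports
# "ambiguous redirect":
#   echo "data" > $workdir/${drive}_${serial}_a.txt

# Quoted, the whole path (spaces included) is treated as one file name.
outfile="${workdir}/${drive}_${serial}_a.txt"
echo "smart data" > "$outfile"

# grep needs the quotes too; without them the part after the spaces
# is treated as a separate file name.
match_count="$(grep -c "smart" "$outfile")"
echo "matches: $match_count"

rm -rf "$workdir"
```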

xeromist · July 1, 2024, 12:01am

Can confirm that the report works again and all drives are reported. Kudos for the fast response!