Ok, so I obviously misunderstood how smartd was supposed to work, and as I understand it now, it won’t be working at all in 25.10 and beyond unless we enable / trigger it ourselves?
Beyond that, I presume this also means that OEM/SKU-specific alerts previously associated with smartd (like dropping helium levels) will no longer trigger warnings from the NAS or cause it to mail us unhappy news automatically, correct?
I genuinely think it was intended to run, otherwise I have no idea how the drives are being monitored, but then again, I most certainly don’t know everything, not even close.
I guess they could have rolled a version of their own, but that is time and money when you already have a product which performs this action, and does it very well. I’m hoping @HoneyBadger responds, or maybe @kris does with some sort of answer. I hope the answer is “Thanks for letting us know, we will work on that for the next release.”
As for running smartd ourselves, we might be able to run it from Post-Init in the GUI. With that said, I’d rather know more about what is going on before I assume something and run down the wrong path.
In community edition, the only SMART attribute that is monitored is “uncorrected errors”. There is also monitoring of test failure, but I don’t consider that an attribute.
With SATA drives, only the attribute with ID 187 is monitored, and that attribute isn’t even used in my drives. So the entire thing is useless for me and probably many others.
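If anyone wants to check whether their own SATA drives even report attribute 187, here is a minimal sketch in Python. It assumes smartmontools 7.0 or newer (for `smartctl`'s `-j` JSON output) and that ATA attributes appear under `ata_smart_attributes.table` in that output. The function names `read_smart_json` and `find_attribute` are made up for this example, and `/dev/sda` is only a placeholder device. This is not how TrueNAS does its own check, just a way to look at what the drive reports.

```python
import json
import subprocess


def read_smart_json(device):
    """Run `smartctl -A -j <device>` and return the parsed JSON (usually needs root)."""
    # smartctl's exit status is a bitmask, so a non-zero code is not always fatal.
    # We therefore parse whatever JSON it printed instead of using check=True.
    result = subprocess.run(
        ["smartctl", "-A", "-j", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def find_attribute(report, attr_id):
    """Return the ATA attribute entry with the given ID, or None if the drive lacks it."""
    # Assumption: ATA/SATA attributes are listed under ata_smart_attributes.table.
    table = report.get("ata_smart_attributes", {}).get("table", [])
    for entry in table:
        if entry.get("id") == attr_id:
            return entry
    return None
```

Usage would be something like `find_attribute(read_smart_json("/dev/sda"), 187)`. A result of `None` means the drive does not list that attribute at all, which is the situation described above; otherwise the raw count should be under the entry's `raw` / `value` keys.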
All of this points to the need for better documentation re: the changes being made under the hood.
We went from making some changes and removing the SMART GUI scheduler pane due to insurmountable coding issues to allegedly jettisoning most of SMART monitoring other than 1 or 2 attributes. More documentation is needed.
Sysadmins shouldn’t have to parse through GitHub commits to figure out what infrastructure is being removed. If a lot of thought went into these changes then the documentation should be available, easily condensed and presented in the release notes.
For example, a simple table could illustrate which SMART attributes TrueNAS will continue to monitor, which it previously monitored, and what Scrutiny and @joeschmuck’s excellent collection of scripts can do to re-enable monitoring of the attributes that 25.10 and beyond no longer intend to review.
Thank you. So they did spin a custom version. But they sure did use the term ‘smartd’ in the video a lot.
Not being a python programmer, a lot of that escapes me.
I guess the point really is, so long as TrueNAS is monitoring the drives for serious errors in real-time, then the outcome is the same, regardless of using smartd or a different method.
EDIT:
Thank you for the kind words. My scripts will continue to report as they currently do until such time as they prove to be useless, or my death, or it is time to pass the buck to someone else.