Problem/Justification
The drive and filesystem health metrics that TrueNAS monitors are not made visible or transparent to the user. Right now the only drive health indicator I can find is a drive temperature graph, and with the removal of SMART test scheduling and result display, the only other indicator that let users trust the drive health is gone. The user has no idea what is monitored or what the current state of these metrics is. So he has to trust the developers’ statements (if he searches for and reads them) that monitoring is happening and that alerts are raised if something is wrong. A user who does not actively search for an answer to this question may think that nothing is monitored. A user who knows that checks are performed may still wonder whether the monitoring is really working right now if no metrics are visible. This is a bit like trusting the integrity of a backup without ever having tried to restore it: the level of trust is not optimal even if everything is OK. TrueNAS claims to be the best system for data integrity, so the current state of the metrics monitored to ensure this data integrity should be clearly visible to the user at any point in time.
In the “Bring back SMART scheduling” feature request, the TrueNAS developer Kris said:
“It (SMART data) is still being actively monitored with smartd, as well as coretemp and other mechanisms to check for various drive health factors by the disk management layers of TrueNAS.”
So I would suggest showing the effort you are making by making these metrics visible in a dedicated menu.
Impact
This feature will benefit all TrueNAS users. It will improve the level of trust in TrueNAS as the best system for data integrity on the market. Most NAS users expect to see drive health or filesystem health metrics in some menu so that they can estimate or prove the health of their system. This is important to them because they store valuable data on the NAS. If this is missing, many users will become nervous or dissatisfied, even if the best checks in the world are running in the background. Users will demand to get SMART scheduling back because it gives them metrics. And even if SMART checks are not the best tool to prove drive health, users will continue to demand them and will grow more dissatisfied until the better metrics the system is using are made visible.
User Story
The user can open a dedicated menu where he can see the drive health and filesystem health metrics that are monitored, along with their current state. Drive health metrics should not be buried somewhere among other metrics, as the drive temperature is today.
As said, hopeless. The powers that be have deemed this not needed. But they do tell you that Scrutiny or smartmontools (CLI) is still there.
Not sure if you have looked into Multi-Report. I’m not even sure it is what you want: it is not a GUI, but you do get a report. If you download it, do not grab the Beta versions. They are on GitHub for those folks who want to test the Disk Layout feature, which is in development and very close to a release. The Beta “should” work just as well as the non-beta, but it is beta for a reason.
I don’t want the discussion here to go too far in the SMART direction. My suggestion is to make the drive health metrics that TrueNAS checks visible to the user. I don’t want to dictate which metrics should be used. As far as I understand, the TrueNAS developers’ point is that they have chosen a better algorithm or a better set of metrics.
…‘better’ is a strong word and, I’d argue, unproven as of yet. That being said, I’m still all for there being built-in visibility for whatever different metric(s) IX is now using.
Yes, that is what has been said and the algorithm is all they want you to care about.
You need to see this from a corporate perspective. Who in IT is going to look at hundreds of drives a day to see what the metrics are? A simple alert for a failing or failed drive is all they care about: replace the drive in a few minutes and that IT person is free to do better things with his/her time. It is always about money. And from that perspective, I agree with them. It doesn’t mean I like it.
Just because I agree doesn’t mean I wouldn’t like the ability to see some values, but this really only matters to small offices and home users; we are the ones who want to see this kind of data. We don’t have a case of new drives sitting in a corner of the room so that we could easily swap one in.
I understand the point that people in a corporate environment don’t have the time to look at metrics all day.
But I think it is not good that they can’t, even if they want to. There may be some guy in IT who is wondering whether he can trust this TrueNAS thing he just bought, especially if he is responsible for hundreds of servers. Or take the IT guy of a small business with just 3 servers. He may have the time to look at metrics once a week and feel better for it.
Why hide the things that are evaluated in the background anyway? What is the downside of being able to see the decisions? Is it bad to give the BUSINESS users the choice?
Any IT person who knows anything would/should be able to go look at drive SMART data. It isn’t difficult, and there are third-party apps that handle it via a nice GUI. Scrutiny lets you look but you can’t touch. That is what I think you are asking for. It already exists. I will admit that it is not being maintained, but this is what this company put forward when they made the decision.
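For anyone who would rather script it than use a GUI, here is a minimal sketch of pulling a few values out of smartctl's JSON output. It assumes smartmontools 7.0 or newer (which added the --json option), root privileges, and the usual JSON key layout (smart_status, temperature, ata_smart_attributes); NVMe and SAS drives report different fields. The helper names and the choice of watched attribute IDs are my own illustration, and this reads SMART data directly; it does not show what TrueNAS itself evaluates.

```python
import json
import subprocess

# ATA attribute IDs often watched as early failure indicators.
# This selection is illustrative, not what TrueNAS uses.
WATCHED_IDS = {
    5: "reallocated_sectors",
    197: "pending_sectors",
    198: "offline_uncorrectable",
}


def read_smart(device: str) -> dict:
    """Run smartctl on a device and return its JSON report as a dict."""
    # smartctl's exit status is a bitmask and can be non-zero even when a
    # usable report was printed, so the return code is not checked here.
    result = subprocess.run(
        ["smartctl", "--json", "-a", device],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def summarize_smart(report: dict) -> dict:
    """Reduce a smartctl JSON report to a few health fields.

    Keys that are absent from the report come back as None.
    """
    summary = {
        "device": report.get("device", {}).get("name"),
        "passed": report.get("smart_status", {}).get("passed"),
        "temperature_c": report.get("temperature", {}).get("current"),
    }
    # Only ATA drives carry this table; other drive types simply skip the loop.
    for row in report.get("ata_smart_attributes", {}).get("table", []):
        label = WATCHED_IDS.get(row.get("id"))
        if label:
            summary[label] = row.get("raw", {}).get("value")
    return summary
```

Usage would be something like print(summarize_smart(read_smart("/dev/sda"))), where /dev/sda is just a placeholder device path.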
If you really want to know the metrics this company is looking at, look at the code and you should be able to see exactly what they are. But that will not display them for you. However, you could use those metrics to develop a nice GUI application yourself and offer it up for folks to use.
This will be my last posting on this thread. The horse is dead.