[Not accepted] Bring back SMART scheduling to UI

Did you ever consider TrueNAS’s logo colors?

No need to elaborate on FreeBSD’s Beastie’s colors etc. We’ve all seen the matrix. Haven’t we?

2 Likes

We are, of course, keeping an eye on this thread. We knew this would be a controversial design decision, as with anything these days, even cosmetic changes at times. :slight_smile: A full blog post with our rationale for the change in direction will be forthcoming in the next few days.

Just a few quick things I want to clarify again though. SMART task scheduling being removed does NOT mean SMART is not in play at all. It is still being actively monitored with smartd, as well as coretemp and other mechanisms to check for various drive health factors by the disk management layers of TrueNAS. This is active for all detected drives, even if not actively configured in a pool. Alerts still can and will be raised if a critical alarm condition is reached, one serious enough that the drive should be replaced or seriously looked at due to a problem.

Additionally, a few of you old-timers may have heard of this little thing called ZFS, which was architected from day 1 around the premise of “Your drives are lying to you”. We view ZFS as the true and final authority on when a drive has legitimately gone bad and needs to be evicted. Our decades of telemetry show that, more often than not, it’s ZFS itself being used as the trigger point for when to replace a disk. The old SMART mechanisms were anything but smart: a very mixed bag at the best of times, with the additional hassle to the administrator of “one more thing to configure” and keep an eye on.

So to answer the burning question for the group here: no, I don’t see us changing this direction and “bringing back” the old scheduling functionality verbatim. What I do see happening is that we enhance the background monitoring further to look for additional triggers (leveraging SMART or potentially other mechanisms) worthy of an alert being raised, where we have a high degree of confidence that it’s not just noise and that you, the admin, should take some action to touch the system and correct the situation. That’s what enterprises expect: a system smart (haha) enough to only really send me alarms when I’m legitimately going to send a tech to replace a disk, which costs everybody time and money. All while keeping my priceless data safe and accessible.

Of course one of the challenges of adding new alarms will be using probes that are consistent to monitor across a wide range of hardware and configurations. Always a big challenge with the great diversity of DIY builds. (Have you seen our bug tracker? Y’all sometimes have some crazy-ass wild setups)

As an example of a good item we’d consider, we’ve got one in the bug queue right now to make the temp monitoring a bit more robust so that it raises an alarm at an earlier temp “warning” threshold. Requested changes like that will be considered as either features or bugs to solve where necessary. Of course even that is challenging to get right and may or may not fully work on all hardware. We’ve already seen some data where certain drives are reporting -100C operating temperature. Fun. It always keeps things interesting here in how we have to code around these misfit drives, and it is why the ZFS creators in their wisdom architected it on the assumption that the drive is lying about everything it can.
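To make the temperature point above concrete, here is a minimal sketch of a two-level threshold check that also rejects readings like -100C before they can trigger anything. The threshold values, the plausibility bounds and the function name are all assumptions picked for illustration; this is not how TrueNAS implements its monitoring.

```python
import math
from typing import Optional

# Hypothetical values, chosen only for illustration.
PLAUSIBLE_MIN_C = -40.0   # anything colder is treated as a bogus reading
PLAUSIBLE_MAX_C = 120.0   # anything hotter is treated as a bogus reading
WARNING_C = 50.0          # early "warning" threshold
CRITICAL_C = 60.0         # "critical" threshold


def classify_temperature(reading_c: Optional[float]) -> str:
    """Return 'implausible', 'ok', 'warning' or 'critical' for one reading."""
    # Missing or NaN readings cannot be trusted.
    if reading_c is None or math.isnan(reading_c):
        return "implausible"
    # Drives that report e.g. -100C fall outside the plausible range.
    if not (PLAUSIBLE_MIN_C <= reading_c <= PLAUSIBLE_MAX_C):
        return "implausible"
    if reading_c >= CRITICAL_C:
        return "critical"
    if reading_c >= WARNING_C:
        return "warning"
    return "ok"
```

Keeping "implausible" as its own state, rather than folding it into "ok" or "critical", leaves it up to the caller whether a misbehaving sensor deserves an alert of its own.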

2 Likes

Somehow, yours truly never manages to remember what the colours mean—and does not think highly enough of the movie to watch it a second time to check. This meme might eventually help me memorize that red is the good choice.
(However, I do keep a clip of the scene from the sequel where Lambert Wilson swears… Laughed my socks off on this one.)

And now we have another datapoint for the “how much iX truly cares about community input” question.

8 Likes

Ok, so a quick update! In just 6 days this feature request has reached new heights. Thank you everyone! :rocket:

:trophy: All time number 1 most voted Feature Request (with a massive margin)
:2nd_place_medal: All time number 2 most liked topic

Most active topic (Top)
:trophy: Week: #1
:trophy: Month: #1
:trophy: Quarter: #1
:trophy: Year: #1
:2nd_place_medal: All time: #2

At this pace, it’s honestly not far-fetched to think we might end up as the most active topic in TrueNAS forum history.

The current responses appear more focused on dismissing community concerns and defending existing decisions than on engaging in constructive discussion about the feedback itself.

An overwhelming share of the community has voted in favor of bringing scheduling back, and that message should not be ignored.

9 Likes

Hi Kris, and thank you for replying.

Given the above, I am somewhat surprised that a compelling rationale was not released alongside the removal of integrated GUI SMART scheduling. Moreover, I don’t see this change as enhancing the product. Rather, it does the opposite by removing, in isolation, a GUI feature that a large percentage of your CE user base (and likely enterprise customers too?) rely on.

Flawed as SMART data can be, it is used for a reason - some errors invariably point to issues that signal the end of life for a drive. I’ve also had my brushes with ZFS lying, such as being told to destroy my pool just because a single electrical connector had gotten loose. If there are SMART improvements coming, then any GUI change should be made as part of your feature upgrades, rather than by removing a feature most sysadmins have used as part of pretty much every NAS setup.

I’d also like to think that your paying customers have a really good idea of which SMART errors are serious, either by virtue of having experts internally or by being able to lean on your technical support staff as part of a paid support contract. If hard drives report crummy data, etc., that is an issue with SMART - but those issues have zero to do with enabling GUI SMART scheduling.

Again, I’m not against change, I jumped to SCALE after all, but removing GUI SMART scheduling is not like removing AFP, for example. After years of refinement (both at Apple and iXsystems), SMB was a fully-functional replacement / upgrade. Here, we are being pointed to an App for a project that is seeking a new maintainer and whose underlying software was last updated 1.5 years ago. Getting SMART set up now requires a bunch more steps / installs / configurations, whereas the previous process was a simple couple of clicks.
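For anyone wondering what the manual route looks like, below is a rough sketch of starting a self-test from a script, assuming smartmontools is installed and the script runs with root privileges. `smartctl -t <type> <device>` is the standard smartmontools invocation; the function names and the example device path are hypothetical, and scheduling (cron or similar) is left to the admin. This is not an officially supported TrueNAS method.

```python
import subprocess
from typing import List

# Self-test types accepted here; all four are valid arguments to `smartctl -t`.
VALID_TESTS = {"short", "long", "conveyance", "offline"}


def build_selftest_command(device: str, test_type: str = "short") -> List[str]:
    """Build the smartctl command line that starts a self-test on one device."""
    if test_type not in VALID_TESTS:
        raise ValueError(f"unsupported self-test type: {test_type!r}")
    return ["smartctl", "-t", test_type, device]


def start_selftest(device: str, test_type: str = "short") -> int:
    """Run smartctl and return its exit status (needs root and smartmontools)."""
    result = subprocess.run(build_selftest_command(device, test_type), check=False)
    return result.returncode


if __name__ == "__main__":
    # /dev/sda is only an example path; substitute a real device.
    print(" ".join(build_selftest_command("/dev/sda", "long")))
```

The command only starts the test; the drive runs it in the background, and the result has to be read back later (for example with `smartctl -a`).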

This is not an upgrade, this is a downgrade, which is why 1% of your forum user base voted for this “feature request” in a week. If iXsystems is serious about soliciting CE feedback, some further reflection on this UI change is warranted.

7 Likes

Big difference between “caring” and “running to service every ask from the community”.

If we ran this business purely on the asks and requests of the community based solely on voting for where to spend our limited resources, we’d have failed as a business a LONG time ago and now nobody would be getting TrueNAS. For free. Like the vast majority of folks on this forum do now. So while I can say that we absolutely do care, it is also carefully weighed against what we deem is the most appropriate direction for the product as a whole that is maintainable, sustainable and allows us to keep on providing TrueNAS to the world with the resources we have.

2 Likes

…and doesn’t run SMART self-tests in any event.

4 Likes

Hey folks. Thanks for all the feedback. And yes, we DO want to hear and like to hear opinions on this and all topics. The original ask here was to bring SMART scheduling back to the UI. We have discussed it internally and opted NOT to proceed down that route.

That said, we ARE open to constructive asks around what specific type of enhancements might be desired to the new active background drive health monitoring (of which SMART is already being leveraged). If specific examples of types of alerts & thresholds that make sense to the broader user-base are raised, those will be happily reviewed individually and decided based on their individual merits, maintainability and consistency across broad hardware configurations. I mentioned previously we already have one in the queue now for more flexible temp thresholds / monitoring, which is an example of a good request that hopefully we see more of in the future.

9 Likes