We’re bringing some SMART options back

It comes as a surprise to absolutely no one that the changes to SMART monitoring in TrueNAS 25.10 have been … controversial to say the least.

TrueNAS 25.10 removed the UI option to manually schedule SMART short and long testing. Notably, it didn’t “remove SMART” or prevent access to any of the more detailed metrics that were being polled by community scripts or solutions in the background. SMART has been, and will continue to be, actively used to monitor all connected disks. It will still react to critical alerts that require your attention, in conjunction with the much more reliable ZFS drive health monitoring and alerting.

These changes were made to streamline SMART monitoring and have greatly reduced the incidence of false-positive alerts. However, we understand that these changes didn’t perfectly align with the desires of the TrueNAS Home Lab Community for greater control and self-governance of their home built platforms.

So, the team is working on changes to TrueNAS that re-introduce options for more advanced visibility into, and manual scheduling of, your SMART long and short tests.

When will this happen?
We’re firming up the details. We’ll announce the official plans in March.

Are we just going to get the old cron-style scheduler back?
No. We want to make this something that’s able to be used by the community in their scripts and custom solutions - so think “API endpoints” and UI trigger buttons to start/view and not “manual CLI editing.”

I’m on TrueNAS 25.04, I don’t want to upgrade and “lose” SMART.
If you’re running 25.04 and have scheduled tests, you can upgrade to 25.10 and those tests will migrate and continue to run. TrueNAS will raise alerts on drives that fail a scheduled SMART short or long test, or trigger its threshold on a watched value. Again - TrueNAS 25.10 never “removed” SMART functionality.

I want to see every individual SMART statistic and graph it and trendline it over time and-
This was always possible, and still is - but it always required manual (or scripted) collection and parsing and will vary greatly depending on your drive models, firmware revisions, HBA/storage controller, device protocol, and an assortment of other factors.
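As a rough illustration of what "scripted collection and parsing" can look like, here is a minimal sketch that parses the attribute table printed by `smartctl -A` for an ATA/SATA drive. The function names (`parse_ata_attributes`, `collect`) are made up for this example and are not part of TrueNAS or smartmontools. The column layout is assumed to be the usual ATA table, so NVMe and SAS devices would need different parsing, which is exactly the variability mentioned above.

```python
import subprocess


def _to_int(field):
    """Return the field as an int, or None when the drive prints a placeholder."""
    return int(field) if field.isdigit() else None


def parse_ata_attributes(text):
    """Parse the attribute table that `smartctl -A` prints for an ATA/SATA drive.

    Returns {attribute_name: {"id", "value", "worst", "thresh", "raw"}}.
    """
    attrs = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the table
            continue
        if fields[0] == "ID#":
            in_table = True  # header row; data rows follow
            continue
        if in_table and fields[0].isdigit() and len(fields) >= 10:
            attrs[fields[1]] = {
                "id": int(fields[0]),
                "value": _to_int(fields[3]),
                "worst": _to_int(fields[4]),
                "thresh": _to_int(fields[5]),
                # The raw column may contain spaces, e.g. "31 (Min/Max 20/45)".
                "raw": " ".join(fields[9:]),
            }
    return attrs


def collect(device):
    """Run smartctl against one device (needs root) and parse its output."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as fatal here.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_ata_attributes(result.stdout)
```

Calling `collect("/dev/sdx")` on a schedule and storing the `raw` values with a timestamp would give you something to graph. Which attributes are meaningful still depends on the drive model and firmware.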

Thank you for your attention.

  • Chris “HoneyBadger”
30 Likes

Could you - maybe - have a look at the Multi Report Tool, which is great, please?

If you could integrate something with these features, it would be stunning.

See:

5 Likes

I’ll be the red t-shirt guy from blizzcon and ask: is this an early April fools joke? :rofl:

4 Likes

that’s actually in the draft notes but it got bumped to page 2 so I thought they didn’t want me to do it

“No.”

5 Likes

16 Likes

I’m still on 24.10 because of this! :stuck_out_tongue:

Anyway, this is the least bad news I’ve heard in a bit, though I’m sure I’ll find something to complain about.

Thank you.

I think a lot of people on this forum are part of a corporate world and know that the messenger has to communicate a message regardless of what is going on in the back room. And manage the fallout.

I’ve not been around that long, but the…ending…to the feature request thread was a good indicator that the previous line was unlikely to coexist with a strong community presence from forum users.

As it stands, I’m mildly optimistic (I liked the UI for setting up tasks, I don’t understand why that was removed as it logically didn’t fit with the excuse that results are misinterpreted), but sitting on 25.04 whilst still putting in the work (migrated HA VM today, docker stack to be moved, working VM to be clonezilla’d) so that I can move swiftly away from Truenas in the future if there are further issues.

But maybe that’s a little further away than where we were yesterday, maybe a swift move to 26 is on the cards.

7 Likes

ty honeybadger for the detailed explanation.

i already understood based on the previous explanations nothing was truly gone. just that it was one step away from an easily clickable UI button to do the short/long smart test, or to view the detailed smart data.

to do those things, you still could, except, now instead of having the smart ui in truenas where u can do a few clicks, you instead had to go command line and type the commands in shell.

e.g.

check current s.m.a.r.t status

smartctl -H /dev/sdx

do short smart test

smartctl -t short /dev/sdx

do long smart test

smartctl -t long /dev/sdx
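For anyone who would rather script these than type them, the three commands can be wrapped in a small helper. This is only a sketch: `ACTIONS`, `smartctl_command` and `run_action` are names invented for this example, `/dev/sdx` is a placeholder device, and it needs to run as root on a system with smartctl installed.

```python
import subprocess

# Maps a friendly action name to the smartctl arguments used in the commands above.
ACTIONS = {
    "health": ["-H"],         # overall health self-assessment
    "short": ["-t", "short"], # start a short self-test
    "long": ["-t", "long"],   # start a long (extended) self-test
}


def smartctl_command(action, device):
    """Build the argument list for one of the supported actions."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return ["smartctl", *ACTIONS[action], device]


def run_action(action, device):
    """Run the command and return whatever smartctl printed."""
    # smartctl's exit status is a bitmask, so it is not checked here.
    result = subprocess.run(
        smartctl_command(action, device), capture_output=True, text=True
    )
    return result.stdout
```

For example, `print(run_action("health", "/dev/sdx"))` prints the same report as the first command. The self-tests run inside the drive, so `run_action("long", ...)` returns right away and the result has to be checked later.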

it was there all along. but what they may have under-estimated, was the home labbers that may not be into commandline stuff, and would rather have these things more easily accessible via the UI with a few clicks. like maybe they want to run a manual smart test but don’t know how. or maybe they don’t read forums and don’t realize it does still run scheduled smart tests. or… maybe they want to be in control of when it runs those tests, instead of it being set to some default outside their control?

I myself prefer clicking and ui, but i am still aware of the commandline stuff, and with google ai search you can figure out what to do. But…. this change caused the most anxiety for the less tech savvy users i imagine?

also for people who thought scrutiny was abandonware, i noticed a recent update 2 weeks ago. to install that go to truenas apps, select it, then install. to access web ui for it, click on the app link also shown there after it’s installed.

3 Likes

I am one of those :smiley:

1 Like

Covered here:

Otherwise, the announcement is good news… but it’s hard not to make a parallel with British politics.

3 Likes

This is good news.:partying_face: I was wondering if there was any new information regarding disks spinning down.

6 Likes

Let me know if this should be a feature request.

I’m rather opinionated on the changes to SMART in 25.10 but as a gesture of goodwill, may I recommend something like a bright orange banner in the SMART webUI (however that returns) with something like…

“iXsystems recommends/cautions against the use of SMART tests and recommends most installations rely on ZFS scrubs. Read more here.”

…then link to (version-specific) documentation on iXsystems’ rationale? This would accomplish a few things and be a reasonable compromise:

  1. It points to new users that ZFS scrubs are the priority (I don’t think any reasonable person with all the facts would disagree with this).
  2. It articulates iXsystems’ rationale in the documentation and allows unsure users to become better informed before making decisions.
  3. It comes as a recommendation but not a rule.
1 Like

:hand_with_fingers_splayed:
I beg to disagree.

SMART tests and ZFS scrubs are both useful, and neither should be given precedence, or priority, over the other. The former check the container; the latter check the content.
SMART tests can not and do not repair corrupted data.
Scrubs do not assess the hardware. Contrary to a long SMART test, a scrub does not touch free sectors. At most, a scrub-initiated read may cause the drive firmware to assess that a sector is dubious and cause data to be rewritten or reallocated to a different sector, but such internal drive activity will not be reported to ZFS. By design, ZFS will strive to maintain data integrity even on dubious hardware—but maintaining data integrity silently is actually dangerous!

I plainly do not understand the rationale for promoting ZFS scrubs above, or as a substitute for, SMART tests.

11 Likes

Will it then allow us to turn off SMART polling on individual disks? Or implement the submitted patch that stops TrueNAS from polling drives that are in standby mode (link)? This is important for those of us who like to spin down drives (which I know you say we shouldn’t do).
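For context, in your own scripts smartctl can already be told to leave a sleeping drive alone with its `-n standby` option. The sketch below shows that idea only. It is not how TrueNAS polls drives and it is not the linked patch. The helper names are invented, and the check on the output text is an assumption about smartctl's skip message (something like "Device is in STANDBY mode") that you should verify against your smartmontools version.

```python
import subprocess


def standby_safe_command(device):
    """Health query that smartctl skips if the drive is in standby or sleep."""
    return ["smartctl", "-n", "standby", "-H", device]


def looks_skipped(stdout):
    """Guess whether smartctl skipped the drive because it was spun down.

    Assumption: the skip message names the power mode in upper case.
    """
    return "STANDBY" in stdout or "SLEEP" in stdout


def poll(device):
    """Return smartctl's health output, or None if the drive was left asleep."""
    result = subprocess.run(
        standby_safe_command(device), capture_output=True, text=True
    )
    return None if looks_skipped(result.stdout) else result.stdout
```

The exit status is deliberately not used to detect the skip here, because smartctl's exit code is a bitmask that can also signal other conditions.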

2 Likes

The disagreement is welcome. That said, I hope you’ll understand if I stick to the softer “change management” elements rather than hard technicals as I think the technical discussion is best done somewhere other than announcements. My original comment was already bordering on relevance/topic.

  1. I suspect if you ran a poll of all TN users and asked them “Which do you care most about: (A) That the used (allocated) data in your pools is without errors, or (B) That unallocated space is without errors?” 90+ % of them would look at you a little strange and then answer “that the used data is without error”.

  2. SMART tests are essentially delegating to the drive firmware “go test yourself” which is fundamentally putting trust in the drive. Meanwhile the ZFS mantra (more or less) has been “always assume that all your drives are always lying to you all the time.” To say that SMART tests and ZFS scrubs are equal in priority IMO isn’t supported by reality or decades of storage engineering.

  3. Yes, drives may obscure problems on a model-by-model or firmware-to-firmware basis. But…

    1. …iXsystems says they’re making SMART monitoring improvements. Let’s give them a chance to include cases/situations like the one you raise.
    2. …ZFS cares about the data not the drives, so you’re fighting an uphill battle (IMO).
      1. Edit: For clarity, I still think the advocacy for easy SMART testing is an important one, this comment is still within the context of “which is more important?”.

My original comment is really trying to strike a balance. I really appreciate what iXsystems is doing to keep TN going. I think this SMART drama is an unnecessarily dirty chapter in the books.

To move forward, the lesson learned is to do change management and documentation much better, including communicating that change in the UI. Art of the possible. I think it’s possible to ask iXsystems to do this in addition to what they’re already doing.

If you still disagree with the guidance in this hypothetical documentation (I’m putting the cart before the horse here), then you can still criticize that after we have it (or suggest changes via a github PR).

Disagreement is welcome, BUT you are misunderstanding and misrepresenting what I wrote.

This is neither what I wrote nor what I want.
“Unallocated without error” is meaningless. “Unallocated with/without defect” has a meaning.
I care that data be without errors but I also care that drives are without defects—be that in allocated or unallocated sectors.

Yes. There is literally No Other Choice, as only the drive manufacturer knows its internals.
And I have absolutely no issue with this. Let hardware monitor hardware, and software monitor software—“walking on two legs” and all that.

Any hard data to support this impression?

Again, this is neither what I wrote nor what I meant. When a drive detects that a sector is dubious and needs to be reallocated, this is an internal bookkeeping operation: It need not report it to the host and its file system; it need only report it when queried about its health status. No obscuring here.

My whole point… but you’re missing it.
ZFS cares about data. I do not ask that TrueNAS care about hardware; I ask that TrueNAS stay in its lane (software) and let the hardware care about hardware, so let there be SMART tests for the hardware alongside scrubs for the data.

:index_pointing_up:
THIS
iXsystems pretends, or pretended, that ZFS scrubs can make up for SMART tests but the full rationale is not documented.
I do NOT trust iXsystems on that. (Or in general, by the way. Between the silent abandonment of CORE, the successive app systems and the latest haphazard reworking of system containers there have been way too many cases of untrustworthy communication and overpromising-then-underdelivering on engineering.)

5 Likes

I’m going to be brief. Please assume goodwill. I had no intent of misrepresenting your comments, I was simply responding to how I interpreted them. If that was a misinterpretation, I apologize.

You and I probably agree on many things. Let’s leave it at that.

5 Likes

I fully agree to leave it at that, and let the thread be an “announcement.”
The “controversy” has been documented.

2 Likes

I will note that management took note of our suggestions on occasion.

For example, there are much better release notes now included in the update section of the UI. As of 25.10, there also seems to be far more robust replication recovery from transmission interruptions. This was promised in SCALE way back when but it finally seems to be a functional feature.

Apps seem to work and are better maintained and documented than during the Core days. Yes, there is still improvement potential but it’s a lot less dependent on tribal knowledge. (Never mind VMs)

There are likely many more examples that escape my failing wetware.

The ongoing conflict between vocal fans of SMART and engineering management at iXsystems continues to be a head scratcher for me. “Bringing back the UI feature to schedule SMART cron jobs” was easily the most popular feature request, ever, likely by an order of magnitude, garnering more votes in a week (before management closed the thread) than other feature requests had received in months.

Why management had to dig in its heels and insist it knew better than the user base was simply weird. Especially if it seemingly did so on the basis of vibes rather than a well-thought-out (at least publicly) rationale. No white papers, no detailed description of how and why things were changed, etc.

In my world, SMART and scrubs both have their place.

The whole reason that SMART came into being was that OEMs responded to market demands for indicators that a drive might be failing soon. It would be data integrity malpractice not to let users know when drives are exhibiting symptoms consistent with a looming failure, just as it’s driving malpractice to ignore screeching brakes on a car.

As far as I know, we still have not been told in detail how the SMART triggers in 25.10 are different from the before times, what the default scan intervals are, etc. This is the kind of documentation that should be developed, published and discussed well in advance of major system updates, not after.

SMART scans are not apps, not a hypervisor, or some other cool add-on to TrueNAS. No, they are integral to ensuring data integrity by helping gauge which drives may be on the way out and should be replaced (even preemptively), depending on uptime / data integrity requirements.

It is likely because the ZFS adherents here care so much about data integrity / bit rot / preservation that reactions around the SMART GUI scheduling feature removal were as pointed as they were.

2 Likes

We agree there’s a desire for SMART testing, but the above statement is not quite correct.

TrueNAS runs a business with a lot of hardware that we support; many hundreds of thousands of drives… some individual systems have over 1,000 drives. TrueNAS software manages all that hardware.

We changed the TrueNAS software to better support that hardware and reduce false failures while still getting alerts and warnings earlier. The efficiency of managing and supporting that hardware is critical to our business. Fewer drive failures increase customer satisfaction.

The new software (25.10) uses SMART data and ZFS data to make decisions and provide alerts. It actually makes more use of SMART data… but it is selective. It doesn’t wait for a SMART test to make those decisions. It also doesn’t wait for ZFS scrubs… though they are still recommended.

So, what has changed is less use of SMART tests, not less use of SMART data. ZFS better detects corruption and uncorrectable drive errors. ZFS also better detects slow drives in a vdev or a pool. Scrubs are still important.

So, the debate is not between SMART and ZFS… it’s how to mix the two for best operational results.

2 Likes