Question about the behavior of SMART monitoring in 25.10

Hi

A quick question about the behavior of SMART monitoring in 25.10

The release notes indicate:

SMART test scheduling UI is removed

SMART monitoring is handled through dedicated applications or user-managed scripts

TrueNAS continues to run continuous background monitoring that periodically polls SMART attributes from all drives. The system automatically detects and alerts on critical disk health indicators:
Uncorrected read, write, and verify errors
SMART self-test failures
Critical SMART attributes that indicate imminent drive failure
Drive temperatures using the enhanced drivetemp kernel module

Do those indicators update by themselves, or is it required to schedule SMART tests for that?

I feel like no one really knows right now. We don’t really know which indicators are monitored, or if and how they’re updated. There’s a huge lack of documentation on how the “new” system is implemented…


Items 1, 3 and 4 in that list (uncorrected errors, critical attributes, temperatures) do not require SMART self-testing.
I suspect #2 (self-test failures) depends on a SMART test having been performed. In other words, the default configuration reads the self-test log but does nothing to keep it updated.
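
One way to check whether a drive’s self-test log contains anything at all is sketched below. It assumes smartctl (from smartmontools) is available and that the drive is ATA, since the log format differs for NVMe and SCSI; /dev/sda is a placeholder and count_selftests is a hypothetical helper, not something TrueNAS ships.

```shell
#!/bin/sh
# Count the entries in an ATA self-test log read from stdin.
# Assumption: each entry starts with "# <number>", as smartctl prints for ATA drives.
count_selftests() {
    # grep -c exits non-zero when it counts 0 lines, so mask that exit status.
    grep -c '^# *[0-9]' || true
}

# Usage on a live system (needs root; /dev/sda is a placeholder):
#   smartctl -l selftest /dev/sda | count_selftests
#   smartctl -A /dev/sda    # attribute table, maintained by the drive firmware
```

A count of 0 would mean there is no self-test result for the background monitoring to alert on, whatever the attributes say.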

In contrast to some other users here, I question how common SMART self-test failures are without any other indicator whatsoever.

My experience with failures recorded in the self-test log is that they are accompanied by rising error counts in the attributes. In fact, rising error counts are often the reason you specifically want to run a SMART test.

Having SMART tests run proactively every day or several times per month is something I have come to view with scepticism.

I suggest a couple of steps:

  1. implement daily short and weekly long SMART tests with crontab - see documentation.
  2. install multi-report and set it up. It will take a while since it’s pretty comprehensive and also depends on being able to email you.
  3. install the Scrutiny app. It may be abandonware, has not been updated in over a year, and may be looking for a new maintainer, but it’s better than nothing.

IX marketing take note: “TrueNAS, it’s better than nothing.”


Hijacking the thread slightly, but it is related to the behaviour of SMART under 25.10.

Has anyone figured out where the tasks are scheduled under this release? I have a set of noisy spinning rust, and I’d like to stop SMART from running in the middle of my working day, as the NAS is on my desk.

They should be set up as cronjobs. Existing SMART tests from 25.04 get migrated to cronjobs.

example cronjob:

midclt call disk.smart_test LONG '["*"]'
midclt call disk.smart_test SHORT '["*"]'

will run a long/short test on all disks at a time you can choose.
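
For anyone who wants to keep the tests out of working hours, a small wrapper around the commands above might look like this. It is only a sketch: smart_test_all and DRY_RUN are hypothetical names, the midclt call is the one quoted above, and the schedule fields in the comments are placeholder values in standard cron syntax.

```shell
#!/bin/sh
# Start a SMART test on all disks through the middleware, or just print
# the command when DRY_RUN=1 (useful for checking what the cron job would run).
smart_test_all() {
    test_type=$1    # SHORT or LONG, as in the commands quoted above
    case "$test_type" in
        SHORT|LONG) ;;
        *) echo "usage: smart_test_all SHORT|LONG" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf "midclt call disk.smart_test %s '[\"*\"]'\n" "$test_type"
    else
        midclt call disk.smart_test "$test_type" '["*"]'
    fi
}

# Example schedule fields for the cron job (placeholder values):
#   0 2 * * *   -> SHORT test every day at 02:00
#   0 3 * * 6   -> LONG test every Saturday at 03:00
```

Running it once with DRY_RUN=1 is a cheap way to confirm the quoting of the '["*"]' argument before the job fires for real.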

Edit:
If you started on 25.10 there are no SMART tests scheduled by default. It may be that the new temperature monitoring kernel module keeps your disks from going to sleep.

I started on 25.04, and I don’t remember setting it to run at an odd time, but the noise/behaviour absolutely sounds like SMART to me (to the point that I can replicate it by manually triggering a short test against these drives)

I’ll go digging for those jobs, thanks!

In 25.04, there is a menu to set up SMART schedules yourself. IIRC it’s inside the storage GUI section.

A separate GUI window allows you to set up whether the NAS should contact you about a particular emergency. That includes high temperature, disk failure, etc. I forget where that was done in 25.04 because I just kept my old settings from FreeNAS 9-13 and TrueNAS 22-25.x.

If you find out where / how all those jobs are scheduled, that would benefit the community too.

You only need to do the second step, which is to install multi-report. With it you can schedule automatic short and long SMART tests and get detailed reports, as well as a CSV stats file for long-term drive stats. The script will provide much more data than just SMART tests and will also back up your config file automatically. It’s not hard to set up.


Hi all, my first post, and it’s a wall of text... hope you endure! :sweat_smile:

I am new to TrueNAS, and I wanted to share my first experience because it was a bit of a shock and I think it shows why removing the SMART UI is more than a cosmetic change.

My setup and expectations

I have just built what, for me, is a very big investment (for a normal home user):

  • Aoostar WTR Max, maxed out with 96GB ECC Memory

  • 4 x 18 TB refurbished HDDs from Amazon (planning RAIDZ2)

  • 5 SSDs (boot in RAID, L2ARC cache, fastpool drives in RAID)

  • TrueNAS SCALE (latest)

At this point I was all in; this is heading towards 3000 USD with everything included. I have owned a Synology NAS and several Netgear ReadyNAS units over the years, so I am not new to NAS in general. But I am new to ZFS and TrueNAS specifically.

I’m a Senior DevOps Engineer who has managed IT systems for my whole career (20+ years), and what I know from the traditional NAS world is:

Before trusting disks with important data, you burn them in and run SMART tests. Especially large drives. Especially refurbs.

So after installing SCALE, my very first instinct was:

“Let’s run SMART tests on these 18 TB refurbs from the GUI before I build the pool.”

The surprise: there is no SMART Tests UI

I went looking for the familiar SMART Tests section in the web UI, and it simply is not there anymore.

No obvious place to:

  • schedule short or long SMART tests

  • run a quick long test on a specific drive

  • see recent test results

For a product that focuses so much on data integrity, that felt really strange.

The messaging from iX (from reading around now) seems to be something like:

“Just trust us, TrueNAS will monitor SMART and disk health in the background and alert you if there is an issue.”

That might be fine for ongoing monitoring once a system is in production, but it completely misses a critical phase:

Initial qualification and burn-in of drives while you still have an RMA window.

I am not willing to “just trust” refurbished 18 TB drives that I have not stress tested, especially when I only have a limited time to send them back to wherever I bought them from if they look bad.

What I had to do instead

I dropped to the shell and did it manually.

For each drive I:

  1. Ran a destructive burn-in with badblocks

    I used badblocks -wsv on each of the 18 TB drives. That alone has taken about 46 hours of continuous testing so far.

  2. Ran SMART long tests

    After that, I started SMART long tests on the drives. That is going to be another ~24 hours or so before I am comfortable saying:

    “Ok, these drives are probably safe to start using in a pool.”

Only after all of this will I feel OK about building my RAIDZ2 and storing real data.
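
The two steps above can be sketched as a command plan for one drive. This is a minimal, hedged example: burn_in_plan is a hypothetical helper that only prints the commands, /dev/sdX is a placeholder, and the flags are the ones named in the post plus standard smartctl options. On drives this large, badblocks may also need a bigger block size via -b; the right value is not stated in this thread, so check the man page before running anything.

```shell
#!/bin/sh
# Print the burn-in commands for one device instead of running them.
# DESTRUCTIVE when executed: badblocks -w overwrites the whole disk.
burn_in_plan() {
    dev=$1
    case "$dev" in
        /dev/*) ;;
        *) echo "usage: burn_in_plan /dev/sdX" >&2; return 2 ;;
    esac
    echo "badblocks -wsv $dev"        # write-mode test, progress shown, verbose
    echo "smartctl -t long $dev"      # starts the extended self-test in the drive
    echo "smartctl -l selftest $dev"  # read the result once the test has finished
}

# Usage (review the output, then run each line by hand as root):
#   burn_in_plan /dev/sdX
```

Note that smartctl -t long returns immediately and the test runs inside the drive, so the self-test log only shows the result after the estimated duration has passed.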

This is exactly the sort of workflow that a NAS appliance GUI should help with, not hide. I managed it because I am comfortable with the shell, but:

  • A non technical user will never go through all of that.

  • They will just accept the default, build the pool, and “trust TrueNAS” to let them know later.

  • If a refurb drive turns out to be marginal 3 to 6 months down the line, the RMA window is gone and they eat the cost.

For large, expensive disks, that is not great.

Why this feels like a regression

Even if iX considers the old SMART UI obsolete or buggy, it was still:

  • easy to find in the GUI

  • a clear place to run tests right now

  • a simple way to see “this disk passed or failed SMART long on date X”

Right now, in 25.x:

  • There is no built in GUI based “burn in” path for disks.

  • There is no obvious, guided way for a new user to run SMART tests before trusting their hardware.

  • The only real answer is: use the shell or install a third party app like Scrutiny.

Maybe the target audience is supposed to be only experienced admins who are comfortable with CLI and rolling their own testing. But from the outside, TrueNAS still looks like it wants advanced home users and prosumers too. Those people absolutely know what SMART tests are and expect to see them in a storage UI.

I am very new to ZFS and pool management, so I am relying heavily on the TrueNAS UI to guide me. SMART tests are one of the few things I already know from previous NAS setups, and it was jarring to discover they were removed and effectively replaced by “just trust us, the health system will handle it.”

I still like TrueNAS, but this was a bad first impression

To be clear, I am not leaving TrueNAS over this.

  • I really like TrueNAS SCALE so far outside of this; it looks truly solid.

  • I have heard great things about it from YouTube channels like Lawrence Systems and other NAS videos.

  • I love that I truly own my NAS now: my own hardware, my own layout, no vendor lock-in like with Synology or ReadyNAS (this was a major factor for me in buying a beast of a NAS for a homelab that will be used extensively).

Going from those platforms to TrueNAS and feeling like I finally own the hardware is a big positive. But after spending this much money on a new build, not being able to easily verify that my drives are healthy from the GUI was a big shock and a letdown as a first time experience.

What I would love to see

Even if iX does not want to maintain a complex SMART scheduler UI anymore, I think there is a reasonable middle ground.

For example:

  • A simple “Disk Health” page for drives (there is already a drive page so why not there) with:

    • a clear health indicator

    • a short explanation of why it is in that state (for example “No errors found, including our internal SMART interpretations, last tested x and y”) (you get the point)

    • buttons to “Run short test” and “Run long test” for traditional users (to ease those who do like to run their own)

  • A “Burn in” button that:

    • warns that it is destructive and for new or empty drives only

    • runs a sequence like write test + SMART long test

    • then clearly reports “burn in completed, no issues found” or “burn in found problems, consider replacing this drive”

That kind of UI would keep the background health system and alerts that TrueNAS prefers, but would also give users a basic, guided way to do what most people consider best practice for disks, especially big and expensive ones.

Right now there is a gap between “we will watch your disks for you later” and “I need to know if these 18 TB refurbs are trustworthy before I ever put data on them.”

Thanks for reading. I will keep using SCALE and learning ZFS, but I hope this feedback helps show why the removal of the SMART UI matters to users who are trying to do things the careful way.


It matters to many of us, which is why you saw not one but two threads in the feature request section of the forum asking iXsystems to reconsider and / or provide better alternatives than an abandonware app or a cron job workaround that is likely every bit as buggy as the old GUI middleware approach was.

You’d also have my vote for a disk qualification / burn-in feature request inside the storage menu where disks can get qualified in parallel by the middleware. That would eliminate the possibility of someone accidentally running badblocks on an assigned pool drive, could allow the addition of a progress bar, would remove the need to learn tmux, etc.

I doubt it will be implemented though, as that kind of CE-user-friendly feature would intrude on their paid support model, where NAS units are only shipped with pre-qualified drives and any replacement drives are also pre-qualified before being shipped to customers. In other words, if you’re a paid customer, this is a feature you’d never need.