Responding to your Feedback on 25.10, SMART, NVIDIA, and more | TrueNAS Tech Talk (T3) E045

On today’s episode of TrueNAS Tech Talk, Kris and Chris tackle the TrueNAS 25.10 update feedback head-on. They’ll explain the details behind the NVIDIA driver update, the SMART testing change, and why drive spin-down is not the best choice for your ZFS-powered TrueNAS. There are even more viewer questions today on native S3 services in TrueNAS, Intel GPU support for 26.04, and how you can figure out what API version to call without calling the API.

4 Likes

Maybe you don’t need a GUI page that allows users to run SMART tests or interfere with TrueNAS, but a page with information about what is going on would be useful.

Information is power.

I am not a power user.

3 Likes

Soooo, we don’t want to talk about a working SMB service, I guess.

2 Likes


This has probably been discussed; I don’t come to the forums much. But what I don’t understand about the SMART changes is that you talk about wanting to prevent false positives and make the process of interpreting the data more robust. I can only assume, with your knowledge way above mine, that you know SMART is not an industry standard and that each vendor has fields, some of which have no publicly documented decoding (needing to be reverse engineered in certain ways for certain models). For example, I have some old 12TB Seagate drives that Scrutiny thinks have failed because of the way it incorrectly interprets the “188 (0xBC) Command Timeout” field. But CrystalDiskInfo seems to have built a broader database for decoding the drive data and does not fail the drives. Are you really going to respond to bug reports about specific drive models causing false positive alarms? I doubt it. Are you leveraging an existing SMART decode database or rolling your own? (rhetorical).
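To make the decoding problem concrete, here is a minimal Python sketch. It assumes (this is an illustration, not something confirmed for any specific drive model) that a vendor packs three 16-bit counters into the 48-bit raw value of an attribute such as 188. `split_raw16` is a hypothetical helper name, and the raw value is made up.

```python
def split_raw16(raw: int) -> tuple[int, int, int]:
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low).

    Assumption: the vendor packs three independent counters into one raw field.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("SMART raw values are at most 48 bits wide")
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Made-up raw value: each of the three 16-bit words holds the count 1.
raw = 0x0001_0001_0001

print(raw)               # 4295032833 when read as one big integer
print(split_raw16(raw))  # (1, 1, 1) when read as three small counters
```

A tool that reads the field as a single integer sees billions of "timeouts" and may flag the drive, while a tool that knows the packing sees three small counters. Same bytes, two very different verdicts.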

I could load up 25.10 and see what it does, but I was bitten by the Instances fiasco, so not really chomping at the bit to do that.

Sorry, maybe the comment is off base, but I think this is leading to a 737-MAX type situation: half-baked, partially hidden automation with undocumented rules that’s either going to pass some failing drives or fail some healthy drives because of unexpected values in data fields for certain drive models.

EDIT:
I guess you do mention this problem later in the episode. But you don’t really say how you’re addressing it.

3 Likes

To me, the scheduling of and interpretation of SMART tests / results are two entirely different things. They are related, for sure!

Moving SMART test scheduling into a background task that is done automatically has some merit, though some admins like the ability to schedule long tests themselves, since that allows them to test when the system is not in as heavy use (i.e., weekends for some systems).

I’d suggest there should be some more consideration around to what extent the scheduling ability of a system should be taken away from the sysadmin. It likely would be better to have an auto setting that is only triggered whenever the sysadmin sets nothing.
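The "auto only when the sysadmin sets nothing" idea can be sketched in a few lines of Python. Everything here is hypothetical: `SmartSchedule`, `AUTO_SCHEDULE`, and the cron strings are illustrative names and values, not anything TrueNAS is known to use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SmartSchedule:
    short_cron: str  # cron expression for short self-tests
    long_cron: str   # cron expression for long self-tests


# Hypothetical built-in default: short test daily at 02:00,
# long test on Saturdays at 03:00.
AUTO_SCHEDULE = SmartSchedule(short_cron="0 2 * * *", long_cron="0 3 * * 6")


def effective_schedule(admin_choice: Optional[SmartSchedule]) -> SmartSchedule:
    """Use the admin's schedule if one is set, otherwise fall back to auto."""
    return admin_choice if admin_choice is not None else AUTO_SCHEDULE


# Admin configured nothing, so the automatic schedule applies.
print(effective_schedule(None))

# Admin wants long tests on Sunday nights only; that choice wins.
custom = SmartSchedule(short_cron="0 1 * * *", long_cron="0 23 * * 0")
print(effective_schedule(custom))
```

The point of the design is that automation fills a gap rather than overriding a decision the admin has already made.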

As for the interpretation of SMART, that is a whole different ball of wax and like you said will be generalist as well as specialist, depending on what drive, OEM, what edition / firmware of said drive is in use. Tons of potential permutations that will likely require a rather large DB that can handle all that.

A very good SMART interpreter that can diagnose properly among all those permutations would indeed be a big feather in the engineering hat. Will be interested to see if it can be done with the resources that ixsystems can bring to bear.

1 Like

Remember: .0 is beta, .1 is RC, and .2 is stable.

Don’t let the word “STABLE” on the download page fool you.

5 Likes

Well, IX has certainly made it clear what direction they are taking with diagnostic drive testing but it seems the end result has yet to be determined. While I disagree with removing SMART scheduling from the UI, I can understand it because as I recall other NAS manufacturers, like Synology software, will flag the status of a questionable drive but provide little info. Hope that as IX matures the backend processes in development they will disclose the secret sauce of testing going on there which I think will increase the confidence of the end users in the alerts/emails received. Those of us who like sifting through the various SMART data will continue periodic testing whether by CLI, scripts, or whatever just as always so I don’t understand the high emotions about this change.

1 Like

iX’s initial messaging about the reason for the change boiled down to “you can’t trust SMART, trust ZFS instead,” which is stupid, dishonest, and wrong. It didn’t help that their recommended replacement, Scrutiny, (1) doesn’t do SMART self-tests, so it isn’t quite a replacement; and (2) is unmaintained. And when we asked, in the most-active topic ever on this forum, and the most-voted-for feature request in forum history by a wide margin, that this critical feature be reinstated, they responded with even more gaslighting and shut down the thread.

If they’re now saying that they’re going to do SMART self-tests automatically, without the user needing to schedule them, great–we’ve been asking for that for 15 years. But that’s a very different message than they were giving even a few days ago.

8 Likes

It’s really not that complicated.

TrueNAS had a GUI for scheduling short and long SMART tests on the disks of your choosing.

If a SMART test fails, especially an LBA error, it’s an unambiguous sign that you should replace the disk (or check the HBA or connections).

There was another page that shows you the results of previous tests.

Failing SMART tests is not a false alarm.

Let us decide if we want to schedule weekly short tests or maybe quarterly long tests. Or whatever we choose. Give us a GUI page to manage that. They did. Then they removed it after many years. “A solution in search of a problem.”

No. Removing these simple GUI pages for configuring tests and reviewing the results does not change anything about “background checks”. They can coexist.

No. ZFS checksums and scrubs are not a substitute for short and long SMART tests.

It’s being made more complicated than it should be. This has nothing to do with “obscure SMART stats and metrics”. A failed test is a failed test.
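As a rough illustration of how little interpretation a failed self-test needs, here is a Python sketch that scans a self-test log for failures. The field names (`ata_smart_self_test_log`, `standard`, `table`, `status.passed`, `lba`) follow my understanding of smartctl's JSON output for ATA drives and should be treated as an assumption; the sample report is made up, and `failed_self_tests` is a hypothetical helper.

```python
from typing import Any


def failed_self_tests(report: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the self-test log entries that explicitly did not pass."""
    table = (
        report.get("ata_smart_self_test_log", {})
        .get("standard", {})
        .get("table", [])
    )
    failed = []
    for entry in table:
        status = entry.get("status", {})
        # Only an explicit False counts; a missing key (e.g. a test still
        # in progress) is not treated as a failure.
        if status.get("passed") is False:
            failed.append(
                {
                    "type": entry.get("type", {}).get("string"),
                    "status": status.get("string"),
                    "lifetime_hours": entry.get("lifetime_hours"),
                    "lba": entry.get("lba"),
                }
            )
    return failed


# Made-up sample shaped like the assumed JSON layout.
sample = {
    "ata_smart_self_test_log": {
        "standard": {
            "table": [
                {
                    "type": {"string": "Short offline"},
                    "status": {"string": "Completed without error", "passed": True},
                    "lifetime_hours": 12000,
                },
                {
                    "type": {"string": "Extended offline"},
                    "status": {"string": "Completed: read failure", "passed": False},
                    "lifetime_hours": 12100,
                    "lba": 123456789,
                },
            ]
        }
    }
}

for failure in failed_self_tests(sample):
    print(failure)
```

No vendor-specific attribute decoding is involved here: the drive itself reports whether the test passed and, on a read failure, the first bad LBA.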

On a maybe off-topic note: I also hope that newer versions of TrueNAS do not automatically decide when to run SMART tests.

6 Likes

Agree with the rest of your post, except this part. With many drives needing over 24 hours to run a long SMART selftest, it should be up to the user if or when to schedule these.

6 Likes

A perfectly sensible course of action would be to treat them like they do scrubs–set up a reasonable default schedule,[1] expose it to the user, and let the user adjust as necessary.

Running them without telling the user about it is (IMO) better than not running them at all, but less good than telling the user when they’re going to run and letting the user adjust that as desired.


  1. I’d personally say the default scrub schedule is on the infrequent side of “sensible,” but it’s something. ↩︎

7 Likes

As an aside here, how cool would it be if ixsystems automagically determined if a HBA, disk drive, etc. is eligible for a firmware upgrade, TLER reconfig, and like operations that are known best practices but can be hard to implement for a novice sysadmin?

This would be a feature aimed at the CE market for sure, as it’s unlikely that any hardware shipped / supported by ixsystems wouldn’t feature preconfigured drives, HBAs, and so on. But it would be a big differentiator towards making TrueNAS different from the other prosumer NAS Systems.

1 Like

I can see lots of logistical challenges, but sounds like a great idea. Unfortunately, the heat death of the universe might happen first.

5 Likes

One of Asimov’s favourites. Who will need TrueNAS when we’ll have Multivac?

So, in the end, what should we do with the cron job that is sending a NULL result email every day? Should it be disabled, or removed?

1 Like

Do you want the SMART tests to run as you’ve scheduled them? If so, keep it as is.

It’s more and more confusing. In the podcast they said it’s always running. And the daily email says “result: NULL”. Very confusing. What is this cron doing? Is it essential?

I wonder if something is wrong with your cron command. I don’t get those emails with the converted cronjob in 25.10, and if I manually trigger the cronjob, a window pops up which says that the job ran successfully and when it will run again automatically…

Looking over the smartctl documentation over here, tests in background are possible, ie smartctl allegedly can sense when drives are or are not idle, test only on idle.
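For reference, a small Python sketch of how a script or cron job might build the smartctl command line for a self-test. It relies on smartctl's `-t short` / `-t long` test selectors and the `-C` captive-mode flag (ATA only); without `-C` the drive runs the test in the background while it keeps serving normal I/O. `selftest_argv` and the device path are hypothetical, and the sketch only builds the argument list, it does not start a test.

```python
def selftest_argv(device: str, kind: str = "short", captive: bool = False) -> list[str]:
    """Build the smartctl argument list that starts a self-test on a device."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    argv = ["smartctl", "-t", kind]
    if captive:
        # Captive mode makes the drive run the test in the foreground.
        argv.append("-C")
    argv.append(device)
    return argv


# Background short test on a hypothetical device path.
print(selftest_argv("/dev/sda"))

# Long test in captive (foreground) mode.
print(selftest_argv("/dev/sda", kind="long", captive=True))
```

Passing the resulting list to something like `subprocess.run` would actually start the test, which is left out here on purpose.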

So that is a credible argument for putting more stuff into background, ie if the user experiences no performance impact then they won’t mind not knowing that SMART is enabled / running / monitoring.

That said, I’d prefer a system where SMART is enabled by default on routine settings, ie short daily and long weekly, which the admin can adjust. This is a CE concern, but multiple folk here like to spin down drives and it’ll potentially freak them out to hear their system running drives for no apparent reason.

It would be smart for the GUI to indicate in the task section when SMART tests are running, how long they are expected to continue running, etc., so that someone troubleshooting a system can see instantly what might be occupying the pool.

1 Like