SMART test configuration: do drive assignments survive a reboot?

To configure SMART tests, I grouped the drives according to my pool layout. Whatever the rationale for choosing which drives to test, the basic idea is to know which drives in which pool are being tested.

The configuration in TrueNAS SCALE identifies drives as sda, sdb, sdX. However, my understanding (I may be naive here) is that the sdX letter assignments can change between boots, particularly if disks are added. If that happens, I no longer know where the disks being tested by SMART actually sit in my pool layout.

Is this a concern? Am I on the wrong track somehow?

Thanks!

That is true, the drive IDs can change. Whichever drive is recognized first gets sda, the second gets sdb, and so on.

If you are using the TrueNAS GUI to assign specific drives (sda, sdb) and the values do change, then you are in the situation you described. And yes, it is possible for a drive to miss being tested, or to be tested out of the order you want.
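A minimal sketch of how to see which physical disk currently sits behind each sdX letter, assuming a Linux system (TrueNAS SCALE is Debian-based) where udev populates `/dev/disk/by-id` with names built from the model, serial number or WWN. Those names stay the same across reboots even when the sdX letters move. The helper names `map_kernel_names` and `read_by_id` are hypothetical, not part of TrueNAS or Multi-Report.

```python
import os
import re

# Standard udev directory of stable, serial/WWN-based disk names on Linux.
BY_ID_DIR = "/dev/disk/by-id"
PART_RE = re.compile(r"-part\d+$")  # partition links, e.g. "...-part1"

def map_kernel_names(entries):
    """Map kernel names (sda, nvme0n1, ...) to one stable by-id name each.

    entries: iterable of (link_name, link_target) pairs, where link_target is
    what os.readlink() returns, e.g. "../../sda".
    """
    mapping = {}
    for name, target in entries:
        if PART_RE.search(name):
            continue  # only whole-disk links are of interest
        kernel = os.path.basename(target)
        # A disk usually has several links (ata-..., wwn-..., nvme-eui...);
        # keep the alphabetically first so the result is deterministic.
        if kernel not in mapping or name < mapping[kernel]:
            mapping[kernel] = name
    return mapping

def read_by_id(directory=BY_ID_DIR):
    """Read the live symlinks; returns [] when the directory does not exist."""
    if not os.path.isdir(directory):
        return []
    return [(n, os.readlink(os.path.join(directory, n)))
            for n in sorted(os.listdir(directory))]

if __name__ == "__main__":
    for kernel, stable in sorted(map_kernel_names(read_by_id()).items()):
        print(f"{kernel:10} -> {stable}")
```

Running it once before and once after a reboot would show whether the sdX letters moved while the by-id names (and therefore the serial numbers) stayed put.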

Is this a deal breaker for you? I don’t know.

Shameless plug time: I created the original, and now maintain a version, of a SMART test monitor. This set of scripts also automatically tests all the drives within a given period (a week or a month). Check out the link in my signature.
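To illustrate the general idea of testing every drive within a period, here is a minimal sketch that spreads one long test per drive evenly across a month. It is not Multi-Report's actual algorithm; the function name `spread_long_tests` and the serial numbers in the usage line are hypothetical. It assumes drives are identified by serial number or by-id name rather than by sdX letter, which sidesteps the renaming problem above.

```python
import calendar

def spread_long_tests(drives, year, month, skip_weekdays=()):
    """Spread one long test per drive evenly across a month.

    drives: stable identifiers (serial numbers or by-id names), not sdX.
    skip_weekdays: weekdays to avoid, 0=Monday ... 6=Sunday.
    Returns {day_of_month: [drive, ...]}.
    """
    _, days_in_month = calendar.monthrange(year, month)
    days = [d for d in range(1, days_in_month + 1)
            if calendar.weekday(year, month, d) not in skip_weekdays]
    if not days:
        raise ValueError("every day of the month was excluded")
    schedule = {}
    for i, drive in enumerate(drives):
        # Evenly spaced index into the usable days; several drives share a
        # day only when there are more drives than days.
        day = days[i * len(days) // len(drives)]
        schedule.setdefault(day, []).append(drive)
    return schedule

if __name__ == "__main__":
    print(spread_long_tests(["SERIAL_A", "SERIAL_B", "SERIAL_C", "SERIAL_D"], 2025, 4))
```

With four drives in a 30-day month this lands on days 1, 8, 16 and 23, so only one drive is busy with a long test on any given day.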

If you prefer not to use that, feel free to ask and we will give you sound advice.


Nothing like a reply from a TrueNAS guru!

Your script is already alive and (hopefully, if I’ve got the config right) working on my server. The disk check is confined to the NVMe drives, while the reporting should cover all disks. So I may be back as I try to tune that up.

I guess with SMART the main idea is just to make sure all the disks (that you care about) are regularly getting tested? The main issue in that case is scheduling, to keep other disk activity (e.g., scrubs) away from SMART test times.
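A minimal sketch of the scheduling check described above, assuming you already know which days of the month your long tests and scrubs run on. It moves any test day that falls within `buffer_days` of a scrub forward to the next clear day. The function names and the example days are hypothetical; this is not how TrueNAS itself schedules tasks.

```python
def clear_of_scrubs(day, scrub_days, buffer_days):
    """True if `day` is more than buffer_days away from every scrub day."""
    return all(abs(day - s) > buffer_days for s in scrub_days)

def move_tests_off_scrubs(test_days, scrub_days, days_in_month, buffer_days=1):
    """Shift each test day forward to the first day clear of any scrub.

    Days are day-of-month numbers; buffer_days leaves room for a scrub that
    runs long. Raises ValueError if no clear day remains in the month.
    """
    moved = []
    for day in sorted(test_days):
        candidate = day
        while (candidate <= days_in_month
               and not clear_of_scrubs(candidate, scrub_days, buffer_days)):
            candidate += 1
        if candidate > days_in_month:
            raise ValueError(f"no scrub-free day at or after day {day}")
        moved.append(candidate)
    return moved

if __name__ == "__main__":
    # Hypothetical: long tests on 1, 8, 16, 23 and a scrub on the 15th.
    print(move_tests_off_scrubs([1, 8, 16, 23], [15], 30))
```

Here the test on the 16th sits only one day after the scrub, so it is pushed to the 17th. A larger `buffer_days` makes sense for big pools where a scrub can run well past a day.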

I know the topic of SMART schedules has been beaten to death, but what is your thinking: should I be running long tests twice a month or once?

Thanks for all the work on the scripts. I appreciate the detail beyond the simple pass/fail in the GUI.

As anticipated, I’m back: partly to interpret the report and partly because of an error.

The SMART test report shows some lines highlighted in orange and others not. I’m not sure of the significance. Also, does a blank in the last test age column mean less than a day?
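One plausible way a "last test age" is derived (an assumption; I have not confirmed this is what Multi-Report does) is the drive's current Power_On_Hours minus the LifeTime(hours) recorded for the most recent self-test, divided into whole days. Under that reading, a test run within the last 24 hours would come out as 0 days. The sketch below assumes ATA-style `smartctl -l selftest` output, where the most recent entry is `# 1` and each entry ends with the LifeTime(hours) and LBA_of_first_error columns; NVMe self-test logs use a different layout. `last_test_age_days` is a hypothetical name.

```python
import re

# Assumption: ATA-style `smartctl -l selftest` output; each entry ends with
# the LifeTime(hours) column followed by LBA_of_first_error.
ENTRY_RE = re.compile(r"^#\s*1\s.*\s(\d+)\s+(\S+)\s*$")

def last_test_age_days(selftest_log, power_on_hours):
    """Whole days since the most recent (# 1) self-test, or None if none is logged."""
    for line in selftest_log.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            lifetime_hours = int(m.group(1))
            # ATA logs store LifeTime in 16 bits, so it wraps at 65536 hours.
            elapsed = (power_on_hours - lifetime_hours) % 65536
            return elapsed // 24
    return None
```

For example, with the most recent test logged at 21563 hours and a drive now at 21580 power-on hours, 17 hours have elapsed and the whole-day age is 0.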

In another case there is no test report at all (right side). Is this disk being overlooked?
[Screenshot: spinning_rust_interpret_2]

I think these come down to my own interpretation and possibly to adjusting some settings.

More problematically, though, it seems the four NVMe drives are being overlooked, since I’m getting an email with these errors:

NVMe status: Invalid Log Page: The log page indicated is invalid(0x2109)
NVMe status: Invalid Log Page: The log page indicated is invalid(0x2109)
NVMe status: Invalid Log Page: The log page indicated is invalid(0x2109)
NVMe status: Invalid Log Page: The log page indicated is invalid(0x2109)
NVMe status: Invalid Log Page: The log page indicated is invalid(0x2109)
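"Invalid Log Page" means the drive rejected a request for a particular NVMe log page. One possible cause (an assumption, not a diagnosis) is that the NVMe self-test log is tied to the optional Device Self-test command, which the controller advertises in bit 4 of its Optional Admin Command Support (OACS) field. A minimal sketch that checks this, assuming `smartctl -c /dev/nvmeX` prints a line such as `Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test`; the function name is hypothetical.

```python
import re

# Assumption: smartctl prints the OACS field as
#   "Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test"
OAC_RE = re.compile(r"Optional Admin Commands \((0x[0-9A-Fa-f]+)\)")
SELF_TEST_BIT = 0x10  # OACS bit 4: Device Self-test command supported

def nvme_supports_self_test(capabilities_text):
    """True/False from the OACS bit, or None if the line is not present."""
    for line in capabilities_text.splitlines():
        m = OAC_RE.search(line)
        if m:
            return bool(int(m.group(1), 16) & SELF_TEST_BIT)
    return None

if __name__ == "__main__":
    import subprocess
    import sys
    for dev in sys.argv[1:]:  # e.g. /dev/nvme0
        out = subprocess.run(["smartctl", "-c", dev],
                             capture_output=True, text=True).stdout
        print(dev, nvme_supports_self_test(out))
```

If a drive reports False, requests for its self-test log would be expected to fail, and the errors would point at drive capability rather than at the script configuration.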

The report starts with:

Multi-Report v3.0.7 dtd:2024-06-08 (TrueNAS Scale 24.04.2.3)
Report Run 15-Apr-2025 Tuesday @ 03:00:05
Execution Time: 4 Minutes : 15 Seconds
UPDATE AVAILABLE --> multi

I’m not sure what the update is!

Thanks for any advice