NVMe Short S.M.A.R.T. Test Returns "failed segments" For All Tests

I’m still pretty new to TrueNAS, so please forgive me if I’m yet another bozo who didn’t do enough research. I haven’t found anything matching what I’m experiencing, so now I’m posting about it.

I have 5 drives in my TrueNAS server: 4x 12TB SAS HDDs and 1x 128GB NVMe boot drive. After configuring the server, I set up a regular SMART testing schedule and ran tests on all of my drives. The 4 SAS drives checked out, and in the TrueNAS web GUI I could see the successful runs under SMART Test Results. The only drive not listed is my NVMe.

For my own peace of mind, I tried running smartctl -t short /dev/nvme0n1p1 and took a look at the results:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ORICO
Serial Number:                      P0C03PDTYQS2N4049VDB
Firmware Version:                   W0830B
PCI Vendor ID:                      0x126f
PCI Vendor Subsystem ID:            0x2261
IEEE OUI Identifier:                0x5cd2e4
Total NVM Capacity:                 128,035,676,160 [128 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            efcdab 0000000000
Local Time is:                      Mon Aug  4 00:36:35 2025 CDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x0015):     Comp DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        42 Celsius
Available Spare:                    100%
Available Spare Threshold:          42%
Percentage Used:                    0%
Data Units Read:                    150,195 [76.8 GB]
Data Units Written:                 563,197 [288 GB]
Host Read Commands:                 1,544,060
Host Write Commands:                15,201,384
Controller Busy Time:               59
Power Cycles:                       21
Power On Hours:                     314
Unsafe Shutdowns:                   6
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Short             Completed: failed segments              313            -     -   2   -    -
 1   Short             Completed: failed segments              313            -     -   2   -    -
 2   Short             Completed: failed segments              313            -     -   2   -    -
 3   Short             Completed: failed segments              313            -     -   2   -    -
 4   Short             Completed: failed segments              312            -     -   2   -    -

From a little bit of research, it sounds like “failed segments” can indicate a failure to read part of the drive. In the few weeks I’ve had it, no more than 300GB has been written to it, so I can’t understand why I’m running into this problem.

Looking into this some more, I tried Multi-Report as a tool to get more detailed SMART results for all of my drives, including the NVMe. It, too, flagged the failed segments as a critical error.

I think it’s pretty clear that I don’t know what’s going on, but if anyone could point me in the right direction, I would truly appreciate the help.

Try running a long test, as that (at least in theory) actually tests the media rather than just doing a brief check.

Having said that, it doesn’t look good. While the long test runs, work out what the RMA process is, or just buy an alternative.

Make sure you have a config backup available (and not stored on the NAS).
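
If you have SSH access, a minimal sketch of pulling a copy off the box looks like the lines below (the hostname is a placeholder, and the /data/freenas-v1.db path is my assumption for where SCALE keeps the config database; the supported route is the GUI download under System Settings > General > Manage Configuration):

# Copy the config database to another machine over SSH (run from that other machine)
scp root@truenas.local:/data/freenas-v1.db ./truenas-config-backup.db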

@JMalland
TrueNAS currently will not actually test NVMe drives. I have not checked 25.04.2 yet, but I have no reason to believe NVMe testing was introduced in that version. I hope 25.10 will include it, but honestly I stopped hoping a while ago.

Your drive reports NVMe standard 1.3, and, guess what, that standard does not require SMART self-test support. Even if a drive does support it, it may not report the results the way you expect; I’ve seen that a few times in the recent past. That does not mean we cannot try to make it happen.

As @NugentS said, run the long test with smartctl -t long /dev/nvme0. With it being a 128GB drive, wait 5 minutes, then run smartctl -a /dev/nvme0 and post the results.
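
For copy-paste convenience, the whole sequence might look like this (assuming the boot drive really is /dev/nvme0; adjust the device name to match your system):

# Start the extended (long) self-test against the controller device
smartctl -t long /dev/nvme0
# A 128GB drive should finish quickly; give it a few minutes
sleep 300
# Read back the SMART data, including the self-test log
smartctl -a /dev/nvme0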

If the test doesn’t look like it ran properly, we can also run it the old-fashioned way (SCALE ONLY): nvme device-self-test /dev/nvme0 -s long should run the test. Again, wait 5 minutes or longer so the test can complete. smartctl will still show the results, so use smartctl -a /dev/nvme0.
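
If you’d rather watch progress than wait blind, the self-test log page can also be queried directly with nvme-cli; this is a read-only command (device name assumed as above):

# Safe, read-only: shows the current self-test status and recent results
nvme self-test-log /dev/nvme0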

Let us know if you need more assistance.

Thanks for the reply.

I ran the long test through both smartctl and nvme device-self-test to no avail. The results continue to display “Completed: failed segments”.

I guess this just means my drive doesn’t support SMART testing.

Although it doesn’t appear to run any of the tests successfully, the smartctl output has always reported the overall-health self-assessment as PASSED. Is this accurate, or does it just display that because the tests couldn’t be completed?

Smartctl report:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       ORICO
Serial Number:                      P0C03PDTYQS2N4049VDB
Firmware Version:                   W0830B
PCI Vendor ID:                      0x126f
PCI Vendor Subsystem ID:            0x2261
IEEE OUI Identifier:                0x5cd2e4
Total NVM Capacity:                 128,035,676,160 [128 GB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            efcdab 0000000000
Local Time is:                      Mon Aug  4 10:26:00 2025 CDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x0015):     Comp DS_Mngmt Sav/Sel_Feat
Log Page Attributes (0x03):         S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        44 Celsius
Available Spare:                    100%
Available Spare Threshold:          42%
Percentage Used:                    0%
Data Units Read:                    150,232 [76.9 GB]
Data Units Written:                 574,833 [294 GB]
Host Read Commands:                 1,544,592
Host Write Commands:                15,732,822
Controller Busy Time:               60
Power Cycles:                       21
Power On Hours:                     323
Unsafe Shutdowns:                   6
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Extended          Completed: failed segments              323            -     -   2   -    -
 1   Extended          Completed: failed segments              323            -     -   2   -    -
 2   Short             Completed: failed segments              313            -     -   2   -    -
 3   Short             Completed: failed segments              313            -     -   2   -    -
 4   Short             Completed: failed segments              313            -     -   2   -    -
 5   Short             Completed: failed segments              313            -     -   2   -    -
 6   Short             Completed: failed segments              312            -     -   2   -    -

First of all, let me say that I do think your drive is okay.

Run this command: nvme id-ctrl /dev/nvme0 -H | grep -i "self-test". You are looking for a line that states whether the drive supports a device self-test or not.

Let me add a warning as well. The nvme command, if misused, can wipe your NVMe drive. I’m saying this for those folks who read these commands, go find the manual for nvme, and then start poking around with different commands. No one should do that without knowing the outcome of the command, or without accepting the risk and learning the hard way what not to do.

The command above is safe as written.
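
For reference, on a drive that advertises support, the filtered output looks something like the line below (the exact bit position may differ; the phrase you want is “Device Self-test Supported”):

  [4:4] : 0x1   Device Self-test Supported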

You’re correct, it does say Device Self-test Supported.

I’m glad to hear you think the device is okay. It’s been operating fine, but I honestly have been a bit paranoid.

I ordered two of the same drives, and the first one I used failed within 24 hours. My HDD data was still backed up, but it was an impromptu storage pool recovery I didn’t need :smile:.

Do you know if there is any configuration setting in multi-report which would disable or restrict the NVMe drive from flagging as having a critical error?

In the emailed report, the Last Test Type column is flagged critical (presumably because the drive has no last successfully executed test?).

Maybe this will work…
Open the file multi_report_config.txt and scroll down to about line # 368. Look for Ignore_Drives_List="" and change it to Ignore_Drives_List="P0C03PDTYQS2N4049VDB", which is the serial number of the drive to ignore.
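
In other words, the edit in multi_report_config.txt amounts to this one-line change (the line number is approximate and may drift between script versions):

# Before:
Ignore_Drives_List=""
# After (Multi-Report will skip the drive with this serial number):
Ignore_Drives_List="P0C03PDTYQS2N4049VDB"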

This should ignore the drive altogether, which means it will not be listed in Multi-Report at all, so I don’t think it’s a great alternative, especially if you plan to add more of these model drives. If you send me a dump, I can look at making an update that ignores only the SMART test results. I just need to think about how I want to do it; I suspect it will end up in the custom drive list. Then I can send you the latest version and you get to test it out for me :slight_smile:

Also, I will look into the SMART testing results/output some more to see if this manufacturer does something unique I can exploit. It would be good to find a way to get a valid self-test result.

@JMalland
When you get those other drives in, run a SMART long test on them. If they test okay, then the drive you have is likely bad. If they fail identically, then it is just crappy firmware.

A few questions:
How is the NVMe connected to the computer?
What is the actual model number of the NVMe drive? ORICO doesn’t mean much beyond the company selling it.

So, I have my replacement drive coming in today for the failed drive. I’ll be sure to run a SMART test on it when I get it set up. It’ll have to be on a different computer though.

I bought two SSDs just for fast host OS read/write capability. I’m running TrueNAS on a Dell PowerEdge T330, with a RAIDZ2 pool for my bulky data and a ZFS-over-iSCSI share so my Proxmox VMs run on ZFS volumes. The Proxmox server has a good amount of resources (RAM, CPU), but I ran out of HDD bays and figured the PowerEdge was a good expansion.

The first SSD I installed in the PowerEdge with a PCIe M.2 NVMe adapter:
Non-Volatile memory controller: Silicon Motion, Inc. SM2261XT x2 NVMe SSD Controller (DRAM-less)

I’m not 100% certain it failed, but after leaving an 8TB data transfer running overnight, I woke up to find the VM booted into a read-only filesystem. I couldn’t figure out how to fix it, and at the time I didn’t run a SMART test (stupid, I know). Since I had a second drive and no data loss, returning it just seemed easier. They were cheap enough that I figured I’d buy another, and if any more fail I just won’t buy from Orico again.

The current SSD is my second one, and I’m guessing it’s just crappy firmware. It was a cheap $15 drive from Amazon: a 128GB Orico D10 M.2 NVMe SSD.

Even if the boot drive completely fails I can still recover my storage pools, which is my new favorite thing about TrueNAS.

I found that D10 model after I posted my previous message. For $15, you cannot expect a lot.

If the drives all test with the same type of fault, I’d just run the NAS with this as the boot drive. The worst that can happen is it fails to boot. Just keep a current copy of your TrueNAS configuration file on a different computer or flash drive, and restoration is a breeze. I will start updating my script to disregard a SMART test failure, but it will live under the Custom Drive Configuration section of the script.

Since you have Multi-Report already, it will be an easy upgrade for you. It will take me some time to generate the update; I tend to take my time, and it is a complex script, so it is best to get things correct the first time.

Yeah, it is the boot drive. I just figured that if I’m checking the other drives, I may as well check this one too. It’s also handy to have a regular reminder of how much wear is on it.

Take whatever time you need; I’m in no hurry for any updates. Your script is really a work of art, and its dynamic configuration has an insane number of potential uses.

I appreciate your help, and continued development.