Bug? Optane NVMe drives failed after update to 25.10.2.1

I had a mirrored vdev of two 1 TB Optane Memory drives (each is 1 TB NVMe plus 32 GB Optane) that I was testing as a cache vdev on my box, and they have failed. I need to remove them from the system and try them in another box to see whether this is a bug in TrueNAS or whether both drives died at the same time. They still show up under Disks, but now as 0-byte drives.

What commands do you all want me to run?

lsblk -a

nvme2n1 259:0 0 0B 0 disk
nvme0n1 259:1 0 0B 0 disk
nvme1n1 259:4 0 8G 0 disk
└─nvme1n1p1 259:5 0 7.9G 0 part


smartctl --all /dev/nvme0n1

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.33-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
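
In case it helps anyone spot the same symptom quickly, here is a minimal sketch that flags whole disks reporting a size of 0B in `lsblk` output. The function name `zero_size_disks` is made up for illustration, and it assumes the default column order (NAME MAJ:MIN RM SIZE RO TYPE) seen in the output above.

```python
def zero_size_disks(lsblk_text):
    """Return names of whole disks whose SIZE column reads 0B."""
    flagged = []
    for line in lsblk_text.splitlines():
        fields = line.split()
        # Assumed columns: NAME MAJ:MIN RM SIZE RO TYPE [MOUNTPOINTS]
        if len(fields) < 6:
            continue
        name, size, dev_type = fields[0], fields[3], fields[5]
        # Partitions (TYPE "part") and the header row are skipped here.
        if dev_type == "disk" and size in ("0B", "0"):
            flagged.append(name)
    return flagged


# Sample text copied from the lsblk output in this post.
sample = """\
nvme2n1 259:0 0 0B 0 disk
nvme0n1 259:1 0 0B 0 disk
nvme1n1 259:4 0 8G 0 disk
└─nvme1n1p1 259:5 0 7.9G 0 part
"""
print(zero_size_disks(sample))
```

On a live system the text could come from running `lsblk -a` and capturing its output; the parsing is kept separate so it can be checked against pasted output.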


I’d start there if it is at all convenient - it is likely faster to do 20 minutes of manual labour than to wait for us to check random things in the CLI.

Another thought, without removing anything: check how they show in the BIOS, as that should also show how much space is on them.

thanks

I would start with lspci to see if they are detected by the kernel.

There is no general issue with Optane and 25.10.2.1 (it sees mine fine).

Also, did you do a genuine power down and reboot?

After a reboot they show again:


83:00.0 PCI bridge: ASMedia Technology Inc. ASM2812 6-Port PCIe x4 Gen3 Packet Switch (rev 01)
84:00.0 PCI bridge: ASMedia Technology Inc. ASM2812 6-Port PCIe x4 Gen3 Packet Switch (rev 01)
84:08.0 PCI bridge: ASMedia Technology Inc. ASM2812 6-Port PCIe x4 Gen3 Packet Switch (rev 01)
85:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H20 with Solid State Storage [Pyramid Glacier] (rev 03)
86:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H20 with Solid State Storage [Pyramid Glacier] (rev 03)

nvme1n1 259:0 0 953.9G 0 disk
└─nvme1n1p1 259:1 0 951.9G 0 part
nvme0n1 259:2 0 953.9G 0 disk
└─nvme0n1p1 259:3 0 951.9G 0 part

Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number: INTEL HBRPEKNL0203AH
Serial Number: BTPG11400CW61P0B-1
Firmware Version: HPS1
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Mar 10 16:57:37 2026 CDT
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x06): Cmd_Eff_Lg Ext_Get_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.00W - - 0 0 0 0 0 0
1 + 3.00W - - 1 1 1 1 0 0
2 + 2.20W - - 2 2 2 2 0 0
3 - 0.0300W - - 3 3 3 3 2000 5000
4 - 0.0040W - - 4 4 4 4 5000 9000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 3,203,625 [1.64 TB]
Data Units Written: 4,307,290 [2.20 TB]
Host Read Commands: 16,682,335
Host Write Commands: 26,839,873
Controller Busy Time: 506
Power Cycles: 122
Power On Hours: 1,792
Unsafe Shutdowns: 26
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code
0 Short Completed without error 1191 - - - - -
1 Extended Completed without error 884 - - - - -
2 Short Completed without error 572 - - - - -
3 Extended Completed without error 180 - - - - -
4 Short Completed without error 117 - - - - -

root@truenas-01[/mnt/truenas01-9tb/home/admin]#

I’ll repost here if this happens again.

second optane:

=== START OF INFORMATION SECTION ===
Model Number: INTEL HBRPEKNL0203AH
Serial Number: BTPG1050042E1P0B-1
Firmware Version: HPS1
PCI Vendor/Subsystem ID: 0x8086
IEEE OUI Identifier: 0x5cd2e4
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Tue Mar 10 17:02:58 2026 CDT
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x06): Cmd_Eff_Lg Ext_Get_Lg
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 75 Celsius
Critical Comp. Temp. Threshold: 80 Celsius

Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.00W - - 0 0 0 0 0 0
1 + 3.00W - - 1 1 1 1 0 0
2 + 2.20W - - 2 2 2 2 0 0
3 - 0.0300W - - 3 3 3 3 2000 5000
4 - 0.0040W - - 4 4 4 4 5000 9000

Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 41 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 2,480,832 [1.27 TB]
Data Units Written: 2,831,808 [1.44 TB]
Host Read Commands: 18,415,541
Host Write Commands: 18,594,321
Controller Busy Time: 507
Power Cycles: 400
Power On Hours: 1,800
Unsafe Shutdowns: 152
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code
0 Short Completed without error 1197 - - - - -
1 Extended Completed without error 890 - - - - -
2 Short Completed without error 576 - - - - -
3 Extended Completed without error 184 - - - - -
4 Short Completed without error 120 - - - - -

root@truenas-01[/mnt/truenas01-9tb/home/admin]#

Maybe it's a cooling problem. Does it get very hot?

I’ve had some issues with NVMe drives on Linux before, where they hung in very low power states and required a full power-off (PSU disconnected) to recover… maybe something similar?
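
To make the low-power-state idea a bit more concrete: as far as I understand it, the Linux NVMe driver only lets the drive drop autonomously into non-operational power states whose entry plus exit latency fits under the `nvme_core.default_ps_max_latency_us` module parameter, and setting that parameter to 0 turns the feature off. The sketch below applies that rule to the power-state table from the smartctl output above. The 100000 µs figure is what I believe the kernel default to be, so treat it as an assumption and check `/sys/module/nvme_core/parameters/default_ps_max_latency_us` on your own box. The function name is made up.

```python
# (state, operational?, entry latency in us, exit latency in us),
# transcribed from the "Supported Power States" table in the smartctl output.
POWER_STATES = [
    (0, True, 0, 0),
    (1, True, 0, 0),
    (2, True, 0, 0),
    (3, False, 2000, 5000),
    (4, False, 5000, 9000),
]


def apst_candidates(states, max_latency_us):
    """Non-operational states whose entry+exit latency fits the limit."""
    allowed = []
    for state, operational, enlat, exlat in states:
        if operational:
            continue  # only non-operational (idle) states are of interest
        if enlat + exlat <= max_latency_us:
            allowed.append(state)
    return allowed


print(apst_candidates(POWER_STATES, 100_000))  # assumed default limit
print(apst_candidates(POWER_STATES, 0))        # limit of 0: nothing allowed
```

If that reading is right, both deep states (3 and 4) on these drives are eligible with the assumed default, so booting once with the limit set to 0 would be a cheap way to test the theory.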

Edit: I see nothing of concern in the SMART results.
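
For reference, this is roughly the set of fields I looked at, written as a small sketch. The choice of checks is my own, not an official health rule, and the helper names are made up. It parses the "Key: value" lines of the SMART/Health section as pasted above.

```python
def parse_smart(text):
    """Turn 'Key: value' lines from smartctl output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_int(value):
    """'1,792' -> 1792, '100%' -> 100, '0x00' -> 0, '44 Celsius' -> 44."""
    token = value.split()[0].replace(",", "").rstrip("%")
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token)


def concerns(fields):
    """Return a list of findings; an empty list means nothing stood out."""
    found = []
    if to_int(fields["Critical Warning"]) != 0:
        found.append("critical warning set")
    if to_int(fields["Media and Data Integrity Errors"]) != 0:
        found.append("media/data integrity errors logged")
    if to_int(fields["Error Information Log Entries"]) != 0:
        found.append("error log entries present")
    if to_int(fields["Available Spare"]) <= to_int(fields["Available Spare Threshold"]):
        found.append("available spare at or below threshold")
    return found


# Values copied from the first drive's SMART/Health section in this thread.
sample = """\
Critical Warning: 0x00
Temperature: 44 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
"""
print(concerns(parse_smart(sample)))
```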

Not really. They are in a Supermicro server chassis, and the temps for them show around 48 to 50 °C.

Yeah, it's just weird that they both stopped working. Now they are back. I’ll keep an eye on this.

H20 is NOT an Optane drive: It is a 1 TB QLC drive alongside a small Optane drive, each exposed as a distinct x2 PCIe device. Selected Intel chipsets were supposed to detect the contraption and automagically handle the x4 → x2x2 lane bifurcation and caching, but Intel has now dropped support for this. In the absence of support, as seems to be the case here, you only see the “first” of the two half drives, which happens to be the QLC part here.

This is unsuitable as either SLOG or L2ARC.

Take the drives out of your NAS, and try to find a use for them in a desktop.

Reminds me of this: "Unsuitable SSD/NVMe hardware for ZFS - WD BLACK SN770 and others" (openzfs/zfs, Discussion #14793, GitHub)

In addition to what @etorix wrote:

L2ARC device - probably doesn't see much use, so a regular consumer NVMe drive will probably be fine
SLOG device - depending on the workload, it can be hammered, but it doesn't need to be big (16-32 GB). → Optane M10 32 GB is still available for cheap. Unfortunately, all other suitable NVMe SSDs (PLP, high endurance) have become ridiculously expensive.
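
A rough way to see why 16-32 GB is plenty: the SLOG only has to hold sync writes that have not yet been committed in a transaction group, so a common back-of-the-envelope estimate is ingest rate × txg interval × a few txgs in flight. The sketch below is only that rule of thumb. The 5 second interval (which I believe is the default `zfs_txg_timeout`) and the factor of 3 are assumptions, and the function name is made up.

```python
def slog_size_gb(ingest_gbit_per_s, txg_seconds=5.0, txgs_in_flight=3):
    """Estimate SLOG capacity (decimal GB) for a given sync-write ingest rate.

    txg_seconds and txgs_in_flight are assumed rule-of-thumb values,
    not measured ones.
    """
    bytes_per_s = ingest_gbit_per_s * 1e9 / 8
    return bytes_per_s * txg_seconds * txgs_in_flight / 1e9


# Illustrative link speeds only; real sync-write rates are usually far lower.
for link in (1, 10, 25):
    print(f"{link} Gbit/s -> {slog_size_gb(link):.2f} GB")
```

Even a saturated 10 Gbit/s link comes out under 20 GB with these assumptions, which is why a small, fast, power-loss-safe device matters more than capacity.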

The 32 GB Optane part would actually be suitable for SLOG… if it were accessible. It is an M10 drive on the same PCB as a QLC NAND drive, each with its own PCIe lanes.
The 1 TB QLC is far from optimal for L2ARC duties, and would require quite some RAM to begin with.
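
To put a number on the RAM point: every block cached in L2ARC keeps a small header in ARC, so the RAM cost scales with L2ARC size divided by average block size. The sketch below is just that arithmetic. The per-block header size depends on the OpenZFS version, so it is a required parameter here; the 96 bytes in the example is an assumption, not a quoted figure, and the function name is made up.

```python
def l2arc_header_ram_gib(l2arc_bytes, avg_block_bytes, header_bytes):
    """RAM (GiB) consumed by L2ARC headers for a completely full device."""
    blocks = l2arc_bytes / avg_block_bytes
    return blocks * header_bytes / 2**30


NAMESPACE_BYTES = 1_024_209_543_168  # namespace size from the smartctl output
ASSUMED_HEADER_BYTES = 96            # assumption; varies by OpenZFS version

for block_kib in (16, 128):
    ram = l2arc_header_ram_gib(NAMESPACE_BYTES, block_kib * 1024, ASSUMED_HEADER_BYTES)
    print(f"avg block {block_kib} KiB -> about {ram:.2f} GiB of headers")
```

Small average block sizes are what make a 1 TB L2ARC expensive; with large records the overhead is much more modest.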

It is questionable whether OP has a use case for either SLOG or L2ARC in what I assume to be a home NAS.

I would have expected the parts of the drive to be visible as their own separate namespaces. An example of that can be seen here:

Perhaps it may be worth trying a BIOS update?
Your issues may, or may not, have something to do with that PCIe card interfering with the motherboard's ability to properly recognise what is connected.

Bingo. The Optane H20 is actually an interesting SLOG/L2ARC combination device provided that your PCIe bridge properly supports the required x2x2 bifurcation.

The problem is that the ASMedia chip on the expander is getting in the way of this.

But for the systems where it works, you get 32 GB of 3D XPoint and 1 TB of QLC NAND (which isn’t stellar, but you could undersize it, use it for apps, or just accept that it isn’t going to be super-fast).

The Optane H-series is an odd duck in that it literally shows up as two PCIe devices, not a single device with multiple namespaces. Hence the bifurcation requirements which are hard to satisfy on most motherboards, let alone PCIe add-in switch cards.
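
A quick sanity check that follows from this: with working bifurcation, each H20 module should contribute two NVMe controller functions to `lspci`. Here is a minimal sketch that counts them in pasted `lspci` output and reports how many halves appear to be missing. The function names are made up, and it counts every NVMe controller in the text it is given, so any other NVMe drives in the system would need to be excluded first.

```python
def nvme_controllers(lspci_text):
    """Return the PCI addresses of NVMe controller functions."""
    addresses = []
    for line in lspci_text.splitlines():
        address, _, description = line.partition(" ")
        if description.startswith("Non-Volatile memory controller"):
            addresses.append(address)
    return addresses


def missing_halves(lspci_text, h20_modules):
    """Each H20 module should expose two controllers (Optane + QLC)."""
    expected = 2 * h20_modules
    return expected - len(nvme_controllers(lspci_text))


# Sample text copied from the lspci output earlier in the thread.
sample = """\
84:08.0 PCI bridge: ASMedia Technology Inc. ASM2812 6-Port PCIe x4 Gen3 Packet Switch (rev 01)
85:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H20 with Solid State Storage [Pyramid Glacier] (rev 03)
86:00.0 Non-Volatile memory controller: Intel Corporation Optane NVME SSD H20 with Solid State Storage [Pyramid Glacier] (rev 03)
"""
print(nvme_controllers(sample))
print(missing_halves(sample, h20_modules=2))
```

With two modules behind the ASMedia switch, only two of the expected four controllers show up in that output, which matches the picture of one half per module being hidden.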
