NVMe Identify Controller failed

6X Crucial P310 4TB SSDs, running ZFS1 for more than one month.

What does this error mean?

LONG S.M.A.R.T. Test Logs: nvme0n1
smartctl failed for disk nvme0n1:
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error

It’s always very helpful if you can share details about your setup: whether TrueNAS is running bare-metal or under a hypervisor, and how the drives are connected (motherboard ports, PCIe HBA, USB enclosure, port multipliers, bifurcation setups, etc.).
This information can help to identify the root cause faster.

Regarding your SMART error:
It seems to mean that smartctl cannot send native NVMe admin commands to the drive.
This can happen when the NVMe device is not directly exposed to the OS, for example when:

the system is running inside a VM without full NVMe passthrough
the NVMe is behind a RAID/HBA that abstracts the device
the NVMe is inside a USB enclosure
a driver/compatibility issue prevents direct NVMe access

I think this does not necessarily mean the drive is failing; it may instead indicate a limitation in the way the device is connected.
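One way to see how the kernel itself views each controller is to read its sysfs attributes. This is only a rough sketch, assuming a Linux host that exposes /sys/class/nvme; the function names nvme_ctrl_report and read_attr are made up for illustration.

```shell
# Sketch: print the kernel's view of each NVMe controller from sysfs.
# Assumption: Linux with /sys/class/nvme; function names are hypothetical.

read_attr() {
    # Print a sysfs attribute with surrounding whitespace trimmed,
    # or "unknown" if the file cannot be read.
    local value
    if [ -r "$1" ] && value=$(cat "$1" 2>/dev/null); then
        printf '%s' "$value" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
    else
        printf 'unknown'
    fi
}

nvme_ctrl_report() {
    # Optional argument: base directory (defaults to /sys/class/nvme).
    local base="${1:-/sys/class/nvme}" ctrl
    for ctrl in "$base"/nvme*; do
        [ -d "$ctrl" ] || continue
        printf '%s state=%s transport=%s model=%s\n' \
            "$(basename "$ctrl")" \
            "$(read_attr "$ctrl/state")" \
            "$(read_attr "$ctrl/transport")" \
            "$(read_attr "$ctrl/model")"
    done
}
```

Calling nvme_ctrl_report with no arguments lists every controller the kernel knows about. A controller whose transport is pcie and whose state is live is directly attached and responding; a missing controller or a state other than live would fit a hardware-level dropout.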

If you can share more specifics about your hardware, software, and attachment method, people here may be able to give more accurate guidance.

EDIT: typical typo…
bad spellers of the World: UNTIE !

I am very new to TrueNAS. I have decades of computer and *NIX hands-on experience. This NAS project is new to me, including the hardware and software.

Yes, I know that adequate information drives the accuracy of a response, but I do not know how to gather all of the information that you deem necessary.

Here is what I can tell you:

Beelink ME mini 6-slot Home Storage NAS Mini PC
12GB LPDDR5-4800 memory (soldered) + 64GB eMMC + 2TB Crucial SSD

All six (6) slots are filled with identical:
Crucial 2280 NVMe M.2 SSD P310 4TB

This has been running without any known errors for more than one month.

Now, I am confused about this degraded status. All TrueNAS screens show no errors, except for this one degraded status.

What’s the SMART output for the NVMe in question?

smartctl -a /dev/nvme0

How does it compare to a working NVMe, for example smartctl -a /dev/nvme1?
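To make that comparison across all six drives in one go, something like the loop below could help. It is a sketch, not a tested tool: it relies on the exit-status bits documented in the smartctl man page (bit 1 for device open or identify failures, bit 2 for a failed SMART command), and SMARTCTL and check_nvme_smart are hypothetical names introduced here.

```shell
# Sketch: run smartctl against each NVMe controller and decode the exit status.
# SMARTCTL is a hypothetical override so the loop can be tried with a stub.
SMARTCTL="${SMARTCTL:-smartctl}"

check_nvme_smart() {
    # Arguments: controller device nodes, e.g. /dev/nvme0 /dev/nvme1
    local dev status
    for dev in "$@"; do
        $SMARTCTL -a "$dev" >/dev/null 2>&1
        status=$?
        if [ "$status" -eq 0 ]; then
            echo "$dev: no problems reported"
        elif [ $((status & 2)) -ne 0 ]; then
            # Bit 1: device open failed or no identify data was returned.
            echo "$dev: device open or identify failed (exit status $status)"
        elif [ $((status & 4)) -ne 0 ]; then
            # Bit 2: a SMART (or other) command to the device failed.
            echo "$dev: a SMART command failed (exit status $status)"
        else
            echo "$dev: readable, but other status bits are set (exit status $status)"
        fi
    done
}
```

Run as root, for example check_nvme_smart /dev/nvme0 /dev/nvme1 and so on for the remaining controllers. If only one controller reports an open or identify failure while the others are fine, that points at that drive or its slot rather than at the setup as a whole.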

The Beelink is a very compact device, and the 6 NVMe slots are most likely connected through one or more PCIe switches/expanders. Compact designs like this can cause hardware-level problems that show up in TrueNAS as disk dropouts or SMART errors.

Six Crucial P310 SSDs can also generate a lot of heat, and thermal throttling or even brief thermal shutdowns can make a drive disappear for a moment.
Power-rail limits (3.3V spikes) are another possibility in such a small system.
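If heat is a suspect, the drives that still respond can be checked for their current temperature and for how long they have spent above their warning and critical thresholds. The sketch below filters the health section that smartctl -A prints for NVMe devices; nvme_temp_summary is a hypothetical helper name, and the field labels are assumed to match smartctl 7.x output.

```shell
# Sketch: summarise temperature fields from "smartctl -A /dev/nvmeX" output.
# Reads smartctl output on stdin; the function name is hypothetical.
nvme_temp_summary() {
    awk -F': *' '
        /^Temperature:/                       { temp = $2 }
        /^Warning +Comp\. Temperature Time:/  { warn = $2 }
        /^Critical +Comp\. Temperature Time:/ { crit = $2 }
        END {
            # Fall back to "n/a" when a field was not present in the input.
            if (temp == "") temp = "n/a"
            if (warn == "") warn = "n/a"
            if (crit == "") crit = "n/a"
            printf "temperature=%s warning_minutes=%s critical_minutes=%s\n", temp, warn, crit
        }
    '
}
```

Usage would be along the lines of smartctl -A /dev/nvme1 | nvme_temp_summary. Non-zero warning or critical minutes on the healthy drives would suggest the enclosure runs hot enough to be a plausible cause.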

According to your screenshots, all drives are currently online.
As @Fleshmauler mentioned, can you access SMART data from the other NVMe devices via smartctl?

To see what ZFS thinks happened, run:

zpool status -v Pool_6SSDs_x_4TB

AFAIK TrueNAS does not clear the degraded state automatically, so after verifying everything looks healthy, you can try to reset the error counters with:

zpool clear Pool_6SSDs_x_4TB
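Before clearing anything, it may help to confirm that no device is still in a non-ONLINE state or carrying error counts. The filter below is a rough sketch that reads zpool status output on stdin and prints only the config rows that look unhealthy; zpool_problem_rows is a hypothetical name, and the parsing assumes the usual NAME/STATE/READ/WRITE/CKSUM table layout.

```shell
# Sketch: list config rows from "zpool status" that are not ONLINE
# or that have non-zero READ/WRITE/CKSUM counters.
# Reads zpool status output on stdin; the function name is hypothetical.
zpool_problem_rows() {
    awk '
        # The device table starts after the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # A blank line ends the device table.
        in_config && NF == 0          { in_config = 0 }
        in_config && NF >= 5 && ($2 != "ONLINE" || $3 != 0 || $4 != 0 || $5 != 0) {
            print $1, $2, $3, $4, $5
        }
    '
}
```

For example: zpool status -v Pool_6SSDs_x_4TB | zpool_problem_rows. If this prints nothing, every device is ONLINE with zero counters, and zpool clear should only be resetting stale state.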

What’s odd is that the device ran fine for about a month. That usually points to an intermittent thermal or power issue that appears only under certain workloads or temperatures.
Could happen again…

EDIT:
Sorry… I looked at the screenshot again and I can see that nvme0n1 is marked as “removed.”

The first thing you should try is simply bringing the drive back online, ideally through the GUI.

If the system still detects the NVMe device, this should reattach it to the pool and start a resilver. If the “Online” option does not appear, the disk may not be detected at the hardware level, which would point to a different issue… (maybe with the disk itself).
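If the GUI option is not available, the rough command-line equivalent is zpool online followed by another look at the pool status. The wrapper below is only a sketch: bring_online and the ZPOOL override are hypothetical names, and the device argument must be whatever identifier zpool status shows for the removed disk, which I cannot tell from here.

```shell
# Sketch: bring a removed device back online, then show the pool status.
# ZPOOL is a hypothetical override so the wrapper can be tried with a stub.
ZPOOL="${ZPOOL:-zpool}"

bring_online() {
    # Usage: bring_online <pool> <device-as-shown-in-zpool-status>
    if [ "$#" -ne 2 ]; then
        echo "usage: bring_online <pool> <device>" >&2
        return 2
    fi
    local pool="$1" dev="$2"
    # Ask ZFS to reattach the device; stop here if that fails.
    $ZPOOL online "$pool" "$dev" || return 1
    # Show the result, including any resilver that was started.
    $ZPOOL status -v "$pool"
}
```

Called as bring_online Pool_6SSDs_x_4TB followed by the device identifier. If zpool online itself fails, that is consistent with the disk not being detected at the hardware level.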

Have you performed any research? Here is why:

This “may” be part of the problem. I understand the company was replacing those units that had power issues. This could be your issue, but I do not have one of these devices so I can’t really say.
