I have a NAS that has USB/TB4 ports.
I have an NVIDIA GPU, a BMC 10Gb network card, and a Mellanox ConnectX-4 Lx, all connected via TB4/USB4.
None of these devices work on TrueNAS.
All of them work fine with the right mix of Linux kernels. For example, the BMC and NVIDIA cards work fine this way on ZimaOS.
The common root of all these failures is “Unable to change power state from D3cold to D0, device inaccessible”. I have had this issue on Ubuntu 24.04, where it was solved by upgrading to kernel 6.8.4-060804.
Reference: Ubuntu 24.04 - Unable to change power state from D3cold to D0, device inaccessible - Graphics / Linux / Linux - NVIDIA Developer Forums
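As a rough way to compare a running kernel against the version that fixed things on Ubuntu, here is a minimal Python sketch. The 6.8.4 threshold is only what worked in that Ubuntu case; whether it carries over to other distributions or to TrueNAS kernels is an assumption. The helper names and the second release string in the checks are illustrative, not taken from any of the systems above.

```python
import platform
import re

# Kernel that resolved the D3cold error on Ubuntu 24.04 (from the post above).
# Assumption: other distributions may need a different version or a backport.
FIXED_ON_UBUNTU = (6, 8, 4)

def kernel_tuple(release):
    """Turn a release string such as '6.8.4-060804' into (6, 8, 4)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level (e.g. '6.10') is treated as 0.
    return tuple(int(g) if g else 0 for g in m.groups())

def at_least(release, minimum=FIXED_ON_UBUNTU):
    """True if the release is the same as or newer than the minimum."""
    return kernel_tuple(release) >= minimum

if __name__ == "__main__":
    running = platform.release()
    print(running, "meets 6.8.4:", at_least(running))
```

Running it on the NAS prints the running kernel and whether it is at or above 6.8.4, which is only a first hint since vendors backport fixes.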
With the rise of USB 40Gbps-capable hardware (i.e. USB4 / TB4), the PCIe tunneling this technology enables is interesting for multiple NAS scenarios in the coming year. I don’t think this is urgent to fix. I would advocate targeting the first 2025 release, as the first round of USB4/TB4 motherboards supporting the software connection manager will be released in Sept / Oct 2024.
For the NVIDIA 2080 Ti, dmesg shows only this (device 25 is the NVIDIA card):
[ 471.106763] pci 0000:25:00.2: Unable to change power state from D3cold to D0, device inaccessible
[ 583.781488] pci 0000:25:00.2: Unable to change power state from D3cold to D0, device inaccessible
For the Mellanox, dmesg shows:
root@truenas[~]# dmesg | grep mlx
[ 1.536529] mlx5_core 0000:39:00.0: firmware version: 14.32.1010
[ 1.536568] mlx5_core 0000:39:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.2 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link)
[ 3.546198] mlx5_core 0000:39:00.0: poll_health:819:(pid 0): Fatal error 1 detected
[ 3.546234] mlx5_core 0000:39:00.0: print_health_info:423:(pid 0): PCI slot is unavailable
[ 62.582296] mlx5_core 0000:39:00.0: wait_func:1172:(pid 257): INIT_HCA(0x102) timeout. Will cause a leak of a command resource
[ 62.582308] mlx5_core 0000:39:00.0: mlx5_function_open:1242:(pid 257): init hca failed
[ 62.597225] mlx5_core 0000:39:00.0: probe_one:1952:(pid 257): mlx5_init_one failed with error code -110
[ 62.597250] mlx5_core 0000:39:00.0: mlx5_fw_fatal_reporter_err_work:679:(pid 98): health works are not permitted at this stage
[ 62.598666] mlx5_core: probe of 0000:39:00.0 failed with error -110
[ 62.600153] mlx5_core 0000:39:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 62.600263] mlx5_core 0000:39:00.1: mlx5_pci_vsc_init:61:(pid 257): Failed to get valid vendor specific ID
[ 62.600271] mlx5_core 0000:39:00.1: firmware version: 65535.65535.65535
[ 62.600277] mlx5_core 0000:39:00.1: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s PCIe x4 link at 0000:00:07.2 (capable of 4032.000 Gb/s with 64.0 GT/s PCIe x63 link)
[ 82.602352] mlx5_core 0000:39:00.1: wait_fw_init:206:(pid 257): Waiting for FW initialization, timeout abort in 100s (0xffffffff)
[ 102.606349] mlx5_core 0000:39:00.1: wait_fw_init:206:(pid 257): Waiting for FW initialization, timeout abort in 79s (0xffffffff)
[ 122.610351] mlx5_core 0000:39:00.1: wait_fw_init:206:(pid 257): Waiting for FW initialization, timeout abort in 59s (0xffffffff)
[ 142.614353] mlx5_core 0000:39:00.1: wait_fw_init:206:(pid 257): Waiting for FW initialization, timeout abort in 39s (0xffffffff)
[ 162.618353] mlx5_core 0000:39:00.1: wait_fw_init:206:(pid 257): Waiting for FW initialization, timeout abort in 19s (0xffffffff)
[ 182.610350] mlx5_core 0000:39:00.1: mlx5_function_enable:1145:(pid 257): Firmware over 120000 MS in pre-initializing state, aborting
[ 182.610408] mlx5_core 0000:39:00.1: probe_one:1952:(pid 257): mlx5_init_one failed with error code -16
[ 182.614454] mlx5_core: probe of 0000:39:00.1 failed with error -16
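The bandwidth figures in the mlx5_core lines above can be reproduced with simple arithmetic: transfer rate times lane count, reduced by the line-encoding overhead. The sketch below is Python; the encoding table and the per-lane integer truncation are assumptions chosen because they reproduce the logged numbers, not something the log states. Under those assumptions, the second port's "64.0 GT/s PCIe x63" capability and firmware version 65535.65535.65535 look like what all-ones register reads would decode to, which would fit a device that is inaccessible.

```python
# (payload bits, total bits) per transfer rate in MT/s.
# Assumption: 8b/10b for 2.5 GT/s, 128b/130b for 8.0 GT/s, no overhead at 64 GT/s.
ENCODING = {
    2500: (8, 10),
    8000: (128, 130),
    64000: (1, 1),
}

def link_bandwidth_mbps(rate_mts, lanes):
    """Usable link bandwidth in Mb/s, truncating per lane (assumption)."""
    payload, total = ENCODING[rate_mts]
    per_lane = rate_mts * payload // total
    return per_lane * lanes

def fmt(mbps):
    """Format Mb/s the way the log prints it, e.g. 8000 -> '8.000 Gb/s'."""
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"

print(fmt(link_bandwidth_mbps(2500, 4)))    # negotiated link at 0000:00:07.2
print(fmt(link_bandwidth_mbps(8000, 8)))    # what the card reports as capable
print(fmt(link_bandwidth_mbps(64000, 63)))  # the implausible x63 capability
```

The first line matches the "8.000 Gb/s available" figure, so the tunneled link is running at a 2.5 GT/s x4 rate rather than the card's 8.0 GT/s x8 capability.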
For the BMC card, dmesg shows:
root@truenas[~]# dmesg | grep bnx2x
[ 1.479752] bnx2x 0000:0f:00.0: msix capability found
[ 1.480196] bnx2x 0000:0f:00.0: part number 0-0-0-0
[ 11.510310] bnx2x: [bnx2x_fw_command:3054(eth%d)]FW failed to respond!
[ 11.510317] bnx2x 0000:0f:00.0 (unnamed net_device) (uninitialized): bc 7.13.75
[ 11.510322] bnx2x: [bnx2x_fw_dump_lvl:794(eth%d)]\x013MCP PC at 0xffffffff
[ 11.510324] bnx2x: [bnx2x_fw_dump_lvl:815(eth%d)]Trace buffer signature is missing.
[ 11.510326] bnx2x: [bnx2x_prev_unload:10893(eth%d)]MCP response failure, aborting
[ 11.510474] bnx2x 0000:0f:00.1: msix capability found
[ 11.510485] bnx2x 0000:0f:00.0: msix capability found
[ 11.510780] bnx2x 0000:0f:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 11.510789] bnx2x 0000:0f:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 11.510945] bnx2x: PCI device error, probably due to fan failure, aborting
[ 11.511035] bnx2x: PCI device error, probably due to fan failure, aborting
(Funny, as the card has no fan and works perfectly with ZimaOS, a no-name startup NAS OS.)
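To see at a glance which devices hit the shared D3cold error, the dmesg output can be filtered by driver and PCI address. This is a minimal Python sketch: the regular expression is written against the line format shown in the logs above and is an assumption about what dmesg prints on other systems. The function name and the SAMPLE text (copied from the logs in this post) are for illustration only.

```python
import re
from collections import defaultdict

# Matches lines like:
# [ 11.510780] bnx2x 0000:0f:00.0: Unable to change power state from D3cold to D0, ...
D3COLD_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"Unable to change power state from D3cold to D0"
)

def d3cold_failures(dmesg_text):
    """Map (driver, PCI address) to the timestamps of each D3cold failure."""
    hits = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = D3COLD_RE.match(line.strip())
        if m:
            hits[(m["driver"], m["bdf"])].append(float(m["ts"]))
    return dict(hits)

SAMPLE = """\
[ 471.106763] pci 0000:25:00.2: Unable to change power state from D3cold to D0, device inaccessible
[ 583.781488] pci 0000:25:00.2: Unable to change power state from D3cold to D0, device inaccessible
[ 62.600153] mlx5_core 0000:39:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 11.510780] bnx2x 0000:0f:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 11.510945] bnx2x: PCI device error, probably due to fan failure, aborting
"""

for (driver, bdf), stamps in sorted(d3cold_failures(SAMPLE).items()):
    print(f"{bdf} ({driver}): {len(stamps)} failure(s)")
```

On a live system the same function could be fed the full text of `dmesg` instead of SAMPLE.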