No network: Chelsio NIC fails to initialize since upgrading to SCALE

System:

  • Supermicro H13SSL-NT with AMD EPYC 9124 (16-core 3GHz)

  • 256GB DDR5 ECC JEDEC RAM

  • Chelsio T6225-CR 2x100Gb NIC

  • LR-Link LRNV9F24 PCIe4.0x16 Retimer adapter

  • Boot: Seagate 512GB M.2 FireCuda 530

  • 2 x Broadcom LSI 9500-16i HBAs (fw ver 37.00.00.00)

  • AIC BBPHHD40013B backplane (LSI 35X28)

  • 60 x Seagate Exos X20, 20TB HDDs (6 × 10-wide RAIDZ2)

  • 2 x Micron 7450 U.3 3TB SSDs for metadata (special vdev, mirror) (latest fw)

  • 1 x 16TB Solidigm D5-P5336 PCIe SSD (it’s a scratch disk for users)

  • 4 x AcBel R1CA2801A PSU

Behavior:

Since upgrading from CORE to SCALE [25.10.1], the T62100 NIC has stopped working, so the NAS is mostly unusable (onboard NICs notwithstanding).

  • lspci sees it normally

  • ifconfig or ip link don’t see it at all

  • dmesg and lspci -vv keep reporting these errors:

cxgb4 0000:01:00.0: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]

cxgb4 0000:01:00.0: cannot map device registers

So the hardware is seen, but the driver fails to initialize.
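One way to confirm "seen but not initialized" from sysfs, without relying on ifconfig, is to look for Chelsio PCI functions that have no driver bound. Below is a minimal Python sketch, assuming a standard Linux sysfs layout under /sys; the function name chelsio_functions is hypothetical, and 0x1425 is Chelsio's PCI vendor ID. Some Chelsio cards expose several PCI functions and not every one is expected to bind cxgb4, so an unbound function is a hint rather than proof of a failed probe.

```python
from pathlib import Path
from typing import List, Optional, Tuple

CHELSIO_VENDOR_ID = "0x1425"  # PCI vendor ID assigned to Chelsio Communications


def chelsio_functions(sysfs_root: str = "/sys") -> List[Tuple[str, Optional[str]]]:
    """Hypothetical helper: list Chelsio PCI functions and their bound driver.

    Each entry is (PCI address, driver name), with None when no driver is
    bound (for example after a failed probe, or if the module never loaded).
    """
    found = []
    devices = Path(sysfs_root) / "bus" / "pci" / "devices"
    for dev in sorted(devices.iterdir()):
        vendor = (dev / "vendor").read_text().strip()
        if vendor != CHELSIO_VENDOR_ID:
            continue
        # <device>/driver is a symlink to the bound driver's directory
        # (e.g. .../drivers/cxgb4); it does not exist when nothing is bound.
        driver_link = dev / "driver"
        driver = driver_link.resolve().name if driver_link.exists() else None
        found.append((dev.name, driver))
    return found
```

On a system in the state described above, one would expect the function at 0000:01:00.0 to come back with None as its driver.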

Tried:

  1. Moving it to a different slot.
  2. These BIOS settings:
    • Above 4G Decoding → Enabled

    • Re-Size BAR → Disabled

    • SR-IOV / BME DMA mitigation → Disabled

    • IOMMU → reverts to Auto (due to dependency on APIC?)

    • PCIe AER → tried disabled & enabled

No change.

  3. Added the kernel parameters pci=realloc=on and pci=nocrs.

After that, I was able to get

pci [..] BAR 0 [mem … 64bit]: assigned

BAR 2, BAR 4, etc. assigned as well, but now I’m stuck with:

cxgb4: probe with driver cxgb4 failed with error -5

And I’m out of ideas that don’t involve replacing the NIC. This one should work; I picked Chelsio because it is supported and recommended by iX.

Should I open a bug at this point? I can provide lspci and dmesg.

Still down…
I booted into a live Ubuntu (26.04) image, and it picked up the NIC right away, so I know the NIC hardware is fine.

Updated to 25.10.3 now that it’s in the general channel. No change. It uses the same kernel version, so I didn’t expect much, but I had to try.

So this is something between the driver, the Linux kernel, and my particular hardware setup. I had a Broadcom NIC before TrueNAS and swapped it for a Chelsio specifically for TrueNAS :frowning: But I’m out of my depth on what else I can do to make this NIC work.

You have a Chelsio T6225-CR 2x100Gb NIC under system but mention T62100 NIC under Behavior. Do you have two models or just one?
You can try using Report a Bug along with submitting a Debug dump. If you do, post a link or the ticket number.
I didn’t see anything helpful when I searched the forums for those two Chelsio model numbers. Are the Chelsios tied to branded transceivers like some Intel fiber NICs? I had to bypass the unsupported-SFP+ restriction on TrueNAS Core and Scale before I got Intel-branded ones.

I don’t think the included drivers have changed for the Chelsio models you listed, so it should work. Did it start acting up immediately when you moved from Core to Scale, and on every version you upgraded through?

Posting the results from sudo dmesg | grep cxgb4 may help others to see normal and abnormal messages.
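To make the output of sudo dmesg | grep cxgb4 easier to compare between boots, here is a minimal Python sketch that keeps only the cxgb4 lines and labels the three failure messages quoted earlier in this thread. The names scan_cxgb4 and diagnose are hypothetical; the patterns cover only those specific messages, and anything else is labeled "info". It takes the log as a string rather than running dmesg itself.

```python
import os
import re

# Hypothetical helpers; the patterns only cover the messages quoted in this thread.
# [’'] accepts both a plain apostrophe and the typographic one forums often insert.
BAR_MAP_FAIL = re.compile(r"can[’']t ioremap BAR (\d+)")
REG_MAP_FAIL = re.compile(r"cannot map device registers")
PROBE_FAIL = re.compile(r"probe with driver cxgb4 failed with error (-?\d+)")


def diagnose(line: str) -> str:
    """Label one log line; anything not recognised is reported as "info"."""
    match = BAR_MAP_FAIL.search(line)
    if match:
        return f"BAR {match.group(1)} could not be ioremapped"
    if REG_MAP_FAIL.search(line):
        return "cxgb4 could not map its device registers"
    match = PROBE_FAIL.search(line)
    if match:
        # Probe failures are reported as negative errno values (-5 is EIO).
        errno_value = abs(int(match.group(1)))
        return f"cxgb4 probe failed: {os.strerror(errno_value)}"
    return "info"


def scan_cxgb4(log_text: str) -> list:
    """Return (line, label) pairs for every log line that mentions cxgb4."""
    return [
        (line.strip(), diagnose(line))
        for line in log_text.splitlines()
        if "cxgb4" in line
    ]
```

For example, it can be fed subprocess.run(["dmesg"], capture_output=True, text=True).stdout, run as root since dmesg may be restricted. The probe error is translated with os.strerror, so -5 shows up as "Input/output error" on Linux.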


Apologies, the 6225 was a mistake. It’s a T62100-LP-CR.

Chelsio support found a solution (or workaround): pci=realloc=off in the kernel boot flags. I knew about pci=realloc=on, so I assumed off was the default, and it wasn’t on my radar to try.

As soon as I added that specific flag, the driver was able to initialize properly and everything fell into place.
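Since the fix came down to a kernel boot flag, it can be worth confirming after a reboot which pci= options the running kernel actually received. The following is a minimal Python sketch, assuming a Linux system where /proc/cmdline holds the boot command line; the helper names are hypothetical. The kernel's pci= parameter takes a comma-separated list of options, and a bare pci=realloc is documented as equivalent to realloc=on. If I read the kernel's Kconfig correctly, kernels built with CONFIG_PCI_REALLOC_ENABLE_AUTO may enable reallocation on their own when realloc is not given, which could explain why "not set" did not behave like "off" here; treat that as an assumption rather than a diagnosis.

```python
from pathlib import Path

# Hypothetical helpers for checking which pci= options the running kernel got.


def pci_options(cmdline: str) -> dict:
    """Collect options from every pci=... token on a kernel command line.

    pci= takes a comma-separated list (e.g. pci=realloc=off,nocrs); tokens may
    also repeat (pci=realloc=on pci=nocrs). Bare flags such as nocrs map to "",
    and a later value overrides an earlier one.
    """
    options = {}
    for token in cmdline.split():
        if not token.startswith("pci="):
            continue
        for opt in token[len("pci="):].split(","):
            if not opt:
                continue
            key, _, value = opt.partition("=")
            options[key] = value
    return options


def realloc_setting(cmdline: str) -> str:
    """Return the explicit realloc value, or a note if it was not passed."""
    options = pci_options(cmdline)
    if "realloc" not in options:
        return "not set on the command line (kernel default applies)"
    # A bare "pci=realloc" counts as realloc=on.
    return options["realloc"] or "on"


def current_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the running kernel's boot command line (Linux only)."""
    return Path(path).read_text().strip()
```

With the workaround in place, print(realloc_setting(current_cmdline())) should print off.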

Now off to performance tuning and troubleshooting… :slight_smile:
