Second SFP+ never reports LINK UP

I have a question which is repeated often enough (on the forum and elsewhere), but somehow is never properly resolved or explained.

I have freshly installed TrueNAS SCALE on a Supermicro X11SPH-NCTPF motherboard. This board includes two 10 Gbit SFP+ ports (+ IPMI).

Both SFP+ ports are connected to the MikroTik 10 Gbit switch using identical gtek optical pigtail cables (each to the port corresponding to the LAN subnet I want it on).

My MikroTik reports that both links are successfully connected, but only port eno1 reports LINK STATE UP. Ping from the network towards eno2 does not work; a local ping from the shell to the eno2 IP address responds correctly.
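For anyone wanting to check the same thing without the TrueNAS UI: the kernel's own view of the link can be read straight from sysfs. Below is a minimal Python sketch, assuming the standard Linux `/sys/class/net/<iface>/operstate` and `carrier` files. The helper name `link_state` and its `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def link_state(iface, sysfs_root="/sys/class/net"):
    """Return (operstate, carrier) for one interface as the kernel reports it.

    carrier is True/False, or None when the file cannot be read
    (the kernel refuses the read while the interface is administratively down).
    """
    base = Path(sysfs_root) / iface
    operstate = (base / "operstate").read_text().strip()
    try:
        carrier = (base / "carrier").read_text().strip() == "1"
    except OSError:
        carrier = None
    return operstate, carrier


if __name__ == "__main__":
    # Interface names taken from the post above.
    for name in ("eno1", "eno2"):
        try:
            print(name, link_state(name))
        except OSError as exc:
            print(name, "not readable:", exc)
```

If the switch shows the link as connected while `carrier` stays at 0 on the host, the disagreement sits somewhere between the SFP+ cage and the driver, not in the IP configuration.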

Both ports have static addresses. I have tried replacing the pigtail with a DAC cable and even used a transceiver to connect over RJ45. I have also swapped the pigtails with one another, and still only the eno1 link went up. There is no link aggregation, no bridges. Physical connections are 100% ok, lights on the eth ports are lit, so some kind of connection is there.

This is the ip link view:

At some point eno2 worked for a while, but it stopped again after a TrueNAS reboot.

I see this as the lspci response (a little confused, since this mb should have Inphi CS4227 and not X722):

This is ethtool for eno1

This is ethtool for eno2

Any ideas on why this happens and how to resolve this issue (what to try, how to fix)?

Thanks!

p.s. update 2024-08-29: I have checked in the mb BIOS and it shows both NICs as operational and connected, so the hardware side appears to work correctly. My best guess is that for some reason TrueNAS SCALE does not recognize the SFP+ in the second NIC (although I have tried three different ways to connect to the switch and additionally tried to use the SFP from eno1 in eno2, without success).

Couple of things.

I have found auto-negotiation to be less than stellar for Mikrotik. Set it to 10GbE fixed on both ends. Add flow control, custom MTU if you like.
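A rough sketch of what that looks like on the Linux side, in Python. It only builds and prints the commands (it does not run them). The `ethtool -s`, `ethtool -A` and `ip link set ... mtu` forms are standard, but the helper name `fixed_link_commands`, the interface `eno2` and the MTU of 9000 are just illustrative assumptions. Whether a given NIC or SFP+ module accepts autoneg off is not guaranteed, and the MikroTik end has to be set separately. Also, on TrueNAS the network config is managed by the UI, so treat this as a test aid only.

```python
import shlex


def fixed_link_commands(iface, speed_mbps=10000, mtu=None, flow_control=False):
    """Build (but do not run) the commands for a fixed-speed link."""
    cmds = [
        ["ethtool", "-s", iface, "speed", str(speed_mbps),
         "duplex", "full", "autoneg", "off"],
    ]
    if flow_control:
        # Enable pause frames in both directions.
        cmds.append(["ethtool", "-A", iface, "rx", "on", "tx", "on"])
    if mtu is not None:
        cmds.append(["ip", "link", "set", "dev", iface, "mtu", str(mtu)])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for cmd in fixed_link_commands("eno2", mtu=9000, flow_control=True):
        print(shlex.join(cmd))
```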

I don’t see the IP addresses for these units so I hope you have them on completely different subnets. Otherwise, it will not work.
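If you want to double-check the subnet part quickly, here is a small sketch using Python's standard `ipaddress` module. The addresses are hypothetical (not taken from this thread) and `subnets_overlap` is just a name I picked.

```python
import ipaddress


def subnets_overlap(cidr_a, cidr_b):
    """True when two interface addresses (CIDR form) sit in overlapping networks."""
    net_a = ipaddress.ip_interface(cidr_a).network
    net_b = ipaddress.ip_interface(cidr_b).network
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    # Hypothetical addresses for eno1 and eno2.
    print(subnets_overlap("192.168.10.5/24", "192.168.20.5/24"))  # False
    print(subnets_overlap("192.168.10.5/24", "192.168.10.6/24"))  # True
```

A result of True for the two NIC addresses would mean both interfaces share a subnet, which is the case to avoid here.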

Some Mikrotiks allow link aggregation, IIRC, but I would stay away from that unless you can justify losing more hair from your head to implement it. No doubt the Mikrotik gods here can figure this out in a heartbeat, but mere mortals like myself earned nothing but misery for something as simple as a failover LAGG.

I don’t have that experience with MikroTik, but again, exactly the same setup (same NIC, same SFP pigtail, same MikroTik switch) works on eno1 without a problem.

Yes, of course they are on different subnets. Again, my problem is that TrueNAS, although connected, reports that my second NIC eno2 is not connected at all (which is wrong). Also, at some point eno2 worked but stopped after a reboot.

p.s. I am not interested in link aggregation because it would not be beneficial for my purpose. I am interested in an MPIO iSCSI connection, for which I need two NICs operating.

Apologies, recovering from C19 vaccine here. I’m at the end of my rope.

Perhaps @pmh can help further?

Additional update 2024-08-31:

After lengthy testing and googling/researching, it appears that my problem is actually related to specifics of the Intel X722 eth NIC on the Supermicro motherboard and not to TrueNAS itself (I managed to easily reproduce the same behaviour on a plain Debian install).
Research shows that many people have the same or similar problems, which might be connected to the particular firmware of the NIC.

In my case it is particularly confusing because the X722 is a dual-port NIC and apparently only the first port works correctly with the SFP+ and DAC cable I tried to use. To make it even more confusing, I managed to get the 2nd port of the NIC to work at some point, the NIC itself correctly reports the SFP+ as inserted, and my network equipment tells me that both ends are connected, but on the software level it simply does not work.

There are newer firmware options available for my X722/mb combo and I am waiting to try them, but I have also purchased and am waiting for additional (different, and specifically Intel-compatible) SFP+ devices. I also ordered an additional XXV710 dual-port card in case all of the above fails for some reason.
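To keep track of which driver and NIC firmware each port reports before and after an upgrade, something like the following Python sketch could help. It assumes `ethtool` is installed and relies on the `key: value` lines that `ethtool -i` prints; the function names are made up for illustration.

```python
import subprocess


def parse_ethtool_info(text):
    """Turn the 'key: value' lines printed by `ethtool -i` into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface):
    """Run `ethtool -i <iface>` and parse the result."""
    result = subprocess.run(
        ["ethtool", "-i", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)


if __name__ == "__main__":
    for name in ("eno1", "eno2"):
        try:
            info = driver_info(name)
            print(name, info.get("driver"), info.get("firmware-version"))
        except (OSError, subprocess.CalledProcessError) as exc:
            print(name, "query failed:", exc)
```

Saving this output before and after flashing makes it easy to confirm the new firmware actually took effect on both ports.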

I will update this as soon as I narrow down the exact reason why it did not work.

Was going to suggest it may be a faulty NIC…

TrueNAS Scale is based on Debian, so it still could be a Debian issue.

Would be worth trying a distro with a bleeding edge kernel if you haven’t already

Tried with Windows and multiple drivers. In the meantime I also tested additional combinations of DAC/pigtails and switches, and upgraded the mb firmware and NIC firmware…

Based on suggestions from Supermicro, I started an RMA for the mb in the storage server.

In the end, investigation showed that the Supermicro motherboard was faulty (in a really weird way).

The replacement mb resolved the issue.
