I’ve got a Mellanox ConnectX-4 in my system. It’s working great.
I vaguely recall an earlier discussion here about there being a tool included with TrueNAS to check the status of the Mellanox NIC or update its firmware, instead of having to download the tool from NVIDIA’s website.
But search is proving unavailing.
Is this an actual thing, or did I just hallucinate it? I feel like this was a conversation with either @dan or @etorix.
I don’t remember discussing it here (or on the old forum), but I do have CX2 cards in two of my Proxmox nodes (they were the only 10G cards I could find that fit in the mezzanine slots in my Dell C6220), and had some questions about them here:
From what I see there, it looks like either mstflint or mft would be the tools, but neither of those appears to be included in 25.04.1.
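For just checking what firmware the card is running (as opposed to flashing it), the driver already reports a version through `ethtool -i`, so here is a rough sketch of reading it that way. It assumes `ethtool` is present on the system, and the interface name and sample values in the usage note are made up for illustration.

```python
import subprocess


def parse_ethtool_info(text: str) -> dict[str, str]:
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so "bus-info: 0000:04:00.0" survives.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def firmware_tuple(firmware_version: str) -> tuple[int, ...]:
    """Convert a string like '14.32.1010 (PSID)' into (14, 32, 1010)."""
    parts = firmware_version.split()
    if not parts:
        raise ValueError("empty firmware version string")
    return tuple(int(piece) for piece in parts[0].split("."))


def read_nic_info(interface: str) -> dict[str, str]:
    """Run `ethtool -i <interface>`; assumes ethtool is on PATH."""
    result = subprocess.run(
        ["ethtool", "-i", interface],
        capture_output=True, text=True, check=True,
    )
    return parse_ethtool_info(result.stdout)
```

Something like `firmware_tuple(read_nic_info("enp4s0")["firmware-version"])` (with a hypothetical interface name) could then be compared against the version listed on the download page. This only reads the version; it is no substitute for mstflint or MFT when it comes to flashing.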
I might make a feature request for this. There’s an updated firmware out for my Mellanox ConnectX-4, but I have no way of installing it within TrueNAS CE, since I’d need to be able to download a .deb and run a dpkg command to get the utility from NVIDIA.
A few minutes of downtime won’t kill me, so I can boot into a Debian ISO and do it there, but that’s not an ideal solution for a lot of deployments.
Not that I’m recommending this or have ever done it…
But you can decompress a .deb file and place the resulting binary on the filesystem. Not on the boot pool.
And really probably don’t do that if you’re running TrueNAS in HA.
And of course the kernel in TrueNAS is compiled for TrueNAS with some tweaks.
But it works for lsusb.
Edit to add explicitly — I wouldn’t recommend anyone reading this do this unless you really know what you’re doing. I happen to recognize @SinisterPisces from another ZFS related forum and commented based on that. Usual caveats apply — one does this at one’s own risk.
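For anyone wondering what "decompress a .deb" means in practice: a .deb is an `ar` archive holding `debian-binary`, a control tarball, and a `data.tar.*` payload. Below is a minimal sketch that unpacks only the payload into a directory of your choosing without involving dpkg at all. The function names are mine, and a zstd-compressed payload may not open with older Python versions.

```python
import io
import tarfile
from pathlib import Path

AR_MAGIC = b"!<arch>\n"


def iter_ar_members(blob: bytes):
    """Yield (name, data) for each member of an ar archive such as a .deb."""
    if not blob.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + 60 <= len(blob):
        header = blob[offset:offset + 60]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + 60
        yield name, blob[start:start + size]
        offset = start + size + (size % 2)  # members are 2-byte aligned


def extract_deb_payload(deb_path: str, dest_dir: str) -> list[str]:
    """Unpack the data.tar.* member of a .deb under dest_dir; return member names."""
    blob = Path(deb_path).read_bytes()
    for name, data in iter_ar_members(blob):
        if name.startswith("data.tar"):
            with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tar:
                names = tar.getnames()
                # Use the safer "data" extraction filter where Python offers it.
                kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
                tar.extractall(dest_dir, **kwargs)
            return names
    raise ValueError("no data.tar member found")
```

Point `dest_dir` at a dataset on a data pool, not the boot pool, per the caveats above. Nothing here runs the package's install scripts, so anything they would normally set up is on you.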
Not practically, no, assuming the NIC is being used by the TrueNAS host itself. If the NIC belongs to a VM or container via passthrough, that’s different, because then the VM or container could download and run the necessary software to flash it.
But if it’s a NIC that belongs to TrueNAS itself and is in active use, then no.
For instance, on my system, my 10 GbE NIC provides 2x10 GbE (LACP) access to my file server on my storage VLAN. I would have to break that configuration to pass it into a VM or container just to update the firmware, and then repair that configuration to get normal function back.
That’s overcomplicated, will lead to more downtime, and is error-prone.
It’s faster to boot off a Debian or Mint ISO, run the necessary installer, and then reboot into TrueNAS. TrueNAS will continue on like nothing happened.
But that’s still not an easy or straightforward solution. Most people don’t have a monitor, keyboard, and mouse hooked up to their TrueNAS server, which Debian would need.
I’m a single person in a home office with a bunch of home-based self-hosted projects. I think I’ll live.
But yes, for some users (many users?) needing to reboot the system at all is a problem. Especially when the firmware flasher is designed to flash and reset the card without rebooting.
We’ve had feature requests to add other common administration CLI tools to TrueNAS. I think I’m going to open one for this. Before I do, though, I wonder if @HoneyBadger might have an opinion on whether that’s worthwhile.
They provide a .deb for a current and LTS version of MFT on that page.
I’m not sure which one would be a better fit for TrueNAS.
I’m not aware of an actual repo that can be added to apt, and it doesn’t appear to be in the standard Debian repos for Debian 12 or 13.
(They also provide RPMs, in case you’d like to use a Red Hat-based machine.)
Installing the .deb was as simple as running dpkg -i on my Proxmox node. The firmware flasher takes the card offline, flashes the firmware, and resets the card’s PCIe connection to bring it back up, so there’s no reboot required.
It is slow when it’s running though, so don’t be alarmed if it takes a while. Otherwise, it’s a really svelte tool. I actually enjoyed using it.
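As a rough illustration of what the flash step looks like on the command line, here is a sketch using the open-source mstflint tool's `query` and `burn` subcommands; the MFT tools from the .deb use different command names. The PCI address and image file name are placeholders, and the wrapper defaults to a dry run that only prints the command.

```python
import shlex
import subprocess


def mstflint_query_cmd(pci_address: str) -> list[str]:
    """Command that reports the firmware currently on the card."""
    return ["mstflint", "-d", pci_address, "query"]


def mstflint_burn_cmd(pci_address: str, image_path: str) -> list[str]:
    """Command that writes a firmware image to the card."""
    return ["mstflint", "-d", pci_address, "-i", image_path, "burn"]


def run_step(cmd: list[str], dry_run: bool = True) -> str:
    """Print the command; only execute it when dry_run is False."""
    printable = shlex.join(cmd)
    print(printable)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return printable


if __name__ == "__main__":
    # Placeholder address and file name; substitute your own values.
    run_step(mstflint_query_cmd("04:00.0"))
    run_step(mstflint_burn_cmd("04:00.0", "fw-ConnectX4.bin"))
```

As far as I know, mstflint on its own only writes the image, so the new firmware would not load until the card is reset or the machine is rebooted. The MFT flasher described above handles that reset for you.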
EDIT: Re-reading your reply, @HoneyBadger, makes me think I’ve missed some way to add an official NVIDIA repo to my Debian-based systems that might benefit from it. Do you have a link to somewhere I could learn more about that? I just failed at Google.
I might not be saying it right, as far as “an NVIDIA repo” - it’s mostly “is it in the official Debian repositories” which it sounds like the answer is “no” so I’d ask the Engineering team to pull it in from outside for use. If there’s a .deb that’s a good start though.
I’d say put in the Feature Request and I’ll ping the internal Engineering team about this as well.
Interesting. That would address potential licensing/redistribution issues.
Pretty much every tool will run in a container if you pass through /dev - I admit it’s not as smooth as a GUI/integrated/automated check, but if you need it “now” …
So on this front, the official NVIDIA/MFT license throws a wrench.
You may distribute the Software and accompanying documentation solely as integrated with or installed on Your products that incorporate the NVIDIA Products.
TrueNAS the Software does not guarantee the presence of NVIDIA hardware in the system it’s installed on. Heck, even our appliances aren’t uniform there, since we don’t use Mellanox across the board. So, unfortunately, we couldn’t distribute the binaries without violating the license.
As per my usual disclaimer, I’m not Engineering so this is not a definitive answer - but my personal Magic 8-Ball is pointing to the idea that CE users are best served to stand up a privileged container and pass /dev to it if you want to update the firmware on a live system - and Enterprise users should of course contact our support team (or expect proactive contact when we identify the need to push a new firmware) and shouldn’t self-update component firmware.
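Before trying anything from inside a privileged container, it may be worth confirming that the container can see the card at all. Here is a small sketch that scans sysfs for PCI devices carrying Mellanox's vendor ID (0x15b3). It assumes `/sys` is visible inside the container; the function name is mine.

```python
from pathlib import Path

MELLANOX_VENDOR_ID = "0x15b3"


def find_mellanox_devices(sysfs_root: str = "/sys/bus/pci/devices") -> list[dict[str, str]]:
    """List PCI devices whose vendor ID matches Mellanox."""
    found = []
    root = Path(sysfs_root)
    if not root.is_dir():
        return found
    for dev in sorted(root.iterdir()):
        vendor_file = dev / "vendor"
        if not vendor_file.is_file():
            continue
        if vendor_file.read_text().strip().lower() != MELLANOX_VENDOR_ID:
            continue
        device_file = dev / "device"
        device_id = device_file.read_text().strip() if device_file.is_file() else ""
        # The "net" subdirectory names the interfaces bound to this device,
        # as seen from the current network namespace (may be empty).
        net_dir = dev / "net"
        ifaces = sorted(p.name for p in net_dir.iterdir()) if net_dir.is_dir() else []
        found.append({
            "pci_address": dev.name,
            "device_id": device_id,
            "interfaces": ",".join(ifaces),
        })
    return found


if __name__ == "__main__":
    for entry in find_mellanox_devices():
        print(entry)
```

The `pci_address` it prints is the kind of value a flashing tool would want as its device argument. An empty interface list inside a container does not necessarily mean the card is missing, since the container may have its own network namespace.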
(Aside: TrueNAS the Software sounds like possibly the greatest 1987 alternative electro-grunge band to ever have not existed.)
Thanks for explaining how the NVIDIA license fits into all this; that’s really helpful. (Also, I apologize for making you read an NVIDIA software license on a Saturday.)
“but my personal Magic 8-Ball is pointing to the idea that CE users are best served to stand up a privileged container and pass /dev to it if you want to update the firmware on a live system”
Thanks for mentioning this again. @madmalkav mentioned doing it upthread and I decided it wouldn’t work, but now I’m wondering if maybe I misunderstand how containers lock (or don’t lock) physical hardware on the host. Sorry for my confusion, @madmalkav; it sounds like you had the right of it.
Over on Proxmox, for example, I know that if I passed a VM a real PCIe hardware device, it’s effectively gone from the host system as far as interacting with it. (SR-IOV gets weird, but that’s a different thing.) I’ve never tried hardware passthrough with an LXC in Proxmox.
I was (I think wrongly) assuming that it would work the same way with privileged LXCs: that the host would complain about not wanting to share if a privileged container started messing with hardware the host has full control of (like a NIC).
Or, is it the case that since the container in privileged mode is sharing the host’s kernel and has full (privileged) access to the host system’s PCIe bus, the host system doesn’t really complain because the privileged container isn’t really taking the device away from the host so much as pretending to be the host as far as the device is concerned?
All that to say, does this mean the privileged container can interact with the NIC without TrueNAS itself freaking out about something taking one of its network cards away … because it doesn’t really leave the host’s control?
(I’m sure that’s not technically completely accurate…)
This is closer to how it’ll work: the container shares the ability to modify the device node(s) under /dev. That said, if the firmware update you’re pushing to a /dev/something device causes it to restart/reinitialize as part of it, that could potentially cause problems if there’s active use.
Depends on how the driver itself handles it - if it can consciously say “hold up, the card’s about to have a few milliseconds where it goes catatonic” then it may be able to do something like a PCIe power state transition, with the upper levels of the driver not noticing anything other than a brief pause/buffer.
If the hardware or driver is less elegant about it, and/or it tears down the enpXsY interface or /dev node - then it might have upstream effects with things like SMB/NFS/etc that would say “hey, that interface I was bound to just kinda disappeared.”
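One way to find out which of those two cases you are in is to watch the interface from the host while the flash runs. Here is a minimal sketch that polls the interface's `operstate` file in sysfs and records each change, including the interface vanishing entirely. The interface name and timings are placeholders.

```python
import time
from pathlib import Path


def read_link_state(interface: str, sysfs_net: str = "/sys/class/net") -> str:
    """Return the kernel's operstate for the interface, or 'missing'."""
    state_file = Path(sysfs_net) / interface / "operstate"
    try:
        return state_file.read_text().strip()
    except OSError:
        # The interface directory is gone, i.e. the driver tore it down.
        return "missing"


def watch_link(interface: str, samples: int = 10, interval: float = 0.5,
               sysfs_net: str = "/sys/class/net") -> list[tuple[float, str]]:
    """Sample the link state and return (elapsed seconds, state) at each change."""
    changes = []
    last = None
    start = time.monotonic()
    for _ in range(samples):
        state = read_link_state(interface, sysfs_net)
        if state != last:
            changes.append((round(time.monotonic() - start, 2), state))
            last = state
        time.sleep(interval)
    return changes


if __name__ == "__main__":
    # Placeholder interface name; run this in a second shell during the flash.
    for elapsed, state in watch_link("enp4s0", samples=120, interval=0.5):
        print(f"{elapsed:>7.2f}s  {state}")
```

A log that stays at `up` throughout suggests the driver rode through the reset. A `down` or `missing` entry suggests services bound to that interface may have noticed. Polling at this interval would not catch a blip of a few milliseconds.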