Hi all,
I am tuning a Chelsio T540-CR (4 x 10G) on SCALE 25.10, and hitting a problem related to card configuration.
As best I can see, the cxgb4 driver bases its default queue count on the logical CPU count. I have a 20-core CPU (E5-2698 v4, 40 logical cores), chosen to prevent the CPU starvation that affected my earlier 8-core CPU. The problem seems to be that the driver initialises one RX+TX queue pair per logical CPU per port, i.e. 40 RX/TX queues for every port (160 RX + 160 TX in total), which immediately exhausts the ASIC’s onboard SRAM/SGE contexts. I can’t find a way around the errors I get when I run the commands that should mitigate this.
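To make the queue arithmetic explicit, here is a small sketch. The CPU and port counts are the ones stated in this post (nothing is queried from the driver), and the 8-per-port target is just an illustrative value.

```shell
#!/bin/sh
# Queue arithmetic as described in the post; values are hard-coded from the
# text, not read from the cxgb4 driver.
logical_cpus=40    # 20 cores x 2 threads on the E5-2698 v4
ports=4            # T540-CR has 4 x 10G ports

queues_per_port=$logical_cpus              # one RX+TX pair per logical CPU
total_rx=$((queues_per_port * ports))
total_tx=$((queues_per_port * ports))

echo "per port:   ${queues_per_port} RX + ${queues_per_port} TX"
echo "card total: ${total_rx} RX + ${total_tx} TX"

# Illustrative target only: 8 queue pairs per port.
target_per_port=8
target_total=$((target_per_port * ports))
echo "at ${target_per_port}/port: ${target_total} RX + ${target_total} TX"
```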
I’m also hitting issues controlling TX coalescing and ring buffer sizes as a secondary mitigation.
Symptoms:
- Ring buffers show RX: 64 / TX: 1024, and I don't seem to be able to change them. Attempts to increase the RX ring buffer via `ethtool -G` (even to 128 or 256) return `netlink error: Device or resource busy`, even with the interface down.
- Attempts to reduce the number of queues via `ethtool -L`, to prevent card RAM overuse, return `Operation not supported`, so I can't reduce the 40 queues to a more sensible 8 or 16 to free up descriptor memory.
- Attempts to reduce IRQ floods by increasing TX coalescing intervals via `tx-usecs` or `tx-frames` fail, and seem to be rejected by either the firmware or the driver (`rx-usecs` coalescing succeeds). Current parameters are: rx-usecs: 100, rx-frames: 8, rx-usecs-irq: n/a, rx-frames-irq: n/a, tx-usecs: 0, tx-frames: n/a, tx-usecs-irq: 0, tx-frames-irq: n/a. I don't seem to be able to modify the TX values.
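For anyone trying to reproduce this, the attempts above correspond roughly to the sequence below. This is a sketch, not a transcript: the interface name `eth0` is a placeholder, the `combined` keyword for `ethtool -L` and the `tx-usecs 50` value are assumptions chosen for illustration, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Reproduction sketch. IFACE is a placeholder; DRY_RUN=1 (default) only prints.
IFACE="${IFACE:-eth0}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repro() {
    run ethtool -g "$IFACE"               # show current and maximum ring sizes
    run ethtool -G "$IFACE" rx 256        # reported: Device or resource busy
    run ethtool -l "$IFACE"               # show channel (queue) counts
    run ethtool -L "$IFACE" combined 8    # reported: Operation not supported
    run ethtool -c "$IFACE"               # show coalescing settings
    run ethtool -C "$IFACE" tx-usecs 50   # reported: rejected
}

repro
```

Running it with `DRY_RUN=0 IFACE=<your port>` executes the same commands for real, so the exact error text can be captured per port.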
Impact:
High packet drops (rx_nodesc_drop) and a high rx_runt_frames count, which I attribute to a 64-entry RX ring being insufficient at this speed. (I gather runt frames can be a flag for descriptor exhaustion on Chelsio.)
Verified via ethtool -S that write_coal_fail is high.
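A minimal sketch of how these counters can be pulled out of `ethtool -S`. The counter names are the ones quoted in this post and may be spelled differently on other driver versions; `eth0` is a placeholder interface name.

```shell
#!/bin/sh
# Keep only the counters mentioned in the post from `ethtool -S` output.
IFACE="${IFACE:-eth0}"    # placeholder interface name

filter_counters() {
    # Reads `ethtool -S` text on stdin and prints the three counters of interest.
    grep -E 'rx_nodesc_drop|rx_runt_frames|write_coal_fail'
}

# Only query the NIC if ethtool and the interface are actually present.
if command -v ethtool >/dev/null 2>&1 && [ -e "/sys/class/net/$IFACE" ]; then
    # No matching lines is not treated as an error.
    ethtool -S "$IFACE" 2>/dev/null | filter_counters || true
fi
```

Running it twice a few seconds apart and comparing the values shows whether the counters are still climbing under load.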
Current Tuning Applied:
MTU 9000
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 87380 67108864
net.core.netdev_max_backlog = 250000
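To confirm the sysctl values above are actually live, a check along these lines can be used. It is only a sketch: it reads the values from `/proc/sys`, and the `PROC_ROOT` variable and `check` helper are names made up for this example.

```shell
#!/bin/sh
# Compare live kernel values with the tuning listed in the post.
# 67108864 bytes = 64 * 1024 * 1024 (64 MiB).
PROC_ROOT="${PROC_ROOT:-/proc/sys}"   # hypothetical override, useful for testing

check() {
    key=$1
    want=$2
    path="$PROC_ROOT/$(echo "$key" | tr . /)"
    if [ -r "$path" ]; then
        # Unquoted expansion collapses tabs/newlines into single spaces.
        have=$(echo $(cat "$path"))
    else
        have="unavailable"
    fi
    if [ "$have" = "$want" ]; then
        echo "OK   $key = $have"
    else
        echo "DIFF $key = $have (expected $want)"
    fi
}

check net.core.rmem_max 67108864
check net.core.wmem_max 67108864
check net.ipv4.tcp_rmem "4096 87380 67108864"
check net.ipv4.tcp_wmem "4096 87380 67108864"
check net.core.netdev_max_backlog 250000
```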
T540-CR driver and firmware versions: driver version 6.12.33-production+truenas, firmware-version: 1.27.5.0, TP 0.1.4.9, expansion-rom-version: 1.0.0.90
Questions:
- Queue count: Is there a persistent way in SCALE to reduce the number of queues to be less than the number of logical CPUs, and force a lower queue count at boot?
- Ring buffer sizes: Is there a known way to break the `Device or resource busy` lock on these descriptors and change the RX (and ideally also TX) ring buffer sizes from 64/1024?
- TX coalescing parameters: Is there a way to amend TX coalescing parameters, either by usec or frame count, in order to reduce IRQ and context switch burdens on the CPU?
cxgbtool:
This Chelsio tool might provide useful diagnostic info, but it isn’t included in 25.10. It’s the only way I know to see the SGE table directly (among other diagnostic data), confirm SGE context allocation, and identify the exact issues for Chelsio on TrueNAS. Is there a way to forcibly run it on SCALE anyway today, or do I need to put in a feature request/suggestion for a future release?