TrueNAS Core 13 File Transfer Starts Fast then Slows to 0MB/s

Hello!

I’ve got a new TrueNAS server (CORE 13) and I am moving files from one pool to another. I am doing this from a Windows 10 client over SMB shares. Currently I’m doing a rather large transfer of ~3TB. The files are mostly ISO files, so the individual files are generally pretty large (5GB-100GB).

I’m seeing speeds ranging from 250MB/s down to 0 bytes, with the speed most commonly sitting around 100MB/s.

Every 5-20 minutes the speed drops from “normal” to 0 bytes, then after around a minute it slowly climbs back up to around 250MB/s before settling back in around 100MB/s, and the cycle repeats.

Screen Recording of this: https://youtu.be/a6WAygnYzLs

This doesn’t seem to have anything to do with file size, as I’ve seen the cycle play out on a single larger file as well as on many smaller files.

I’m pretty new to TrueNAS and ZFS in general, but this doesn’t seem like it could be normal.

Hardware:
CPU: E5-2650L v3
MB: Asrock X99 Taichi
RAM: 128GB Samsung 2Rx4 PC4-2133P-RA0-10-D0 ECC REG
HBA: LSI 9300-16i
NICs: Intel i340-T4, Mellanox ConnectX-3 dual-port 10GbE
Chassis: Rosewill RSV-L4412U

Software/Setup:

-Proxmox Host
-TrueNAS Core 13 as a VM (guest):

  • 8 cores
  • 64GB RAM
  • 64GB boot drive
  • LSI 9300-16i passed through
    All TrueNAS drives are on the HBA except for the boot drive.

TrueNAS Drives:
RAIDZ2: 6 x 8TB WD80EMAZ
Mirror: 2 x 14TB WD Ultrastar DC HC530
Stripe: 1 x 14TB WD Ultrastar DC HC530

The current transfer I mentioned above is from the single 14TB to the Mirrored 14TB drives, but I’ve witnessed similar behavior on the RAIDZ2 array.

One thing I tried was changing “zfs_dirty_data_max” (default: 4294967296), but that still yields similar results.
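For reference, on CORE (which is FreeBSD) that knob shows up as a vfs.zfs sysctl rather than the Linux-style module parameter. Something like this from the TrueNAS shell should read and temporarily change it (the 8 GiB value is only an example, not a recommendation, and it reverts on reboot unless it’s also added under System -> Tunables):

    # show the current value in bytes
    sysctl vfs.zfs.dirty_data_max

    # temporarily set it to 8 GiB for testing (example value only)
    sysctl vfs.zfs.dirty_data_max=8589934592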

Edit: Forgot to mention that the Windows 10 client doing the transfer is a VM on the same machine. It has a 10Gb VirtIO NIC assigned, and running iperf to the TrueNAS server shows 2-3Gbps, so I believe the network speed is fine.
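(For anyone wanting to repeat the test, it was just a basic iperf run; a rough sketch with iperf3 on both ends would be something like the following, where the IP and the stream/duration options are placeholders:)

    # on the TrueNAS side, start a listener
    iperf3 -s

    # on the Windows 10 client, run against the TrueNAS IP (placeholder shown),
    # 4 parallel streams for 30 seconds
    iperf3.exe -c 192.168.1.10 -P 4 -t 30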

Any ideas?

Thank you!

Have you got a specific cooling fan pointed at the HBA?

The 16i is probably big and hot, and the LSI HBAs are notorious for overheating if not installed in a server-style case with cross-flow ventilation.

Overheating, stalling and then recovering would explain the symptoms.


Thanks! Yes, I have a 120mm fan positioned right behind the card blowing over it. I had read that this card runs very hot. When I initially got it I flashed the firmware, thinking, “No need to worry about cooling yet, just flashing firmware.” When I was done flashing I went to move the card from my workstation to the server. WOW! I nearly burnt my hand.

So, upon installing it in the server I positioned a 120mm fan right behind it and tested temps using one of those generic laser-gun thermometers. Temps on the heatsink and the opposite side of the board were around 35°C, and the heatsink was just mildly warm to the touch.

Of course I was only doing smaller/shorter transfers during this test phase so I suppose it could be overheating, but I sincerely doubt that. There are 3 120mm fans at the front of the case and 1 right behind the card. I can check temps again though.

I’m using the Rosewill RSV-L4412U chassis on this.

This may be a networking issue. How are the clients physically connected to the TrueNAS? How is the networking set up in Proxmox?

I re-read your post. One question I have, particularly regarding Proxmox… it may be possible the traffic is egressing to the switch and back in, depending on how you have it set up.

I have been suspicious that networking may be the cause. The networking is a bit unusual, so please bear with me on this.

One of the VMs in Proxmox is pfSense. That is my only router. Every VM (including TrueNAS and pfSense) is using the vmbr0 bridge for their NIC. The vmbr0 bridge is one of the LAN interfaces in the pfSense VM.

The Win10 client doing the file transfer is one of the VMs, so it is also using the vmbr0 bridge for its NIC.

Hypothetically this should mean that all networking is internal for this file transfer and that’s what appears to be going on. The only thing I can see as suspect is that the traffic still ends up passing through pfSense and I wonder if that may be causing an issue.

However, pfSense resources are barely being used, with under 10% CPU usage and plenty of RAM available.

“I re-read your post. One question I have, particularly regarding Proxmox… it may be possible the traffic is egressing to the switch and back in, depending on how you have it set up.”

Didn’t see that last part before I posted. So the traffic isn’t leaving this one physical machine (or shouldn’t be).

When I view network traffic on TrueNAS and the Win10 client, it is very minimal, 10-30 Kbps, even while the file transfer is going fast at ~250MB/s. So this would seem to indicate the networking is all internal, although this is all new territory for me and I’m just speculating really.
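(For what it’s worth, from a shell on TrueNAS something like this should show per-second traffic on a given interface; the interface name here is just a placeholder for whatever the VirtIO NIC shows up as:)

    # live interface throughput, refreshed every second
    netstat -w 1 -I vtnet0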

Nice! I have a similar setup, though I’m doing everything on TrueNAS SCALE instead of in Proxmox. I have a guest pfSense VM under it.

So your network looks something like this?

On Proxmox, can you share the configuration of vmbr0?
It’ll look something like the example on the page linked below:

iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

Network Configuration - Proxmox VE

On the pfSense side… can you go to this page and filter like this while a file transfer is going? Screenshot the “States” below. You can redact whatever as long as I can read it. 445 is the port SMB works on. If we can see a state between the two, the firewall is actually aware of the stream, which is not what you want to see.

Also the ARP table; we want to make sure both clients, the TrueNAS and the Windows client, show up under LAN or whatever the virtual switch network is:

The point of all of this is to verify that the TrueNAS and the client are properly talking inside the same L2 network; we don’t want to see indications that they are talking through the firewall.
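If it’s easier than screenshots, the same information should also be visible from a shell on the pfSense VM; a rough equivalent would be (the grep is only there to narrow the output to SMB):

    # firewall states involving port 445 (SMB)
    pfctl -ss | grep 445

    # ARP table; both the TrueNAS and the Windows client should show up on the LAN/switch interface
    arp -an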

Awesome! Thank you. Yes my network is basically as your diagram describes.

For vmbr0, it has two physical ports assigned to it, but neither one has a cable connected and I’m planning to move one to a different bridge. In pfSense I set up a switch which currently includes 3 interfaces: vmbr0, vmbr1 (2 ports on the Intel i340-T4), and vmbr3 (Mellanox 10Gb dual-port card). One port on vmbr1 goes to a physical 24-port switch and the other port on vmbr1 goes to a physical 8-port PoE switch. vmbr3 (Mellanox 10Gb) connects to my workstation.

Ok, here is what I have on those screens with the transfer running. I’m including screenshots of both the “LAN” and “MYSWITCH” states because I’m not sure which one is relevant.




You are already switching on the Proxmox side with vmbr0 and vmbr1. Those are bridges.

You’re saying in pfSense you configured them as bridges again on this page?

You should probably just pass through the 4-port NIC to pfSense. There’s no need for Proxmox to have them. You can make a virtual NIC on Proxmox with no uplink for “internal” communication. In your current setup, you’ve created a “virtual” network loop… which is bad.

Switching loop - Wikipedia
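For the “no uplink” part, a Proxmox bridge with no physical ports is just a bridge stanza without any bridge-ports; roughly like this in /etc/network/interfaces (vmbr9 is a placeholder name):

auto vmbr9
iface vmbr9 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0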

Yes I think that’s right. When you break it down like that, what I did doesn’t seem to make much sense. This is what I have in pfSense interfaces and bridges.


Ok, thank you so much! I will go about correcting these things and report my findings.

For more insight into the problem I’m having, here is a screen recording from the start of one slowdown to the end of a second: https://youtu.be/a6WAygnYzLs

Ok so I must be missing something. I don’t understand how I can get VMs to talk to the rest of the network if I don’t make them part of the pfSense bridge.

To test that I understand this, I created a new Proxmox bridge, vmbr5. I removed all network devices from a Win10 VM and added vmbr5 as its network device. I then added vmbr5 to pfSense as well.

In pfSense under Interfaces->Assignments I added and enabled vmbr5 but did not assign it to the “MYSWITCH” bridge.

This Win10 VM can’t talk to anything on the network now, even if I manually assign an IP.

What am I missing here? Thank you once again!

Let’s walk backwards here a bit, as you have a very specific thing you are trying to do.

On Proxmox you created network bridges, which are effectively SDN/virtual switches.

vmbr0 has ports enp11s0 and enp8s0f3, which means that these two network interfaces are in the same “network”; it’s like a switch with two ports.

Same thing with vmbr1, which has ports enp8s0f0 and enp8s0f1: another switch with two ports.

vmbr2 is a switch with only enp0s25.

Then in pfSense you made another switch. It takes vmbr0, vmbr1 and vmbr3 and combines them together, but because they are already switches in Proxmox, you’re seeing weird traffic behaviors as a result of the loop.

If this were me, I would remove the configuration for all of the ports on the i350 in Proxmox, and pass that whole card to the pfSense VM, just like you did for the HBA card in TrueNAS. These can be bridged in pfSense like you have now, but with ONLY those 4 interfaces. pfSense really isn’t a switch, it’s a firewall, so this is fiddly, but it’s what you’ve planned.

For the Mellanox card, you can use one port for Proxmox VMs to communicate over the network, and the other to uplink your desktop. These could be your “new” vmbr0 and vmbr1.

If you want to connect this to the network so that these devices can talk to the rest of the network, you can do that several different ways.

The easiest way would be to just create a virtual interface for each individual physical interface in Proxmox, identical to how your other VMs would get one, and then assign both to the same switch in pfSense above.

So your pfSense switch would look like:

graph LR;
    WAN("WAN: Intel i350 (eno1)")
    LAN("LAN: Intel i350 (eno2) --> Switch 8 port")
    LAN2("LAN2: Intel i350 (eno3) --> Switch 24 port")
    LAN3("LAN3: Intel i350 (eno4) --> Reserved")
    LAN4("LAN4: Mellanox (vnet0) --> VMs")
    LAN5("LAN5: Mellanox (vnet1) --> Desktop")

    WAN -->|Connects to| Internet
    LAN -->|Connects to| Local_Network
    LAN2 -->|Connects to| Local_Network
    LAN3 -->|Connects to| Local_Network
    LAN4 -->|Connects to| Local_Network
    LAN5 -->|Connects to| Local_Network

Thank you so much for taking the time to provide so much detail and help me with this.

I understand what you are saying the issue is and what the setup needs to be. I’m just struggling a bit with exactly how to do that.

I think the big thing I’m still not getting is how to connect the VMs/Proxmox/vmbr0 to pfSense. It would seem there is no way to do this without connecting a physical cable or via the previous method I had set up.

When you say “just create a virtual interface for each individual physical interface”, how is that done exactly? Are you talking about creating a “Linux bridge” for each individual physical device? I see options for “Linux bridge”, “Linux bond”, “Linux VLAN”, “OVS bridge”, “OVS bond”, and “OVS IntPort”. My understanding is that I must assign it to one of these in order to then assign it to a VM. Is “Linux bridge” the one to go with? I thought adding a Linux bridge to the pfSense bridge/switch was the issue? Or was the problem just having multiple ports inside the Linux bridge?

Today I removed all i340 ports from vmbr0, removed vmbr0 from the pfSense “bridge/switch”, and removed the Mellanox ports from pfSense and put them in vmbr0.

With that done I now have “enp11s0” (onboard NIC), “ens4” and “ens4d1” (Mellanox) all on vmbr0. This allowed my workstation (Mellanox) to talk to Proxmox, but I couldn’t talk to pfSense or get internet at my workstation.

Connecting a physical cable from the vmbr0 “enp11s0” (onboard NIC) to my 24-port switch then allowed me to talk to pfSense again and get internet, as the i340 ports are still in Proxmox vmbr1 and vmbr1 is in the pfSense bridge/switch.

Everything else is still the same, but I plan to pass through the Intel i340-T4 to pfSense and make that a pfSense switch/bridge.

I like your idea of having one port of the Mellanox for the workstation and one to the network, but I currently don’t have a physical switch with any 10Gb SFP+ ports, so for now just one port is going to be used.

I’m a VMware guy historically, and I do all of my home lab stuff on SCALE, so I won’t be able to precisely show you what I mean in Proxmox. Unfortunately, beyond using it for testing and messing around, I’ve not spent a lot of time using it.

You are correct. Now, in the diagram I just posted, you would be able to make vmbr0 and vmbr1 as bridges with SINGLE interfaces (each one a port from the Mellanox). These physical NICs would now have virtual networking tied to them and you can go about your business. This is okay (just not as fast as a physical interface); it’s just not okay if you have multiples here (like adding both interfaces to vmbr0), given the rest of the topology.
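As a rough sketch in /etc/network/interfaces (using the ens4/ens4d1 names from your earlier post; double-check they match what Proxmox actually shows for the Mellanox ports), the two single-interface bridges would look something like:

auto vmbr0
iface vmbr0 inet manual
        bridge-ports ens4
        bridge-stp off
        bridge-fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge-ports ens4d1
        bridge-stp off
        bridge-fd 0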

something like this?
(screenshot)

This playlist on YouTube may have some information to help:
Level1 Mini Series: The Forbidden Router Trilogy - YouTube

Hey, thanks again! This has been invaluable for me. It’s been difficult to find targeted answers for my somewhat unique goals and setup.

Now, with vmbr0 out of pfSense, I restarted a portion of the transfer I was doing last night. I only had the opportunity to watch it for about 10 minutes, but I didn’t see a single drop, with speeds hovering around 200MB/s the entire time. Now that’s more like it!

Will report back if it turns out this wasn’t the fix but it’s looking good now.

By the way, you mentioned that you are running TrueNAS SCALE with pfSense as a VM/jail. How have you been liking that? I nearly went that route instead of Proxmox with a TrueNAS VM. I am planning to set up a second machine with TrueNAS SCALE and pfSense, where pfSense will only be used when I have to take the Proxmox server offline for maintenance.


I’ve been running pfSense as a TrueNAS VM with two hardware Ethernet ports via PCIe passthrough for many years now.

Upgrading to SCALE (KVM) from CORE (bhyve) solved all the issues I had previously.

It’s great. I use pfSense for all my routing, VPNs, firewalls, HAProxy and ACME certificate management.

Almost all SSL wrapping is done via the pfSense, which I give 2 cores and 1GB of RAM.

Then I run all my apps via Docker Compose in a sandbox in SCALE.

Very performant. And the only port exposed to the net is the VPN access port.

(Okay, I do expose the pfSense UI via an 8xxx port with a very strong password, because I find it very handy.)


Always happy to help! :slight_smile:

I wrote a guide on the old forums:
Resource - Getting Started with Virtualization on TrueNAS SCALE | TrueNAS Community

Resource - TrueNAS SCALE: A “Datacenter-in-a-box” | TrueNAS Community

Check it out when you have time.


It’s funny you say that. I have been planning on doing a whole “TrueNAS as a Firewall?!” resource for a while now, because for this use case it’s fantastic!

I love mine. pfSense has a nasty habit of breaking during updates for a whole host of (sometimes) stupid reasons. Maybe less so in recent years… but anyway, when it’s in a VM it’s much easier to recover, LOL.


Yeah

The one thing that sucks is taking down your router/firewall every time you want to tinker with your NAS hardware :wink:

A good reason to have separate devices… but I’m really enjoying the “hyperconverged lifestyle”, i.e. one box to rule them all… and everything else is a laptop/mobile, etc.

Yup, LOL, that’s why I still have two xD, and really I should have three.