SMB / NFS Remote Subnet Oddity / Possible bug?

Hey all, I have a TrueNAS SCALE 24.10.0.2 box on decent hardware (56 spindles, Xeon 6138, LSI JBOD, 192GB RAM) and am having an issue with SMB. I have isolated it to this: when there is activity from multiple connections to the same server, simple operations are severely delayed or do not work until the first operation is complete.
The client is a Linux box.
For example, in this test I have a share mounted via CIFS and two PuTTY sessions open. If I run a dd to the mount point in one session, a simple ls -lh in the other session hangs until the dd completes, or at least for a good while. If I do an strace on a directory with a lot more files, I get “Resource temporarily unavailable”.
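For anyone who wants to reproduce this without juggling two terminals, here is a rough Python sketch of the same test: one thread writes a large file (roughly what dd does) while the main thread times directory listings with a stat on every entry (roughly what ls -lh does). The function names and the file name stall_test.bin are made up for illustration, and two threads in one process are only an approximation of two separate shell sessions.

```python
import os
import threading
import time


def write_file(path, total_bytes, chunk_bytes=1024 * 1024):
    """Write total_bytes of zeros to path in chunks, then fsync (roughly like dd)."""
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())


def list_latencies_during_write(directory, total_bytes, interval=0.5):
    """Time directory listings (with a stat per entry) while a write is running.

    Returns one latency in seconds per listing. A listing that blocks behind
    the write shows up as a single very large value.
    """
    target = os.path.join(directory, "stall_test.bin")  # hypothetical file name
    writer = threading.Thread(target=write_file, args=(target, total_bytes))
    latencies = []
    writer.start()
    while True:
        start = time.perf_counter()
        with os.scandir(directory) as entries:
            for entry in entries:
                entry.stat()  # forces an attribute lookup, as ls -l would
        latencies.append(time.perf_counter() - start)
        if not writer.is_alive():
            break
        time.sleep(interval)
    writer.join()
    if os.path.exists(target):
        os.remove(target)
    return latencies
```

Calling list_latencies_during_write("/mnt/share", 1024**3) against the mount point (the path here is an assumption, substitute your own) and comparing max(latencies) between a local-subnet and a remote-subnet client should show whether listings really block behind the write.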

Any hints?

Apologies, I can’t attach pictures yet. Edit: it seems I can’t post links either.

So it seems that it also gets stuck on NFS, which is curious. I guess it’s not isolated to SMB.

Browse some other topics and do the tutorial to bring up your trust level. You should be able to post more after that.

TrueNAS-Bot
Type this in a new reply and send to bring up the tutorial, if you haven’t done it already.

@TrueNAS-Bot start tutorial

Probably need more details on your problem too.

Thank you, I’ve started that and learned a few things. I’ll post more tomorrow. Right now it seems to be a problem with Oracle Linux / Fedora, as I can’t get Ubuntu to replicate this behavior. But I am still not sure if it’s OS related or TrueNAS related.



We might need more information on your storage controller, as well as possibly your dataset and pool configuration. Strange that Ubuntu seems immune to it though.

Specifically - are you using deduplication?

No dedup. The pool is 4x RAIDZ2 vdevs with 13 disks per vdev, plus 4 spares. I think I’ve isolated it down to probably a networking thing: clients on the same subnet as the TrueNAS box seem to work okay, but on a remote subnet CIFS / NFS seems to lock up the OS or cause a huge pause. I do not think it’s a hardware thing.


Take a look here at a 7 min freeze from writing a file. At that point the OS was unresponsive if you tried to watch ls -lh on the mounted directory. I originally thought it was hanging on getting the ACL, however this seems to indicate something else is going on.

Subnet-specific makes me think there’s some manner of IDS/IPS in the middle trying to inspect the entire file, maybe, but perhaps that’s residual trauma from working in the enterprise space for too long. :smiley: NFS will also default to synchronous write behavior, but a file of that size (1G) should be handled well enough if it’s being written as a single chunk.

Does the TrueNAS system still respond as expected to local console input or SSH directly to it?

13wZ2 is a bit wider than most recommendations, but with large files you should be able to leverage that width.


Same… Same… haha, this is the enterprise space. They are all large backup files. I am leaning down that path as well and am actually opening a ticket with the firewall vendor as we speak.

That’s a good question and required some quick testing.

You’re on the right track I think.
While locking up a remote subnet client, the local subnet client was able to perform multiple dd jobs and watch the directory at the same time.
Further pointing to a firewall issue.

Not sure if you’re permitted to share the make/model of your firewall, but if it’s a Palo Alto it’s probably doing file inspection, and when you write that whole file the PAN box is just saying “no further traffic until I look at this thing” and it’s having to scan through a gigabyte.

Check the “outside-network” client to see if the delay/stall is proportional to file size, test with a 256M and 512M for example. :slight_smile:
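If it helps, the size test could be scripted along these lines. This is only a sketch: it times the write itself for each size and reports seconds per MiB, which is a proxy for the stall rather than a direct measurement of it. The function names and the sweep_*.bin file names are invented for illustration.

```python
import os
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024):
    """Return the seconds taken to write total_bytes of zeros to path and fsync."""
    chunk = b"\0" * chunk_bytes
    remaining = total_bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    os.remove(path)  # clean up so the next size starts from an empty slot
    return elapsed


def size_sweep(directory, sizes_mib=(256, 512, 1024)):
    """Time one write per size; return (size_mib, seconds, seconds_per_mib) tuples."""
    results = []
    for size in sizes_mib:
        path = os.path.join(directory, f"sweep_{size}M.bin")  # hypothetical name
        seconds = timed_write(path, size * 1024 * 1024)
        results.append((size, seconds, seconds / size))
    return results
```

Run size_sweep against the mount on the outside-network client. If seconds per MiB stays roughly constant across 256M, 512M and 1G, the delay scales with file size, which would fit something in the path scanning the whole file. If it jumps at one size instead, that points at a threshold or buffer limit somewhere.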


It’s much, much worse: Fortinet. At least it’s a 1000F, with internal interfaces on a LAGG. Supposedly it seems like things are allowed, but we shall see.

Will report back after the ticket has gone through.
Thank you!


Sadly, Fortinet did not see any traffic being blocked; however, it was not experiencing slowness during this time. BUT we did learn how to remove traffic from the ASIC so that we can troubleshoot correctly.

Time was off by 1.5ish mins and I corrected that. I shall update in two weeks when I’m back if nothing else is discovered to be hugely wrong.

SMB dd from Linux is writing at 320MB/s, so the speed is there, and iperf3 is nice and happy at around 15 Gb/s.
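To put those two numbers in the same units, a quick conversion sketch. It assumes the dd figure is decimal megabytes per second (GNU dd reports MB as 10^6 bytes) and that the iperf3 figure is gigabits per second, which is how iperf3 reports by default.

```python
def mb_per_s_to_gbit_per_s(mb_per_s):
    """Convert megabytes per second (10^6 bytes) to gigabits per second (10^9 bits)."""
    return mb_per_s * 8 / 1000


smb_gbit = mb_per_s_to_gbit_per_s(320)  # dd write rate over SMB
iperf_gbit = 15                         # assumed to be Gbit/s, as iperf3 reports
print(f"{smb_gbit:.2f} Gb/s, {smb_gbit / iperf_gbit:.0%} of the iperf3 figure")
```

So the SMB write works out to about 2.56 Gb/s, roughly 17% of what iperf3 measured, meaning the raw network path has plenty of headroom above the file transfer rate.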