TrueNAS SSH Issue

First post on the new forum! Woohoo.

I have been facing this issue for quite a while now, long enough that I have no clue what may have caused it (years…).

I cannot keep an SSH session open to my TrueNAS machine, but what is way stranger is that the web-based shell you can open from TrueNAS’s web UI also hangs and stops responding.

When I’m SSHed in, after anywhere from 30 seconds to a few minutes, the session hangs and I get:

Connection to server.name.here closed.
client_loop: send disconnect: Broken pipe

In the web UI Shell, the screen just hangs and I can no longer type anything.

I admit, earlier in my TrueNAS/homelab career I likely copy/pasted some stuff into the CLI, but I have no idea what any of it was. I was trying to get Plex permissions working back on FreeNAS jails (was this called beehive maybe? It was literally a decade ago). My TrueNAS “works fine” and seemingly has for years and years. SMB and NFS performance always seems fine, the system itself is stable, it was migrated to TrueNAS SCALE years ago, etc. Things seem to work fine (except that my Time Machine backups really don’t like working; I’m not sure if this is related, really not sure…).

To try and fix this a few months ago, I fully restarted from ground zero. I did a fresh install of SCALE and imported my ZFS array. I didn’t copy my config; I went through the relative pain (although it was good to do it all again, since it had been a decade…) of setting everything up again: VLANs, network adapters, users, shares, etc. Obviously the data on the array persisted, including my home directory. But this is where my Linux ignorance comes in: I know enough to be dangerous these days, but I don’t understand how such an issue could survive such a nuclear option. I fully believe my early copy/pasting into the CLI to try and alter permissions for jails, or some other “thing I thought was smart”, could have caused issues, but is that something that can persist through an OS wipe like this? I was “dumb” and installed Oh My Zsh on my user account, but the same issue happens with root, and root is not modified in any way that I know of.

I had thought this could be networking related, but I can’t figure out how or what that would be. I run a pfSense network with UniFi switching hardware, and no other VM/physical host has any such issues. Literally no other VM or machine has any weird SSH issue or hiccup like this, and again, it’s only SSH and the Shell in the web UI… it seems very specific to something internal to TrueNAS.

I am at a total loss on how to fix this or what to even try and do to narrow in on possible issues.

Any help would be greatly appreciated.

ssh -vvv shows:

debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
debug3: send packet: type 100
debug3: receive packet: type 98
debug1: client_input_channel_req: channel 0 rtype keepalive@openssh.com reply 1
debug3: send packet: type 100
Connection to the.server.name closed by remote host.
Connection to the.server.name closed.
debug3: send packet: type 1
client_loop: send disconnect: Broken pipe

Is there a router or firewall between TrueNAS and the desktop with the SSH session?

I assure you it isn’t. If it was, there would be way more complaints than just from you. SSH is a very integral service that so many people use.

How are you running the TrueNAS? Is it a VM or a physical machine?

Whoops… hit delete on the post right after posting. That’s painful.

Yes, pfSense is my firewall, and my MacBook (the machine I am SSHing in from) and TrueNAS live on separate VLANs. To give more info: TrueNAS is virtualized under Proxmox. Proxmox is physically connected to the network with fiber, and Proxmox, a few other VMs and TrueNAS all live on the same subnet. I don’t have issues like this, or any other issue I can perceive, on any of the other VMs on this system, or on other physical hosts on my network. And since there are VMs in the same subnet as TrueNAS that don’t have issues, I am fairly sure it’s not a pfSense firewall issue, but I am obviously open to the possibility. That said, I very much do not think it’s a physical network issue.

To be clearer, I do not think this is an upstream SSH issue; I agree, that would be far more widely reported. I also doubt it’s an inherent TrueNAS issue. What I meant was something internal to TrueNAS that I dorked up myself, not something broken that shipped from iXsystems.

See the first part of this reply: it is in a VM, but I go into more detail there about why I doubt that is the issue.

Also, within the web UI (which seems to be acting normally), I am able to run journalctl -u ssh -f, and I see this during the same time period. I don’t actually see the connection close for ligistx from my laptop, which is at 10.70.5.11. I have Proxmox SSHing in to check hard drive temps for a script I run to control fans, which is why you see the proxmox_ssh user info.

Aug 27 11:42:03 thoth sshd[134727]: Accepted publickey for ligistx from 10.70.5.11 port 60968 ssh2: key
Aug 27 11:42:03 thoth sshd[134727]: pam_unix(sshd:session): session opened for user ligistx(uid=1000) by (uid=0)
Aug 27 11:42:03 thoth sshd[134727]: pam_env(sshd:session): deprecated reading of user environment enabled
Aug 27 11:43:22 thoth sshd[134766]: Accepted publickey for proxmox_ssh from 10.90.5.50 port 49666 ssh2: key
Aug 27 11:43:22 thoth sshd[134766]: pam_unix(sshd:session): session opened for user proxmox_ssh(uid=1002) by (uid=0)
Aug 27 11:43:22 thoth sshd[134766]: pam_env(sshd:session): deprecated reading of user environment enabled
Aug 27 11:43:22 thoth sshd[134766]: pam_unix(sshd:session): session closed for user proxmox_ssh
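The observation above (a session opened for ligistx with no matching close) can be checked mechanically. Below is a minimal Python sketch, not a TrueNAS tool, that pairs pam_unix “session opened” and “session closed” lines by sshd PID and reports sessions that never closed. It assumes the log format shown above; LOG holds just the three session lines from that output, and open_sessions is a hypothetical helper name. If sshd never logs a close while the client reports a broken pipe, that hints the server still considers the session alive and something in between dropped it.

```python
import re

# The three pam_unix session lines copied from the journalctl output above.
LOG = """\
Aug 27 11:42:03 thoth sshd[134727]: pam_unix(sshd:session): session opened for user ligistx(uid=1000) by (uid=0)
Aug 27 11:43:22 thoth sshd[134766]: pam_unix(sshd:session): session opened for user proxmox_ssh(uid=1002) by (uid=0)
Aug 27 11:43:22 thoth sshd[134766]: pam_unix(sshd:session): session closed for user proxmox_ssh
"""

# Captures the sshd PID, the event ("opened" or "closed") and the user name.
SESSION_RE = re.compile(
    r"sshd\[(\d+)\]: pam_unix\(sshd:session\): session (opened|closed) for user ([\w.-]+)"
)


def open_sessions(log_text):
    """Return {pid: user} for sessions that were opened but never closed."""
    sessions = {}
    for line in log_text.splitlines():
        m = SESSION_RE.search(line)
        if not m:
            continue  # not a pam_unix session line
        pid, event, user = m.groups()
        if event == "opened":
            sessions[pid] = user
        else:
            sessions.pop(pid, None)  # a close cancels the matching open
    return sessions


print(open_sessions(LOG))
```

Keying on the sshd PID rather than the user name keeps two concurrent logins by the same user from cancelling each other out.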

You might either have a case of asymmetric routing or a NAT or firewall state timeout. Try a packet trace on your firewall to observe what passes on the wire when the disconnection happens.


Hmm, starting to think this could be it… there seems to be no issue when I log in via Proxmox with the proxmox_ssh user.

So I suppose this narrows it down to either a user issue or a network issue, since Proxmox is TrueNAS’s host on the same subnet, with no need to traverse the firewall at all… Curious.

This is what I see from a packet trace against the TrueNAS IP, from before I initiate the SSH session to after the broken pipe:

12:17:21.634140 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.637008 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.639594 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 21
12:17:21.656134 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.657127 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 1448
12:17:21.657134 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 120
12:17:21.659223 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.673548 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 1208
12:17:21.695845 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.700655 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 16
12:17:21.744036 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 44
12:17:21.746912 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.747911 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 68
12:17:21.761607 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.767348 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 276
12:17:21.773088 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.793005 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 524
12:17:21.804822 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.805338 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 112
12:17:21.853562 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.856357 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.888105 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 596
12:17:21.892779 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.953320 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.954617 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:51.985622 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:51.986617 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 36
12:19:31.529415 IP 10.90.5.100.57410 > 23.186.168.127.123: UDP, length 48
12:19:36.681172 ARP, Request who-has 10.90.5.1 tell 10.90.5.100, length 46
12:19:36.681197 ARP, Reply 10.90.5.1 is-at 00:26:55:e1:78:d4, length 28

I am not a network wizard. I know enough to do my homelab stuff, but this is clearly over my head. I don’t see anything going wrong in the above packet trace, but maybe it isn’t verbose enough? I am not sure how to make it more verbose, if that is needed.

The replies from TrueNAS back to 10.70.5.11 take a different way or you would see them in the packet trace. Does TrueNAS have an interface in that subnet?
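To make the reply above concrete: in the trace, every TCP line runs from the MacBook (10.70.5.11) to TrueNAS (10.90.5.100), and nothing comes back through pfSense. Here is a minimal Python sketch that tallies packets per direction from tcpdump-style lines. It assumes the “src.port > dst.port” format shown in the trace; TRACE is an abbreviated sample copied from it, and count_directions is a hypothetical helper name.

```python
import re
from collections import Counter

# A few lines copied verbatim from the pfSense packet trace above.
TRACE = """\
12:17:21.634140 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 0
12:17:21.639594 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 21
12:17:51.986617 IP 10.70.5.11.61147 > 10.90.5.100.22: tcp 36
12:19:31.529415 IP 10.90.5.100.57410 > 23.186.168.127.123: UDP, length 48
"""

# tcpdump prints "src_ip.src_port > dst_ip.dst_port:"; capture only the IPs.
FLOW_RE = re.compile(r" IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:")


def count_directions(trace_text):
    """Count packets per (source IP, destination IP) pair; ARP lines are skipped."""
    counts = Counter()
    for line in trace_text.splitlines():
        m = FLOW_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts


counts = count_directions(TRACE)
client, server = "10.70.5.11", "10.90.5.100"
print("client -> server:", counts[(client, server)])
print("server -> client:", counts[(server, client)])
```

On the full trace the server-to-client count is likewise zero, which is what you would expect if TrueNAS answers through its own .70 interface instead of through pfSense.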

TrueNAS does in fact have an interface on 10.70.5.1. I have SSH bound only to my management interface, though, which is 10.90.5.100… I suppose this is a gap in my understanding: just because SSH is bound to that specific interface doesn’t mean it can’t reply on a different interface?

I set it up this way to try to be more secure, only allowing SSH and the web UI on my management interface, and my MacBook has a firewall rule allowing it to reach the management subnet. But to reduce load on the firewall, I also have an interface in TrueNAS on the 10.70 subnet, which my MacBook lives on, so SMB can go over that interface without traversing the firewall at all… maybe this is “not smart” and is actually causing this issue? Realistically, only my phone and MacBook are on the .70 subnet anyway, so I could bind SSH to both the management subnet and .70 without creating any heartache.

Welp, it appears you have both taught me something and genuinely helped me solve an issue that has been driving me nuts for a while…

I added the .70 network to the SSH binding option so it can be reached over both interfaces, and things appear to be working perfectly.

Just for my knowledge, is there a way to set it up the way I imagined? Or was that just a poor implementation on my part?

It will reply through the directly connected interface, bypassing pfSense. That’s how routing works: traffic always takes the direct path if there is one, because a directly connected subnet is a more specific route than the default gateway.

You could use the 10.70.5.1 interface to connect via SSH, or NAT the connection on pfSense on the “90” interface so TrueNAS does not see the original IP address of the client.

The cause of all this is that you cannot separate e.g. file sharing and SSH in TrueNAS because it all runs on a single IP stack.
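The routing decision described above can be sketched as a longest-prefix-match lookup. This is a minimal Python model, not TrueNAS’s actual routing table: the /24 masks, the interface labels, and the assumption that pfSense NATs the client to its own 10.90.5.1 address are all illustrative; the addresses otherwise come from this thread.

```python
import ipaddress

# Hypothetical TrueNAS routing table. ASSUMED: /24 masks and interface labels;
# the addresses themselves are the ones mentioned in this thread.
ROUTES = [
    (ipaddress.ip_network("10.90.5.0/24"), "mgmt (10.90.5.100), connected"),
    (ipaddress.ip_network("10.70.5.0/24"), "lan70 (10.70.5.1), connected"),
    (ipaddress.ip_network("0.0.0.0/0"), "default via 10.90.5.1 (pfSense)"),
]


def pick_route(dst):
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, via) for net, via in ROUTES if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Reply to the MacBook's real address: the connected .70 route wins, so the
# reply skips pfSense and the firewall only sees half of the conversation.
print(pick_route("10.70.5.11"))

# ASSUMED outbound NAT on pfSense's "90" interface: TrueNAS sees 10.90.5.1 as
# the client, replies out the mgmt interface to pfSense, and pfSense forwards it.
print(pick_route("10.90.5.1"))
```

A real Linux kernel also weighs policy rules and route metrics; this sketch only models the most-specific-route step that explains why the reply never passes back through pfSense.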


I will likely leave it be… the .70 subnet is a trusted subnet anyways.

This has been very informative; it made me a little bit more intelligent about networking. Like I said, I know enough to be dangerous, but I’m still fairly noob.

Thanks again!