VM Network issues

Hi guys, odd one here…

I have 2 TrueNAS SCALE NAS boxes up and running on the same network and same subnet. Each NAS has VMs running on it. I have a bridge set up on both so that the VMs can see their respective host.

On the 1st NAS both VMs are working fine: they can ping the host NAS, internal/external network locations, each other, and even VMs on the other host.

On the 2nd NAS one of the VMs behaves as above, yet the second VM cannot ping anything past the gateway.

Everything on the network is fine and reachable; I can access the router GUI from the VM, but no traffic will travel beyond the gateway. I have tried static IPs for the VM, I have deleted and re-added the NIC on the VM, and I have made sure it's not just a DNS issue by entering Google's IP in the browser.

The worst thing is it was working: the VM was fine when I created it. I activated it, updated it, installed my apps, left it for 2 days and came back to it not working.

Any ideas?

I'm half tempted to just delete it and go again, but I feel like that shouldn't be the solution.

It's Windows 11, by the way, and I get a "General failure" when pinging external IPs or DNS names.

Cheers

Can you share your network configuration (bridge, interface, etc.) for the non-working machine and the NIC attached to the VM, and can you maybe post the output of ipconfig from the VM?

If you want meaningful answers you need to provide information; we cannot see what you can see :wink:

This is the output from the machine that's not working, both a route print and ipconfig /all.

It reads nearly the same as those that are working; the interface name is different, etc.

These VMs are not clones either; they are fresh builds, independent of one another.

C:\Users\test>route print

Interface List
2...00 a0 98 02 56 ac ......Intel(R) PRO/1000 MT Network Connection
1...........................Software Loopback Interface 1

IPv4 Route Table

Active Routes:
Network Destination Netmask Gateway Interface Metric
0.0.0.0 0.0.0.0 192.168.0.254 192.168.0.145 25
127.0.0.0 255.0.0.0 On-link 127.0.0.1 331
127.0.0.1 255.255.255.255 On-link 127.0.0.1 331
127.255.255.255 255.255.255.255 On-link 127.0.0.1 331
192.168.0.0 255.255.255.0 On-link 192.168.0.145 281
192.168.0.145 255.255.255.255 On-link 192.168.0.145 281
192.168.0.255 255.255.255.255 On-link 192.168.0.145 281
224.0.0.0 240.0.0.0 On-link 127.0.0.1 331
224.0.0.0 240.0.0.0 On-link 192.168.0.145 281
255.255.255.255 255.255.255.255 On-link 127.0.0.1 331
255.255.255.255 255.255.255.255 On-link 192.168.0.145 281

Persistent Routes:
None

IPv6 Route Table

Active Routes:
If Metric Network Destination Gateway
1 331 ::1/128 On-link
1 331 ff00::/8 On-link

Persistent Routes:
None

C:\Users\test>ipconfig /all

Windows IP Configuration

Host Name . . . . . . . . . . . . : DKP01
Primary Dns Suffix . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : home

Ethernet adapter Ethernet:

Connection-specific DNS Suffix . : home
Description . . . . . . . . . . . : Intel(R) PRO/1000 MT Network Connection
Physical Address. . . . . . . . . : 00-A0-98-02-56-AC
DHCP Enabled. . . . . . . . . . . : Yes
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 192.168.0.145(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Lease Obtained. . . . . . . . . . : 14 May 2024 07:12:16
Lease Expires . . . . . . . . . . : 15 May 2024 07:12:17
Default Gateway . . . . . . . . . : 192.168.0.254
DHCP Server . . . . . . . . . . . : 192.168.0.254
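For what it's worth, the route table above already implies that external traffic should go to 192.168.0.254. A minimal sketch of the longest-prefix match Windows applies to those two IPv4 routes (the route list and `next_hop` helper are illustrative, not output from any real tool):

```python
# Sanity-check the routing implied by the posted route table:
# anything outside 192.168.0.0/24 should be sent via the default gateway,
# while LAN addresses are delivered directly ("on-link").
import ipaddress

routes = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.0.254"),  # default route
    (ipaddress.ip_network("192.168.0.0/24"), "on-link"),   # local subnet
]

def next_hop(dest: str) -> str:
    """Pick the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if addr in net]
    net, gw = max(matches, key=lambda r: r[0].prefixlen)
    return gw

print(next_hop("8.8.8.8"))        # 192.168.0.254 (via the gateway)
print(next_hop("192.168.0.200"))  # on-link (delivered directly)
```

So on paper the guest's routing is correct, which is consistent with the gateway GUI being reachable while anything beyond it fails.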

One thing I have noticed: if you check the NIC device in SCALE for this VM, the MAC doesn't match the output of ipconfig.
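Worth ruling out a cosmetic difference first: ipconfig prints MACs as dash-separated uppercase while SCALE shows colon-separated lowercase, so the two can look different while being the same address. A quick normalization check (the SCALE value below is a hypothetical example, since the actual device value wasn't posted):

```python
# Compare two MAC addresses while ignoring separator and case differences,
# so only a genuine mismatch is flagged.
def normalize_mac(mac: str) -> str:
    return mac.replace("-", "").replace(":", "").lower()

ipconfig_mac = "00-A0-98-02-56-AC"   # from the ipconfig /all output above
scale_mac = "00:a0:98:02:56:ac"      # hypothetical value from the SCALE NIC device

print(normalize_mac(ipconfig_mac) == normalize_mac(scale_mac))  # True = same MAC
```

If they still differ after normalizing, that's a real mismatch rather than a formatting one.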

Please add screenshots from your TrueNAS GUI:

This is the bridge setup on the NAS. As I said, there's one VM that's fine and this VM that will not work. I used the same method on the second NAS and both VMs are fine on that one.

image

From the dashboard

Just to clarify, br0 is the only NIC attached to the VM?

The mismatch of MAC addresses was a bug in a certain SCALE version, IIRC.

Any firewalls active or the like? Bridge is looking good IMO, and the gateway is correct in Windows.

Yep, only br0 is attached.

The VMs are brand-new builds, so it's not like they have been used and it's a Windows malware/virus issue. I've made no firewall adjustments, and again, the other VM on this host plus the two VMs on the other host are fine.

Might try spinning another VM up and see if the issue repeats. So odd that it started fine: I could activate it, update it, etc. It was working. Beginning to think it's a Windows issue rather than TrueNAS, but it's not obvious, at least not to me.

The plot thickens

So after a bunch of trial and error I binned the VM off and rebuilt it: created the image, snapshotted it, activated it, set up RDP etc. and snapshotted, ran updates, snapshotted… on and on until the VM was 100% where I wanted it to be. Working 100%, no issues; left it on overnight, all good.

Started using it, was downloading an NVIDIA driver file of about 800 MB, and suddenly it drops the internet connection. Restart the VM and it comes back on and finishes the download.

Then, as a test, from the 1st VM on the same host I was moving data from one location on the NAS to another, and that broke the internet connection of the 2nd VM. The network always remains up, but the internet connection is lost.

Whilst typing this I restarted the VM again, leaving the first VM moving data, and it's come back on and is fine. It's like network activity related to the storage is killing just the VM's internet connection.

I can't make that make sense in my head.
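One way to pin down that correlation would be a small watcher that timestamps exactly when connectivity drops, so the drops can be lined up against copy jobs on the host. A sketch (names are mine; in real use `is_up` would shell out to something like a single ping to 8.8.8.8, but it's injectable here so the logic runs without a network):

```python
# Tiny connectivity watcher: polls a user-supplied `is_up()` check and
# records an ISO timestamp at every up -> down transition.
import time
from datetime import datetime

def watch(is_up, samples, interval=1.0):
    """Return timestamps of each moment the link flips from up to down."""
    drops = []
    previously_up = True
    for _ in range(samples):
        up = is_up()
        if previously_up and not up:
            drops.append(datetime.now().isoformat(timespec="seconds"))
        previously_up = up
        if interval:
            time.sleep(interval)
    return drops

# Example with a canned sequence: two separate drops are recorded.
states = iter([True, True, False, False, True, False])
print(len(watch(lambda: next(states), samples=6, interval=0)))  # 2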

Please post your complete hardware and the exact version of scale you are on.

The VM restarted itself? Can you check the logs please.

No irestarted the VM to resolve the issue

Im running

TrueNAS SCale Dragonfish-24.04.0

Arous B550 MB (latest firmware)
AMD 5600g
64gb DDR4 RAM
2x Dell Perc H310 flashed to LSI-9211

Pool 1
7 x 6TB Dell Enterprise SAS + 1x Hotspare (Drives have all be reformatted and Long SMART test run prior to pool creation) Also there are 2 x 256gb SSD mirrored as a log cache

Pool 2
7x 400gb Dell Enterprise SSD + 1xHotspare

Pool 3
2 x 256gb NVME Mirrored

I’m running 2 VMs, Windows 11 which are fully up to date. 1st VM runs flawlessly i haven’t had one issue

The second VM runs behind an Express VPN client but i have tried with and without the VPN client installed

I had this exact hardware setup as a TrueNAS Core build before i rebuilt it with SCALE and i had none of these issues

i tested with the VPN software and it errored, then i removed the VPN software and restarted and it errored. Im wondering if something is lingering

I am going to try and restore the snapshop prior to VPN installation and try that test again

You didn’t mention a VPN before.

What Network interface is used? I tried to google it quickly, it looked like it was proprietary. Which is probably not the best prerequisite for truenanas.

Hard to tell, if it has been working in core before there may be a but in dragonfish. Can you try 23.10.2 and if that works file a bug report for dragonfish?

Sorry forgot about the VPN, im on automatic at the minute and thats just always been there

Right now its the onboard which i believe is a Realtek. I had ruled out hardware in my head because the Host has always been up and the 1st VM is fine but i have im retiring a couple of R630s in work this week and i can have the 10gb X520-T2 out of that or there was a couple of 2.5gb TP-Link NICs i was thinking of investing in

Havent had a chance to see whats compatible yet

I might be seeing similar issues.

I’m finding my VM, which is connected via a bridge, has network issues. To early to say what exactly is happening, but when its happening it ssh will cut out while I’m typing into the vm.

I’m going back to Cobia on this system for a while to see if its stable…

it gets stranger… by the time i got home the VM had gone off and when i tried to restart it it compalined it could no longer find the Disk for the VM.

I recreated the disk and restored my VM snap shot to it and for the last 16hrs (ish) the VM has been perfectly fine, not dropped the internet once.

I dont get it but at time of writing this is working

Been stable for 24 hrs.

So far, each time i switch to Dragonfish, the VM which previously was 100% stable, has network issues within a few hours

lru_gen is disabled

Will leave Cobia for a few more days.

Well this is and has stayed working so far (touches wood)

I’m going to leave it running and see what happens, wondering if its a bug in the creation on the VM disk, not sure why it would manifest in windows the way it did but there we go

we’ll see i guess

1 Like