DNS resolution refused on TrueNAS SCALE?

I’ve recently moved from CORE to SCALE (latest Dragonfish). I did a brand-new installation and am re-applying the CORE configuration from screenshots I captured of the old CORE setup. For some reason I can’t get the same network configuration to resolve DNS entries on SCALE, and I am baffled as to why.

Here is my CORE network configuration that worked without a problem (i.e. it could reach out to the Internet and fetch updates).

Here is the SCALE configuration that no longer works.

Extra info:

  • 172.16.0.1:53 is where pfSense firewall’s DNS Resolver service is running
  • 10.0.50.2 is where the PVI interface gateway is defined (inter-VLAN routing)
  • there are no firewall changes between previous CORE and new SCALE installation
  • this setup works for the rest of the network and did work for CORE - but no longer works on SCALE

I can ping both the PVI gateway and the firewall - basic connectivity works.

admin@storage[~]$ ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1) 56(84) bytes of data.
64 bytes from 172.16.0.1: icmp_seq=1 ttl=64 time=0.041 ms
64 bytes from 172.16.0.1: icmp_seq=2 ttl=64 time=0.058 ms
64 bytes from 172.16.0.1: icmp_seq=3 ttl=64 time=0.055 ms
64 bytes from 172.16.0.1: icmp_seq=4 ttl=64 time=0.065 ms
--- 172.16.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3070ms
rtt min/avg/max/mdev = 0.041/0.054/0.065/0.008 ms
admin@storage[~]$ ping 10.0.50.2
PING 10.0.50.2 (10.0.50.2) 56(84) bytes of data.
64 bytes from 10.0.50.2: icmp_seq=1 ttl=64 time=0.804 ms
64 bytes from 10.0.50.2: icmp_seq=2 ttl=64 time=0.817 ms
64 bytes from 10.0.50.2: icmp_seq=3 ttl=64 time=0.865 ms
64 bytes from 10.0.50.2: icmp_seq=4 ttl=64 time=0.925 ms
--- 10.0.50.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.804/0.852/0.925/0.047 ms

Digging google.com through the default nameserver doesn’t work.

admin@storage[~]$ dig google.com
;; communications error to 172.16.0.1#53: connection refused
;; communications error to 172.16.0.1#53: connection refused
;; communications error to 172.16.0.1#53: connection refused

; <<>> DiG 9.18.19-1~deb12u1-Debian <<>> google.com
;; global options: +cmd
;; no servers could be reached

…but it does work through Google’s DNS.

admin@storage[~]$ dig @8.8.8.8 google.com

; <<>> DiG 9.18.19-1~deb12u1-Debian <<>> @8.8.8.8 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27401
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             153     IN      A       142.250.69.206

;; Query time: 16 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sat Jun 22 00:33:13 MDT 2024
;; MSG SIZE  rcvd: 55

I’ve checked /etc/resolv.conf and everything looks fine:

admin@storage[~]$ cat /etc/resolv.conf
domain <redacted>
nameserver 172.16.0.1

Also, I am not seeing anything listening on port 53 (should there be something?)

admin@storage[~]$ netstat -an | grep "LISTEN "
tcp        0      0 127.0.0.1:6999          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:6444          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10010         0.0.0.0:*               LISTEN
tcp        0      0 10.0.60.1:5357          0.0.0.0:*               LISTEN
tcp        0      0 10.0.30.10:5357         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10257         0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN
tcp        0      0 10.0.50.1:5357          0.0.0.0:*               LISTEN
tcp        0      0 10.0.50.1:179           0.0.0.0:*               LISTEN
tcp        0      0 10.0.50.1:50051         0.0.0.0:*               LISTEN
tcp        0      0 10.0.100.101:5357       0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:50051         0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:6000            0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:445             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:3260            0.0.0.0:*               LISTEN
tcp6       0      0 ::1:179                 :::*                    LISTEN
tcp6       0      0 :::29652                :::*                    LISTEN
tcp6       0      0 :::29653                :::*                    LISTEN
tcp6       0      0 :::29642                :::*                    LISTEN
tcp6       0      0 :::29643                :::*                    LISTEN
tcp6       0      0 :::29644                :::*                    LISTEN
tcp6       0      0 :::20244                :::*                    LISTEN
tcp6       0      0 :::10250                :::*                    LISTEN
tcp6       0      0 :::6443                 :::*                    LISTEN
tcp6       0      0 :::443                  :::*                    LISTEN
tcp6       0      0 :::445                  :::*                    LISTEN
tcp6       0      0 :::139                  :::*                    LISTEN
tcp6       0      0 :::22                   :::*                    LISTEN
tcp6       0      0 :::111                  :::*                    LISTEN
tcp6       0      0 :::80                   :::*                    LISTEN
tcp6       0      0 :::3260                 :::*                    LISTEN

My latest thinking is to try opening port 53 via iptables manually, but I really don’t want to do that as it feels very wrong.

Am I missing something really basic here? What else could I try? Any thoughts are appreciated.

EDIT: Internet connectivity is restored if I put 8.8.8.8 as the secondary nameserver, but I don’t want to do that since the DNS Resolver on the firewall is perfectly capable of doing the upstream lookup against public DNS itself.
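
That workaround amounts to nothing more than a second nameserver line in /etc/resolv.conf, shown here purely as an illustration (it’s not something I want to keep):

domain <redacted>
nameserver 172.16.0.1
nameserver 8.8.8.8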

I’ve also triple-checked to see if there are any remnant firewall rules - there are none. I’ve also made sure that DNS lookups from the IP range of PVI (10.0.50.x) are allowed.

This firewall config works under CORE but doesn’t under SCALE.

EDIT 2: Adding the kernel IP routing table. This also looks OK to me.

admin@storage[~]$ netstat -r
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
default         10.0.50.2       0.0.0.0         UG        0 0          0 enp134s0f0np0
10.0.30.0       0.0.0.0         255.255.255.240 U         0 0          0 enp7s0f1
10.0.50.0       0.0.0.0         255.255.255.248 U         0 0          0 enp134s0f0np0
10.0.60.0       0.0.0.0         255.255.255.248 U         0 0          0 enp134s0f1np1
10.0.100.0      0.0.0.0         255.255.255.0   U         0 0          0 eno1
172.16.0.0      0.0.0.0         255.255.0.0     U         0 0          0 kube-bridge

Can you describe the pfSense part more? Is it running as an app or something on your new SCALE install?

That’s not something that would just transfer over as is from CORE, so how did you set it up in SCALE?

What you have posted suggests your TrueNAS host can’t reach port 53 on what should be acting as your DNS server. TrueNAS doesn’t run a firewall. That leads me to think something is iffy with the pfSense configuration - either pfSense itself or the supporting layers around it.
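
If you want to narrow down where the “refused” is actually coming from, a couple of quick checks from the SCALE shell might help (dig is already there; the second command just asks the kernel which route it would pick for that address):

# Retry the query over TCP - a different failure mode here points at the path rather than the resolver
dig +tcp @172.16.0.1 google.com

# Which interface/route does the kernel choose to reach the resolver?
ip route get 172.16.0.1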

Edit: I’ll add that I don’t particularly like running DNS as a service on something that itself depends on it; it can be quite fragile.

Edit 2: For a moment there I conflated it with Pi-hole serving as DNS, but as pmh kindly reminded me, pfSense is only available in the form of a full OS.

Not relevant since it's not available in app form

But if I were running it as a custom app on SCALE, I would assign it its own IP so that I could use port 53 without restrictions, like so, obviously adapting the IP and possibly the interface to suit my network:

pfSense is a dedicated “hardware” firewall. Part of his network infrastructure, apparently.

@dxun pull the big gun and use tcpdump to trace the packets on SCALE and on pfSense. Are the requests leaving SCALE out of the correct interface? Are they arriving at pfSense?
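
Something along these lines should do; adjust the interface names to your setup (the SCALE interface below is taken from the routing table you posted, the pfSense one is just a placeholder - you could also use Diagnostics → Packet Capture in the pfSense UI):

# On SCALE - watch DNS queries heading for the firewall
sudo tcpdump -ni enp134s0f0np0 host 172.16.0.1 and port 53

# On pfSense (shell or Diagnostics → Packet Capture) - do the queries ever arrive?
# <transit-if> is a placeholder for the interface facing the transit link
tcpdump -ni <transit-if> port 53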

Thank you. I know what pfSense is and the purpose it can serve in a network.

I don’t know what shape it takes in dxun’s network. Is it running on dedicated HW on another machine? It could also be in a VM.

Edit: My mistake; I don’t know why I suggested it could be run as an app, as it’s obviously not available in that form.

Thank you. I know what pfSense is and the purpose it can serve in a network.
I don’t know what shape it takes in dxun’s network. Is it running on dedicated HW on another machine? It could also be in a VM.

@neofusion It’s a dedicated firewall sitting on bare metal. FWIW, SCALE also runs on bare metal, so no VM shenanigans.

The 172.16.0.1 address is the firewall’s actual internal IP; it is the far end of a transit network on a dedicated fibre link. Any upstream traffic that the L3 switch’s inter-VLAN routing can’t match against its internal routing table is carried over that link to the firewall, where, among other things, the DNS Resolver runs. Similarly, the firewall config contains the switch’s transit network endpoint so that traffic can flow in the other direction as needed.

I can provide more details if needed. This setup has been (and still is) working for the rest of the network (some 40+ IP addresses) for years now, so I am inclined to think there is some subtle difference between the FreeBSD and Linux network stacks at play. Or I am just being blind to a stupid oversight somewhere.

@dxun pull the big gun and use tcpdump to trace the packets on SCALE and on pfSense. Are the requests leaving SCALE out of the correct interface? Are they arriving at pfSense?

Hmm, I haven’t done this before; let me educate myself a bit first. In the meantime, I wanted to show a couple of screenshots from the firewall.

This shows how DNS resolution behaviour should work. It uses Cloudflare’s DNS for any DNS entry the local DNS server can’t resolve.
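
In text form, that forwarding behaviour boils down to roughly the equivalent of an unbound forward-zone like the one below (the Resolver generates its own config from the UI, and I’m assuming the standard Cloudflare resolver addresses here; this is just to illustrate what the screenshot sets up):

forward-zone:
    name: "."
    forward-addr: 1.1.1.1
    forward-addr: 1.0.0.1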

Out of curiosity, I’ve also placed a couple of FW rules for DNS and NTP traffic (unrelated) in order to capture traffic from the rest of the network. From what I can see, port 53 is accessible on the FW.


Let me learn more about the tcpdump scalpel - in the meantime, does any of this look wrong?

EDIT: Here is the output of sudo tcpdump -i any -s0 port 53 as I run dig google.com in a separate terminal.

I am not sure what to make of this. I am surprised to see DNS resolution traffic coming from seemingly random five-figure ports, but maybe I am not thinking about this right.

I’m also surprised to see traffic for a DNS entry, ix-truenas.<my-redacted-domain>. Where would this come from, and why?

EDIT 2: Thought I’d share this bit I captured as I killed the tcpdump process.

09:16:18.127313 kube-bridge In  IP 172.16.0.13.60823 > 172.16.0.1.domain: 26130+ NS? . (17)
^C
1131 packets captured
1192 packets received by filter
0 packets dropped by kernel
admin@storage[~]$

No packets dropped (seemingly) but some packets were…not captured? Also not sure if this has any relevance to the troubleshooting we’re doing.


Thank you for taking the time to answer my question.

I noticed that your pfSense-box’s IP overlaps with the kube-bridge interface supporting SCALE’s app layer.
I would think that would be a problem. Is that intentional?

Are you using TN apps, and are you married to that specific address range? If that subnet isn’t vital, you could try changing Kubernetes to use a different private subnet under Apps → Settings → Advanced Settings.
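
A quick way to confirm from the SCALE shell which way those DNS packets go is to ask the kernel for the route it picks for the firewall address; if it answers with kube-bridge instead of your default gateway, that’s the problem right there:

ip route get 172.16.0.1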


Bingo! Good call! TN cannot reach the DNS server because the network is locally connected!

@dxun Change your Kubernetes network and things will probably work.


I noticed that your pfSense-box’s IP overlaps with the kube-bridge interface supporting SCALE’s app layer.
I would think that would be a problem. Is that intentional?

No, I am not using any apps at the moment - this is a brand new installation of SCALE. I also wasn’t aware of this!

Bingo! Good call! TN cannot reach the DNS server because the network is locally connected!

Alright - let’s give it a shot.

Previous config:

New config (simply incrementing the second octet, which still falls within the 172.16.0.0/12 private range):

And…it works!!

admin@storage[~]$ dig google.com

; <<>> DiG 9.18.19-1~deb12u1-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23982
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             300     IN      A       142.250.217.78

;; Query time: 24 msec
;; SERVER: 172.16.0.1#53(172.16.0.1) (UDP)
;; WHEN: Sat Jun 22 10:38:25 MDT 2024
;; MSG SIZE  rcvd: 55

Great stuff, this indeed did the trick, massive thank you @neofusion and @pmh !! :partying_face:

Before I mark this as resolved, any suggestions for improving the k3s network stack config I have right now? There shouldn’t be any overlap with any IP range I am currently running… anything else you can spot, perhaps?

Glad to hear it!

I don’t see anything off in your final configuration screenshot. Keep an eye on it when you eventually move to 24.10 Electric Eel; since iX is switching over to Docker, they will likely tinker with the underpinnings of app networking.
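
If you want a quick sanity check after any future change to those subnets, comparing the host’s locally connected routes against whatever CIDRs the apps are given is usually enough; something along these lines:

# Locally connected networks the host already owns (VLANs, transit link, kube-bridge, ...)
ip -4 route show scope link

# None of these should fall inside the cluster/service CIDRs configured under Apps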
