Comments (23)
So this is not necessarily a WireGuard issue. Here is an example of opening a UDP server in the host network namespace, which can also cause conflicts with transparent proxy mode:
root@kind-worker:/home/cilium# python3
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from socket import *
>>>
>>> serverSocket = socket(AF_INET, SOCK_DGRAM)
>>>
>>> serverSocket.bind(('', 12000))
>>>
Verifying that the server is running:
root@kind-worker:/home/cilium# ss -ulpn | grep 12000
UNCONN 0 0 0.0.0.0:12000 0.0.0.0:* users:(("python3",pid=1049,fd=3))
Running a dig with client port 12000 (in the pod namespace):
$ kubectl -n cilium-test exec client-59c486cb54-9f74n -- dig -b '10.244.1.163#12000' google.com
;; Warning: ID mismatch: expected ID 38409, got 23676
Error in the cilium pod:
$ kubectl -n kube-system logs cilium-96j2c | grep level=error
level=error msg="Cannot forward proxied DNS lookup" DNSRequestID=38409 dnsName=google.com. endpointID=1079 error="failed to dial connection to 10.244.2.203:53: dial udp 10.244.1.163:12000->10.244.2.203:53: bind: address already in use" identity=19156 ipAddr="10.244.1.163:12000" subsys=fqdn/dnsproxy
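For anyone who wants to see the same error without a cluster, here is a minimal sketch of the collision on a single machine. It is only an illustration, not what the DNS proxy actually runs: the function name try_second_bind is made up, and 127.0.0.1 stands in for the pod IP. One socket takes a port on the wildcard address (like the host-namespace server above), and a second socket then tries to bind the same port on a specific address.

```python
import errno
import socket


def try_second_bind():
    """Bind one UDP socket on the wildcard address, then try to bind a
    second UDP socket to the same port on 127.0.0.1.

    Returns the errno of the failed second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 lets the kernel pick any free port for the first socket.
        first.bind(("0.0.0.0", 0))
        port = first.getsockname()[1]
        try:
            # Same port, specific address, no SO_REUSEADDR/SO_REUSEPORT.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = try_second_bind()
    if code == errno.EADDRINUSE:
        print("second bind failed: address already in use")
    else:
        print("second bind result:", code)
```

On Linux the second bind is expected to fail with EADDRINUSE, which is the same error string that shows up in the dnsproxy log line.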
from cilium.
got the same problem. is there any way to solve it?
Do you have the problem with WireGuard or with another host network socket?
There is no solution yet. We have discussed various solutions (e.g. for WireGuard, changing the default port and providing a seamless migration path), but so far no easy fix has been found (besides turning off dnsproxy-enable-transparent-mode, which does have security implications for encryption though).
WireGuard
from cilium.
got the same problem
level=error msg="Cannot forward proxied DNS lookup" DNSRequestID=41915 dnsName=xxxx.s3.cn-north-1.amazonaws.com.cn. endpointID=3109 error="failed to dial connection to 10.180.7.197:53: dial udp 10.180.8.200:8472->10.180.7.197:53: bind: address already in use" identity=7250 ipAddr="10.180.8.200:8472" subsys=fqdn/dnsproxy
Thank you! What is interesting about your log message is that the client pod (10.180.8.200) is using a non-ephemeral source port (Linux usually picks source ports from 32768–60999). Do you know which software is selecting the source port for your client pod?
I set net.ipv4.ip_local_port_range = 1024 65535. Is port 8472 the one Cilium uses for VXLAN (UDP)?
from cilium.
Will give reserved ports a go.
Out of curiosity, does your client-side application not perform retries for failed DNS lookups? I would have expected clients to retry after a port collision, since DNS is not a reliable protocol after all
That's true, and it's not a service we own unfortunately. They have a very brittle HA implementation which appeared to completely fail as a result of not being able to look pods up.
from cilium.
It's empty
from cilium.
As discussed offline, this might not be a WireGuard problem. It seems that any port in the ephemeral range that is in use in the host network namespace might cause conflicts.
from cilium.
Thank you, yes, that explains it. Port 8472 is indeed also used by Cilium VXLAN, which can of course collide if you widen the ephemeral range by setting the port range to 1024-65535.
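To make the overlap concrete, here is a small sketch that checks which of the two ports mentioned in this thread (8472 for VXLAN, 51871 for WireGuard) fall inside a given ip_local_port_range value. The helper names and the KNOWN_PORTS table are made up for illustration. The input uses the same "low high" format that the sysctl prints.

```python
# Ports named in this thread; the dictionary itself is just for illustration.
KNOWN_PORTS = {"vxlan": 8472, "wireguard": 51871}


def parse_port_range(text):
    """Parse an ip_local_port_range value such as '32768 60999'."""
    low, high = (int(part) for part in text.split())
    return low, high


def ports_in_range(range_text, ports):
    """Return {name: True/False} telling whether each port can be picked
    as an ephemeral source port under the given range."""
    low, high = parse_port_range(range_text)
    return {name: low <= port <= high for name, port in ports.items()}


if __name__ == "__main__":
    for range_text in ("32768 60999", "1024 65535"):
        print(range_text, ports_in_range(range_text, KNOWN_PORTS))
```

With the usual 32768 60999 range only the WireGuard port is inside the ephemeral range. With 1024 65535 both ports are, which matches the 8472 collision reported above.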
from cilium.
I'm hitting this problem with WireGuard enabled. I get about 5 collisions per hour (not acceptable as it can temporarily take entire services down).
I'm assuming a dirty workaround at the moment is to set the ip_local_port_range to 51872 65535, but then there is only a very small port range that the cluster can use?
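As a rough sense of how much smaller that range is, here is the arithmetic as a tiny sketch (port_count is a made-up helper; both bounds are inclusive). It compares the proposed range with the usual Linux range of 32768 to 60999 mentioned elsewhere in this thread.

```python
def port_count(low, high):
    """Number of ports in an inclusive range [low, high]."""
    return high - low + 1


if __name__ == "__main__":
    print(port_count(51872, 65535))  # 13664 ports with the workaround
    print(port_count(32768, 60999))  # 28232 ports with the usual range
```

So the workaround would leave roughly half as many ephemeral ports per network namespace as the usual range.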
from cilium.
I'm assuming a dirty workaround at the moment is to set the ip_local_port_range to 51872 65535, but then there is only a very small port range that the cluster can use?
Have you tried using net.ipv4.ip_local_reserved_ports? This might be something we could implement on the Cilium CNI side to fix at least the problem with WireGuard.
I get about 5 collisions per hour (not acceptable as it can temporarily take entire services down).
Out of curiosity, does your client-side application not perform retries for failed DNS lookups? I would have expected clients to retry after a port collision, since DNS is not a reliable protocol after all
from cilium.
I've created a PR which will instruct Cilium to set net.ipv4.ip_local_reserved_ports for the WireGuard and VXLAN ports by default: #32128
Hopefully this should improve the situation for most users. We still ought to think about a better solution long-term though.
from cilium.
FYI, I've set 51871 to be a reserved port, but I'm still seeing the binding failures on that port.
Interesting, in my local testing net.ipv4.ip_local_reserved_ports did indeed prevent dig from using that port. The important thing is that the setting needs to be set in the pod network namespace, not the namespace of the host (since the setting is not inherited).
That'll be my issue - apologies. I had a feeling that was the case.
from cilium.
I updated Cilium to 1.14.11, but the problem still happens. This is the output of exec -it cilium-xxx sysctl net.ipv4.ip_local_reserved_ports:
root@ike-012:/home/cilium# sysctl net.ipv4.ip_local_reserved_ports
net.ipv4.ip_local_reserved_ports = 30000-32767
30000-32767 is for NodePort
from cilium.
Please check the ip_local_reserved_ports inside the pod namespace, not the host namespace. What's the exact problem that you observe (which ports are affected? what's the error message)?
Edit: Also, because Cilium can't access the pod namespace, the workaround with ip_local_reserved_ports will only be set for new pods (i.e. pods created after Cilium has been updated). You will have to restart affected pods for the workaround to take effect.
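If it helps with checking, here is a small sketch that reads the setting and reports whether the ports from this thread are reserved. It assumes python3 is available inside the pod, which may not be the case for your image. The helper names are made up. The value is a comma-separated list of single ports and low-high ranges, and an empty value means nothing is reserved.

```python
from pathlib import Path

# Same file the sysctl reads; it must be read from inside the pod's
# network namespace to be meaningful for this issue.
RESERVED_PATH = Path("/proc/sys/net/ipv4/ip_local_reserved_ports")


def parse_reserved_ports(text):
    """Parse a value like '8472,30000-32767' into (low, high) tuples."""
    ranges = []
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue  # empty value: nothing reserved
        low, _, high = item.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def is_reserved(text, port):
    """True if the port is covered by any reserved entry."""
    return any(low <= port <= high for low, high in parse_reserved_ports(text))


if __name__ == "__main__":
    if RESERVED_PATH.exists():
        current = RESERVED_PATH.read_text()
        for port in (8472, 51871):
            print(port, "reserved" if is_reserved(current, port) else "NOT reserved")
    else:
        print(RESERVED_PATH, "not found")
```

For example, a value of 30000-32767 (the NodePort range shown above) does not cover 51871, so a client in that namespace could still pick the WireGuard port as its source port.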
from cilium.
Hello, I just updated to 1.15.5, my understanding was that the WireGuard port (51871) would/should be excluded by default. I still get errors like: bind: address already in use" identity=34960 ipAddr="10.0.2.92:51871" subsys=fqdn/dnsproxy. Fresh install. Searching documentation I cannot find how to exclude the port.
Hi, the port should be automatically excluded without any configuration needed. Are you able to share a sysdump?
from cilium.
Sure, thank you. cilium-sysdump-20240522-090917.zip
Thanks! The sysdump itself looks alright. Unfortunately we don't dump the sysctls of the pods, so the feature could probably use some additional logging for troubleshooting.
Could you share the output of cat /proc/sys/net/ipv4/ip_local_reserved_ports from one of your client pods (e.g. uptime-kuma-986f65945-bcjdx)?
from cilium.