This is a duplicate of #122740
We managed to reproduce the issue consistently with:
`kill -STOP`
After that you can restart the kube-dns and kube-proxy pods normally, and the problem occurs.
Though this is really something users should take into account when doing rolling updates, and it can be mitigated by following best practices. It is not common to start killing all the pods on a node; you should first drain the node: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
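For reference, a minimal drain sketch (the node name `node-1` is a placeholder; the flags you need may vary with your workloads):

```shell
# Cordon and drain the node before disruptive maintenance, so pods are
# evicted gracefully instead of being killed in place.
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# ...perform maintenance, then make the node schedulable again:
kubectl uncordon node-1
```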
No worries, I think what you are describing is what Dan Winship covers in #112604.
/triage accepted
/sig network
/area kube-proxy
We have the same issue, but with an nginx proxy.
Once the CoreDNS pod restarted, the nginx proxy still tried to resolve names via the old CoreDNS IP.
The reproducer seems very invasive. Either way, you need to provide logs and the timing of the events; run kube-proxy with `-v=4`, for example.
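A sketch of one way to bump the verbosity, assuming kube-proxy runs as the standard `kube-proxy` DaemonSet in `kube-system` (as in kubeadm-style clusters):

```shell
# Append --v=4 to the kube-proxy container args and let the DaemonSet roll out.
kubectl -n kube-system patch ds kube-proxy --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=4"}]'

# Then follow the logs of the restarted pods:
kubectl -n kube-system logs -f -l k8s-app=kube-proxy
```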
/assign @aojea
We talked about this in the SIG Network meeting today; this may relate to #112604.
We also experienced a similar issue with Envoy in AKS, with kube-proxy image mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.28.5-hotfix.20240411. To validate the cause we deleted the stale conntrack entries, and traffic to the DNS service began working again. To mitigate until there is a fix, we are reconfiguring Envoy to use TCP for DNS requests.
@shaneutt what was decided in the meeting?
That we need to investigate it and find the root cause. We need kube-proxy logs identifying the problematic IP that leaves stale entries.
We managed to reproduce the issue consistently with:
`kill -STOP <coreDNS pid> <kubeproxy pid>`
After that you can restart the kube-dns and kube-proxy pods normally, and the problem occurs.
It looks like the problem only affects processes that use the UDP protocol and the same source port for their DNS queries.
They also need to retry the DNS queries from that same source port constantly, so that the entry never reaches the 120-second timeout. If Envoy/nginx or any other tool stops its resolution attempts, the conntrack table will be updated after 120 seconds.
The parameter which controls this timeout is `nf_conntrack_udp_timeout_stream`.
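This can be inspected (and, if needed, tuned) on the node, assuming the conntrack module is loaded:

```shell
# UDP "stream" timeout in seconds, applied once traffic flows both ways
# (or keeps being refreshed); 120 is the usual kernel default.
sysctl net.netfilter.nf_conntrack_udp_timeout_stream

# Also relevant: the plain UDP timeout for one-off entries (typically 30s).
sysctl net.netfilter.nf_conntrack_udp_timeout
```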
A simple Python script for testing; it uses UDP and a fixed source port:
```python
import socket
import struct
import random
import time

def create_dns_query(domain):
    transaction_id = random.randint(0, 65535)
    flags = 0x0100  # Standard query with recursion desired
    header = struct.pack('!HHHHHH', transaction_id, flags, 1, 0, 0, 0)
    question = b''
    for part in domain.split('.'):
        question += struct.pack('B', len(part)) + part.encode()
    question += b'\x00'  # Terminating null byte
    question += struct.pack('!HH', 1, 1)  # QTYPE (A record) and QCLASS (IN)
    return header + question

def send_dns_query(sock, domain, dns_server="172.20.0.10"):
    try:
        query = create_dns_query(domain)
        sock.sendto(query, (dns_server, 53))
        sock.settimeout(2)  # Set a timeout for receiving the response
        response, _ = sock.recvfrom(1024)
        flags = struct.unpack('!H', response[2:4])[0]
        rcode = flags & 0xF
        ancount = struct.unpack('!H', response[6:8])[0]
        print(f"DNS Response for {domain}:")
        print(f"Response Code: {rcode}")
        print("Answer Section:")
        offset = 12
        while response[offset] != 0:  # Skip the question name
            offset += 1
        offset += 5  # Null byte + QTYPE + QCLASS
        for _ in range(ancount):
            if (response[offset] & 0xC0) == 0xC0:  # Compressed name pointer
                offset += 2
            else:
                while response[offset] != 0:
                    offset += 1
                offset += 1
            rec_type, rec_class, ttl, data_len = struct.unpack('!HHIH', response[offset:offset+10])
            offset += 10
            if rec_type == 1:  # A record
                ip = '.'.join(map(str, response[offset:offset+4]))
                print(f"{domain} {ttl} IN A {ip}")
            offset += data_len
    except socket.timeout:
        print(f"Error: DNS query timed out for {domain}")
    except Exception as e:
        print(f"Error occurred while querying {domain}: {str(e)}")

def continuous_dns_resolution(domain, source_port=12345, interval=1):
    print(f"Starting continuous DNS resolution for {domain} every {interval} second(s)")
    print("Press Ctrl+C to stop the script")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(('', source_port))  # Fixed source port: one conntrack 4-tuple
        while True:
            send_dns_query(sock, domain)
            print("\n")  # Add a newline for better readability between queries
            time.sleep(interval)
    except KeyboardInterrupt:
        print("\nScript terminated by user")
    finally:
        sock.close()

# Example usage
if __name__ == "__main__":
    domain_to_query = "example.com"
    continuous_dns_resolution(domain_to_query)
```
To summarize: under some conditions, kube-proxy doesn't update the conntrack table when UDP and the same source port are used.
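To observe the stale state directly on an affected node, one can list the UDP conntrack entries toward the DNS ClusterIP (the 172.20.0.10 address from the reproducer above; requires conntrack-tools and root):

```shell
# Entries stuck in [UNREPLIED] toward the service IP indicate missing DNAT:
# replies will never arrive, yet constant retries from the same source port
# keep refreshing the entry so it never times out.
conntrack -L -p udp --orig-dst 172.20.0.10 | grep UNREPLIED
```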
@aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.
> @aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.

The bug is legit, but I think it is hard to hit. What happened in this issue's description is that the user is manually forcing a scenario that is known to fail and is documented in #122740.
> @aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.

> The bug is legit, but I think it is hard to hit. What happened in this issue's description is that the user is manually forcing a scenario that is known to fail and is documented in #122740.

In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time and CoreDNS was evicted. After a while the node recovered, but the entire cluster was left almost non-functional because of the DNS issue, and we had to restart the many pods affected by this UDP bug.
> In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time.

This is interesting... why did kube-proxy get stuck?
> In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time.

> This is interesting... why did kube-proxy get stuck?

The whole node got stuck because of memory overload. It's a bit complicated to reproduce in the lab, but it happened twice in two different EKS clusters. AWS support recommended giving more memory to kubeReserved as a mitigation.
> you need to provide logs and the timing of the events; run kube-proxy with `-v=4`, for example
```
I0625 13:13:20.351149       1 proxier.go:796] "Syncing iptables rules"
I0625 13:13:20.351454       1 iptables.go:358] "Running" command="iptables-save" arguments=["-t","nat"]
I0625 13:13:20.355238       1 proxier.go:1504] "Reloading service iptables data" numServices=40 numEndpoints=52 numFilterChains=6 numFilterRules=8 numNATChains=8 numNATRules=45
I0625 13:13:20.355259       1 iptables.go:423] "Running" command="iptables-restore" arguments=["-w","5","-W","100000","--noflush","--counters"]
I0625 13:13:20.359194       1 proxier.go:1533] "Network programming" endpoint="kube-system/kube-dns" elapsed=0.35915983
I0625 13:13:20.359258       1 cleanup.go:63] "Deleting conntrack stale entries for services" IPs=[]
I0625 13:13:20.359280       1 cleanup.go:69] "Deleting conntrack stale entries for services" nodePorts=[]
I0625 13:13:20.359306       1 conntrack.go:66] "Clearing conntrack entries" parameters=["-D","--orig-dst","172.20.0.10","--dst-nat","10.240.244.119","-p","udp"]
I0625 13:13:20.361710       1 conntrack.go:71] "Conntrack entries deleted" output=<
	conntrack v1.4.4 (conntrack-tools): 17 flow entries have been deleted.
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=55850 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=55850 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=59350 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=59350 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=59716 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=59716 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=42887 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=42887 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=41925 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=41925 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=40972 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=40972 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=51492 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=51492 mark=0 use=2
	udp      17 16 src=10.240.244.58 dst=172.20.0.10 sport=55964 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.58 sport=53 dport=55964 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=48529 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=48529 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=43667 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=43667 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=51934 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=51934 mark=0 use=1
	udp      17 5 src=10.240.244.154 dst=172.20.0.10 sport=35714 dport=53 src=10.240.244.119 dst=10.240.244.154 sport=53 dport=35714 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=58990 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=58990 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=40587 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=40587 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=48436 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=48436 mark=0 use=1
	udp      17 23 src=10.240.244.34 dst=172.20.0.10 sport=40050 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.34 sport=53 dport=40050 mark=0 use=1
	udp      17 16 src=10.240.244.191 dst=172.20.0.10 sport=40033 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.191 sport=53 dport=40033 mark=0 use=1
```
@aojea if you need anything more, please let us know.
/assign
thanks,
Reviewing this issue since we still see impact occasionally. In the most recent incident, we found and deleted this conntrack entry for a destination service with ClusterIP 192.168.0.10:

```
conntrack -L | grep 192.168.0.10 | grep UNREPLIED
udp      17 29 src=10.120.1.150 dst=192.168.0.10 sport=49660 dport=53 [UNREPLIED] src=192.168.0.10 dst=10.120.1.150 sport=53 dport=49660 mark=0 use=1
```
It looks like DNAT wasn't set up. After deleting the entry, the next UDP request succeeded and requests to the ClusterIP began working again.
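For anyone hitting the same state, a sketch of the manual cleanup we ran (assumes the 192.168.0.10 ClusterIP from above; requires conntrack-tools and root):

```shell
# Drop all UDP entries whose original destination is the DNS ClusterIP on
# port 53; the next query then creates a fresh entry with correct DNAT.
conntrack -D -p udp --orig-dst 192.168.0.10 --orig-port-dst 53
```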
Reviewing some related issues, I found the following PR which seems like it would resolve our case: #122741 (related to #122740).