This is a duplicate of #122740
We managed to reproduce the issue consistently with:
`kill -STOP`
After that you can restart the kube-dns and kube-proxy pods normally, and the problem occurs.
Though this is really something users should take into account when doing rolling updates, and it can be mitigated by following best practices. It is not common to start killing all the pods on a node; you should first drain the node: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
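For reference, a minimal drain sketch (the node name `node-1` is a placeholder; the flags you need may vary with your workloads):

```shell
# Cordon and drain the node before disruptive maintenance, so pods are
# evicted gracefully instead of being killed in place.
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# ...perform maintenance, then make the node schedulable again:
kubectl uncordon node-1
```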
No worries, I think what you are describing is what Dan Winship covers in #112604.
/triage accepted
/sig network
/area kube-proxy
We have the same issue, but with an nginx proxy.
Once the CoreDNS pod restarted, the nginx proxy still tried to resolve names via the old CoreDNS IP.
The reproducer seems very invasive. Either way, you need to provide logs and the timing of the events; run kube-proxy with `-v=4`, for example.
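A sketch of one way to bump the verbosity, assuming kube-proxy runs as the standard `kube-proxy` DaemonSet in `kube-system` (as in kubeadm-style clusters):

```shell
# Append --v=4 to the kube-proxy container args and let the DaemonSet roll out.
kubectl -n kube-system patch ds kube-proxy --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--v=4"}]'

# Then follow the logs of the restarted pods:
kubectl -n kube-system logs -f -l k8s-app=kube-proxy
```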
/assign @aojea
We talked about this in the SIG Network meeting today; this may relate to #112604.
We also experienced a similar issue with Envoy in AKS, with kube-proxy image mcr.microsoft.com/oss/kubernetes/kube-proxy:v1.28.5-hotfix.20240411. To validate the cause we deleted the stale conntrack entries, and traffic to the DNS service began working again. To mitigate until there is a fix, we are reconfiguring Envoy to use TCP for DNS requests.
@shaneutt what was decided in the meeting?
That we need to investigate it and find the root cause. We need kube-proxy logs identifying the problematic IP that leaves stale entries.
We managed to reproduce the issue consistently with:
`kill -STOP <coreDNS pid> <kubeproxy pid>`
After that you can restart the kube-dns and kube-proxy pods normally, and the problem occurs.
It looks like the problem only affects processes that use the UDP protocol and the same source port for their DNS queries.
They also need to retry the DNS queries from that same source port constantly, so that the entry never reaches the 120-second timeout. If Envoy/nginx or any other tool stops its resolution attempts, the conntrack table will be updated after 120 seconds.
The parameter which controls this timeout is `nf_conntrack_udp_timeout_stream`.
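This can be inspected (and, if needed, tuned) on the node, assuming the conntrack module is loaded:

```shell
# UDP "stream" timeout in seconds, applied once traffic flows both ways
# (or keeps being refreshed); 120 is the usual kernel default.
sysctl net.netfilter.nf_conntrack_udp_timeout_stream

# Also relevant: the plain UDP timeout for one-off entries (typically 30s).
sysctl net.netfilter.nf_conntrack_udp_timeout
```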
A simple Python script for testing; it uses UDP and a fixed source port:
```python
import socket
import struct
import random
import time

def create_dns_query(domain):
    transaction_id = random.randint(0, 65535)
    flags = 0x0100  # Standard query with recursion desired
    header = struct.pack('!HHHHHH', transaction_id, flags, 1, 0, 0, 0)
    question = b''
    for part in domain.split('.'):
        question += struct.pack('B', len(part)) + part.encode()
    question += b'\x00'  # Terminating null byte
    question += struct.pack('!HH', 1, 1)  # QTYPE (A record) and QCLASS (IN)
    return header + question

def send_dns_query(sock, domain, dns_server="172.20.0.10"):
    try:
        query = create_dns_query(domain)
        sock.sendto(query, (dns_server, 53))
        sock.settimeout(2)  # Set a timeout for receiving the response
        response, _ = sock.recvfrom(1024)
        flags = struct.unpack('!H', response[2:4])[0]
        rcode = flags & 0xF
        ancount = struct.unpack('!H', response[6:8])[0]
        print(f"DNS Response for {domain}:")
        print(f"Response Code: {rcode}")
        print("Answer Section:")
        offset = 12
        while response[offset] != 0:  # Skip the question name
            offset += 1
        offset += 5  # Null byte + QTYPE + QCLASS
        for _ in range(ancount):
            if (response[offset] & 0xC0) == 0xC0:  # Compressed name pointer
                offset += 2
            else:
                while response[offset] != 0:
                    offset += 1
                offset += 1
            rec_type, rec_class, ttl, data_len = struct.unpack('!HHIH', response[offset:offset+10])
            offset += 10
            if rec_type == 1:  # A record
                ip = '.'.join(map(str, response[offset:offset+4]))
                print(f"{domain} {ttl} IN A {ip}")
            offset += data_len
    except socket.timeout:
        print(f"Error: DNS query timed out for {domain}")
    except Exception as e:
        print(f"Error occurred while querying {domain}: {str(e)}")

def continuous_dns_resolution(domain, source_port=12345, interval=1):
    print(f"Starting continuous DNS resolution for {domain} every {interval} second(s)")
    print("Press Ctrl+C to stop the script")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(('', source_port))  # Fixed source port: one conntrack 4-tuple
        while True:
            send_dns_query(sock, domain)
            print("\n")  # Add a newline for better readability between queries
            time.sleep(interval)
    except KeyboardInterrupt:
        print("\nScript terminated by user")
    finally:
        sock.close()

# Example usage
if __name__ == "__main__":
    domain_to_query = "example.com"
    continuous_dns_resolution(domain_to_query)
```
To summarize: under some conditions, kube-proxy doesn't update the conntrack table when UDP and the same source port are used.
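To observe the stale state directly on an affected node, one can list the UDP conntrack entries toward the DNS ClusterIP (the 172.20.0.10 address from the reproducer above; requires conntrack-tools and root):

```shell
# Entries stuck in [UNREPLIED] toward the service IP indicate missing DNAT:
# replies will never arrive, yet constant retries from the same source port
# keep refreshing the entry so it never times out.
conntrack -L -p udp --orig-dst 172.20.0.10 | grep UNREPLIED
```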
@aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.
> @aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.

The bug is legit, but I think it is hard to hit. What happened in this issue's description is that the user is manually forcing a scenario that is known to fail and is documented in #122740.
> @aojea I agree with your suggestion on draining nodes safely, but in cases where the node fails, as happened in this issue's log, it is kube-proxy that should respond to such a failure, dropping and recreating the conntrack entries appropriately.

> The bug is legit, but I think it is hard to hit. What happened in this issue's description is that the user is manually forcing a scenario that is known to fail and is documented in #122740.

In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time and CoreDNS was evicted. After a while the node recovered, but the entire cluster was left almost non-functional because of the DNS issue, and we had to restart the many pods affected by this UDP bug.
> In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time.

This is interesting... why did kube-proxy get stuck?
> In our case the node went into a NotReady state because of overload, so kube-proxy was stuck for some time.

> This is interesting... why did kube-proxy get stuck?

The whole node got stuck because of memory overload. It's a bit complicated to reproduce in the lab, but it happened twice in two different EKS clusters. AWS support recommended giving more memory to kubeReserved as a mitigation.
> you need to provide logs and the timing of the events; run kube-proxy with `-v=4`, for example
```
I0625 13:13:20.351149       1 proxier.go:796] "Syncing iptables rules"
I0625 13:13:20.351454       1 iptables.go:358] "Running" command="iptables-save" arguments=["-t","nat"]
I0625 13:13:20.355238       1 proxier.go:1504] "Reloading service iptables data" numServices=40 numEndpoints=52 numFilterChains=6 numFilterRules=8 numNATChains=8 numNATRules=45
I0625 13:13:20.355259       1 iptables.go:423] "Running" command="iptables-restore" arguments=["-w","5","-W","100000","--noflush","--counters"]
I0625 13:13:20.359194       1 proxier.go:1533] "Network programming" endpoint="kube-system/kube-dns" elapsed=0.35915983
I0625 13:13:20.359258       1 cleanup.go:63] "Deleting conntrack stale entries for services" IPs=[]
I0625 13:13:20.359280       1 cleanup.go:69] "Deleting conntrack stale entries for services" nodePorts=[]
I0625 13:13:20.359306       1 conntrack.go:66] "Clearing conntrack entries" parameters=["-D","--orig-dst","172.20.0.10","--dst-nat","10.240.244.119","-p","udp"]
I0625 13:13:20.361710       1 conntrack.go:71] "Conntrack entries deleted" output=<
	conntrack v1.4.4 (conntrack-tools): 17 flow entries have been deleted.
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=55850 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=55850 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=59350 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=59350 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=59716 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=59716 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=42887 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=42887 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=41925 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=41925 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=40972 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=40972 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=51492 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=51492 mark=0 use=2
	udp      17 16 src=10.240.244.58 dst=172.20.0.10 sport=55964 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.58 sport=53 dport=55964 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=48529 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=48529 mark=0 use=1
	udp      17 3 src=10.240.244.34 dst=172.20.0.10 sport=43667 dport=53 src=10.240.244.119 dst=10.240.244.34 sport=53 dport=43667 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=51934 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=51934 mark=0 use=1
	udp      17 5 src=10.240.244.154 dst=172.20.0.10 sport=35714 dport=53 src=10.240.244.119 dst=10.240.244.154 sport=53 dport=35714 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=58990 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=58990 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=40587 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=40587 mark=0 use=1
	udp      17 19 src=10.240.244.140 dst=172.20.0.10 sport=48436 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.140 sport=53 dport=48436 mark=0 use=1
	udp      17 23 src=10.240.244.34 dst=172.20.0.10 sport=40050 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.34 sport=53 dport=40050 mark=0 use=1
	udp      17 16 src=10.240.244.191 dst=172.20.0.10 sport=40033 dport=53 [UNREPLIED] src=10.240.244.119 dst=10.240.244.191 sport=53 dport=40033 mark=0 use=1
```
@aojea if you need anything more, please let us know.
/assign
thanks,
Reviewing this issue since we still see impact occasionally. In the most recent incident, we found and deleted this conntrack entry for a destination service with ClusterIP 192.168.0.10:

```
conntrack -L | grep 192.168.0.10 | grep UNREPLIED
udp      17 29 src=10.120.1.150 dst=192.168.0.10 sport=49660 dport=53 [UNREPLIED] src=192.168.0.10 dst=10.120.1.150 sport=53 dport=49660 mark=0 use=1
```
It looks like DNAT wasn't set up. After deleting the entry, the next UDP request succeeded and requests to the ClusterIP began working again.
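For anyone hitting the same state, a sketch of the manual cleanup we ran (assumes the 192.168.0.10 ClusterIP from above; requires conntrack-tools and root):

```shell
# Drop all UDP entries whose original destination is the DNS ClusterIP on
# port 53; the next query then creates a fresh entry with correct DNAT.
conntrack -D -p udp --orig-dst 192.168.0.10 --orig-port-dst 53
```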
Reviewing some related issues, I found the following PR which seems like it would resolve our case: #122741 (related to #122740).