
Comments (14)

squeed commented on June 26, 2024

That is indeed interesting. We should probably log enough information to catch this in debug mode. If you like, you can enable debug mode globally in your cluster (set debug: true in your helm values) or on a per-node basis using per-node configuration overrides.
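For illustration, enabling debug logging globally via Helm values might look like the sketch below. The exact keys depend on the chart version -- in recent Cilium charts the switch is debug.enabled rather than a bare debug: true, so verify against your chart's values reference:

```yaml
# Helm values sketch (assumption: recent Cilium chart layout).
# Enables debug-level logging for all agents in the cluster.
debug:
  enabled: true
  # debug.verbose can narrow verbose logging to specific subsystems;
  # supported values depend on the Cilium version.
  # verbose: "flow"
```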

from cilium.

squeed commented on June 26, 2024

Don't use kube-apiserver (and other entities?) in NetworkPolicies because they might not be stable 100% of the time

No, the kube-apiserver entity works well. The bigger issue is: be careful when using Host Firewall. Specifically, the way the kube-apiserver is identified must not rely on the kube-apiserver always being up.

I'd currently consider dropping all kube-apiserver rules in favor of toFQDNs. This should work without policy-cidr-match-mode=nodes because we run the apiserver outside of the cluster, right?

ToFQDNs relies on layer 7 interception, which does not work for host firewall. You really have to allow access to the apiserver by 100% static means right now.
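As an illustration of "100% static means", a host-firewall rule can allow apiserver egress purely by CIDR, so it keeps working even while the kubernetes Endpoints object is briefly empty. This is a sketch, not a complete policy; the IP and port are the example values from the logs later in this thread, and you would substitute your own apiserver/LB address:

```yaml
# Sketch: static allow-rule for apiserver access under Host Firewall.
# Assumes the apiserver is reachable at a fixed LB address (example
# values from this thread); adjust CIDR and port for your cluster.
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: host-allow-apiserver
spec:
  nodeSelector: {}           # host policy: applies to all nodes
  egress:
    - toCIDR:
        - 109.68.224.35/32   # static apiserver / load-balancer IP
      toPorts:
        - ports:
            - port: "30153"  # apiserver port seen in the logs
              protocol: TCP
```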

lmb commented on June 26, 2024

Thanks for the report. Not having a sysdump is going to make it harder for us to diagnose unfortunately. Can you reproduce this if you upgrade to the latest point release?

baurmatt commented on June 26, 2024

Hey @lmb, thanks for your answer! As mentioned, I've got quite a few sysdumps. I just can't share them publicly. Should I share a link with you by mail?

We will attempt the latest point release update soon, but have little hope, since this issue has persisted for quite some time now.

lmb commented on June 26, 2024

I just can't share them publicly. Should I share a link with you by mail?

Sorry, but we can't take responsibility for your confidential data. If that is something you need, consider one of the enterprise vendors.

baurmatt commented on June 26, 2024

OK. I've attached one. Please let me know when you don't need it anymore; I will delete it afterwards.

Edit: I've also been able to roll out the .dot version by now. It took only a couple of minutes for the first node to go into NotReady state. Seems like this didn't solve the bug.

squeed commented on June 26, 2024

This is a scary chicken-and-egg problem; the agent needs to be able to determine which IPs belong to the apiserver, and to do that, it needs access to the apiserver.

I would suggest enabling matching nodes via CIDR and then allowing access to the set of possible apiserver IPs.
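For reference, "matching nodes via CIDR" corresponds to the policy-cidr-match-mode=nodes agent option. One way to set it through Helm is via extraConfig, which merges entries into the Cilium ConfigMap (a sketch -- newer charts may also expose a dedicated value for this, so check the values reference for your version):

```yaml
# Helm values sketch: enable CIDR-based matching of node IPs.
# extraConfig entries are written into the cilium-config ConfigMap.
extraConfig:
  policy-cidr-match-mode: "nodes"
```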

baurmatt commented on June 26, 2024

@squeed Thanks for your answer! I think that would be an option for when the pods are restarting, but I'd like to understand why those nodes lose the kube-apiserver identity in the first place. Those pods only restart, at least by my interpretation, because Cilium loses the kube-apiserver <-> CIDR mapping.

baurmatt commented on June 26, 2024

@squeed Got some debug logs while the error showed up 🥳

cilium-debug.log.zip

I think the relevant parts are:

$ ggrep -B2 -P '(^((?!EndpointSelector).)*kube-apiserver|Kubernetes service definition changed)' cilium-debug.log
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777231 custom-resource [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777231 custom-resource [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="UpdateIdentities: Deleting identity" identity=16777339 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Processing 1 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="EndpointSlice kubernetes has 1 backends" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints="109.68.224.35:30153/TCP" k8sNamespace=default k8sSvcName=kubernetes old-endpoints= old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Acquired service ID" backends="[109.68.224.35:30153]" l7LBFrontendPorts="[]" l7LBProxyPort=0 loadBalancerSourceRanges="[]" serviceID=145 serviceIP="{192.168.129.244 {TCP 30153} 0}" serviceName=kubernetes serviceNamespace=default sessionAffinity=false sessionAffinityTimeout=0 subsys=service svcExtTrafficPolicy=Cluster svcHealthCheckNodePort=0 svcIntTrafficPolicy=Cluster svcType=NodePort
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Deleting backends from session affinity match" backends="[]" serviceID=145 subsys=service
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="UpdateIdentities: Adding a new identity" identity=16777232 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Short circuiting HTTP rules due to rule allowing all and no other rules needing attention" subsys=envoy-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="preparing new cache transaction: upserting 1 entries, deleting 0 entries" subsys=xds xdsCachedVersion=16 xdsTypeURL=type.googleapis.com/cilium.NetworkPolicy
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Waiting for proxy updates to complete..." subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Wait time for proxy updates: 56.608µs" subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777232 kube-apiserver [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777232 kube-apiserver [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
--
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher

The kubernetes service in the default namespace loses its endpoint, and Cilium reacts to that. The reason for this seems to be a rolling restart (or similar) of the apiserver deployment (2 pods, clean restart of the deployment). On startup or shutdown, the endpoints of the kubernetes service are briefly updated to <none>:

$ kubectl get endpoints kubernetes -n default -w
NAME         ENDPOINTS             AGE
kubernetes   109.68.224.35:30153   94d


kubernetes   <none>                94d
kubernetes   109.68.224.35:30153   94d
kubernetes   <none>                94d
kubernetes   109.68.224.35:30153   94d

baurmatt commented on June 26, 2024

This is core functionality of Kubernetes: it removes the endpoint from the kubernetes service on apiserver shutdown -> https://github.com/kubernetes/kubernetes/blob/fad52aedfcc14061cf20370be061789b4f3d97d9/pkg/controlplane/controller/kubernetesservice/controller.go#L130

squeed commented on June 26, 2024

Exactly right.

There's not much we can do in this scenario. If you would like to restrict access in this manner, you will need to set up something like a load balancer to ensure the apiserver has a consistent IP.

baurmatt commented on June 26, 2024

@squeed That IP is a static IP provided by a Load Balancer in front of a Managed Kubernetes cluster; the apiserver component runs separately at the cloud provider. The apiserver uses --advertise-address to correctly announce the IP address.

squeed commented on June 26, 2024

Indeed. That said, it seems that Cilium is doing the "right thing" -- or at least what it has been told. Access is restricted to IPs known to be apiserver IPs, and the kubernetes service momentarily has no endpoints.

I could imagine a world in which we add some hysteresis for the apiserver identity to prevent these sorts of issues, but that would only paper over the real problem: it is too easy for a Host Firewall policy to lose access to the apiserver on upgrades or failovers.

We should document this limitation.

baurmatt commented on June 26, 2024

Thank you for clarifying! If I interpret your answer correctly, this means: Don't use kube-apiserver (and other entities?) in NetworkPolicies because they might not be stable 100% of the time - at least in the current implementation state. Is that correct?

I'd currently consider dropping all kube-apiserver rules in favor of toFQDNs. This should work without policy-cidr-match-mode=nodes because we run the apiserver outside of the cluster, right?

+1 for documenting this and/or implementing something like a cache TTL for the entity <> CIDR mapping.
