Comments (14)
That is indeed interesting. We should probably log enough information to catch this in debug mode. If you like, you can enable debug mode globally in your cluster (set debug: true in your helm values) or on a per-node basis using per-node configuration overrides.
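For reference, a minimal sketch of what that could look like in Helm values (key names per the upstream cilium chart; verify against your chart version):

```yaml
# Sketch: enable verbose agent logging cluster-wide via the cilium
# Helm chart. Per-node overrides can instead be applied with a
# CiliumNodeConfig object targeting specific nodes.
debug:
  enabled: true
```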
from cilium.
> Don't use kube-apiserver (and other entities?) in NetworkPolicies because they might not be stable 100% of the time
No, the kube-apiserver entity works well. The bigger issue is: be careful when using Host Firewall. Specifically, the way you define access to the kube-apiserver must not rely on the apiserver always being up.
> I'd currently consider dropping all kube-apiserver rules in favor of toFQDNs. This should work without policy-cidr-match-mode=nodes because we run the apiserver outside of the cluster, right?
ToFQDNs relies on layer 7 interception, which does not work for host firewall. You really have to allow access to the apiserver by 100% static means right now.
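A fully static allow could be sketched as a host policy like the following (the CIDR and port are placeholders for your environment; this is an illustration, not a verified config):

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-apiserver-static
spec:
  # Host firewall policies select nodes, not pods.
  nodeSelector: {}
  egress:
    - toCIDR:
        - 203.0.113.10/32   # placeholder apiserver IP
      toPorts:
        - ports:
            - port: "6443"   # placeholder apiserver port
              protocol: TCP
```

Because the rule matches a literal CIDR, it keeps working even while the kubernetes Endpoints object is briefly empty.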
Thanks for the report. Not having a sysdump is going to make it harder for us to diagnose unfortunately. Can you reproduce this if you upgrade to the latest point release?
Hey @lmb, thanks for your answer! As mentioned, I've got quite a few sysdumps. I just can't share them publicly. Should I share a link with you by mail?
We will attempt the latest point release update soon, but have little hope since this issue has persisted for quite some time now.
> I just can't share them publicly. Should I share a link with you by mail?
Sorry, but we can't take responsibility for your confidential data. If that is something you need, consider one of the enterprise vendors.
OK. I've attached one. Please let me know when you don't need it anymore; I will delete it afterwards.
Edit: I've also been able to roll out the .dot version by now. It took only a couple of minutes for the first node to go into NotReady state. Seems like this didn't solve the bug.
This is a scary chicken-and-egg problem; the agent needs to be able to determine which IPs belong to the apiserver, and to do that, it needs access to the apiserver.
I would suggest enabling matching nodes via CIDR and then allowing access to the set of possible apiserver IPs.
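Under stated assumptions (the exact Helm key may differ by chart version), enabling that agent flag could look roughly like:

```yaml
# Sketch: set the agent flag via extraConfig, which the cilium
# Helm chart injects into the cilium-config ConfigMap. With this
# set, CIDR rules in policies also match node IPs, so a toCIDR
# rule can cover the possible apiserver addresses.
extraConfig:
  policy-cidr-match-mode: "nodes"
```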
@squeed Thanks for your answer! I think that would be an option for when the pods are restarting, but I'd like to understand why those nodes lose the kube-apiserver identity in the first place. Those pods only restart, at least by my interpretation, because Cilium loses the kube-apiserver <-> CIDR mapping.
@squeed Got some debug logs while the error showed up 🥳
I think the relevant parts are:
$ ggrep -B2 -P '(^((?!EndpointSelector).)*kube-apiserver|Kubernetes service definition changed)' cilium-debug.log
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777231 custom-resource [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777231 custom-resource [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
2024-05-28T12:16:38+02:00 {} time="2024-05-28T10:16:38Z" level=debug msg="UpdateIdentities: Deleting identity" identity=16777339 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Processing 1 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="EndpointSlice kubernetes has 1 backends" subsys=k8s
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints="109.68.224.35:30153/TCP" k8sNamespace=default k8sSvcName=kubernetes old-endpoints= old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Acquired service ID" backends="[109.68.224.35:30153]" l7LBFrontendPorts="[]" l7LBProxyPort=0 loadBalancerSourceRanges="[]" serviceID=145 serviceIP="{192.168.129.244 {TCP 30153} 0}" serviceName=kubernetes serviceNamespace=default sessionAffinity=false sessionAffinityTimeout=0 subsys=service svcExtTrafficPolicy=Cluster svcHealthCheckNodePort=0 svcIntTrafficPolicy=Cluster svcType=NodePort
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Deleting backends from session affinity match" backends="[]" serviceID=145 subsys=service
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="UpdateIdentities: Adding a new identity" identity=16777232 labels="[cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:104.0.0.0/5 cidr:108.0.0.0/6 cidr:108.0.0.0/7 cidr:109.0.0.0/8 cidr:109.0.0.0/9 cidr:109.64.0.0/10 cidr:109.64.0.0/11 cidr:109.64.0.0/12 cidr:109.64.0.0/13 cidr:109.68.0.0/14 cidr:109.68.0.0/15 cidr:109.68.0.0/16 cidr:109.68.128.0/17 cidr:109.68.192.0/18 cidr:109.68.224.0/19 cidr:109.68.224.0/20 cidr:109.68.224.0/21 cidr:109.68.224.0/22 cidr:109.68.224.0/23 cidr:109.68.224.0/24 cidr:109.68.224.0/25 cidr:109.68.224.0/26 cidr:109.68.224.32/27 cidr:109.68.224.32/28 cidr:109.68.224.32/29 cidr:109.68.224.32/30 cidr:109.68.224.34/31 cidr:109.68.224.35/32 cidr:64.0.0.0/2 cidr:96.0.0.0/3 cidr:96.0.0.0/4 reserved:kube-apiserver reserved:world]" subsys=policy
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Short circuiting HTTP rules due to rule allowing all and no other rules needing attention" subsys=envoy-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="preparing new cache transaction: upserting 1 entries, deleting 0 entries" subsys=xds xdsCachedVersion=16 xdsTypeURL=type.googleapis.com/cilium.NetworkPolicy
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Resolving identity" identityLabels="cidr:109.68.224.35/32,reserved:kube-apiserver,reserved:world" subsys=identity-cache
--
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Waiting for proxy updates to complete..." subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Wait time for proxy updates: 56.608µs" subsys=endpoint-manager
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Upserting IP into ipcache layer" identity="{16777232 kube-apiserver [] false true}" ipAddr=109.68.224.35/32 key=0 subsys=ipcache
2024-05-28T12:16:39+02:00 {} time="2024-05-28T10:16:39Z" level=debug msg="Daemon notified of IP-Identity cache state change" identity="{16777232 kube-apiserver [] false true}" ipAddr="{109.68.224.35 ffffffff}" modification=Upsert subsys=datapath-ipcache
--
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Processing 0 endpoints for EndpointSlice kubernetes" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="EndpointSlice kubernetes has 0 backends" subsys=k8s
2024-05-28T12:17:38+02:00 {} time="2024-05-28T10:17:38Z" level=debug msg="Kubernetes service definition changed" action=service-updated endpoints= k8sNamespace=default k8sSvcName=kubernetes old-endpoints="109.68.224.35:30153/TCP" old-service=nil service="frontends:[10.106.0.1]/ports=[https]/selector=map[]" subsys=k8s-watcher
The kubernetes service in the default namespace loses its endpoint and Cilium reacts to that. The reason this happens seems to be a rolling restart (or similar) of the apiserver deployment (2 pods, clean restart of the deployment). On startup or shutdown, it updates the endpoints of the kubernetes service to <none> for a short amount of time:
$ kubectl get endpoints kubernetes -n default -w
NAME ENDPOINTS AGE
kubernetes 109.68.224.35:30153 94d
kubernetes <none> 94d
kubernetes 109.68.224.35:30153 94d
kubernetes <none> 94d
kubernetes 109.68.224.35:30153 94d
This is core functionality of Kubernetes. It removes the endpoint from the kubernetes service on apiserver shutdown -> https://github.com/kubernetes/kubernetes/blob/fad52aedfcc14061cf20370be061789b4f3d97d9/pkg/controlplane/controller/kubernetesservice/controller.go#L130
Exactly right.
There's not much we can do in this scenario. If you would like to restrict access in this manner, you will need to set up something like a load balancer to ensure the apiserver has a consistent IP.
@squeed That is a static IP provided by a Load Balancer in front of a Managed Kubernetes cluster; the apiserver component is run separately by the cloud provider. The apiserver uses --advertise-address to correctly announce the IP address.
Indeed. That said, it seems that Cilium is doing the "right thing" -- or, at least, what it has been told. Access is restricted to IPs known to be apiserver IPs, and that service is empty.
I could imagine a world in which we have some hysteresis for the apiserver to prevent these sorts of issues, but that would only paper over the real problem: it is too easy for a Host Firewall policy to lose access to the apiserver on upgrades or failovers.
We should document this limitation.
Thank you for clarifying! If I interpret your answer correctly, this means: don't use kube-apiserver (and other entities?) in NetworkPolicies because they might not be stable 100% of the time - at least in the current implementation state. Is that correct?
I'd currently consider dropping all kube-apiserver rules in favor of toFQDNs. This should work without policy-cidr-match-mode=nodes because we run the apiserver outside of the cluster, right?
+1 for documenting this and/or implementing something like a cache TTL for the entity <-> CIDR mapping.
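For completeness, the toFQDNs variant this comment proposes would look roughly like the sketch below (all names and ports are placeholders; note the earlier reply that FQDN rules rely on L7 DNS interception and therefore do not work for Host Firewall):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-apiserver-fqdn
spec:
  endpointSelector: {}
  egress:
    # DNS must go through the proxy so Cilium can learn which
    # IPs the FQDN resolves to.
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toFQDNs:
        - matchName: "api.example.com"   # placeholder apiserver hostname
      toPorts:
        - ports:
            - port: "6443"
              protocol: TCP
```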