Comments (21)
Hello @gthao313,
Thanks for looking into this!
I can answer the questions on behalf of @nike21oct.
Bottlerocket AMI: Bottlerocket OS 1.18.0 (aws-k8s-1.24)
Instance type: m6i.2xlarge
EKS cluster version: 1.24
from bottlerocket.
@nike21oct Thanks for opening this issue! Would you mind sharing more details with us so we can try to reproduce it?
Which Bottlerocket AMI are you using?
What is your instance type?
What is your EKS cluster version?
Thanks!
Hi @gthao313, did you get a chance to simulate the issue?
@nike21oct Sorry for the late reply. I'm still trying to reproduce this issue. I'll let you know if I need more information from you, or once I've completed the reproduction.
Hi @gthao313, did you get a chance to simulate the issue?
@nike21oct I wasn't able to reproduce this issue last week. I am working on another priority item now, and I'll pick this issue up as my next priority. Thank you! Please let me know if you have any concerns.
@gthao313 Thanks for the response; I am waiting for your input.
@nike21oct - I haven't had a chance to dig into this and try to repro, but the behavior you're seeing is sufficiently unexpected that there may be something deeper going on beyond just a missing input rule.
What distro base image are you using for your bootstrap container to run the iptables
commands? If it's a newer distro like AL2023 that defaults to the "nftables" backend for iptables, that could account for the behavior you're seeing. Bottlerocket uses the "legacy" backend for iptables, and the two backends don't mix. kube-proxy has logic to detect which backend is in use, but if the system is in a state where both backends were used, it wouldn't be able to correct that.
AL2 defaults to the "legacy" backend, so you could try that. Otherwise, some distros offer a way to switch between backends - Debian's iptables wiki shows some steps for that distro.
Hi @bcressey, I followed the GitHub link below to create my bootstrap container, and per that documentation they use alpine as the base image in the Dockerfile.
I don't see a link, but if I check alpine:latest, they are using iptables with the "nftables" backend:
/ # apk add iptables
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/main/aarch64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.19/community/aarch64/APKINDEX.tar.gz
(1/4) Installing libmnl (1.0.5-r2)
(2/4) Installing libnftnl (1.2.6-r0)
(3/4) Installing libxtables (1.8.10-r3)
(4/4) Installing iptables (1.8.10-r3)
Executing busybox-1.36.1-r15.trigger
OK: 16 MiB in 19 packages
/ # iptables -V
iptables v1.8.10 (nf_tables)
If you use a Debian base image instead, you can switch iptables to the "legacy" backend:
❯ docker run -it --rm -u 0 debian:bookworm-slim
root@01a8f51b742c:/# apt-get update
...
Reading package lists... Done
root@01a8f51b742c:/# apt-get install iptables
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libbsd0 libedit2 libip4tc2 libip6tc2 libjansson4 libmnl0 libnetfilter-conntrack3 libnfnetlink0 libnftables1 libnftnl11 libxtables12 netbase nftables
Suggested packages:
firewalld kmod
The following NEW packages will be installed:
iptables libbsd0 libedit2 libip4tc2 libip6tc2 libjansson4 libmnl0 libnetfilter-conntrack3 libnfnetlink0 libnftables1 libnftnl11 libxtables12 netbase nftables
0 upgraded, 14 newly installed, 0 to remove and 0 not upgraded.
Need to get 1144 kB of archives.
After this operation, 11.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
root@01a8f51b742c:/# iptables -V
iptables v1.8.9 (nf_tables)
root@01a8f51b742c:/# update-alternatives --set iptables /usr/sbin/iptables-legacy
update-alternatives: using /usr/sbin/iptables-legacy to provide /usr/sbin/iptables (iptables) in manual mode
root@01a8f51b742c:/# update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives: using /usr/sbin/ip6tables-legacy to provide /usr/sbin/ip6tables (ip6tables) in manual mode
root@01a8f51b742c:/# iptables -V
iptables v1.8.9 (legacy)
root@01a8f51b742c:/# ip6tables -V
ip6tables v1.8.9 (legacy)
This could be done at build time for your bootstrap container.
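As a sketch, the switch to the legacy backend could be baked into the bootstrap container image at build time; the base image tag and the script name here are illustrative, not taken from the linked sample:

```dockerfile
# Hypothetical Dockerfile for a bootstrap container using the legacy backend.
FROM debian:bookworm-slim

# Install iptables and switch both the IPv4 and IPv6 tools to the legacy backend
RUN apt-get update \
 && apt-get install -y --no-install-recommends iptables \
 && update-alternatives --set iptables /usr/sbin/iptables-legacy \
 && update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy \
 && rm -rf /var/lib/apt/lists/*

# bootstrap-script.sh is a placeholder for your actual iptables script
COPY bootstrap-script.sh /
RUN chmod +x /bootstrap-script.sh
ENTRYPOINT ["/bootstrap-script.sh"]
```

With this in place, every `iptables` invocation inside the container uses the legacy backend without any runtime setup.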
Hi @bcressey, I forgot to paste the link. Below are the documentation link and the GitHub link for the Dockerfile and the bootstrap container script.
AWS documentation for implementing the CIS benchmark:
https://aws.amazon.com/blogs/containers/validating-amazon-eks-optimized-bottlerocket-ami-against-the-cis-benchmark/
GitHub link for the bootstrap container:
https://github.com/aws-samples/containers-blog-maelstrom/tree/main/cis-bottlerocket-benchmark-eks/bottlerocket-cis-bootstrap-image
But I have a question: what difference does it make whether I use alpine or debian as the base image in the Dockerfile, apart from the "legacy" backend? My issue is that the iptables rules from the bootstrap script are applied and block all traffic by default, as the script specifies. I am using nginx as the ingress controller in my EKS cluster, which creates an NLB on AWS and uses a NodePort to reach the target group; the targets are my worker nodes (EC2 instances). When I allow the NodePort in my iptables rules, it does not work as it should.
Block all traffic, as specified in the bootstrap script:
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
Allow the NodePorts of the nginx ingress controller:
iptables -I INPUT -p tcp -m tcp --dport 32443 -j ACCEPT # For TLS traffic
iptables -I INPUT -p tcp -m tcp --dport 32002 -j ACCEPT # For health checks
iptables -I INPUT -p tcp -m tcp --dport 32080 -j ACCEPT
So will using alpine or debian as the base image in the bootstrap container make a difference, and how?
I am using nginx as the ingress controller in my EKS cluster, which creates an NLB on AWS and uses a NodePort to reach the target group; the targets are my worker nodes (EC2 instances). When I allow the NodePort in my iptables rules, it does not work as it should.
The iptables -P <X> DROP commands set a default drop behavior for the chain rather than a default allow. The important point is that this is just a default; it only applies if no rule matches.
kube-proxy will populate the chain with rules for NodePort services. If the system is working properly, these rules take precedence over the default behavior, whether that's allow or deny.
iptables -L KUBE-NODEPORTS -n -v -t nat should show these NodePort rules.
Will it make a difference using alpine or debian as a base image in the bootstrap container, and how?
Yes, for the reasons I mentioned above. The iptables command in the bootstrap container image needs to use the "legacy" backend. Both Alpine and Debian now default to the "nftables" backend. Debian can be switched to the "legacy" backend by running additional commands, namely the update-alternatives commands I showed.
If you use the "wrong" backend to configure the default behavior, then the system ends up in a confused state: the "nftables" backend in the kernel will only know about the default drop rule, and the "legacy" backend will only know about the NodePort rules.
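One way to check for this mixed state, assuming you're in a container or on a host that ships both tool variants (Debian's iptables package provides both), is to dump each backend's view of the ruleset and compare:

```shell
# What the legacy backend knows about (kube-proxy's rules should appear here)
iptables-legacy-save | head -n 20

# What the nftables backend knows about
iptables-nft-save | head -n 20

# If non-trivial rules appear in both outputs, the two backends have been
# mixed, and the kernel is effectively evaluating both rule sets.
```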
You don't need to add node port rules to the bootstrap container, and I don't recommend that. The only rules you need are to allow access that nothing else would enable automatically - like SSH on 22/TCP, or kubelet on 10250/TCP.
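A minimal sketch of such rules, using ports already mentioned in this thread (include 22/TCP only if you actually SSH into the nodes):

```shell
# Allow SSH and kubelet; NodePort services are covered by the rules
# that kube-proxy installs, so they need no explicit ACCEPT rules here.
iptables -I INPUT -p tcp -m tcp --dport 22 -j ACCEPT     # SSH
iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT  # kubelet
```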
If you fix the iptables command you're running in the bootstrap container to use the legacy backend, that should clear up the majority of problems you're seeing. Beyond that, you can allow traffic to kubelet if you want kubectl exec and kubectl logs to work, or to kube-proxy on 10249/TCP if you want to scrape its metrics.
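For example, to open the kube-proxy metrics port and then verify it (the node IP is a placeholder, and this assumes kube-proxy's metrics endpoint is reachable from outside the node):

```shell
iptables -I INPUT -p tcp -m tcp --dport 10249 -j ACCEPT  # kube-proxy metrics

# From a machine that can reach the node:
curl -s http://<node-ip>:10249/metrics | head
```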
Hi @bcressey, thanks for your continued responses. Your suggestion to use debian as the base image with the update-alternatives commands in the Dockerfile gets the nginx ingress controller pod into a Running state, but there is another issue: the targets (worker nodes) are not healthy. I have three targets in the target group, and their health status is unhealthy. I also observed that for a while one or two targets go healthy and then become unhealthy again after some time, so target health is not persistent; it flaps between healthy and unhealthy.
To troubleshoot this I opened port 10254 in the iptables rules (this port is used by ingress-nginx-controller-metrics with NodePort 32002), and also port 10249 as you suggested. Below is the iptables script used by my bootstrap container.
#!/usr/bin/env bash
# Flush iptables rules
iptables -F
# 3.4.1.1 Ensure IPv4 default deny firewall policy (Automated)
iptables -P INPUT DROP
iptables -P OUTPUT DROP
iptables -P FORWARD DROP
# Allow inbound traffic for kubelet (so kubectl logs/exec works)
iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
# These two rules I added myself
iptables -I INPUT -p tcp -m tcp --dport 10254 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 10249 -j ACCEPT
# 3.4.1.2 Ensure IPv4 loopback traffic is configured (Automated)
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A INPUT -s 127.0.0.0/8 -j DROP
# 3.4.1.3 Ensure IPv4 outbound and established connections are configured (Manual)
iptables -A OUTPUT -p tcp -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p udp -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A OUTPUT -p icmp -m state --state NEW,ESTABLISHED -j ACCEPT
iptables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -p icmp -m state --state ESTABLISHED -j ACCEPT
# Flush ip6tables rules
ip6tables -F
# 3.4.2.1 Ensure IPv6 default deny firewall policy (Automated)
ip6tables -P INPUT DROP
ip6tables -P OUTPUT DROP
ip6tables -P FORWARD DROP
# Allow inbound traffic for kubelet on ipv6 if needed (so kubectl logs/exec works)
ip6tables -A INPUT -p tcp --destination-port 10250 -j ACCEPT
ip6tables -A INPUT -p tcp -m tcp --destination-port 10254 -j ACCEPT
ip6tables -A INPUT -p tcp -m tcp --destination-port 10249 -j ACCEPT
# 3.4.2.2 Ensure IPv6 loopback traffic is configured (Automated)
ip6tables -A INPUT -i lo -j ACCEPT
ip6tables -A OUTPUT -o lo -j ACCEPT
ip6tables -A INPUT -s ::1 -j DROP
# 3.4.2.3 Ensure IPv6 outbound and established connections are configured (Manual)
ip6tables -A OUTPUT -p tcp -m state --state NEW,ESTABLISHED -j ACCEPT
ip6tables -A OUTPUT -p udp -m state --state NEW,ESTABLISHED -j ACCEPT
ip6tables -A OUTPUT -p icmp -m state --state NEW,ESTABLISHED -j ACCEPT
ip6tables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT
ip6tables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT
ip6tables -A INPUT -p icmp -m state --state ESTABLISHED -j ACCEPT
Please, I need your guidance on this as well.
I changed the iptables backend from nftables to legacy for both the bootstrap and validating containers with PR aws-samples/containers-blog-maelstrom#116
Hi @bcressey,
Any idea how I can debug this part? I need your help on this part as well.
The article "How do I resolve a failed health check for a load balancer in Amazon EKS?" might have some useful steps to try.
Beyond that - if you comment out the "default drop" iptables commands in your bootstrap script and the health checks start passing, then that indicates another port needs to be opened. Possibly 80 or 443 if those are the target ports for your service.
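One low-effort way to run that experiment, rather than rebuilding the bootstrap container each time, is to flip the default policy on a single node while watching the target health (debugging only, since this temporarily disables the CIS default-deny):

```shell
# Temporarily default-allow; if the health checks recover,
# some needed ACCEPT rule is missing from the bootstrap script.
iptables -P INPUT ACCEPT

# After identifying and adding the missing port rule, restore default-deny.
iptables -P INPUT DROP
```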
Hi @bcressey, yes, after commenting out the "default drop" rules the health checks start passing.
As you know, the Network Load Balancer performs its health check on port 32002, which maps to port 10254. So I opened port 10254 (the target port, not 80 or 443), uncommented the "default drop" rules again to check, and observed that the health checks start failing again.
Below is the port definition in the service YAML:
ports:
- name: metrics
nodePort: 32002
port: 10254
protocol: TCP
targetPort: metrics
iptables -I INPUT -p tcp -m tcp --dport 10254 -j ACCEPT
Another question: why does the health check pass for some time and then fail for the targets? It is not persistent. Why this kind of behavior?
Hi @bcressey, any idea or input on this?
According to aws-samples/containers-blog-maelstrom#73, you are also using the Cilium CNI, which I am less familiar with in an AWS context.
"Migrating Cilium from Legacy iptables Routing to Native eBPF Routing in Production" has this quote from the Cilium release notes:
We introduced eBPF-based host-routing in Cilium 1.9 to fully bypass iptables and the upper host stack, and to achieve a faster network namespace switch compared to regular veth device operation.
"A note on Cilium's iptables usage" says:
To my surprise, cilium doesn't periodically synchronize those rules like kube-proxy. If you somehow remove a rule in its custom chain, you have to add it back manually or restart cilium-agent.
So there are two avenues you should explore. If you're using the native eBPF based routing with Cilium, then you may not have any iptables rules related to Cilium at all. However, this might mean that the kernel will apply default-drop to any packets even if Cilium knows about them. In that case my conclusion would be that setting iptables to default-drop isn't compatible with using Cilium in this mode, and you just shouldn't combine them in this way. My advice would be to document it as an exception for compliance purposes and move on.
If you're using the legacy iptables based routing with Cilium, then you should have all the necessary iptables rules. However, if it's the case that they aren't reapplied periodically, and if it's also the case that you're changing iptables
settings on existing nodes, then that may be erasing the rules that Cilium installed. This could also be a factor in the native eBPF based routing mode, if Cilium is meant to put fallback iptables rules in place.
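One thing worth checking, assuming the legacy backend, is whether Cilium's chains survive the bootstrap script's flush; the grep pattern relies on Cilium's conventional CILIUM_* chain naming:

```shell
# Dump the legacy ruleset and look for Cilium-managed chains.
# An empty result right after bootstrap would suggest the script's
# `iptables -F` removed rules that Cilium may not re-add on its own.
iptables-legacy-save | grep -i cilium
```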