Comments (5)
Thanks for the report, @Shershebnev!
Are any errors reported on the instance via the console using "Get system log", "Get instance screenshot", or "EC2 serial console"? I suspect that some additional user data and/or roles may be needed for the instance. I'm looking back over the changelogs to confirm this suspicion and explain the differences between k8s versions.
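The same two console views can also be pulled programmatically. The following is a minimal sketch, assuming boto3 is available and the caller has permission for the EC2 `GetConsoleOutput` and `GetConsoleScreenshot` calls; the helper name `fetch_boot_diagnostics` is hypothetical and not part of any AWS tooling.

```python
import base64


def fetch_boot_diagnostics(ec2_client, instance_id):
    """Return the console log text and screenshot bytes for one instance.

    `ec2_client` is expected to behave like boto3.client("ec2").
    """
    # Latest=True requests the most recent console output; this is
    # understood to be supported only on Nitro-based instance types.
    log = ec2_client.get_console_output(InstanceId=instance_id, Latest=True)
    shot = ec2_client.get_console_screenshot(InstanceId=instance_id)
    return {
        # botocore normally hands back the log text already decoded.
        "log": log.get("Output", ""),
        # The screenshot is assumed to arrive as base64-encoded JPG data.
        "screenshot_jpg": base64.b64decode(shot.get("ImageData", "")),
    }


# Example use (requires AWS credentials, so it is left commented out):
#   import boto3
#   result = fetch_boot_diagnostics(boto3.client("ec2"), "i-0123456789abcdef0")
#   open("screenshot.jpg", "wb").write(result["screenshot_jpg"])
```

An empty log with a populated screenshot, as reported later in this thread, would show up here as an empty `log` string and non-empty `screenshot_jpg` bytes.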
from bottlerocket.
The log is empty, but the screenshot shows some encryption error.
That's on Intel-based instances (m6i.4xlarge).
I've also tried an AMD-based instance (m6a.4xlarge); it seems to be stuck on booting.
I've also tried the oldest AMI I can see, 1.13.0 (bottlerocket-aws-k8s-1.26-x86_64-v1.13.0-f7a2e3cc), and it works fine: it gives the same encryption error, but it still proceeds further and appears in SSM almost immediately.
Yet 1.14.0 gets stuck.
At this point I realized that I actually have nodes in EKS that I've switched to Bottlerocket, and they work fine on the latest AMI for 1.26, though on the NVIDIA variant (bottlerocket-aws-k8s-1.26-nvidia-x86_64-v1.15.1-264e294c); they appear in SSM as well. The only differences I could see are the /dev/xvda root volume size (4 GB vs 2 GB) and the EKS nodes being on the NVIDIA variant. I've changed both, and it seems to get past the encryption error with that setup, but then it still got stuck.
After some more waiting I got a system log ending with:
[ 305.391718] sundog[1858]: Setting generator 'pluto private-dns-name' failed with exit code 1 - stderr: Timed out retrieving private DNS name from EC2: deadline has elapsed
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Sets the hostname.
i-0ce93f121a3bf8b3a.log
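For longer console logs, the failing units can be pulled out mechanically. This is a minimal sketch that assumes the log uses the same `[FAILED]`, `[DEPEND]`, and sundog "Setting generator" line shapes as the excerpt above; the function name `summarize_boot_log` is hypothetical, and ANSI color codes are not handled.

```python
import re

# Patterns modeled on the lines quoted in the excerpt above.
FAILED_RE = re.compile(r"^\[FAILED\]\s+Failed to start (?P<unit>.+?)\.?\s*$")
DEPEND_RE = re.compile(r"^\[DEPEND\]\s+Dependency failed for (?P<unit>.+?)\.?\s*$")
GENERATOR_RE = re.compile(
    r"Setting generator '(?P<command>[^']+)' failed with exit code "
    r"(?P<code>\d+) - stderr: (?P<stderr>.*)$"
)


def summarize_boot_log(text):
    """Collect failed units, blocked dependents, and generator errors."""
    summary = {"failed": [], "blocked": [], "generators": []}
    for raw in text.splitlines():
        line = raw.strip()
        failed = FAILED_RE.match(line)
        if failed:
            summary["failed"].append(failed.group("unit"))
            continue
        blocked = DEPEND_RE.match(line)
        if blocked:
            summary["blocked"].append(blocked.group("unit"))
            continue
        # Generator errors carry a timestamp prefix, so search anywhere.
        gen = GENERATOR_RE.search(line)
        if gen:
            summary["generators"].append({
                "command": gen.group("command"),
                "exit_code": int(gen.group("code")),
                "stderr": gen.group("stderr"),
            })
    return summary
```

Run against the excerpt above, this would report one failed unit, the five dependents it blocked, and the `pluto private-dns-name` generator as the root failure.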
I can confirm that DNS resolution is enabled on this VPC.
There seems to be a related issue, #3064; however, my failing instances are in a public subnet, so this doesn't seem to be caused by what was going on in that issue. My EKS nodes, which seem to work fine, are in private subnets.
This turned into quite a long post, sorry about that. In a nutshell:
- When starting in a public subnet as standalone instances:
  - AMI version 1.13.0 seems to work fine and appears almost immediately in SSM, even with a 2 GB root volume.
  - AMI version 1.14.0 and later (including the latest version) seems to get stuck either on the encryption error or, with the root volume increased to 4 GB, hangs for several minutes and then ends with the private DNS name timeout from the log above.
- However, in EKS, when starting nodes in private subnets, everything seems to work fine (the encryption error is still visible, though). There the nodes start with a 4 GB root volume. (I also find it strange that the default root volume size seems to be different, as I don't specify the root volume size in EKS explicitly.)
Hope this is helpful :)
Related to #3525 (comment): I think we might need to add in EC2 Describe Images access to the IAM role policies attached in https://github.com/aws-samples/containers-blog-maelstrom/blob/ee8e18c0bb170f625b86a59dfc0605e9c98cdee3/bottlerocket-images-cache/ebs-snapshot-instance.yaml#L44. For example, I have AmazonEKSWorkerNodePolicy attached with:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}
as the policy. This might be the missing piece. Can you try this and see if it resolves the issue with 1.26 coming up? If so, we can try to get that other repo updated to cover this permissions addition.
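To check quickly whether a role's policy document covers a particular action (for example `ec2:DescribeInstances` from the list above), something like the following could be used. It is a deliberately simplified sketch: the helper `allows_action` is hypothetical, and it looks only at `Allow` statements and `Action` patterns, ignoring `Deny`, `NotAction`, `Resource`, and `Condition`, so it is not a substitute for the IAM policy simulator.

```python
from fnmatch import fnmatchcase


def allows_action(policy, action):
    """Return True if any Allow statement's Action pattern matches `action`.

    Simplified on purpose: Deny, NotAction, Resource, and Condition
    elements are not evaluated.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    wanted = action.lower()  # IAM action names are matched case-insensitively
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        patterns = stmt.get("Action", [])
        if isinstance(patterns, str):  # Action may be one string or a list
            patterns = [patterns]
        # fnmatch handles the "*" and "?" wildcards that IAM patterns use.
        if any(fnmatchcase(wanted, p.lower()) for p in patterns):
            return True
    return False


# A trimmed version of the policy quoted above, for illustration only.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeInstances", "eks:DescribeCluster"],
            "Resource": "*",
        }
    ],
}

print(allows_action(example_policy, "ec2:DescribeInstances"))
```

Loading the role's real policy JSON with `json.load` and passing the result in place of `example_policy` makes it easy to see which actions a given role is missing.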
I've tried with the AmazonEC2ReadOnlyAccess AWS managed policy, and everything works now on the latest 1.26 🎉
Sounds great! Glad we got you sorted!