Comments (5)

zmrow commented on May 24, 2024

Thanks for the report @Shershebnev !

Are any errors reported on the instance via the console using "Get system log", "Get instance screenshot", or "EC2 serial console"? I suspect that some additional user data and/or roles may be needed for the instance. I'm looking back over the changelogs to confirm this suspicion and explain the differences between k8s versions.

from bottlerocket.

Shershebnev commented on May 24, 2024

The system log is empty, but the screenshot shows an encryption error:
i-010cd92721081bc01
That's on an Intel-based instance (m6i.4xlarge).
I've also tried an AMD-based instance (m6a.4xlarge); it seems to be stuck on booting:
i-09bdddf62b644452c
I've also tried the oldest AMI I can see, 1.13.0 (bottlerocket-aws-k8s-1.26-x86_64-v1.13.0-f7a2e3cc), and it works fine. It gives the same encryption error, but it proceeds further and appears in SSM almost immediately.
Yet 1.14.0 gets stuck.
At this point I realized that I actually have nodes in EKS that I've switched to Bottlerocket, and they work fine on the latest AMI for 1.26, although on the NVIDIA variant (bottlerocket-aws-k8s-1.26-nvidia-x86_64-v1.15.1-264e294c); they appear in SSM as well. The only differences I could see are the /dev/xvda root volume size (4 GB vs 2 GB) and the EKS nodes being on the NVIDIA variant. I changed both, and with that setup it seems to get past the encryption error, but then it still got stuck:
i-0ce93f121a3bf8b3a
After some more waiting I got a system log ending with:

[  305.391718] sundog[1858]: Setting generator 'pluto private-dns-name' failed with exit code 1 - stderr: Timed out retrieving private DNS name from EC2: deadline has elapsed
[FAILED] Failed to start User-specified setting generators.
See 'systemctl status sundog.service' for details.
[DEPEND] Dependency failed for Bottlerocket initial configuration complete.
[DEPEND] Dependency failed for Isolates configured.target.
[DEPEND] Dependency failed for Applies settings to create config files.
[DEPEND] Dependency failed for Send signal to CloudFormation Stack.
[DEPEND] Dependency failed for Sets the hostname.

i-0ce93f121a3bf8b3a.log
I can confirm that DNS resolution is enabled on this VPC.
There seems to be a related issue, #3064; however, my failing instances are in a public subnet, so this doesn't seem to be caused by what was going on in that issue. My EKS nodes, which seem to work fine, are in private subnets.

This turned into quite a long post, sorry about that. In a nutshell:

  • When starting in a public subnet as standalone instances:
    • AMI version 1.13.0 seems to work fine and appears almost immediately in SSM, even with a 2 GB root volume.
    • AMI version 1.14.0 and beyond (including the latest version) seems to get stuck, either on the encryption error or, when the root volume is increased to 4 GB, for several minutes before arriving at the DNS resolution error from the log above.
  • However, in EKS, when starting nodes in private subnets, everything seems to work fine (the encryption error still shows up, though). Here they start with 4 GB. (I also find it strange that the default root volume size seems to be different, as I don't specify the root volume size in EKS explicitly.)

Hope this is helpful :)


yeazelm commented on May 24, 2024

Related to #3525 (comment) I think we might need to add in EC2 Describe Images access to the IAM Role policies attached in https://github.com/aws-samples/containers-blog-maelstrom/blob/ee8e18c0bb170f625b86a59dfc0605e9c98cdee3/bottlerocket-images-cache/ebs-snapshot-instance.yaml#L44. For example, I have AmazonEKSWorkerNodePolicy attached with:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}

as the policy. This might be the missing piece. Can you try this and see if it resolves the issues with 1.26 coming up? If so, we can try and get this other repo updated to cover this permissions addition.
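
As a rough way to check a role's policy against the actions a node needs, the sketch below parses the policy above and reports any required action it does not allow. The `missing_actions` helper and the choice of `ec2:DescribeInstances` as the required action are assumptions for illustration (the thread does not say which EC2 call the `pluto private-dns-name` generator makes). It only looks at `Allow` statements and ignores `Deny`, `Resource`, and `Condition`, so it is not a substitute for the IAM policy simulator.

```python
import fnmatch
import json

# Policy text copied from the comment above.
POLICY_JSON = """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeInstanceTypes",
                "ec2:DescribeRouteTables",
                "ec2:DescribeSecurityGroups",
                "ec2:DescribeSubnets",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications",
                "ec2:DescribeVpcs",
                "eks:DescribeCluster"
            ],
            "Resource": "*"
        }
    ]
}
"""


def allowed_actions(policy):
    """Collect the Action patterns from every Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        patterns.extend(actions)
    return patterns


def missing_actions(policy, required):
    """Return the required actions that no Allow pattern matches.

    Matching is case-insensitive and honours '*' and '?' wildcards.
    Hypothetical helper: Deny, Resource, and Condition are not evaluated.
    """
    patterns = [p.lower() for p in allowed_actions(policy)]
    return [
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in patterns)
    ]


policy = json.loads(POLICY_JSON)
# Assumed requirement, for illustration only.
print(missing_actions(policy, ["ec2:DescribeInstances"]))
```

An empty list means every required action is matched by some `Allow` statement; any names printed are candidates to add to the role.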


Shershebnev commented on May 24, 2024

I've tried with the AmazonEC2ReadOnlyAccess AWS managed policy, and everything works now on the latest 1.26 🎉


yeazelm commented on May 24, 2024

Sounds great! Glad we got you sorted!

