Comments (9)

jigisha620 commented on July 21, 2024

Do you happen to have a more complete set of Karpenter controller logs from the time when this happened?

from karpenter-provider-aws.

qxmips commented on July 21, 2024

Do you happen to have a more complete set of Karpenter controller logs from the time when this happened?

Not much info, except for the messages showing that the node got deleted:

22:18:57.334 disrupting via consolidation delete, terminating 1 nodes (1 pods) ip-10-11-130-134.ec2.internal/m5a.2xlarge/on-demand

...
03:49:10.814 deleted node
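
As a quick sanity check on the two timestamps above, the gap between the consolidation message and the node deletion can be computed and compared with the 12-hour (43200-second) grace period discussed in this thread. This is only a sketch: it assumes both timestamps are in the same time zone and that the deletion happened on the day after the disruption started. The helper name `elapsed` is made up for illustration.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS.mmm stamps, assuming at most one midnight rollover."""
    fmt = "%H:%M:%S.%f"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 < t0:
        # Assumption: the end stamp belongs to the following day.
        t1 += timedelta(days=1)
    return t1 - t0


gap = elapsed("22:18:57.334", "03:49:10.814")
grace_period = timedelta(seconds=43200)  # terminationGracePeriodSeconds from the deployment

print(gap)                 # 5:30:13.480000
print(gap < grace_period)  # True
```

Under those assumptions the node was deleted roughly five and a half hours after the disruption began, which is well short of the configured grace period.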

jigisha620 commented on July 21, 2024

From the logs it seems like the pod that was orphaned is prod/hdr-service-app-c9cdb8dbf-w2hr2. I just wanted to confirm that the pod spec you shared is the same for this pod, since the deployment in it is called test.

qxmips commented on July 21, 2024

From the logs it seems like the pod that was orphaned is prod/hdr-service-app-c9cdb8dbf-w2hr2. I just wanted to confirm that the pod spec you shared is the same for this pod, since the deployment in it is called test.

Yeah, sorry. The attached log was from the original issue with a production service, but prod/hdr-service-app-c9cdb8dbf-w2hr2 had basically the same issue.
Here are the logs from the reproduced issue with the test deployment:

E0622 04:08:15.200303      11 gc_controller.go:154] failed to get node ip-10-11-57-209.ec2.internal : node "ip-10-11-57-209.ec2.internal" not found
I0622 04:09:15.225784      11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="kube-system/aws-node-d47hp" node="ip-10-11-57-209.ec2.internal"
I0622 04:09:15.281401      11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="test/test-5ccdb7cd7f-dm9bq" node="ip-10-11-57-209.ec2.internal"
I0622 04:09:15.303497      11 gc_controller.go:246] "Found orphaned Pod assigned to the Node, deleting." pod="kube-system/ebs-csi-node-xt68r" node="ip-10-11-57-209.ec2.internal"
I0622 04:08:07.894293      10 node_tree.go:79] "Removed node in listed group from NodeTree" node="ip-10-11-57-209.ec2.internal" zone="us-east-1:\x00:us-east-1a"
jigisha620 commented on July 21, 2024

terminationGracePeriodSeconds: 43200 #6hrs , 43200 - 12hrs

The deployment you shared has this. Is there a reason that this comment says 6 hours? Was terminationGracePeriod set to 6 hours or 12?

qxmips commented on July 21, 2024

terminationGracePeriodSeconds: 43200 #6hrs , 43200 - 12hrs

The deployment you shared has this. Is there a reason that this comment says 6 hours? Was terminationGracePeriod set to 6 hours or 12?

terminationGracePeriodSeconds is set to 12 hours (43200 seconds). We need it because on rare occasions the pod can't be interrupted for up to 12 hours.
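
To make the units unambiguous, here is a small sketch of the conversion behind that value. It only does the arithmetic and reads nothing from the cluster; the variable names are illustrative.

```python
from datetime import timedelta

# Value taken from the deployment's terminationGracePeriodSeconds field.
grace_seconds = 43200

# Dividing one timedelta by another yields a plain float.
grace_hours = timedelta(seconds=grace_seconds) / timedelta(hours=1)
print(grace_hours)  # 12.0

# For comparison, a 6-hour grace period expressed in seconds.
six_hours_seconds = int(timedelta(hours=6).total_seconds())
print(six_hours_seconds)  # 21600
```

So 43200 seconds corresponds to 12 hours, and a 6-hour grace period would be 21600 seconds.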

jigisha620 commented on July 21, 2024

I believe at this point it would make sense to go over the cluster audit logs to see if there's something that indicates what went wrong. Do you mind opening a support ticket to facilitate this?

qxmips commented on July 21, 2024

I believe at this point it would make sense to go over the cluster audit logs to see if there's something that indicates what went wrong. Do you mind opening a support ticket to facilitate this?

Sorry, what kind of support ticket do you mean? We don't have AWS premium support. I believe this behavior can be reproduced on any cluster.

jigisha620 commented on July 21, 2024

I tried to reproduce this issue with the config that you shared but couldn't. Karpenter didn't remove the node until terminationGracePeriod was hit. At this point, I think it would make sense to look at a more complete set of Karpenter controller logs from the time when this happened.
