Giter Club home page Giter Club logo

Comments (26)

SergeyKanzhelev avatar SergeyKanzhelev commented on September 27, 2024 2

I found that it was introduced in 1.22: https://github.com/kubernetes/kubernetes/pull/102344/files#diff-67dd9e8f3ee257072765326cb4f242852554a2c0753563fa51e292c0a63a7b94L2112 It may be more visible because of some finalizers are cleaning up slower of something. But I think the bug is critical enough to justify the cherry-picks

from kubernetes.

KPostOffice avatar KPostOffice commented on September 27, 2024 1

I looked into this some today and it looks like there was a check added in the last month here

From this PR by @SergeyKanzhelev . From what I understand of the problem, this fix should close this issue as well.

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

/sig apps
cc @alculquicondor @atiratree

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

FYI we had a similar issue in the past with handling pods which transitioned from being categorized as Failed to Succeeded: #111646. This was easier to reproduce as Job controller itself would handle pods with deletionTimestamp as Failed, but they could end up Succeeded eventually. The situation here seems to involve the opposite - the pod first is counted as Succeeded, then as Failed.

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

@liggitt wdyt about making the job controller resilient to a flip of phase from Succeeded to Failed, as opposed to fixing the underlying problem (which we haven't been able to identify yet)?

Ideally, changes of the phase should have been prevented by validation, but I know that could be very disruptive.

cc @tallclair @bobbypage for any thoughts on how the kubelet could have made a decision about transition the Pod to Failed after it was declared as Succeeded.

from kubernetes.

liggitt avatar liggitt commented on September 27, 2024

@smarterclayton I thought validation was added to prevent phase flips away from terminal phases

Things like this:

I suspect there are more actors than just job controller that would not handle transitions out of terminal phases correctly ... I don't love the idea of compensating for violation of an invariant in one controller

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

Here we are talking about the Pod phase, not the container.

Regardless, in the status above we see:

  containerStatuses:
  - state:
      terminated:
        exitCode: 137
        finishedAt: null
        message: The container could not be located when the pod was terminated
        reason: ContainerStatusUnknown
        startedAt: null

So I am not seeing how the phase would have been Suceeded at any point.

@mimowo can you double check whether there are other conditions in which the job controller could consider a Pod succeeded even if it doesn't have the Succeeded phase?

Alternatively.... maybe the update didn't come from a kubelet? 🤔

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

Quickly looking at the job controller code, it looks like the only way for the job controller to consider a Pod succeeded is Pod.Status.Phase == Succeeded.

So it seems that these Pods were indeed Succeeded at some point, but we couldn't validate via audit logs, as status updates are not recorded.

Additionally, in the affected Pods we saw:

  conditions:
  - message: 'Taint manager: deleting due to NoExecute taint'
    reason: DeletionByTaintManager
    status: "True"
    type: DisruptionTarget
...
  message: 'Pod was rejected: Predicate NodeAffinity failed'
  phase: Failed

which might suggest that this is happening during a Node drain or shutdown.

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

@smarterclayton I thought validation was added to prevent phase flips away from terminal phases

Things like this:

I suspect there are more actors than just job controller that would not handle transitions out of terminal phases correctly ... I don't love the idea of compensating for violation of an invariant in one controller

IIUC this validation is for container statuses. We observed, based on audit logs, that pod's phase can transition under some circumstances from Succeeded to Failed. All transitions are done by Kubelet.

In out scenario (found a couple cases like that) the transitioning to Failed is always due to rejecting a pod, after Kubelet restart.

Also, it seems the validation prevents updating from terminal container status to running, but transition to another terminal (completely differnt) is still possible.

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

/sig node
To help us triage, the fact that Kubelet can flip pods phase from Succeeded to Failed seems like a bug.
/cc @SergeyKanzhelev @bobbypage

from kubernetes.

SergeyKanzhelev avatar SergeyKanzhelev commented on September 27, 2024

Is it 1.30 only? Or repro on earlier versions?

This indeed looks like a bug

from kubernetes.

tallclair avatar tallclair commented on September 27, 2024

Some interesting additional context from a case we saw:

2024-05-23 21:37:42.048 - PATCH by Kubelet setting `ContainerCreating` [1]
2024-05-23 21:38:18.537 - PATCH by Kubelet setting the phase as Running [2]
2024-05-23 21:38:38.471 - PATCH by Kubelet setting the container exit code as 0 [3]
2024-05-23 21:41:01.002 - Kubelet restart [4]
2024-05-23 21:42:16.332 - PATCH by Kubelet setting phase as Succeeded
2024-05-23 21:42:17.394 - Job status update adding the succeeded pod - based on the timestamp in managed-fields, matched with logs [6]
2024-05-23 21:42:18.336 - finalizer removal by Job controller rejected by the validation.gatekeeper.sh webhook [7]
2024-05-23 21:47:59.045 - PATCH by Kubelet which sets the phase as Failed, reason OutOfPods [8]

I think a reason of OutOfPods can only be set in pod admission, called from HandlePodAdditions. That means somehow Kubelet is setting the pod phase to succeeded, but then still subsequently handling that pod as an addition.

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

The saw at least one affected cluster by Kubelet flipping the phase in 1.27, 1.28 and 1.29.

from kubernetes.

tallclair avatar tallclair commented on September 27, 2024

I think this comment is probably referring to the cause of this issue:

// After restarting, kubelet will get all existing pods through
// ADD as if they are new pods. These pods will then go through the
// admission process and *may* be rejected. This can be resolved
// once we have checkpointing.
handler.HandlePodAdditions(u.Pods)

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

IIUC it means in this code does not protect against it:

// pods are not allowed to transition out of terminal phases
if pod.Status.Phase == v1.PodFailed || pod.Status.Phase == v1.PodSucceeded {
// API server shows terminal phase; transitions are not allowed
if s.Phase != pod.Status.Phase {
klog.ErrorS(nil, "Pod attempted illegal phase transition", "pod", klog.KObj(pod), "originalStatusPhase", pod.Status.Phase, "apiStatusPhase", s.Phase, "apiStatus", s)
// Force back to phase from the API server
s.Phase = pod.Status.Phase
}
}
. Maybe because after kubelet restart the API server pod status is not initialized yet?

from kubernetes.

SergeyKanzhelev avatar SergeyKanzhelev commented on September 27, 2024

also from @smarterclayton:

the problem is status manager makes a blind write to the API on kubelet restart without checking / preconditioning on the API server having the expected phase. That's definitely a kubelet bug.

from kubernetes.

AnishShah avatar AnishShah commented on September 27, 2024

sig-node meeting notes:

/triage accepted
/priority important-longterm

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

@SergeyKanzhelev @tallclair is there anyone working on this?

Otherwise we might need to look into a mitigation in the job controller side, like following suit and honoring the last update from kubelet.

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

This is great news! @SergeyKanzhelev @smarterclayton does it match your expectation that the PR would prevent kubelet from flipping the phase from Succeeded to Failed?

In that case we could consider closing the issue, as all known cases fall into this category. Unless, we want to make Job controller resilient to potential race conditions of kubelet and other controllers that would want to set the opposite phase. Any opinions @alculquicondor @atiratree ?

from kubernetes.

SergeyKanzhelev avatar SergeyKanzhelev commented on September 27, 2024

This is great news! @SergeyKanzhelev @smarterclayton does it match your expectation that the PR would prevent kubelet from flipping the phase from Succeeded to Failed?

Yes, but only in this specific case - if this was a re-admission failure

The saw at least one affected cluster by Kubelet flipping the phase in 1.27, 1.28 and 1.29.

I am trying to understand whether it regressed at some point as I want to cherry-pick the fix. Do you know about 1.26?

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

Don't know definitely, but likely a regression, because I saw 10 clusters with stuck Jobs due to this issue, and all were 1.27+.

from kubernetes.

SergeyKanzhelev avatar SergeyKanzhelev commented on September 27, 2024

/priority critical-urgent

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

+1 to cherry-picking to all supported versions

from kubernetes.

mimowo avatar mimowo commented on September 27, 2024

+1 for cherry-picking

from kubernetes.

alculquicondor avatar alculquicondor commented on September 27, 2024

/close
cherry-picks are merged

from kubernetes.

k8s-ci-robot avatar k8s-ci-robot commented on September 27, 2024

@alculquicondor: Closing this issue.

In response to this:

/close
cherry-picks are merged

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

from kubernetes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.