Comments (26)
I found that it was introduced in 1.22: https://github.com/kubernetes/kubernetes/pull/102344/files#diff-67dd9e8f3ee257072765326cb4f242852554a2c0753563fa51e292c0a63a7b94L2112 It may be more visible because of some finalizers are cleaning up slower of something. But I think the bug is critical enough to justify the cherry-picks
from kubernetes.
I looked into this some today and it looks like there was a check added in the last month here
From this PR by @SergeyKanzhelev . From what I understand of the problem, this fix should close this issue as well.
from kubernetes.
/sig apps
cc @alculquicondor @atiratree
from kubernetes.
FYI we had a similar issue in the past with handling pods which transitioned from being categorized as Failed to Succeeded: #111646. This was easier to reproduce as Job controller itself would handle pods with deletionTimestamp as Failed, but they could end up Succeeded eventually. The situation here seems to involve the opposite - the pod first is counted as Succeeded, then as Failed.
from kubernetes.
@liggitt wdyt about making the job controller resilient to a flip of phase from Succeeded to Failed, as opposed to fixing the underlying problem (which we haven't been able to identify yet)?
Ideally, changes of the phase should have been prevented by validation, but I know that could be very disruptive.
cc @tallclair @bobbypage for any thoughts on how the kubelet could have made a decision about transition the Pod to Failed after it was declared as Succeeded.
from kubernetes.
@smarterclayton I thought validation was added to prevent phase flips away from terminal phases
Things like this:
I suspect there are more actors than just job controller that would not handle transitions out of terminal phases correctly ... I don't love the idea of compensating for violation of an invariant in one controller
from kubernetes.
Here we are talking about the Pod phase, not the container.
Regardless, in the status above we see:
containerStatuses:
- state:
terminated:
exitCode: 137
finishedAt: null
message: The container could not be located when the pod was terminated
reason: ContainerStatusUnknown
startedAt: null
So I am not seeing how the phase would have been Suceeded at any point.
@mimowo can you double check whether there are other conditions in which the job controller could consider a Pod succeeded even if it doesn't have the Succeeded
phase?
Alternatively.... maybe the update didn't come from a kubelet? 🤔
from kubernetes.
Quickly looking at the job controller code, it looks like the only way for the job controller to consider a Pod succeeded is Pod.Status.Phase == Succeeded
.
So it seems that these Pods were indeed Succeeded at some point, but we couldn't validate via audit logs, as status updates are not recorded.
Additionally, in the affected Pods we saw:
conditions:
- message: 'Taint manager: deleting due to NoExecute taint'
reason: DeletionByTaintManager
status: "True"
type: DisruptionTarget
...
message: 'Pod was rejected: Predicate NodeAffinity failed'
phase: Failed
which might suggest that this is happening during a Node drain or shutdown.
from kubernetes.
@smarterclayton I thought validation was added to prevent phase flips away from terminal phases
Things like this:
- api: validate container phase transitions #54530
- kubelet: check for illegal container state transition #54597
I suspect there are more actors than just job controller that would not handle transitions out of terminal phases correctly ... I don't love the idea of compensating for violation of an invariant in one controller
IIUC this validation is for container statuses. We observed, based on audit logs, that pod's phase can transition under some circumstances from Succeeded
to Failed
. All transitions are done by Kubelet.
In out scenario (found a couple cases like that) the transitioning to Failed
is always due to rejecting a pod, after Kubelet restart.
Also, it seems the validation prevents updating from terminal
container status to running
, but transition to another terminal
(completely differnt) is still possible.
from kubernetes.
/sig node
To help us triage, the fact that Kubelet can flip pods phase from Succeeded to Failed seems like a bug.
/cc @SergeyKanzhelev @bobbypage
from kubernetes.
Is it 1.30 only? Or repro on earlier versions?
This indeed looks like a bug
from kubernetes.
Some interesting additional context from a case we saw:
2024-05-23 21:37:42.048 - PATCH by Kubelet setting `ContainerCreating` [1]
2024-05-23 21:38:18.537 - PATCH by Kubelet setting the phase as Running [2]
2024-05-23 21:38:38.471 - PATCH by Kubelet setting the container exit code as 0 [3]
2024-05-23 21:41:01.002 - Kubelet restart [4]
2024-05-23 21:42:16.332 - PATCH by Kubelet setting phase as Succeeded
2024-05-23 21:42:17.394 - Job status update adding the succeeded pod - based on the timestamp in managed-fields, matched with logs [6]
2024-05-23 21:42:18.336 - finalizer removal by Job controller rejected by the validation.gatekeeper.sh webhook [7]
2024-05-23 21:47:59.045 - PATCH by Kubelet which sets the phase as Failed, reason OutOfPods [8]
I think a reason of OutOfPods can only be set in pod admission, called from HandlePodAdditions
. That means somehow Kubelet is setting the pod phase to succeeded, but then still subsequently handling that pod as an addition.
from kubernetes.
The saw at least one affected cluster by Kubelet flipping the phase in 1.27, 1.28 and 1.29.
from kubernetes.
I think this comment is probably referring to the cause of this issue:
kubernetes/pkg/kubelet/kubelet.go
Lines 2437 to 2441 in a7ca13e
from kubernetes.
IIUC it means in this code does not protect against it:
kubernetes/pkg/kubelet/kubelet_pods.go
Lines 1817 to 1825 in bb2068b
from kubernetes.
also from @smarterclayton:
the problem is status manager makes a blind write to the API on kubelet restart without checking / preconditioning on the API server having the expected phase. That's definitely a kubelet bug.
from kubernetes.
sig-node meeting notes:
/triage accepted
/priority important-longterm
from kubernetes.
@SergeyKanzhelev @tallclair is there anyone working on this?
Otherwise we might need to look into a mitigation in the job controller side, like following suit and honoring the last update from kubelet.
from kubernetes.
This is great news! @SergeyKanzhelev @smarterclayton does it match your expectation that the PR would prevent kubelet from flipping the phase from Succeeded to Failed?
In that case we could consider closing the issue, as all known cases fall into this category. Unless, we want to make Job controller resilient to potential race conditions of kubelet and other controllers that would want to set the opposite phase. Any opinions @alculquicondor @atiratree ?
from kubernetes.
This is great news! @SergeyKanzhelev @smarterclayton does it match your expectation that the PR would prevent kubelet from flipping the phase from Succeeded to Failed?
Yes, but only in this specific case - if this was a re-admission failure
The saw at least one affected cluster by Kubelet flipping the phase in 1.27, 1.28 and 1.29.
I am trying to understand whether it regressed at some point as I want to cherry-pick the fix. Do you know about 1.26?
from kubernetes.
Don't know definitely, but likely a regression, because I saw 10 clusters with stuck Jobs due to this issue, and all were 1.27+.
from kubernetes.
/priority critical-urgent
from kubernetes.
+1 to cherry-picking to all supported versions
from kubernetes.
+1 for cherry-picking
from kubernetes.
/close
cherry-picks are merged
from kubernetes.
@alculquicondor: Closing this issue.
In response to this:
/close
cherry-picks are merged
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
from kubernetes.
Related Issues (20)
- [Flaky test] [It] [sig-network] ClusterDns [Feature:Example] [Feature:Networking-IPv4] should create pod that uses DNS HOT 3
- [Flaky Test] capz-windows-master ci-kubernetes-e2e-capz-master-windows.Overall HOT 1
- Setting externalIPs to the same IP as one node, renders the node unaccessible HOT 11
- Failure cluster [0a5db28b...] `[sig-network] ClusterDns [Feature:Example] [Feature:Networking-IPv4] should create pod that uses dns` HOT 5
- TLS Error with Karpenter Liveness and Readiness Probes: remote error: tls: unrecognized name HOT 2
- Can we add necessary labels when volume controller annotate PVC? HOT 3
- Endpoints controller uses stale endpoints in reconciling, the endpoint Subsets will be wrong and never restores correctly HOT 3
- When the pod is deleted, GC the current image. HOT 4
- Add FieldSelector to MutatingWebhookConfiguration spec HOT 8
- Kubernetes 1.28.14 coredns not working, [ERROR] plugin/errors: 2 7550147287576560633.7996095784120876925. HINFO: read udp 10.0.1.5:55591->8.8.8.8:53: read: no route to host HOT 4
- Failing Test: [NodeFeature:KubeletConfigDropInDir] when merging drop-in configs should merge kubelet configs correctly HOT 13
- Kubelet plugin registration reliability HOT 5
- CRI proxy for e2e_node tests HOT 5
- integrate kubelet with the systemd watchdog HOT 7
- A DaemonSet pod environment variable did not inject service information HOT 9
- topologySpreadConstraints for availability zone in aws is not working as expected HOT 2
- API Server responds 200 OK, but not properly handling the request HOT 3
- Versioned feature gate lint error need to provide clear directions on what to do HOT 5
- Need emulation version guidance on how to perserve disabled feature gate tests for LockToDefault features HOT 6
- Failure cluster [c85e0cdc...] `Cluster size autoscaling` HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kubernetes.