Comments (17)
@chengjoey @AxeZhan can you take a look?
I believe all the latest patch releases are affected too, because of #124559
from kubernetes.
/assign
Successfully caught this with a unit test.
I'll file a small PR with the fix from #124930 (comment) and a newly added unit test.
cherry-picks:
#125039
#125041
#125042
#125043
@sara-hann This change will be included in the upcoming patch releases, scheduled for
2024-06-11. You can check out the Patch Release page for more details.
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/sig scheduling
cc @alculquicondor
/priority important-soon
cc @sanposhiho
/priority critical-urgent
This crash happens when PreFilter plugins filter out all nodes ...
sched.nextStartNodeIndex = (sched.nextStartNodeIndex + processedNodes) % len(nodes)
If PreFilter filtered out some nodes, len(nodes) here will be a subset of allNodes.
Why are we using this subset in the first place? I think % len(allNodes) should work fine (seems logical to me).
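A minimal sketch of the failure mode, with a hypothetical helper name (the real bookkeeping lives inline in findNodesThatFitPod): when every node is filtered out, the modulo divisor is zero and Go panics with "integer divide by zero", so the cursor update needs a guard (or, as suggested above, a modulo over the full node list length).

```go
package main

import "fmt"

// nextIndex is a hypothetical stand-in for the scheduler's round-robin
// cursor update at the end of a scheduling cycle. numNodes plays the
// role of len(nodes) after PreFilter; if PreFilter filtered out every
// node, numNodes is 0 and an unguarded modulo would panic.
func nextIndex(start, processed, numNodes int) int {
	if numNodes == 0 {
		return start // nothing was processed; leave the cursor where it was
	}
	return (start + processed) % numNodes
}

func main() {
	fmt.Println(nextIndex(3, 2, 5)) // normal case: cursor wraps around
	fmt.Println(nextIndex(3, 0, 0)) // all nodes filtered: no panic, cursor unchanged
}
```

The guard only addresses the crash; which divisor is semantically right (filtered subset vs. all nodes) is what the comments below discuss.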
I think so. In general, we just want to try a different set of nodes.
For the case of Daemonsets, it doesn't really matter, as we will just test one node.
I am having the same issue with 1.28.10 on a clean new cluster created with kops 1.28.5:
I0521 14:15:49.659515 10 leaderelection.go:260] successfully acquired lease kube-system/kube-scheduler
I0521 14:15:49.659839 10 schedule_one.go:80] "About to try and schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
I0521 14:15:49.659903 10 schedule_one.go:93] "Attempting to schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
E0521 14:15:49.660114 10 runtime.go:79] Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)
goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19efce0?, 0x33010c0})
k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x7c
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x78
panic({0x19efce0?, 0x33010c0?})
runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
k8s.io/[email protected]/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc
panic: runtime error: integer divide by zero [recovered]
panic: runtime error: integer divide by zero
goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xe0
panic({0x19efce0?, 0x33010c0?})
runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
k8s.io/[email protected]/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
k8s.io/[email protected]/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
k8s.io/[email protected]/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc
The pod that it's trying to create is pending with the following messages:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m (x141 over 38m) default-scheduler 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
Warning FailedScheduling 9m30s default-scheduler 0/2 nodes are available: 1 Insufficient cpu, 1 node is filtered out by the prefilter result. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..
(just for the sake of being indexed for those who will search for the same error to find this github issue)
Fix is on the way: #124933
Just waiting for reviews from an approver.
After this PR gets merged, I'll do cherry-picks for v1.28 (1.27?) through v1.30.
@AxeZhan @xmudrii Thanks for fixing this so fast. How do I pull this new code? I see that your PR was merged, but I don't see a new release to pull. I'm looking for the 1.30 release.
I'm seeing the issue after upgrading v1.26.11 -> v1.27.14.
Should I search for a broken pod? Or what else can be causing this?
Yes, a broken pod could be causing this. You can also try downgrading to v1.27.13.
Phew, manually changing the scheduler version to 1.27.13 fixed the issue and showed the broken pod in logs. Thanks!