Comments (12)

alculquicondor commented on June 25, 2024

@chengjoey @AxeZhan can you take a look?

I believe all the latest patch releases are affected too, because of #124559

AxeZhan commented on June 25, 2024

/assign

Successfully caught this with a unit test.
I'll file a small PR with the fix from #124930 (comment) and a newly added unit test.
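
For readers curious what such a test can look like, here is a minimal standalone sketch (illustrative only; the name and structure are assumptions, not the actual unit test from the PR). It runs the unguarded index-advance pattern against an empty feasible-node set and converts the panic into a test failure via recover:

package scheduler

import "testing"

// TestNextStartIndexEmptyFeasibleSet exercises the index-advance pattern
// with zero feasible nodes, the exact state produced when a PreFilter
// plugin rejects every node in the cluster.
func TestNextStartIndexEmptyFeasibleSet(t *testing.T) {
    defer func() {
        if r := recover(); r != nil {
            t.Fatalf("advancing the start index panicked: %v", r)
        }
    }()
    feasible := []string{} // PreFilter filtered out every node
    next := 1
    // The unguarded pattern: len(feasible) == 0 makes this an integer
    // divide by zero, so this test fails until the code is guarded.
    next = (next + len(feasible)) % len(feasible)
    _ = next
}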

k8s-ci-robot commented on June 25, 2024

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mikkeloscar commented on June 25, 2024

/sig scheduling

liggitt commented on June 25, 2024

cc @alculquicondor
/priority important-soon

alculquicondor commented on June 25, 2024

cc @sanposhiho

alculquicondor commented on June 25, 2024

/priority critical-urgent

AxeZhan commented on June 25, 2024

This crash happens when preFilter plugins filtered out all nodes ...

@alculquicondor

sched.nextStartNodeIndex = (sched.nextStartNodeIndex + processedNodes) % len(nodes)

If preFilter filtered out some nodes, then nodes here is a subset of allNodes.
Why are we using this subset in the first place? I think % len(allNodes) should work fine (seems logical to me).
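
Concretely (a tiny standalone sketch with made-up counts, not the scheduler code itself): when preFilter rejects every node, the feasible set is empty and the modulo divides by zero, while taking the modulo over allNodes stays well-defined:

package main

import "fmt"

func main() {
    numAllNodes := 5  // all nodes in the cluster
    numFeasible := 0  // nodes left after preFilter rejected everything
    nextStartNodeIndex := 3

    fmt.Println((nextStartNodeIndex + numFeasible) % numAllNodes) // 3: a valid wrap-around index
    fmt.Println((nextStartNodeIndex + numFeasible) % numFeasible) // panic: integer divide by zero
}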

alculquicondor commented on June 25, 2024

I think so. In general, we just want to try a different set of nodes.

For the case of DaemonSets, it doesn't really matter, as we will just test one node.
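
A possible shape for that change, as a hedged sketch of the direction agreed above (the helper name and signature are illustrative, not the actual code from the fix PR):

package scheduler

// nextStartIndex advances the round-robin start index over the full node
// list instead of the preFilter-reduced subset, and guards the zero case
// so that an empty node list can no longer divide by zero.
func nextStartIndex(current, processedNodes, numAllNodes int) int {
    if numAllNodes == 0 {
        return 0
    }
    return (current + processedNodes) % numAllNodes
}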

shapirus commented on June 25, 2024

I am having the same issue with 1.28.10 on a clean new cluster created with kops 1.28.5:

I0521 14:15:49.659515      10 leaderelection.go:260] successfully acquired lease kube-system/kube-scheduler
I0521 14:15:49.659839      10 schedule_one.go:80] "About to try and schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
I0521 14:15:49.659903      10 schedule_one.go:93] "Attempting to schedule pod" pod="kube-system/aws-cloud-controller-manager-rvbs8"
E0521 14:15:49.660114      10 runtime.go:79] Observed a panic: "integer divide by zero" (runtime error: integer divide by zero)
goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19efce0?, 0x33010c0})
        k8s.io/apimachinery@v0.28.10/pkg/util/runtime/runtime.go:75 +0x7c
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        k8s.io/apimachinery@v0.28.10/pkg/util/runtime/runtime.go:49 +0x78
panic({0x19efce0?, 0x33010c0?})
        runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
        k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc
panic: runtime error: integer divide by zero [recovered]
        panic: runtime error: integer divide by zero

goroutine 426 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x0?})
        k8s.io/apimachinery@v0.28.10/pkg/util/runtime/runtime.go:56 +0xe0
panic({0x19efce0?, 0x33010c0?})
        runtime/panic.go:914 +0x218
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).findNodesThatFitPod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x0?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:491 +0xa3c
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulePod(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, {0x20f09b0, 0x400019f600}, 0x28368?, 0x4000ccb200)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:383 +0x280
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).schedulingCycle(0x400067f2c0, {0x20ccc68, 0x40004d6e60}, 0x1b1ff00?, {0x20f09b0, 0x400019f600}, 0x4000ccee80, {0x2?, 0x4349a76c8302d44?, 0x333cd20?}, ...)
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:145 +0xa8
k8s.io/kubernetes/pkg/scheduler.(*Scheduler).scheduleOne(0x400067f2c0, {0x20ccc68?, 0x40004d6be0})
        k8s.io/kubernetes/pkg/scheduler/schedule_one.go:107 +0x348
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1()
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:259 +0x30
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x6b68769b60b6b490?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:226 +0x40
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x34c3f10eb615109?, {0x20aa080, 0x4000928120}, 0x1, 0x4000109740)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:227 +0x90
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xe69071c31329cd3a?, 0x0, 0x0, 0x21?, 0xd0734d7fcbfb2b8e?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:204 +0x80
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext({0x20ccc68, 0x40004d6be0}, 0x40008fee80, 0xefb95d5a1b7ffc98?, 0x4840affac97ee6ca?, 0x21?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:259 +0x80
k8s.io/apimachinery/pkg/util/wait.UntilWithContext({0x20ccc68?, 0x40004d6be0?}, 0x0?, 0x75d6674e3c776941?)
        k8s.io/apimachinery@v0.28.10/pkg/util/wait/backoff.go:170 +0x2c
created by k8s.io/kubernetes/pkg/scheduler.(*Scheduler).Run in goroutine 421
        k8s.io/kubernetes/pkg/scheduler/scheduler.go:406 +0xfc

The pod that it's trying to schedule is pending, with the following events:

Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  13m (x141 over 38m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
  Warning  FailedScheduling  9m30s                default-scheduler  0/2 nodes are available: 1 Insufficient cpu, 1 node is filtered out by the prefilter result. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling..

(just for the sake of being indexed, so that those who search for the same error can find this GitHub issue)

AxeZhan commented on June 25, 2024

Fix is on the way: #124933

Just waiting for review from an approver.

After this PR gets merged, I'll do cherry-picks for v1.28 (1.27?) through v1.30.

AxeZhan commented on June 25, 2024

Cherry-picks:
#125039
#125041
#125042
#125043
