
Comments (5)

h4gen commented on May 28, 2024

@jacobtomlinson Thank you for your answer! I thought autoscaling was enabled because, as you can see, the events said that the upscaling was triggered. Nevertheless, I recreated the cluster and now it works like a charm! Thank you very much for your help!

from dask-kubernetes.

jacobtomlinson commented on May 28, 2024

This sounds like an issue on the Kubernetes side. If dask_kubernetes has created the pods then it has done its job, but if Kubernetes is not running them then perhaps there is an issue with fitting the pods in the cluster.

Could you share your worker-template.yaml?


h4gen commented on May 28, 2024

Thanks for your answer. I just use the standard worker-template.yaml from the pangeo repo. I played around a bit with the image to make the versions match, but this did not influence the scaling/distribution behaviour I mentioned. So basically I just did what the pangeo deployment tutorial said. I would be really happy to debug this, but I am pretty new to the whole Kubernetes topic. Where would be a good place to start looking?


h4gen commented on May 28, 2024

This is the output of kubectl describe for the worker pod:

Name:         dask-hagen-9f4c574f-f2fbc2
Namespace:    pangeo
Node:         <none>
Labels:       app=dask
              component=dask-worker
              dask.pydata.org/cluster-name=dask-hagen-9f4c574f-f
Annotations:  <none>
Status:       Pending
IP:
Containers:
  dask-worker:
    Image:      pangeo/notebook:ede11f6
    Port:       <none>
    Host Port:  <none>
    Args:
      dask-worker
      --nthreads
      2
      --no-bokeh
      --memory-limit
      6GB
      --death-timeout
      60
    Limits:
      cpu:     1750m
      memory:  6G
    Requests:
      cpu:     1750m
      memory:  6G
    Environment:
      GCSFUSE_BUCKET:          pangeo-data
      DASK_SCHEDULER_ADDRESS:  tcp://10.44.2.15:33138
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s8dsl (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-s8dsl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s8dsl
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Normal   TriggeredScaleUp  38s                cluster-autoscaler  pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/jupyterhub-221016/zones/europe-west1-b/instanceGroups/gke-pangeo-cluster-worker-pool-364baee3-grp 1->6 (max: 100)}]
  Warning  FailedScheduling  19s (x7 over 50s)  default-scheduler   0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory.


jacobtomlinson commented on May 28, 2024

The key bit here is "0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory."

This means you do not have enough CPU and memory on the compute nodes in your cluster to fulfill the request. The way to handle this is to scale your Kubernetes cluster up. If you are using GKE, there should be an option to autoscale based on demand.
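To make the "insufficient" check concrete, here is a rough Python sketch of the comparison the scheduler is effectively doing for each node: the pod's requests (1750m CPU and 6G memory, taken from the describe output above) against what a node still has free. The helper names parse_quantity and insufficient_resources and the free-capacity figures for the two nodes are made up for illustration and are not part of dask-kubernetes or Kubernetes. The parser only covers the common quantity suffixes, not exponent notation.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes.
# "m" is milli (1750m = 1.75 cores); "G" is decimal, "Gi" is binary.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text):
    """Convert a quantity string such as '1750m' or '6G' to a Decimal."""
    # Try two-letter suffixes first so 'Gi' is not mistaken for a bare number.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def insufficient_resources(requests, free):
    """Return the names of resources where the request exceeds what is free."""
    return [
        name
        for name, amount in requests.items()
        if parse_quantity(amount) > parse_quantity(free.get(name, "0"))
    ]


# Requests from the worker pod in this thread.
worker_requests = {"cpu": "1750m", "memory": "6G"}

# Hypothetical free capacity on a small node and on a larger node.
small_node_free = {"cpu": "940m", "memory": "2Gi"}
large_node_free = {"cpu": "2", "memory": "8Gi"}

print(insufficient_resources(worker_requests, small_node_free))
print(insufficient_resources(worker_requests, large_node_free))
```

With these assumed numbers the small node is short on both CPU and memory, while the larger node can take the pod. A pod stays Pending until at least one node passes this check for every requested resource, which is why adding nodes (or larger nodes) resolves it.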
