Comments (5)
@jacobtomlinson Thank you for your answer! I thought autoscaling was already enabled because, as you can see, the events said that a scale-up was triggered. Nevertheless, I recreated the cluster and now it works like a charm! Thank you very much for your help!
from dask-kubernetes.
This sounds like an issue on the Kubernetes side. If dask_kubernetes has created the pods then it has done its job, but if Kubernetes is not running them then perhaps there is an issue with fitting the pods in the cluster.
Could you share your worker-template.yaml?
from dask-kubernetes.
Thanks for your answer. I just use the standard worker-template.yaml from the pangeo repo. I only played around with the image a bit to make the versions match, but this did not influence the scaling/distribution behaviour I described. So basically I just did what the pangeo deployment tutorial said. I would be really happy to debug this, but I am pretty new to the whole Kubernetes topic. What would be a good place to start looking?
from dask-kubernetes.
This is the output of kubectl describe for the worker-pod:
Name:           dask-hagen-9f4c574f-f2fbc2
Namespace:      pangeo
Node:           <none>
Labels:         app=dask
                component=dask-worker
                dask.pydata.org/cluster-name=dask-hagen-9f4c574f-f
Annotations:    <none>
Status:         Pending
IP:
Containers:
  dask-worker:
    Image:      pangeo/notebook:ede11f6
    Port:       <none>
    Host Port:  <none>
    Args:
      dask-worker
      --nthreads
      2
      --no-bokeh
      --memory-limit
      6GB
      --death-timeout
      60
    Limits:
      cpu:     1750m
      memory:  6G
    Requests:
      cpu:     1750m
      memory:  6G
    Environment:
      GCSFUSE_BUCKET:          pangeo-data
      DASK_SCHEDULER_ADDRESS:  tcp://10.44.2.15:33138
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-s8dsl (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  default-token-s8dsl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-s8dsl
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Normal   TriggeredScaleUp  38s                cluster-autoscaler  pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/jupyterhub-221016/zones/europe-west1-b/instanceGroups/gke-pangeo-cluster-worker-pool-364baee3-grp 1->6 (max: 100)}]
  Warning  FailedScheduling  19s (x7 over 50s)  default-scheduler   0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory.
from dask-kubernetes.
The key bit here is 0/3 nodes are available: 3 Insufficient cpu, 3 Insufficient memory.
This means you do not have enough CPU and memory on the compute nodes in your cluster to fulfill the request. The way to handle this is to scale your Kubernetes cluster up. If you are using GKE there should be an option to autoscale based on demand.
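To make the "Insufficient cpu / Insufficient memory" check a bit more concrete, here is a rough Python sketch of the comparison involved: the pod's requests (1750m CPU and 6G memory, taken from the describe output) against the free capacity of a node. The node figures below are made up for illustration and the helper names are hypothetical. The real scheduler takes more into account (other pods already on the node, taints, affinity and so on), so treat this only as a way to sanity-check the numbers.

```python
from decimal import Decimal

# Multipliers for a subset of Kubernetes memory suffixes.
# Binary suffixes (Ki, Mi, ...) are powers of 2; decimal ones (k, M, ...) powers of 10.
_MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> Decimal:
    """Convert a CPU quantity to cores, e.g. '1750m' -> 1.75."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity to bytes, e.g. '6G' -> 6000000000."""
    # Try two-letter suffixes first so 'Gi' is not mistaken for a plain number.
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            number = Decimal(quantity[: -len(suffix)])
            return int(number * _MEMORY_SUFFIXES[suffix])
    return int(Decimal(quantity))


def fits(pod_requests: dict, node_free: dict) -> bool:
    """Return True if both the CPU and memory requests fit in the free capacity."""
    cpu_ok = parse_cpu(pod_requests["cpu"]) <= parse_cpu(node_free["cpu"])
    mem_ok = parse_memory(pod_requests["memory"]) <= parse_memory(node_free["memory"])
    return cpu_ok and mem_ok


# Requests taken from the worker pod description in this thread.
worker_requests = {"cpu": "1750m", "memory": "6G"}

# Hypothetical free capacity per node; substitute the real values for your nodes.
small_node_free = {"cpu": "940m", "memory": "2Gi"}
large_node_free = {"cpu": "3920m", "memory": "12Gi"}

print(fits(worker_requests, small_node_free))  # False
print(fits(worker_requests, large_node_free))  # True
```

If every node looks like the small one, each worker stays Pending until the cluster gains a node with enough free CPU and memory, or until the requests in worker-template.yaml are lowered.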
from dask-kubernetes.