Comments (8)
Thanks for raising this @weiwang217. I've opened #837 to resolve this. Would you mind testing that PR out and letting me know if it solves your problem?
from dask-kubernetes.
We have documentation on how to do this here: https://kubernetes.dask.org/en/latest/testing.html#testing-operator-controller-prs
from dask-kubernetes.
Hi Jacob,
I suspect that change may have caused a regression when working with replicas > 1. When I start a new DaskJob, all but one of the replicas fail to connect to the scheduler because of duplicate names. Indeed, when I run
    kubectl describe pod <worker_pod>
I see:
    Worker 1:
      Environment:
        DASK_WORKER_NAME: simple-job-default-worker-a10a25ac26
        DASK_SCHEDULER_ADDRESS: tcp://simple-job-scheduler.join.svc.cluster.local:8786
        ...
    Worker 2:
      Environment:
        DASK_WORKER_NAME: simple-job-default-worker-00add84cde
        DASK_SCHEDULER_ADDRESS: tcp://simple-job-scheduler.join.svc.cluster.local:8786
        DASK_WORKER_NAME: simple-job-default-worker-a10a25ac26
        DASK_SCHEDULER_ADDRESS: tcp://simple-job-scheduler.join.svc.cluster.local:8786
Because the last DASK_WORKER_NAME defined on each worker is the one generated for the first replica, and the last definition appears to take effect, all replicas end up sharing the same name.
Do you mind taking a look?
(Context: I'm on the same team as weiwang217 and we just noticed this change recently)
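To make the reasoning above concrete, here is a minimal sketch in Python of how the duplicated list for Worker 2 would resolve. It assumes, as described above, that a later entry with the same name overrides an earlier one; `effective_env` is a hypothetical helper written for illustration and is not part of dask-kubernetes or Kubernetes.

```python
# Env entries as listed for "Worker 2" in the `kubectl describe pod` output.
worker_2_env = [
    {"name": "DASK_WORKER_NAME", "value": "simple-job-default-worker-00add84cde"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://simple-job-scheduler.join.svc.cluster.local:8786"},
    {"name": "DASK_WORKER_NAME", "value": "simple-job-default-worker-a10a25ac26"},
    {"name": "DASK_SCHEDULER_ADDRESS",
     "value": "tcp://simple-job-scheduler.join.svc.cluster.local:8786"},
]


def effective_env(env_list):
    """Hypothetical helper: resolve duplicates, assuming the last entry wins."""
    resolved = {}
    for entry in env_list:
        # A later entry with the same name overwrites the earlier value.
        resolved[entry["name"]] = entry["value"]
    return resolved


resolved = effective_env(worker_2_env)
print(resolved["DASK_WORKER_NAME"])
```

Under that assumption, Worker 2 resolves to the name generated for Worker 1, which matches the duplicate-name failures I'm seeing.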
from dask-kubernetes.
Thanks for reporting this @kjleftin. Why are you setting the DASK_WORKER_NAME in your config?
from dask-kubernetes.
Hi Jacob,
I'm following the example code in https://kubernetes.dask.org/en/latest/operator_resources.html#daskjob
Specifically, I'm passing the DASK_WORKER_NAME environment variable to the dask worker CLI:
    - name: worker
      image: "ghcr.io/dask/dask:latest"
      imagePullPolicy: "IfNotPresent"
      args:
        - dask-worker
        - --name
        - $(DASK_WORKER_NAME)
        - --dashboard
        - --dashboard-address
        - "8788"
Note that I'm not setting DASK_WORKER_NAME explicitly. That is handled by the Dask Operator. (Before this change, each worker would have a different value for DASK_WORKER_NAME, but after this change, each worker has the same value).
from dask-kubernetes.
@kjleftin OK, thanks for the clarification. I expect we may need to use copy to avoid this. I'll take a look at the PR and update it.
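For anyone following along, here is a rough Python sketch of the kind of bug a copy would avoid. It is not the operator's actual code: the template layout and the helpers `new_worker_name`, `build_worker_shared`, `build_worker_copied` and `make_template` are all hypothetical, and the sketch does not try to reproduce the exact ordering of entries seen above. It only shows that appending to an env list owned by a shared template leaks entries between replicas, while `copy.deepcopy` keeps each replica isolated.

```python
import copy
import uuid


def new_worker_name(job="simple-job"):
    # Hypothetical naming helper that mimics the names seen in this thread.
    return f"{job}-default-worker-{uuid.uuid4().hex[:10]}"


def make_template():
    # Hypothetical worker template with an initially empty env list.
    return {"containers": [{"name": "worker", "env": []}]}


def build_worker_shared(template):
    """Buggy variant: appends to the env list owned by the shared template."""
    spec = template  # no copy, so every replica mutates the same object
    spec["containers"][0]["env"].append(
        {"name": "DASK_WORKER_NAME", "value": new_worker_name()}
    )
    return spec


def build_worker_copied(template):
    """Fixed variant: each replica works on its own deep copy."""
    spec = copy.deepcopy(template)
    spec["containers"][0]["env"].append(
        {"name": "DASK_WORKER_NAME", "value": new_worker_name()}
    )
    return spec


shared_template = make_template()
shared = [build_worker_shared(shared_template) for _ in range(2)]
# Both replicas reference one env list holding two DASK_WORKER_NAME entries.
shared_env_lengths = [len(s["containers"][0]["env"]) for s in shared]

copied_template = make_template()
copied = [build_worker_copied(copied_template) for _ in range(2)]
# Each replica has exactly one entry, and the template stays untouched.
copied_env_lengths = [len(s["containers"][0]["env"]) for s in copied]
copied_names = [s["containers"][0]["env"][0]["value"] for s in copied]

print(shared_env_lengths, copied_env_lengths)
```

A shallow copy such as `dict(template)` would not be enough in this sketch, because the nested `env` list would still be shared between replicas.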
from dask-kubernetes.