Comments (6)
I think the only clean way to do this is to use Kubernetes garbage collection (https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/). We should mark the pods spawned by our notebook pod as owned by the notebook pod, so that when the notebook pod dies, all the worker pods die too.
from dask-kubernetes.
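A minimal sketch of the ownership idea, assuming the worker pods are created from a plain manifest dict. Kubernetes garbage collection works through `metadata.ownerReferences`: each entry names an owner by `apiVersion`, `kind`, `name` and `uid`, and deleting the owner causes dependents to be deleted. The helper `with_owner_reference`, the pod name `jupyter-notebook-0`, the UID `1234-abcd` and the image are hypothetical placeholders; in a real notebook pod the name and UID could be injected through the Downward API (`fieldRef: fieldPath: metadata.uid`). Owner and dependent must live in the same namespace.

```python
import copy
from typing import Any, Dict


def with_owner_reference(
    worker_pod: Dict[str, Any], owner_name: str, owner_uid: str
) -> Dict[str, Any]:
    """Return a copy of a worker Pod manifest owned by the given notebook Pod.

    Hypothetical helper: when the owner Pod is deleted, the Kubernetes
    garbage collector deletes objects whose ownerReferences point at it.
    """
    pod = copy.deepcopy(worker_pod)  # leave the caller's template untouched
    metadata = pod.setdefault("metadata", {})
    metadata.setdefault("ownerReferences", []).append(
        {
            "apiVersion": "v1",
            "kind": "Pod",
            "name": owner_name,
            "uid": owner_uid,  # must match the live owner's UID exactly
        }
    )
    return pod


# Placeholder worker template; the image name is only an example.
worker_template = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"generateName": "dask-worker-"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{"name": "worker", "image": "daskdev/dask:latest"}],
    },
}

owned = with_owner_reference(worker_template, "jupyter-notebook-0", "1234-abcd")
```

The resulting dict could then be submitted with the official `kubernetes` Python client, e.g. `CoreV1Api().create_namespaced_pod(namespace, owned)`. Copying the template keeps one shared template reusable for many workers.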
The Python bits of this seem to be done and work well now. This still fails, though, if the Python process is killed in a way that gives `weakref.finalize` no opportunity to run (such as a SIGKILL from Kubernetes).
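To illustrate why a SIGKILL defeats this approach, here is a small sketch of the `weakref.finalize` pattern. The `Cluster` class and `_cleanup` function are hypothetical stand-ins, not dask-kubernetes code. The callback runs when the object is garbage collected, when it is called explicitly, or at normal interpreter exit; a SIGKILL terminates the process without running any Python code, so none of those paths happens.

```python
import gc
import weakref

cleaned_up = []


def _cleanup(pod_names):
    # Hypothetical stand-in for deleting worker pods through the Kubernetes API.
    cleaned_up.extend(pod_names)


class Cluster:
    def __init__(self, pod_names):
        self.pod_names = list(pod_names)
        # Pass the state, not self: the callback must not hold a strong
        # reference to the object, or it could never be collected.
        self._finalizer = weakref.finalize(self, _cleanup, self.pod_names)

    def close(self):
        # Calling a finalizer runs its callback at most once.
        self._finalizer()


cluster = Cluster(["dask-worker-a", "dask-worker-b"])
del cluster    # the object becomes unreachable...
gc.collect()   # ...and is collected, which fires the finalizer
print(cleaned_up)  # → ['dask-worker-a', 'dask-worker-b']
```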
Workers now kill themselves if they haven't seen their scheduler for 60 seconds. Operationally this seems to work well, even when pods are not cleaned up properly by the Jupyter process.
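The self-termination rule can be sketched as a small watchdog, assuming the worker records each successful contact with the scheduler and periodically checks how long it has been since the last one. `distributed` workers expose a comparable `death_timeout` setting; the `SchedulerWatchdog` class below is a hypothetical, self-contained illustration of the logic, not Dask's implementation. The injectable clock exists only so the behaviour can be demonstrated deterministically.

```python
import time
from typing import Callable


class SchedulerWatchdog:
    """Decide when a worker should shut itself down (hypothetical helper).

    Call heartbeat() whenever the scheduler is reached and poll expired();
    once no contact has been made for more than `timeout` seconds, the
    worker should exit so it does not linger as an orphaned pod.
    """

    def __init__(self, timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self._clock = clock  # monotonic by default: immune to wall-clock jumps
        self._last_seen = clock()

    def heartbeat(self) -> None:
        self._last_seen = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_seen > self.timeout


# Usage with a fake clock so the timeline is explicit.
now = [0.0]
dog = SchedulerWatchdog(timeout=60.0, clock=lambda: now[0])
now[0] = 45.0
print(dog.expired())  # → False
dog.heartbeat()       # scheduler reached at t=45
now[0] = 100.0        # 55 s since last contact
print(dog.expired())  # → False
now[0] = 106.0        # 61 s since last contact
print(dog.expired())  # → True
```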
The 60s timeout resolves this issue for me. Closing.
Related Issues (20)
- `ResourceQuota` handling
- Nightly CI job that opens an issue on test failures
- Add optional ServiceMonitor
- Add controller prometheus metrics
- Adding labels and annotations using Dask Kubernetes Operator
- Operator Pod fails to run
- Add backoffLimit to DaskJobs
- Dask Kubernetes v2 (Stability) Release
- Invalid KubeCluster kwargs raises confusing exception
- Adding a suspend field to the dask operator
- Ideal user documentation flow
- RuntimeError: cannot schedule new futures after shutdown when using external Kubernetes cluster
- KubeCluster on Windows raises error: _WindowsSelectorEventLoop does NOT support subprocesses
- Operator does not handle updates for DaskCluster
- FileNotFoundError in Classic KubeCluster CI
- KubeCluster.shutdown_on_close defaults to True
- Add batched worker provisioning to Dask Cluster spawning on Kubernetes
- Reinstate job test
- Workers remain idle and not cleaned up after terminating cluster or failure of dask job
- [Dask Operator] 'daskcluster_create_components/status.phase' failed with an exception