Comments (4)
Right now we propagate labels from the top-level `DaskCluster` and `DaskWorkerGroup` objects down to their child resources. So if you set `foo=bar` on the `DaskCluster` then you could use the `foo=bar,dask.org/component=scheduler` selector to select the scheduler service.
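To make the selector behaviour concrete, here is a minimal sketch of how an equality-based selector like the one above matches a resource's labels. It only handles plain `key=value` terms (real Kubernetes selectors also support `!=` and set-based expressions), and the label values on the example resources are assumptions made up for illustration.

```python
def parse_selector(selector: str) -> dict[str, str]:
    """Parse an equality-based selector such as 'a=b,c=d' into a dict."""
    pairs = {}
    for term in selector.split(","):
        key, _, value = term.strip().partition("=")
        pairs[key] = value
    return pairs


def matches(labels: dict[str, str], selector: str) -> bool:
    """Return True when every key=value term of the selector is present in labels."""
    return all(labels.get(k) == v for k, v in parse_selector(selector).items())


# Hypothetical labels on child resources of a DaskCluster labelled foo=bar.
scheduler_service_labels = {
    "foo": "bar",
    "dask.org/component": "scheduler",
    "dask.org/cluster-name": "example",
}
worker_pod_labels = {
    "foo": "bar",
    "dask.org/component": "worker",
    "dask.org/cluster-name": "example",
}

selector = "foo=bar,dask.org/component=scheduler"
print(matches(scheduler_service_labels, selector))
print(matches(worker_pod_labels, selector))
```

Both terms must match, so the propagated `foo=bar` label narrows the selection to one set of clusters while `dask.org/component` picks out the scheduler.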
from dask-kubernetes.
I quickly threw together a `ServiceMonitor` that will scrape metrics for all Dask schedulers deployed with the operator.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dask-operator-scheduler
  labels:
    dask.org/component: "scheduler"
spec:
  selector:
    matchLabels:
      dask.org/component: "scheduler"
  targetLabels:
    - dask.org/cluster-name
  endpoints:
    - interval: 15s
      port: http-dashboard
I think it would make sense to optionally include this in the Operator Helm Chart, with all the same templating and customisation as is available in the basic Helm Chart. This is a slightly different approach: a single `ServiceMonitor` would be registered that selects all Dask clusters created by the operator, whereas the basic Helm Chart creates one `ServiceMonitor` per cluster.
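As a rough illustration of which parts of the manifest might be worth templating, here is a sketch that builds the same operator-wide `ServiceMonitor` as a Python dict. The function name `build_service_monitor` and its parameters are hypothetical; this is not how either Helm Chart is implemented, just one way to see the customisable fields.

```python
import json


def build_service_monitor(name="dask-operator-scheduler", interval="15s", extra_labels=None):
    """Build an operator-wide ServiceMonitor manifest for all Dask schedulers.

    Hypothetical helper: only the name, scrape interval and extra metadata
    labels are parametrised here.
    """
    labels = {"dask.org/component": "scheduler"}
    labels.update(extra_labels or {})
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # Selects every scheduler Service, regardless of which cluster owns it.
            "selector": {"matchLabels": {"dask.org/component": "scheduler"}},
            # Copies the cluster name onto each scraped metric.
            "targetLabels": ["dask.org/cluster-name"],
            "endpoints": [{"interval": interval, "port": "http-dashboard"}],
        },
    }


# The "release" label is an assumed example of a label a Prometheus install might require.
manifest = build_service_monitor(interval="30s", extra_labels={"release": "prometheus"})
print(json.dumps(manifest, indent=2))
```

Because the selector matches on `dask.org/component` alone, one such object covers every cluster, and the `dask.org/cluster-name` target label keeps their metrics distinguishable.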
I also had a play around with a `PodMonitor` for scraping worker metrics, but the operator currently doesn't configure worker ports by default, so that will take more work.
@jacobtomlinson Wow, thank you! I had already forked the repo and wanted to open a PR for that feature. It would also be nice to make the specs in `make_scheduler_spec` / `make_worker_spec` customizable, so that extra metadata labels can be included when creating a `KubeCluster` instance.
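The kind of customisation being asked for could look roughly like the sketch below, which merges extra labels into a spec's metadata. The helper `with_extra_labels` is hypothetical, and the spec dict is a hand-written stand-in; the actual structure returned by `make_scheduler_spec` / `make_worker_spec` is not shown in this thread and may differ.

```python
import copy


def with_extra_labels(spec: dict, extra_labels: dict) -> dict:
    """Return a copy of spec with extra_labels merged into metadata.labels."""
    updated = copy.deepcopy(spec)  # leave the caller's spec untouched
    labels = updated.setdefault("metadata", {}).setdefault("labels", {})
    labels.update(extra_labels)
    return updated


# Hand-written stand-in for a generated spec (assumed shape, for illustration only).
base_spec = {
    "metadata": {"labels": {"dask.org/component": "scheduler"}},
    "spec": {},
}

labelled = with_extra_labels(base_spec, {"foo": "bar"})
print(labelled["metadata"]["labels"])
print(base_spec["metadata"]["labels"])
```

Copying the spec before mutating it means the same base spec can be reused for several clusters with different labels.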
Ok, now that #688 is merged we can also create a `PodMonitor` to scrape worker metrics.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dask-operator-workers
  labels:
    dask.org/component: "worker"
spec:
  selector:
    matchLabels:
      dask.org/component: "worker"
  podTargetLabels:
    - dask.org/cluster-name
    - dask.org/workergroup-name
  podMetricsEndpoints:
    - interval: 15s
      port: http-dashboard