Comments (8)
Hi! We have a similar use-case. When more than one team uses the same Pyrra and Prometheus-operator, it's difficult to tell the alert rules apart, and Pyrra can get a little cluttered. Having some customization to include existing labels from Kubernetes, and to add new labels as well, would be great!
pyrra/openapi/client/model_objective.go
Lines 18 to 39 in ca43989
From rules.go#L103-L104
ruleLabels := map[string]string{}
ruleLabels["slo"] = sloName
Adding a new entry here for custom labels like this for instance
ruleLabels["team"] = "your-team-name"
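To make the suggestion a bit more concrete, here is a minimal sketch of how custom labels could be merged into the rule labels. The function name `buildRuleLabels` and the `customLabels` parameter are hypothetical and not part of Pyrra's current code; the only thing taken from rules.go is the `ruleLabels["slo"] = sloName` assignment.

```go
package main

import "fmt"

// buildRuleLabels is a hypothetical helper, not Pyrra's actual API.
// It copies the custom labels first and sets the reserved "slo" label last,
// so a custom label can never overwrite the SLO name.
func buildRuleLabels(sloName string, customLabels map[string]string) map[string]string {
	ruleLabels := make(map[string]string, len(customLabels)+1)
	for name, value := range customLabels {
		ruleLabels[name] = value
	}
	ruleLabels["slo"] = sloName
	return ruleLabels
}

func main() {
	labels := buildRuleLabels("your-slo-name-here", map[string]string{"team": "your-team-name"})
	fmt.Println(labels["slo"], labels["team"])
}
```

Setting `slo` after the copy is a design choice in this sketch: it keeps the generated label authoritative if someone also defines a custom `slo` label.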
It seems the recording rules are missing namespace as well: namespace is added automatically to the Pyrra config, but not to the recording rules, and is therefore not visible in the alert rules.
pyrra/openapi/client/model_multi_burnrate_alert.go
Lines 17 to 25 in ca43989
For Pyrra (frontend), maybe add some (optional) top-level grouping as well. That way you could add more SLOs for the same team and group them under a team page.
from pyrra.
Perfect. Thank you for that very in-depth comment. Indeed, we should probably track that in a separate issue to this one.
All that this really comes down to is propagating the labels all the way from the SLO CRD through Pyrra until they show up in the alerts.
Let's imagine this label is added in here:
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
+   team: pyrra
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}
Then it should show up in the list page of Pyrra too as team=pyrra.
In the end the same label needs to be part of the alert
ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="pyrra"}
I'll open a separate issue and we'll look into it. 👍
Finally, you need to configure your Alertmanager to route the alerts correctly based on the team label, as per our example.
from pyrra.
Good point. I'm not sure why it never came up.
We should add it indeed.
Would you mind opening another Issue for this one?
Thanks
from pyrra.
Hey, thanks for trying and opening this issue. I'm sure there are plenty more things (I have such a long list in my head too), so feel free to continue opening things.
- I created #39 for the deletion issue
Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?
- We can totally make the alertnames configurable for sure. Keep in mind that the alertname is just another label for the alert itself. So usually the alerts are identified by their label set too, like:
{alertname="ErrorBudgetBurn", job="kube-dns", long="30m", short="3m", slo="coredns-response-errors"}
{alertname="ErrorBudgetBurn", job="kube-dns", long="3h", short="15m", slo="coredns-response-errors"}
Therefore I'm not super concerned about the alertname itself, but we can still happily make it configurable.
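As a rough illustration of why the full label set matters more than the alertname, the sketch below renders two label sets in a canonical sorted form and compares them. The helper `labelSetKey` is hypothetical and only meant to show the idea; it is not how Prometheus or Pyrra computes alert identity internally.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelSetKey is a hypothetical helper that renders a label set in a
// canonical form (label names sorted) so two alerts can be compared by
// their whole label set instead of by alertname alone.
func labelSetKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, name := range names {
		pairs = append(pairs, fmt.Sprintf("%s=%q", name, labels[name]))
	}
	return "{" + strings.Join(pairs, ", ") + "}"
}

func main() {
	fast := map[string]string{"alertname": "ErrorBudgetBurn", "job": "kube-dns", "long": "30m", "short": "3m", "slo": "coredns-response-errors"}
	slow := map[string]string{"alertname": "ErrorBudgetBurn", "job": "kube-dns", "long": "3h", "short": "15m", "slo": "coredns-response-errors"}

	// Same alertname, different windows: the keys differ.
	fmt.Println(labelSetKey(fast) == labelSetKey(slow)) // prints "false"
}
```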
I'll rename this issue to be about the second part of the original comment, as I opened a separate issue for 1.
from pyrra.
Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?
Yes, precisely
from pyrra.
That sounds very reasonable @hsolberg 👍
Can you give an even more concrete example? I'd use that as the baseline for implementing future unit tests to make sure things are properly propagated through all components.
from pyrra.
Thanks for the fast reply! Maybe I should create a separate issue for the frontend part since it's a separate thing? Let me know if you prefer "user-story" tickets or tickets specific to the part that needs to change (the frontend, or the logic for generating alerts, for instance).
Frontend
Regarding concrete examples. After conferring with a colleague (developer) to cover some scenarios, something like this for the presentation in Pyrra (frontend) perhaps?
|__<group-1>
| |__<app-1>
| | |__ <metric-1>
| | |__ <metric-2>
| | .
| | .
| |__<app-2>
| | |__<metric-1>
| | .
|
|__<group-2>
| |__<app-1>
| | |__<metric-1>
Would look something like this example
|__team-api
| |__api-app-auth
| | |__auth-error-rate
| | |__latency
| | |__system-error-rate
| |__api-app-integration
| | |__system-error-rate
|
|__team-frontend
| |__frontend-app
| | |__frontend-error-rate
| | |__frontend-latency
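A minimal sketch of how that grouping could be derived, assuming each SLO carried a team and an app label. The `objective` type and its fields are hypothetical, simplified stand-ins and not Pyrra's actual data model.

```go
package main

import (
	"fmt"
	"sort"
)

// objective is a hypothetical, simplified view of an SLO for this sketch.
type objective struct {
	Team string // value of a custom "team" label (assumed)
	App  string // value of an "app" label (assumed)
	Name string // SLO name
}

// groupByTeamAndApp builds a team -> app -> SLO names tree.
func groupByTeamAndApp(objectives []objective) map[string]map[string][]string {
	tree := map[string]map[string][]string{}
	for _, o := range objectives {
		if tree[o.Team] == nil {
			tree[o.Team] = map[string][]string{}
		}
		tree[o.Team][o.App] = append(tree[o.Team][o.App], o.Name)
	}
	return tree
}

// sortedKeys returns the keys of m in ascending order for stable output.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	tree := groupByTeamAndApp([]objective{
		{Team: "team-api", App: "api-app-auth", Name: "auth-error-rate"},
		{Team: "team-api", App: "api-app-auth", Name: "latency"},
		{Team: "team-api", App: "api-app-integration", Name: "system-error-rate"},
		{Team: "team-frontend", App: "frontend-app", Name: "frontend-error-rate"},
	})
	for _, team := range sortedKeys(tree) {
		fmt.Println("|__" + team)
		for _, app := range sortedKeys(tree[team]) {
			fmt.Println("|  |__" + app)
			for _, name := range tree[team][app] {
				fmt.Println("|  |  |__" + name)
			}
		}
	}
}
```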
Alerts
As for the alerts, I already see the label slo being added based on what we put in the name field in metadata (see example below). So that part should be easy to differentiate when the new Prometheus filter is added in the next version (hopefully), along with the possibility of using pre-filtered links from Pyrra in the near future. That only leaves the need to include namespace in the alert rule, as well as the possibility of adding more (custom) labels.
apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}
Looking at the alert when firing with the rules above (this is the same as the last entry in the table Pyrra shows):
ALERTS{alertname="ErrorBudgetBurn", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here"}
When comparing this to what Pyrra shows in the table, maybe this should be consistent with the labels added to the alert? Also, namespace is missing; it's only visible as a label under the objective's name and at the top of the objective you navigate to.
Since the alerts are generated, it's a bit difficult to name them without being generic, as they are now. One solution could be to add the exhaustion to the alertname as well as adding the labels. That way you get a quick overview without having to expand all the alerts in Prometheus, and you can use the labels to filter further. Adding namespace would also help with filtering, so maybe something like this?
ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="your-team"}
The last label would be a custom one, just to identify the target responder for the alert. I didn't want to change too much of how it works now, as I'm not 100% sure what it would ideally look like. So I'm trying to stick to the bare minimum in the examples and see if anyone else has other views and suggestions. I'll need to think it over some more, but this is the gist of it. 😅
from pyrra.
Hi! A followup question, not sure if it should be asked here or in the closed ticket.
The namespace label is used in Pyrra UI etc. but it doesn't seem to be included in the generated alert-rules. Is this something that could be added by default? Or maybe it's already part of the recent changes to master that's not added to a specific release yet? 😄
from pyrra.