Giter Club home page Giter Club logo

Comments (8)

hsolberg avatar hsolberg commented on May 23, 2024 1

Hi! We have a similar use-case. When more than one team use the same Pyrra and Prometheus-operator it's difficult to tell the alert-rules apart and Pyrra can get a little cluttered. Having some customization to include existing labels from Kubernetes and adding new labels as well would be great!

type Objective struct {
Labels map[string]string `json:"labels"`
Description string `json:"description"`
Target float64 `json:"target"`
Window int64 `json:"window"`
Config string `json:"config"`
Indicator *Indicator `json:"indicator,omitempty"`
}
// NewObjective instantiates a new Objective object
// This constructor will assign default values to properties that have it defined,
// and makes sure properties required by API are set, but the set of arguments
// will change when the set of required properties is changed
func NewObjective(labels map[string]string, description string, target float64, window int64, config string) *Objective {
this := Objective{}
this.Labels = labels
this.Description = description
this.Target = target
this.Window = window
this.Config = config
return &this
}

From rules.go#L103-L104

        ruleLabels := map[string]string{}
        ruleLabels["slo"] = sloName

Adding a new entry here for custom labels like this for instance

ruleLabels["team"] = your-team-name

Seems the recording rules are missing namespace as well, since namespace is added automatically to the Pyrra config, but not to the recording rules and therefor not visible in the alert rules.

// MultiBurnrateAlert struct for MultiBurnrateAlert
type MultiBurnrateAlert struct {
Severity string `json:"severity"`
For int64 `json:"for"`
Factor float64 `json:"factor"`
Short Burnrate `json:"short"`
Long Burnrate `json:"long"`
State string `json:"state"`
}

For Pyrra (frontend), maybe add some (optional) top-level grouping as well. That way you could add more SLOs for the same team and group them under a team page.

from pyrra.

metalmatze avatar metalmatze commented on May 23, 2024 1

Perfect. Thank you for that very in-depth comment. Indeed, we should probably track that in a separate issue to this one.
All that this really comes down to is propagating the labels all the way from the SLO CRD through Pyrra until they show up in the alerts.

Let's imagine this label is added in here:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
+   team: pyrra
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}

Then it should show up in the list page of Pyrra too as team=pyrra.

In the end the same label needs to be part of the alert

ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="pyrra"}

I'll open a separate issue and we'll look into it. 👍
At last you need to configure your alertmanager to route the alerts correctly based on the team label, as per our example.

from pyrra.

metalmatze avatar metalmatze commented on May 23, 2024 1

Good point. I'm not sure why it never came up.
We should add it indeed.
Would you mind opening another Issue for this one?
Thanks

from pyrra.

metalmatze avatar metalmatze commented on May 23, 2024

Hey, thanks for trying and opening this issue. I'm sure there a plenty of more things (I have such a long list in my head too), so feel free to continue opening things.

  1. I created #39 for the deletion issue

Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?

  1. We can totally make the alertnames configurable for sure. Keep in mind, that the alertname is just another label for the alert itself. So usually the alerts are identified by their label set too, like:
{alertname="ErrorBudgetBurn", job="kube-dns", long="30m", short="3m", slo="coredns-response-errors"}{alertname="ErrorBudgetBurn", job="kube-dns", long="3h", short="15m", slo="coredns-response-errors"}

Therefore I'm not super concerned about the alertname itself, but we can still happily make it configurable.

I'll rename this issue to be about the second part of the original comment, as I opened a separate issue for 1.

from pyrra.

rlex avatar rlex commented on May 23, 2024

Could you give a bit more context about the labels? Is that just adding the common Kubernetes labels to the generated PrometheusRule CRs?

Yes, precisely

from pyrra.

metalmatze avatar metalmatze commented on May 23, 2024

That sounds very reasonable @hsolberg 👍
Can you give a even more concrete example? I'd use that as the base line for implementing future unit tests to make sure things are properly propagated through all components.

from pyrra.

hsolberg avatar hsolberg commented on May 23, 2024

Thanks for the fast reply! Maybe I should create a separate issue for the frontend part since it's a separate thing? If you prefer to have "user-stories" tickets or specific to what part needs to be changed (frontend or logic for generating alerts for instance).

Frontend
Regarding concrete examples. After conferring with a colleague (developer) to cover some scenarios, something like this for the presentation in Pyrra (frontend) perhaps?

|__<group-1>
|  |__<app-1>
|  |  |__ <metric-1>
|  |  |__ <metric-2>
|  |  .
|  |  .
|  |__<app-2>
|  |  |__<metric-1>
|  |  .
|
|__<group-2>
|  |__<app-1>
|  |  |__<metric-1>

Would look something like this example

|__team-api
|  |__api-app-auth
|  |  |__auth-error-rate
|  |  |__latency
|  |  |__system-error-rate
|  |__api-app-integration
|  |  |__system-error-rate
|
|__team-frontend
|  |__frontend-app
|  |  |__frontend-error-rate
|  |  |__frontend-latency

Alerts
As for the alerts, I already see the label slo being added based on what we put in the name-field in metadata (see example below). So that part should be easy to differentiate when the new Prometheus filter is added in the next version (hopefully). And the possibility to do use pre-filtered links from Pyrra in the near future. That only leaves the need for including namespace in the alert rule as well as possibility of adding more (custom).

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}

Looking at the alert when firing with the rules above (this is the same as the last entry in the table Pyrra shows):

ALERTS{alertname="ErrorBudgetBurn", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here"}

When comparing this to what Pyrra shows in the table, maybe this should be consistent with the labels added to the alert? Also, namespace is missing, but that's only visible in as a label under the objectives name and at the top of the objective you navigate to.

image

Since the alerts are generated it's a bit difficult to name them without being generic as they are now. One solution could be to add the exhaustion to the alertname as well as adding the labels. That way you get a quick overview without having to expand all the alerts in prometheus and use the labels to filter more. Adding namespace would also help with filtering, so maybe something like this?

ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="your-team"}

The last label would be a custom one just to be able to have the target responder for the alert. I didn't want to change too much of how it works now as I'm not a 100% sure of what it ideally would look like. So I'm trying to stick to the bare minimum in the examples and see if anyone else has other views and suggestions. I'll need to think it over some more, but this is the gist of it. 😅

from pyrra.

hsolberg avatar hsolberg commented on May 23, 2024

Hi! A followup question, not sure if it should be asked here or in the closed ticket.
The namespace label is used in Pyrra UI etc. but it doesn't seem to be included in the generated alert-rules. Is this something that could be added by default? Or maybe it's already part of the recent changes to master that's not added to a specific release yet? 😄

from pyrra.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.