
Pyrra: SLOs with Prometheus

Making SLOs with Prometheus manageable, accessible, and easy to use for everyone!

Screenshot of Pyrra

Dashboards to visualize SLOs in Grafana:

Pyrra Grafana dashboard

Watch the 5min lightning talk at Prometheus Day 2022:

PrometheusDay 2022: Lightning Talk Pyrra

Features

  • Support for Kubernetes, Docker, and reading from the filesystem
  • Alerting: Generates 4 Multi Burn Rate Alerts with different severities
  • Page listing all Service Level Objectives
    • Search through names and labels
    • Sorted by remaining error budget to see the worst ones quickly
    • All columns sortable
    • View and hide individual columns
    • Clicking on labels to filter SLOs that contain the label
    • Tool-tips when hovering for extra context
  • Page with details for a Service Level Objective
    • Objective, Availability, Error Budget highlighted as 3 most important numbers
    • Graph to see how the error budget develops over time
    • Time range picker to change graphs
    • Switch between absolute and relative chart scales
    • Request, Errors, Duration (RED) graphs for the underlying service
    • Multi Burn Rate Alerts overview table
  • Caching of Prometheus query results
  • Thanos: Disabling of partial responses and downsampling to 5m and 1h
  • Protobuf APIs generated with connect-go and connect-web
  • Grafana dashboards based on recording rules generated via --generic-rules

Feedback & Support

If you have any feedback, please open a discussion in the GitHub Discussions of this project.
We would love to learn what you think!

Demo

Check out our live demo on demo.pyrra.dev!
Grafana dashboards are available as demo on demo.pyrra.dev/grafana!

Feel free to give it a try there!

How It Works

There are three components of Pyrra, all of which work through a single binary:

  • The UI displays SLOs, error budgets, burn rates, etc.
  • The API delivers information about SLOs from a backend (like Kubernetes) to the UI.
  • A backend watches for new SLO objects and then creates Prometheus recording rules for each.
    • For Kubernetes, there is a Kubernetes Operator available
    • For everything else, there is a filesystem-based Operator available

For the backend/operator to do its work, an SLO object has to be provided in YAML format:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: pyrra-api-errors
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
    pyrra.dev/team: operations # Any labels prefixed with 'pyrra.dev/' will be propagated as Prometheus labels, while stripping the prefix.
spec:
  target: "99"
  window: 2w
  description: Pyrra's API requests and response errors over time grouped by route.
  indicator:
    ratio:
      errors:
        metric: http_requests_total{job="pyrra",code=~"5.."}
      total:
        metric: http_requests_total{job="pyrra"}
      grouping:
        - route

Depending on your mode of operation, this information is provided through an object in Kubernetes, or read from a static file.

To calculate error budget burn rates, Pyrra then creates Prometheus recording rules for each SLO.

The following rules would be created for the above example:

http_requests:increase2w

http_requests:burnrate3m
http_requests:burnrate15m
http_requests:burnrate30m
http_requests:burnrate1h
http_requests:burnrate3h
http_requests:burnrate12h
http_requests:burnrate2d

The recording rule names are based on the originally provided metric. Each recording rule carries the labels needed to uniquely identify it in case multiple rules share the same name.
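
As a rough illustration (a hypothetical sketch, not Pyrra's exact output), one of the generated burn rate rules for the example above could look roughly like this in a Prometheus rule file; the exact expressions and labels Pyrra emits may differ:

groups:
  - name: pyrra-api-errors
    rules:
      - record: http_requests:burnrate1h
        expr: |
          sum by (route) (rate(http_requests_total{job="pyrra",code=~"5.."}[1h]))
          /
          sum by (route) (rate(http_requests_total{job="pyrra"}[1h]))
        labels:
          slo: pyrra-api-errors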

Running inside a Kubernetes cluster

An example for this mode of operation can be found in examples/kubernetes.

Kubernetes Architecture

Here two deployments are needed: one for the API / UI and one for the operator. For the first deployment, start the binary with the api argument.

When starting the binary with the kubernetes argument, the service will watch the apiserver for ServiceLevelObjectives. Once a new SLO is picked up, Pyrra will create PrometheusRule objects that are automatically picked up by the Prometheus Operator.

If you're unable to run the Prometheus Operator inside your cluster, you can add the --config-map-mode=true flag after the kubernetes argument. This will save each recording rule in a separate ConfigMap.
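
As a minimal sketch (container names are illustrative; see examples/kubernetes for the real manifests), the two deployments differ mainly in the argument passed to the binary:

# API / UI deployment (sketch)
containers:
  - name: pyrra-api
    image: ghcr.io/pyrra-dev/pyrra:v0.7.0
    args:
      - api

# Operator deployment (sketch)
containers:
  - name: pyrra-kubernetes
    image: ghcr.io/pyrra-dev/pyrra:v0.7.0
    args:
      - kubernetes
      # add --config-map-mode=true here if the Prometheus Operator is unavailable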

Applying YAML

This repository contains generated YAML files in the examples/kubernetes/manifests folder. You can use the following commands to deploy them to a cluster right away.

kubectl apply --server-side -f ./examples/kubernetes/manifests/setup
kubectl apply --server-side -f ./examples/kubernetes/manifests
kubectl apply --server-side -f ./examples/kubernetes/manifests/slos

Applying YAML and validating webhooks via cert-manager

This repository contains more generated YAML files in the examples/kubernetes/manifests-webhook folder.

This example deployment additionally applies a self-signed Issuer and requests a certificate via cert-manager, so that the Kubernetes API server can connect to Pyrra to validate any configuration object before applying it to the cluster.

kubectl apply --server-side -f ./examples/kubernetes/manifests-webhook/setup
kubectl apply --server-side -f ./examples/kubernetes/manifests-webhook
kubectl apply --server-side -f ./examples/kubernetes/manifests-webhook/slos

kube-prometheus

The underlying jsonnet code is imported by the kube-prometheus project. If you want to install an entire monitoring stack including Pyrra, we highly recommend using kube-prometheus.

Install with Helm

Thanks to @rlex there is a Helm chart for deploying Pyrra too.

Running inside Docker / Filesystem

An example for this mode of operation can be found in examples/docker-compose.

Filesystem Architecture

You can easily start Pyrra on its own via the provided Docker image:

docker pull ghcr.io/pyrra-dev/pyrra:v0.7.0

When running Pyrra outside of Kubernetes, the SLO object can be provided through a YAML file read from the file system. For this, one container or binary needs to be started with the api argument and the reconciler with the filesystem argument.

Here, Pyrra will save the generated recording rules to disk where they can be picked up by a Prometheus instance. While running Pyrra on its own works, there won't be any SLO configured, nor will there be any data from a Prometheus to work with. It's designed to work alongside a Prometheus.
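
A rough docker-compose sketch of this setup (service names, mount paths, and the exact prometheus-url flag spelling are assumptions; see examples/docker-compose for the real configuration):

services:
  pyrra-api:
    image: ghcr.io/pyrra-dev/pyrra:v0.7.0
    command: ["api"]          # serves the UI and API
  pyrra-filesystem:
    image: ghcr.io/pyrra-dev/pyrra:v0.7.0
    command: ["filesystem", "--prometheus-url=http://prometheus:9090"]
    volumes:
      - ./pyrra:/etc/pyrra                        # SLO definitions in YAML (path assumed)
      - ./prometheus/rules:/etc/prometheus/rules  # generated recording rules picked up by Prometheus (path assumed)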

Tech Stack

Client: TypeScript with React, Bootstrap, and uPlot.

Server: Go with libraries such as chi, ristretto, xxhash, and client-go.

Generated protobuf APIs with connect-go for Go and connect-web for TypeScript.

Roadmap

Best to check the Projects board, and if you cannot find what you're looking for, feel free to open an issue!

Contributing

Contributions are always welcome!

See CONTRIBUTING.md for ways to get started.

Please adhere to this project's code of conduct.

Maintainers

Name            Area         GitHub          Twitter         Company
Nadine Vehling  UX/UI        @nadinevehling  @nadinevehling  Grafana Labs
Matthias Loibl  Engineering  @metalmatze     @metalmatze     Polar Signals

We are mostly maintaining Pyrra in our free time.

Acknowledgements

@aditya-konarde, @brancz, @cbrgm, @codesome, @ekeih, @guusvw, @jzelinskie, @kakkoyun, @lilic, @markusressel, @morremeyer, @mxinden, @numbleroot, @paulfantom, @RiRa12621, @tboerger, and Maria Franke.

While we were working on Pyrra in private, these amazing people helped us with a lot of feedback, and some even took an extra hour for in-depth testing! Thank you all so much!

Additionally, @metalmatze would like to thank Polar Signals for allowing him to work on this project in his 20% time.

FAQ

Why not use Grafana in this particular use case?

Right now, you could indeed use Grafana for this. In upcoming releases, we plan to add more interactive features to give you better context when coming up with new SLOs. This is something we couldn't do with Grafana.

Do I still need Grafana?

Yes, Grafana is an amazing data visualization tool for Prometheus metrics. You can create your own custom dashboards and dive a lot deeper into each component while debugging.

Does it work with Thanos too?

Yes, in fact I've been developing this against my little Thanos cluster most of the time.
The queries even dynamically add headers for downsampling and disable partial responses.

How many instances should I deploy?

It depends on the topology of your infrastructure, however, we think that alerting should still happen within each individual Prometheus and therefore running one instance with one Prometheus (pair) makes the most sense. Pyrra itself only needs one instance per Prometheus (pair).

Why don't you support more complex SLOs?

For now, we try to accomplish an easy-to-set-up workflow for the most common SLOs. It is still possible to write more complex SLOs manually and deploy them to Prometheus alongside the generated ones. You can base more complex SLOs on the output of one SLO from this tool.

Why is the objective target a string not a float?

Kubebuilder doesn't support floats in CRDs...
Therefore, we need to pass it as a string and internally convert it from string to float64.

Related

Here are some related projects:


pyrra's Issues

Error budget graph gradient is slightly wrong

If the error budget crosses 0 we want to color the filled area either in red or green.
To make that work we calculate a gradient that vertically runs from the top to the bottom of the graph and should split right where the 0 is.
I tried for quite some time to get it right, but in the end I wasn't able to make it work; something about the offsets, paddings, etc.

errorbudget

I left some TODOs in the code and would be happy to get that reviewed by others:

// TODO: This seems "good enough" but sometimes the gradient still reaches the wrong side.
// Maybe it's a floating point thing?
// Fraction of the chart height (0 = top, 1 = bottom) at which the 0 line sits.
const zeroPercentage = 1 - (0 - min) / (max - min)
// Vertical gradient spanning the plotting area of the uPlot canvas.
const gradient = u.ctx.createLinearGradient(width / 2, canvasPadding - 2, width / 2, height - canvasPadding)
// Two color stops at the same offset create a hard edge: green above the 0 line, red below it.
gradient.addColorStop(0, `#${greens[0]}`)
gradient.addColorStop(zeroPercentage, `#${greens[0]}`)
gradient.addColorStop(zeroPercentage, `#${reds[0]}`)
gradient.addColorStop(1, `#${reds[0]}`)
return gradient

UI: Show warning if volume is below error budget

If someone sets an objective of 99% that means that out of 100 events only 1 can fail.
If the volume/amount of events that happened for that SLO is less than 100, it becomes problematic.

I'd even say that anything below 1000 requests in total is probably problematic. Although then it becomes hard to draw the line for how few events are still acceptable.

At least we should show a warning if (error budget) * (volume) < 1 or (1 - objective) * (volume) < 1.

A concrete example could be:
The objective is to have 99.5% over 4w. Now, in 4w the service only had 135 requests.
Therefore (1 - 0.995) * 135 = 0.675, which is less than 1. It means that just one bad request exhausts the entire error budget.
In that case, we should show a warning that the objective's target is too high for the few events the service has.

Fix prometheus-url flag for filesystem

The Prometheus client that's then used by the Prom API client passed here doesn't use the correct Prometheus API URL.

pyrra/main.go

Line 112 in 3a58b4e

code = cmdFilesystem(logger, reg, client, CLI.Filesystem.ConfigFiles, CLI.Filesystem.PrometheusFolder)

Release arm64 image

Hello!

Would it be possible to release a multi-arch image which includes an arm64 build? We're currently migrating our clusters to arm64 nodes and this would fit very well on there.

Panic reloading Prometheus

pyrra-filesystem_1  | goroutine 70 [running]:
pyrra-filesystem_1  | main.cmdFilesystem.func7()
pyrra-filesystem_1  | 	/workspace/filesystem.go:237 +0x3ef
pyrra-filesystem_1  | github.com/oklog/run.(*Group).Run.func1({0xc00052a440, 0xc0005400f0})
pyrra-filesystem_1  | 	/go/pkg/mod/github.com/oklog/[email protected]/group.go:38 +0x2f
pyrra-filesystem_1  | created by github.com/oklog/run.(*Group).Run
pyrra-filesystem_1  | 	/go/pkg/mod/github.com/oklog/[email protected]/group.go:37 +0x22f
pyrra-filesystem_1  | level=info ts=2022-03-23T00:57:50.446907165Z caller=main.go:100 msg="using Prometheus" url=http://localhost:9090
pyrra-filesystem_1  | level=info ts=2022-03-23T00:57:50.447024576Z caller=filesystem.go:113 msg="watching directory for changes" directory=/etc/pyrra
pyrra-filesystem_1  | level=info ts=2022-03-23T00:57:50.447638782Z caller=filesystem.go:265 msg="starting up HTTP API" address=:9444
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.451548833Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/caddy-response-errors.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.458684776Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/caddy-response-latency.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.462586082Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/parca-grpc-profilestore-errors.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.465282016Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/parca-grpc-profilestore-latency.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.468134747Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/prometheus-http-errors.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.47298123Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/prometheus-rule-evaluation-failures.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.475449587Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/pyrra-demo-hourly.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:50.477591209Z caller=filesystem.go:155 msg=reading file=/etc/pyrra/pyrra-demo-random.yaml
pyrra-filesystem_1  | level=debug ts=2022-03-23T00:57:55.480652213Z caller=filesystem.go:231 msg="reloading Prometheus now"
pyrra-filesystem_1  | level=warn ts=2022-03-23T00:57:55.482150924Z caller=filesystem.go:235 msg="failed to reload Prometheus"
pyrra-filesystem_1  | panic: runtime error: invalid memory address or nil pointer dereference
pyrra-filesystem_1  | [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x14647cf]

Instrument Pyrra itself with metrics

So far we don't have any metrics...
We should add some Go and HTTP default metrics and can then start adding more application specific ones later on.

Additionally, it'd be great to provide SLOs for Pyrra itself based on these metrics.

running locally using `filesystem` fails

$ ./bin/filesystem
2021/08/02 16:37:23 lstat /etc/pyrra: no such file or directory

This may be a little bit of an issue in general, but specifically on macOS it is rather uncommon to use /etc for anything non-system.
Should probably be changed from a hardcoded path here

filenames, err := filepath.Glob("/etc/pyrra/*.yaml")
to something configurable.

Additionally, it would be good to add a note to the documentation about what exactly is required in the YAML file.

Support for existing SLOs present in k8s env from record rules.

Tried the application; nice interface!

I have a question regarding importing/adding support for existing SLOs into the interface.

E.g., we have a CI GitOps setup for defining SLOs which generates Prometheus recording rules in a k8s cluster (using Sloth). Is it possible to visualize those already-present SLOs in the Pyrra interface?

Pyrra `ListObjectives` route returns 500 if SLO is created with invalid metrics

We tried creating an SLO with the following spec:

spec:
  target: "99.9"
  description: "Success ratio of workspace backups"
  window: 4w
  indicator:
    ratio:
      errors:
        metric: gitpod_ws_manager_workspace_backups_failure_total
      total:
        metric: (gitpod_ws_manager_workspace_backups_failure_total + gitpod_ws_manager_workspace_backups_success_total)

We made a mistake here when we assumed that a query could work instead of a single metric.

The problem is that the admission controller accepted the SLO, and after that all other SLOs we had stopped showing up in the ListObjectives route. We got confused at first, but after some time we noticed the 500s showing up in the logs. We deleted this problematic SLO and the 500s disappeared.


Accepting queries instead of a specific metric might be reasonable in some use cases, but that is not the point of this issue. I believe it would be a better experience if the admission controller rejected the SLO during creation time, or if the Pyrra UI could handle invalid SLOs without returning 500s.

Panic in GetObjectiveStatus

2022/03/25 15:51:49 http: panic serving 100.64.0.184:43928: runtime error: invalid memory address or nil pointer dereference
goroutine 3147 [running]:
net/http.(*conn).serve.func1()
	/usr/local/go/src/net/http/server.go:1802 +0xb9
panic({0x15c0de0, 0x27f68b0})
	/usr/local/go/src/runtime/panic.go:1047 +0x266
main.(*ObjectivesServer).GetObjectiveStatus(0xc00026a0a0, {0x1bd3af8, 0xc003bd2510}, {0xc0038ecd40, 0x36}, {0xc003b17bfe, 0x0})
	/workspace/main.go:491 +0xed6
github.com/pyrra-dev/pyrra/openapi/server/go.(*ObjectivesApiController).GetObjectiveStatus(0xc00026e078, {0x1bc4208, 0xc00026e3c0}, 0xc003b64a00)
	/workspace/openapi/server/go/api_objectives.go:141 +0x164
net/http.HandlerFunc.ServeHTTP(0x4, {0x1bc4208, 0xc00026e3c0}, 0x0)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/pyrra-dev/pyrra/openapi/server/go.Logger.func1({0x1bc4208, 0xc00026e3c0}, 0xc003b64a00)
	/workspace/openapi/server/go/logger.go:22 +0x9e
net/http.HandlerFunc.ServeHTTP(0xc003bd2510, {0x1bc4208, 0xc00026e3c0}, 0x8)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/pyrra-dev/pyrra/openapi.MiddlewareLogger.func1.1({0x1bc4208, 0xc00026e3a8}, 0xc003b64a00)
	/workspace/openapi/server.go:176 +0xf4
net/http.HandlerFunc.ServeHTTP(0x7f5c167ada68, {0x1bc4208, 0xc00026e3a8}, 0xc0038e3dd0)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/pyrra-dev/pyrra/openapi.MiddlewareMetrics.func1.1({0x1bceca8, 0xc003f73ea0}, 0xc003bd2510)
	/workspace/openapi/server.go:161 +0xf4
net/http.HandlerFunc.ServeHTTP(0xc003b64900, {0x1bceca8, 0xc003f73ea0}, 0x4)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000242e40, {0x1bceca8, 0xc003f73ea0}, 0xc003b64800)
	/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0x1cf
net/http.StripPrefix.func1({0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/usr/local/go/src/net/http/server.go:2090 +0x330
net/http.HandlerFunc.ServeHTTP(0xc003bd23f0, {0x1bceca8, 0xc003f73ea0}, 0xc003b1c628)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:314 +0x19c
net/http.HandlerFunc.ServeHTTP(0x15b2c40, {0x1bceca8, 0xc003f73ea0}, 0xc003b1c620)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00017e4e0, {0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:442 +0x216
net/http.HandlerFunc.ServeHTTP(0xc003bd23f0, {0x1bceca8, 0xc003f73ea0}, 0x450194)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00017e4e0, {0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:71 +0x48d
github.com/go-chi/chi/v5.(*Mux).Mount.func1({0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:314 +0x19c
net/http.HandlerFunc.ServeHTTP(0x15b2c40, {0x1bceca8, 0xc003f73ea0}, 0xc003ae97a4)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc00017e480, {0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:442 +0x216
net/http.HandlerFunc.ServeHTTP(0xc00037cbe0, {0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/cors.(*Cors).Handler.func1({0x1bceca8, 0xc003f73ea0}, 0xc003b64700)
	/go/pkg/mod/github.com/go-chi/[email protected]/cors.go:228 +0x1bd
net/http.HandlerFunc.ServeHTTP(0x1bd3a50, {0x1bceca8, 0xc003f73ea0}, 0x27f63c0)
	/usr/local/go/src/net/http/server.go:2047 +0x2f
github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc00017e480, {0x1bceca8, 0xc003f73ea0}, 0xc003b64600)
	/go/pkg/mod/github.com/go-chi/chi/[email protected]/mux.go:88 +0x442
net/http.serverHandler.ServeHTTP({0xc003bd2330}, {0x1bceca8, 0xc003f73ea0}, 0xc003b64600)
	/usr/local/go/src/net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc0036d1b80, {0x1bd3af8, 0xc00001d320})
	/usr/local/go/src/net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
	/usr/local/go/src/net/http/server.go:3034 +0x4e8

pyrra/main.go

Line 491 in a7ccfce

s.Availability.Errors = float64(v.Value)

A first idea is that NaN can't be cast?

Generate alert message annotation

We should generate a helpful message as an annotation for alerts.

In an ideal scenario, I'd like to see the current amount of error budget left and how quickly the error budget would be exhausted given the burn rate. Maybe some extra information I can't think of right now.

Wrap errors with more context

I think we could provide a bit more context around the returned errors. Right now there are a lot of these kinds of error returns:

	if err := r.Create(ctx, newRule); err != nil {
		return ctrl.Result{}, err
	}

// ...

	if err := r.Update(ctx, newRule); err != nil {
		return ctrl.Result{}, err
	}

In this specific example it might not be immediately clear in the logs whether an error occurred during the Create or the Update step.

My suggestion would be to wrap errors with minimal context.

Feature: Support for dark mode

Hi! Great job so far! Just wanted to check if support for dark mode is something that could be considered in a future release.

PR Check "Docker Push" Failing

The PR check to push an image to ghcr is failing like in #219.
The permissions seem not to be set correctly for the account that's used as part of the workflow.

Standalone cli tool able to transform SLO definitions into prom rules

I'd like to be able to use pyrra's functionality in an environment that does not match how pyrra is supposed to be deployed. To that end I'm interested in the following feature: a build of (parts of) pyrra as a standalone CLI binary capable (at least initially) of transforming files containing SLO definitions into files containing prom rules.

Example usage:
pyrratool slo2prom -i slo.yaml -o promrules.yaml

Team label in URL doesn't remove illegal chars

Hi!

Found a bug after we added the team-label to SLOs.

pyrra.dev/team: team

When you select the SLO from the frontpage the URL you're sent to looks like this:
https://pyrra.dev/objectives?expr={__name__=%22example-error-rate%22,%20namespace=%22example%22,%20pyrra.dev/team=%22team%22}&grouping={}

This results in an error because pyrra.dev/team contains a .
"1:115: parse error: unexpected character inside braces: '.'"

Removing the . from the URL results in the same error, only now the problem is /.
"1:118: parse error: unexpected character inside braces: '/'"
Removing / as well from the URL resolves the problem.

The URL that worked:
https://pyrra.dev/objectives?expr={__name__=%22example-error-rate%22,%20namespace=%22example%22,%20pyrradevteam=%22team%22}&grouping={}

Is it possible to filter this out from the generated URLs from the front page of Pyrra? Or maybe do the same as for alerts? Removing the pyrra.dev/ prefix that is.

If we click the label team on the frontpage to filter on team it works without any issues, but the filter doesn't use braces so that might be why it's not affected. The URL looks like this:
https://pyrra.dev/?filter=%7Bpyrra.dev/team=%22team%22%7D

Add label propagation to Pyrra

We want to support propagating the labels all the way from the SLO CRD through Pyrra until they show up in the alerts in alertmanager so the alerts can be routed to the correct receivers.

Let's imagine this label is added in here:

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s
    role: alert-rules
+   team: pyrra
spec:
  target: '99.0'
  window: 7d
  indicator:
    ratio:
      errors:
        metric: frontend_request_counter_total{status=~"5..",app="app-name"}
      total:
        metric: frontend_request_counter_total{app="app-name"}

Then it should show up in the list page of Pyrra too as team=pyrra.

In the end the same label needs to be part of the alert

ALERTS{alertname="ErrorBudgetBurn1w", alertstate="firing", long="1d", severity="warning", short="1h30m", slo="your-slo-name-here", namespace="your-namespace", exhaustion="1w", threshold="0.010", team="pyrra"}

At last alertmanager needs to be configured to route the alerts correctly based on the team label.

Original comment: #38 (comment)

Open questions

I'm not 100% sure if we really want to add the label to metadata.labels. It could be better to have something similar to what Deployments and StatefulSets do, where we would have spec.metadata.labels. That way we could have separate labels for Kubernetes and for the SLO itself.
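
To make the open question more concrete, a hypothetical shape for that alternative could look like this (just a sketch, not implemented):

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: your-slo-name-here
  labels:
    prometheus: k8s        # labels used by Kubernetes / the Prometheus Operator
    role: alert-rules
spec:
  metadata:
    labels:
      team: pyrra          # labels propagated into recording rules and alerts
  target: '99.0'
  window: 7d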

Show firing alerts for SLOs in list/table view

In addition to listing the SLOs in the table, we should also query ALERTS and match the firing alerts with the SLOs, so the overview/list view can already show which SLOs might have firing alerts.
This should make it very handy if there's an ongoing incident.

Automatically refresh on Detail page

Especially during development, I often find myself refreshing the Detail page to get the latest numbers. Basically, when developing a new feature or improving performance, I want to see how those changes are doing against an SLO.

It's kind of a waste to reload the entire page, so having something that refreshes graphs and numbers every so often would be helpful.

Support basePath other than `/`

Sometimes you don't want to run Pyrra's UI on the / base path but on something like /pyrra so it can be run behind a proxy.
We need to support that in the React UI and pass a flag from the CLI to the UI via HTML <script> tags, probably.

SLOs with latency should have their latency objective shown

Right now, if you have a latency SLO, the target latency is only to be found in the config of that SLO.
We can do better and bring that latency objective forward, like 99% within 1s, and show it more prominently on the detail and list pages.

Delete generated PrometheusRule and rule files

Both for filesystem and Kubernetes we need to reconcile the generated files so that not only are new ones created and existing ones updated, but ones that no longer exist are also deleted.

The tricky part is figuring out the difference of what's gone, but it should be doable nonetheless.

Improve 'Installation' docs

Some things I wish that were more readily available in the README.md:

  • Short top-level explanation about how Pyrra works
  • Short description about the different modes (filesystem vs kubernetes vs kubernetes & config-map-mode)
    • Note about dependency on the Prometheus Operator
    • Link to kubernetes example

I don't think it's a massive issue since everything can be figured out, but it took me some time and digging through the repo. I think we can make it a bit easier for new users when we extend the docs in that regard.

Graphs for burn rate alerts

The table at the bottom of the detail page has the multiple burn rate alerts listed. These are each made of 2 alerts and will alert if the burn rate gets above a certain threshold.
We should add a graph showing both, short and long, burn rates with the threshold, so it's easy to tell how bad the error budget burn really is.
Additionally, we can have a dynamic text explaining what it means if the alert is firing. Something along the lines of:

This alert firing means that both the 6 hour and 3 day burn rates are above a threshold of 1. If the error budget continues to be burned at this rate, all the error budget will be burned after 4w

Given 4w is the objective's window.

By default the graphs aren't shown or queried; however, users can expand the rows in the table to show the individual ones. Plus, if any of the alerts are firing, we should show these graphs right away.

Gauge metrics support by Pyrra

At the moment, Pyrra only supports counter metrics. It would be great to have support for gauge metrics as well.
For example, the blackbox exporter's "probe_success" metric returns 1 (success) or 0 (fail). Is it possible to add formulas to Pyrra to create an SLO graph based on such data?
Perhaps this could be achieved using the "count_over_time" function for the total number of attempts, and "sum_over_time" for the number of successful attempts?
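
For illustration, the success ratio of such a 0/1 gauge could be pre-computed along these lines (a hypothetical recording rule sketch, not something Pyrra generates today):

groups:
  - name: blackbox-probe-slo-sketch
    rules:
      # ratio of successful probes over the last 5 minutes (1 = all succeeded)
      - record: probe_success:ratio_rate5m
        expr: |
          sum_over_time(probe_success[5m])
          /
          count_over_time(probe_success[5m])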

Thank you!

Add a CONTRIBUTING.md

This would be a small quality of life improvement: add a file which quickly explains how to set up the project for development.

Update examples manifests

The CRD spec has been changed but the files in the examples folder haven't been updated.

kubectl apply -f examples/nginx.yaml

Error from server (Invalid): error when creating "examples/nginx.yaml": ServiceLevelObjective.pyrra.dev "nginx-api-errors" is invalid: spec.indicator: Required value
Error from server (Invalid): error when creating "examples/nginx.yaml": ServiceLevelObjective.pyrra.dev "nginx-api-latency" is invalid: spec.indicator: Required value

Add recording rules for availability pre-computation

It would be really helpful to create a recording rule for error budgets.
Right now, the availability and error budget are always calculated from scratch (and then cached), which takes a lot of time when done across 2w or 4w, for example (depending on the series cardinality and sample size).

We should explore creating a recording rule for each SLO that is then super lightweight to query (one series, where availability and error budget can be read with an instant query).
The downside is that if an SLO is changed, the recording rule's history will most likely differ drastically. This might throw off users. Would that be a problem?

As for an implementation, we should look at the same approach that the kubernetes-mixin uses for the apiserver SLO. It effectively splits the recording rule into two levels: the first evaluates the average across a short time range, and the second simply reads these pre-aggregated recording rule series to get the overall availability and error budget.
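
A hypothetical sketch of that two-level approach, reusing the ratio SLI from the README example above (not what Pyrra generates today):

groups:
  - name: slo-availability-precompute-sketch
    rules:
      # level 1: error ratio over a short window, evaluated continuously
      - record: http_requests:error_ratio_rate30m
        expr: |
          sum(rate(http_requests_total{job="pyrra",code=~"5.."}[30m]))
          /
          sum(rate(http_requests_total{job="pyrra"}[30m]))
      # level 2: average the pre-aggregated series over the whole SLO window
      - record: http_requests:error_ratio_rate2w
        expr: avg_over_time(http_requests:error_ratio_rate30m[2w])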

Design graph hover

Currently, it's the theme's default state. We want to properly design it.
Bildschirmfoto 2021-07-31 um 14 56 05

Show loading state for availability and error budget

While loading availability and error budget (which is the same request, so they always finish together) we don't show a spinner or any indication of loading happening in the background. Instead it looks somewhat broken.

Screenshot from 2021-08-06 16-12-09

We can probably re-use the same spinner as for the other components on the UI and make it 2x bigger than the others and that should be good for now!

Use Kubernetes Recommended Labels

Hello, and thanks for a great tool!
I think it needs two things:

  1. Deletion of generated rules after pyrra object removal. Right now they are being left as is. What about adding labels like
managed-by=pyrra
pyrra-slo-name=prometheus-http-error

And some controller to ensure rules are in place / removed?

  2. Adjust the name of the alert, adding a prefix/suffix to the alertname. At the moment all alerts are named "ErrorBudgetBurn".

Interactive SLO creation & editing form

Rather than having users always start with an abstract configuration file, we want to have an interactive form where users can configure the objective, the objective's window and the underlying SLI.
We want to have graphs and stats for availability and thus error budget live update, when the configuration is changed.
This will be most helpful if the Prometheus instance already has historical data for the SLO that is to be created. Additionally, updating SLOs will make it much more visible how the update would have influenced the SLO in the past and thus maybe how it will behave in the future.

The form is to be designed and discussed in the future.

Propagating labels already containing a subdomain prefix causes an error

Labels in Kubernetes conform to RFC 1123 (docs). This means that using the prefix pyrra.dev/ disables the option of propagating a label that already contains a subdomain prefix since this breaks the RFC constraints.

Could a possible option be to use pyrra.dev- or simply pyrra- as the prefix for propagating labels?

Read SLO from Kubernetes but write rules to filesystem

We've built a huge Thanos and Prometheus cluster to run the monitoring stack for our production without using the operator.
We do not have the CRD for writing rules. Is there a way to read the configs from Kubernetes but write the resulting rules to the filesystem?

Pyrra deployment with Helm

Hi!

Are there any plans to offer Pyrra as a single Helm chart? If not, would you accept contributions in this area? I would be glad to contribute.

My main motivation would be to offer more deployment capabilities for the Pyrra application on Kubernetes, making it easier to distribute and re-use the Helm chart across multiple environments.

Thanks in advance!
