As in the image below, the value of the "Remaining error budget (30d window)" label is

The value of the "Remaining error budget (30d window)" label is not properly shown about sloth HOT 16 CLOSED

slok commented on July 21, 2024

The value of the "Remaining error budget (30d window)" label is not properly shown

from sloth.

Comments (16)

slok commented on July 21, 2024

Hi @VCuzmin!

It is very likely that your SLI query returns multiple series (one per label combination) instead of a single aggregated result. Let me explain with an example.

Imagine this metric: http_request_total{handler="{THE_HANDLER}", error="{TRUE|FALSE}"}. You could write the SLI queries like this:

Total: rate(http_request_total[{{.window}}]).
Error: rate(http_request_total{error="true"}[{{.window}}]).

This SLI would return one result per value of the "handler" label, so you would get multiple results in Grafana.

sli_result_1{handler="handler1"}
sli_result_2{handler="handler2"}
...
sli_result_n{handler="handlerN"}

This is totally fine for alerting or your custom views, but Sloth's generic dashboard would not support this.

On the other hand, you can fix it easily by aggregating into a single series, which removes the extra labels:

Total: sum(rate(http_request_total[{{.window}}])).
Error: sum(rate(http_request_total{error="true"}[{{.window}}])).

To confirm that this is the cause, go to Prometheus and query your SLO results:

slo:sli_error:ratio_rate5m{sloth_service="{SLO_SERVICE}", sloth_slo="{SLO_NAME}"}

You should get a single series. If there are multiple, the problem is the one I explained above.
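
To illustrate why the un-aggregated query yields one SLI per handler while the sum(...) version yields exactly one, here is a minimal Python sketch. It does not talk to Prometheus; the per-handler rates and the helper names (ratio_per_handler, ratio_summed) are invented purely for illustration.

```python
# Hypothetical per-handler request rates (requests/second); the numbers are invented.
total_rate = {"handler1": 10.0, "handler2": 5.0, "handler3": 1.0}
error_rate = {"handler1": 0.1, "handler2": 0.5, "handler3": 0.0}


def ratio_per_handler(errors, totals):
    """Mimics error / total without sum(): one result per 'handler' label value."""
    return {handler: errors[handler] / totals[handler] for handler in totals}


def ratio_summed(errors, totals):
    """Mimics sum(error) / sum(total): the 'handler' label is aggregated away."""
    return sum(errors.values()) / sum(totals.values())


per_handler = ratio_per_handler(error_rate, total_rate)
single = ratio_summed(error_rate, total_rate)

print(len(per_handler), "series without sum():", per_handler)
print("1 series with sum():", single)
```

A dashboard panel that expects one value can render the second result directly, while the first gives it several series to choose from.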

from sloth.

slok commented on July 21, 2024

If you didn't configure anything regarding the window, you are using the default 30-day window.

The spec (also called manifest) is basically the YAML file you used to define your service SLOs.

Regarding your problem, there are two issues in that dashboard:

The NaN: this is common when there are no metrics (that's why I asked for the raw Prometheus graph). NaN is not 0.
The multiple NaN values (that's why I asked for the Grafana version and the Grafana dashboard revision).
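
As a rough illustration of the "NaN is not 0" point, the Python sketch below shows how a 0/0 ratio (for example, no requests observed) turns into NaN and then propagates into a remaining-budget calculation. The formula is a simplified assumption (budget = 1 - objective), not Sloth's actual recording rules, and the function names and numbers are hypothetical.

```python
import math


def error_ratio(error_rate, total_rate):
    """Return error/total, yielding NaN for 0/0 as IEEE 754 float division does.

    Python raises ZeroDivisionError for 0.0 / 0.0, so the NaN is produced explicitly.
    """
    if total_rate == 0.0:
        return math.nan if error_rate == 0.0 else math.inf
    return error_rate / total_rate


def remaining_budget(ratio, objective_percent):
    """Fraction of the error budget left, assuming budget = 1 - objective/100."""
    budget = 1.0 - objective_percent / 100.0
    return 1.0 - ratio / budget


# Invented inputs: some traffic with a few errors vs. no traffic at all.
healthy = remaining_budget(error_ratio(0.5, 1000.0), 99.9)
no_traffic = remaining_budget(error_ratio(0.0, 0.0), 99.9)

print("with traffic:", healthy)
print("no traffic:", no_traffic)
print("NaN equals 0?", no_traffic == 0.0)
```

Once a NaN enters the calculation, every value derived from it is NaN too, which is why checking the raw series in Prometheus is a useful first step.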

from sloth.

VCuzmin commented on July 21, 2024

Thank you for your explanations, but that didn't fix the issue. So let me give you more context...
This is my SLI

This is the result of your query (from your last comment). As you can see, there is a single metric, but the problem is still not fixed.

Thank you!

from sloth.

VCuzmin commented on July 21, 2024

Please don't forget about me!

from sloth.

VCuzmin commented on July 21, 2024

Please don't forget about me!

from sloth.

VCuzmin commented on July 21, 2024

I replied here: #220 (comment)

from sloth.

slok commented on July 21, 2024

Please don't forget about me!

😄

I'll need a little more information, please:

Grafana version
Grafana dashboard revision
Sloth version
Your SLO spec
If you are using a custom SLO window, which window it is
Your SLO metric graphs for:
- slo:sli_error:ratio_rate1h
- slo:period_error_budget_remaining:ratio
- slo:error_budget:ratio

from sloth.

VCuzmin commented on July 21, 2024

That is a lot of information to provide... Do you have a Slack account? Maybe it would be more convenient to talk there; that way I can share my screen.

from sloth.

saladar commented on July 21, 2024

I also have the same issue.

from sloth.

VCuzmin commented on July 21, 2024

Can you help me, please?

from sloth.

slok commented on July 21, 2024

Hi @VCuzmin!

I know that you have this problem 😄 and I'm doing my best to help. However, you should know that I maintain Sloth (and other projects) in my free time.

I have Slack, but other people (like @saladar) may have the same problem, so keeping the discussion async and public benefits the community.

If you have any problem making public any of the data I asked for, don't worry and omit that data.

from sloth.

VasileCuzmin commented on July 21, 2024

I'm the previous VCuzmin user. I had an issue with that account and logged in with the current one...
Well, how can you help me after all? Why is that label not working? Thanks

from sloth.

slok commented on July 21, 2024

Without data, I can't help you.

from sloth.

jesusvazquez commented on July 21, 2024

I think this could be one of the cases described at https://www.robustperception.io/get-thee-to-a-nannary (please read it carefully).

Also, can you please check whether the metrics you specified here #220 (comment) have data?

But again, it's very hard to provide help without more data.

from sloth.

VCuzmin commented on July 21, 2024

OK. Thank you for the answer! I will be back with more data. But I need more info...
I don't know where to find these: my SLO spec and, in the case of a custom window, which window it is.

thanks!

from sloth.

VCuzmin commented on July 21, 2024

Hello! I upgraded the Grafana image from 8.2.3 to 8.3.3 (helm release 6.20.1) and the dashboard is working now! I hope in the future the problem will not come up again! I will close the issue! Thank you for your help!

from sloth.
