Comments (16)

slok avatar slok commented on July 21, 2024 1

Hi @VCuzmin!

It is very likely that your SLI query returns multiple series (one per label combination) instead of a single grouped result. Let me explain with an example.

Imagine this metric: http_request_total{handler="{THE_HANDLER}", error="{TRUE|FALSE}"}. You could write the SLI queries like this:

  • Total: rate(http_request_total[{{.window}}]).
  • Error: rate(http_request_total{error="true"}[{{.window}}]).

This SLI would return multiple results, one per value of the "handler" label, so you would get multiple results in Grafana:

  • sli_result_1{handler="handler1"}
  • sli_result_2{handler="handler2"}
  • ...
  • sli_result_n{handler="handlerN"}

This is totally fine for alerting or your custom views, but Sloth's generic dashboard does not support it.

However, you can fix it easily by aggregating everything into a single series, which removes the extra labels:

  • Total: sum(rate(http_request_total[{{.window}}])).
  • Error: sum(rate(http_request_total{error="true"}[{{.window}}])).

To confirm that this is the cause, go to Prometheus and query your SLO results:

slo:sli_error:ratio_rate5m{sloth_service="{SLO_SERVICE}",sloth_slo="{SLO_NAME}"}

You should get a single series. If there are several, the problem is the one I explained above.
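
To make the difference between the ungrouped and grouped queries concrete, here is a minimal Python sketch. It is not Sloth or Prometheus code: it only models query results as dictionaries keyed by label set. The sample values and the helper names `divide_matched` and `sum_all` are invented for illustration.

```python
# Toy model of PromQL instant vectors: {label-set: value}.
# All sample values below are invented for illustration.

def divide_matched(errors, totals):
    """Element-wise division, matching series with identical label sets."""
    return {labels: errors[labels] / totals[labels]
            for labels in errors if labels in totals}

def sum_all(vector):
    """Like sum(...) with no `by` clause: one series, all labels dropped."""
    return {(): sum(vector.values())}

# Per-handler request rates, shaped like the output of rate(http_request_total[...]).
total_rate = {(("handler", "handler1"),): 10.0,
              (("handler", "handler2"),): 30.0}
error_rate = {(("handler", "handler1"),): 1.0,
              (("handler", "handler2"),): 1.0}

# Ungrouped queries: one SLI result per handler.
ungrouped = divide_matched(error_rate, total_rate)

# Grouped queries: a single SLI result with no labels.
grouped = divide_matched(sum_all(error_rate), sum_all(total_rate))

print(len(ungrouped))  # 2
print(grouped)         # {(): 0.05}
```

The ungrouped version yields one ratio per handler, while wrapping both sides in a sum collapses them into the single series that the generic dashboard expects.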

from sloth.

slok avatar slok commented on July 21, 2024 1

If you didn't configure anything regarding the window, you are using the default 30-day window.

The spec (also called the manifest) is basically the YAML file you used to define your service SLOs.

Regarding your problem, that dashboard shows two problems:

  • The NaN: this is common when there are no metrics (that's why I asked for the raw Prometheus graph). NaN is not 0.
  • The multiple NaNs (that's why I asked for the Grafana version and Grafana dashboard revision).
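
One common way a NaN can appear in a ratio is a 0/0 division when both the error rate and the total rate are zero. The following Python sketch illustrates that under stated assumptions: `error_ratio` is a hypothetical helper that mimics IEEE 754 float division (which Prometheus follows) for non-negative rates, and the input values are made up.

```python
import math

def error_ratio(error_rate, total_rate):
    """Mimic IEEE 754 float division for non-negative rates:
    0/0 yields NaN and x/0 yields +Inf, instead of raising an error."""
    if total_rate == 0:
        return math.nan if error_rate == 0 else math.inf
    return error_rate / total_rate

no_traffic = error_ratio(0.0, 0.0)   # no requests at all in the window
healthy = error_ratio(0.0, 50.0)     # requests, but no errors

print(math.isnan(no_traffic))  # True
print(healthy == 0.0)          # True
print(no_traffic == 0.0)       # False
```

A window with traffic and no errors gives a ratio of 0, whereas a window without any traffic gives NaN, which a dashboard cannot treat as "no errors".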

VCuzmin avatar VCuzmin commented on July 21, 2024

Thank you for the explanation, but that didn't fix the issue. Let me give you more context.
This is my SLI:

image

This is the result of your query (from your last comment). As you can see, there is a single metric, but the problem is still not fixed.

image

Thank you!

VCuzmin avatar VCuzmin commented on July 21, 2024

Please don't forget about me!

VCuzmin avatar VCuzmin commented on July 21, 2024

Please don't forget about me!

VCuzmin avatar VCuzmin commented on July 21, 2024

I replied here: #220 (comment)

slok avatar slok commented on July 21, 2024

Please don't forget about me!

😄

I'll need a little more information, please:

  • Grafana version
  • Grafana dashboard revision
  • Sloth version
  • Your SLO spec
  • If you are using a custom SLO window, which window it is
  • Your SLO metric graphs for:
    • slo:sli_error:ratio_rate1h
    • slo:period_error_budget_remaining:ratio
    • slo:error_budget:ratio
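
As a rough aid for collecting those graphs, the sketch below builds Prometheus HTTP API instant-query URLs for the three recording rules, filtered by the `sloth_service` and `sloth_slo` labels mentioned earlier in this thread. The Prometheus address, service name, and SLO name are placeholder assumptions, and `query_url` is a hypothetical helper, not part of Sloth.

```python
from urllib.parse import urlencode

# Hypothetical values: replace with your own Prometheus address and SLO identifiers.
PROMETHEUS_URL = "http://localhost:9090"
SLO_SERVICE = "myservice"
SLO_NAME = "requests-availability"

# Recording rules requested in the comment above.
METRICS = [
    "slo:sli_error:ratio_rate1h",
    "slo:period_error_budget_remaining:ratio",
    "slo:error_budget:ratio",
]

def query_url(metric, service, slo, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for one recording rule, filtered to one SLO."""
    selector = f'{metric}{{sloth_service="{service}",sloth_slo="{slo}"}}'
    return f"{base_url}/api/v1/query?{urlencode({'query': selector})}"

urls = [query_url(m, SLO_SERVICE, SLO_NAME) for m in METRICS]
for url in urls:
    print(url)
```

Opening each URL (or pasting the same selector into the Prometheus expression browser) shows whether the rule returns one series, several, or none.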

VCuzmin avatar VCuzmin commented on July 21, 2024

That is a lot of information to provide. Do you have a Slack account? It might be more convenient to talk there, since I could share my screen.

saladar avatar saladar commented on July 21, 2024

I also have the same issue.

VCuzmin avatar VCuzmin commented on July 21, 2024

Can you help me, please?

slok avatar slok commented on July 21, 2024

Hi @VCuzmin!

I know that you have this problem 😄. I'm trying to help you and doing my best; however, you should know that I maintain Sloth (and other projects) in my free time.

I do have Slack, but other people (like @saladar) may have this same problem, so keeping the discussion async and public benefits the community.

If you have a problem making any of the data I asked for public, don't worry, just omit that data.

VasileCuzmin avatar VasileCuzmin commented on July 21, 2024

I'm the previous VCuzmin user. I had an issue with that account and logged in with the current one...
Well, how can you help me after all? Why is that label not working? Thanks

slok avatar slok commented on July 21, 2024

Without data, I can't help you.

jesusvazquez avatar jesusvazquez commented on July 21, 2024

I'm thinking that this could be one of the cases described at https://www.robustperception.io/get-thee-to-a-nannary (please read it carefully).

Also, can you please check if the metrics you've specified here #220 (comment) have data?

But again, it's very hard to provide help without more data.

VCuzmin avatar VCuzmin commented on July 21, 2024

OK. Thank you for the answer! I will be back with more data, but I need more info...
I don't know where I can find these: my SLO spec and, in the case of a custom window, which window it is.

thanks!

VCuzmin avatar VCuzmin commented on July 21, 2024

Hello! I upgraded the Grafana image from 8.2.3 to 8.3.3 (helm release 6.20.1) and the dashboard is working now! I hope in the future the problem will not come up again! I will close the issue! Thank you for your help!
