Comments (16)
Hi @VCuzmin!
It is very likely that your SLI queries return one series per label value instead of a single aggregated series. Let me explain with an example.
Imagine this metric:
http_request_total{handler="{THE_HANDLER}", error="{TRUE|FALSE}"}
You could write the SLI queries like this:
- Total:
rate(http_request_total[{{.window}}])
- Error:
rate(http_request_total{error="true"}[{{.window}}])
This SLI would return multiple results, one per value of the "handler" label, so you would see multiple results in Grafana:
sli_result_1{handler="handler1"}
sli_result_2{handler="handler2"}
...
sli_result_n{handler="handlerN"}
This is totally fine for alerting or for your own custom views, but Sloth's generic dashboard does not support it.
You can fix this easily by aggregating everything into a single series, which removes the grouping labels:
- Total:
sum(rate(http_request_total[{{.window}}]))
- Error:
sum(rate(http_request_total{error="true"}[{{.window}}]))
To confirm that this is the cause, go to Prometheus and query your SLO results:
slo:sli_error:ratio_rate5m{sloth_service="{SLO_SERVICE}", sloth_slo="{SLO_NAME}"}
You should get a single series. If there are several, the problem is the one I explained above.
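To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy model only: the `rates` values and both function names are made up for illustration, and it groups by handler without reproducing PromQL's exact vector-matching rules.

```python
from collections import defaultdict

# Hypothetical per-second request rates keyed by (handler, error) label values.
rates = {
    ("handler1", "true"): 1.0,
    ("handler1", "false"): 9.0,
    ("handler2", "true"): 6.0,
    ("handler2", "false"): 24.0,
}


def per_handler_error_ratio(rates):
    """Keep the handler label: one result per handler (multiple series)."""
    errors = defaultdict(float)
    totals = defaultdict(float)
    for (handler, error), value in rates.items():
        totals[handler] += value
        if error == "true":
            errors[handler] += value
    return {handler: errors[handler] / totals[handler] for handler in totals}


def aggregated_error_ratio(rates):
    """Drop every label, as sum(...) does: a single result."""
    total = sum(rates.values())
    errors = sum(v for (_, error), v in rates.items() if error == "true")
    return errors / total


print(per_handler_error_ratio(rates))  # one entry per handler
print(aggregated_error_ratio(rates))   # one number for the whole service
```

The first function yields as many results as there are handlers, which is the shape a single-series dashboard cannot show. The second collapses them into one value.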
from sloth.
If you didn't configure anything regarding the window, you are using the default 30-day window.
The spec (also called manifest) is basically the YAML file you used to define your service SLOs.
Regarding your problem, in that dashboard you have 2 problems:
- The NaN: this is common when there are no metrics (that's why I asked for the raw Prometheus graph). NaN is not 0.
- The multiple NaN (that's why I asked for the Grafana version and Grafana dashboard revision).
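As a small illustration of why NaN is not 0, here is a Python sketch. It assumes the NaN comes from a 0/0 division when no requests were counted in the window. That is one common cause, not a diagnosis of this dashboard, and the `error_ratio` helper is hypothetical.

```python
import math


def error_ratio(errors: float, total: float) -> float:
    """Error ratio with IEEE-style float division.

    Assumption: mirror a float division where 0/0 gives NaN.
    Plain Python would raise ZeroDivisionError instead.
    """
    if total == 0:
        return math.nan if errors == 0 else math.inf
    return errors / total


no_traffic = error_ratio(0.0, 0.0)   # no requests at all in the window
healthy = error_ratio(0.0, 50.0)     # requests, but none of them failed

print(math.isnan(no_traffic))  # True: "no data", not "no errors"
print(healthy)                 # 0.0: a real, measured zero error ratio
```

A measured 0 means the service handled traffic without errors, while NaN means there was nothing to measure, so the two should not be read the same way on a dashboard.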
from sloth.
Thank you for your explanations but that didn't fix the issue. So let's give you more context...
This is my sli
This is the result of your query (from your last comment). As you can see, there is a single metric, but the problem is still not fixed.
Thank you!
from sloth.
Please don't forget about me!
from sloth.
Please don't forget about me!
from sloth.
I replied here: #220 (comment)
from sloth.
Please don't forget about me!
😄
I'll need a bit more information, please:
- Grafana version
- Grafana dashboard revision
- Sloth version
- Your SLO spec
- If you are using a custom SLO window, which window it is
- Your SLO metric graphs for:
slo:sli_error:ratio_rate1h
slo:period_error_budget_remaining:ratio
slo:error_budget:ratio
from sloth.
There is a lot of information you want to be provided...do you have a slack account? Maybe it's more convenient to talk there... this way I can share my screen
from sloth.
I also have the same issue.
from sloth.
Can you help me pls?
from sloth.
Hi @VCuzmin!
I know that you have this problem 😄, I'm trying to help you and do my best, however, you should know that I maintain Sloth in my free time (and other projects).
I have Slack, anyhow other people (like @saladar) can have this same problem, so making it async and in public would benefit the community.
If you have any problem making public any of the data I asked for, don't worry and omit that data.
from sloth.
I'm the previous VCuzmin user. I had an issue with that account and logged in with the current one...
Well, how can you help me after all? Why that label is not working? Thanks
from sloth.
Without data I can't help you
from sloth.
I think this could be one of the cases described in https://www.robustperception.io/get-thee-to-a-nannary. Please read that carefully.
Also, can you please check whether the metrics you specified here #220 (comment) have data?
But again, it's very hard to help without more data.
from sloth.
Ok. Thank you for the answer! I will be back with more data. But I need more info...
I don't know where to find these: my SLO spec and, in the case of a custom window, which window it is.
thanks!
from sloth.
Hello! I upgraded the Grafana image from 8.2.3 to 8.3.3 (helm release 6.20.1) and the dashboard is working now! I hope in the future the problem will not come up again! I will close the issue! Thank you for your help!
from sloth.