Comments (3)
I ran into the same issue while experimenting. I'm not sure that a trigger which queries back to Prometheus/Mimir gives us more value than just the standard CPU and memory triggers:
```yaml
triggers:
  - metadata:
      type: Utilization
      value: "70"
    type: cpu
```

```yaml
triggers:
  - metadata:
      type: Utilization
      value: "80"
    type: memory
```
In the event that we do want to query back, I think the current query `sum(sum by (pod) (rate(container_cpu_usage_seconds_total{container="ruler",namespace="default"}[5m])))` computes the total CPU usage across all pods, and should be divided by the current number of pods. As far as I can tell, the current threshold does not account for the number of pods after scaling.

> I'm not sure using a trigger that queries back to prometheus/mimir is providing us more value than just the standard cpu and memory triggers

The querier autoscaling, for example, uses the size of the query queues.

> is computing the total cpu usage from all pods, and should be divided by the current number of pods. The current threshold does not account for the pods post scaling as far as I can tell.

The KEDA threshold should determine the number of desired pods. So in this case we don't look at how well scaled the current pods are (i.e. how far each one is from the average or from its limit); we look at how many pods we want given the current load of all pods.
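A minimal sketch of that reasoning, assuming the trigger uses KEDA's default `AverageValue` metric type, for which the HPA effectively computes `desiredReplicas = ceil(metricValue / threshold)`. The pod count, per-pod usage and threshold below are made-up illustration values (in millicores), and the HPA's tolerance and min/max replica bounds are ignored.

```python
import math


def desired_replicas(metric_value: float, threshold: float) -> int:
    """Replica count the HPA would aim for with an AverageValue target.

    Simplified: ignores the HPA tolerance and min/max replica clamps.
    """
    return math.ceil(metric_value / threshold)


# Hypothetical load: 4 ruler pods, each using 600 millicores.
per_pod_usage = [600, 600, 600, 600]
# Hypothetical threshold: 500 millicores per pod.
threshold = 500

total = sum(per_pod_usage)               # what the summed query returns: 2400
average = total / len(per_pod_usage)     # the "divide by pod count" variant: 600.0

print(desired_replicas(total, threshold))    # 5: enough pods to bring each under 500m
print(desired_replicas(average, threshold))  # 2: scales in although every pod is over 500m
```

Because the division by the replica count already happens inside the HPA, feeding it the total load is what lets the threshold express "load per desired pod"; pre-averaging the metric would make the result depend on how many pods happen to be running.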
Revisiting this, the `floor | int64` behaviour is similar to that of the jsonnet version. However, the helper functions differ, and that is what causes the behavioural change with the `floor | int64` operations.

In the jsonnet version we convert everything to milli-CPU:
```jsonnet
local cpuToMilliCPUInt(str) = (
  // Converts any CPU requests to millicores. (eg 0.5 = 500m)
  // This is due to KEDA requiring an integer.
  if (std.isString(str) && std.endsWith(str, 'm')) then (
    std.parseInt(std.rstripChars(str, 'm'))
  ) else (
    std.parseJson(str + '') * 1000
  )
),
```
Whereas Helm uses a helper function that converts the CPU request to a float representing CPU cores:
```
{{/*
parseCPU is used to convert Kubernetes CPU units to the corresponding float value of CPU cores.
The returned value is a string representation. If you need to do any math on it, please parse the string first.
mimir.parseCPU takes 1 argument
.value = the Kubernetes CPU request value
*/}}
{{- define "mimir.parseCPU" -}}
{{- $value_string := .value | toString -}}
{{- if (hasSuffix "m" $value_string) -}}
{{ trimSuffix "m" $value_string | float64 | mulf 0.001 -}}
{{- else -}}
{{- $value_string }}
{{- end -}}
{{- end -}}
```
So, using the earlier example of a `100m` CPU request, we end up with the right value if we keep the conversion in MilliCPU.

Current implementation, which converts to CPU cores via `* 0.001`: `100 * 0.001 = 0.1`, then `0.1 * (100/100) = 0.1`.

Jsonnet implementation, which uses MilliCPU: `100 * (100/100) = 100`.
Since we no longer end up with a decimal smaller than 1, the `floor | int64` operations no longer truncate the value to 0.
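To see why the cores path breaks, here is a sketch that mimics the template's `floor | int64` step in Python. The Python function only stands in for the Sprig `floor` and `int64` functions, and `(100/100)` is the scale factor from the worked example above; the variable names are illustrative.

```python
import math


def floor_int64(value: float) -> int:
    """Stand-in for Sprig's `floor | int64`: round down, then cast to an integer."""
    return int(math.floor(value))


# Scale factor taken from the worked example.
scale = 100 / 100

cores = 100 * 0.001   # mimir.parseCPU path: 100m becomes 0.1 cores
millicores = 100      # jsonnet path: 100m stays 100 millicores

print(floor_int64(cores * scale))       # 0: the value collapses to zero
print(floor_int64(millicores * scale))  # 100
```

Any request below one full core (`1000m`) hits the same problem on the cores path, since `floor` of a value between 0 and 1 is always 0.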
The current helper `mimir.parseCPU` is also used in other parts of the chart to calculate `GOMAXPROCS`, so changing its output would affect those calculations as well.
The best option is likely to just add another helper that can generate the MilliCPU value.
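Such a helper would need to reproduce the jsonnet `cpuToMilliCPUInt` logic. The sketch below shows that logic in Python for clarity only; in the chart it would be a new template (something like a hypothetical `mimir.cpuToMilliCPU`), and neither that name nor this function exists in the chart today. Like the jsonnet version, it only handles plain numbers and the `m` suffix.

```python
def cpu_to_milli_cpu(value) -> int:
    """Hypothetical helper: convert a Kubernetes CPU quantity to integer millicores.

    Mirrors the jsonnet cpuToMilliCPUInt: "100m" -> 100, "0.5" -> 500, 2 -> 2000.
    Only plain numbers and the "m" suffix are supported.
    """
    text = str(value)
    if text.endswith("m"):
        # Already in millicores: strip the suffix and parse the integer.
        return int(text[:-1])
    # Whole or fractional cores: scale to millicores and round to an int,
    # since KEDA requires an integer threshold.
    return round(float(text) * 1000)


for request in ["100m", "0.5", 2]:
    print(request, "->", cpu_to_milli_cpu(request))
```

Unlike the jsonnet version, the sketch rounds after multiplying, so floating-point error in `* 1000` cannot leave the result just below an integer before it is used as a KEDA threshold.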
Related Issues (20)
- Provide charts through OCI registry
- Different object storage buckets for different tenants
- API read requests splitting
- Options to improve availability during partial ingester outages
- Race-condition in integration tests `TestPlayWithGrafanaMimirTutorial`
- Mimir Read Latency Errors (MimirCacheRequestErrors & MimirRequestLatency)
- Compactor: Reduce memory consumption from large meta.json files
- Docs: Standardize format of admonition blocks
- Ruler: MimirRulerTooManyFailedQueries alert due to user error
- Lookup style for S3 bucket
- per-tenant `max_total_query_length` not overriding global limit
- Flakey unit test Test_ProxyEndpoint_LogSlowQueries
- Flakey integration test TestRulerEvaluationDelay
- Some ingester error logs are incorrectly noted as "sampled"
- [Helm] Confusing error message when sending multiple `X-Scope-OrgID` headers
- Support podAnnotations for minio pods
- bug: writing block: closing index writer: postings offset table size exceeds 4 bytes: 5078829125
- global deletion mark skipped clean when `{block}/deletion-mark.json` does't exists
- query-frontend: label names and values endpoints cache ignore `limit` parameter
- "The specified bucket does not exist" but all buckets are present in minio