The helm chart mimir-distributed kedaAutoscaling impl

[mimir-distributed] kedaAutoscaling can set threshold to 0 (CLOSED)

lasermoth commented on May 28, 2024

[mimir-distributed] kedaAutoscaling can set threshold to 0

Comments (3)

edwintye commented on May 28, 2024

I ran into the same issue while experimenting. I'm not sure that a trigger that queries back to Prometheus/Mimir gives us more value than the standard CPU and memory triggers:

triggers:
  - metadata:
      type: Utilization
      value: "70"
    type: cpu
  - metadata:
      type: Utilization
      value: "80"
    type: memory

In the event that we do want to query back, I think the current query

sum(sum by (pod) (rate(container_cpu_usage_seconds_total{container="ruler",namespace="default"}[5m])))

computes the total CPU usage across all pods, and should be divided by the current number of pods. As far as I can tell, the current threshold does not account for the number of pods after scaling.

dimitarvdimitrov commented on May 28, 2024

> I'm not sure that a trigger that queries back to Prometheus/Mimir gives us more value than the standard CPU and memory triggers

The querier autoscaling, for example, uses the size of the query queues.

> is computing the total cpu usage from all pods, and should be divided by the current number of pods. The current threshold does not account for the pods post scaling as far as I can tell.

The KEDA threshold should determine the number of desired pods. So in this case we don't look at how well scaled the current pods are (i.e. how far each one is from the average or from its limit). Instead, we look at how many pods we want given the current load across all pods.
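The relationship described above can be sketched in a few lines of Python. This is a minimal illustration, not KEDA's code: it assumes the trigger is evaluated as an average-value target, so the desired pod count is the total metric divided by the per-pod threshold, rounded up. The function name `desired_replicas` is hypothetical.

```python
import math


def desired_replicas(total_metric: float, threshold_per_pod: float) -> int:
    """Return how many pods are wanted for the current total load.

    Assumption: the autoscaler compares the metric total against a
    per-pod threshold, i.e. desired = ceil(total / threshold).
    """
    if threshold_per_pod <= 0:
        # A threshold of 0 makes the division meaningless, which is
        # why a threshold rendered as 0 is a problem.
        raise ValueError("threshold must be greater than 0")
    return math.ceil(total_metric / threshold_per_pod)


# 450 millicores of total usage with a 100 millicore threshold
print(desired_replicas(450, 100))
```

Under this assumption the query does not need to be divided by the current pod count, because the division by the threshold already yields a pod count.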

lasermoth commented on May 28, 2024

Revisiting this, the floor | int64 behaviour is similar to that of the jsonnet version.

However, the helper functions differ, which is what causes the behavioural change with the floor | int64 operations.

In the jsonnet version we convert everything to milliCPU:

  local cpuToMilliCPUInt(str) = (
    // Converts any CPU requests to millicores. (eg 0.5 = 500m)
    // This is due to KEDA requiring an integer.

    if (std.isString(str) && std.endsWith(str, 'm')) then (
      std.parseInt(std.rstripChars(str, 'm'))
    ) else (
      std.parseJson(str + '') * 1000
    )
  ),

Whereas Helm uses a helper function that converts the CPU value to a float representing CPU cores:

{{/*
parseCPU is used to convert Kubernetes CPU units to the corresponding float value of CPU cores.
The returned value is a string representation. If you need to do any math on it, please parse the string first.

mimir.parseCPU takes 1 argument
.value = the Kubernetes CPU request value
*/}}
{{- define "mimir.parseCPU" -}}
  {{- $value_string := .value | toString -}}
  {{- if (hasSuffix "m" $value_string) -}}
      {{ trimSuffix "m" $value_string | float64 | mulf 0.001 -}}
  {{- else -}}
      {{- $value_string }}
  {{- end -}}
{{- end -}}
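To compare the two helpers side by side, here is a rough Python sketch of what each conversion does. The function names are hypothetical, and the code only mirrors the intent of the jsonnet and Helm helpers quoted above: the Helm helper returns a string, whereas this sketch returns a float, and the `round` call is an addition to avoid floating-point artifacts.

```python
def parse_cpu_cores(value) -> float:
    """Helm-style conversion: Kubernetes CPU quantity -> CPU cores."""
    text = str(value)
    if text.endswith("m"):
        return float(text[:-1]) * 0.001
    return float(text)


def cpu_to_millicpu_int(value) -> int:
    """jsonnet-style conversion: Kubernetes CPU quantity -> millicores."""
    text = str(value)
    if text.endswith("m"):
        return int(text[:-1])
    # round() is an assumption of this sketch, not part of the jsonnet code
    return int(round(float(text) * 1000))


for quantity in ["100m", "0.5", 2]:
    print(quantity, parse_cpu_cores(quantity), cpu_to_millicpu_int(quantity))
```

For a request of 100m, the first function yields a fraction of a core while the second yields the integer 100, which is the difference that matters once the result is truncated to an integer.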

So, using the earlier example of a 100m CPU request, we end up with the right value if we keep the conversion in milliCPU.

Current implementation that converts to CPU cores via * 0.001:

100 * 0.001 = 0.1, then 0.1 * (100/100) = 0.1, which floor | int64 turns into 0

Jsonnet implementation where we use MilliCPU:

100 * (100/100) = 100

Since we no longer end up with a decimal figure smaller than 1, the floor | int64 operations no longer cause a problem.
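The arithmetic above can be checked with a short Python sketch. It assumes, based on the two calculations in this comment, that the threshold is the CPU value multiplied by the target utilization percentage over 100 and then passed through floor and an integer conversion. The name `keda_threshold` is hypothetical.

```python
import math


def keda_threshold(cpu_value: float, target_utilization_percent: float) -> int:
    """Apply the utilization target, then mimic `floor | int64`."""
    return int(math.floor(cpu_value * (target_utilization_percent / 100)))


# CPU expressed in cores (100m -> 100 * 0.001): the fraction is floored away
threshold_from_cores = keda_threshold(100 * 0.001, 100)

# CPU expressed in millicores (100m -> 100): the value survives the floor
threshold_from_millicpu = keda_threshold(100, 100)

print(threshold_from_cores, threshold_from_millicpu)
```

Any request below one full core hits the same problem when expressed in cores, since every such value floors to 0.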

The current helper mimir.parseCPU is also used in other parts of the chart to calculate GOMAXPROCS, so changing it in place would affect those callers.
The best option is likely to add another helper that generates the milliCPU value.
