
cluster-kube-descheduler-operator's Issues

Support proper strategy names

Currently we have this shorthand approach to descheduler strategies that maps a new name to the actual upstream name:

Operator param        Descheduler strategy
--------------------  ----------------------------------------
duplicates            RemoveDuplicates
interpodantiaffinity  RemovePodsViolatingInterPodAntiAffinity
lownodeutilization    LowNodeUtilization
nodeaffinity          RemovePodsViolatingNodeAffinity
nodetaints            RemovePodsViolatingNodeTaints

This is confusing and adds a translation step when configuring the operator. The mapping is handled in a simple switch statement, so it should be relatively easy to add support for the real upstream strategy names and make those primary. We can silently support the shorthands for backward compatibility and phase them out eventually.
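A minimal sketch of the resolution logic (the function name and shape are hypothetical, not the operator's actual code), accepting the upstream names as primary and the shorthands for backward compatibility:

```go
package main

import "strings"

// strategyName maps either an upstream strategy name or a legacy
// shorthand to the canonical upstream name. The second return value
// reports whether the name was recognized.
func strategyName(param string) (string, bool) {
	switch strings.ToLower(param) {
	case "removeduplicates", "duplicates":
		return "RemoveDuplicates", true
	case "removepodsviolatinginterpodantiaffinity", "interpodantiaffinity":
		return "RemovePodsViolatingInterPodAntiAffinity", true
	case "lownodeutilization":
		return "LowNodeUtilization", true
	case "removepodsviolatingnodeaffinity", "nodeaffinity":
		return "RemovePodsViolatingNodeAffinity", true
	case "removepodsviolatingnodetaints", "nodetaints":
		return "RemovePodsViolatingNodeTaints", true
	default:
		return "", false
	}
}
```

Because the comparison is case-insensitive, both spellings resolve to the same canonical name without branching twice.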

Descheduler should parse IMAGE env var for development

For QA and development, there is a need to set a custom descheduler image in the operator (often the latest ART build, in order to verify bug fixes). This used to be the image field in the operator spec, but since we are removing that from the supported user config, it should be added back as an undocumented/unsupported flag.

For example, the kube-scheduler operator (and others) reads the env var in the operator deployment and then substitutes it in with the config reconciler.

Make descheduler run as cron job

As of now, the descheduler runs as a Job. To avoid regressions from 3.10 and 3.11, we need to run it as a CronJob.

Will post a PR soon.
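A rough sketch of the CronJob shape this could take (schedule, image, and flags here are illustrative placeholders, not the operator's actual manifest):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler
  namespace: openshift-kube-descheduler-operator
spec:
  schedule: "*/30 * * * *"   # illustrative interval
  concurrencyPolicy: Forbid  # don't start a new run while one is active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: descheduler
              image: descheduler-image   # placeholder
              command:
                - /bin/descheduler
                - --policy-config-file=/policy/policy.yaml
```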

Switch to actual upstream Descheduler policy

We currently have our own config API in the operator that differs from the upstream Descheduler API. For example, the operator needs to be configured with a strategies field like:

apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: config
  namespace: openshift-kube-descheduler-operator
spec:
  strategies:
    - name: "RemoveDuplicates"
    - name: "RemovePodsHavingTooManyRestarts"
      params:
       - name: "PodRestartThreshold"
         value: "10"
       - name: "IncludingInitContainers"
         value: "false"

which just gets internally translated into:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 10
        includingInitContainers: false

So while our own API is slightly simpler in its definition, in practice it must be manually converted, which adds complexity to the codebase. It also means we must constantly update our operator code to support new strategies and parameters as they are added upstream, doubling the work needed to ship a new feature.

In addition, it is confusing to users that the config API differs when using our operator versus running the descheduler on their own, which could inhibit adoption of the operator. It would be much simpler to just point to the upstream docs for configuring the descheduler.

This is why I propose a field called policy in the operator spec, which would point to a ConfigMap containing an actual descheduler policy (along with an optional namespace field, which defaults to openshift-config). This matches the design of the scheduler operator, whose Policy field points to a ConfigMap with a regular scheduler policy (see the OpenShift docs on how to deploy the scheduler operator with a custom policy); this would be the exact same design.
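A hypothetical sketch of what the proposed spec could look like (field names are illustrative until the design lands):

```yaml
apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  policy:
    name: descheduler-policy    # ConfigMap holding a real DeschedulerPolicy
    namespace: openshift-config # optional; defaults to openshift-config
```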

I think we will still need to support the current v1beta1 config API until it can be fully deprecated, but this shift will save us effort and reduce potential failure points.

Update: I've opened PRs to begin the work required for this.

Run unit and e2e tests in CI

As of now, we don't have proper gating for changes going into the repo. We'd like to at least be at a stage where

  • Travis CI is running all the unit tests.
  • e2e's are running in openshift CI.

As of now, there are 2 issues blocking e2e CI setup.

  • Operator SDK support for local testing. We'd like e2e tests to run locally (without building container images) using the Operator SDK. While this is not a complete blocker, it is difficult to manage the registry we push images to for every PR.

Ref: operator-framework/operator-sdk#745

  • How to integrate with CI? How can OLM pull the bits provided in a PR for running tests? Quoting Evan from the OLM team:

We’re planning to add some easier methods to inject things into catalogs (e.g. just write out a new CR in a cluster describing the catalog entry)

Both the above items are WIP from respective teams.

/cc @sjenning

LowNodeUtilization: "TargetThreshold" params not translated correctly, overridden by "Threshold" values

The "TargetThreshold" values are not correctly translated into the cluster ConfigMap; they are taken from the "Threshold" values instead.
As is, the Descheduler operator is not usable unless we update the generated cluster ConfigMap by hand and don't touch the Descheduler instance.

This "strategy":

strategies:
  - name: "LowNodeUtilization"
    params:
      - name: "CPUThreshold"
        value: "10"
      - name: "MemoryThreshold"
        value: "20"
      - name: "PodsThreshold"
        value: "30"
      - name: "CPUTargetThreshold"
        value: "40"
      - name: "MemoryTargetThreshold"
        value: "50"
      - name: "PodsTargetThreshold"
        value: "60"

is translated in the "cluster" ConfigMap to:

strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 10
          memory: 20
          pods: 30
        thresholds:
          cpu: 10
          memory: 20
          pods: 30
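For comparison, a correct translation would carry the target values through:

```yaml
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
          pods: 30
        targetThresholds:
          cpu: 40
          memory: 50
          pods: 60
```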

"unknown conversion" with descheduler policy post-1.14

Using a descheduler image built after the go1.14 bump upstream results in the following error when used with our operator:

$ oc logs pod/cluster-64fd56cddf-c4mf7
E0717 15:33:52.633020       1 server.go:46] failed converting versioned policy to internal policy version: converting (v1alpha1.DeschedulerPolicy) to (api.DeschedulerPolicy): unknown conversion

I haven't verified yet if this is only an error with our operator and how we generate the policy, or if it's an issue with the descheduler's api itself. The error arises from here: https://github.com/kubernetes-sigs/descheduler/blob/267b0837dc3085c387d1ee6bf76050bf0db91c9a/pkg/descheduler/policyconfig.go#L51

/kind bug
/priority critical-urgent

Operator Hub installation does not create openshift-kube-descheduler-operator project and install inside it

I realized the issue when I first attempted to install the descheduler through OperatorHub. The operator does not create and install into the hardcoded openshift-kube-descheduler-operator project. This project does not exist ahead of time, and a cluster-admin cannot create it because an admission controller prevents new projects with the openshift-* prefix from being created.

Once you deploy the descheduler into a user-managed namespace, the pods complain of a missing cluster CR in openshift-kube-descheduler-operator.


Why is policy customization removed in 4.7?

In 4.6 we could configure the descheduler policies with the strategies field, since the defaults don't work for us, but in 4.7 the field is deprecated and we can only enable the default profiles, with no configuration options. Our only choices now are to keep using the 4.6 operator or remove it completely and run the descheduler ourselves.

descheduler pod OOM on large clusters

On our largest OpenShift cluster, the descheduler pod runs out of memory. Is there a way to set the pod resources in deployment.apps/descheduler?

I tried setting the operator to "unmanaged" and changing deployment.apps/descheduler manually, but the operator keeps restoring the default, so I had to remove the operator.

Thanks

Resources are not configurable

The OpenShift descheduler resources are preconfigured:

containers:
  - resources:
      limits:
        cpu: 100m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 500Mi

For large clusters this leads to OOM pod restarts. It would be good to at least have the possibility to set our own limits via the KubeDescheduler custom resource.

LowNodeUtilization "NumberOfNodes" not working

In the example and in the documentation, the parameter "NumberOfNodes" is used in many places for the LowNodeUtilization strategy.
It does not work. It is due to lines of code that switch on the "toLower" value of the parameter name but test against a non-lowercase value.
Using "nodes" as the parameter name works.

This is due to those lines:

switch strings.ToLower(param.Name) {
case "cputhreshold":
	thresholds[v1.ResourceCPU] = deschedulerapi.Percentage(value)
case "memorythreshold":
	thresholds[v1.ResourceMemory] = deschedulerapi.Percentage(value)
case "podsthreshold":
	thresholds[v1.ResourcePods] = deschedulerapi.Percentage(value)
case "cputargetthreshold":
	targetThresholds[v1.ResourceCPU] = deschedulerapi.Percentage(value)
case "memorytargetthreshold":
	targetThresholds[v1.ResourceMemory] = deschedulerapi.Percentage(value)
case "podstargetthreshold":
	targetThresholds[v1.ResourcePods] = deschedulerapi.Percentage(value)
case "nodes", "numberOfNodes": // bug: "numberOfNodes" can never match a lower-cased name
	utilizationThresholds.NumberOfNodes = value
}
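Since the parameter name is compared after ToLower, the minimal fix is to lower-case the case literal. Sketched here as a small standalone helper (the helper name is hypothetical; only the comparison matters):

```go
package main

import "strings"

// matchNodesParam reports whether a parameter name refers to the
// NumberOfNodes setting. Comparing against all-lowercase literals fixes
// the bug: the switch value has already been lower-cased, so a mixed-case
// literal like "numberOfNodes" could never match.
func matchNodesParam(name string) bool {
	switch strings.ToLower(name) {
	case "nodes", "numberofnodes": // was "numberOfNodes"
		return true
	}
	return false
}
```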

Update readme with new manual deployment options

Just opening this to track these updates. With the switch to the OperatorHub setup, I'd still like to know how to manually deploy the operator from source, if possible (does oc create -f manifests/. still work?).

system-cluster-critical pod forbidden to run

I just deployed the descheduler operator from operator hub and got this event from the job:

Error creating: pods "example-descheduler-1-1571692440-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in descheduler namespace

descheduler is a normal project I created to run the operator; there was no special instruction on where the operator should run. What am I doing wrong?
Also, as a result of this issue I now have several pending jobs, which should probably not be happening.

Support evict annotation for namespaces

The operator currently auto-excludes all namespaces with openshift-* or kube-* prefixes from eviction. This makes sense to prevent users from breaking their cluster with the Descheduler, and those are reserved prefixes so users should not be able to create their own namespaces that match the pattern.

However, it may be useful for administrators and support to be able to include certain system namespaces for rebalancing (for example, during and after upgrades). Perhaps we could check for the same descheduler.alpha.kubernetes.io/evict annotation on namespaces before assuming they should be excluded. Pods within such a namespace would still be subject to the same eviction rules.
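A minimal sketch of such a check (the helper is hypothetical; only the annotation key comes from the descheduler):

```go
package main

// evictAnnotation is the descheduler's eviction annotation, reused here
// on namespaces as proposed (not current behavior).
const evictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// includeProtectedNamespace decides whether a protected (openshift-* or
// kube-*) namespace should still be considered for eviction: only when
// it explicitly opts in by carrying the annotation.
func includeProtectedNamespace(annotations map[string]string) bool {
	_, optedIn := annotations[evictAnnotation]
	return optedIn
}
```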

cc @ingvagabund wdyt?

README inconsistencies

In testing the operator, I found two minor inconsistencies in the README file.

  1. The README references the openshift-descheduler-operator namespace twice, but it's actually the openshift-kube-descheduler-operator namespace.
  2. The Sample CR section says that the operator expects the name config, but the name is cluster when created.

service monitors are scraped by user workload monitoring

I have a customer that has installed this operator on OpenShift 4.10.

Now the alert PrometheusOperatorRejectedResources is firing.

After checking the Prometheus operator in user workload monitoring, it shows that the service monitor:

openshift-kube-descheduler-operator/kube-descheduler

is being scraped by user workload monitoring instead of by openshift-monitoring.

The label:

openshift.io/cluster-monitoring: "true"

is not set on the namespace, which appears to be managed by the operator.
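For reference, the namespace would need that label for openshift-monitoring to pick up the service monitor, e.g. (assuming the label can be applied to the operator-managed namespace):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kube-descheduler-operator
  labels:
    openshift.io/cluster-monitoring: "true"
```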
