
cluster-kube-descheduler-operator's Issues

Support proper strategy names

Currently we have this shorthand approach to descheduler strategies that maps a new name to the actual upstream name:

Operator param        Descheduler strategy
--------------------  ----------------------------------------
duplicates            RemoveDuplicates
interpodantiaffinity  RemovePodsViolatingInterPodAntiAffinity
lownodeutilization    LowNodeUtilization
nodeaffinity          RemovePodsViolatingNodeAffinity
nodetaints            RemovePodsViolatingNodeTaints

This is confusing and adds a translation step when configuring the operator. The mapping is handled in a simple switch statement, so it should be relatively easy to add support for the real upstream strategy names and make those primary. We can silently support the shorthands for backward compatibility and phase them out eventually.
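A minimal sketch of the resolution logic (the function name and shape are hypothetical, not the operator's actual code), accepting the upstream names as primary and the shorthands for backward compatibility:

```go
package main

import "strings"

// strategyName maps either an upstream strategy name or a legacy
// shorthand to the canonical upstream name. The second return value
// reports whether the name was recognized.
func strategyName(param string) (string, bool) {
	switch strings.ToLower(param) {
	case "removeduplicates", "duplicates":
		return "RemoveDuplicates", true
	case "removepodsviolatinginterpodantiaffinity", "interpodantiaffinity":
		return "RemovePodsViolatingInterPodAntiAffinity", true
	case "lownodeutilization":
		return "LowNodeUtilization", true
	case "removepodsviolatingnodeaffinity", "nodeaffinity":
		return "RemovePodsViolatingNodeAffinity", true
	case "removepodsviolatingnodetaints", "nodetaints":
		return "RemovePodsViolatingNodeTaints", true
	default:
		return "", false
	}
}
```

Because the comparison is case-insensitive, both spellings resolve to the same canonical name without branching twice.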

Descheduler should parse IMAGE env var for development

For QA and development, there is a need to set a custom descheduler image in the operator (often the latest ART build, in order to verify bug fixes). This used to be the image field in the operator spec, but since we are removing that from the supported user config, it should be added back as an undocumented/unsupported flag.

For example, the kube-scheduler operator (and others) reads the env var in the operator deployment and then substitutes it in with the config reconciler.

Make descheduler run as cron job

As of now, the descheduler runs as a Job. To avoid regressions from 3.10 and 3.11, we need to run it as a CronJob.

Will post a PR soon.
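A rough sketch of the CronJob shape this could take (schedule, image, and flags here are illustrative placeholders, not the operator's actual manifest):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler
  namespace: openshift-kube-descheduler-operator
spec:
  schedule: "*/30 * * * *"   # illustrative interval
  concurrencyPolicy: Forbid  # don't start a new run while one is active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: descheduler
              image: descheduler-image   # placeholder
              command:
                - /bin/descheduler
                - --policy-config-file=/policy/policy.yaml
```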

Switch to actual upstream Descheduler policy

We currently have our own config API in the operator that differs from the upstream Descheduler API. For example, the operator needs to be configured with a strategies field like:

apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: config
  namespace: openshift-kube-descheduler-operator
spec:
  strategies:
    - name: "RemoveDuplicates"
    - name: "RemovePodsHavingTooManyRestarts"
      params:
       - name: "PodRestartThreshold"
         value: "10"
       - name: "IncludingInitContainers"
         value: "false"

which just gets internally translated into:

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 10
        includingInitContainers: false

So while our own API is slightly simpler in its definition, in practice it must be manually converted, which adds complexity to the codebase. It also means we must constantly update our operator code to support new strategies and parameters as they are added upstream, doubling the work needed to ship a new feature.

In addition, it is confusing to users that the config API differs when using our operator versus running the descheduler on their own, which could inhibit adoption of the operator. It would be much simpler to just point to the upstream docs for configuring the descheduler.

This is why I propose a field called policy in the operator spec, which would point to a ConfigMap containing an actual descheduler policy (along with an optional namespace field, which defaults to openshift-config). This matches the design of the scheduler operator, whose Policy field points to a ConfigMap with a regular scheduler policy (see the OpenShift docs on how to deploy the scheduler operator with a custom policy); this would be the exact same design.
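A hypothetical sketch of what the proposed spec could look like (field names are illustrative until the design lands):

```yaml
apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  policy:
    name: descheduler-policy    # ConfigMap holding a real DeschedulerPolicy
    namespace: openshift-config # optional; defaults to openshift-config
```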

I think we will still need to support the current v1beta1 config API until it can be fully deprecated, but this shift will save us effort and reduce potential failure points.

Update: I've opened PRs to begin the work required for this.

Run unit and e2e tests in CI

As of now, we don't have proper gating for changes going into the repo. We'd like to at least be at a stage where

  • Travis CI is running all the unit tests.
  • e2e's are running in openshift CI.

As of now, there are 2 issues blocking e2e CI setup.

  • Operator SDK support for local testing. We'd like e2e tests to run locally (without building container images) using the Operator SDK. While this is not a complete blocker, it is difficult to manage the registry we push images to for every PR.

Ref: operator-framework/operator-sdk#745

  • How to integrate with CI? How can OLM pull the bits provided in a PR for running tests? Quoting Evan from the OLM team:

We’re planning to add some easier methods to inject things into catalogs (e.g. just write out a new CR in a cluster describing the catalog entry)

Both the above items are WIP from respective teams.

/cc @sjenning

LowNodeUtilization: "TargetThreshold" params not translated correctly, overridden by "Threshold" values

The "TargetThreshold" values are not correctly translated into the cluster ConfigMap; they are taken from the "Threshold" values instead.
As is, the Descheduler operator is not usable unless we update the generated cluster ConfigMap by hand and don't touch the Descheduler instance.

This "strategy":

strategies:
  - name: "LowNodeUtilization"
    params:
      - name: "CPUThreshold"
        value: "10"
      - name: "MemoryThreshold"
        value: "20"
      - name: "PodsThreshold"
        value: "30"
      - name: "CPUTargetThreshold"
        value: "40"
      - name: "MemoryTargetThreshold"
        value: "50"
      - name: "PodsTargetThreshold"
        value: "60"

is translated in the "cluster" ConfigMap to:

strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 10
          memory: 20
          pods: 30
        thresholds:
          cpu: 10
          memory: 20
          pods: 30
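For comparison, a correct translation would carry the target values through:

```yaml
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:
          cpu: 10
          memory: 20
          pods: 30
        targetThresholds:
          cpu: 40
          memory: 50
          pods: 60
```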

"unknown conversion" with descheduler policy post-1.14

Using a descheduler image built after the go1.14 bump upstream results in the following error when used with our operator:

$ oc logs pod/cluster-64fd56cddf-c4mf7
E0717 15:33:52.633020       1 server.go:46] failed converting versioned policy to internal policy version: converting (v1alpha1.DeschedulerPolicy) to (api.DeschedulerPolicy): unknown conversion

I haven't verified yet if this is only an error with our operator and how we generate the policy, or if it's an issue with the descheduler's api itself. The error arises from here: https://github.com/kubernetes-sigs/descheduler/blob/267b0837dc3085c387d1ee6bf76050bf0db91c9a/pkg/descheduler/policyconfig.go#L51

/kind bug
/priority critical-urgent

Operator Hub installation does not create openshift-kube-descheduler-operator project and install inside it

I realized the issue when I first attempted to install the descheduler through OperatorHub. The operator does not create and install into the hardcoded openshift-kube-descheduler-operator project. This project does not exist ahead of time, and a cluster-admin cannot create it because an admission controller prevents new projects with the openshift-* prefix from being created.

Once you deploy the descheduler into a user-managed namespace, the pods complain of a missing cluster CR in openshift-kube-descheduler-operator.


Why is policy customization removed in 4.7?

In 4.6 we could configure the descheduler policies with the strategies field, since the defaults don't work for us, but in 4.7 the field is deprecated and we can only enable the default profiles, with no configuration options. Our only choices now are to keep using the 4.6 operator or remove it completely and run the descheduler ourselves.

descheduler pod OOM on large clusters

On our largest OpenShift cluster, the descheduler pod runs out of memory. Is there a way to set the pod resources in deployment.apps/descheduler?

I tried setting the operator to "unmanaged" and changing deployment.apps/descheduler manually, but the operator keeps restoring the default, so I had to remove the operator.

Thanks

Resources are not configurable

The OpenShift descheduler resources are preconfigured:

containers:
  - resources:
      limits:
        cpu: 100m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 500Mi

For large clusters this leads to OOM pod restarts. It would be good to at least have the possibility to set our own limits via the KubeDescheduler custom resource.

LowNodeUtilization "NumberOfNodes" not working

In the example and in the documentation, the parameter "NumberOfNodes" is used in many places for the LowNodeUtilization strategy.
It does not work. It is due to lines of code that switch on the "toLower" value of the parameter name but test against a non-lowercase value.
Using "nodes" as the parameter name works.

This is due to those lines:

switch strings.ToLower(param.Name) {
case "cputhreshold":
	thresholds[v1.ResourceCPU] = deschedulerapi.Percentage(value)
case "memorythreshold":
	thresholds[v1.ResourceMemory] = deschedulerapi.Percentage(value)
case "podsthreshold":
	thresholds[v1.ResourcePods] = deschedulerapi.Percentage(value)
case "cputargetthreshold":
	targetThresholds[v1.ResourceCPU] = deschedulerapi.Percentage(value)
case "memorytargetthreshold":
	targetThresholds[v1.ResourceMemory] = deschedulerapi.Percentage(value)
case "podstargetthreshold":
	targetThresholds[v1.ResourcePods] = deschedulerapi.Percentage(value)
case "nodes", "numberOfNodes": // bug: "numberOfNodes" can never match a lower-cased name
	utilizationThresholds.NumberOfNodes = value
}
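Since the parameter name is compared after ToLower, the minimal fix is to lower-case the case literal. Sketched here as a small standalone helper (the helper name is hypothetical; only the comparison matters):

```go
package main

import "strings"

// matchNodesParam reports whether a parameter name refers to the
// NumberOfNodes setting. Comparing against all-lowercase literals fixes
// the bug: the switch value has already been lower-cased, so a mixed-case
// literal like "numberOfNodes" could never match.
func matchNodesParam(name string) bool {
	switch strings.ToLower(name) {
	case "nodes", "numberofnodes": // was "numberOfNodes"
		return true
	}
	return false
}
```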

Update readme with new manual deployment options

Just opening this to track these updates. With the switch to the OperatorHub setup, I'd still like to know how to manually deploy the operator from source, if possible (does oc create -f manifests/. still work?).

system-cluster-critical pod forbidden to run

I just deployed the descheduler operator from operator hub and got this event from the job:

Error creating: pods "example-descheduler-1-1571692440-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in descheduler namespace

descheduler is a normal project I created to run the operator; there was no special instruction on where the operator should run. What am I doing wrong?
Also, as a result of this issue I now have several pending jobs, which should probably not be happening.

Support evict annotation for namespaces

The operator currently auto-excludes all namespaces with openshift-* or kube-* prefixes from eviction. This makes sense to prevent users from breaking their cluster with the Descheduler, and those are reserved prefixes so users should not be able to create their own namespaces that match the pattern.

However, it may be useful for administrators and support to be able to include certain system namespaces for rebalancing (for example, during and after upgrades). Perhaps we could check for the same descheduler.alpha.kubernetes.io/evict annotation on namespaces before assuming they should be excluded. Pods within such a namespace would still be subject to the same eviction rules.
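A minimal sketch of such a check (the helper is hypothetical; only the annotation key comes from the descheduler):

```go
package main

// evictAnnotation is the descheduler's eviction annotation, reused here
// on namespaces as proposed (not current behavior).
const evictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// includeProtectedNamespace decides whether a protected (openshift-* or
// kube-*) namespace should still be considered for eviction: only when
// it explicitly opts in by carrying the annotation.
func includeProtectedNamespace(annotations map[string]string) bool {
	_, optedIn := annotations[evictAnnotation]
	return optedIn
}
```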

cc @ingvagabund wdyt?

README inconsistencies

In testing the operator, I found two minor inconsistencies in the README file.

  1. The README references the openshift-descheduler-operator namespace twice, but it's actually the openshift-kube-descheduler-operator namespace.
  2. The Sample CR section says that the operator expects the name config, but the name is cluster when created.

service monitors are scraped by user workload monitoring

I have a customer that has installed this operator on OpenShift 4.10.

Now the alert PrometheusOperatorRejectedResources is firing.

After checking the Prometheus operator in user workload monitoring, it shows that the service monitor:

openshift-kube-descheduler-operator/kube-descheduler

is being scraped by user workload monitoring instead of by openshift-monitoring.

The label:

openshift.io/cluster-monitoring: "true"

is not set on the namespace, which appears to be managed by the operator.
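For reference, the namespace would need that label for openshift-monitoring to pick up the service monitor, e.g. (assuming the label can be applied to the operator-managed namespace):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kube-descheduler-operator
  labels:
    openshift.io/cluster-monitoring: "true"
```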
