openshift / cluster-kube-descheduler-operator
An operator to run descheduler on OpenShift.
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.5
release-4.4
Contact the Test Platform or Automated Release teams for more information.
An unwanted side effect of running tests locally in CI is that RBAC and the deployment are not checked. This is causing issues.
/cc @sjenning
For 4.13 and later versions
Currently we have this shorthand approach to descheduler strategies that maps a new name to the actual upstream name:
| Operator param | Descheduler strategy |
|---|---|
| duplicates | RemoveDuplicates |
| interpodantiaffinity | RemovePodsViolatingInterPodAntiAffinity |
| lownodeutilization | LowNodeUtilization |
| nodeaffinity | RemovePodsViolatingNodeAffinity |
| nodetaints | RemovePodsViolatingNodeTaints |
This seems confusing and adds something to translate when trying to configure the operator. These are handled in a simple switch statement, and it should be relatively easy to add support for the real upstream strategy names and make those our primary names. We can silently support these shorthands for backward compatibility and phase them out eventually.
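A minimal sketch of how such a switch could accept both forms. The helper name `canonicalStrategyName` is hypothetical and is not taken from the operator's code; the name pairs come from the table above. Lowercasing the input is an extra assumption that makes matching case-insensitive.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalStrategyName is a hypothetical helper. It returns the upstream
// strategy name for either an upstream name or a legacy shorthand.
// The input is lowercased, so every case label must be lowercase too.
func canonicalStrategyName(name string) (string, bool) {
	switch strings.ToLower(name) {
	case "duplicates", "removeduplicates":
		return "RemoveDuplicates", true
	case "interpodantiaffinity", "removepodsviolatinginterpodantiaffinity":
		return "RemovePodsViolatingInterPodAntiAffinity", true
	case "lownodeutilization":
		return "LowNodeUtilization", true
	case "nodeaffinity", "removepodsviolatingnodeaffinity":
		return "RemovePodsViolatingNodeAffinity", true
	case "nodetaints", "removepodsviolatingnodetaints":
		return "RemovePodsViolatingNodeTaints", true
	}
	return "", false
}

func main() {
	for _, n := range []string{"duplicates", "RemoveDuplicates", "unknown"} {
		canonical, ok := canonicalStrategyName(n)
		fmt.Printf("%q -> %q (known: %v)\n", n, canonical, ok)
	}
}
```

With this shape, dropping the shorthands later only means deleting the first label in each case.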
For QA and development, there is a need to be able to set a custom descheduler image in the operator (often to the latest ART build in order to verify bug fixes). This used to be the image field in the operator spec, but because we are removing that from the supported user config, this should be added as an undocumented/unsupported flag.
For example, the kube scheduler operator (and others) read an env var from the operator deployment and then substitute it in with the config reconciler.
As of now, the descheduler runs as a Job. To avoid regressions from 3.10 and 3.11, we need to make it a CronJob.
Will post a PR soon.
We currently have our own config API in the operator that differs from the upstream Descheduler API. For example, the operator needs to be configured with a strategies field like:
apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: config
  namespace: openshift-kube-descheduler-operator
spec:
  strategies:
    - name: "RemoveDuplicates"
    - name: "RemovePodsHavingTooManyRestarts"
      params:
        - name: "PodRestartThreshold"
          value: "10"
        - name: "IncludingInitContainers"
          value: "false"
which just gets internally translated into:
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemoveDuplicates":
    enabled: true
  "RemovePodsHavingTooManyRestarts":
    enabled: true
    params:
      podsHavingTooManyRestarts:
        podRestartThreshold: 10
        includingInitContainers: false
So, while our own API is slightly simpler in its definition, in practice it must be manually converted, which adds complexity to the codebase. It also means that we need to constantly update our own operator code to support new strategies and parameters as they are added upstream, which doubles the work for us whenever we add a new feature upstream.
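To illustrate the kind of manual conversion described above, here is a sketch that turns the string name/value pairs from the operator spec into typed values for the RemovePodsHavingTooManyRestarts example. The types `Param` and `TooManyRestartsParams` and the function `convertRestartParams` are hypothetical stand-ins, not the operator's or the descheduler's actual types.

```go
package main

import (
	"fmt"
	"strconv"
)

// Param mirrors the name/value string pairs in the operator spec.
type Param struct {
	Name  string
	Value string
}

// TooManyRestartsParams is a hypothetical stand-in for the typed
// podsHavingTooManyRestarts section of the generated policy.
type TooManyRestartsParams struct {
	PodRestartThreshold     int32
	IncludingInitContainers bool
}

// convertRestartParams parses each string value into its typed field and
// rejects parameter names it does not know about.
func convertRestartParams(params []Param) (TooManyRestartsParams, error) {
	var out TooManyRestartsParams
	for _, p := range params {
		switch p.Name {
		case "PodRestartThreshold":
			n, err := strconv.ParseInt(p.Value, 10, 32)
			if err != nil {
				return out, fmt.Errorf("invalid %s %q: %w", p.Name, p.Value, err)
			}
			out.PodRestartThreshold = int32(n)
		case "IncludingInitContainers":
			b, err := strconv.ParseBool(p.Value)
			if err != nil {
				return out, fmt.Errorf("invalid %s %q: %w", p.Name, p.Value, err)
			}
			out.IncludingInitContainers = b
		default:
			return out, fmt.Errorf("unknown parameter %q", p.Name)
		}
	}
	return out, nil
}

func main() {
	got, err := convertRestartParams([]Param{
		{Name: "PodRestartThreshold", Value: "10"},
		{Name: "IncludingInitContainers", Value: "false"},
	})
	fmt.Println(got, err)
}
```

Every new upstream parameter needs another case like these, which is the maintenance cost the paragraph above refers to.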
In addition, it is also confusing to users that the config API is different when using our operator vs running the descheduler on their own which could inhibit adoption of the operator. It would be much simpler to simply have to point to upstream docs for configuring the descheduler.
This is why I propose a field called policy in the operator spec, which would point to a configmap containing an actual descheduler policy (along with an optional field namespace, which defaults to openshift-config). This would match the design of the scheduler operator, which has a Policy field that points to a configmap with a regular scheduler policy (see the OpenShift docs on how to deploy the scheduler operator with a custom policy; this would be the exact same design).
I think we will still need to support the current v1beta1 config API until it can be fully deprecated, but this shift will save us effort and reduce potential failure points.
Update: I've opened these PRs to begin the work required for this:
This is a feature request
The title says it all
As of now, we don't have proper gating for changes going into the repo. We'd like to be at a stage at least where
As of now, there are 2 issues blocking e2e CI setup.
Ref: operator-framework/operator-sdk#745
We’re planning to add some easier methods to inject things into catalogs (e.g. just write out a new CR in a cluster describing the catalog entry)
Both of the above items are WIP from the respective teams.
/cc @sjenning
Node Fit filtering is available in the upstream descheduler.
It prevents pods from being evicted if the prerequisites are not satisfied.
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.16
release-4.17
For more information, see the branching documentation.
No matter what config is passed, our operator should make sure to exclude openshift-* projects from descheduling. These are reserved namespace formats, so there should be no reason to deschedule from them on our platform.
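A small sketch of the filtering idea, assuming a hypothetical helper `evictableNamespaces` that the operator would apply to whatever namespace list the user config produces; the function and constant names are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedPrefix is the reserved namespace prefix named in this issue.
const reservedPrefix = "openshift-"

// evictableNamespaces is a hypothetical helper that drops every namespace
// with the reserved prefix, regardless of what the user configured.
func evictableNamespaces(all []string) []string {
	var out []string
	for _, ns := range all {
		if strings.HasPrefix(ns, reservedPrefix) {
			continue
		}
		out = append(out, ns)
	}
	return out
}

func main() {
	fmt.Println(evictableNamespaces([]string{
		"openshift-monitoring", "my-app", "openshift-config", "team-a",
	}))
}
```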
/kind bug
The "TargetThreshold" values are not correctly translated in the cluster ConfigMap: they are taken from the "Threshold" values.
As is, the Descheduler operator is not usable unless we update the generated cluster ConfigMap by hand and don't touch the Descheduler instance.
This "strategy":
strategies:
  - name: "LowNodeUtilization"
    params:
      - name: "CPUThreshold"
        value: "10"
      - name: "MemoryThreshold"
        value: "20"
      - name: "PodsThreshold"
        value: "30"
      - name: "CPUTargetThreshold"
        value: "40"
      - name: "MemoryTargetThreshold"
        value: "50"
      - name: "PodsTargetThreshold"
        value: "60"
is translated in the "cluster" ConfigMap to:
strategies:
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        targetThresholds:
          cpu: 10
          memory: 20
          pods: 30
        thresholds:
          cpu: 10
          memory: 20
          pods: 30
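For comparison, a sketch of a translation that keeps the two groups apart, so that targetThresholds would come out as 40/50/60 for the strategy above. The types `Param` and `NodeUtilization` and the function `buildNodeUtilization` are hypothetical; this is not the operator's actual code, only the parameter names are taken from the report.

```go
package main

import (
	"fmt"
	"strconv"
)

// Param mirrors the name/value string pairs from the operator spec.
type Param struct {
	Name  string
	Value string
}

// NodeUtilization is a hypothetical stand-in for the policy's
// nodeResourceUtilizationThresholds section.
type NodeUtilization struct {
	Thresholds       map[string]int
	TargetThresholds map[string]int
}

// buildNodeUtilization routes each parameter to its own map, so target
// thresholds are never filled from the plain thresholds.
func buildNodeUtilization(params []Param) (NodeUtilization, error) {
	out := NodeUtilization{
		Thresholds:       map[string]int{},
		TargetThresholds: map[string]int{},
	}
	for _, p := range params {
		v, err := strconv.Atoi(p.Value)
		if err != nil {
			return out, fmt.Errorf("invalid value %q for %s: %w", p.Value, p.Name, err)
		}
		switch p.Name {
		case "CPUThreshold":
			out.Thresholds["cpu"] = v
		case "MemoryThreshold":
			out.Thresholds["memory"] = v
		case "PodsThreshold":
			out.Thresholds["pods"] = v
		case "CPUTargetThreshold":
			out.TargetThresholds["cpu"] = v
		case "MemoryTargetThreshold":
			out.TargetThresholds["memory"] = v
		case "PodsTargetThreshold":
			out.TargetThresholds["pods"] = v
		default:
			return out, fmt.Errorf("unknown parameter %q", p.Name)
		}
	}
	return out, nil
}

func main() {
	u, err := buildNodeUtilization([]Param{
		{Name: "CPUThreshold", Value: "10"},
		{Name: "CPUTargetThreshold", Value: "40"},
	})
	fmt.Println(u.Thresholds, u.TargetThresholds, err)
}
```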
As of now, the numberOfNodes parameter in the lownodeutilization strategy is not enabled. We should enable it.
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.1
release-4.0
Contact the Test Platform or Automated Release teams for more information.
Using a descheduler image built after the go1.14 bump upstream results in the following error when used with our operator:
$ oc logs pod/cluster-64fd56cddf-c4mf7
E0717 15:33:52.633020 1 server.go:46] failed converting versioned policy to internal policy version: converting (v1alpha1.DeschedulerPolicy) to (api.DeschedulerPolicy): unknown conversion
I haven't verified yet if this is only an error with our operator and how we generate the policy, or if it's an issue with the descheduler's api itself. The error arises from here: https://github.com/kubernetes-sigs/descheduler/blob/267b0837dc3085c387d1ee6bf76050bf0db91c9a/pkg/descheduler/policyconfig.go#L51
/kind bug
/priority critical-urgent
As of now, there is nothing stopping the creation of multiple CRs for the descheduler. We need to make it a singleton.
I realized the issue when I first attempted to install the descheduler through the Operator Hub. The Operator does not create and install into the hardcoded openshift-kube-descheduler-operator project. This project does not exist ahead of time, and a cluster-admin cannot create the project because an admission controller prevents new projects with the openshift-* prefix from being created.
Once you deploy the descheduler into a user-managed namespace, the pods complain of a missing cluster CR in openshift-kube-descheduler-operator.
In 4.6 we could configure the descheduler policies with the strategies field, as the defaults don't work for us, but in 4.7 the field is deprecated and we can only enable the default profiles with no configuration options. Our only choices now are to keep using the 4.6 operator or remove it completely and run the descheduler ourselves.
Currently, we don't do any parsing for logLevel and operatorLogLevel from the operator CRD. These should be consumed so that users can configure log levels.
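One possible shape for consuming logLevel, sketched as a mapping from a level name to a klog-style verbosity number. The level names (Normal, Debug, Trace, TraceAll) and the numbers chosen here are assumptions modeled on the convention commonly used by OpenShift operators; the helper `verbosityFor` is hypothetical.

```go
package main

import "fmt"

// verbosityFor is a hypothetical helper that maps a log level name to a
// verbosity number. Both the names and the numbers are assumptions.
func verbosityFor(logLevel string) int {
	switch logLevel {
	case "Debug":
		return 4
	case "Trace":
		return 6
	case "TraceAll":
		return 8
	default:
		// "Normal", an empty value, or anything unrecognized.
		return 2
	}
}

func main() {
	for _, l := range []string{"", "Normal", "Debug", "Trace", "TraceAll"} {
		fmt.Printf("logLevel %q -> verbosity %d\n", l, verbosityFor(l))
	}
}
```

The operator could then pass the resulting number to the operand as its verbosity setting, and use the same mapping on operatorLogLevel for its own logging.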
On our largest OpenShift cluster, the descheduler pod runs out of memory. Is there a way to set the pod resources in deployment.apps/descheduler?
I tried to set the operator to "unmanaged" and change deployment.apps/descheduler manually, but the operator keeps resetting it to the default, so I had to remove the operator.
Thanks
The OpenShift descheduler resources are preconfigured:
containers:
  - resources:
      limits:
        cpu: 100m
        memory: 500Mi
      requests:
        cpu: 100m
        memory: 500Mi
On large clusters this leads to OOM pod restarts. It would be good to have the possibility to set at least our own limits through the KubeDescheduler custom resource.
Now that we are fully branched for 4.10, please prepare your operator to supply a 4.10 bundle (https://github.com/openshift/cluster-kube-descheduler-operator/tree/release-4.10/manifests), so that 4.10 operator publishing works
e2e only checks the operator, not the operand; we need to check both.
So this message is wrong:
And we need one more check.
The following branches are being fast-forwarded from the current development branch (master) as placeholders for future releases. No merging is allowed into these release branches until they are unfrozen for production release.
release-4.17
release-4.18
For more information, see the branching documentation.
In the example and in the documentation, the parameter "NumberOfNodes" is used in many places for the LowNodeUtilization strategy.
It does not work, because the code switches on the "toLower" value of the parameter name but tests it against non-lowercase values.
Using "nodes" as the parameter name works.
This is due to these lines:
cluster-kube-descheduler-operator/pkg/operator/target_config_reconciler.go
Lines 181 to 196 in 73b98f8
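The referenced lines are not reproduced here, so the following is only an illustration of the pattern the report describes, with hypothetical function names. A switch on a lowercased value can only match lowercase case labels:

```go
package main

import (
	"fmt"
	"strings"
)

// matchBuggy shows the pattern described above: the input is lowercased,
// but the case label is mixed case, so the comparison can never succeed.
func matchBuggy(name string) bool {
	switch strings.ToLower(name) {
	case "NumberOfNodes":
		return true
	}
	return false
}

// matchFixed uses lowercase labels, so "NumberOfNodes", "numberofnodes",
// and "nodes" are all accepted.
func matchFixed(name string) bool {
	switch strings.ToLower(name) {
	case "numberofnodes", "nodes":
		return true
	}
	return false
}

func main() {
	fmt.Println(matchBuggy("NumberOfNodes"), matchFixed("NumberOfNodes"))
}
```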
Just opening this to track these updates. With the switch to the operatorhub setup, I'd still like to know how I can manually deploy the operator from source if possible (does oc create -f manifests/. still work?)
I just deployed the descheduler operator from operator hub and got this event from the job:
Error creating: pods "example-descheduler-1-1571692440-" is forbidden: pods with system-cluster-critical priorityClass is not permitted in descheduler namespace
descheduler is a normal project I created to run the operator. There were no special instructions on where the operator should be run. What am I doing wrong?
Also, as a result of this issue I now have several pending jobs, which should probably not be happening.
The operator currently auto-excludes all namespaces with openshift-* or kube-* prefixes from eviction. This makes sense to prevent users from breaking their cluster with the Descheduler, and those are reserved prefixes, so users should not be able to create their own namespaces that match the pattern.
However, it may be useful for administrators and support to be able to include certain system namespaces for rebalancing (for example, during and after upgrades). Perhaps we could add a check for the same descheduler.alpha.kubernetes.io/evict annotation on namespaces before assuming they should be excluded. Pods within that namespace would still be subject to the same eviction rules.
cc @ingvagabund wdyt?
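A sketch of what that check might look like, assuming the mere presence of the annotation on the namespace counts as opting in. The helper `excludedFromEviction` is hypothetical, and the presence-only semantics are an assumption of this sketch rather than agreed behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// evictAnnotation is the annotation key named in the proposal above.
const evictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// excludedFromEviction is a hypothetical helper. A namespace with a
// reserved prefix stays excluded unless it carries the annotation.
// Treating any value as opt-in is an assumption of this sketch.
func excludedFromEviction(name string, annotations map[string]string) bool {
	reserved := strings.HasPrefix(name, "openshift-") || strings.HasPrefix(name, "kube-")
	if !reserved {
		return false
	}
	_, optedIn := annotations[evictAnnotation]
	return !optedIn
}

func main() {
	fmt.Println(excludedFromEviction("openshift-monitoring", nil))
	fmt.Println(excludedFromEviction("openshift-monitoring", map[string]string{evictAnnotation: "true"}))
	fmt.Println(excludedFromEviction("my-app", nil))
}
```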
As of now, we are not exposing the descheduler version. We should try creating one so that it can be managed by OLM.
Now that we are fully branched for 4.7, please prepare your operator to supply a 4.7 bundle, so that 4.7 operator publishing works and doesn't overwrite 4.6 bundles. This means at least updating the package.yaml under
https://github.com/openshift/cluster-kube-descheduler-operator/tree/master/manifests
Reference: openshift-eng/ocp-build-data#708
In testing the operator, I found two minor inconsistencies in the README file.
It refers to the openshift-descheduler-operator namespace twice, but it's actually the openshift-kube-descheduler-operator namespace.
It uses the name config, but the name is cluster when created.
As of now, descheduler strategies are created from the policy configmap; we need to support flags for the descheduler binary.
I have a customer who has installed this operator in OpenShift 4.10.
Now the alert PrometheusOperatorRejectedResources is firing.
After checking the prometheus operator in user workload monitoring, it shows that the service monitor:
openshift-kube-descheduler-operator/kube-descheduler
is being scraped by user workload monitoring instead of by openshift-monitoring.
The label:
openshift.io/cluster-monitoring: "true"
is not found on the namespace, which seems to be managed by the operator.