
Comments (18)

damemi commented on June 23, 2024

@titou10titou10 thanks for your request! I don't see anything wrong with adding this ability. The operator can already have nodeSelector and tolerations defined on it (that's just set in the operator deployment), but we could talk about how the operator could manage selector/tolerations for the KubeDescheduler operand. @ingvagabund wdyt?
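
For reference, here's roughly what that looks like on the operator Deployment today (standard pod-spec fields; the infra label and taint are illustrative, your cluster may use different ones):

```yaml
# Sketch: pod template fields on the operator Deployment that pin the
# operator itself to infra-labeled nodes.
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/infra: ""   # schedule only onto infra nodes
      tolerations:
      - key: node-role.kubernetes.io/infra  # tolerate a matching infra taint, if one is set
        operator: Exists
        effect: NoSchedule
```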

titou10titou10 commented on June 23, 2024

@ingvagabund setting tolerations and node selection is not related to scheduling decisions. The goal is to run all the "cluster dedicated" workloads on dedicated nodes, e.g. so-called "infra nodes" (nodes with a role "infra" instead of "worker" in OpenShift). Those nodes run workloads like the monitoring stack (prometheus, alertmanager), logging, the nfs client provisioner, velero (backup), sealed secrets, the image registry, etc., i.e. all the workloads related to "operating" OpenShift/k8s.

ingvagabund commented on June 23, 2024

NodeAffinity/Tolerations - run the descheduler on specific nodes. I don't see any harm adding this as an option, but prefer calling it NodeAffinity to differentiate from the next one

In the spirit of minimizing the configuration space, I'd prefer to allow the descheduler to run only on master and infra nodes. The less, the better. Even though one can allow master nodes to receive workload as well, I'd like to avoid the case where a descheduler can run everywhere. So instead of allowing a generic NodeAffinity to be set, let's set infrasSchedulable: false by default (or limit nodeAffinity to node-role.kubernetes.io/infra: "" only). It's easier to extend the nodeAffinity to a generic expression later. I am fine with tolerations.
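
For illustration, that would mean the operator manages the affinity internally and stamps something like this onto the operand's pod spec (standard nodeAffinity syntax; a sketch, not a committed design):

```yaml
# Sketch: node affinity the operator could set on the descheduler pod,
# restricting it to nodes that carry the infra role label.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/infra
          operator: Exists
```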

damemi commented on June 23, 2024

@ingvagabund the descheduler can currently run everywhere, since there is no setting controlling where the operand pod is scheduled. If the cluster has masters schedulable, the descheduler could end up on a master (or it might not, it's basically random).

I think I see your point though: we can add a setting to only run the descheduler on master/infra nodes (and set the matching affinity internally, rather than expose that level of config to the users). In that case, though, we should also do the same with tolerations, since running on a non-schedulable master requires certain tolerations.
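
Concretely (a minimal sketch using the conventional master taint key), the operand pod would need something like:

```yaml
# Sketch: toleration the descheduler pod needs to land on a master
# that carries the default NoSchedule taint.
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
```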

I think that is the ultimate goal of this issue: to run the descheduler (which is technically an optional "workload" pod, but really more of an infra component) on a master/infra node when the cluster has mastersSchedulable: false. Is that correct? Or could this be a 3-option setting to run on master/infra/workload?

damemi commented on June 23, 2024

This operator is an opinionated way to run the descheduler, intended to provide a reliable configuration for OpenShift while minimizing the risk of mistakes that could hurt the descheduler or the cluster itself. This is the support contract that we are providing with it, and we are intentionally choosing not to support every configuration of every option, because it would not be possible for us to do so.

With new options such as this, it is also better to start restrictive and possibly open up the configuration in later releases. It is much more difficult to provide the full range of options and later discover a need to restrict them. The goal is to provide a consistent experience that does not break existing use cases between releases.

If you wish to run the descheduler with full configurability, you are free to deploy the image yourself from upstream (https://github.com/kubernetes-sigs/descheduler/); however, this will not have the OpenShift support that is offered through the operator.

damemi commented on June 23, 2024

/kind feature

ingvagabund commented on June 23, 2024

Sounds good. We can extend the KubeDescheduler CRD and put the node selector and tolerations there.
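
Roughly like this; the nodeSelector and tolerations fields below are the hypothetical extension discussed here, not part of the current KubeDescheduler API (and the API version shown is an assumption):

```yaml
apiVersion: operator.openshift.io/v1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 3600
  # Hypothetical passthrough fields proposed in this issue:
  nodeSelector:
    node-role.kubernetes.io/infra: ""
  tolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists
    effect: NoSchedule
```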

@titou10titou10 just curious, what's your use case where setting node selector/tolerations improves the scheduling decision?

usamaahmadkhan commented on June 23, 2024

Same as @titou10titou10's use case. A node selector for KubeDescheduler would give the flexibility to select which nodes to consider when descheduling, e.g. I only want to run the descheduler on the infra nodes, or run 2 KubeDescheduler instances with different thresholds for different types of machines.

damemi commented on June 23, 2024

The node selector for KubeDescheduler will give flexibility to select which nodes to consider when descheduling

@usamaahmadkhan I think these are actually different goals. Correct me if I'm wrong, but @titou10titou10 you just want to run the descheduler on certain nodes and not actually restrict eviction to those nodes?

The goal is to run all the "cluster dedicated" workloads onto dedicated nodes

This is not to be confused with the upstream --node-selector option, which only considers pods for eviction on the selected nodes (and is not currently supported in this operator).
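
To make the distinction concrete, upstream's knob lives in the descheduler policy (or the equivalent --node-selector flag) and scopes eviction candidates, not pod placement. A sketch against the upstream v1alpha1 policy:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
# Restricts which nodes' pods are considered for eviction;
# it says nothing about where the descheduler pod itself runs.
nodeSelector: "node-role.kubernetes.io/worker="
```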

From this I see 3 separate goals:

  1. OperatorNodeAffinity - run the openshift-kube-descheduler-operator on specific nodes. Already possible by editing the operator deployment
  2. NodeAffinity/Tolerations - run the descheduler on specific nodes. I don't see any harm adding this as an option, but prefer calling it NodeAffinity to differentiate from the next one
  3. NodeSelector - only evict pods that are on certain nodes. I am not so sure about providing this option now, as it can impact performance of the descheduler and has limited use cases.

I'd say we can go ahead and implement Option 2, if that is accurately what's being requested. @ingvagabund @titou10titou10 @usamaahmadkhan what do you think?

Also cc @soltysh :)

titou10titou10 commented on June 23, 2024

@damemi you're right. Define via nodeSelector and tolerations where the pods for the descheduler operator and the descheduler instance itself will run (basically "infra nodes" in our case), so goals 1 and 2.

titou10titou10 commented on June 23, 2024

IMHO I really don't understand why we could not use the standard k8s feature here for my very basic need, i.e. defining nodeSelector/tolerations to say where pods run, instead of creating a new (opinionated) way to do it.

damemi commented on June 23, 2024

None of that is to say we won't ever allow setting affinity/tolerations through the operator; I'm just explaining our rationale behind the opinionated approach.

We need to consider the use case here: is it to run the descheduler on any arbitrary node? That would certainly require more consideration for arbitrary affinity and tolerations. But this is more about running it on master and infra nodes, which have defined labels. In that case, we don't need to offer full configuration and can wrap this functionality in a single option ourselves.

Also consider the case where kubernetes decides to change these labels (for whatever reason). If the operator let you define them yourself, that would require a config change when you upgraded your cluster version. However, if we wrap them into a simple option, the change would be handled internally by our code and the behavior would remain consistent across upgrades, seamlessly. This is of course hypothetical, but it is an instance where this restriction is actually a feature.

titou10titou10 commented on June 23, 2024

@damemi fair enough. I understand the "opinionated approach" and I respect it. Your code, your choice. Fine.

Sorry if you took my comments as insults or a personal attack; they were not meant that way, and I apologize for that.

FYI:

  • the"node-role.kubernetes.io/infra"label is not used and not needed at all by OpenShift in v4.x (Onlymasterand worker are defined/required). It was on v3.x
  • thenode-role.kubernetes.iolabel is not in the "Well-Known Labels, Annotations and Taints" list for k8s as listed here: https://kubernetes.io/docs/reference/labels-annotations-taints/
  • In OCP v4, Anyone is free to set whatever label name on their nodes to categorize the nature of the nodes. This arbitrary label is used as selectors in"MachineConfigPools" to configure the nodes themselves. This is what we are doing
  • So your arguments are wrong for OCP and k8s about this label is general, and using this feature will only be applicable for "cluster that uses node with anode-role.kubernetes.io/infra label with no taints".
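
To illustrate the MachineConfigPool point, a sketch of our own setup (the pool name and the label are our choices, not anything OpenShift mandates):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values: [worker, infra]      # reuse worker configs plus infra-specific ones
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""   # any arbitrary label we choose works here
```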

No problem on my side, as you pointed out two solutions: deploy the descheduler without your operator, or write our own. This is what we will do in the meantime.

To conclude, this feature request should be closed as "will-not-be-implemented-for-now" (I will do it in a few days) and another one opened to describe the problem you are solving with your solution. This is not what we (and others, it seems) asked for. Fair enough.

Again, my apologies if anyone was hurt by my comments, and thanks for your great work on this.

damemi commented on June 23, 2024

@titou10titou10 no offense taken :) your question was valid, and similarly, I hope my explanation did not offend or appear overly stubborn. Perhaps we could document these intentions more clearly. Thank you for your feedback!

openshift-bot commented on June 23, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented on June 23, 2024

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-bot commented on June 23, 2024

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci commented on June 23, 2024

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
