Cluster Lifecycle Manager (CLM) to provision and update multiple Kubernetes clusters

License: MIT License

Makefile 0.51% Go 99.26% Dockerfile 0.23%

aws cloud cloudformation cluster go kubernetes lifecycle manager

cluster-lifecycle-manager's People

Contributors

Stargazers

Watchers

cluster-lifecycle-manager's Issues

Get rid of senza

The dependency makes the code much more harder to a newcomer to understand:

what is really happening behind the scenes?
why do I need an external dependency?

Hi Teapots :-)

Can we make cluster-lifecycle-controller part of cluster-lifecycle-manager and create a multi binary project?
This would make more sense than having a separate project that copied/extracted logic from CLM.

i3.metal vCPU count error

time="2018-12-12T14:56:12Z" level=warning msg="Unable to determine vCPU count for i3.metal: strconv.ParseInt: parsing "N/A": invalid syntax"

Use client-go directly instead of shelling out to `kubectl`

Currently we call kubectl externally when applying manifests. This allows us to reuse apply logic, which would be cumbersome to implement internally, but leads to excessive resource consumption and complicates result handling (we have to capture the whole output and look for strings inside just to figure out if the error was because of a missing resource).
We should instead reuse the client instance we already have, especially considering that apply contains a lot of logic that we might not want anyway.

Support other cloud providers?

Hi,

Any plans to support other cloud providers?

Re-fetch pod information before evicting

We have a race condition because the Kubernetes Eviction API doesn't properly support statefulset pods (kubernetes/kubernetes#64923). Let's just re-fetch the pod metadata on every eviction attempt, this way we can at least reduce the race window to one or two seconds.

Datetime-ranged scheduled cluster updates

Feature request

Support timeframes or datetime frames scheduled cluster updates.

(opposed to automatically triggered cluster updates)

Changes would pile-up and automatically executed all at once in the specified hours, could be cron syntax or something else.

This might be useful for ensuring scheduling cluster updates are run out of high peak usage, specific dates (e.g. Black Friday), outside of office hours, etc...

Limit max interval in eviction retries

Ensure that the max interval in the exponential backoffs isn't too high when we set a very high max eviction timeout of like 3 days.

Please implement release tags

It would be appreciated if you start implementing releases, so we know exactly what changed in between versions.

Completed Pods should not block the updates

Logs showing PDB violations:

time="2018-06-05T13:48:25Z" level=info msg="Pod Disruption Budget violated" cluster=stups node=ip-172-31-17-54.eu-central-1.compute.internal ns=default pod=b-hdq3tk6ivnwstiy22f2zfejwg worker=2
time="2018-06-05T13:48:25Z" level=info msg="Pod Disruption Budget violated" cluster=stups node=ip-172-31-17-54.eu-central-1.compute.internal ns=default pod=b-htbbbaynjiw53xocviwpjmbvs worker=2
time="2018-06-05T13:48:25Z" level=info msg="Pod Disruption Budget violated" cluster=stups node=ip-172-31-17-54.eu-central-1.compute.internal ns=default pod=b-qybqt9bv8hmqe1chnef3smsv3 worker=2

Pods are already completed:

% kubectl get pods b-hdq3tk6ivnwstiy22f2zfejwg b-htbbbaynjiw53xocviwpjmbvs b-qybqt9bv8hmqe1chnef3smsv3
NAME                          READY     STATUS      RESTARTS   AGE
b-hdq3tk6ivnwstiy22f2zfejwg   0/1       Completed   0          1h
b-htbbbaynjiw53xocviwpjmbvs   0/1       Completed   0          1h
b-qybqt9bv8hmqe1chnef3smsv3   0/1       Completed   0          1h

PDB that stops the rolling update:

% kubectl get pdb cdp-controller-builder
NAME                     MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
cdp-controller-builder   N/A             0                 0                     13d

Workaround:

% kubectl get pods | awk '/Completed/ {print $1}' | xargs kubectl delete pod

Taint/label the nodes in a node pool that's being decommissioned

When we update all the nodes in a node pool, we first taint & label them with lifecycle-status=decommission-pending and then drain one by one. However, when a node pool is removed, we only relabel the nodes when actually draining. Same logic should be used in both scenarios.

Respect terminationGracePeriodSeconds when evicting pods.

Currently we evict pods during updates in order to respect PodDisruptionBudgets. However we fail to wait for the terminationGracePeriodSeconds defined for the pods, meaning we could potentially evict a pod, and terminate the node right after, not giving the kubelet enough time to gracefully shut down the pod.

This is where we should wait for the grace period: https://github.com/zalando-incubator/cluster-lifecycle-manager/blob/master/pkg/updatestrategy/node_pool_manager.go#L378-L379

Evict pods in parallel

Since #27 it now takes a "long time" (~10 min.) to evict pods from a single node even if it's not violating the PDB. This is because we evict the pods one by one and wait for each pod to be gracefully terminated.

There is no reason not to do this in parallel.

Fail if applying manifest fails

In case applying a manifest fails, the CLM will just log an error and continue.

time="2018-08-10T14:40:50Z" level=error msg="Error applying template template: /home/master/workspace/ubernetes-on-aws-e2e_master-5FUAFYO2STDSYHIB2UTPTBMD2CFAR7XTT5L5MRNTC4AMDBO42YQA/head/cluster/manifests/kube-downscaler/deployment.yaml:1: function "neq" not defined"

I think we added this graceful error handling in the past to work around some kubectl issues. But it probably probably makes sense to fail the provisioning if this happens.

Updating go 1.13 in travis

Travis should have the latest version of go

Updates of channel should update test clusters first

An update of one channel has to sort clusters by name and select test clusters first.
Only after test clusters being updated, update also production clusters.

Feature request: Upgrade clusters at given time windows only

Hi all,

we have the following feature request:
Possibility to run updates for a specific cluster at given time windows (eg. out of office hours) only?

This request is related to this one
issue

CLM could wait forever for node decommissioning

CLM could wait forever for node decommissioning (if there is some kind of problem)

cluster-lifecycle-manager/pkg/updatestrategy/clc_update.go

Lines 48 to 89 in 769565c

 func (c *CLCUpdateStrategy) PrepareForRemoval(ctx context.Context, nodePoolDesc *api.NodePool) error { 

 c.logger.Infof("Preparing for removal of node pool '%s'", nodePoolDesc.Name) 

 for { 

 nodePool, err := c.nodePoolManager.GetPool(ctx, nodePoolDesc) 

 if err != nil { 

 return err 

 } 

 err = c.nodePoolManager.MarkPoolForDecommission(nodePoolDesc) 

 if err != nil { 

 return err 

 } 

 for _, node := range nodePool.Nodes { 

 err := c.nodePoolManager.DisableReplacementNodeProvisioning(ctx, node) 

 if err != nil { 

 return err 

 } 

 } 

 nodes, err := c.markNodes(ctx, nodePool, func(_ *Node) bool { 

 return true 

 }) 

 if err != nil { 

 return err 

 } 

 if nodes == 0 { 

 return nil 

 } 

 c.logger.WithField("node-pool", nodePoolDesc.Name).Infof("Waiting for decommissioning of the nodes (%d left)", nodes) 

 // wait for CLC to finish removing the nodes 

 select { 

 case <-ctx.Done(): 

 return ctx.Err() 

 case <-time.After(c.pollingInterval): 

 } 

 } 

 }

cluster-lifecycle-manager/pkg/updatestrategy/clc_update.go

Lines 109 to 140 in 769565c

 func (c *CLCUpdateStrategy) doUpdate(ctx context.Context, nodePoolDesc *api.NodePool) error { 

 for { 

 if err := ctx.Err(); err != nil { 

 return err 

 } 

 nodePool, err := c.nodePoolManager.GetPool(ctx, nodePoolDesc) 

 if err != nil { 

 return err 

 } 

 oldNodes, err := c.markNodes(ctx, nodePool, func(node *Node) bool { 

 return node.Generation != nodePool.Generation 

 }) 

 if err != nil { 

 return err 

 } 

 if oldNodes == 0 { 

 return nil 

 } 

 c.logger.WithField("node-pool", nodePoolDesc.Name).Infof("Waiting for decommissioning of old nodes (%d left)", oldNodes) 

 // wait for CLC to finish rolling the nodes 

 select { 

 case <-ctx.Done(): 

 return ctx.Err() 

 case <-time.After(c.pollingInterval): 

 } 

 } 

 }

and therefore does not update other node pools.

Check AZ status during upgrade

We should use the describe-availability-zones API to check if the zones are healthy during the upgrade and abort if there's any issues.

CLM should abandon outdated cluster updates

Current behaviour

Once CLM has detected an outdated cluster and starts to perform a cluster update, it will be stuck on that particular update until it's finished or CLM is restarted.

Problem

This can lead to issues when a problematic cluster update is in progress and needs to be fixed. After the fix is applied to our config repository or submitted via changes in Cluster Registry it's not going to be picked up by CLM unless CLM is restarted manually.

For example, configuring a wrong Ubuntu AMI will never allow CLM to finish the update. When the wrong AMI setting is fixed, CLM will not pick up that new configuration as it's still stuck in applying the old broken version.

Proposal

CLM should abandon on-going updates if there's already a new update available in the meantime. Similar to a build system, where an outdated in-progress build for the master branch is preempted in favour of the latest HEAD of the master branch that was just merged in.

There might be cases where it's desirable to first finish the current update and then start with the next update. However, CLM is mostly designed and used to not make such assumptions. Furthermore, we already assume that CLM can be restarted or fail and any point simulating the desired behaviour. Therefore, it seems safe to assume that such logic can be part of CLM itself.

On-premise cluster considerations

What would be required to support on-premise clusters? Obviously it would be some code changes, but given remote management such as IPMI and some decent tooling, this seems possible.

Include duration in logs when evicting pods

Would be nice if the CLM logged for how long it had tried to evict each pod, to make it more clear which pods are not evictable for a long time.

Run tests with -race on Travis

We don't run CDP builds for external contributors, and -race could catch some errors.

Dependabot can't parse your go.mod

Dependabot couldn't parse the go.mod found at /go.mod.

The error Dependabot encountered was:

go: github.com/go-swagger/[email protected] requires
	github.com/spf13/[email protected] requires
	github.com/grpc-ecosystem/[email protected] requires
	gopkg.in/[email protected]: invalid version: git fetch --unshallow -f origin in /opt/go/gopath/pkg/mod/cache/vcs/748bced43cf7672b862fbc52430e98581510f4f2c34fb30c0064b7102a68ae2c: exit status 128:
	fatal: The remote end hung up unexpectedly

View the update logs.

Fix coveralls check

It currently reports 100% coverage which is a lie..

CLM doesn't respect current size when setting max size

When the value of max nodes of a node pool is lower than the current CLM should refrain from setting it.

But today, it did.

Dependabot can't parse your go.mod

Dependabot couldn't parse the go.mod found at /go.mod.

You can mention @dependabot in the comments below to contact the Dependabot team.

certificate matching

zalando-incubator/kube-ingress-aws-controller#211

release notes

The current workflow to release this project does not include any release notes.
To document for us and the public audience we should use GH releases and set up release notes similar we do in in skipper, for example: https://github.com/zalando/skipper/releases/tag/v0.10.122

Idea:
I think we should have a 0.x.y release and document that it's not possible to really run this project as it is, without assumptions satisfied to run this project. We should document that as soon as we publish 1.0.0, people can use it "easily".

Memory usage causes errors and panics

CLM forks external processes when it updates the clusters (senza to prepare the template, kubectl to apply the manifests). The processes usually consume significant amount of memory, which means that if we use a large number of workers most of the calls will fail (or, even worse, stuff inside CLM will begin panic-ing because it can't allocate memory). However, we also don't want to reduce the number of workers because then we'll have to wait for ages to roll out the updates. Since most of the time is actually spent waiting in drain(), we should just set a separate limit of the number of external processes instead.

Decreasing the max-size of a node-pool doesn't terminate the nodes gracefully

If the node-pool size was originally 12 and was being fully used and we decrease the max-size of a node-pool i.e sth like:

zregistry clusters node-pools update --cluster-id <some-id> --node-pool <some-nodepool> --max-size 5

7 of these nodes will be terminated and not gracefully. We should fix this because it might lead to poor customer experience in case of a human error. If applications running on a cluster are affected significantly enough, this might also affect end users of those applications.

	func (c CLCUpdateStrategy) PrepareForRemoval(ctx context.Context, nodePoolDesc api.NodePool) error {
	c.logger.Infof("Preparing for removal of node pool '%s'", nodePoolDesc.Name)

	for {
	nodePool, err := c.nodePoolManager.GetPool(ctx, nodePoolDesc)
	if err != nil {
	return err
	}

	err = c.nodePoolManager.MarkPoolForDecommission(nodePoolDesc)
	if err != nil {
	return err
	}

	for _, node := range nodePool.Nodes {
	err := c.nodePoolManager.DisableReplacementNodeProvisioning(ctx, node)
	if err != nil {
	return err
	}
	}

	nodes, err := c.markNodes(ctx, nodePool, func(_ *Node) bool {
	return true
	})
	if err != nil {
	return err
	}

	if nodes == 0 {
	return nil
	}

	c.logger.WithField("node-pool", nodePoolDesc.Name).Infof("Waiting for decommissioning of the nodes (%d left)", nodes)

	// wait for CLC to finish removing the nodes
	select {
	case <-ctx.Done():
	return ctx.Err()
	case <-time.After(c.pollingInterval):
	}
	}
	}

	func (c CLCUpdateStrategy) doUpdate(ctx context.Context, nodePoolDesc api.NodePool) error {
	for {
	if err := ctx.Err(); err != nil {
	return err
	}

	nodePool, err := c.nodePoolManager.GetPool(ctx, nodePoolDesc)
	if err != nil {
	return err
	}

	oldNodes, err := c.markNodes(ctx, nodePool, func(node *Node) bool {
	return node.Generation != nodePool.Generation
	})
	if err != nil {
	return err
	}

	if oldNodes == 0 {
	return nil
	}

	c.logger.WithField("node-pool", nodePoolDesc.Name).Infof("Waiting for decommissioning of old nodes (%d left)", oldNodes)

	// wait for CLC to finish rolling the nodes
	select {
	case <-ctx.Done():
	return ctx.Err()
	case <-time.After(c.pollingInterval):
	}
	}
	}

zalando-incubator / cluster-lifecycle-manager Goto Github PK

cluster-lifecycle-manager's People

Contributors

Stargazers

Watchers

Forkers

cluster-lifecycle-manager's Issues

Feature request

Current behaviour

Problem

Proposal

Recommend Projects

Recommend Topics

Recommend Org