
cluster-api-actuator-pkg

Shared packages for Cluster API actuators.

Running the cluster autoscaler operator e2e tests against an OpenShift cluster

This test suite is designed to run against a full OpenShift 4 cluster. The test suite is agnostic of the hosting environment and the choice of cloud provider is up to the reader.

These instructions are written for the cluster autoscaler operator, but they should work for any project that uses cluster-api-actuator-pkg.

Create a cluster

The easiest way to get a cluster to test against is to use an installer that supports Installer-Provisioned Infrastructure (IPI).

Instructions for creating an IPI cluster are available in the OpenShift installer documentation for each supported cloud provider.

Deploy the code to test

Before making any changes to the cluster components you wish to test, you must disable the Cluster Version Operator (CVO). If you do not, the CVO will revert the component back to its original version as soon as you deploy your test code.

To disable the CVO, scale its deployment to 0 replicas:

oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator

Now deploy the code either directly into the cluster, or by running it locally.

Deploy the code to test within the cluster

To deploy your test code within the cluster, you must first build and push a container image to a repository. Once pushed, override the image within the deployment to deploy your code for testing:

oc set image -n openshift-machine-api deployment/cluster-autoscaler-operator cluster-autoscaler-operator=<YOUR CONTAINER IMAGE>
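The build-and-push step itself depends on your registry; a minimal sketch, assuming podman and a Dockerfile at the repo root (docker works the same way, and the image reference is a placeholder):

```shell
# Placeholder image reference; substitute your own registry and repository.
IMAGE="quay.io/<YOUR_USER>/cluster-autoscaler-operator:test"

podman build -t "$IMAGE" .   # build the operator image from the repo root
podman push "$IMAGE"         # push it somewhere the cluster can pull from
```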

Deploy the code to test locally

To deploy your test code locally, you must first disable the existing operator running within the OpenShift cluster:

oc scale --replicas 0 -n openshift-machine-api deployments/cluster-autoscaler-operator

Once the operator has been disabled, build your test code locally and run it on your machine, pointing it at the cluster with the --kubeconfig flag:

make build
./bin/cluster-autoscaler-operator --kubeconfig=<PATH/TO/YOUR/CLUSTERS/KUBECONFIG>

Build the e2e tests

make build-e2e 

Run the autoscaler e2e tests

NAMESPACE=kube-system ./hack/ci-integration.sh -focus "Autoscaler should" -v --dry-run

Adjust -focus as appropriate. Note that --dry-run only lists the specs that would run; remove it to actually execute the tests.

Example expected output:

Running Suite: Machine Suite
============================
Random Seed: 1617813491
Will run 4 of 33 specs

SSSSSSSSSSSSSSSSSS
------------------------------
[Feature:Machines] Autoscaler should use a ClusterAutoscaler that has 100 maximum total nodes count 
  It scales from/to zero
  /cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:306
•
------------------------------
[Feature:Machines] Autoscaler should use a ClusterAutoscaler that has 100 maximum total nodes count 
  cleanup deletion information after scale down [Slow]
  /cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:370
•
------------------------------
[Feature:Machines] Autoscaler should use a ClusterAutoscaler that has 12 maximum total nodes count and balance similar nodes enabled 
  scales up and down while respecting MaxNodesTotal [Slow][Serial]
  /cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:521
•
------------------------------
[Feature:Machines] Autoscaler should use a ClusterAutoscaler that has 12 maximum total nodes count and balance similar nodes enabled 
  places nodes evenly across node groups [Slow]
  /cluster-api-actuator-pkg/pkg/autoscaler/autoscaler.go:605
•SSSSSSSSSSS
Ran 4 of 33 Specs in 0.000 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 29 Skipped
PASS

Ginkgo ran 1 suite in 1.887727166s
Test Suite Passed


cluster-api-actuator-pkg's Issues

Switch from glog to klog

klog is a drop-in replacement for glog, which we use in various places throughout the test packages. klog has one advantage over glog: we can call klog.SetOutput(GinkgoWriter) to redirect log lines into Ginkgo. Ginkgo then only surfaces logs for failing tests: if a test passes, its logs are not shown, and if it fails, only that test's logs are shown.

This will make failed tests easier to debug by removing spurious logs from the test output.

Tests should fail fast in case of an unknown platform

Executed the e2e tests against a Power VS OCP cluster.

All the tests were skipped because of the unknown platform PowerVS, but the run still took 3 hours and almost destroyed the cluster.

./hack/ci-integration.sh -focus "Webhook" -v 

Sample output and Results

S [SKIPPING] in Spec Setup (BeforeEach) [2709.244 seconds]
[Feature:Machines] Webhooks
/Users/karthikkn/IPI/cluster-api-actuator-pkg/pkg/infra/webhooks.go:23
  should return an error when removing required fields from the MachineSet providerSpec [BeforeEach]
  /Users/karthikkn/IPI/cluster-api-actuator-pkg/pkg/infra/webhooks.go:138
  Platform PowerVS does not have webhooks, skipping.
  /Users/karthikkn/IPI/cluster-api-actuator-pkg/pkg/infra/webhooks.go:44
------------------------------
SSSSSSSSSSSSS
Ran 0 of 33 Specs in 10839.695 seconds
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 33 Skipped
PASS

The cluster state after the test

karthikkn@Karthiks-MacBook-Pro ingress % oc get nodes -A
NAME                               STATUS                     ROLES    AGE   VERSION
rdr-kar27-ocp-mjvlj-master-1       Ready,SchedulingDisabled   master   43h   v1.22.1+8e73d6b
rdr-kar27-ocp-mjvlj-master-2       Ready,SchedulingDisabled   master   43h   v1.22.1+8e73d6b
rdr-kar27-ocp-mjvlj-worker-9kk2g   Ready,SchedulingDisabled   worker   43h   v1.22.1+8e73d6b 

The test run deleted all the worker nodes and MachineSets of the cluster.

Use a multi-arch image in the "drain node before removing machine resource" test

While running the "drain node before removing machine resource" test on the Power platform (ppc64le architecture), the Pods created by the replication controller were stuck in CrashLoopBackOff. After changing the image to a RHEL UBI image, the Pods ran and the tests passed.

Can we change this image to one that has multi-arch support?
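For example, the workload image could point at a Red Hat UBI image, which is published as a multi-arch manifest list (amd64, arm64, ppc64le, s390x). A hypothetical pod-template fragment; the container name and command are illustrative:

```yaml
# Illustrative pod template for the drain test's replication controller.
spec:
  containers:
  - name: work
    # ubi8-minimal is a multi-arch manifest list, so each node pulls
    # the variant matching its own architecture (including ppc64le).
    image: registry.access.redhat.com/ubi8/ubi-minimal:latest
    command: ["sleep", "3600"]
```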
