mhausenblas / kboom


The Kubernetes scale & soak load tester

License: Apache License 2.0

Go 61.50% Dockerfile 9.05% Shell 29.45%

kubernetes load-testing scale performance

kboom's Introduction

A simple Kubernetes load testing tool

NOTE: this is a work in progress (WIP) and not an official AWS tool. It is provided as is; use it at your own risk.

Think of kboom as the Kubernetes equivalent of boom, allowing you to create short-term load for scale testing and long-term load for soak testing. Out of the box, kboom supports pods as the load for scale testing; support for custom resources (via CRDs) for soak testing is planned.

Check out the interactive demo.

Why bother?

I didn't find a usable tool to do Kubernetes-native load testing, for scalability and/or soak purposes. Here's where I can imagine kboom might be useful for you:

You are a cluster admin and want to test how much "fits" in the cluster. You use kboom for a scale test and see how many pods can be placed and how long it takes.
You are a cluster or namespace admin and want to test how long it takes to launch a set number of pods in a new cluster, comparing it with what you already know from an existing cluster.
You are a developer and want to test your custom controller or operator. You use kboom for a long-term soak test of your controller.

Install

Before you begin, you will need kubectl client version v1.12.0 or higher for kubectl plugin support.

To install kboom, do the following:

$ curl https://raw.githubusercontent.com/mhausenblas/kboom/master/kboom -o kubectl-kboom
$ chmod +x kubectl-kboom
$ sudo mv ./kubectl-kboom /usr/local/bin

From this point on, you can use it as a kubectl plugin, invoked as kubectl kboom. However, to generate load you also have to grant it the necessary permissions (note: you only need to do this once per cluster):

$ kubectl create ns kboom
$ kubectl apply -f https://raw.githubusercontent.com/mhausenblas/kboom/master/permissions.yaml

Now you're set up and good to go. Next, learn how to use kboom.

Use

Here's how you'd use kboom for scale testing. The load test runs in-cluster as a Kubernetes job, so you can do multiple runs and compare outcomes in a straightforward manner. Note that by default kboom assumes a namespace called kboom exists and runs in it. If this namespace doesn't exist, create it with kubectl create ns kboom, or use the --namespace parameter to override it.

So, first we use the generate command to generate the load, launching 10 pods (busybox containers that just sleep) with a timeout of 14 seconds (that is, any pod not running within that time is considered a failure):

$ kubectl kboom generate --mode=scale:14 --load=pods:10
job.batch/kboom created
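To make the flag format concrete, here is a minimal sketch of how values like --mode=scale:14 and --load=pods:10 could be parsed. It is not kboom's actual code: the helper parseSpec is hypothetical, and the only default taken from this README is the kboom namespace. It relies on Go's standard flag package, which accepts both -flag and --flag spellings.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseSpec splits a "kind:N" value such as "scale:14" or "pods:10" into its
// kind and a positive integer. Hypothetical helper, not kboom's actual code.
func parseSpec(v string) (string, int, error) {
	kind, num, ok := strings.Cut(v, ":")
	if !ok || kind == "" {
		return "", 0, fmt.Errorf("expected kind:N, got %q", v)
	}
	n, err := strconv.Atoi(num)
	if err != nil || n <= 0 {
		return "", 0, fmt.Errorf("expected a positive integer after ':' in %q", v)
	}
	return kind, n, nil
}

func main() {
	fs := flag.NewFlagSet("generate", flag.ExitOnError)
	mode := fs.String("mode", "", "test mode and timeout in seconds, e.g. scale:14")
	load := fs.String("load", "", "resource kind and count, e.g. pods:10")
	ns := fs.String("namespace", "kboom", "namespace to run the load test in")
	fs.Parse(os.Args[1:]) // exits on malformed flags because of ExitOnError

	m, timeout, err := parseSpec(*mode)
	if err != nil {
		fmt.Fprintln(os.Stderr, "--mode:", err)
		os.Exit(1)
	}
	kind, count, err := parseSpec(*load)
	if err != nil {
		fmt.Fprintln(os.Stderr, "--load:", err)
		os.Exit(1)
	}
	fmt.Printf("namespace %s: running a %s test, launching %d %s with a %ds timeout\n",
		*ns, m, count, kind, timeout)
}
```

Invoked as go run . --mode=scale:14 --load=pods:10, this sketch prints a line for the kboom namespace describing a scale test of 10 pods with a 14s timeout.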

From now on, you can execute the results command as often as you like to see the live progress:

$ kubectl kboom results
Server Version: v1.12.6-eks-d69f1b
Running a scale test, launching 10 pod(s) with a 14s timeout ...

-------- Results --------
Overall pods successful: 6 out of 10
Total runtime: 14.061988653s
Fastest pod: 9.003997546s
Slowest pod: 13.003831951s
p50 pods: 12.003529448s
p95 pods: 13.003831951s

When you're done and don't need the results anymore, use kubectl kboom cleanup to get rid of the run. Note: if you execute the cleanup command before kboom has terminated all its test pods, you can use kubectl delete po -l=generator=kboom to get rid of any orphaned pods.

Known issues and plans

Come up with stricter permissions: they are currently too broad and don't follow the principle of least privilege.
Add support for custom resources and soak testing (running for many hours).
Add support for other core resources, such as services or deployments.


kboom's Issues

scale target over time

I think it would be great to have the capability to scale the target over time, so that one can use the tool to determine an optimal scale for a given QPS and explore the relationship between QPS and scale and where the limits are. This may already be possible with a wrapper, but it would be nice to see built-in automation for it.

Use Pod watch instead of polling

Currently, kboom uses a polling approach to check whether all Pods are up and running. We should switch to a watch-based approach because of the following benefits:

Finer-grained start times (otherwise, the granularity is limited to the polling interval)
Less overhead on the control plane since we only get updates for Pods

Installation & Packaging

Hi, I've seen the project and was wondering if it might be a good idea to add the plugin to the krew package index.

Krew is a kubectl plugin manager and it would be great to see kboom in there.

https://github.com/kubernetes-sigs/krew

If there are any questions I'm happy to help :)

Set up build automation

serviceaccount kboom-sa and ImagePullBackOff

Thanks for the nice work Michael!

After creating a service account named kboom-sa (which is missing), I'm getting:

$ k logs kboom-trd2r
Error from server (BadRequest): container "kboom" in pod "kboom-trd2r" is waiting to start: trying and failing to pull image

It seems ECR needs proper auth :-)

$ k describe pod kboom-trd2r
Failed to pull image "661776721573.dkr.ecr.us-east-2.amazonaws.com/kboom:latest": rpc error: code = Unknown desc = Error response from daemon: Get https://661776721573.dkr.ecr.us-east-2.amazonaws.com/v2/kboom/manifests/latest: no basic auth credentials

Also, the following is needed:

k create -f permissions.yaml

Decode json errors

I'm occasionally getting spurious json decode errors when trying to run larger numbers of pods.

kubectl kboom generate --mode=scale:60 --load=pods:1000

<snip>
2019/04/27 04:10:35 Can't create pod scale-sleeper-936: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-1176: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-767: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-275: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-896: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-160: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-376: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-130: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-921: decode error status 429: decode json: invalid character 'T' looking for beginning of value
2019/04/27 04:10:35 Can't create pod scale-sleeper-403: decode error status 429: decode json: invalid character 'T' looking for beginning of value
<snip>

[BUG] Running on OpenShift gives

Describe the bug

Installed as per the readme.

Running kboom on OpenShift fails due to the SCC (security context constraint):

kubectl kboom generate --mode=scale:14 --load=pods:100

kubectl kboom results

2021/12/08 17:18:52 Can't create pod scale-sleeper-19: kubernetes api: Failure 403 pods "scale-sleeper-19" is forbidden: unable to validate against any security context constraint: [spec.containers[0].securityContext.runAsUser: Invalid value: 65534: must be in the ranges: [1000690000, 1000699999]]

To Reproduce

OpenShift 4.7

Expected behavior
Successful run

Additional context

OpenShift enforces security context constraints, which is what the error refers to.
https://docs.openshift.com/container-platform/4.7/cicd/pipelines/using-pods-in-a-privileged-security-context.html

Option to include PVC creation and volumeMounts in the Pod

In my experience with different Kubernetes services and BYO solutions, there's disparity between vendors and between in-tree and out-of-tree drivers. One difference is how efficiently PVC creation and subsequent PV binding perform. Another is how reliable attach/detach is during node drains or mass deployments, which sometimes fails altogether.

"Storage is hard"

... is the most used phrase in this space, and it would therefore be very handy to use kboom as a sizing/discovery tool for finding the optimal PVC/PV count for a given cluster and node size. Once the boundaries are known, the cluster admin can divide volume counts across the projected namespaces, making the cluster safer instead of blindly running off a cliff without knowing where the ceiling is for the particular environment.
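The "divide volume counts across namespaces" step is simple arithmetic once a ceiling has been measured. The sketch below splits a hypothetical measured ceiling as evenly as possible; splitVolumes and the namespace names are illustrative assumptions. The resulting per-namespace numbers could then be enforced with a ResourceQuota on the persistentvolumeclaims count.

```go
package main

import "fmt"

// splitVolumes divides a measured PV/PVC ceiling across namespaces as evenly
// as possible; the first total%len(namespaces) namespaces get one extra.
func splitVolumes(total int, namespaces []string) map[string]int {
	out := make(map[string]int, len(namespaces))
	if len(namespaces) == 0 {
		return out
	}
	base, extra := total/len(namespaces), total%len(namespaces)
	for i, ns := range namespaces {
		out[ns] = base
		if i < extra {
			out[ns]++
		}
	}
	return out
}

func main() {
	namespaces := []string{"team-a", "team-b", "team-c"} // hypothetical
	quota := splitVolumes(100, namespaces)                // 100 = hypothetical measured ceiling
	for _, ns := range namespaces {
		fmt.Printf("%s: %d volumes\n", ns, quota[ns])
	}
}
```

In practice one would probably split a value somewhat below the measured ceiling, to keep headroom for node drains and reattachment.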

custom image instead of busybox

Is it possible to provide a different image than busybox on the CLI?
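A custom image could be exposed as a flag that defaults to busybox, as in the sketch below. kboom is not documented here as having an --image flag; the flag name and the imageFromArgs helper are hypothetical. A replacement image would presumably also need to keep the container running, as the busybox sleeper does.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// imageFromArgs parses a hypothetical --image flag that defaults to busybox.
func imageFromArgs(args []string) (string, error) {
	fs := flag.NewFlagSet("generate", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors via the return value only
	image := fs.String("image", "busybox", "container image for the load pods")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *image == "" {
		return "", fmt.Errorf("--image must not be empty")
	}
	return *image, nil
}

func main() {
	img, err := imageFromArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("load pods will use image", img)
}
```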