
assisted-swarm

! Warning !

Although the code in this repo tries its best to be non-destructive, it can still easily mess up the machine it's running on, so you should probably run it in a disposable VM.

What is this

This is a tool for launching a swarm of assisted installer agents (and their corresponding cluster CRs) that look to the service like real cluster host agents, all the way from discovery/BMH to completed installation and controller progress reports, for the purpose of load-testing the service. This is made possible by the dry-run mode of the agent, installer and controller.

Background

Originally, the assisted installer service was load-tested by installing actual clusters on thousands of VMs. That approach has the advantage of testing the real thing (complete e2e OCP assisted installation): it gives a perfectly accurate representation of the load on the service and also helps find rare installation bugs. However, this method is very costly and requires a lot of machines to host all those VMs. Since that amount of hardware is not always immediately available, a need arose for a cheaper, less hardware-intensive way to load-test the service, by faking agent traffic rather than performing actual installations. This has the obvious downside that it doesn't find actual rare installation bugs, but at least it lets us check how the service handles a lot of seemingly real agent traffic.

I considered two approaches to fake agent traffic:

  1. Complete emulation - using tools such as JMeter or Locust, or writing custom fake agents that behave like real ones, then running many of those on a single machine.

  2. No-op agent - use the existing agent as is and let it do everything it usually does, but replace destructive actions such as installation, disk wiping, etc. with no-ops. The assisted controller that normally runs on the installed cluster (a cluster which doesn't really exist in this case) simply runs locally, and is also modified to use mocked kube-api calls that make it behave as if it were running on an actual cluster.

Option (1) is definitely possible, but I felt it would be hard to maintain and keep up to date with all the API and agent behavior changes that will be added in the future.

Option (2) is what this repo implements. Originally this repo contained the patches that made the agent/installer/controller no-op, but those patches have since been upstreamed: the agent, installer and controller now have a "dry run" mode that does exactly that, and this repo makes use of it.
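The general idea behind option (2) can be illustrated with a small, hypothetical Python sketch: destructive steps go through a single guard that, in dry-run mode, skips the action and reports success, so everything around it (reporting progress, talking to the service) runs unchanged. The names `DRY_RUN`, `destructive`, `wipe_disk`, `install_os` and `run_host_flow` are made up for illustration; the real agent, installer and controller implement their dry-run mode in their own code and configuration.

```python
import functools
from typing import List

# Hypothetical global switch standing in for the components' real dry-run configuration.
DRY_RUN = True


def destructive(noop_result):
    """Decorator marking a step as destructive: when DRY_RUN is set, the step
    is skipped and `noop_result` is returned as if it had succeeded."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if DRY_RUN:
                print(f"[dry-run] skipping {func.__name__}{args}")
                return noop_result
            return func(*args, **kwargs)
        return wrapper
    return decorator


@destructive(noop_result=True)
def wipe_disk(device: str) -> bool:
    # A real implementation would erase the disk here; this sketch refuses to.
    raise RuntimeError(f"refusing to really wipe {device} in this sketch")


@destructive(noop_result=True)
def install_os(device: str) -> bool:
    # A real implementation would write the OS image here; this sketch refuses to.
    raise RuntimeError(f"refusing to really install to {device} in this sketch")


def run_host_flow(device: str) -> List[str]:
    """Everything around the destructive steps (here, building a progress
    report) runs unchanged; only the destructive steps become no-ops."""
    report = []
    for step in (wipe_disk, install_os):
        ok = step(device)
        report.append(f"{step.__name__}: {'done' if ok else 'failed'}")
    return report
```

Keeping the guard in one place is what makes this approach cheap to maintain compared to full emulation: the rest of the flow is the real code, so API or behavior changes are picked up automatically.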

Architecture


TODO

  • Run with auth enabled (load testing without auth is a bit unfair - I presume auth adds a lot of CPU usage)
  • Query Prometheus and extract interesting metrics (Grafana dashboards? matplotlib?)
  • AI SaaS clusters (currently only kube-api is supported)

Usage

  1. Launch a kube-api assisted service on your cluster. This part is up to you. Make sure the service is accessible from the machine running the swarm.
  2. Configure the service -
    • AUTH_TYPE set to none
    • SKIP_CERT_VERIFICATION set to "true"
    • HW_VALIDATOR_REQUIREMENTS can optionally be modified if your swarm machine has less RAM/storage than is required by default
  3. See "Getting rid of CBO on OpenShift" below
  4. On the swarm machine, make sure you have the kubectl and oc binaries in your PATH.
  5. Prepare a test plan - see testplan.example.yaml
  6. Prepare a service config file - see service_config.example.yaml
  7. Install requirements.txt - python3 -m pip install -r requirements.txt
  8. Make sure you're using a fairly modern version of podman (3.4 or later)
  9. Run ./main.py with sudo. For example, to run with the example configurations and a kubeconfig at /path/to/kubeconfig:

sudo KUBECONFIG=/path/to/kubeconfig ./main.py 200 testplan.example.yaml service_config.example.yaml
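The tool requirements from the steps above can be checked before launching the swarm. The following is a hypothetical preflight helper (not part of the repo) that verifies `kubectl`, `oc` and `podman` are on the PATH and that podman is 3.4 or later. It assumes `podman --version` prints the usual `podman version X.Y.Z` line. Since `main.py` runs under sudo, which may reset PATH (for example through `secure_path` in sudoers), it is worth running the check under sudo as well.

```python
import re
import shutil
import subprocess
from typing import List, Optional, Tuple

REQUIRED_BINARIES = ("kubectl", "oc", "podman")
MIN_PODMAN = (3, 4)


def parse_podman_version(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from output such as 'podman version 3.4.2'."""
    match = re.search(r"version\s+(\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def preflight() -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for binary in REQUIRED_BINARIES:
        if shutil.which(binary) is None:
            problems.append(f"{binary} not found in PATH")
    if shutil.which("podman") is not None:
        out = subprocess.run(["podman", "--version"],
                             capture_output=True, text=True).stdout
        version = parse_podman_version(out)
        # Tuples compare element-wise, so (4, 0) >= (3, 4) and (3, 3) < (3, 4).
        if version is None or version < MIN_PODMAN:
            problems.append(f"podman 3.4 or later required, got: {out.strip() or 'unknown'}")
    return problems
```

For example, `sudo python3 -c 'import preflight; print(preflight.preflight())'` (assuming the sketch is saved as a hypothetical `preflight.py`) should print an empty list on a correctly prepared machine.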

Getting rid of CBO on OpenShift

If you're running CBO (cluster-baremetal-operator) on your hub cluster, you have to scale it down to 0 so that it won't interfere with the swarm's BMH simulation.

  1. Set cluster-baremetal-operator to unmanaged, so that CVO (the cluster version operator) doesn't fight us when we later scale it down -
$ cat <<EOF >cbo-patch.yaml
- op: add
  path: /spec/overrides
  value:
  - kind: Deployment
    group: apps
    name: cluster-baremetal-operator
    namespace: openshift-machine-api
    unmanaged: true
EOF
$ oc patch clusterversion version --type json -p "$(cat cbo-patch.yaml)"
  2. Scale the cluster-baremetal-operator deployment to 0
$ oc scale deployment/cluster-baremetal-operator -n openshift-machine-api --replicas=0
  3. Scale the metal3 deployment to 0
$ oc scale deployment/metal3 -n openshift-machine-api --replicas=0
  4. Remove the baremetal-operator validating webhook
$ oc delete validatingwebhookconfiguration baremetal-operator-validating-webhook-configuration
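For repeated runs it may be convenient to script these steps. A minimal sketch, assuming `oc` is on the PATH and already logged in to the hub cluster: it builds the same patch as `cbo-patch.yaml` (as JSON) and issues the same `oc` commands. Note that a JSON Patch `add` on `/spec/overrides` replaces any overrides already set on the ClusterVersion (RFC 6902 semantics), so check for existing overrides first. The function names and the `dry_run` parameter, which only prints the commands, are hypothetical conveniences.

```python
import json
import shlex
import subprocess
from typing import List

NAMESPACE = "openshift-machine-api"

# Same content as cbo-patch.yaml: mark the CBO deployment unmanaged so that
# CVO does not scale it back up.
CBO_PATCH = [
    {
        "op": "add",
        "path": "/spec/overrides",
        "value": [
            {
                "kind": "Deployment",
                "group": "apps",
                "name": "cluster-baremetal-operator",
                "namespace": NAMESPACE,
                "unmanaged": True,
            }
        ],
    }
]


def cbo_commands() -> List[List[str]]:
    """The oc invocations from the steps above, in order."""
    return [
        ["oc", "patch", "clusterversion", "version", "--type", "json",
         "-p", json.dumps(CBO_PATCH)],
        ["oc", "scale", "deployment/cluster-baremetal-operator",
         "-n", NAMESPACE, "--replicas=0"],
        ["oc", "scale", "deployment/metal3", "-n", NAMESPACE, "--replicas=0"],
        ["oc", "delete", "validatingwebhookconfiguration",
         "baremetal-operator-validating-webhook-configuration"],
    ]


def disable_cbo(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (stops on the first failure)."""
    for cmd in cbo_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

The patch must be applied first: if the deployment were scaled down while still managed, CVO could scale it back up.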


Contributors

omertuc, ori-amizur, romfreiman, trewest
