machine-controller's Introduction

Overview / User Guides

Kubermatic Kubernetes Platform is an open source project to centrally manage the global automation of thousands of Kubernetes clusters across multicloud, on-prem and edge with unparalleled density and resilience.

All user documentation is available at the Kubermatic Kubernetes Platform docs website.

Editions

There are two editions of Kubermatic Kubernetes Platform:

Kubermatic Kubernetes Platform Community Edition (CE) is available freely under the Apache License, Version 2.0.

Kubermatic Kubernetes Platform Enterprise Edition (EE) includes premium features that are most useful for organizations with large-scale Kubernetes installations with more than 50 clusters. To access the Enterprise Edition and get official support, please become a subscriber.

Licensing

See the LICENSE file for licensing information as it pertains to files in this repository.

Installation

We strongly recommend that you use an official release of Kubermatic Kubernetes Platform. Follow the instructions under the Installation section of our documentation to get started.

The code and sample YAML files in the main branch of the kubermatic repository are under active development and are not guaranteed to be stable. Use them at your own risk!

More information

The documentation provides a getting started guide, plus information about building from source, architecture, extending kubermatic, and more.

Please use the version selector at the top of the site to ensure you are using the appropriate documentation for your version of kubermatic.

Troubleshooting

If you encounter issues, file an issue or talk to us on the #kubermatic channel on the Kubermatic Community Slack (click here to join).

Contributing

Thanks for taking the time to join our community and start contributing!

Before you start

  • Please familiarize yourself with the Code of Conduct before contributing.
  • See CONTRIBUTING.md for instructions on the developer certificate of origin that we require.

Repository layout

├── addons    # Default Kubernetes addons
├── charts    # The Helm charts we use to deploy
├── cmd       # Various Kubermatic binaries for the controller-managers, operator etc.
├── codegen   # Helper programs to generate Go code and Helm charts
├── docs      # Some basic developer-oriented documentation
├── hack      # Scripts for development and CI
└── pkg       # Most of the actual codebase

Development environment

git clone [email protected]:kubermatic/kubermatic.git
cd kubermatic

There are a couple of scripts in the hack directory to aid in running the components locally for testing purposes.

Running components locally

user-cluster-controller-manager

In order to instrument the seed-controller to allow for a local user-cluster-controller-manager, you need to add a worker-name label with your local machine's name as its value. Additionally, you need to scale down the already running deployment.

# Using a kubeconfig, which points to the seed-cluster
export cluster_id="<id-of-your-user-cluster>"
kubectl label cluster ${cluster_id} worker-name=$(uname -n)
kubectl scale deployment -n cluster-${cluster_id} usercluster-controller --replicas=0

Afterwards, you can start your local user-cluster-controller-manager.

# Using a kubeconfig, which points to the seed-cluster
./hack/run-user-cluster-controller-manager.sh
seed-controller-manager

./hack/run-seed-controller-manager.sh

master-controller-manager

./hack/run-master-controller-manager.sh

Run linters

Before every push, make sure you run:

make lint

Run tests

make test

Update code generation

The Kubernetes code-generator tool does not work outside of GOPATH (upstream issue), so the script below will automatically run the code generation in a Docker container.

hack/update-codegen.sh

Pull requests

  • We welcome pull requests. Feel free to dig through the issues and jump in.

Changelog

See the list of releases to find out about feature changes.

machine-controller's Issues

Vsphere E2E tests

Add the following test cases to the existing E2E test suite.

Ubuntu + Docker 1.13
Ubuntu + Docker 17.03
Ubuntu + CRI-O 1.9

Clean up .circle/config.yaml so that it doesn't run the tests via the create-and-destroy-machine.sh script

Test and document centos on vsphere

As a user I want to be able to spin up worker nodes on vsphere that use CentOS as their distribution.

Acceptance criteria:

  • There is documentation on how to import/create a suitable image for CentOS on vsphere
  • An image in our vsphere test cluster was created by following the steps documented
  • The e2e tests are extended to also test CentOS on vsphere

add prometheus metrics

Add the following metrics (a registration sketch follows the list):

  • Total number of errors
  • Total number of machines
  • Total number of nodes
  • Time it took to create/delete an instance at the cloud provider
  • Time difference between node.CreationTimestamp and machine.CreationTimestamp
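
A minimal sketch of how these metrics could be registered with prometheus/client_golang; the metric names, label set and package layout are illustrative, not the controller's actual metrics:

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
)

var (
    // Total number of errors the controller ran into.
    errorsTotal = prometheus.NewCounter(prometheus.CounterOpts{
        Name: "machine_controller_errors_total",
        Help: "Total number of errors encountered by the machine-controller.",
    })

    // Current number of machine objects / nodes.
    machines = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "machine_controller_machines",
        Help: "Total number of machine objects.",
    })
    nodes = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "machine_controller_nodes",
        Help: "Total number of nodes created by the machine-controller.",
    })

    // Duration of create/delete calls at the cloud provider, labeled by operation.
    instanceOperationDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Name: "machine_controller_instance_operation_duration_seconds",
        Help: "Time it took to create/delete an instance at the cloud provider.",
    }, []string{"operation"})

    // Difference between node.CreationTimestamp and machine.CreationTimestamp.
    nodeJoinDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
        Name: "machine_controller_node_join_duration_seconds",
        Help: "Time between machine creation and node creation.",
    })
)

func init() {
    prometheus.MustRegister(errorsTotal, machines, nodes, instanceOperationDuration, nodeJoinDuration)
}

The histograms would then be observed around the cloud provider create/delete calls and when a node first appears for a machine.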

Not possible to delete machines using AWS

We have 2 clusters on dev.kubermatic.io which cannot be deleted because the machine-controller is not able to delete the machines.

Logs:
kubectl -n cluster-dt56ds7tsb logs machine-controller-559788b7f9-89q9v

E0411 07:29:28.561133       1 machine.go:200] machine-kubermatic-dt56ds7tsb-gf4xr failed with: failed to delete machine at cloudprovider, due to instance not found
E0411 07:29:28.594842       1 machine.go:200] machine-kubermatic-dt56ds7tsb-d5pgz failed with: failed to delete machine at cloudprovider, due to instance not found
E0411 07:29:28.613675       1 machine.go:200] machine-kubermatic-dt56ds7tsb-64ql4 failed with: failed to delete machine at cloudprovider, due to instance not found

e2e tests modify manifest by providing a field selector

At the moment tests replace desired fields in the manifest based on string matching. For example:

params = fmt.Sprintf("%s,<< MACHINE_NAME >>=%s,<< NODE_NAME >>=%s", params, machineName, nodeName)
params = fmt.Sprintf("%s,<< OS_NAME >>=%s,<< CONTAINER_RUNTIME >>=%s,<< CONTAINER_RUNTIME_VERSION >>=%s", params, testCase.osName, testCase.containerRuntime, testCase.containerRuntimeVersion)

We would like to change that by providing the field path instead, for example spec.providerConfig.cloudProvider. This would not only look better, but would also allow us to consume the manifests under the examples directory.
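
A hedged sketch of what such a field-path substitution could look like, using the unstructured helpers from apimachinery together with sigs.k8s.io/yaml; the function name and the example path are assumptions:

package e2e

import (
    "fmt"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "sigs.k8s.io/yaml"
)

// setField decodes a machine manifest, sets value at the given field path
// (e.g. "spec", "providerConfig", "cloudProvider") and re-encodes the result.
func setField(manifest []byte, value string, path ...string) ([]byte, error) {
    obj := map[string]interface{}{}
    if err := yaml.Unmarshal(manifest, &obj); err != nil {
        return nil, fmt.Errorf("failed to unmarshal manifest: %v", err)
    }
    if err := unstructured.SetNestedField(obj, value, path...); err != nil {
        return nil, fmt.Errorf("failed to set %v: %v", path, err)
    }
    return yaml.Marshal(obj)
}

A call such as setField(manifest, "aws", "spec", "providerConfig", "cloudProvider") would then replace the placeholder-based templating.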

Extend circle pipeline

What's missing:
Building a Docker image

  • On push using the commit hash as docker tag
  • On tag using the git tag as docker tag & latest

Running into AWS rate-limits

When creating 5 machines simultaneously, we're getting rate limited by AWS - on all machines.

It seems it happens during validation. Thus the errors we get from AWS are being handled as terminal.

Make container runtime version optional

Based upon the entered Kubernetes version and the selected OS we should default to a docker/cri-o version.

For now the logic should be (a defaulting sketch follows the list):

  • cri-o
    • Kubernetes v1.8 + Ubuntu 16.04 -> error, as there's no cri-o 1.8 package in the repos
    • Kubernetes v1.9 + Ubuntu 16.04 -> cri-o 1.9
    • Kubernetes v1.8 + Container Linux -> error, as there's no cri-o for CoreOS
    • Kubernetes v1.9 + Container Linux -> error, as there's no cri-o for CoreOS
  • docker
    • Kubernetes v1.8 + Ubuntu 16.04 -> docker 1.13
    • Kubernetes v1.9 + Ubuntu 16.04 -> docker 1.13
    • Kubernetes v1.8 + Container Linux -> docker 1.12
    • Kubernetes v1.9 + Container Linux -> docker 1.12
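
A small sketch of what such defaulting could look like; the function signature and the hard-coded matrix below simply mirror the list above and are not the final implementation:

package defaults

import "fmt"

// DefaultContainerRuntimeVersion mirrors the matrix above; the signature and
// the hard-coded values are illustrative only.
func DefaultContainerRuntimeVersion(runtime, operatingSystem string, kubernetesMinor int) (string, error) {
    switch runtime {
    case "cri-o":
        if operatingSystem == "ubuntu" && kubernetesMinor >= 9 {
            return "1.9", nil
        }
        return "", fmt.Errorf("no default cri-o version for %s with Kubernetes 1.%d", operatingSystem, kubernetesMinor)
    case "docker":
        if operatingSystem == "container-linux" {
            return "1.12", nil
        }
        return "1.13", nil
    default:
        return "", fmt.Errorf("unknown container runtime %q", runtime)
    }
}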

Defaulting for Openstack

Usage of the Openstack provider would be easier if there was defaulting for

  • availabilityZone
  • Region
  • Network
  • Subnet
  • FloatingIPPool

To achieve this, the machine-controller should request a list of the given resource, check whether there is exactly one, and if so, default to it.
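
A minimal sketch of the "default only when exactly one candidate exists" rule; the surrounding OpenStack list calls (e.g. via gophercloud) are omitted and the helper name is illustrative:

package openstack

import "fmt"

// defaultIfSingle applies the rule described above: if exactly one candidate
// exists, use it; otherwise require the user to set the field explicitly.
func defaultIfSingle(field string, candidates []string) (string, error) {
    switch len(candidates) {
    case 0:
        return "", fmt.Errorf("no %s found, please specify one in the machine spec", field)
    case 1:
        return candidates[0], nil
    default:
        return "", fmt.Errorf("found %d candidates for %s, please specify one in the machine spec", len(candidates), field)
    }
}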

RBAC broken

#56 apparently broke RBAC:

GET https://10.96.0.1:443/api/v1/configmaps?resourceVersion=13569859&timeoutSeconds=346&watch=true
I0203 01:42:59.303274       1 round_trippers.go:439] Response Status: 403 Forbidden in 1 milliseconds
I0203 01:42:59.303287       1 round_trippers.go:442] Response Headers:
I0203 01:42:59.303293       1 round_trippers.go:445]     Content-Type: application/json
I0203 01:42:59.303298       1 round_trippers.go:445]     X-Content-Type-Options: nosniff
I0203 01:42:59.303302       1 round_trippers.go:445]     Content-Length: 277
I0203 01:42:59.303307       1 round_trippers.go:445]     Date: Sat, 03 Feb 2018 01:42:59 GMT
I0203 01:42:59.303628       1 request.go:873] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps is forbidden: User \"system:serviceaccount:kube-system:machine-controller\" cannot watch configmaps at the cluster scope","reason":"Forbidden","details":{"kind":"configmaps"},"code":403}

What surprises me a little is that it watches all configmaps; shouldn't a watch on the cluster-info configmap in the kube-public namespace be enough?
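
For comparison, a sketch of the narrower access pattern suggested above, reading only the cluster-info ConfigMap in kube-public with client-go (assuming a recent client-go where Get takes a context); this would allow a namespace-scoped RBAC rule instead of cluster-wide watch permissions:

package clusterinfo

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// getClusterInfo reads only the cluster-info ConfigMap in kube-public instead
// of watching configmaps at the cluster scope, so the required RBAC rule can
// be limited to a single namespaced resource.
func getClusterInfo(ctx context.Context, client kubernetes.Interface) (map[string]string, error) {
    cm, err := client.CoreV1().ConfigMaps("kube-public").Get(ctx, "cluster-info", metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    return cm.Data, nil
}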

Allow to specify ssh-key via flag

Current state:
On initial start, we check if a secret with a private ssh key exists.
If no secret is found, we generate a secret with a private key.

This ssh key will later be used when creating instances at cloud providers.
This was done so the user does not have to specify an ssh public key in the machine manifest, as some cloud providers require a public key to be specified when creating an instance (AWS).

All public keys from the machine manifest are getting deployed via cloud-init.

Desired state:
The controller should accept a path to a private key via a command line flag.
If the flag is specified and a valid key is found, that key should be used.
If no flag is specified or the key cannot be found, the old secret-based logic should apply.
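
A sketch of the proposed flag handling, assuming a flag name of -ssh-private-key; the fallback function stands in for the existing secret-based logic and is hypothetical:

package controller

import (
    "flag"
    "os"
)

// Hypothetical flag; the name is illustrative.
var sshKeyPath = flag.String("ssh-private-key", "", "path to a private SSH key used when creating instances (optional)")

// loadPrivateKey prefers the key given via the flag and falls back to the
// existing secret-based logic otherwise. The fallback is passed in because it
// stands in for code that already exists elsewhere in the controller.
func loadPrivateKey(fallback func() ([]byte, error)) ([]byte, error) {
    if *sshKeyPath != "" {
        if data, err := os.ReadFile(*sshKeyPath); err == nil {
            return data, nil
        }
    }
    return fallback()
}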

Make security-group creation on aws a fallback

We need to add a config variable for the securityGroups and should only create a security group on AWS when none is defined, as a convenience/quickstart help.
We should probably also log this at loglevel 2.

Openstack: Floating IPs are not reused which may result in FIP exhaustion

Basically the title; from the machine-controller log:

E0120 13:47:35.431740       1 machine.go:162] machine-controller failed with: failed to create machine at cloudprovider: failed to allocate a floating ip: Expected HTTP response code [201 202] when accessing [POST http://192.168.0.39:9696/v2.0/floatingips], but got 409 instead
{"NeutronError": {"message": "No more IP addresses available on network 06fb6e98-4e98-4320-9f00-34e028ed53cb.", "type": "IpAddressGenerationFailure", "detail": ""}}

I'd expect the machine-controller to reuse already assigned but unused FIPs instead of requesting a new one.

running machine-controller through leader election is optional

The machine controller has been incorporated into kubermatic and is an inherent part of every cluster.
That made local development/testing impossible, as it is highly likely that the machine controller running inside kubermatic will acquire the lock before the local instance does.

Making leader election optional seems to remedy this issue.

Do not use cluster-info configmap anymore

Right now the machine-controller uses the cluster-info configmap to get the CACert and the endpoint for the apiserver.

Instead it should get the CACert from its kubeconfig or from /run and the apiserver endpoints from its kubeconfig or from the endpoints of the kubernetes service when running in-cluster.

This will reduce the configuration overhead and help people get started faster.
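
A sketch of how the client configuration could be built instead, assuming a kubeconfig path passed via flag: the resulting rest.Config already carries the API server endpoint (Host) and the CA data, either from the kubeconfig or from the in-cluster service account environment:

package config

import (
    "k8s.io/client-go/rest"
    "k8s.io/client-go/tools/clientcmd"
)

// clientConfig returns a rest.Config whose Host and TLS CA come either from
// the given kubeconfig or, when running in-cluster, from the service account
// environment and the kubernetes service.
func clientConfig(kubeconfig string) (*rest.Config, error) {
    if kubeconfig != "" {
        return clientcmd.BuildConfigFromFlags("", kubeconfig)
    }
    return rest.InClusterConfig()
}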

Use `kubeadm join` instead of manually maintaining kubelet config

Right now we maintain the kubelet config as part of the distro-specific templates. This has some drawbacks:

  • We may miss important configuration parameters
  • Whenever we change something, we have to change it at multiple places
  • There is no way to have different configs based on Kubelet version

Instead, it would be easier if we just used kubeadm join to configure the Kubelet.

Extend e2e testing

We should add the following test cases:

  • Hetzner
    • Ubuntu + Docker 1.13
    • Ubuntu + Docker 17.03
    • Ubuntu + CRI-O 1.9
  • Digitalocean
    • Ubuntu + Docker 1.13
    • Ubuntu + Docker 17.03
    • Ubuntu + CRI-O 1.9
    • CoreOS + Docker 1.13
    • CoreOS + Docker 17.03
  • AWS
    • Ubuntu + Docker 1.13
    • Ubuntu + Docker 17.03
    • Ubuntu + CRI-O 1.9
    • CoreOS + Docker 1.13
    • CoreOS + Docker 17.03
  • Openstack (We need a sponsor here)
    • Ubuntu + Docker 1.13
    • Ubuntu + Docker 17.03
    • Ubuntu + CRI-O 1.9
    • CoreOS + Docker 1.13
    • CoreOS + Docker 17.03

Hetzner E2E tests

Add the following test cases to the existing E2E test suite.

Ubuntu + Docker 1.13
Ubuntu + Docker 17.03
Ubuntu + CRI-O 1.9

Clean up .circle/config.yaml so that it doesn't run the tests via the create-and-destroy-machine.sh script

Add integration testing script

To be able to properly validate the machine-controller is working as intended, we need some kind of integration testing.

Because it is not possible to both test external PRs automatically and be sure they are not used to steal credentials, this script is not supposed to be executed automatically. Instead it will:

  • Take credentials for a cloud provider, e.g. from the environment
  • Take an ssh pubkey
  • Create a single node k8s cluster via kubeadm at cloudprovider
  • Deploy machine-controller in a version built from git HEAD into the newly created cluster using the deployment in the repo
  • Verify machine-controller is running
  • Provide a teardown functionality

Schedule nightly E2E test runs

Since running the complete e2e suite takes too long, as a temporary step we could schedule a nightly test run. Running the tests frequently would increase confidence and hopefully reveal potential issues that might crop up.

Move cloudprovider secrets out of machine definition and into a secret

Right now the machine definition contains all access secrets to the cloud provider it is spawned on. This has two drawbacks:

  • Anyone who is supposed to have the permission to create machines has to know these credentials
  • There are some objects (e.g. security groups, ssh keys) whose lifetime is not coupled to a machine but to the usage of the cloud provider, meaning as long as any machine uses that cloud provider, they have to exist

Instead we want to move the cloudprovider secrets into an actual secret which is then referenced by machines.

Add end-to-end testing to CircleCI pipeline

To better catch bugs introduced by PRs, we should add the existing end-to-end tests to the CircleCI pipeline.

This requires:

  • Configuring the ssh keypair name at the cloudprovider with a random prefix
  • Deleting the ssh keypair before ending the e2e test
  • Adding the test-e2e target to circleci

Trigger events

With the implementation of transient and terminal errors we now correctly set machine.status.errorReason & machine.status.errorMessage when the controller runs into a terminal error.

Transient errors, though, are not reported back; the only way to see them is by investigating the logs.
Instead of just logging, we should trigger an event which is attached to the machine.
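
A sketch of how such an event could be emitted with client-go's record package; setting up the EventBroadcaster/EventRecorder is omitted and the reason string is illustrative:

package controller

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/client-go/tools/record"
)

// recordTransientError attaches a warning event to the machine object instead
// of only logging the error. The recorder would be created once from an
// EventBroadcaster; "machine" is whatever runtime.Object represents the
// Machine resource.
func recordTransientError(recorder record.EventRecorder, machine runtime.Object, err error) {
    recorder.Event(machine, corev1.EventTypeWarning, "TransientError", err.Error())
}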

simple e2e test tool

Having a simple command line tool that would verify whether a node has been created serves not only as a good warm-up exercise but also as a handy test tool.

The idea is that we would have a list of predefined machine manifests that would need some customisation in terms of credentials. The credentials could be accepted as command line arguments and passed all the way down to the manifests. After POST'ing the given manifests to the kube-apiserver, the test tool would read the current cluster state in order to determine the correctness of the machine-controller.

The test tool would use the standard client-go library to talk to the API server and would read the kubeconfig configuration file to discover where the cluster is actually located.

assumptions:

  • cluster was created manually
  • kube config is accessible
  • there is a list of predefined machine manifests

For example, running the following command: verify -input path_to_manifest -parameters key=value,key2=value would print a machine "node-docker" has been created to stdout.
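
A stripped-down sketch of the verification step only (manifest templating and the POST are omitted), assuming a recent client-go and illustrative flag names:

package main

import (
    "context"
    "flag"
    "fmt"
    "os"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubeconfig := flag.String("kubeconfig", os.Getenv("KUBECONFIG"), "path to the kubeconfig file")
    nodeName := flag.String("node-name", "", "name of the node that is expected to appear")
    flag.Parse()

    // Read the kubeconfig to discover where the cluster is located.
    cfg, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
    if err != nil {
        fmt.Fprintf(os.Stderr, "failed to load kubeconfig: %v\n", err)
        os.Exit(1)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    // Check the current cluster state: did the expected node show up?
    if _, err := client.CoreV1().Nodes().Get(context.Background(), *nodeName, metav1.GetOptions{}); err != nil {
        fmt.Fprintf(os.Stderr, "node %q not found: %v\n", *nodeName, err)
        os.Exit(1)
    }
    fmt.Printf("a node %q has been created\n", *nodeName)
}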

Parse versions via a semver library

We need to parse the user-given versions (kubelet & container runtime) to process them correctly, especially so we can accept both v1.9.2 and 1.9.2 as input.
Currently we require the kubelet version to have a leading v, but we don't require it for the container runtime version.
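
A sketch of the parsing step with a semver library (e.g. github.com/Masterminds/semver, which accepts versions with and without the leading v); the helper name is illustrative:

package util

import "github.com/Masterminds/semver/v3"

// normalizeVersion parses a user-supplied version; the library accepts both
// "v1.9.2" and "1.9.2", so later comparisons no longer depend on the prefix.
func normalizeVersion(v string) (*semver.Version, error) {
    return semver.NewVersion(v)
}

normalizeVersion("v1.9.2") and normalizeVersion("1.9.2") would then yield the same parsed version for both kubelet and container runtime inputs.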

deadlock when trying to delete a machine

Steps to reproduce:

  1. Create an invalid machine - you can use the following manifest that doesn't specify required credentials https://github.com/kubermatic/machine-controller/blob/master/examples/machine-digitalocean.yaml
  2. Delete the previously created machine.
  3. List machine resources

Result:
The machine was not deleted and the server keeps saying machine1 failed with: failed to get instance for machine machine1 after the deletion was triggered.
The only way of getting out of this situation is to manually edit the machine object and remove the finalizers.

In general the described state exists because we add finalizers to a machine before creating a node, since we want to prevent deletion of the machine resource.

Since the call that requests a node can fail for many reasons, this issue could help us track the discussion on possible solutions.

process machines which were annotated

The machine controller has been incorporated into kubermatic and is an inherent part of every cluster. That made local development/testing impossible, as every machine is processed by the in-cluster machine controller.

We could annotate a machine manifest with some arbitrary data and at the same time introduce a new command line flag. On a successful match a controller should continue; otherwise it should leave the machine to others. An empty annotation means there is no preference.
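
A sketch of the matching rule described above; the flag wiring is omitted and the function name is illustrative:

package controller

// shouldProcess implements the matching rule: an empty annotation on the
// machine means no preference, so any controller may handle it; otherwise the
// value must match the one this controller was started with.
func shouldProcess(machineAnnotation, controllerValue string) bool {
    if machineAnnotation == "" {
        return true
    }
    return machineAnnotation == controllerValue
}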

Add support for accepting cloud-provider credentials as EnvVar's

The Machine object accepts multiple sources for cloudProviderSpec fields:

  • Direct value
...
spec:
...
  providerConfig:
    cloudProvider: "aws"
    cloudProviderSpec:
      accessKeyId: "foo"
  • Secret ref
...
spec:
...
  providerConfig:
    cloudProvider: "aws"
    cloudProviderSpec:
      accessKeyId:
        secretKeyRef:
          namespace: kube-system
          name: machine-controller-aws
          key: accessKeyId
  • ConfigMap ref
...
spec:
...
  providerConfig:
    cloudProvider: "aws"
    cloudProviderSpec:
      accessKeyId:
        configMapKeyRef:
          namespace: kube-system
          name: machine-controller-aws
          key: accessKeyId

It should also be possible to pass in the secret values implicitly as environment variables.
The secret values differ per cloud provider:

  • AWS
    • Access Key ID
    • Secret Access Key
  • Hetzner
    • Token
  • Digitalocean
    • Token
  • OpenStack
    • Username
    • Password

Each secret field needs one specific environment variable key, like AWS_ACCESS_KEY_ID.
During processing of the cloudProviderSpec we would need to check whether the environment variable is set, and if so, use its value.
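
A sketch of that lookup order; the helper name is illustrative and the mapping from provider fields to environment keys (such as AWS_ACCESS_KEY_ID) would live elsewhere:

package providerconfig

import "os"

// valueFromEnvOrSpec implements the lookup order: a set environment variable
// (e.g. AWS_ACCESS_KEY_ID) wins over whatever was resolved from the
// cloudProviderSpec (direct value, secret ref or configmap ref).
func valueFromEnvOrSpec(envKey, specValue string) string {
    if v, ok := os.LookupEnv(envKey); ok && v != "" {
        return v
    }
    return specValue
}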

Reason: In scenarios where the master components are managed by an external entity (Loodse Kubermatic / SAP Gardener), it might not be possible to expose the cloud-provider-specific secrets to the users.

Create temporary ssh key during instance creation when required. Delete afterwards

Current state:
On initial start, we check if a secret with a private ssh key exists.
If no secret is found, we generate a secret with a private key.

This ssh key will later be used when creating instances at cloud providers.
This was done so the user does not have to specify an ssh public key in the machine manifest, as some cloud providers require a public key to be specified when creating an instance (DigitalOcean).

All public keys from the machine manifest are getting deployed via cloud-init.

Desired state:
The whole ssh key logic should be removed.
If a cloud provider requires an ssh key during instance creation (see the sketch after the list):

  • Create a temporary key before the instance gets created
  • Use the temporary key for instance creation
  • Delete the temporary key after the instance has been created
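
A sketch of the key-generation part of that flow using crypto/rsa and golang.org/x/crypto/ssh; registering the key at the cloud provider and deleting it again afterwards are left to the caller:

package sshutil

import (
    "crypto/rand"
    "crypto/rsa"

    "golang.org/x/crypto/ssh"
)

// generateTemporaryKey creates a throwaway key pair for instance creation and
// returns the public key in authorized_keys format. Uploading the key to the
// cloud provider and removing it after instance creation are up to the caller.
func generateTemporaryKey() (*rsa.PrivateKey, []byte, error) {
    privateKey, err := rsa.GenerateKey(rand.Reader, 4096)
    if err != nil {
        return nil, nil, err
    }
    pub, err := ssh.NewPublicKey(&privateKey.PublicKey)
    if err != nil {
        return nil, nil, err
    }
    return privateKey, ssh.MarshalAuthorizedKey(pub), nil
}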
