kubermatic / machine-controller
License: Apache License 2.0
It should be possible to set the metadata on an OpenStack instance via a machine.yaml, very similar to tags on AWS.
At the moment, running the complete test suite takes ~1h. This prevents us from running it on a per-pull-request basis.
We would like to take a deeper look into this and improve test time as much as possible.
The completed runs:
https://circleci.com/gh/kubermatic/machine-controller/2738
Add the following test cases to the existing E2E test suite:
Ubuntu + Docker 1.13
Ubuntu + Docker 17.03
Ubuntu + CRI-O 1.9
Clean up .circleci/config.yml so that it doesn't run the tests using the create-and-destroy-machine.sh script.
The machine controller has been incorporated into Kubermatic and is an inherent part of every cluster. That made local development/testing impossible, as every machine is processed by the in-cluster machine controller.
We could annotate a machine manifest with some arbitrary data and at the same time introduce a new command-line flag. On a successful match a controller should continue, otherwise it should leave the machine to others. An empty annotation means there is no preference; see the sketch below.
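A minimal Go sketch of that matching rule; the annotation key and the flag it matches against are illustrative assumptions, not the project's actual names:

    package controller

    // shouldProcess reports whether this controller instance should handle a
    // machine, given the machine's annotations and the value of a hypothetical
    // -controller-name flag.
    func shouldProcess(annotations map[string]string, controllerName string) bool {
        owner := annotations["machine.k8s.io/controller"] // assumed annotation key
        // An empty annotation means no preference: any controller may proceed.
        return owner == "" || owner == controllerName
    }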
Steps to reproduce:
Result:
The machine was not deleted, and the machine-controller keeps saying machine1 failed with: failed to get instance for machine machine1 after the delete was triggered.
The only way out of this situation is to manually edit the machine's spec and remove the finalizers.
In general the described state exists because we add finalizers to a machine before creating a node, since we want to prevent deletion of the machine resource.
Since the call that requests a node can fail for many reasons, this issue can help us track discussion of possible solutions.
Since running the complete e2e suite takes too long, as a temporary step we could schedule a nightly test run. Running the tests frequently would increase confidence and hopefully reveal potential issues early.
Usage of the OpenStack provider would be easier if there were defaulting for
To achieve this, the machine-controller should request a list of the given resource, check if there is exactly one, and if so default to it; see the sketch below.
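A minimal sketch of that defaulting rule in Go, assuming a generic listing function (all names illustrative):

    package defaulting

    import "fmt"

    // defaultResource returns the only available resource, or an error when the
    // choice is ambiguous and the user has to pick explicitly.
    func defaultResource(list func() ([]string, error)) (string, error) {
        items, err := list()
        if err != nil {
            return "", err
        }
        if len(items) != 1 {
            return "", fmt.Errorf("cannot default: found %d candidates, need exactly 1", len(items))
        }
        return items[0], nil
    }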
CircleCI needs to build when we push a tag.
The tag should be used for tagging the Docker image.
The machine controller has been incorporated into Kubermatic and is an inherent part of every cluster.
That made local development/testing impossible, as it is highly likely that the machine controller running inside Kubermatic will acquire the leader lock right before the local instance does.
Making leader election optional seems to remedy this issue.
Based on the entered Kubernetes version and the selected OS, we should default to a Docker/CRI-O version.
For now the logic should be:
Right now the machine-controller uses the cluster-info configmap to get the CA cert and the endpoint for the apiserver.
Instead it should get the CA cert from its kubeconfig or from /run, and the apiserver endpoints from its kubeconfig or from the endpoints of the kubernetes service when running in-cluster.
This will reduce the configuration overhead and help people get started faster; see the sketch below.
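For reference, client-go can already resolve both pieces from a kubeconfig or from in-cluster defaults. A minimal sketch (the fallback order shown is an assumption):

    package main

    import (
        "k8s.io/client-go/rest"
        "k8s.io/client-go/tools/clientcmd"
    )

    // getConfig prefers an explicit kubeconfig; otherwise it uses the in-cluster
    // defaults, where the CA cert comes from the service-account mount under
    // /var/run/secrets/kubernetes.io/serviceaccount and the apiserver endpoint
    // from the KUBERNETES_SERVICE_* environment variables.
    func getConfig(kubeconfig string) (*rest.Config, error) {
        if kubeconfig != "" {
            return clientcmd.BuildConfigFromFlags("", kubeconfig)
        }
        return rest.InClusterConfig()
    }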
When creating 5 machines simultaneously, we're getting rate-limited by AWS, on all machines.
It seems to happen during validation, so the errors we get from AWS are being handled as terminal.
Aside from Container Linux and Ubuntu, we should also support Enterprise Linux based distros, e.g. CentOS.
The Machine object accepts multiple sources for cloudProviderSpec fields.
As a plain value:

    ...
    spec:
      ...
      providerConfig:
        cloudProvider: "aws"
        cloudProviderSpec:
          accessKeyId: "foo"

Via a secret reference:

    ...
    spec:
      ...
      providerConfig:
        cloudProvider: "aws"
        cloudProviderSpec:
          accessKeyId:
            secretKeyRef:
              namespace: kube-system
              name: machine-controller-aws
              key: accessKeyId

Via a configmap reference:

    ...
    spec:
      ...
      providerConfig:
        cloudProvider: "aws"
        cloudProviderSpec:
          accessKeyId:
            configMapKeyRef:
              namespace: kube-system
              name: machine-controller-aws
              key: accessKeyId
It should also be possible to pass in the secret values implicitly as environment variables. The secret values differ per cloud provider; each secret field needs one specific environment key, like AWS_ACCESS_KEY_ID. During the processing of the cloudProviderSpec we would need to check whether the environment variable is set, and if so use its value; a sketch follows below.
Reason: in scenarios where the master components are managed by an external entity (Loodse Kubermatic / SAP Gardener) it might not be possible to expose the cloud-provider-specific secrets to the users.
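A minimal sketch of that lookup, assuming each spec field is mapped to one well-known environment key:

    package provider

    import "os"

    // resolveSecretValue returns the value of the field's environment variable
    // when it is set (e.g. AWS_ACCESS_KEY_ID), as proposed above, and otherwise
    // falls back to the value given in the cloudProviderSpec.
    func resolveSecretValue(specValue, envKey string) string {
        if v := os.Getenv(envKey); v != "" {
            return v
        }
        return specValue
    }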
Basically the title; from the machine-controller log:
E0120 13:47:35.431740 1 machine.go:162] machine-controller failed with: failed to create machine at cloudprovider: failed to allocate a floating ip: Expected HTTP response code [201 202] when accessing [POST http://192.168.0.39:9696/v2.0/floatingips], but got 409 instead
{"NeutronError": {"message": "No more IP addresses available on network 06fb6e98-4e98-4320-9f00-34e028ed53cb.", "type": "IpAddressGenerationFailure", "detail": ""}}
I'd expect the machine-controller to reuse already assigned but unused FIPs instead of requesting a new one.
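A rough sketch of such reuse with gophercloud's networking/v2 floating-IP API, assuming a FIP without a port is safe to treat as unused:

    package openstack

    import (
        "github.com/gophercloud/gophercloud"
        "github.com/gophercloud/gophercloud/openstack/networking/v2/extensions/layer3/floatingips"
        "github.com/gophercloud/gophercloud/pagination"
    )

    // findUnusedFloatingIP returns the first floating IP that is not attached
    // to any port, or nil if none exists and a new one must be allocated.
    func findUnusedFloatingIP(client *gophercloud.ServiceClient) (*floatingips.FloatingIP, error) {
        var unused *floatingips.FloatingIP
        err := floatingips.List(client, floatingips.ListOpts{}).EachPage(
            func(page pagination.Page) (bool, error) {
                fips, err := floatingips.ExtractFloatingIPs(page)
                if err != nil {
                    return false, err
                }
                for i := range fips {
                    if fips[i].PortID == "" { // unattached => reusable
                        unused = &fips[i]
                        return false, nil // stop paging
                    }
                }
                return true, nil
            })
        return unused, err
    }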
Right now we maintain the kubelet config as part of the distro-specific templates. This has some drawbacks:
Instead it would be easier if we just used kubeadm join to configure the kubelet.
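For illustration, joining a node then boils down to a single kubeadm invocation on the instance (endpoint, token, and hash are placeholders):

    kubeadm join 10.0.0.1:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>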
At the moment tests replace desired fields in the manifest based on string matching. For example:
    params = fmt.Sprintf("%s,<< MACHINE_NAME >>=%s,<< NODE_NAME >>=%s", params, machineName, nodeName)
    params = fmt.Sprintf("%s,<< OS_NAME >>=%s,<< CONTAINER_RUNTIME >>=%s,<< CONTAINER_RUNTIME_VERSION >>=%s", params, testCase.osName, testCase.containerRuntime, testCase.containerRuntimeVersion)
We would like to change that to providing a field path, for example spec.providerConfig.cloudProvider. This would not only look better but would also allow consuming the manifests under the example directory; a sketch follows.
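A minimal sketch of field-path based substitution, using gopkg.in/yaml.v2 (the helper name is illustrative):

    package main

    import (
        "fmt"
        "strings"

        "gopkg.in/yaml.v2"
    )

    // setField walks a dotted path like "spec.providerConfig.cloudProvider"
    // through a decoded YAML document and sets the final key to value.
    func setField(doc map[interface{}]interface{}, path string, value interface{}) error {
        keys := strings.Split(path, ".")
        cur := doc
        for _, k := range keys[:len(keys)-1] {
            next, ok := cur[k].(map[interface{}]interface{})
            if !ok {
                return fmt.Errorf("path element %q not found", k)
            }
            cur = next
        }
        cur[keys[len(keys)-1]] = value
        return nil
    }

    func main() {
        manifest := []byte("spec:\n  providerConfig:\n    cloudProvider: \"\"\n")
        var doc map[interface{}]interface{}
        if err := yaml.Unmarshal(manifest, &doc); err != nil {
            panic(err)
        }
        if err := setField(doc, "spec.providerConfig.cloudProvider", "aws"); err != nil {
            panic(err)
        }
        out, _ := yaml.Marshal(doc)
        fmt.Print(string(out))
    }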
We should add the following test cases:
To be able to properly validate the machine-controller is working as intended, we need some kind of integration testing.
Because it is not possible to both test external PRs automatically and be sure they are not used to steal credentials, this script is not supposed to be executed automatically. Instead it will:
Integration testing for the machine-controller based on the proposal.
Having a simple command-line tool that verifies whether a node has been created serves not only as a good warm-up exercise but also as a handy test tool.
The idea is that we would have a list of predefined machine manifests that need some customisation in terms of credentials. The credentials could be accepted as command-line arguments and passed all the way down to the manifests. After POST'ing the given manifests to the kube-apiserver, the test tool would read the current cluster state in order to determine the correctness of the machine-controller.
The test tool would use the standard client-go library to talk to the API server and would read the kubeconfig configuration file to discover where the cluster is actually located.
Assumptions: the kubeconfig is accessible.
For example, running the following command:

    verify -input path_to_manifest -parameters key=value,key2=value

would print a machine "node-docker" has been created to stdout. A sketch of the node-waiting logic follows.
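A minimal sketch of that wait, assuming a client-go of that era (pre-context Get signatures) and that the machine name is reused as the node name:

    package e2e

    import (
        "time"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/wait"
        "k8s.io/client-go/kubernetes"
    )

    // waitForNode polls until a node with the given name shows up.
    func waitForNode(client kubernetes.Interface, name string, timeout time.Duration) error {
        return wait.Poll(5*time.Second, timeout, func() (bool, error) {
            _, err := client.CoreV1().Nodes().Get(name, metav1.GetOptions{})
            if apierrors.IsNotFound(err) {
                return false, nil // keep polling
            }
            if err != nil {
                return false, err
            }
            return true, nil
        })
    }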
What's missing:
Building a Docker image
#56 apparently broke RBAC:
GET https://10.96.0.1:443/api/v1/configmaps?resourceVersion=13569859&timeoutSeconds=346&watch=true
I0203 01:42:59.303274 1 round_trippers.go:439] Response Status: 403 Forbidden in 1 milliseconds
I0203 01:42:59.303287 1 round_trippers.go:442] Response Headers:
I0203 01:42:59.303293 1 round_trippers.go:445] Content-Type: application/json
I0203 01:42:59.303298 1 round_trippers.go:445] X-Content-Type-Options: nosniff
I0203 01:42:59.303302 1 round_trippers.go:445] Content-Length: 277
I0203 01:42:59.303307 1 round_trippers.go:445] Date: Sat, 03 Feb 2018 01:42:59 GMT
I0203 01:42:59.303628 1 request.go:873] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"configmaps is forbidden: User \"system:serviceaccount:kube-system:machine-controller\" cannot watch configmaps at the cluster scope","reason":"Forbidden","details":{"kind":"configmaps"},"code":403}
What surprises me a little is that it watches all configmaps; shouldn't a watch on the cluster-info configmap in the kube-public namespace be enough?
Add a MarshalJSON function to all configVar* types.
Also check if we can refactor the logic in the Unmarshal functions. A sketch follows.
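A hypothetical sketch of what such a MarshalJSON could look like for a string-valued config variable; the type and field names are assumptions, mirroring the plain-value/secret-reference split shown above:

    package providerconfig

    import "encoding/json"

    type GlobalSecretKeySelector struct {
        Namespace string `json:"namespace"`
        Name      string `json:"name"`
        Key       string `json:"key"`
    }

    type configVarString struct {
        Value        string
        SecretKeyRef *GlobalSecretKeySelector
    }

    // MarshalJSON emits either a bare JSON string or a secretKeyRef object,
    // mirroring what the Unmarshal logic accepts.
    func (c configVarString) MarshalJSON() ([]byte, error) {
        if c.SecretKeyRef == nil {
            return json.Marshal(c.Value)
        }
        return json.Marshal(struct {
            SecretKeyRef *GlobalSecretKeySelector `json:"secretKeyRef"`
        }{c.SecretKeyRef})
    }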
Add the following metrics:
For reference, see also:
#159 (comment)
#159
#129
#178 (e2e tests modify manifest by providing a field selector)
As a user I want to be able to spin up worker nodes on vSphere that use CentOS as the distribution.
Acceptance criteria:
We have 2 clusters on dev.kubermatic.io which cannot be deleted because the machine-controller is not able to delete the machines.
Logs:
kubectl -n cluster-dt56ds7tsb logs machine-controller-559788b7f9-89q9v
E0411 07:29:28.561133 1 machine.go:200] machine-kubermatic-dt56ds7tsb-gf4xr failed with: failed to delete machine at cloudprovider, due to instance not found
E0411 07:29:28.594842 1 machine.go:200] machine-kubermatic-dt56ds7tsb-d5pgz failed with: failed to delete machine at cloudprovider, due to instance not found
E0411 07:29:28.613675 1 machine.go:200] machine-kubermatic-dt56ds7tsb-64ql4 failed with: failed to delete machine at cloudprovider, due to instance not found
Current state:
On initial start, we check if a secret with a private SSH key exists.
If no secret is found, we generate one with a private key.
This SSH key is later used when creating instances at cloud providers.
This was done so the user does not have to specify an SSH public key in the machine manifest, as some cloud providers require a public key when creating an instance (DigitalOcean).
All public keys from the machine manifest are deployed via cloud-init.
Desired state:
The whole SSH key logic should be removed.
If a cloud provider requires an SSH key during instance creation:
It should be possible to set the AWS AMI ID.
Right now if a node gets deleted for whatever reason (e.g. manually), the machine-controller will recreate it. This is fine, but results in the node not having an ownerRef.
We need to parse the user-given versions (kubelet & container runtime) to process them correctly in the end, especially so we can accept both v1.9.2 and 1.9.2 as input.
Currently we require the kubelet version to have a leading v, but we don't require it for the container runtime version; see the sketch below.
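A minimal normalisation sketch (whether to strip or to add the leading v is a design choice; stripping is shown here):

    package version

    import "strings"

    // normalizeVersion accepts both "v1.9.2" and "1.9.2" and returns the
    // version without the leading v.
    func normalizeVersion(s string) string {
        return strings.TrimPrefix(strings.TrimSpace(s), "v")
    }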
We need to add a config variable for the securityGroups and should only create a security group on AWS when none is defined, as a convenience/quickstart help.
Also, we should probably log this at loglevel 2.
Right now the machine definition contains all access secrets for the cloud provider it is spawned on. This has two drawbacks:
Instead we want to move the cloud-provider secrets into an actual secret which is then referenced by machines.
To better know if PRs add bugs, we should add the existing end-to-end tests to the CircleCI pipeline.
This requires:
Adding a test-e2e target to CircleCI.
With the implementation of transient and terminal errors, we now correctly set machine.status.errorReason and machine.status.errorMessage when the controller runs into a terminal error.
Transient errors, though, are not reported back; the only way to see those is by investigating the logs.
Instead of just logging, we should emit an event which is attached to the machine; see the sketch below.
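A minimal sketch using client-go's event recorder, assuming an EventRecorder is already wired up in the controller (reason string is illustrative):

    package controller

    import (
        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/client-go/tools/record"
    )

    // reportTransient surfaces a transient reconcile error as a warning event
    // attached to the machine object.
    func reportTransient(recorder record.EventRecorder, machine runtime.Object, err error) {
        recorder.Eventf(machine, corev1.EventTypeWarning, "TransientError", "reconcile failed: %v", err)
    }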
In https://github.com/kubermatic/machine-controller/blob/master/cmd/controller/main.go#L211 we start the informerFactories in a separate goroutine, even though informerFactory.Start(stopCh) itself is non-blocking.
This might be the reason I sometimes see the controller trying to create something that already exists: the lister just doesn't have it yet. See the sketch below.
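A common fix, sketched with client-go's shared informer factory: start the factory and block until the caches have synced before running any workers.

    package controller

    import (
        "fmt"

        "k8s.io/client-go/informers"
    )

    // startAndSync starts all informers and waits for their caches to fill, so
    // listers don't race against objects that already exist on the apiserver.
    func startAndSync(factory informers.SharedInformerFactory, stopCh <-chan struct{}) error {
        factory.Start(stopCh) // non-blocking
        for typ, synced := range factory.WaitForCacheSync(stopCh) {
            if !synced {
                return fmt.Errorf("cache for %v did not sync", typ)
            }
        }
        return nil
    }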
Just observed that deleting a machine does delete the machine at the cloud provider but does not delete the Kubernetes node, at least for Hetzner. Didn't try other providers so far.
First we need general Prometheus support to be merged with #49.
See docs: https://godoc.org/github.com/heptiolabs/healthcheck#example-package--Metrics
Use the leaderelection component from the Kubernetes go-client; a sketch follows.
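A minimal sketch against a recent client-go (older versions use a ConfigMap-based lock and a RunOrDie signature without a context); lease name, namespace, and timings are illustrative:

    package main

    import (
        "context"
        "time"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/leaderelection"
        "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    // runWithLeaderElection blocks, invoking run only while holding the lease.
    func runWithLeaderElection(client kubernetes.Interface, id string, run func(context.Context)) {
        lock := &resourcelock.LeaseLock{
            LeaseMeta:  metav1.ObjectMeta{Name: "machine-controller", Namespace: "kube-system"},
            Client:     client.CoordinationV1(),
            LockConfig: resourcelock.ResourceLockConfig{Identity: id},
        }
        leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
            Lock:          lock,
            LeaseDuration: 15 * time.Second,
            RenewDeadline: 10 * time.Second,
            RetryPeriod:   2 * time.Second,
            Callbacks: leaderelection.LeaderCallbacks{
                OnStartedLeading: run,
                OnStoppedLeading: func() { /* exit or re-campaign */ },
            },
        })
    }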
Right now the e2e tests sometimes fail due to kubectl trying to create the secret when it's already there.
We should probably just not use a secret in the e2e tests and instead put the required credentials into the machine spec.
Doing a make machine-controller results in a binary owned by root.
Can we pass in the UID from the user and do a chown in the build container?
We use operatingSystem: coreos. As it got renamed to Container Linux, we should adapt.
Current state:
On initial start, we check if a secret with a private SSH key exists.
If no secret is found, we generate one with a private key.
This SSH key is later used when creating instances at cloud providers.
This was done so the user does not have to specify an SSH public key in the machine manifest, as some cloud providers require a public key when creating an instance (AWS).
All public keys from the machine manifest are deployed via cloud-init.
Desired state:
The controller should accept a path to a private key via a command-line flag.
If the flag is specified and a valid key is found, that key should be used.
If no flag was specified or the key was not found, the old logic with the secret should apply; a sketch follows.
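A minimal sketch of that flag handling; the PKCS#1/PEM format is an assumption, as is signalling the secret-based fallback via a nil key:

    package main

    import (
        "crypto/rsa"
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "io/ioutil"
    )

    // loadPrivateKey returns the key at path, or nil when no path was given so
    // the caller can fall back to the generated-secret logic.
    func loadPrivateKey(path string) (*rsa.PrivateKey, error) {
        if path == "" {
            return nil, nil // no flag: keep the old secret-based behaviour
        }
        data, err := ioutil.ReadFile(path)
        if err != nil {
            return nil, err
        }
        block, _ := pem.Decode(data)
        if block == nil {
            return nil, fmt.Errorf("no PEM data found in %s", path)
        }
        return x509.ParsePKCS1PrivateKey(block.Bytes)
    }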
We should only create a security group on OpenStack when none is defined.
Also, we should probably log this at loglevel 2.