
cluster-api-provider-vsphere's Issues

Move all yaml and startup scripts out of code or allow overrides

There are quite a few yaml definitions and startup scripts currently captured in code. This makes the development/debug cycle very onerous. It may be fine to keep these in code, but we should allow overrides via clusterctl or other CLI. The following are scripts and yaml definitions in the code:

  1. cluster-apiserver yaml in the main cluster-api repo at cluster-api/pkg/deployer/clusterapiservertemplate.go
  2. cni yaml in the vsphere provider repo at cluster-api-provider-vsphere/cloud/vsphere/templates.go
  3. master and node startup shell scripts in the provider repo at cluster-api-provider-vsphere/cloud/vsphere/templates.go

As described above, there are yaml definitions both in the code and in the input files to clusterctl. It's a bit of spaghetti, and it forces a rebuild and a docker push to a registry for each new round of debugging.

Proposal:

  • Separate these non-Go definitions and scripts out of the code into their own files.
  • If we choose to keep defaults, we can package these files into the docker container.
  • Add command line parameters to the cluster-apiserver binary to load its yaml definition from a file.
  • Add command line parameters (or entries in provider-components.yaml) so clusterctl can load the cni yaml definition and master/node startup scripts from files.

Add ability to log to a remote log server

Debugging the cluster api stack is very time consuming and complex. If the system could log to a remote log server, it would significantly improve development productivity, and it would also help users of the cluster api. It is not clear whether glog can concurrently write to a remote server or whether we need a sidecar container to stream the logs to the server.

Feature: persist bootstrap cluster config in the event of crash

We can have a scenario where a single bootstrap cluster is created and never destroyed. In the event that this cluster goes down, there needs to be a persistent source of truth with which we can recreate the bootstrap cluster. Part of this issue is picking the persistent store. Some ideas are native vSphere store or an etcd cluster outside of kubernetes.

Deleting the cluster object takes too long and often fails

When deleting an existing target cluster, clusterctl usually takes tens of minutes or simply times out on the last step in the workflow: deleting the cluster objects. The suspicion is that the cluster object has duplicate finalizers, as shown in the snippet below:

apiVersion: cluster.k8s.io/v1alpha1
kind: Cluster
metadata:
  creationTimestamp: 2018-09-10T23:00:58Z
  deletionGracePeriodSeconds: 0
  deletionTimestamp: 2018-09-10T23:01:59Z
  finalizers:
  - cluster.cluster.k8s.io
  - cluster.cluster.k8s.io
  - cluster.cluster.k8s.io

The above is from the cluster object on the bootstrap cluster.
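The duplicated entries suggest the finalizer is appended unconditionally on every reconcile. A minimal sketch of an idempotent add (the function name is hypothetical) would prevent the duplication:

```go
package main

import "fmt"

// addFinalizer returns finalizers with f appended only if it is not
// already present, keeping repeated reconciles idempotent.
func addFinalizer(finalizers []string, f string) []string {
	for _, existing := range finalizers {
		if existing == f {
			return finalizers
		}
	}
	return append(finalizers, f)
}

func main() {
	var fs []string
	// Simulate three reconcile passes, as in the snippet above.
	for i := 0; i < 3; i++ {
		fs = addFinalizer(fs, "cluster.cluster.k8s.io")
	}
	fmt.Println(len(fs)) // one copy, not three
}
```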

Add provider specific cluster controller for vsphere provider

The "API Endpoint" for the cluster should ideally be handled by a provider specific cluster controller. Without one, we depend on the clusterctl client to drive the workflow, since the CLI currently updates the API endpoint.

We need to add a cluster controller for the vsphere implementation and have it maintain the "API Endpoint" for the cluster.

This topic has been discussed in the following places:
https://github.com/kubernetes-sigs/cluster-api/issues/158
kubernetes-sigs/cluster-api#467 (comment)
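A rough sketch of what such a controller could do, using simplified stand-in types rather than the real cluster-api API (the lookup function and port are illustrative):

```go
package main

import "fmt"

// Simplified stand-ins for the cluster-api types; a real controller
// would use clusterv1.Cluster and the provider interfaces.
type APIEndpoint struct {
	Host string
	Port int
}

type Cluster struct {
	Name         string
	APIEndpoints []APIEndpoint
}

// reconcileAPIEndpoint mirrors what a vsphere cluster controller could
// do: look up the master VM's IP (a hypothetical lookup here) and
// record it on the cluster object instead of relying on clusterctl.
func reconcileAPIEndpoint(c *Cluster, lookupMasterIP func(name string) (string, error)) error {
	if len(c.APIEndpoints) > 0 {
		return nil // already set; nothing to do
	}
	ip, err := lookupMasterIP(c.Name)
	if err != nil {
		return err
	}
	c.APIEndpoints = append(c.APIEndpoints, APIEndpoint{Host: ip, Port: 6443})
	return nil
}

func main() {
	c := &Cluster{Name: "test1"}
	_ = reconcileAPIEndpoint(c, func(string) (string, error) { return "192.168.218.150", nil })
	fmt.Println(c.APIEndpoints[0].Host)
}
```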

Bug: clusterctl using the minikube to bootstrap the external cluster fails

This bug involves the minikube vsphere PR to use Fusion/Workstation as the vm driver.

When I use minikube directly to create the bootstrap cluster and then call clusterctl, the cli is able to get past applying the cluster api server onto it.

  1. minikube --vm-driver vmware --bootstrapper kubeadm start
  2. clusterctl create cluster ...

When I use clusterctl with no existing external cluster, it fails to create an external cluster with minikube.

I0821 10:00:10.499364   73247 minikube.go:46] Running: minikube [start --bootstrapper=kubeadm --vm-driver=vmware]
I0821 10:01:41.570315   73247 minikube.go:50] Ran: minikube [start --bootstrapper=kubeadm --vm-driver=vmware] Output: Starting local Kubernetes v1.9.3 cluster...
Starting VM...
Getting VM IP address...
Moving files into cluster...
Setting up certs...
Connecting to cluster...
Setting up kubeconfig...
Starting cluster components...
Kubectl is now configured to use the cluster.
Loading cached images from config file.
I0821 10:01:41.580665   73247 loader.go:357] Config loaded from file /var/folders/2d/23rfh_dx16s92sy13wzqdvxw0000gr/T/207938425
I0821 10:01:41.586088   73247 clusterdeployer.go:129] Applying Cluster API stack to external cluster
I0821 10:01:41.586242   73247 clusterdeployer.go:312] Applying Cluster API APIServer
I0821 10:01:42.222581   73247 clusterclient.go:381] Waiting for kubectl apply...
I0821 10:01:42.666698   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:01:42.683291   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 404 Not Found in 14 milliseconds
I0821 10:01:42.684082   73247 request.go:1075] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0821 10:01:52.688428   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:01:52.689665   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 404 Not Found in 1 milliseconds
I0821 10:01:52.690056   73247 request.go:1075] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0821 10:02:02.687728   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:02:02.689234   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 404 Not Found in 1 milliseconds
I0821 10:02:02.689548   73247 request.go:1075] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0821 10:02:12.687086   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:02:12.688202   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 404 Not Found in 1 milliseconds
I0821 10:02:12.688546   73247 request.go:1075] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0821 10:02:22.687213   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:02:22.688728   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 404 Not Found in 1 milliseconds
I0821 10:02:22.688917   73247 request.go:1075] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0821 10:02:32.682933   73247 clusterclient.go:409] Waiting for Cluster v1alpha resources to become available...
I0821 10:02:32.694161   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1 200 OK in 11 milliseconds
I0821 10:02:32.697090   73247 clusterclient.go:422] Waiting for Cluster v1alpha resources to be listable...
I0821 10:02:32.730378   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/clusters 200 OK in 31 milliseconds
I0821 10:02:32.732530   73247 clusterdeployer.go:318] Applying Cluster API Provider Components
I0821 10:02:32.732554   73247 clusterclient.go:381] Waiting for kubectl apply...
I0821 10:02:33.114937   73247 clusterdeployer.go:134] Provisioning internal cluster via external cluster
I0821 10:02:33.114975   73247 clusterdeployer.go:136] Creating cluster object test1 on external cluster
I0821 10:02:33.139273   73247 round_trippers.go:436] POST https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/clusters 201 Created in 23 milliseconds
I0821 10:02:33.140314   73247 clusterdeployer.go:141] Creating master
I0821 10:02:33.155061   73247 round_trippers.go:436] POST https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines 201 Created in 14 milliseconds
I0821 10:02:33.156248   73247 clusterclient.go:433] Waiting for Machine vs-master-8x4mt to become ready...
I0821 10:02:33.165717   73247 round_trippers.go:436] GET https://192.168.218.129:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/vs-master-8x4mt 200 OK in 9 milliseconds

Minikube never creates the VM. This fails on both Mac with Fusion and Linux with Workstation.

Refactor template.go out of terraform provisioner

Now that we are adding a govmomi based implementation as well, the script templates as they stand today should be usable by both the govmomi and terraform implementations. template.go therefore needs to be refactored so that both implementations can consume it.

ci/cd: create pods/containers/scripts to kick off clusterctl tests in CI/CD

Based on what I've been told, we will likely kick off clusterctl tests by creating a pod on the existing kubernetes cluster in CI. We need to write the pods, containers, and scripts that kick off clusterctl tests in CI. The following categories of tests are needed:

  1. clusterctl create cluster
  2. clusterctl delete cluster
  3. clusterctl validate cluster

Note, each of these tests must be able to validate each command-line parameter. This issue may actually be an epic, as it is a large task.
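As a sketch (the flag values are placeholders for CI-provided files), each test category maps to one clusterctl invocation that a CI pod would run:

```go
package main

import (
	"fmt"
	"strings"
)

// testCmd builds the clusterctl invocation a CI pod would run for one
// test category; the file names are placeholders for CI artifacts.
func testCmd(category string) []string {
	base := []string{"clusterctl", category, "cluster"}
	switch category {
	case "create":
		return append(base,
			"-m", "machines.yaml",
			"-c", "cluster.yaml",
			"-p", "provider-components.yaml",
			"--provider", "vsphere")
	case "delete", "validate":
		return base
	}
	return nil
}

func main() {
	for _, c := range []string{"create", "delete", "validate"} {
		fmt.Println(strings.Join(testCmd(c), " "))
	}
}
```

Parameter-level coverage would then be generated by varying one flag at a time over this base invocation.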

Bug: clusterctl fails when using an existing external cluster

Using clusterctl from this repo fails when using an existing cluster. With a nested ESX on Fusion on a Mac, I attempted to deploy a cluster with the following command:

clusterctl create cluster --existing-bootstrap-cluster-kubeconfig ~/.kube/config -m machines.yaml -c cluster.yaml -p provider-components.yaml --provider vsphere -v 6

What I see is a constant loop waiting for the master to come up:

I0821 08:55:10.288137   70954 clusterdeployer.go:136] Creating cluster object test1 on external cluster
I0821 08:55:10.376143   70954 round_trippers.go:436] POST https://192.168.218.131:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/clusters 201 Created in 87 milliseconds
I0821 08:55:10.377509   70954 clusterdeployer.go:141] Creating master
I0821 08:55:10.474704   70954 round_trippers.go:436] POST https://192.168.218.131:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines 201 Created in 96 milliseconds
I0821 08:55:10.476449   70954 clusterclient.go:433] Waiting for Machine vs-master-9pmdb to become ready...
I0821 08:55:10.480937   70954 round_trippers.go:436] GET https://192.168.218.131:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/vs-master-9pmdb 200 OK in 4 milliseconds
I0821 08:55:20.480953   70954 clusterclient.go:433] Waiting for Machine vs-master-9pmdb to become ready...
I0821 08:55:20.576983   70954 round_trippers.go:436] GET https://192.168.218.131:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/vs-master-9pmdb 200 OK in 95 milliseconds
I0821 08:55:30.484846   70954 clusterclient.go:433] Waiting for Machine vs-master-9pmdb to become ready...
I0821 08:55:30.573614   70954 round_trippers.go:436] GET https://192.168.218.131:8443/apis/cluster.k8s.io/v1alpha1/namespaces/default/machines/vs-master-9pmdb 200 OK in 88 milliseconds
I0821 08:55:40.484931   70954 clusterclient.go:433] Waiting for Machine vs-master-9pmdb to become ready...

What I see from the machine actuator's logs (via kubectl logs):

ERROR: logging before flag.Parse: I0821 15:55:34.303364       1 queue.go:38] Start NodeWatcher Queue
ERROR: logging before flag.Parse: I0821 15:55:34.305496       1 queue.go:38] Start Machine Queue
ERROR: logging before flag.Parse: I0821 15:55:34.385695       1 controller.go:91] Running reconcile Machine for vs-master-9pmdb
ERROR: logging before flag.Parse: I0821 15:55:34.475709       1 machineactuator.go:175] Attempting to stage tf state for machine vs-master-9pmdb
ERROR: logging before flag.Parse: I0821 15:55:34.475913       1 machineactuator.go:177] machine does not have annotations, state does not exist
ERROR: logging before flag.Parse: I0821 15:55:34.475995       1 machineactuator.go:658] Instance existance checked in directory
ERROR: logging before flag.Parse: I0821 15:55:34.476064       1 controller.go:134] reconciling machine object vs-master-9pmdb triggers idempotent create.
ERROR: logging before flag.Parse: I0821 15:55:34.672349       1 machineactuator.go:201] Cleaning up the staging dir for machine vs-master-9pmdb
ERROR: logging before flag.Parse: I0821 15:55:34.673281       1 machineactuator.go:175] Attempting to stage tf state for machine vs-master-9pmdb
ERROR: logging before flag.Parse: I0821 15:55:34.673299       1 machineactuator.go:177] machine does not have annotations, state does not exist
ERROR: logging before flag.Parse: I0821 15:55:34.673303       1 machineactuator.go:284] Staged for machine create at /tmp/cluster-api/machines/vs-master-9pmdb/

ci/cd: add software to CI to run cluster api with a minikube bootstrap cluster

To test clusterctl with a bootstrap cluster using minikube, we need some software installed on some machines in CI. Specifically,

  1. VMware Workstation
  2. minikube with VMware support built in (kubernetes/minikube#2606)
  3. the docker-machine driver for VMware (https://github.com/machine-drivers/docker-machine-driver-vmware/releases)

It doesn't matter if these components are installed on some static VMs or on the CI/CD kubernetes cluster nodes as long as we can run clusterctl on the machine that these components are installed on.

Note, this requirement is contingent on the minikube bootstrap workflow remaining in clusterctl; there are currently no discussions about removing it.

Replace the terraform code with govmomi

Terraform calls govmomi underneath, so using Terraform for the vSphere provider adds yet another layer to the provisioning stack that we need to debug. In addition, Terraform keeps its own state file as its source of truth. We've found from previous projects that letting vSphere be the source of truth is much less problematic in the long run.

Feature: support zones

We've just added zone support in the vSphere cloud provider. We should add the ability to recognize zones when performing CRUD operation on a cluster.

Make Cluster object provide the status of the realized cluster

The current ClusterStatus object does not contain any high level field that someone could look at to tell whether the target cluster is ready for consumption. The very definition of ready is subjective, since we could claim readiness at different stages:

  1. the master(s) are ready and the APIs are up
  2. the workers are all deployed and the nodes are in Ready state

This is an open issue currently being discussed in the cluster-api SIG.
Given that, I propose to further enrich the ProviderStatus field in the Cluster object to track part of this status. Since the definition of Ready for the cluster is not formalized, the proposal is to add an APIStatus field to ProviderStatus indicating whether the Kubernetes APIs in the target cluster are ready to be interacted with.

Create a docker container that contains clusterctl and minikube drivers

Currently, using clusterctl to create a cluster with minikube in the bootstrap workflow requires a lot of manual steps just to get minikube working. There should be a way to create a docker container that has clusterctl and all the necessary minikube drivers.

This issue differs from #6. That issue is for creating a Dockerfile for a container to build clusterctl. This issue is to actually build a container for clusterctl and drivers.

Add vcsim tests for the machine actuator's create workflow

We can use vcsim (the vCenter simulator) to run unit tests for the machine actuator. We need unit tests for the existing machine actuator. There may be some problems running Terraform against vcsim; we need to identify those issues and raise them in the vcsim repo.

Feature: isolated pod support

Kubernetes will support isolated pods (e.g. pod VMs) in the future. There is a proposal for RuntimeClass that would support in-host isolated pods (e.g. Kata and gVisor), though RuntimeClass may not be the only way isolated pods are supported. We should support this feature when deploying a cluster to vSphere.

Feature: Add multiple nic support

Add support for multiple NICs on the node machines.
This will help set up the cluster so that the k8s API network and the data plane network can be isolated from each other.

Feature: device support

Support VM nodes that have access to the host's devices (e.g. GPU). There is currently a proposal for ResourceClass that will allow this support.

Cloud provider config should not come from master config

The cloud provider configuration set up for the cluster today is rendered from the machine variables used for the master node. This assumption is not correct: the machine variables describe where and what to deploy as a master node; they are not meant to serve as input for the cloud provider config that the target kubernetes cluster will use.
We need to clearly abstract out the information intended for generating the cluster's cloud provider configuration.
One possible place for this information is the Cluster object's providerConfig. Currently it only contains the VC endpoint and credentials, so we would need to expand that definition.

Feature: custom cloud init scripts

The cloud init scripts are currently hardcoded in the template.go file. We can make those the default, but we should also allow users to pass in custom user-defined scripts.

Design and implement a central store to track clusters created by the cluster api

For enterprise use cases, we want a central source of truth for information about clusters created with the cluster api:

  1. Identity of the clusters created via vSphere console
  2. Identity of the clusters created via clusterctl
  3. Retrieval of the kubeconfig for the cluster

Background:

Today, the Terraform provisioner keeps information about the cluster in various places, including config maps within kubernetes and the terraform state file. The cluster api has two modes of operation: pivot and non-pivot. In pivot mode, a bootstrap cluster is created and then used to deploy the target cluster; afterwards, the bootstrap cluster is deleted, leaving information about the cluster only in the target cluster itself. If multiple users create clusters using clusterctl, there is no central way to identify all the clusters created on the vCenter.

Options:

  1. Use vSphere tags to identify clusters, masters, and machines.
  • The machine actuator can retrieve the masters' IPs and pull the kubeconfig from the api server of a specific cluster.
  2. Assume a bootstrap cluster always exists (a deployed OVA with the cluster) and keep all information in kubernetes objects in that cluster.
  3. A combination of 1 and 2: use tags so the vSphere console can quickly identify clusters, and use the bootstrap cluster's kubernetes objects for all other information.
  4. Implement a vSphere extension. This option may not be viable, as extensions might get deprecated.

Feature: Support spreading nodes across multiple VC

When deploying a machineset, we should support specifying a different VC in the spec that overrides the VC specified in the cluster object. That way, from the infrastructure point of view, we can spread the node VMs across VCs. The assumption is that the user deploying such a cluster ensures the VMs created across VCs have proper L3 connectivity to each other.

Feature: give clusterctl the ability to query created clusters

Give clusterctl the ability to list created clusters and to describe a given cluster. Currently, clusterctl can create and delete clusters; a user can create multiple clusters, but there is no way to query existing clusters or describe their configuration from clusterctl.

Calico Manifest assumes Service IP range

From @krousey on June 11, 2018 16:57

A Calico manifest has a hard-coded service IP that is assumed to fall within the cluster's service cidr. The assumption is made at https://github.com/kubernetes-sigs/cluster-api/blob/7fdecc5cc4b4174ab5c540a027fcfccc7183f66f/cloud/vsphere/templates.go#L372 and https://github.com/kubernetes-sigs/cluster-api/blob/7fdecc5cc4b4174ab5c540a027fcfccc7183f66f/cloud/vsphere/templates.go#L492

This needs to be an address that falls within the cluster's service CIDR, because the CIDR can be changed by the end-user here

/kind cluster-api-vsphere
/cc @karan

Copied from original issue: kubernetes-sigs/cluster-api#324
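Rather than hard-coding the address, the manifest value could be derived from the cluster's actual service CIDR. A minimal IPv4 sketch (the helper name and offset are illustrative):

```go
package main

import (
	"fmt"
	"net"
)

// serviceIP returns the nth address inside the service CIDR, so a
// manifest value like a service IP can be computed from the cluster's
// configured cidr instead of being hard-coded.
func serviceIP(cidr string, n uint32) (string, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return "", fmt.Errorf("only IPv4 handled in this sketch")
	}
	// Add the offset to the network address.
	v := uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3])
	v += n
	out := net.IPv4(byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
	if !ipnet.Contains(out) {
		return "", fmt.Errorf("offset %d outside %s", n, cidr)
	}
	return out.String(), nil
}

func main() {
	ip, _ := serviceIP("10.96.0.0/12", 10)
	fmt.Println(ip) // 10.96.0.10
}
```

The template would then substitute the computed address wherever the manifest currently hard-codes one.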
