metal3-io / cluster-api-provider-baremetal Goto Github PK

View Code? Open in Web Editor NEW

62.0 62.0 61.0 67.25 MB

DEPRECATED Please refer to https://github.com/metal3-io/cluster-api-provider-metal3

License: Apache License 2.0

Dockerfile 0.36% Makefile 3.74% Go 89.36% Shell 3.79% Python 2.49% Starlark 0.26%

cluster-api-provider-baremetal's People

Contributors

Stargazers

Watchers

cluster-api-provider-baremetal's Issues

[v1alpha2] refactor test suite for controllers folder

Currently the test suite for the controllers folder heavily relies on functionality from MachineManager and ClusterManager. We should decouple the concerns to do proper unit testing. So we should mock the two managers. We need to add an interface and a fakeManager factory for both managers

Add support for Cluster provisioning

This is probably more of an epic that needs to be broken down into smaller tasks, but this is a starting point.

We started cluster-api-provider-baremetal only supporting the Machine actuator interface.

Next we need to add optional support for the higher level cluster-api primitives that allow provisioning a cluster configured with kubeadm.

It would be nice to see what code could be shared with other providers in terms of the kubeadm integration. Each provider seems to have its own code for this, and some of this should be shared. There's also a node lifecycle workstream under cluster-api right now looking at these issues, among other things.

Another "works on x86_64 only"-project

Add addresses to Machine

We need to populate the addresses field of Machines.

The contents must match what shows up in the associated Node, but the information must come from elsewhere, as it's needed sooner.

This has a dependency on the baremetal-operator completing integration of host introspection. The host introspection data will contain IP and hostname information that we can then copy into the Machine.

[v1alpha2] Pivoting to target cluster

Once target cluster's controlplane is up and running, we should pivot the source cluster provider components (controllers, CRDs, CRs ) , CAPI objects, BMH etc to the target cluster so as to make the target cluster independent and autonomous for the rest of its life-cycle and make the ephimeral/bootstrap cluster node free. We should look into the pivoting carefully in baremetal scenario since we have extra objects to pivot (BMH, secrets) and also we need to keep the status of few of the cluster components.

[v1alpha2] Make setError functions generic

The value of Status.ErrorMessage and Status.ErrorReason need to be set in the controller code. To this end, four SetError* methods are created, one for each Object type. We need to centralize the error related code as it is easier to maintain.

To Do:

Create a folder for error related parts (much like cluster-api/errors/)
Define new error constants specific to BareMetalMachine and clusters
Define generic SetError and ClearError functions that all four objects in a generic way.

hack/ensure-xxx does not work properly

root@gyliu-dev1:~/go/src/github.com/metal3-io/cluster-api-provider-baremetal# ./hack/ensure-kind.sh
++ go env GOPATH
+ GOPATH_BIN=/root/go/bin/
+ MINIMUM_KIND_VERSION=v0.4.0
+ verify_kind_version
++ command -v kind
+ '[' -x /root/go/bin/kind ']'
+ local kind_version
++ kind version
+ kind_version='kind v0.6.0 go1.13.4 linux/amd64'
++ sort -s -t. -k 1,1 -k 2,2n -k 3,3n
++ echo -e 'v0.4.0\nkind v0.6.0 go1.13.4 linux/amd64'
++ head -n1
+ [[ v0.4.0 != kind v0.6.0 go1.13.4 linux/amd64 ]]
+ cat
Detected kind version: kind v0.6.0 go1.13.4 linux/amd64.
Requires v0.4.0 or greater.
Please install v0.4.0 or later.
+ return 2

I was using Kind 0.6.0 but it still report error.

only set image on hosts we want provisioned

We should only set the image fields on hosts that we want provisioned. That means if a host already has an image, we should not change the settings and if the host already has a machine reference (indicating that it was externally provisioned) we should not add an image.

actuator sets config drive secret on BareMetalHost

A "config drive" Secret will be present in the cluster, and it will contain data that the BareMetalHost needs in order to configure a provisioned host. For now the secret reference is always the same, so it can be hard-coded. In the future the data may be read from the Machine's provider spec.

This task is to ensure the actuator writes the values to corresponding fields on the BareMetalHost.

As an example, currently in openshift the secret that contains worker config data is showing up with this identity:

  name: worker-user-data
  namespace: openshift-machine-api

CAPB is missing support for -namespace flag

Pod errors out with:

[root@dell-r730-021 dev-scripts]# oc logs machine-api-controllers-fcd484fb5-qgzbh -n openshift-machine-api -c machine-controller
flag provided but not defined: -namespace
Usage of /machine-controller-manager:

Caused by openshift/machine-api-operator#315

Openstack hit the same issue: openshift/cluster-api-provider-openstack#47

Kubebuilder v2 migration

First step towards v1alpha2 we are trying to migrate current CAPBM to use Kubebuilder v2.

[v1alpha2] flake on TestAssociate for baremetal machine manager

The TestAssociate case is flaky :

 --- FAIL: TestAssociate (0.00s)
    baremetalmachine_manager_test.go:1620: ## TC-Associate machine, host empty, baremetal machine spec set ##
    baremetalmachine_manager_test.go:1636: requeue in: 30s
    baremetalmachine_manager_test.go:1620: ## TC-Associate empty machine, baremetal machine spec nil ##
    baremetalmachine_manager_test.go:1620: ## TC-Associate empty machine, baremetal machine spec set ##
    baremetalmachine_manager_test.go:1620: ## TC-Associate empty machine, host empty, baremetal machine spec set ##

Cloud-init handling in v1alpha2

This issue is related to #101.
Since we are working with physical nodes, we might have node-specific configuration (for example networking). This configuration is tightly tied to the physical node. However, the cloud-init configuration in v1alpha2 is coming from CABPK, hence the possible need for a reconciliation between the content generated by CABPK and the additional configuration related to the node itself, or to the setup of the control-plane load-balancer.
A possibility is to create an additional field containing the user configuration in BaremetalHost and merge its content with the cloud-init generated by CABPK in the CAPI-provider-baremetal controller to generate the final cloud-init to populate the userdata field in the BaremetalHost before provisioning.
We will submit a design document for discussion.

Add support for cluster-api v1alpha2

The current code only supports the Machine interface of the v1alpha1 API. At some point we should create a branch to maintain the v1alpha1 code, and evolve master to support the work happening in cluster-api for v1alpha2.

Quoting https://discuss.kubernetes.io/t/the-actuator-interfaces-have-gone-in-the-latest-source-code/7360/2

Hi, there are no interfaces for providers to implement any more. Instead, providers implement full, regular controllers that watch provider-specific custom resources. We have an open PR to describe the differences between v1alpha1 and v1alpha2 - please see kubernetes-sigs/cluster-api#1211. Also take a look at https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/proposals/20190610-machine-states-preboot-bootstrapping.md for more details on some of the design changes coming in v1alpha2.

Consider Setting ProviderID field on Machines

We current annotate Machines to identify the associated BareMetalHost:

apiVersion: v1
items:
- apiVersion: cluster.k8s.io/v1alpha1
  kind: Machine
  metadata:
    annotations:
      metal3.io/BareMetalHost: metal3/master-0
 ...

The spec for a Machine also contains a providerID field, which we do not currently set.

            providerID:
              description: ProviderID is the identification ID of the machine provided
                by the provider. This field must match the provider ID as seen on
                the node object corresponding to this machine. This field is required
                by higher level consumers of cluster-api. Example use case is cluster
                autoscaler with cluster-api as provider. Clean-up logic in the autoscaler
                compares machines to nodes to find out machines at provider which
                could not get registered as Kubernetes nodes. With cluster-api as
                a generic out-of-tree provider for autoscaler, this field is required
                by autoscaler to be able to have a provider view of the list of machines.
                Another list of nodes is queried from the k8s apiserver and then a
                comparison is done to find out unregistered machines and are marked
                for delete. This field will be set by the actuators and consumed by
                higher level entities like autoscaler that will be interfacing with
                cluster-api as generic provider.
              type: string

It seems that it would be appropriate for us to set this field with something similar to the BareMetalHost annotation.

Based on this description though, perhaps this should not be done until we can also ensure that the providerID field is set equivalently on Nodes.

Despite the written description here about how providerID is used on a Machine, I don't see any code that actually uses it (yet?).

Note that there's more strict documentation on the format of the providerID on a Node:

"providerID":    "ID of the node assigned by the cloud provider in the format: <ProviderName>://<ProviderSpecificNodeID>",

Unable to setup dev environment for cluster-api

My aim is to setup development environment for cluster api using baremetal provider. I am using minikube for this.

Following link https://github.com/metal3-io/cluster-api-provider-baremetal/blob/master/docs/dev/minikube.md

What I did

kubectl apply -k vendor/sigs.k8s.io/cluster-api/config/crds/
kubectl apply -f vendor/github.com/metal3-io/baremetal-operator/deploy/crds/metal3_v1alpha1_baremetalhost_crd.yaml
Followed all the steps for creation of baremetal host https://github.com/metal3-io/baremetal-operator/blob/master/docs/dev-setup.md

But following this link I am stuck at command operator-sdk up local --namespace=metal3

INFO[0000] Running the operator locally.
INFO[0000] Using namespace metal3.
{"level":"info","ts":1560334189.93631,"logger":"cmd","msg":"Go Version: go1.12.5"}
{"level":"info","ts":1560334189.9363344,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1560334189.9363453,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}
{"level":"info","ts":1560334189.9372797,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1560334189.9372985,"logger":"leader","msg":"Skipping leader election; not running in a cluster."}
{"level":"info","ts":1560334189.9877634,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1560334189.9880195,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"metal3-baremetalhost-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1560334189.9881308,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"metal3-baremetalhost-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1560334189.9882035,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1560334190.0884702,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"metal3-baremetalhost-controller"}
{"level":"info","ts":1560334190.1887379,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"metal3-baremetalhost-controller","worker count":1}

After waiting for half an hour, I am still seeing this
Would be very helpful if someone could help me understand what can be done to overcome this issue.

Machine controller does not see if a chosen BareMetalHost is deleted

cluster-api-provider-baremetal was originally coded to assume that a BareMetalHost could not be deleted if it was associated with a Machine. There's a bug open to prevent this case in the baremetal-operator: metal3-io/baremetal-operator#34

The BareMetalHost spec is changing this reference from MachineRef to a more generic field, so it really shouldn't make Machine related behavior based on this field. Instead, I think we should just ensure we reconcile Machines on BareMetalHost changes to ensure we catch a Machine that no longer has its underlying BareMetalHost.

Machine deletion is broken

PR #86 changed the behavior of machine deletion. The intention was to have the machine actuator delay Machine deletion until the BareMetalHost was deprovisioned.

It does this by first clearing the image field on the BareMetalHost and then watching BMH host status until it later will clear the ConsumerRef field.

The problem with this approach is that the baremetal-operator treats a BMH with a ConsumerRef field set but no image as an "externally provisioned" host. As a result, machine deletion can never finish, as a deleted machine turns the BMH into an "externally provisioned" host.

See metal3-io/metal3-dev-env#26 for where this problem was originally reported.

[v1alpha3] study and migration

Cluster API v1alpha3 is on its way and we need to study what it brings along and how these changes are migrated to CAPBM.

Switch to upstream cluster-api

This repository currently uses openshift/cluster-api. We need to fix that and move to the upstream kubernetes-sigs/cluster-api repository instead.

The openshift/cluster-api-provider-baremetal fork can contain the changes necessary to use OpenShift's copy.

Update dev env instructions for upstream Kubernetes

The current dev env instructions only talk about developing against an OpenShift cluster. After #26 merges, we should fix the dev env instructions to reflect how to run against an upstream Kubernetes dev env.

Test crashes after chooseHost() fails due to live checksum lookup

go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/home/rbryant/go/src/github.com/metalkube/cluster-api-provider-baremetal/config/crds' 
RBAC manifests generated under '/home/rbryant/go/src/github.com/metalkube/cluster-api-provider-baremetal/config/rbac' 
kustomize build config/default/ > provider-components.yaml
2019/03/22 14:20:31 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
echo "---" >> provider-components.yaml
cd vendor && kustomize build sigs.k8s.io/cluster-api/config/default/ >> ../provider-components.yaml
2019/03/22 14:20:31 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
go test ./pkg/... ./cmd/... -coverprofile cover.out
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis/baremetal	[no test files]
ok  	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis/baremetal/v1alpha1	10.400s	coverage: 23.7% of statements
2019/03/22 14:20:34 looking for checksum for http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2 at http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2.md5sum
--- FAIL: TestChooseHost (30.01s)
    actuator_test.go:121: Get http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2.md5sum: dial tcp 172.22.0.1:80: i/o timeout
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x158 pc=0xe7a24a]

goroutine 20 [running]:
testing.tRunner.func1(0xc000236700)
	/usr/lib/golang/src/testing/testing.go:792 +0x387
panic(0xf4fe20, 0x1aea420)
	/usr/lib/golang/src/runtime/panic.go:513 +0x1b9
github.com/metalkube/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine.TestChooseHost(0xc000236700)
	/home/rbryant/go/src/github.com/metalkube/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator_test.go:123 +0x7ea
testing.tRunner(0xc000236700, 0x10d37e8)
	/usr/lib/golang/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:878 +0x35c
FAIL	github.com/metalkube/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine	30.030s
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/controller	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/webhook	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/cmd/manager	[no test files]
make: *** [Makefile:17: unit] Error 1

Note this line that causes the failure:

    actuator_test.go:121: Get http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2.md5sum: dial tcp 172.22.0.1:80: i/o timeout

actuator needs to wait for host to be deprovisioned before returning from Delete()

In order to correctly fence hosts, we need to ensure that the Machine objects are not deleted until the host has been wiped, powered off, or both -- so that it is no longer trying to be a node.

It should be able to determine that the host is wiped by watching for its Status.Provisioning.Image field to have an empty image.

Today, the Machine controller appears to support returning a special type of error to indicate that a Machine needs to be requeued [1]. That works differently in the OpenShift version of the controller code [2], though, so we either need to update the copy of the cluster-api code used there or do something else.

[1] https://github.com/metal3-io/cluster-api-provider-baremetal/blob/master/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine/controller.go#L172
[2] https://github.com/openshift/cluster-api-provider-baremetal/blob/master/vendor/github.com/openshift/cluster-api/pkg/controller/machine/controller.go#L206

LeaderElectionID should be unique

This issue was reported in other providers - Running several Cluster API pods (e.g. CAPI + CABPK + an infra provider) in the same namespace results in leader election errors.
More details here - kubernetes-retired/cluster-api-bootstrap-provider-kubeadm#271

Define release guidelines, and start publishing releases using semantic versioning

Currently, metal-3 sub-projects capbm and bmo do not publish releases.

We should adopt a releasing approach similar to that of CAPI and follow the well-known semantic versioning guidelines (see references below).

Besides this being a good practice, integration with the redesigned clusterctl will require that projects follow the above mentioned releasing strategy.

References:
https://github.com/kubernetes-sigs/cluster-api/blob/master/docs/developer/releasing.md
https://semver.org/

Add support for a hostSelector field for specifying BareMetalHost selection criteria

Particularly with bare metal hosts, there will be cases where administrators would like to provision a specific BareMetalHost instance. Our current behavior is simple: choose any available BareMetalHost.

To allow some selection criteria, we should add a hostSelector field (or something similar) which allows a selector expression on labels of BareMetalHost objects.

https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

This selector can either be specified directly on a Machine, or a MachineSet. This needs to reside in our providerSpec, so this PR is a prerequisite: #59

tests fail

I'm seeing this on the master branch.

$ make test
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go crd
CRD files generated, files can be found under path /home/mhrivnak/golang/src/github.com/metalkube/cluster-api-provider-baremetal/config/crds.
kustomize build config/default/ > provider-components.yaml
2019/03/04 10:16:12 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
echo "---" >> provider-components.yaml
cd vendor && kustomize build github.com/openshift/cluster-api/config/default/ >> ../provider-components.yaml
2019/03/04 10:16:21 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
go test ./pkg/... ./cmd/... -coverprofile cover.out
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis/baremetal	[no test files]
2019/03/04 10:16:23 failed to start the controlplane. retried 5 times: fork/exec /usr/local/kubebuilder/bin/etcd: no such file or directory
FAIL	github.com/metalkube/cluster-api-provider-baremetal/pkg/apis/baremetal/v1alpha1	0.011s
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/controller	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/pkg/webhook	[no test files]
?   	github.com/metalkube/cluster-api-provider-baremetal/cmd/manager	[no test files]
make: *** [Makefile:15: test] Error 1

Consider adding nodeRef to Machine.Status

It may be a good idea to populate a machine's Status.NodeRef so that the MachineSet controller can consider each Machine to be ready.

https://github.com/kubernetes-sigs/cluster-api/blob/0.1.0/pkg/controller/machineset/status.go#L49-L64

Create actuator skeleton

First step - fill this repo with a skeleton for the actuator.

I've got a start in https://github.com/russellb/cluster-api-provider-baremetal

Use annotation to identify the associated BareMetalHost

The actuator currently uses a label on the BareMetalHost as the primary way of identifying which host is associated with a Machine.

This should be changed such that the primary source is an annotation on the Machine. If the annotation is not present for some reason, it should iterate over BareMetalHosts looking for a host with a matching MachineRef.

Match against Hosts ready to be provisioned

Once BareMetalHost provides an easy way to determine whether a given host is ready to be provisioned, it should be used as part of the Host selection criteria in the Machine actuator.

Makefile build process broken due to kustomize changes

The Makefile is not currently working and it appears to be due to changed behavior in customize. It will fail with a message like:

Error: unable to find one of 'kustomization.yaml', 'kustomization.yml' or 'Kustomization' in directory '/home/rbryant/go/src/github.com/metalkube/cluster-api-provider-baremetal/config/default'

PR #4 works around this by adding a new, simple "make build" target.

This can be fixed by moving config/default/kustomization.yaml to the config/ directory, and fixing paths. The problem then happens when we run kustomize against our vendored cluster-api.

cd vendor && kustomize build github.com/openshift/cluster-api/config/default/ >> ../provider-components.yaml
Error: rawResources failed to read Resources: Load from path ../crds/cluster_v1alpha1_cluster.yaml failed: security; file '../crds/cluster_v1alpha1_cluster.yaml' is not in or below '/tmp/kustomize-111684861/config/default'

So we'll need to rearrange the files there, first.

Implement Actuator Delete()

The Delete() method of the Actuator must be implemented. Roughly:

Look up the corresponding host. If none found, return success.
Remove the Machine reference from the BareMetalHost

controller-manager is in CrashLoopBackOff

[vagrant@metal3 go]$ kubectl get pods -n metal3
NAME                                                  READY   STATUS             RESTARTS   AGE
cluster-api-controller-manager-0                      1/1     Running            0          51m
cluster-api-provider-baremetal-controller-manager-0   1/2     CrashLoopBackOff   12         51m
metal3-baremetal-operator-6ffb74dfbb-864f6            3/3     Running            0          51m

[vagrant@metal3 go]$ kubectl logs -n metal3        cluster-api-provider-baremetal-controller-manager-0 -c manager
{"level":"info","ts":1559870504.1104558,"logger":"baremetal-controller-manager","msg":"Found API group metal3.io/v1alpha1"}
{"level":"info","ts":1559870504.8181705,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"machine-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1559870505.0203254,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"machine-controller"}
{"level":"info","ts":1559870505.1231196,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"machine-controller","worker count":1}
2019/06/07 01:21:45 Checking if machine centos exists.
2019/06/07 01:21:45 Machine centos exists.
2019/06/07 01:21:45 Updating machine centos .
E0607 01:21:45.325251       1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:573
/usr/local/go/src/runtime/panic.go:502
/usr/local/go/src/runtime/panic.go:63
/usr/local/go/src/runtime/signal_unix.go:388
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:459
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:421
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:187
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine/controller.go:208
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:2361
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xebd65f]

goroutine 125 [running]:
github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x107
panic(0x1122520, 0x1bd1470)
	/usr/local/go/src/runtime/panic.go:502 +0x229
github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine.(*Actuator).nodeAddresses(0xc4203596f0, 0xc4200b3200, 0xc4200a60f0, 0x2, 0xc420301240, 0x19, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:459 +0x5f
github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine.(*Actuator).updateMachineStatus(0xc4203596f0, 0x13953a0, 0xc4200480f0, 0xc42037edc0, 0xc4200b3200, 0x0, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:421 +0x39
github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine.(*Actuator).Update(0xc4203596f0, 0x13953a0, 0xc4200480f0, 0x0, 0xc42037edc0, 0x1, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/pkg/cloud/baremetal/actuators/machine/actuator.go:187 +0x1d3
github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc420524b80, 0xc42035be00, 0x6, 0xc42035bdf0, 0x6, 0x1be40e0, 0x0, 0x0, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/cluster-api/pkg/controller/machine/controller.go:208 +0x6e1
github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc4200e4000, 0x0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215 +0x188
github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1()
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158 +0x36
github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc42024efc0)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc42024efc0, 0x3b9aca00, 0x0, 0x1, 0xc4202c8780)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbd
github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc42024efc0, 0x3b9aca00, 0xc4202c8780)
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
created by github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
	/go/src/github.com/metal3-io/cluster-api-provider-baremetal/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:157 +0x3

Should we rename Metal3 provider ?

In the cluster API office hour, it was requested to change the name of this provider, as baremetal is confusing. Renaming it on CAPI side is rather trivial, as it is mostly documentation. However, should this change be propagated deeper, specifically in this repository, inc. the repo name, metal3-docs and the website (try-it.md in particular), quay.io images etc. ?

A first step could be to list all places where we use the CAPBM name and then decide to which level we should perform the change.

can't build v1alpha1 branch

have go 1.13.5 and export GO111MODULE=on and can't build the branch v1alpha1
(have placed the project in ~/go/src/github.com/metal3-io/cluster-api-provider-baremetal)

$(go mod init)
$(make) fails with error


go: github.com/fsnotify/[email protected] used for two different module paths (github.com/fsnotify/fsnotify and gopkg.in/fsnotify.v1) (

Now I have replaced in all files: gopkg.in/fsnotify -> github.com/fsnotify to get past that error and get a new one:

go: finding golang.org/x/sys v0.0.0-20190712062909-fae7ac547cb7
# k8s.io/client-go/rest
../../../../pkg/mod/k8s.io/[email protected]+incompatible/rest/request.go:598:31: not enough arguments in call to watch.NewStreamWatcher
        have (*versioned.Decoder)
        want (watch.Decoder, watch.Reporter)
make: *** [Makefile:8: build] Error 2

so a mismatch with go-client happened somewhere.

now go.sum says we are using several different versions of client-go , now that doesn't seem right.


k8s.io/client-go v0.0.0-20190228174230-b40b2a5939e4/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=                                                                                                                                                                      
k8s.io/client-go v10.0.0+incompatible h1:F1IqCqw7oMBzDkqlcBymRq1450wD0eNqLE9jzUrIi34=                                                                                                                                                                                           
k8s.io/client-go v10.0.0+incompatible/go.mod h1:7vJpHMYJwNQCWgzmNV+VYUl1zCObLyodBc8nIyt8L5s=

Please instruct how to build this branch.

Integrate with OpenShift's machine API actuator e2e test suite

This came up in the following PR:

openshift/machine-api-operator#235

"FYI for e2e testing we are building a suite https://github.com/openshift/cluster-api-actuator-pkg/tree/master/pkg/e2e which validates OpenShift/machine-API conformance user stories regardless of the implementation details."

Do we have plan to create "Operator"

Do we have plan to create Operator to prepare bare metal cluster api environment ?

Research Machine selection when scaling down a MachineSet

When a MachineSet is scaled down, we may have additional criteria to help chose which Machine is the most appropriate one to Delete. We should determine what the current logic is, and if we should introduce some additional functionality where we can influence which Machine is chosen.

Actuator never sees BareMetalHosts if controller is started before CRD defined

This machine controller acts as a client of the BareMetalHost custom resource. If it happens to get started before the BareMetalHost CRD has been defined in the cluster, the machine-controller will silently fail. It produces no log messages about BareMetalHosts. When it tries to reconcile a machine, it will emit a message that no valid host was found.

When a cluster is in this state, if you restart the machine controller, it will then successfully see the BareMetalHosts and will be able to associate Machines and BareMetalHosts as it should.

Actuator using stale BareMetalHost data

In my last test, I observed a couple of things that implied the actuator using stale data about BareMetalHosts.

When the actuator was launched, I had no BareMetalHosts defined. I created them, but I had to restart the actuator to get it to see them.

After the restart, the actuator claimed the same BareMetalHost for all Machines. The annotation on each pointed to the same host.

metal3-io / cluster-api-provider-baremetal Goto Github PK

cluster-api-provider-baremetal's People

Contributors

Stargazers

Watchers

Forkers

cluster-api-provider-baremetal's Issues

Recommend Projects

Recommend Topics

Recommend Org