cluster-api-provider-baremetal's People

Contributors

asalkeld, bcrochet, danil-grigorev, derekhiggins, dhellmann, dtantsur, elfosardo, elmiko, honza, joelspeed, jupierce, kirankt, lobziik, longkb, mhrivnak, n1r1, ondrejmular, openshift-bot, openshift-ci[bot], openshift-merge-bot[bot], openshift-merge-robot, rdoxenham, russellb, rwsu, sadasu, slintes, stbenjam, vrutkovs, wking, zaneb

cluster-api-provider-baremetal's Issues

Mismatched dependencies for CAPB and openshift/cluster-api

openshift/installer@e8a2267 broke vendoring of cluster-api-provider-baremetal in openshift/installer.

There are now conflicting constraints: openshift/cluster-api wants sigs.k8s.io/controller-runtime 0.1.0, but CAPB wants 0.1.1.

Solving failure: No versions of github.com/metal3-io/cluster-api-provider-baremetal met constraints:
	master: Could not introduce github.com/metal3-io/cluster-api-provider-baremetal (from https://github.com/openshift/cluster-api-provider-baremetal/pkg/apis)@master, as it has a dependency on sigs.k8s.io/controller-runtime with constraint ^0.1.1, which has no overlap with the following existing constraints:
	release-0.2 from github.com/openshift/[email protected]
	release-0.2 from github.com/openshift/[email protected]
	release-0.2 from github.com/openshift/[email protected]
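
One possible workaround, sketched here rather than verified: force a single controller-runtime version for every consumer by adding a dep override to the root project's Gopkg.toml. Overrides only apply in the root project, so this would go in openshift/installer, and the version shown is an assumption that would need to match what openshift/cluster-api can actually build against.

[[override]]
  name = "sigs.k8s.io/controller-runtime"
  version = "=0.1.1"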

make manifests returns "Error: no kustomization.yaml file under /tmp/kustomize-490795293/config"

The manifests target fails with:

[rwsu@localhost-live cluster-api-provider-baremetal]$ make manifests
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/home/rwsu/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/crds' 
RBAC manifests generated under 'config/rbac' 
webhook manifests generated under '/home/rwsu/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/webhook' directory
kustomize build config/ > provider-components.yaml
2020/02/03 13:54:50 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
echo "---" >> provider-components.yaml
cd vendor && kustomize build github.com/openshift/cluster-api/config >> ../provider-components.yaml
Error: no kustomization.yaml file under /tmp/kustomize-212061250/config
make: *** [Makefile:40: manifests] Error 1

[rwsu@localhost-live cluster-api-provider-baremetal]$ go version
go version go1.13.6 linux/amd64
[rwsu@localhost-live cluster-api-provider-baremetal]$ cat /etc/redhat-release 
Fedora release 31 (Thirty One)
[rwsu@localhost-live cluster-api-provider-baremetal]$ kustomize version
Version: {KustomizeVersion:1.0.11 GitCommit:8f701a00417a812558a7b785e8354957afa469ae BuildDate:2018-12-04T18:42:24Z GoOs:unknown GoArch:unknown}

The /tmp/kustomize-... path in the error suggests that kustomize 1.0.11 treats the github.com/openshift/cluster-api/config argument as a remote repository reference and clones it, instead of resolving it against the vendored copy; the "make test" issue below hit the same version problem, which went away after upgrading kustomize.

Actuator should not delete Machine objects

https://bugzilla.redhat.com/show_bug.cgi?id=1868104

Currently, if a Host that has been provisioned by a Machine is deleted by an administrator, we respond by deleting the Machine object. However, this is inconsistent with the expectations of the Cluster API and the behaviour of other actuators when an underlying resource is externally deleted.

If the Host is deleted (or forcibly deprovisioned), the Machine should enter the Failed phase, but not be deleted.

If enabled, the MachineHealthCheck controller will delete any failed Machines once openshift/machine-api-operator#688 is merged.
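
A minimal sketch of the proposed behaviour, assuming the openshift machine-api v1beta1 types; the helper and its error-reason choice are illustrative, not the project's actual code. Instead of deleting the Machine, the actuator records an error, which is what moves a Machine into the Failed phase:

package machine

import (
	"context"

	machinev1beta1 "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// markFailed records an error on the Machine instead of deleting it when
// its Host disappears. A Machine with ErrorReason set is treated as
// Failed, where a MachineHealthCheck (if enabled) can remediate it.
func markFailed(ctx context.Context, c client.Client, machine *machinev1beta1.Machine, msg string) error {
	reason := machinev1beta1.InvalidConfigurationMachineError // assumed reason; pick whatever fits
	machine.Status.ErrorReason = &reason
	machine.Status.ErrorMessage = &msg
	return c.Status().Update(ctx, machine)
}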

No indication that a Machine can't acquire a Host

If we create a Machine and there are no Hosts available to provision, the actuator just leaves the Machine in the "Provisioning" phase and keeps looking for a Host. Nothing currently indicates that no Host is available, and the Machine never exits the Provisioning phase.

This means that if a user accidentally scales a MachineSet up beyond the number of actual Hosts, scaling back down doesn't simply delete the orphans: by default the MachineSet deletes Machines at random, including provisioned ones running real workloads. Scale up by a large enough amount and, with high probability, all of your workloads end up deleted.

Instead, we should flag an error on the Machine and clear it once a suitable Host is found. This would make the Machine among the first candidates for deletion when scaling down; a sketch of the clearing half follows below.
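
The clearing half could look like this; again illustrative only, assuming the same machine-api v1beta1 types and client as the markFailed sketch in the previous issue:

// clearError removes a previously recorded error once a suitable Host has
// been found, returning the Machine to normal phase handling.
func clearError(ctx context.Context, c client.Client, machine *machinev1beta1.Machine) error {
	if machine.Status.ErrorReason == nil && machine.Status.ErrorMessage == nil {
		return nil // nothing to clear
	}
	machine.Status.ErrorReason = nil
	machine.Status.ErrorMessage = nil
	return c.Status().Update(ctx, machine)
}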

Machine->Host mapping is not necessarily unique over time

A Machine remembers the Host it has provisioned by the Host's name and namespace.

The Machine object doesn't own a physical host per se; it owns a particular deployment to that Host (just as other actuators own a particular cloud VM deployment). It's fair to trust that with the Consumer reference set on the Host, no changes are happening to the deployment that are not commanded by the actuator. However there is one edge case in which the Consumer reference cannot protect us: if the Host is deleted and another created with the same name, the actuator will believe it still owns that Host and may attempt to reuse the Machine for a new deployment, which is a no-no.
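
One way to close the gap, sketched under the assumption that the actuator can stash extra data in a Machine annotation (the annotation key below is hypothetical, not one the project uses today): record the Host's UID, which Kubernetes never reuses across objects, and verify it on every reconcile.

package machine

import (
	bmh "github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1"
	machinev1beta1 "github.com/openshift/machine-api-operator/pkg/apis/machine/v1beta1"
)

// hostUIDAnnotation is a hypothetical key for remembering which exact Host
// object the Machine provisioned, not just its name.
const hostUIDAnnotation = "metal3.io/BareMetalHost-uid"

// hostIsOurs reports whether the named Host is still the exact object this
// Machine provisioned. A Host deleted and recreated under the same name
// always receives a fresh UID, so a mismatch means our deployment is gone.
func hostIsOurs(machine *machinev1beta1.Machine, host *bmh.BareMetalHost) bool {
	saved, ok := machine.Annotations[hostUIDAnnotation]
	if !ok {
		// Machines created before this change never recorded a UID;
		// fall back to trusting the name/namespace reference.
		return true
	}
	return saved == string(host.UID)
}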

Deleting a Host doesn't immediately affect Machine

When a BareMetalHost is marked for deletion, it doesn't enter the 'deleting' provisioning state until it has been deprovisioned (if necessary). When a Host is deleted, we don't currently delete the Machine until the Host reaches that state. This means that if the Host was provisioned, the Machine will, to all outward appearances, seem healthy until the Host is fully deprovisioned. If the cluster is scaled down in the meantime (an obvious thing to do when removing a Host from inventory), the MachineSet will remove a random Machine rather than necessarily the one whose Host is going away.
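
A rough sketch of an earlier signal, assuming the actuator already fetches the Host during reconciliation (the helper name is illustrative): treat a Host with a deletion timestamp as already going away, rather than waiting for deprovisioning to complete.

package machine

import (
	bmh "github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1"
)

// hostGoingAway reports whether the Host has been marked for deletion, even
// though deprovisioning may still be in progress. Reacting to this signal
// lets the actuator flag the Machine before the Host is fully torn down.
func hostGoingAway(host *bmh.BareMetalHost) bool {
	return host != nil && !host.DeletionTimestamp.IsZero()
}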

Document downstream delete annotation

We need to document somewhere that in downstream, the annotation to put on a Machine when you want it to be deleted during a MachineSet scale-down is machine.openshift.io/cluster-api-delete-machine=yes.
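
For example, to steer the next scale-down at a specific Machine (the names here are placeholders):

oc annotate machine <machine-name> -n openshift-machine-api \
    machine.openshift.io/cluster-api-delete-machine=yes
oc scale machineset <machineset-name> -n openshift-machine-api --replicas=<new-count>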

BareMetalHost doesn't reconcile when a Machine config is specified with the BMH as the providerID

If a BMH CR is in the Ready state and a Machine CR is then created that points at that BMH instance (via the providerID and the BMH annotation), the BMH CR is not reconciled from Ready to Provisioned. Instead it stays in the Ready state and machine provisioning fails.

I expected the annotation and the providerID to link the BMH to the Machine, so that on the next reconciliation of the BMH CR the BMH would be provisioned, the Machine would report "Provisioning", and the corresponding Node would eventually be created.

Example:

Create a BMH:

cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: ostest-master-3-bmc-secret
  namespace: openshift-machine-api
type: Opaque
data:
  username: YWRtaW4=
  password: cGFzc3dvcmQ=
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: ostest-master-3
  namespace: openshift-machine-api
spec:
  online: true
  bootMACAddress: 00:a3:19:55:91:56
  bmc:
    address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/fec49389-ffb7-4c53-91ef-673a548c03b2
    credentialsName: ostest-master-3-bmc-secret
EOF

The BMH will go into the Ready state, assuming the virsh XML is valid and the bootMACAddress and BMC address are correct.

Now create a Machine CR:

cat <<EOF | oc apply -f -
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations: 
    metal3.io/BareMetalHost: openshift-machine-api/ostest-master-3
  labels:
    machine.openshift.io/cluster-api-cluster: ostest-pbcnb
    machine.openshift.io/cluster-api-machine-role: master
    machine.openshift.io/cluster-api-machine-type: master
  name: ostest-pbcnb-master-3
  namespace: openshift-machine-api
spec:
  providerID: baremetalhost:///openshift-machine-api/ostest-master-3
  providerSpec:
    value:
      hostSelector: {}
      image:
        checksum: http://172.22.0.3:6180/images/rhcos-46.82.202009222340-0-openstack.x86_64.qcow2/rhcos-46.82.202009222340-0-compressed.x86_64.qcow2.md5sum
        url: http://172.22.0.3:6180/images/rhcos-46.82.202009222340-0-openstack.x86_64.qcow2/rhcos-46.82.202009222340-0-compressed.x86_64.qcow2
      userData:
        name: master-user-data
EOF

The Machine CR shows status as Failed and the BMH CR stays in Ready state.
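
To observe the mismatch described above (using the names from the example):

oc get baremetalhost ostest-master-3 -n openshift-machine-api \
    -o jsonpath='{.status.provisioning.state}'   # stays "ready"
oc get machine ostest-pbcnb-master-3 -n openshift-machine-api \
    -o jsonpath='{.status.phase}'                # reports "Failed"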

Ignore externally provisioned hosts when fulfilling a Machine

When a Machine is created, CAPBM tries to find a Host to fulfill it. That query filters hosts based on the BareMetalHost.Available() method, which does not take the externallyProvisioned flag into account. We should update CAPBM to ignore hosts with externallyProvisioned == true, as in the sketch below.
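
A minimal sketch of the proposed filter; the surrounding selection loop is paraphrased, not the project's actual code:

package machine

import (
	bmh "github.com/metal3-io/baremetal-operator/pkg/apis/metal3/v1alpha1"
)

// filterAvailableHosts keeps Hosts that CAPBM may claim: available per the
// existing BareMetalHost.Available() check, and not externally provisioned.
func filterAvailableHosts(hosts []bmh.BareMetalHost) []*bmh.BareMetalHost {
	var candidates []*bmh.BareMetalHost
	for i := range hosts {
		host := &hosts[i]
		if host.Spec.ExternallyProvisioned {
			// Externally provisioned hosts are deployed and managed
			// outside CAPBM; never pick one to fulfill a Machine.
			continue
		}
		if !host.Available() {
			continue
		}
		candidates = append(candidates, host)
	}
	return candidates
}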

"make test" fails on older version of kustomize

"make test" prompts user to login when kustomize is v1.0.11.

$ make test
go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/home/rwsu/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/crds' 
RBAC manifests generated under 'config/rbac' 
webhook manifests generated under '/home/rwsu/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/webhook' directory
kustomize build config/ > provider-components.yaml
2020/03/04 14:22:05 Adding nameprefix and namesuffix to Namespace resource will be deprecated in next release.
echo "---" >> provider-components.yaml
kustomize build vendor/github.com/openshift/cluster-api/config >> provider-components.yaml
Username for 'https://github.com':

The credential prompt suggests that this old kustomize treats the vendor/github.com/... path as a remote repository reference and tries to fetch it. The issue doesn't happen when kustomize is updated to the latest version (v3.5.4).
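
A possible fix is simply to upgrade kustomize; one way to install v3.5.4 under Go modules (treat the exact command as a sketch, adjust to taste):

GO111MODULE=on go get sigs.k8s.io/kustomize/kustomize/v3@v3.5.4
kustomize version   # should now report v3.5.4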
