
metal3-dev-env's Introduction

Metal³ Development Environment

This repository includes scripts to set up a Metal³ development environment.

Build Status

  • Ubuntu dev env integration main build status
  • CentOS dev env integration main build status

Instructions

Instructions can be found here: https://book.metal3.io/developer_environment/tryit

Quickstart

Version v1beta1 is referred to below as v1betaX.

The v1betaX deployment can be done with Ubuntu 18.04, 20.04, 22.04 or CentOS Stream 9 target host images. By default, Ubuntu-based target hosts use Ubuntu 22.04.

Requirements

Dev env size

When deploying Ubuntu target hosts, the dev env machine requires:

  • 8GB of memory
  • 4 CPUs

When deploying CentOS target hosts, it requires:

  • 16GB of memory
  • 4 CPUs

The Minikube machine is deployed with 4GB of RAM and 2 vCPUs, and the target hosts with 4 vCPUs and 4GB of RAM.

Environment variables

The following environment variables need to be set for v1betaX:

export CAPM3_VERSION=v1beta1
export CAPI_VERSION=v1beta1

The following environment variables need to be set for CentOS:

export IMAGE_OS=centos

And the following environment variables need to be set for Ubuntu:

export IMAGE_OS=ubuntu

And the following environment variables need to be set for Flatcar:

export IMAGE_OS=flatcar

By default the virtualization hypervisor used is kvm, which requires nested virtualization to be enabled on the host. If kvm or nested virtualization is not available, it is possible to switch to qemu, although that configuration currently has execution limitations and is considered experimental. To switch to the qemu hypervisor, apply the following setting:

export LIBVIRT_DOMAIN_TYPE=qemu

You can check a list of all the environment variables in vars.md

Deploy the metal3 Dev env

Note: These scripts are invasive and will reconfigure part of the host OS in addition to package installation, and hence it is recommended to run dev-env in a VM. Please read the scripts to understand what they do before running them on your machine.

./01_prepare_host.sh
./02_configure_host.sh
./03_launch_mgmt_cluster.sh
./04_verify.sh

or

make
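
Once the scripts (or make) complete, a quick sanity check, assuming the default metal3 namespace, is to list the controller pods and the BareMetalHost resources:

kubectl get pods -n metal3
kubectl get bmh -n metal3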

Deploy the target cluster

./tests/scripts/provision/cluster.sh
./tests/scripts/provision/controlplane.sh
./tests/scripts/provision/worker.sh

Pivot to the target cluster

./tests/scripts/provision/pivot.sh

Delete the target cluster

kubectl delete cluster "${CLUSTER_NAME:-"test1"}" -n metal3

Deploying and developing with Tilt

It is possible to use Tilt to run the CAPI, BMO, CAPM3 and IPAM components. The Tilt ephemeral cluster uses Kind and Docker, so it requires an Ubuntu host.

By default, Metal3 components are not built locally. To develop with Tilt, you must export BUILD_[CAPM3|BMO|IPAM|CAPI]_LOCALLY=true, and then you can edit the code in ~/go/src/github.com/metal3-io/... and it will be picked up by Tilt. You can also specify repository URL, branch and commit with CAPM3REPO, CAPM3BRANCH and CAPM3COMMIT to make dev-env start the component with your development branch content. Same for IPAM, BMO and CAPI. See vars.md for more information.
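
For example, a minimal sketch of building CAPM3 locally from a development branch (the repository URL and branch name below are placeholders for your own fork and branch):

export BUILD_CAPM3_LOCALLY=true
export CAPM3REPO=https://github.com/<your-fork>/cluster-api-provider-metal3
export CAPM3BRANCH=my-feature-branch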

After specifying the components and paths to your liking, bring the cluster up by setting the ephemeral cluster type to Tilt and image OS to Ubuntu.

export IMAGE_OS=ubuntu
export EPHEMERAL_CLUSTER="tilt"
make

If you are running Tilt on a remote machine, you can forward the web interface by adding the following parameter to the ssh command: -L 10350:127.0.0.1:10350
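
For example (the user and host below are placeholders):

ssh -L 10350:127.0.0.1:10350 <user>@<dev-env-host>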

Then you can access the Tilt dashboard locally at http://localhost:10350

Note: It is easiest if you configure all these in a config_<username>.sh file, which is automatically sourced if it exists.
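
A minimal sketch of such a file, using only variables mentioned above (values are examples, adjust to your setup):

# config_<username>.sh
export IMAGE_OS=ubuntu
export EPHEMERAL_CLUSTER="tilt"
export BUILD_CAPM3_LOCALLY=true
export CAPM3BRANCH=my-feature-branch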

Recreating local ironic containers

In case you want to recreate the local ironic containers with TLS enabled, use the following instructions:

source lib/common.sh
source lib/network.sh

export IRONIC_HOST="${CLUSTER_BARE_METAL_PROVISIONER_HOST}"
export IRONIC_HOST_IP="${CLUSTER_BARE_METAL_PROVISIONER_IP}"

source lib/ironic_tls_setup.sh
source lib/ironic_basic_auth.sh

cd ${BMOPATH}
./tools/run_local_ironic.sh

Here ${BMOPATH} points to the baremetal-operator directory. For more information regarding the TLS setup and running ironic locally, please refer to these documents: TLS, Run local ironic.

Test Matrix

The following table describes which branches are tested for different test triggers:

test suffix | CAPM3 branch | IPAM branch | BMO branch/tag | Keepalived tag | Ironic tag
main        | main         | main        | main           | latest         | latest
release-1-7 | release-1.7  | release-1.7 | release-0.6    | v0.6.1         | v24.1.1
release-1-6 | release-1.6  | release-1.6 | release-0.5    | v0.5.1         | v24.0.0
release-1-5 | release-1.5  | release-1.5 | release-0.4    | v0.4.2         | v23.1.0

metal3-dev-env's People

Contributors

adilghaffardev, alosadagrande, ardaguclu, derekhiggins, dhellmann, dtantsur, elfosardo, fmuyassarov, furkatgofurov7, jaakko-os, jan-est, kashifest, lentzi90, macaptain, maelk, markmc, mboukhalfa, metal3-io-bot, mikkosest, mquhuy, namnx228, renovate-bot, renovate[bot], rozzii, russellb, smoshiur1237, stbenjam, sunnatillo, tuminoid, wgslr

metal3-dev-env's Issues

Validate the deployment

Adding a validation script would help catch issues in the deployment. Especially in CI, the metal3-dev-env deployment needs to be checked. The following points should probably be verified:

  • Running kubernetes cluster, reachable
  • Running baremetal operator
  • Running CAPI provider baremetal operator
  • CRDs defined
  • Host vms defined
  • Networking ok
  • Ironic containers running
  • baremetal hosts CRs created

We could create a make target to check the setup after deployment.
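
A minimal sketch of what such a check could run, assuming the default metal3 namespace (the exact checks would need to match the deployment):

#!/bin/bash
set -eu
# Management cluster reachable
kubectl cluster-info
# Metal3 CRDs defined
kubectl get crds | grep metal3.io
# Operator, CAPI provider and Ironic containers running
kubectl get pods -n metal3
# BareMetalHost CRs created
kubectl get bmh -n metal3
# Host VMs defined
sudo virsh list --all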

Failed to pull quay.io/metal3-io/ironic when building from scratch

Build failed when running make from scratch:

+ for IMAGE_VAR in IRONIC_IMAGE IRONIC_INSPECTOR_IMAGE
+ IMAGE=quay.io/metal3-io/ironic
+ sudo podman pull quay.io/metal3-io/ironic
Trying to pull docker://quay.io/metal3-io/ironic...time="2019-05-29T12:13:59Z" level=error msg="Error pulling image ref //quay.io/metal3-io/ironic:latest: Error determining manifest MIME type for docker://quay.io/metal3-io/ironic:latest: Error reading manifest latest in quay.io/metal3-io/ironic: unknown: Tag latest was deleted or has expired. To pull, revive via time machine" 
Failed
(0x555b17585420,0xc4207c57c0)
Error: error pulling image "quay.io/metal3-io/ironic": Invalid image name "quay.io/metal3-io/ironic", expected colon-separated transport:reference
make: *** [configure_host] Error 125

Host deprovisioning fails

Hi,

I ran into a problem with host deprovisioning. After creating a machine with create_machine.sh and deleting the machine object with kubectl delete machine centos -n metal3, the host's provisioning status is like this:

NAME       STATUS   PROVISIONING STATUS                       
master-0   OK       externally provisioned             

When running:
kubectl logs -n metal3 pod/cluster-api-provider-baremetal-controller-manager-0 -c manager
The logs show 2019/07/02 08:06:44 Deleting machine centos in a loop, never finishing the delete.

It seems that the controller is unable to delete that machine object; kubectl get machine -n metal3 still shows the centos machine object.

I am using CentOS 7 with kernel 3.10.0-957.21.3.el7.x86_64. The virtual machine is KVM with libvirtd (libvirt) 4.0.0.

Does anyone have the same problem or workaround with this issue?

Installation fails if an older version of kubectl is already installed

I am installing the dev environment on a latest CentOS 7. It already had kubectl installed, version 1.5, which is the one available in the official repo:

[alosadag@smc-master metal3-dev-env]$ rpm -qa | grep kube
kubernetes-client-1.5.2-0.7.git269f928.el7.x86_64

$ yum whatprovides kubectl
kubernetes-client-1.5.2-0.7.git269f928.el7.x86_64 : Kubernetes client tools
Repo        : extras
Matched from:
Filename    : /usr/bin/kubectl

Taking into account this snippet from 01_prepare_host.sh:

if ! command -v kubectl 2>/dev/null ; then
    curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
    chmod +x kubectl
    sudo mv kubectl /usr/local/bin/.
fi

kubectl won't be updated to the latest version (1.16), which means some YAMLs cannot be processed via kubectl apply -f.

I would suggest checking the version of the installed kubectl as well and, if it is older than the latest, updating it. Take into account that kubectl installed from an RPM (at least on CentOS) is placed in /usr/bin/ instead of /usr/local/bin/.
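
A hedged sketch of such a check (it reuses the stable.txt URL from the snippet above; a real fix would compare versions properly rather than test for inequality):

# Upgrade kubectl if the installed client differs from the latest stable release
installed="$(kubectl version --client --short 2>/dev/null | awk '{print $NF}')"
latest="$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)"
if [ "${installed}" != "${latest}" ]; then
    curl -LO "https://storage.googleapis.com/kubernetes-release/release/${latest}/bin/linux/amd64/kubectl"
    chmod +x kubectl
    sudo mv kubectl /usr/local/bin/kubectl
fi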

Running `make clean` and `make` after a failed deployment fails in yum update

Freshly installed CentOS 7. I've hit #136, so I tried the workaround of commenting out the init_minikube function in the 01 script. I've executed make clean then make, and now it fails in the first yum update:

+++ '[' '!' -d /opt/metal3-dev-env ']'
++ sudo yum install -y libselinux-utils
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirror.linux.duke.edu
 * epel: mirror.es.its.nyu.edu
 * extras: mirror.linux.duke.edu
 * updates: repo1.ash.innoscale.net
7 packages excluded due to repository priority protections
Package libselinux-utils-2.5-14.1.el7.x86_64 already installed and latest version
Nothing to do
++ selinuxenabled
++ sudo setenforce permissive
++ sudo sed -i s/=enforcing/=permissive/g /etc/selinux/config
++ sudo yum -y update
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
 * base: mirror.linux.duke.edu
 * epel: mirror.es.its.nyu.edu
 * extras: mirror.linux.duke.edu
 * updates: mirror.linux.duke.edu
7 packages excluded due to repository priority protections
Resolving Dependencies
--> Running transaction check
---> Package oniguruma.x86_64 0:5.9.5-3.el7 will be updated
--> Processing Dependency: libonig.so.2()(64bit) for package: jq-1.6-1.el7.x86_64
---> Package oniguruma.x86_64 0:6.7.0-1.el7 will be an update
--> Finished Dependency Resolution
Error: Package: jq-1.6-1.el7.x86_64 (@epel)
           Requires: libonig.so.2()(64bit)
           Removing: oniguruma-5.9.5-3.el7.x86_64 (@epel)
               libonig.so.2()(64bit)
           Updated By: oniguruma-6.7.0-1.el7.x86_64 (delorean-master-testing)
              ~libonig.so.4()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest
make: *** [install_requirements] Error 1

Just in case, I've disabled both yum update calls in the centos_install_requirements.sh script, and the 01 script finishes successfully.

Introspection timing out on Ubuntu environment

The introspection times out on the Ubuntu environment:

kubectl get bmh -n metal3
NAME       STATUS   PROVISIONING STATUS   CONSUMER   BMC                         HARDWARE PROFILE   ONLINE   ERROR
master-0   error    inspecting                       ipmi://192.168.111.1:6230                      true     Introspection timeout
worker-0   error    inspecting                       ipmi://192.168.111.1:6231                      true     Introspection timeout

No idea yet what causes it, investigating

Error: Failed to associate the BaremetalHost to the Metal3Machine

While I was setting up metal3-dev-env to create target clusters with CentOS 8 as the base image, I faced this error with Machines:

NAME                           PROVIDERID                                      PHASE
centos8-controlplane-ghjx2     metal3://60914f8e-2a41-4bce-82c2-e652083aac3d   Failed
centos8-md-0-9fbd54c6d-6xqfr   metal3://4a06806b-413d-453d-95d1-d24f306840ba   Failed
centos8-md-0-9fbd54c6d-nlwd7   metal3://bfca3312-1860-488d-9bdc-7a3b58e5940f   Failed
centos8-md-0-9fbd54c6d-t4r7w   metal3://1e2d61e7-d44b-4eb4-9d90-a7df245165dc   Failed

I can see them Running and then suddenly becoming Failed after an uncertain period of time. However, it is not always like that; sometimes it is only the control plane or some workers. From the capi-controller-manager I can see:

[alosadag@eko4 metal3-dev-env]$ kubectl logs -f capi-controller-manager-664c75c4df-2qhnt -n capi-system

I0320 09:42:03.053773       1 machine_controller_noderef.go:53] controllers/Machine "msg"="Machine doesn't have a valid ProviderID yet" "cluster"="centos8" "machine"="centos8-md-0-9fbd54c6d-zkh9x" "namespace"="metal3" 
E0320 09:42:03.053834       1 machine_controller.go:232] controllers/Machine "msg"="Reconciliation for Machine asked to requeue" "error"="Infrastructure provider for Machine \"centos8-md-0-9fbd54c6d-zkh9x\" in namespace \"metal3\" is not ready, requeuing: requeue in 30s" "cluster"="centos8" "machine"="centos8-md-0-9fbd54c6d-zkh9x" "namespace"="metal3" 
I0320 09:42:03.066965       1 machine_controller_noderef.go:53] controllers/Machine "msg"="Machine doesn't have a valid ProviderID yet" "cluster"="centos8" "machine"="centos8-md-0-9fbd54c6d-zkh9x" "namespace"="metal3" 
E0320 09:42:03.069221       1 machine_controller.go:232] controllers/Machine "msg"="Reconciliation for Machine asked to requeue" "error"="Infrastructure provider for Machine \"centos8-md-0-9fbd54c6d-zkh9x\" in namespace \"metal3\" is not ready, requeuing: requeue in 30s" "cluster"="centos8" "machine"="centos8-md-0-9fbd54c6d-zkh9x" "namespace"="metal3" 

From my point of view everything looks like it is running fine, except for the Machines being in the Failed phase:

============== cluster =================
NAME      PHASE
centos8   Provisioned
============ metal3cluster ==========
NAME      READY   ERROR   CLUSTER   ENDPOINT
centos8   true            centos8   map[host:192.168.111.249 port:6443]
================ bareMetalHost ================
NAME     STATUS   PROVISIONING STATUS   CONSUMER                     BMC                         HARDWARE PROFILE   ONLINE   ERROR
node-0   OK   ready                                              ipmi://192.168.111.1:6230   unknown            false
node-1   OK   provisioned           centos8-md-0-btx7x           ipmi://192.168.111.1:6231   unknown            true
node-2   OK   provisioned           centos8-controlplane-2gsvr   ipmi://192.168.111.1:6232   unknown            true
node-3   OK   provisioned           centos8-md-0-7fqjh           ipmi://192.168.111.1:6233   unknown            true
node-4   OK   provisioned           centos8-md-0-qd7fc           ipmi://192.168.111.1:6234   unknown            true
node-5   OK   ready                                              ipmi://192.168.111.1:6235   unknown            true
=============== Metal3Machine ===============
NAME                         PROVIDERID                                      READY   CLUSTER   PHASE
centos8-controlplane-2gsvr   metal3://60914f8e-2a41-4bce-82c2-e652083aac3d   true    centos8
centos8-md-0-7fqjh           metal3://4a06806b-413d-453d-95d1-d24f306840ba   true    centos8
centos8-md-0-btx7x           metal3://bfca3312-1860-488d-9bdc-7a3b58e5940f   true    centos8
centos8-md-0-qd7fc           metal3://1e2d61e7-d44b-4eb4-9d90-a7df245165dc   true    centos8
=================== Machines ===================
NAME                           PROVIDERID                                      PHASE
centos8-controlplane-ghjx2     metal3://60914f8e-2a41-4bce-82c2-e652083aac3d   Failed
centos8-md-0-9fbd54c6d-6xqfr   metal3://4a06806b-413d-453d-95d1-d24f306840ba   Failed
centos8-md-0-9fbd54c6d-nlwd7   metal3://bfca3312-1860-488d-9bdc-7a3b58e5940f   Failed
centos8-md-0-9fbd54c6d-t4r7w   metal3://1e2d61e7-d44b-4eb4-9d90-a7df245165dc   Failed
=============== Machinedeployment ===============
NAME           PHASE     REPLICAS   AVAILABLE   READY
centos8-md-0   Running   3          3           3
 =============== Metal3MachineTemplate ===============
NAME                   AGE
centos8-controlplane   15h
centos8-md-0           14h
============= kubeAdmConfigTemplate =============
NAME           AGE
centos8-md-0   14h
================= kubeAdmConfig =================
NAME                         AGE
centos8-controlplane-7wrcb   15h
centos8-md-0-9qqfh           10h
centos8-md-0-bhq65           11h
centos8-md-0-dg6fx           10h

Also, the target cluster looks OK to me:

[centos@node-2 ~]$ kubectl  get nodes -o wide
NAME     STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
node-1   Ready    <none>   10h   v1.17.4   192.168.111.21   <none>        CentOS Linux 8 (Core)   4.18.0-147.3.1.el8_1.x86_64   docker://18.9.1
node-2   Ready    master   15h   v1.17.4   192.168.111.22   <none>        CentOS Linux 8 (Core)   4.18.0-147.3.1.el8_1.x86_64   docker://18.9.1
node-3   Ready    <none>   10h   v1.17.4   192.168.111.23   <none>        CentOS Linux 8 (Core)   4.18.0-147.3.1.el8_1.x86_64   docker://18.9.1
node-4   Ready    <none>   10h   v1.17.4   192.168.111.24   <none>        CentOS Linux 8 (Core)   4.18.0-147.3.1.el8_1.x86_64   docker://18.9.1

I do not know what implications it has for these Machine objects to be in a Failed state. Also, I am not sure whether it can be related to running the target cluster with CentOS 8 instead of CentOS 7.

No DEPLOY_KERNEL_URL variable set deploying with kustomize

After using Kustomize to deploy (kubectl apply -k deploy/), the baremetal-operator container goes into CrashLoopBackOff with the error Cannot start: No DEPLOY_KERNEL_URL variable set. I adjusted the command in the pod to dump the env before trying to start /baremetal-operator and confirmed that the environment variable is indeed being set correctly in the container.
command: ['sh', '-c', 'echo The app is running! && env && /baremetal-operator']

Looking for suggestions while in parallel reviewing the baremetal-operator code to see why it can't see the environment variable.

Kubectl version information for reference:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:11:03Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.8", GitCommit:"211047e9a1922595eaa3a1127ed365e9299a6c23", GitTreeState:"clean", BuildDate:"2019-10-15T12:02:12Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}

Control-plane creation error when using CentOS 8 as IMAGE_NAME_CENTOS

I have been testing the official cloud CentOS 8 image as IMAGE_NAME_CENTOS in metal3-dev-env, without success.

Some issues/facts:

  • I am using cloud image CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2
  • CentOS 8 ships cloud-init 18.5, so it is the same version as the centos-update.qcow2 that works perfectly OK.
  • Found some issues with docker dependencies when moving from CentOS 7 to 8. They were addressed by adding the --nobest option to yum (worth researching cri-o or containerd support in kubeadm, probably once we get this fixed with docker):
    preKubeadmCommands:
      - ifup eth1
      - yum update -y
      - yum install yum-utils -y
      - yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
      - setenforce 0
      - sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
      - yum install docker-ce docker-ce-cli --disableexcludes=kubernetes --nobest -y
      - >-
        yum install gcc kernel-headers kernel-devel keepalived
        device-mapper-persistent-data lvm2
        kubelet kubeadm kubectl --disableexcludes=kubernetes -y
      - usermod -aG docker centos
      - systemctl enable --now docker keepalived kubelet

Then, every time I run the provision_controlplane.sh script, I get stuck at the fact that the /tmp/kubeadm.yaml file is not created in the control-plane VM. So I understand that the kubeadm init command cannot be executed and provisioning does not finish.

From the cloud-init logs I can see:

2020-03-18 16:31:01,467 - util.py[WARNING]: Running module write-files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_write_files.py'>) failed
2020-03-18 16:31:01,467 - util.py[DEBUG]: Running module write-files (<module 'cloudinit.config.cc_write_files' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_write_files.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 1419, in chownbyname
    uid = pwd.getpwnam(user).pw_uid
KeyError: "getpwnam(): name not found: 'centos'"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 813, in _run_modules
    freq=freq)
  File "/usr/lib/python3.6/site-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3.6/site-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_write_files.py", line 82, in handle
    write_files(name, files)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_write_files.py", line 122, in write_files
    util.chownbyname(path, u, g)
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 1423, in chownbyname
    raise OSError("Unknown user or group: %s" % (e))
OSError: Unknown user or group: "getpwnam(): name not found: 'centos'"
...
...
...
Complete!
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /usr/lib/systemd/system/docker.service.
Created symlink /etc/systemd/system/multi-user.target.wants/keepalived.service → /usr/lib/systemd/system/keepalived.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kubelet.service → /usr/lib/systemd/system/kubelet.service.
unable to read config from "/tmp/kubeadm.yaml" : open /tmp/kubeadm.yaml: no such file or directory
To see the stack trace of this error execute with --v=5 or higher
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
chown: cannot access '/home/centos/.kube/config': No such file or directory
Cloud-init v. 18.5 running 'modules:final' at Wed, 18 Mar 2020 16:31:03 +0000. Up 11.11 seconds.
2020-03-18 16:34:36,605 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
2020-03-18 16:31:03,463 - util.py[DEBUG]: Running command ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
2020-03-18 16:34:36,605 - util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
2020-03-18 16:34:36,606 - util.py[DEBUG]: Failed running /var/lib/cloud/instance/scripts/runcmd [1]
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 876, in runparts
    subp(prefix + [exe_path], capture=False)
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 2068, in subp
    cmd=args)
cloudinit.util.ProcessExecutionError: Unexpected error while running command.
Command: ['/var/lib/cloud/instance/scripts/runcmd']
Exit code: 1
Reason: -
Stdout: -
Stderr: -
2020-03-18 16:34:36,608 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)

2020-03-18 16:34:36,609 - handlers.py[DEBUG]: finish: modules-final/config-scripts-user: FAIL: running config-scripts-user with frequency once-per-instance
2020-03-18 16:34:36,609 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
2020-03-18 16:34:36,609 - util.py[DEBUG]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 813, in _run_modules
    freq=freq)
  File "/usr/lib/python3.6/site-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3.6/site-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py", line 45, in handle
    util.runparts(runparts_path)
  File "/usr/lib/python3.6/site-packages/cloudinit/util.py", line 883, in runparts
    % (len(failed), len(attempted)))
RuntimeError: Runparts: 1 failures in 1 attempted commands

2020-03-18 16:34:36,608 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
2020-03-18 16:34:36,609 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3.6/site-packages/cloudinit/config/cc_scripts_user.py'>) failed
Cloud-init v. 18.5 finished at Wed, 18 Mar 2020 16:34:36 +0000. Datasource DataSourceConfigDrive [net,ver=2][source=/dev/sda2].  Up 224.45 seconds
  • Any ideas why the kubeadm.yaml file is not copied?
  • Who is responsible for copying it?

Since the centos-update.qcow2 image located in the Nordix CI runs successfully, I tried the CentOS 7 cloud image + cloud-init 18.5. However, the result is similar to the CentOS 8 image: the /tmp/kubeadm.yaml file is missing.

  • So I am wondering what changes were applied to the centos-update image. Does anyone have that info documented?

Freeze kubeadm, kubelet and kubectl version

Provisioning a control plane is failing for the reason shown below.
Currently, we need to make changes such as this in order to start using the new version.

However, this does not make the metal3-dev-env stable.

Possible solutions:

  • Install specific versions of kubeadm, kubelet and kubectl (instead of latest); see the pinning sketch after the error output below.
  • Verify that, irrespective of new Kubernetes releases, the environment stays usable.
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR KubeletVersion]: the kubelet version is higher than the control plane version. This is not a supported version skew and may lead to a malfunctional cluster. Kubelet version: "1.18.0" Control plane version: "1.17.0"
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
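
A hedged sketch of pinning specific versions on an Ubuntu target host (the 1.17.0-00 version strings are illustrative; a CentOS target would pin the same packages with yum):

apt-get install -y kubelet=1.17.0-00 kubeadm=1.17.0-00 kubectl=1.17.0-00
apt-mark hold kubelet kubeadm kubectl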

Ubuntu target nodes networking setup fails

When deploying the Ubuntu target cluster, the networking setup done with netplan fails with the following error:

/etc/netplan/60-ironicendpoint.yaml:6:20: Error in network definition: ironicendpoint: interface 'enp1s0' is not defined
      interfaces: [enp1s0]

/assign @kashifest
/kind bug
/priority important-soon

CentOS target cluster deployment fails

#273 broke the CentOS deployment of a CAPI v1alpha3 cluster. It now fails with:

systemctl enable --now docker keepalived kubelet
Failed to enable unit: Unit file /etc/systemd/system/kubelet.service is masked.

/assign @Xenwar

PROVISIONING_NETMASK is not calculated

ipcalc has no flag named --netmask, at least for the version running on Ubuntu 18.04.1 LTS. This results in PROVISIONING_NETMASK not being calculated properly.

+++ ipcalc --netmask 172.22.0.0/24
+++ cut -d= -f2
++ export 'PROVISIONING_NETMASK=Unknown option: --netmask'
++ PROVISIONING_NETMASK='Unknown option: --netmask'

An alternative is to just write a simple bash function to do this calculation. I'll send a PR for that.

OS: Ubuntu 18.04.1 LTS
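
A minimal sketch of such a bash function (prefix length to dotted netmask, no ipcalc needed):

prefix_to_netmask() {
    local prefix=$1 mask="" octet
    for octet in 1 2 3 4; do
        if [ "${prefix}" -ge 8 ]; then
            # Full octet covered by the prefix
            mask+="255"
            prefix=$(( prefix - 8 ))
        else
            # Partial octet: set the remaining high bits
            mask+="$(( 256 - 2 ** (8 - prefix) ))"
            prefix=0
        fi
        if [ "${octet}" -lt 4 ]; then
            mask+="."
        fi
    done
    echo "${mask}"
}

# Example: prefix_to_netmask 24  ->  255.255.255.0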

metal3-baremetal-operator pod does not successfully deploy on Ubuntu 18.04

When building Metal3 on Ubuntu 18.04 using the latest master, the metal3-baremetal-operator does not come up:

kubectl get pods -A

NAMESPACE     NAME                                                  READY   STATUS             RESTARTS   AGE
kube-system   coredns-6955765f44-qnh9c                              1/1     Running            1          30m
kube-system   coredns-6955765f44-wrcnf                              1/1     Running            1          30m
kube-system   etcd-minikube                                         1/1     Running            1          30m
kube-system   kube-addon-manager-minikube                           1/1     Running            1          30m
kube-system   kube-apiserver-minikube                               1/1     Running            2          30m
kube-system   kube-controller-manager-minikube                      1/1     Running            1          30m
kube-system   kube-proxy-vb6qv                                      1/1     Running            1          30m
kube-system   kube-scheduler-minikube                               1/1     Running            1          30m
kube-system   storage-provisioner                                   1/1     Running            1          30m
metal3        cluster-api-controller-manager-0                      1/1     Running            0          9m15s
metal3        cluster-api-provider-baremetal-controller-manager-0   2/2     Running            0          9m15s
metal3        metal3-baremetal-operator-5cbbd7b87d-rdvkk            5/6     CrashLoopBackOff   6          9m14s

Logs show that the ironic-dnsmasq container fails to start because it can't bind to port 53:

kubectl -n metal3 logs metal3-baremetal-operator-5cbbd7b87d-rdvkk -c ironic-dnsmasq

Waiting for eth2 interface to be configured
/bin/rundnsmasq: line 18: python: command not found

dnsmasq: failed to create listening socket for port 53: Address already in use

and indeed this is already in use by another dnsmasq process:

sudo ss -talunp | grep ":53 "

udp   UNCONN  0        0                                   192.168.111.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=30240,fd=5))
udp   UNCONN  0        0                                   192.168.122.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13358,fd=5))
udp   UNCONN  0        0                                       127.0.0.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13169,fd=6))
udp   UNCONN  0        0                                     10.67.17.59:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13169,fd=4))
udp   UNCONN  0        0                                   127.0.0.53%lo:53                                          0.0.0.0:*                                   users:(("systemd-resolve",pid=560,fd=12))
udp   UNCONN  0        0                                           [::1]:53                                             [::]:*                                   users:(("dnsmasq",pid=13169,fd=10))
udp   UNCONN  0        0           [fe80::aa71:f0aa:4bf4:3d68]%enp0s31f6:53                                             [::]:*                                   users:(("dnsmasq",pid=13169,fd=8))
tcp   LISTEN  0        32                                  192.168.111.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=30240,fd=6))
tcp   LISTEN  0        32                                  192.168.122.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13358,fd=6))
tcp   LISTEN  0        32                                      127.0.0.1:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13169,fd=7))
tcp   LISTEN  0        32                                    10.67.17.59:53                                          0.0.0.0:*                                   users:(("dnsmasq",pid=13169,fd=5))
tcp   LISTEN  0        128                                 127.0.0.53%lo:53                                          0.0.0.0:*                                   users:(("systemd-resolve",pid=560,fd=13))
tcp   LISTEN  0        32                                          [::1]:53                                             [::]:*                                   users:(("dnsmasq",pid=13169,fd=11))
tcp   LISTEN  0        32          [fe80::aa71:f0aa:4bf4:3d68]%enp0s31f6:53                                             [::]:*                                   users:(("dnsmasq",pid=13169,fd=9))

The baremetalhosts remain indefinitely in Provisioning Status "inspecting":

kubectl get baremetalhosts -A

NAMESPACE   NAME     STATUS   PROVISIONING STATUS   CONSUMER   BMC                         HARDWARE PROFILE   ONLINE   ERROR
metal3      node-0   OK       inspecting                       ipmi://192.168.111.1:6230                      true     
metal3      node-1   OK       inspecting                       ipmi://192.168.111.1:6231                      true 

If I revert to an earlier commit (e.g. 127e640) then everything deploys just fine. I haven't investigated the subsequent changes in any depth, but it looks like dnsmasq is now getting deployed at some earlier point in the installation process, thus breaking the metal3-baremetal-operator pod.

Development environment on Ubuntu

I'm thinking about a dev-env which can run on Ubuntu, so we will have more OS choices. Besides, this dev-env should be in Ansible playbooks only, not playbooks calling bash scripts and bash scripts calling playbooks, so it will be easier to maintain and to add more features to it.
What do you think about this?

Support for running it on GCP

I followed the steps to create a GCP VM instance using 8 vCPUs, Intel Haswell, 32GB RAM with nested virtualization enabled.
https://cloud.google.com/compute/docs/instances/enable-nested-virtualization-vm-instances
http://www.brianlinkletter.com/enable-nested-virtualization-on-google-cloud/

However, the Ansible playbook failed with the error below:
"stderr": "(1): Command failed: Fail to establish a connection with libvirt URI "qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1". Error: Cannot recv data: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).: Connection reset by peer",

I think passwordless root access is mandatory? Isn't there a way to execute it without passwordless root access, just using sudo?

Error provisioning the default dev env

I'm dealing with an error on this repo provisioning the default environment.

Environment:

CentOS Linux release 7.6.1810 (Core)

TL;DR Error:

RBAC manifests generated under '/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/rbac' 
kustomize build config/ > provider-components.yaml
Error: json: cannot unmarshal string into Go struct field Kustomization.patches of type types.Patch
make[1]: *** [manifests] Error 1
make[1]: Leaving directory `/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal'
make: *** [launch_mgmt_cluster] Error 2

How to reproduce it:

git clone https://github.com/metal3-io/metal3-dev-env.git
cd metal3-dev-env
make

Full error log (fails on script 03, the others work fine):

./03_launch_mgmt_cluster.sh
+ source lib/logging.sh
+++ dirname ./03_launch_mgmt_cluster.sh
++ LOGDIR=./logs
++ '[' '!' -d ./logs ']'
+++ basename ./03_launch_mgmt_cluster.sh .sh
+++ date +%F-%H%M%S
++ LOGFILE=./logs/03_launch_mgmt_cluster-2019-07-23-091112.log
++ echo 'Logging to ./logs/03_launch_mgmt_cluster-2019-07-23-091112.log'
Logging to ./logs/03_launch_mgmt_cluster-2019-07-23-091112.log
++ exec
+++ tee ./logs/03_launch_mgmt_cluster-2019-07-23-091112.log
+ source lib/common.sh
+++ go env
++ eval 'GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jparrill/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jparrill/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build469136617=/tmp/go-build -gno-record-gcc-switches"'
+++ GOARCH=amd64
+++ GOBIN=
+++ GOCACHE=/home/jparrill/.cache/go-build
+++ GOEXE=
+++ GOFLAGS=
+++ GOHOSTARCH=amd64
+++ GOHOSTOS=linux
+++ GOOS=linux
+++ GOPATH=/home/jparrill/go
+++ GOPROXY=
+++ GORACE=
+++ GOROOT=/usr/lib/golang
+++ GOTMPDIR=
+++ GOTOOLDIR=/usr/lib/golang/pkg/tool/linux_amd64
+++ GCCGO=gccgo
+++ CC=gcc
+++ CXX=g++
+++ CGO_ENABLED=1
+++ GOMOD=
+++ CGO_CFLAGS='-g -O2'
+++ CGO_CPPFLAGS=
+++ CGO_CXXFLAGS='-g -O2'
+++ CGO_FFLAGS='-g -O2'
+++ CGO_LDFLAGS='-g -O2'
+++ PKG_CONFIG=pkg-config
+++ GOGCCFLAGS='-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build469136617=/tmp/go-build -gno-record-gcc-switches'
++++ dirname lib/common.sh
+++ cd lib
+++ pwd
++ SCRIPTDIR=/home/jparrill/projects/kni/repos/metal3-dev-env/lib
+++ whoami
++ USER=jparrill
++ '[' -z '' ']'
++ '[' '!' -f /home/jparrill/projects/kni/repos/metal3-dev-env/lib/../config_jparrill.sh ']'
++ CONFIG=/home/jparrill/projects/kni/repos/metal3-dev-env/lib/../config_jparrill.sh
++ source /home/jparrill/projects/kni/repos/metal3-dev-env/lib/../config_jparrill.sh
++ ADDN_DNS=
++ EXT_IF=
++ PRO_IF=
++ MANAGE_BR_BRIDGE=y
++ MANAGE_PRO_BRIDGE=y
++ MANAGE_INT_BRIDGE=y
++ INT_IF=
++ ROOT_DISK_NAME=/dev/sda
++ export EXTERNAL_SUBNET=192.168.111.0/24
++ EXTERNAL_SUBNET=192.168.111.0/24
++ export SSH_PUB_KEY=/home/jparrill/.ssh/id_rsa.pub
++ SSH_PUB_KEY=/home/jparrill/.ssh/id_rsa.pub
++ FILESYSTEM=/
++ WORKING_DIR=/opt/metal3-dev-env
++ NODES_FILE=/opt/metal3-dev-env/ironic_nodes.json
++ NODES_PLATFORM=libvirt
++ export NUM_MASTERS=1
++ NUM_MASTERS=1
++ export NUM_WORKERS=1
++ NUM_WORKERS=1
++ export VM_EXTRADISKS=false
++ VM_EXTRADISKS=false
++ export IRONIC_IMAGE=quay.io/metal3-io/ironic:master
++ IRONIC_IMAGE=quay.io/metal3-io/ironic:master
++ export IRONIC_INSPECTOR_IMAGE=quay.io/metal3-io/ironic-inspector
++ IRONIC_INSPECTOR_IMAGE=quay.io/metal3-io/ironic-inspector
++ export IRONIC_DATA_DIR=/opt/metal3-dev-env/ironic
++ IRONIC_DATA_DIR=/opt/metal3-dev-env/ironic
++ export LIBVIRT_DEFAULT_URI=qemu:///system
++ LIBVIRT_DEFAULT_URI=qemu:///system
++ '[' jparrill '!=' root -a /run/user/1000 == /run/user/0 ']'
++ sudo -n uptime
+++ awk -F= '/^ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ ! centos =~ ^(centos|rhel)$ ]]
+++ awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
+++ cut -f1 -d.
++ [[ 7 -ne 7 ]]
+++ df / --output=fstype
+++ grep -v Type
++ FSTYPE=xfs
++ case ${FSTYPE} in
+++ xfs_info /
+++ grep -q ftype=1
++ [[ -n '' ]]
++ '[' '!' -d /opt/metal3-dev-env ']'
++ go env
+ eval 'GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jparrill/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jparrill/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/golang"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build147695623=/tmp/go-build -gno-record-gcc-switches"'
++ GOARCH=amd64
++ GOBIN=
++ GOCACHE=/home/jparrill/.cache/go-build
++ GOEXE=
++ GOFLAGS=
++ GOHOSTARCH=amd64
++ GOHOSTOS=linux
++ GOOS=linux
++ GOPATH=/home/jparrill/go
++ GOPROXY=
++ GORACE=
++ GOROOT=/usr/lib/golang
++ GOTMPDIR=
++ GOTOOLDIR=/usr/lib/golang/pkg/tool/linux_amd64
++ GCCGO=gccgo
++ CC=gcc
++ CXX=g++
++ CGO_ENABLED=1
++ GOMOD=
++ CGO_CFLAGS='-g -O2'
++ CGO_CPPFLAGS=
++ CGO_CXXFLAGS='-g -O2'
++ CGO_FFLAGS='-g -O2'
++ CGO_LDFLAGS='-g -O2'
++ PKG_CONFIG=pkg-config
++ GOGCCFLAGS='-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build147695623=/tmp/go-build -gno-record-gcc-switches'
+ M3PATH=/home/jparrill/go/src/github.com/metal3-io
+ BMOPATH=/home/jparrill/go/src/github.com/metal3-io/baremetal-operator
+ CAPBMPATH=/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal
+ clone_repos
+ mkdir -p /home/jparrill/go/src/github.com/metal3-io
+ '[' '!' -d /home/jparrill/go/src/github.com/metal3-io/baremetal-operator ']'
+ pushd /home/jparrill/go/src/github.com/metal3-io/baremetal-operator
~/go/src/github.com/metal3-io/baremetal-operator ~/projects/kni/repos/metal3-dev-env
+ git pull -r
Current branch master is up to date.
+ popd
~/projects/kni/repos/metal3-dev-env
+ '[' '!' -d /home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal ']'
+ pushd /home/jparrill/go/src/github.com/metal3-io
~/go/src/github.com/metal3-io ~/projects/kni/repos/metal3-dev-env
+ git clone https://github.com/metal3-io/cluster-api-provider-baremetal.git
Cloning into 'cluster-api-provider-baremetal'...
+ popd
~/projects/kni/repos/metal3-dev-env
+ pushd /home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal
~/go/src/github.com/metal3-io/cluster-api-provider-baremetal ~/projects/kni/repos/metal3-dev-env
+ git pull -r
Current branch master is up to date.
+ popd
~/projects/kni/repos/metal3-dev-env
+ configure_minikube
+ minikube config set vm-driver kvm2
! These changes will take effect upon a minikube delete and then a minikube start
+ launch_minikube
+ minikube start
* minikube v1.2.0 on linux (amd64)
* Downloading Minikube ISO ...
 129.33 MB / 129.33 MB [============================================] 100.00% 0s
* Creating kvm2 VM (CPUs=2, Memory=2048MB, Disk=20000MB) ...
* Configuring environment for Kubernetes v1.15.0 on Docker 18.09.6
* Downloading kubeadm v1.15.0
* Downloading kubelet v1.15.0
* Pulling images ...
* Launching Kubernetes ... 
* Verifying: apiserver proxy etcd scheduler controller dns
* Done! kubectl is now configured to use "minikube"
+ sudo virsh attach-interface --domain minikube --model virtio --source provisioning --type network --config
Interface attached successfully

+ minikube stop
* Stopping "minikube" in kvm2 ...
* "minikube" stopped.
+ minikube start
* minikube v1.2.0 on linux (amd64)
* Tip: Use 'minikube start -p <name>' to create a new cluster, or 'minikube delete' to delete this one.
* Restarting existing kvm2 VM for "minikube" ...
* Waiting for SSH access ...
* Configuring environment for Kubernetes v1.15.0 on Docker 18.09.6
* Relaunching Kubernetes v1.15.0 using kubeadm ... 
* Verifying: apiserver proxy etcd scheduler controller dns
* Done! kubectl is now configured to use "minikube"
+ launch_baremetal_operator
+ pushd /home/jparrill/go/src/github.com/metal3-io/baremetal-operator
~/go/src/github.com/metal3-io/baremetal-operator ~/projects/kni/repos/metal3-dev-env
+ make deploy
make[1]: Entering directory `/home/jparrill/go/src/github.com/metal3-io/baremetal-operator'
echo "{ \"kind\": \"Namespace\", \"apiVersion\": \"v1\", \"metadata\": { \"name\": \"metal3\", \"labels\": { \"name\": \"metal3\" } } }" | kubectl apply -f -
namespace/metal3 created
kubectl apply -f deploy/service_account.yaml -n metal3
serviceaccount/metal3-baremetal-operator created
kubectl apply -f deploy/role.yaml -n metal3
role.rbac.authorization.k8s.io/metal3-baremetal-operator created
kubectl apply -f deploy/role_binding.yaml
rolebinding.rbac.authorization.k8s.io/metal3-baremetal-operator created
kubectl apply -f deploy/crds/metal3_v1alpha1_baremetalhost_crd.yaml
customresourcedefinition.apiextensions.k8s.io/baremetalhosts.metal3.io created
kubectl apply -f deploy/operator.yaml -n metal3
deployment.apps/metal3-baremetal-operator created
make[1]: Leaving directory `/home/jparrill/go/src/github.com/metal3-io/baremetal-operator'
+ popd
~/projects/kni/repos/metal3-dev-env
+ apply_bm_hosts
+ list_nodes
+ make_bm_hosts
+ read name address user password mac
+ cat /opt/metal3-dev-env/ironic_nodes.json
+ jq '.nodes[] | {
           name,
           driver,
           address:.driver_info.ipmi_address,
           port:.driver_info.ipmi_port,
           user:.driver_info.ipmi_username,
           password:.driver_info.ipmi_password,
           mac: .ports[0].address
           } |
           .name + " " +
           .driver + "://" + .address + (if .port then ":" + .port else "" end)  + " " +
           .user + " " + .password + " " + .mac'
+ sed 's/"//g'
+ go run /home/jparrill/go/src/github.com/metal3-io/baremetal-operator/cmd/make-bm-worker/main.go -address ipmi://192.168.111.1:6230 -password password -user admin -boot-mac 00:5f:69:f3:f0:df master-0
+ read name address user password mac
+ go run /home/jparrill/go/src/github.com/metal3-io/baremetal-operator/cmd/make-bm-worker/main.go -address ipmi://192.168.111.1:6231 -password password -user admin -boot-mac 00:5f:69:f3:f0:e3 worker-0
+ read name address user password mac
+ kubectl apply -f bmhosts_crs.yaml -n metal3
secret/master-0-bmc-secret created
baremetalhost.metal3.io/master-0 created
secret/worker-0-bmc-secret created
baremetalhost.metal3.io/worker-0 created
+ launch_cluster_api
+ pushd /home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal
~/go/src/github.com/metal3-io/cluster-api-provider-baremetal ~/projects/kni/repos/metal3-dev-env
+ make deploy
make[1]: Entering directory `/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal'
go run vendor/sigs.k8s.io/controller-tools/cmd/controller-gen/main.go all
CRD manifests generated under '/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/crds' 
RBAC manifests generated under '/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal/config/rbac' 
kustomize build config/ > provider-components.yaml
Error: json: cannot unmarshal string into Go struct field Kustomization.patches of type types.Patch
make[1]: *** [manifests] Error 1
make[1]: Leaving directory `/home/jparrill/go/src/github.com/metal3-io/cluster-api-provider-baremetal'
make: *** [launch_mgmt_cluster] Error 2

Hints:

  • dev-scripts works well

ideas?

Add a clusterctl based deployment method

Now that CAPI v1alpha3 is released, we should add a clusterctl based deployment to validate the integration.

What needs to be done:

  • download clusterctl
  • run init command instead of deploying CAPM3
  • run config command to create the cluster instead of using our own template, with the proper variables in place.

There could be a switch to select which deployment method should be used.
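
A hedged sketch of what that flow could look like (the version and cluster name below are illustrative; the flags follow the clusterctl CLI of the v1alpha3 era):

# Initialize the management cluster with the Metal3 infrastructure provider
clusterctl init --infrastructure metal3

# Generate the cluster manifest from the provider template and apply it
clusterctl config cluster test1 --kubernetes-version v1.18.0 > test1.yaml
kubectl apply -f test1.yaml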

ipv6 provisioning doesn't work on CentOS

Follow-up issue from #96 - there are two problems preventing any CentOS support for ipv6 provisioning atm:

There aren't any CentOS 8 cloud images yet in http://cloud.centos.org/centos/8/

CentOS 7 doesn't work because the cloud images are BIOS-only. FCOS and Ubuntu both have dual BIOS/UEFI images. Hopefully when CentOS 8 images come out they will be dual boot.

The deployment does currently work on RHEL 8; it has not yet been tested on Ubuntu, but that is expected to work.

Cannot provision successfully on Ubuntu 18.04

I have tried on Ubuntu 18.04, and I get this:

pengli@reeve1:~$ kubectl -n metal3 get baremetalhosts
NAME     STATUS   PROVISIONING STATUS   CONSUMER   BMC                         HARDWARE PROFILE   ONLINE   ERROR
node-0   error    registration error               ipmi://192.168.111.1:6230                      true     Failed to get power state for node 56108961-f233-443d-a6a2-abd87d2ed432. Error: IPMI call failed: power status.
node-1   error    registration error               ipmi://192.168.111.1:6231                      true     Failed to get power state for node a8ff2bf7-d351-4bf0-b7d9-ef5576c78396. Error: IPMI call failed: power status.

I checked vbmc and it looks like this:

pengli@reeve1:~$ vbmc list
+-------------+--------+---------+------+
| Domain name | Status | Address | Port |
+-------------+--------+---------+------+
| node_0      | down   | ::      | 6230 |
| node_1      | down   | ::      | 6231 |
+-------------+--------+---------+------+

I tried to start them manually, but it did not work.
Am I missing anything? Thanks.
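
For reference, a hedged example of what starting them manually and re-checking their status would look like with the vbmc CLI (domain names taken from the listing above):

vbmc start node_0
vbmc start node_1
vbmc list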

zeromq installation failed

I use the centos/7 vagrant box. CentOS version:
[vagrant@metal3 ~]$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

zeromq installation failed.

After the following workaround, the installation finished:
# workaround for the zeromq compatibility issue
sed -i "s/zeromq/xxxxx/g" 01_install_requirements.sh
make

Add an alternative to Minikube based on a Kind cluster

A lighter option, for CI for example, would be to create a Kind cluster and deploy BMO, CAPI, CAPM3 inside while deploying Ironic outside of the cluster. This would help us get rid of the Minikube stability issues.

Failed to run a custom cluster-api-provider-baremetal

Hi,

I am trying to follow the steps mentioned in the README file to run a custom cluster-api-provider-baremetal from local repositories, but make run gets stuck after the following:

go generate ./pkg/... ./cmd/...
go fmt ./pkg/... ./cmd/...
go vet ./pkg/... ./cmd/...
go run ./cmd/manager/main.go
{"level":"info","ts":1562319542.715248,"logger":"baremetal-controller-manager","msg":"Found API group metal3.io/v1alpha1"}
{"level":"info","ts":1562319542.7608466,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"machine-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1562319542.7609959,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"machine-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1562319542.8617175,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"machine-controller"}
{"level":"info","ts":1562319542.9623592,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"machine-controller","worker count":1}

The initial guess is that the process gets stuck during reconciliation (the code refers to /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go).

Has anyone else faced the same issue? Am I doing something wrong?

I am using CentOS 7 - 3.10.0-957.21.3.el7.x86_64 and libvirtd (libvirt) 4.5.0

BR
Kashif

Deployment fails on new CentOS7 host due to firewall disabled

On a new minimal CentOS box we see this error from 01_prepare_host in configure_minikube:

X Unable to start VM. Please investigate and run 'minikube delete' if possible: create: Error creating machine: Error in driver during machine creation: creating network: creating network minikube-net: virError(Code=89, Domain=47, Message='The name org.fedoraproject.FirewallD1 was not provided by any .service files')

There seem to be two problems - it's assuming FirewallD but we assume iptables on CentOS 7 - AFAICS firewalld is the correct choice on 7.7 though:

$ sudo systemctl status firewalld.service
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:firewalld(1)

Feb 12 09:39:23 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 12 09:39:23 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 12 11:41:14 localhost.localdomain systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 12 11:41:15 localhost.localdomain systemd[1]: Stopped firewalld - dynamic firewall daemon.
$ cat /etc/redhat-release 
CentOS Linux release 7.7.1908 (Core)

The second issue is that we enable the firewall in 02_configure_host.sh, which is too late, since the error above happens during 01_prepare_host.sh on a freshly installed box.

vbmc issues running on RHEL 8

Testing on a RHEL 8 box, there seem to be some stray vbmc add processes which don't get cleaned up by make clean, and the systemd service doesn't look healthy either, although deployment does work provided you kill the stray processes after a make clean.

[shardy@localhost dev-scripts]$ sudo ps aux | grep vbmc
root     14190  0.0  0.0 603024 35624 ?        Sl   15:42   0:00 /usr/bin/python3.6 /usr/local/bin/vbmc add openshift_master_0 --port 6230 --libvirt-uri qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1
root     14691  0.0  0.0 603024 30440 ?        Sl   15:43   0:00 /usr/bin/python3.6 /usr/local/bin/vbmc add openshift_master_0 --port 6230 --libvirt-uri qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1
root     14718  0.0  0.0 603024 30500 ?        Sl   15:43   0:00 /usr/bin/python3.6 /usr/local/bin/vbmc add openshift_master_0 --port 6230 --libvirt-uri qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1
root     14744  0.0  0.0 603024 30500 ?        Sl   15:43   0:00 /usr/bin/python3.6 /usr/local/bin/vbmc add openshift_master_0 --port 6230 --libvirt-uri qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1
root     14770  0.0  0.0 603024 30528 ?        Sl   15:43   0:00 /usr/bin/python3.6 /usr/local/bin/vbmc add openshift_master_0 --port 6230 --libvirt-uri qemu+ssh://[email protected]/system?&keyfile=/root/.ssh/id_rsa_virt_power&no_verify=1&no_tty=1
shardy   18603  0.0  0.0 221864   968 pts/3    S+   15:58   0:00 grep --color=auto vbmc
[shardy@localhost dev-scripts]$ sudo /usr/local/bin/vbmc list
+--------------------+---------+---------+------+
| Domain name        | Status  | Address | Port |
+--------------------+---------+---------+------+
| openshift_master_0 | running | ::      | 6230 |
| openshift_master_1 | running | ::      | 6231 |
| openshift_master_2 | running | ::      | 6232 |
| openshift_worker_0 | running | ::      | 6233 |
+--------------------+---------+---------+------+
[shardy@localhost dev-scripts]$ sudo systemctl status virtualbmc
● virtualbmc.service - Virtual Baseboard Management Controller Emulation service
   Loaded: loaded (/etc/systemd/system/virtualbmc.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-07-11 15:42:59 BST; 15min ago
  Process: 14186 ExecStart=/usr/bin/vbmcd --foreground (code=exited, status=203/EXEC)
 Main PID: 14186 (code=exited, status=203/EXEC)

Jul 11 15:42:59 localhost.localdomain systemd[1]: virtualbmc.service: Service RestartSec=1s expired, scheduling restart.
Jul 11 15:42:59 localhost.localdomain systemd[1]: virtualbmc.service: Scheduled restart job, restart counter is at 5.
Jul 11 15:42:59 localhost.localdomain systemd[1]: Stopped Virtual Baseboard Management Controller Emulation service.
Jul 11 15:42:59 localhost.localdomain systemd[1]: virtualbmc.service: Start request repeated too quickly.
Jul 11 15:42:59 localhost.localdomain systemd[1]: virtualbmc.service: Failed with result 'exit-code'.
Jul 11 15:42:59 localhost.localdomain systemd[1]: Failed to start Virtual Baseboard Management Controller Emulation service.

Existing KUBECONFIG makes install fail

I had KUBECONFIG set from a previous deployment with OpenShift Metal3. The workaround was just to unset KUBECONFIG and run again.

make[1]: Entering directory '/home/notstack/go/src/github.com/metal3-io/baremetal-operator'
echo "{ \"kind\": \"Namespace\", \"apiVersion\": \"v1\", \"metadata\": { \"name\": \"metal3\", \"labels\": { \"name\": \"metal3\" } } }" | kubectl apply -f -
W1014 15:08:27.304759   33478 loader.go:223] Config not found: /home/notstack/dev-scripts/ocp/auth/kubeconfig
error: unable to recognize "STDIN": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
make[1]: *** [Makefile:126: deploy] Error 1
make[1]: Leaving directory '/home/notstack/go/src/github.com/metal3-io/baremetal-operator'
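
For reference, the workaround described above amounts to:

unset KUBECONFIG
make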

Error: IPMI call failed: power status on CentOS Linux release 7.7.1908 (Core)

Hi, I'm currently trying to set up my dev env and keep hitting the error above. A bit new to the bare metal stuff so any help would be very appreciated.

This is what my virsh looks like:

sudo virsh list --all
 Id    Name                           State
----------------------------------------------------
 25    minikube                       running
 28    node_0                         running
 29    node_1                         running

Here's what kubectl get bmh -n metal3 returns

NAME     STATUS   PROVISIONING STATUS   CONSUMER   BMC                         HARDWARE PROFILE   ONLINE   ERROR
node-0   error    registration error               ipmi://192.168.111.1:6230                      true     Failed to get power state for node be2287c0-0834-486d-b43c-5f1fe4aef4f7. Error: IPMI call failed: power status.
node-1   error    registration error               ipmi://192.168.111.1:6231                      true     Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status.

Logs for the operator show the same story:

{"level":"info","ts":1576875459.2814622,"logger":"baremetalhost_ironic","msg":"validating management access","host":"node-1"}
{"level":"info","ts":1576875459.3040826,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"node-1"}
{"level":"info","ts":1576875459.304359,"logger":"baremetalhost_ironic","msg":"current provision state","host":"node-1","lastError":"","current":"verifying","target":"manageable"}
{"level":"info","ts":1576875459.3044887,"logger":"baremetalhost","msg":"response from validate","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","provResult":{"Dirty":true,"RequeueAfter":0,"ErrorMessage":""}}
{"level":"info","ts":1576875459.3046327,"logger":"baremetalhost","msg":"host not ready","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","wait":0}
{"level":"info","ts":1576875459.304894,"logger":"baremetalhost","msg":"saving host status","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","operational status":"OK","provisioning state":"registering"}
{"level":"info","ts":1576875459.3119812,"logger":"baremetalhost","msg":"done","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","requeue":true,"after":0}
{"level":"info","ts":1576875459.312225,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"metal3","Request.Name":"node-1"}
{"level":"info","ts":1576875459.312614,"logger":"baremetalhost","msg":"registering and validating access to management controller","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","credentials":{"credentials":{"name":"node-1-bmc-secret","namespace":"metal3"},"credentialsVersion":"6943"}}
{"level":"info","ts":1576875459.3128545,"logger":"baremetalhost_ironic","msg":"validating management access","host":"node-1"}
{"level":"info","ts":1576875459.3331463,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"node-1"}
{"level":"info","ts":1576875459.3331733,"logger":"baremetalhost_ironic","msg":"current provision state","host":"node-1","lastError":"","current":"verifying","target":"manageable"}
{"level":"info","ts":1576875459.3331816,"logger":"baremetalhost","msg":"response from validate","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","provResult":{"Dirty":true,"RequeueAfter":0,"ErrorMessage":""}}
{"level":"info","ts":1576875459.3331926,"logger":"baremetalhost","msg":"host not ready","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","wait":0}
{"level":"info","ts":1576875459.333346,"logger":"baremetalhost","msg":"saving host status","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","operational status":"OK","provisioning state":"registering"}
{"level":"info","ts":1576875459.340921,"logger":"baremetalhost","msg":"done","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","requeue":true,"after":0}
{"level":"info","ts":1576875459.9524033,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"metal3","Request.Name":"node-1"}
{"level":"info","ts":1576875459.9526906,"logger":"baremetalhost","msg":"registering and validating access to management controller","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","credentials":{"credentials":{"name":"node-1-bmc-secret","namespace":"metal3"},"credentialsVersion":"6943"}}
{"level":"info","ts":1576875459.9529614,"logger":"baremetalhost_ironic","msg":"validating management access","host":"node-1"}
{"level":"info","ts":1576875459.9883375,"logger":"baremetalhost_ironic","msg":"found existing node by ID","host":"node-1"}
{"level":"info","ts":1576875459.988423,"logger":"baremetalhost_ironic","msg":"current provision state","host":"node-1","lastError":"Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status.","current":"enroll","target":""}
{"level":"info","ts":1576875459.9885383,"logger":"baremetalhost","msg":"response from validate","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","provResult":{"Dirty":false,"RequeueAfter":0,"ErrorMessage":"Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status."}}
{"level":"info","ts":1576875459.9886603,"logger":"baremetalhost","msg":"saving host status","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","operational status":"error","provisioning state":"registration error"}
{"level":"info","ts":1576875459.9938457,"logger":"baremetalhost","msg":"publishing event","reason":"RegistrationError","message":"Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status."}
{"level":"info","ts":1576875460.0001786,"logger":"baremetalhost","msg":"stopping on host error","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registering","message":"Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status."}
{"level":"info","ts":1576875460.00066,"logger":"baremetalhost","msg":"Reconciling BareMetalHost","Request.Namespace":"metal3","Request.Name":"node-1"}
{"level":"info","ts":1576875460.0009577,"logger":"baremetalhost","msg":"stopping on host error","Request.Namespace":"metal3","Request.Name":"node-1","provisioningState":"registration error","message":"Failed to get power state for node 55201488-7ee7-4e5f-97e8-c2b45897a76a. Error: IPMI call failed: power status."}

sudo vbmc list returned:

+-------------+---------+---------+------+
| Domain name | Status  | Address | Port |
+-------------+---------+---------+------+
| node_0      | running | ::      | 6230 |
| node_1      | running | ::      | 6231 |
+-------------+---------+---------+------+

I've tried deleting the bmh crds and applying them again with the same result. Thoughts?
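
A couple of hedged sanity checks for this kind of registration error (the credentials below are placeholders; the real ones come from each node's BMC secret):

# Confirm the virtual BMC answers IPMI on the address and port shown in the BMC field
ipmitool -I lanplus -H 192.168.111.1 -p 6230 -U admin -P password power status
# Repeat for the second host
ipmitool -I lanplus -H 192.168.111.1 -p 6231 -U admin -P password power status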

Move Ironic into the cluster

The scripts currently run the ironic containers on the host. All of these containers should move into the cluster as part of the baremetal-operator pod. The baremetal-operator repo includes a sample pod manifest that includes ironic as a starting point. It will likely need some additions to ensure the pod downloads some required images - IPA and an OS image like the CentOS image that the scripts download right now.

[v1alpha3] study and migration

Cluster API v1alpha3 is on its way, and we need to study what it brings along and how these changes should be migrated to metal3-dev-env.

Local image testing doesn't work with docker

The local image support we added in #104 only works with podman. docker push doesn't support the same options as podman: it doesn't support the tls-verify option, nor does it accept two arguments (local image + destination).
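
For illustration, the podman invocation this relies on looks roughly like the following; the registry address and image names are assumptions:

# podman push accepts --tls-verify and a separate destination argument; docker push supports neither
podman push --tls-verify=false localhost/my-image:latest 192.168.111.1:5000/localimages/my-image:latest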

Automatically deploy cluster-api integration

So far, metal3-dev-env sets up virtual bare metal hosts and configures the baremetal-operator to manage them. Next, it needs to automatically deploy cluster-api-provider-baremetal and allow provisioning hosts through the Machine interface.

Installing metal3 behind an http proxy

By default this code base does not support corporate proxy settings

Specifically, with the proxy issues on Ubuntu 18.04, ubuntu_install_requirements.sh does not appear to honor the proxy configuration in several places:

  1. line 38 sudo add-apt-repository -y ppa:projectatomic/ppa
  2. line 40 add-apt-repository -y ppa:longsleep/golang-backports
  3. line 68 sudo pip3 install

These issues can be resolved by using sudo -E so that the commands pick up the environment settings, or by telling the pip install that there is a proxy. However, the deployment of the metal3-baremetal-operator does not pick up the environment and will fail every time it pulls in the agents it is attempting to install, ignoring the proxy settings:

ubuntu@virtual-airship:~/metal3-dev-env$ kubectl logs metal3-baremetal-operator-5cbbd7b87d-kvcbx --previous -n metal3 ironic-ipa-downloader
+ export http_proxy=
+ http_proxy=
+ export https_proxy=
+ https_proxy=
+ SNAP=current-tripleo
+ IPA_BASEURI=https://images.rdoproject.org/train/rdo_trunk/current-tripleo/
+ FILENAME=ironic-python-agent
+ FILENAME_EXT=.tar
+ FFILENAME=ironic-python-agent.tar
+ mkdir -p /shared/html/images /shared/tmp
+ cd /shared/html/images
++ mktemp -d -p /shared/tmp
+ TMPDIR=/shared/tmp/tmp.XQ3D6VInff
+ ls -l
total 4
-rw-r--r-- 1 root root 306 Jan 22 22:31 ironic-python-agent.tar.headers
+ '[' -n http://172.22.0.1/images -a '!' -e ironic-python-agent.tar.headers ']'
+ '[' -e ironic-python-agent.tar.headers ']'
++ awk '/ETag:/ {print $2}' ironic-python-agent.tar.headers
++ tr -d '\r'
+ ETAG='"18632000-59c951f5a7ab6"'
+ cd /shared/tmp/tmp.XQ3D6VInff
+ curl -g --verbose --dump-header ironic-python-agent.tar.headers -O https://images.rdoproject.org/train/rdo_trunk/current-tripleo//ironic-python-agent.tar --header 'If-None-Match: "18632000-59c951f5a7ab6"'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 38.145.33.168...
* TCP_NODELAY set
  0     0    0     0    0     0      0      0 --:--:--  0:00:31 --:--:--     0* connect to 38.145.33.168 port 443 failed: Connection timed out
* Failed to connect to images.rdoproject.org port 443: Connection timed out
* Closing connection 0
curl: (7) Failed to connect to images.rdoproject.org port 443: Connection timed out
ubuntu@virtual-airship:~/metal3-dev-env$
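
A hedged sketch of the script-level workaround described above (the proxy URL and package list are placeholders); note that it does not fix the in-cluster ironic-ipa-downloader failure shown in the log:

export http_proxy=http://proxy.example.com:3128
export https_proxy=http://proxy.example.com:3128
# Re-run the failing steps so they inherit the proxy environment
sudo -E add-apt-repository -y ppa:projectatomic/ppa
sudo -E add-apt-repository -y ppa:longsleep/golang-backports
sudo -E pip3 install --proxy "$https_proxy" <packages from the script>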

Create and manage additional Clusters on metal3-dev-env

I cannot see the clusters.
$ kubectl --kubeconfig .kube/config get machines --all-namespaces
NAMESPACE NAME PROVIDERID PHASE
metal3 centos
[user@centos7 ~]$ kubectl --kubeconfig .kube/config get clusters --all-namespaces
No resources found.
Steps to create and manage additional clusters would be helpful.

Error running baremetal-operator locally

Hi,

I am trying to follow the instructions to run the baremetal-operator locally for test purposes. After

kubectl scale deployment metal3-baremetal-operator -n metal3 --replicas=0
cd ~/go/src/github.com/metal3-io/baremetal-operator
make run 

I get the following message:

operator-sdk up local \
	--go-ldflags="-X github.com/metal3-io/baremetal-operator/pkg/version.Raw=before-rename-112-g497359c9953e08b12237ce1a7eb8cd183fedd538 -X github.com/metal3-io/baremetal-operator/pkg/version.Commit="497359c9953e08b12237ce1a7eb8cd183fedd538"" \
	--namespace=metal3 \
	--operator-flags="-dev"
/bin/sh: operator-sdk: command not found
make: *** [run] Error 127

After that I followed these instructions successfully to install operator-sdk.
https://github.com/operator-framework/operator-sdk/blob/master/doc/user/install-operator-sdk.md
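
(For reference, a sketch of the binary install those instructions describe; the release version and asset name are assumptions and may differ:)

RELEASE_VERSION=v0.8.0
curl -LO https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
chmod +x operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
sudo mv operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu /usr/local/bin/operator-sdk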

After running make run again in the baremetal-operator folder, I receive the following output in my terminal, but the process has been stuck in this state for hours:

operator-sdk up local \
	--go-ldflags="-X github.com/metal3-io/baremetal-operator/pkg/version.Raw=before-rename-112-g497359c9953e08b12237ce1a7eb8cd183fedd538 -X github.com/metal3-io/baremetal-operator/pkg/version.Commit="497359c9953e08b12237ce1a7eb8cd183fedd538"" \
	--namespace=metal3 \
	--operator-flags="-dev"
INFO[0000] Running the operator locally.                
INFO[0000] Using namespace metal3.                      
2019-07-05T12:33:23.395+0300	INFO	cmd	Go Version: go1.11.5
2019-07-05T12:33:23.395+0300	INFO	cmd	Go OS/Arch: linux/amd64
2019-07-05T12:33:23.395+0300	INFO	cmd	Version of operator-sdk: v0.8.0+git
2019-07-05T12:33:23.395+0300	INFO	cmd	Component version: metal3-io/baremetal-operator before-rename-112-g497359c9953e08b12237ce1a7eb8cd183fedd538
2019-07-05T12:33:23.396+0300	INFO	leader	Trying to become the leader.
2019-07-05T12:33:23.396+0300	INFO	leader	Skipping leader election; not running in a cluster.
2019-07-05T12:33:23.706+0300	INFO	cmd	Registering Components.
E0705 12:33:23.706254    5412 client_go_adapter.go:318] descriptor Desc{fqName: "metal3-baremetalhost-controller_depth", help: "Current depth of workqueue: metal3-baremetalhost-controller", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_depth" is not a valid metric name
E0705 12:33:23.707818    5412 client_go_adapter.go:328] descriptor Desc{fqName: "metal3-baremetalhost-controller_adds", help: "Total number of adds handled by workqueue: metal3-baremetalhost-controller", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_adds" is not a valid metric name
E0705 12:33:23.710739    5412 client_go_adapter.go:339] descriptor Desc{fqName: "metal3-baremetalhost-controller_queue_latency", help: "How long an item stays in workqueuemetal3-baremetalhost-controller before being requested.", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_queue_latency" is not a valid metric name
E0705 12:33:23.711845    5412 client_go_adapter.go:350] descriptor Desc{fqName: "metal3-baremetalhost-controller_work_duration", help: "How long processing an item from workqueuemetal3-baremetalhost-controller takes.", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_work_duration" is not a valid metric name
E0705 12:33:23.712983    5412 client_go_adapter.go:363] descriptor Desc{fqName: "metal3-baremetalhost-controller_unfinished_work_seconds", help: "How many seconds of work metal3-baremetalhost-controller has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_unfinished_work_seconds" is not a valid metric name
E0705 12:33:23.714086    5412 client_go_adapter.go:374] descriptor Desc{fqName: "metal3-baremetalhost-controller_longest_running_processor_microseconds", help: "How many microseconds has the longest running processor for metal3-baremetalhost-controller been running.", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_longest_running_processor_microseconds" is not a valid metric name
E0705 12:33:23.715233    5412 client_go_adapter.go:384] descriptor Desc{fqName: "metal3-baremetalhost-controller_retries", help: "Total number of retries handled by workqueue: metal3-baremetalhost-controller", constLabels: {}, variableLabels: []} is invalid: "metal3-baremetalhost-controller_retries" is not a valid metric name
2019-07-05T12:33:23.716+0300	INFO	controller-runtime.controller	Starting EventSource	{"controller": "metal3-baremetalhost-controller", "source": "kind source: /, Kind="}
2019-07-05T12:33:23.716+0300	INFO	controller-runtime.controller	Starting EventSource	{"controller": "metal3-baremetalhost-controller", "source": "kind source: /, Kind="}
2019-07-05T12:33:23.716+0300	INFO	cmd	Starting the Cmd.
2019-07-05T12:33:23.817+0300	INFO	controller-runtime.controller	Starting Controller	{"controller": "metal3-baremetalhost-controller"}
2019-07-05T12:33:23.918+0300	INFO	controller-runtime.controller	Starting workers	{"controller": "metal3-baremetalhost-controller", "worker count": 1}
2019-07-05T12:33:23.918+0300	INFO	baremetalhost	Reconciling BareMetalHost	{"Request.Namespace": "metal3", "Request.Name": "master-0"}
2019-07-05T12:33:23.919+0300	INFO	baremetalhost_ironic	ironic settings	{"endpoint": "http://localhost:6385/v1/", "inspectorEndpoint": "http://localhost:5050/v1/", "deployKernelURL": "http://172.22.0.1/images/ironic-python-agent.kernel", "deployRamdiskURL": "http://172.22.0.1/images/ironic-python-agent.initramfs"}
2019-07-05T12:33:23.919+0300	INFO	baremetalhost	inspecting hardware	{"Request.Namespace": "metal3", "Request.Name": "master-0", "provisioningState": "inspecting"}
2019-07-05T12:33:23.919+0300	INFO	baremetalhost_ironic	inspecting hardware	{"host": "master-0", "status": "OK"}
2019-07-05T12:33:23.961+0300	INFO	baremetalhost_ironic	found existing node by ID	{"host": "master-0"}
2019-07-05T12:33:23.967+0300	INFO	baremetalhost_ironic	inspection failed	{"host": "master-0", "error": "Introspection timeout"}
2019-07-05T12:33:23.967+0300	INFO	baremetalhost	publishing event	{"reason": "RegistrationError", "message": "Introspection timeout"}
2019-07-05T12:33:23.976+0300	INFO	baremetalhost	stopping on host error	{"Request.Namespace": "metal3", "Request.Name": "master-0", "provisioningState": "inspecting", "message": "Introspection timeout"}
2019-07-05T12:33:23.976+0300	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "metal3-baremetalhost-controller", "request": "metal3/master-0"}
2019-07-05T12:33:23.976+0300	INFO	baremetalhost	Reconciling BareMetalHost	{"Request.Namespace": "metal3", "Request.Name": "worker-0"}
2019-07-05T12:33:23.976+0300	INFO	baremetalhost_ironic	ironic settings	{"endpoint": "http://localhost:6385/v1/", "inspectorEndpoint": "http://localhost:5050/v1/", "deployKernelURL": "http://172.22.0.1/images/ironic-python-agent.kernel", "deployRamdiskURL": "http://172.22.0.1/images/ironic-python-agent.initramfs"}
2019-07-05T12:33:23.976+0300	INFO	baremetalhost	inspecting hardware	{"Request.Namespace": "metal3", "Request.Name": "worker-0", "provisioningState": "inspecting"}
2019-07-05T12:33:23.976+0300	INFO	baremetalhost_ironic	inspecting hardware	{"host": "worker-0", "status": "OK"}
2019-07-05T12:33:23.993+0300	INFO	baremetalhost_ironic	found existing node by ID	{"host": "worker-0"}
2019-07-05T12:33:23.999+0300	INFO	baremetalhost_ironic	inspection failed	{"host": "worker-0", "error": "Introspection timeout"}
2019-07-05T12:33:23.999+0300	INFO	baremetalhost	publishing event	{"reason": "RegistrationError", "message": "Introspection timeout"}
2019-07-05T12:33:24.004+0300	INFO	baremetalhost	stopping on host error	{"Request.Namespace": "metal3", "Request.Name": "worker-0", "provisioningState": "inspecting", "message": "Introspection timeout"}
2019-07-05T12:33:24.004+0300	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "metal3-baremetalhost-controller", "request": "metal3/worker-0"}    

Does anyone have the same problem? Or am I missing something important here?
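
A couple of hedged checks that may help narrow down an "Introspection timeout" like this (the container runtime is an assumption; the inspector endpoint comes from the ironic settings logged above):

# Check that the host-level ironic and inspector containers are still running
sudo podman ps --format '{{.Names}}' | grep -E 'ironic|inspector'
# Check that the inspector API the operator is configured with actually answers
curl -s http://localhost:5050/v1/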

Support for chroot based environment

This issue follows the progress of installing the development environment in a chroot-based Centos7 environment on top of Opensuse. One important objective was to use the libvirt instance already running on the host machine.

There are some minor hacks needed and some unsolved issues.

Hacks

  • This is not specific to metal3, but in order to make yum run in a chroot, it is necessary to bind mount the /proc and /dev directories into the chroot environment [1] (see the sketch after this list)
  • The installation script checks the filesystem using the output of the command df /. In a chroot environment, the result is not reliable. Even after following the generally suggested solution of making /etc/mtab a symbolic link to /proc/mounts in the chroot environment, at some point it stopped working and I had to skip this check.
  • Podman doesn't work on top of an xfs file system. I had to install fuse-overlayfs and change podman's configuration to use it as the storage driver. This process itself is tricky, as there seem to be no packages for Centos and I had to install from source.
  • Podman requires access to the /run/systemd/private socket. The directory /run/systemd must be bind mounted in the chroot environment
  • In order to use the host's libvirt, the directory /opt/run/libvirt must be mounted in the chroot environment
  • The directory /opt/metal3-dev-env/pool must be created on the host machine because it is used when creating the ooq_pool storage pool
  • I had to manually modify the libvirt_uri in vm-setup/roles/virtbmc/tasks/setup_tasks.yml to use qemu:///system instead of the url qemu+ssh:///vbmc_address
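
A minimal sketch of the bind mounts described in the first bullets, assuming the chroot lives under /path/to/chroot (the path is a placeholder; adjust the libvirt target to the layout described above):

# Make yum work and expose the host's systemd and libvirt sockets inside the chroot
sudo mount --bind /proc /path/to/chroot/proc
sudo mount --bind /dev /path/to/chroot/dev
sudo mount --bind /run/systemd /path/to/chroot/run/systemd
sudo mount --bind /var/run/libvirt /path/to/chroot/var/run/libvirt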

Issues

  • At some point in the installation process, there was a problem because the provisioning network had already been created. I had to clean up the host with the host_cleanup.sh script and start over again.
  • The virtualbmc service is not started because the setup process uses systemd to start it, and this is not supported under chroot. This is most likely the root cause of the issue below.
  • The commands vbmc add <role>_0 --port 6230 --libvirt-uri qemu:///system for <role>="master"|"worker" fail with the error Server at 50891 may be dead, will not try to revive it. There is, however, a vbmc process listening on this port. This seems related to the next issue
  • The vbmc keeps restarting as shown in /etc/log/virtualbmc/virtualbmc.log:
DEBUG VirtualBMC [-] Server at 50891 connection error: Server response timed out
DEBUG VirtualBMC [-] Attempting to start `vbmcd` behind the scenes. Consider configuring your system to manage `vbmcd` via systemd. Automatic `vbmcd` start up will be removed in the future releases!
ERROR VirtualBMC [-] Control server error: Address already in use
  • The master_0 and worker_0 domains are created, but remain shut off. They can be started manually, but I cannot tell whether they are actually running; attaching to the console doesn't show any output.

References

[1] https://serverfault.com/questions/866294/error-failed-to-initialize-nss-library

Broken v1a2 deployment scripts

The v1alpha2 scripts to provision target clusters are broken for both Ubuntu and Centos. They should be removed, along with the corresponding templates of the Ansible role, and the deployment of v1alpha2 should also be removed.

After configuring the Metal3 networks, Minikube cannot access k8s.gcr.io

When building Metal3 on Centos 7, the following is displayed upon the second start of Minikube:

VM is unable to access k8s.gcr.io, you may need to configure a proxy or set --image-repository

As a result, the deployment does not proceed beyond:

   - Waiting for task completion (up to 2400 seconds)  - Command: 'check_k8s_entity statefulsets cluster-api-controller-manager   cluster-api-provider-baremetal-controller-manager'

Because of the network issue, you will see:

kubectl get pods --all-namespaces

 metal3        cluster-api-controller-manager-0                      0/1     ImagePullBackOff    0          5m52s
 metal3        cluster-api-provider-baremetal-controller-manager-0   0/2     ErrImagePull        0          5m52s
 metal3        metal3-baremetal-operator-688bb8c4c4-sxmhd            0/6     Init:ErrImagePull   0          5m52s

Accessing Minikube via minikube ssh and looking at the network config reveals two default routes. I don't know if this has anything to do with it, but during the initial configuration of Minikube it was able to access the internet while only on the default KVM network.

0.0.0.0         192.168.122.1   0.0.0.0         UG        0 0          0 eth0
0.0.0.0         192.168.111.1   0.0.0.0         UG        0 0          0 eth3
172.17.0.0      0.0.0.0         255.255.0.0     U         0 0          0 docker0
172.22.0.0      0.0.0.0         255.255.255.0   U         0 0          0 eth2
192.168.39.0    0.0.0.0         255.255.255.0   U         0 0          0 eth1
192.168.111.0   0.0.0.0         255.255.255.0   U         0 0          0 eth3
192.168.111.1   0.0.0.0         255.255.255.255 UH        0 0          0 eth3
192.168.122.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
192.168.122.1   0.0.0.0         255.255.255.255 UH        0 0          0 eth0
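
A hedged way to inspect (and, purely as a test, drop) the second default route from inside the Minikube VM; the interface name comes from the table above:

# Show the routing table inside the VM
minikube ssh -- ip route
# Temporary experiment only: remove the extra default route added on the baremetal network
minikube ssh -- sudo ip route del default via 192.168.111.1 dev eth3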

Update to python 3

In many places we still use Python 2; we should upgrade before the Python 2 end of life.
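
A quick, hedged way to locate the remaining Python 2 references in the repository (the file patterns are guesses):

grep -rnE 'python2|pip2' --include='*.sh' --include='*.yml' .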

Hit issue both on Ubuntu and Centos

I hit the same issue when running 03_launch_mgmt_cluster.sh:

X Unable to start VM: create: Error creating machine: Error in driver during machine creation: creating domain: error defining domain xml:

I checked the env:

[pengli@ptyalin1 metal3-dev-env]$ sudo virsh list  --all
setlocale: No such file or directory
 Id    Name                           State
----------------------------------------------------
 1     minikube                       paused
 -     node_0                         shut off
 -     node_1                         shut off

And I tried to resume the minikube domain:

[pengli@ptyalin1 metal3-dev-env]$ sudo virsh resume minikube
setlocale: No such file or directory
error: Failed to resume domain minikube
error: internal error: unable to execute QEMU command 'cont': Resetting the Virtual Machine is required

Could anyone give some suggestions? Thanks.
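
A hedged recovery path when the minikube domain ends up paused and cannot be resumed (destructive: it tears down and recreates the management cluster):

sudo virsh destroy minikube
minikube delete
make clean
make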

The name org.fedoraproject.FirewallD1 was not provided by any .service files

CentOS 7 freshly installed. I've faced #136 and #198, so I've tried the workaround of commenting out the init_minikube function in the 01 script and the yum updates. I've executed make clean then make, and now it fails when running the following Ansible task:

TASK [libvirt : Start libvirt networks] ******************************************************************************************************************************************************************************************************
task path: /home/metal3/git/metal3-dev-env/vm-setup/roles/libvirt/tasks/network_setup_tasks.yml:38
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: metal3
<localhost> EXEC /bin/sh -c 'echo ~metal3 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573 `" && echo ansible-tmp-1580127282.17-132596154316573="` echo /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/misc/virt_net.py
<localhost> PUT /home/metal3/.ansible/tmp/ansible-local-14329RkPLQw/tmpMj6aW7 TO /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573/AnsiballZ_virt_net.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573/ /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573/AnsiballZ_virt_net.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n  -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ttblpixnbmlxhxbrzaipvqsxuxinhoim ; /usr/bin/python /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573/AnsiballZ_virt_net.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /home/metal3/.ansible/tmp/ansible-tmp-1580127282.17-132596154316573/ > /dev/null 2>&1 && sleep 0'
changed: [localhost] => (item={u'bridge': u'provisioning', u'forward_mode': u'bridge', u'name': u'provisioning'}) => {
    "ansible_loop_var": "item",
    "changed": true,
    "invocation": {
        "module_args": {
            "autostart": null,
            "command": "start",
            "name": "provisioning",
            "state": "active",
            "uri": "qemu:///system",
            "xml": null
        }
    },
    "item": {
        "bridge": "provisioning",
        "forward_mode": "bridge",
        "name": "provisioning"
    },
    "msg": 0
}
<localhost> EXEC /bin/sh -c 'echo ~metal3 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863 `" && echo ansible-tmp-1580127282.47-18342203079863="` echo /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/misc/virt_net.py
<localhost> PUT /home/metal3/.ansible/tmp/ansible-local-14329RkPLQw/tmpNxXveQ TO /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863/AnsiballZ_virt_net.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863/ /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863/AnsiballZ_virt_net.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n  -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-lsrvzrnrnexqheouizptenyzwntxapkl ; /usr/bin/python /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863/AnsiballZ_virt_net.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /home/metal3/.ansible/tmp/ansible-tmp-1580127282.47-18342203079863/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
  File "/tmp/ansible_virt_net_payload_5r2Kb4/ansible_virt_net_payload.zip/ansible/modules/cloud/misc/virt_net.py", line 628, in main
  File "/tmp/ansible_virt_net_payload_5r2Kb4/ansible_virt_net_payload.zip/ansible/modules/cloud/misc/virt_net.py", line 522, in core
  File "/tmp/ansible_virt_net_payload_5r2Kb4/ansible_virt_net_payload.zip/ansible/modules/cloud/misc/virt_net.py", line 429, in start
  File "/tmp/ansible_virt_net_payload_5r2Kb4/ansible_virt_net_payload.zip/ansible/modules/cloud/misc/virt_net.py", line 419, in create
  File "/tmp/ansible_virt_net_payload_5r2Kb4/ansible_virt_net_payload.zip/ansible/modules/cloud/misc/virt_net.py", line 217, in create
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2990, in create
    if ret == -1: raise libvirtError ('virNetworkCreate() failed', net=self)

failed: [localhost] (item={u'bridge': u'baremetal', u'domain': u'ostest.test.metalkube.org', u'forward_mode': u'nat', u'name': u'baremetal', u'netmask': u'255.255.255.0', u'prefix': u'24', u'dhcp_range': [u'192.168.111.20', u'192.168.111.60'], u'dns': {u'hosts': [], u'forwarders': [{u'domain': u'apps.ostest.test.metalkube.org', u'addr': u'127.0.0.1'}]}, u'address': u'192.168.111.1', u'nat_port_range': [1024, 65535]}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "module_args": {
            "autostart": null,
            "command": "start",
            "name": "baremetal",
            "state": "active",
            "uri": "qemu:///system",
            "xml": null
        }
    },
    "item": {
        "address": "192.168.111.1",
        "bridge": "baremetal",
        "dhcp_range": [
            "192.168.111.20",
            "192.168.111.60"
        ],
        "dns": {
            "forwarders": [
                {
                    "addr": "127.0.0.1",
                    "domain": "apps.ostest.test.metalkube.org"
                }
            ],
            "hosts": []
        },
        "domain": "ostest.test.metalkube.org",
        "forward_mode": "nat",
        "name": "baremetal",
        "nat_port_range": [
            1024,
            65535
        ],
        "netmask": "255.255.255.0",
        "prefix": "24"
    },
    "msg": "The name org.fedoraproject.FirewallD1 was not provided by any .service files"
}

It looks like a libvirtd restart fixes it:

sudo systemctl restart libvirtd
./02_configure_host.sh
...
<localhost> EXEC /bin/sh -c 'echo ~metal3 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170 `" && echo ansible-tmp-1580129733.94-247031520019170="` echo /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/misc/virt_net.py
<localhost> PUT /home/metal3/.ansible/tmp/ansible-local-14873gDLI2B/tmpZtxg3b TO /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170/AnsiballZ_virt_net.py
<localhost> EXEC /bin/sh -c 'chmod u+x /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170/ /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170/AnsiballZ_virt_net.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S -n  -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-wstomrbdcppqwltlblkshajdtqunytuh ; /usr/bin/python /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170/AnsiballZ_virt_net.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /home/metal3/.ansible/tmp/ansible-tmp-1580129733.94-247031520019170/ > /dev/null 2>&1 && sleep 0'
changed: [localhost] => (item={u'bridge': u'baremetal', u'domain': u'ostest.test.metalkube.org', u'forward_mode': u'nat', u'name': u'baremetal', u'netmask': u'255.255.255.0', u'prefix': u'24', u'dhcp_range': [u'192.168.111.20', u'192.168.111.60'], u'dns': {u'hosts': [], u'forwarders': [{u'domain': u'apps.ostest.test.metalkube.org', u'addr': u'127.0.0.1'}]}, u'address': u'192.168.111.1', u'nat_port_range': [1024, 65535]}) => {
    "ansible_loop_var": "item",
    "changed": true,
    "invocation": {
        "module_args": {
            "autostart": null,
            "command": "start",
            "name": "baremetal",
            "state": "active",
            "uri": "qemu:///system",
            "xml": null
        }
    },
    "item": {
        "address": "192.168.111.1",
        "bridge": "baremetal",
        "dhcp_range": [
            "192.168.111.20",
            "192.168.111.60"
        ],
        "dns": {
            "forwarders": [
                {
                    "addr": "127.0.0.1",
                    "domain": "apps.ostest.test.metalkube.org"
                }
            ],
            "hosts": []
        },
        "domain": "ostest.test.metalkube.org",
        "forward_mode": "nat",
        "name": "baremetal",
        "nat_port_range": [
            1024,
            65535
        ],
        "netmask": "255.255.255.0",
        "prefix": "24"
    },
    "msg": 0
}
...

Minikube is unstable

Sometimes when running the deployment, especially in CI, Minikube crashes and the API server becomes unreachable. This is usually visible in the logs as:

Unable to connect to the server: net/http: TLS handshake timeout
or
Unable to connect to the server: dial tcp 192.168.39.179:8443: connect: no route to host

We should investigate this issue to figure out whether the problem comes from the way we use Minikube (for example, causing a kernel panic) or from an issue with Minikube itself (far less probable). Then we would be able to fix whatever we are doing wrong. I would say the reproduction rate (in CI) is around 25%. The crashes always happen during intensive phases (introspection or provisioning). More info is needed.
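
A hedged starting point for gathering that info when a crash happens (commands assume the failing environment is still up):

# Collect Minikube's own logs and the state of the VM from libvirt
minikube logs
sudo virsh dominfo minikube
sudo virsh console minikube   # watch the serial console for kernel panics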
