
Metal³ Installer Dev Scripts

This set of scripts configures some libvirt VMs and associated virtualbmc processes to enable deploying to them as dummy baremetal nodes.

We are using this repository as a workspace for development environment setup and convenience/test scripts; the main logic needed to enable bare metal provisioning is now integrated into the Go-based openshift-installer and other components of OpenShift via support for the baremetal platform type.

Pre-requisites

  • CentOS 8 or RHEL 8 host
    • Alma and Rocky Linux 8 are also supported on a best effort basis
  • file system that supports d_type (see Troubleshooting section for more information)
  • ideally on a bare metal host with at least 64G of RAM
  • run as a user with passwordless sudo access
  • get a valid pull secret (json string) from https://cloud.redhat.com/openshift/install/pull-secret
  • get a login token from console-openshift-console.apps.ci.l2s4.p1.openshiftapps.com

Instructions

Preparation

Considering that this is a new install on a clean OS, the following tasks should be performed prior to installation:

  1. Enable passwordless sudo for the current user

    Consider creating a separate user for deployments, one without SSH access.

    echo "$USER ALL=(ALL) NOPASSWD: ALL" >/etc/sudoers.d/${USER}

  2. On RHEL, invoke subscription-manager to register the system and attach the subscription

  3. Install new packages

    sudo dnf upgrade -y
    sudo dnf install -y git make wget jq
  4. Clone the dev-scripts repository

    git clone https://github.com/openshift-metal3/dev-scripts

  5. Create a config file

    cd dev-scripts
    cp config_example.sh config_$USER.sh

  6. Configure dev-scripts working directory

    By default, dev-scripts' working directory is set to /opt/dev-scripts. Make sure that the filesystem has at least 80GB of free space: df -h /.

    Alternatively, you may have a large /home filesystem, in which case you can export WORKING_DIR=/home/dev-scripts and the scripts will create this directory with appropriate permissions. If you create this directory manually, it should be world-readable (chmod 755) and owned by the non-root $USER.
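
    For example, to place the working directory on /home instead (a sketch; adjust the path to your environment):

    export WORKING_DIR=/home/dev-scripts
    sudo mkdir -p "${WORKING_DIR}"
    sudo chown "${USER}:${USER}" "${WORKING_DIR}"
    chmod 755 "${WORKING_DIR}"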

Configuration

Make a copy of config_example.sh as config_$USER.sh (as in the Preparation steps, if not already done).

Go to https://console-openshift-console.apps.ci.l2s4.p1.openshiftapps.com/, click on your name in the top right, copy the login command, extract the token from the command and use it to set CI_TOKEN in config_$USER.sh.

Save the secret obtained from cloud.openshift.com to pull_secret.json.

There are variable defaults set in both the common.sh and the ocp_install_env.sh scripts, which may be important to override for your particular environment. You can set override values in your config_$USER.sh script.
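
For example, a minimal config_$USER.sh might contain the following (a sketch; the values are placeholders, and any variable from common.sh or ocp_install_env.sh can be overridden here):

# Token extracted from the console login command (see above)
export CI_TOKEN='<your token>'
# Working directory -- needs at least 80GB of free space
export WORKING_DIR=/opt/dev-scripts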

Configuration for 64-bit ARM

64-bit ARM systems are supported for the agent installer and BMIPI on 64-bit ARM hosts; emulating ARM on x86 (or vice versa) is not supported. You will also need to explicitly set OPENSHIFT_RELEASE_IMAGE to a 64-bit ARM payload from https://arm64.ocp.releases.ci.openshift.org/
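
For example, in your config_$USER.sh (the pull spec is a placeholder; copy the exact one from the release page above):

export OPENSHIFT_RELEASE_IMAGE=<pull spec of a 64-bit ARM release payload>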

BMIPI cannot iPXE over IPv6, and the upstream vbmc, sushy-tools and ironic containers are only built for 64-bit x86. You must build your own vbmc and sushy-tools images and then extract ironic from the release payload:

Build a 64-bit ARM version of vbmc and sushy-tools (must be executed on a 64-bit ARM system):

podman build -f https://raw.githubusercontent.com/metal3-io/ironic-image/main/resources/vbmc/Dockerfile . -t <yourbuild>/vbmc:arm

podman build -f https://raw.githubusercontent.com/metal3-io/ironic-image/main/resources/sushy-tools/Dockerfile . -t <yourbuild>/sushy-tools:arm

Override the sushy-tools and vbmc images in your config_$USER.sh

export SUSHY_TOOLS_IMAGE=<yourbuild>/sushy-tools:arm

export VBMC_IMAGE=<yourbuild>/vbmc:arm

Lastly, you'll need the ironic container from the OCP release payload:

oc adm release info quay.io/openshift-release-dev/ocp-release:4.13.0-ec.4-aarch64 --image-for ironic

quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cbddb6cdfa138bce5e8cf3ed59287024cf39ec0822b1b84dd125b96d8b871b20

Pull the Container

podman pull quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cbddb6cdfa138bce5e8cf3ed59287024cf39ec0822b1b84dd125b96d8b871b20 --authfile=pull_secret.json

Override the ironic image in your config_$USER.sh

export IRONIC_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cbddb6cdfa138bce5e8cf3ed59287024cf39ec0822b1b84dd125b96d8b871b20

Baremetal

For baremetal test setups where you don't require the VM fake-baremetal nodes, you may also set NODES_FILE to reference a manually created JSON file with the node details (see ironic_hosts.json.example; make sure the Ironic node names follow the openshift-master-* and openshift-worker-* format), and NODES_PLATFORM, which can be set to e.g. "baremetal" to disable the libvirt master/worker node setup. See common.sh for other variables that can be overridden.

Important values to consider for override in your config_$USER.sh script:

# Deploy only the masters and no workers
NUM_WORKERS=0
# Indicate that this is a baremetal deployment
NODES_PLATFORM="baremetal"
# Path to your ironic_hosts.json file per the above
NODES_FILE="/root/dev-scripts/ironic_hosts.json"
# Set to the interface used by the baremetal bridge
INT_IF="em2"
# Set to the interface used by the provisioning bridge on the bootstrap host
PRO_IF="em1"
# Set to the interface used as the provisioning interface on the cluster nodes
CLUSTER_PRO_IF="ens1"
# Don't allow the baremetal bridge to be managed by libvirt
MANAGE_BR_BRIDGE="n"
# Set your valid DNS domain
BASE_DOMAIN=your.valid.domain.com
# Set your valid DNS cluster name
# (will be used as ${CLUSTER_NAME}.${BASE_DOMAIN})
CLUSTER_NAME=clustername
# Set your valid DNS VIP, such as 1.1.1.1 for 'ns1.example.com'
DNS_VIP="1.1.1.1"
# Set your IP network stack family; determines the IP address family of the cluster
# (defaults to `v6`)
IP_STACK="v4"
# Set your default network type, `OpenShiftSDN` or `OVNKubernetes`. For IPv4 deployments, this
# defaults to `OpenShiftSDN`. IPv6 and DualStack only work with OVNKubernetes, hence this
# parameter defaults to `OVNKubernetes` for IPv6 and DualStack deployments (see IP_STACK).
NETWORK_TYPE="OpenShiftSDN"
# Boolean to use OVNKubernetes local gateway mode. defaults to `false` which is shared mode
OVN_LOCAL_GATEWAY_MODE=false
# Set to the subnet in use on the external (baremetal) network
EXTERNAL_SUBNET_V4="192.168.111.0/24"
# Provide additional master/worker ignition configuration, will be
# merged with the installer provided config, can be used to modify
# the default nic configuration etc
export IGNITION_EXTRA=extra.ign
# Folder where to copy extra manifests for the cluster deployment
export ASSETS_EXTRA_FOLDER=local_file_path

Installation

Consider using tmux, screen or nohup as the installation takes around 1 hour.
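
For example, to keep the run going after you disconnect and capture a log (a sketch using nohup; tmux or screen work just as well):

nohup make > make_$(date +%Y%m%d-%H%M%S).log 2>&1 &
tail -f make_*.log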

For a new setup, run:

make

The Makefile will run the scripts in this order:

  • ./01_install_requirements.sh

This installs any prerequisite packages, and also starts a local container registry to enable subsequent scripts to build/push images for testing. Any other dependencies for development/test are also installed here.

  • ./02_configure_host.sh

This does the necessary configuration on the host, e.g. networking/firewall, and also creates the libvirt resources necessary to deploy on VMs as emulated baremetal.

This should result in some (stopped) VMs on the local virthost and some additional bridges/networks for the baremetal and provisioning networks.

  • ./03_build_installer.sh

This will extract openshift-install from the OCP release payload.

  • ./04_setup_ironic.sh

This will set up the containers for the Ironic deployment services which run on the bootstrap VM and the deployed cluster. It starts a webserver which caches the necessary images, and starts the virtual BMC services used to control the VMs via IPMI or Redfish.

This script can also optionally build/push custom images for Ironic and other components; see the Testing custom container images section below.

  • ./06_create_cluster.sh

This will run openshift-install to generate ignition configs for the bootstrap node and the masters. The installer then launches both the bootstrap VM and master nodes using the Terraform providers for libvirt and Ironic. Once bootstrap is complete, the installer removes the bootstrap node and the cluster will be online.

You can view the IP for the bootstrap node by running sudo virsh net-dhcp-leases ostestbm. You can SSH to it using ssh core@IP.

Then you can interact with the k8s API on the bootstrap VM, e.g. sudo oc status --kubeconfig /etc/kubernetes/kubeconfig.

You can also see the status of the bootkube.sh script which is running via journalctl -b -f -u bootkube.service.

Interacting with the deployed cluster

Consider exporting KUBECONFIG (export KUBECONFIG=<path-to-config>) to avoid passing the --kubeconfig flag on each command.
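
For example, assuming you are in the dev-scripts directory and CLUSTER_NAME is set as in your config:

export KUBECONFIG=$(pwd)/ocp/${CLUSTER_NAME}/auth/kubeconfig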

When the master nodes are up and the cluster is active, you can interact with the API:

$ oc --kubeconfig ocp/${CLUSTER_NAME}/auth/kubeconfig get nodes
NAME       STATUS    ROLES     AGE       VERSION
master-0   Ready     master    20m       v1.12.4+50c2f2340a
master-1   Ready     master    20m       v1.12.4+50c2f2340a
master-2   Ready     master    20m       v1.12.4+50c2f2340a

GUI

Alternatively, it is possible to manage the cluster using the OpenShift Console web UI. The URL can be retrieved with:

oc get routes --all-namespaces | grep console

By default, the URL is https://console-openshift-console.apps.ostest.test.metalkube.org
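
The console route host can also be printed directly (assuming the default openshift-console namespace):

oc get route console -n openshift-console -o jsonpath='{.spec.host}'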

Accessing the web console of a virtualized cluster from a local web browser requires additional setup on the local machine and the virt host to enable forwarding to the cluster's VMs.

There are two ways to achieve this: sshuttle or xinetd.

sshuttle (works only with IPv4)

  1. On your local machine install sshuttle

  2. Add entry to /etc/hosts

    192.168.111.4 console-openshift-console.apps.ostest.test.metalkube.org console openshift-authentication-openshift-authentication.apps.ostest.test.metalkube.org api.ostest.test.metalkube.org prometheus-k8s-openshift-monitoring.apps.ostest.test.metalkube.org alertmanager-main-openshift-monitoring.apps.ostest.test.metalkube.org kubevirt-web-ui.apps.ostest.test.metalkube.org oauth-openshift.apps.ostest.test.metalkube.org grafana-openshift-monitoring.apps.ostest.test.metalkube.org
    
  3. Run sshuttle on the local machine

    sshuttle -r <user>@<virthost> 192.168.111.0/24
    

xinetd

This approach uses xinetd instead of iptables to allow IPv4 to IPv6 forwarding.

  1. Install xinetd

    sudo yum install xinetd -y
    
  2. Copy the example config file

    sudo cp dev-scripts/openshift_xinetd_example.conf /etc/xinetd.d/openshift
    
  3. Edit the config file

    • The values can be found at dev-scripts/ocp/ostest/.openshift_install_state.json:
      • IPv4_Host_IP: your host's IPv4 address
      • IPv6_API_Address:
        jq '.["*installconfig.InstallConfig"].config.platform.baremetal.apiVIP' dev-scripts/ocp/ostest/.openshift_install_state.json
        
      • IPv6_Ingress_Address:
        jq '.["*installconfig.InstallConfig"].config.platform.baremetal.ingressVIP' dev-scripts/ocp/ostest/.openshift_install_state.json
        
    sudo vim /etc/xinetd.d/openshift
    
  4. Restart xinetd

    sudo systemctl restart xinetd
    
  5. Populate your local machine's /etc/hosts

    • Replace <HOST_IP> with your host machine's address
    <HOST_IP> console-openshift-console.apps.ostest.test.metalkube.org openshift-authentication-openshift-authentication.apps.ostest.test.metalkube.org grafana-openshift-monitoring.apps.ostest.test.metalkube.org prometheus-k8s-openshift-monitoring.apps.ostest.test.metalkube.org api.ostest.test.metalkube.org oauth-openshift.apps.ostest.test.metalkube.org
    
  6. Ensure that ports 443 and 6443 on the host are open

    sudo firewall-cmd --zone=public --permanent --add-service=https
    sudo firewall-cmd --permanent --add-port=6443/tcp
    sudo firewall-cmd --reload
    

Finally, to access the web console, use the kubeadmin user and the password generated in the dev-scripts/ocp/${CLUSTER_NAME}/auth/kubeadmin-password file.

Hosting multiple dev-scripts on the same host

dev-scripts has some support for running multiple instances on the same host; when doing this, CLUSTER_NAME is used to namespace various resources on the virt host. This support is not actively tested and has a few limitations, but it aims to allow you to run two separate clusters on the same host.

To do this, the same user should be used to run dev-scripts for all environments, but with a different config file for each. In each config file, at least the following three environment variables should be defined and should differ from their defaults: CLUSTER_NAME, PROVISIONING_NETWORK and EXTERNAL_SUBNET_V4, e.g.

export CLUSTER_NAME=osopenshift
export PROVISIONING_NETWORK=172.33.0.0/24
export EXTERNAL_SUBNET_V4=192.168.222.0/24

Some resources are also shared on the virt host (e.g. some of the containers serving images, Redfish, etc.). To avoid multiple environments interfering with each other, you should not clean or deploy one environment while another is deploying.

Interacting with Ironic directly

The ./06_create_cluster.sh script generates a clouds.yaml file with connection settings for both instances of Ironic. The copy of Ironic that runs on the bootstrap node during installation can be accessed by using the cloud name metal3-bootstrap and the copy running inside the cluster once deployment is finished can be accessed by using the cloud name metal3.

Note that the clouds.yaml is generated on exit from ./06_create_cluster.sh (on success, and also on failure if possible), however it can be useful to generate the file during deployment, in which case generate_clouds_yaml.sh may be run manually.

The dev-scripts will install the baremetal command line tool on the provisioning host as part of setting up the cluster. The baremetal tool will look for clouds.yaml in the _clouds_yaml directory.

For manual debugging via the baremetal client connecting to the bootstrap VM, which is ephemeral and won't be available once the masters have been deployed:

export OS_CLOUD=metal3-bootstrap
baremetal node list
...

To access the Ironic instance running in the baremetal-operator pod:

export OS_CLOUD=metal3
baremetal node list
...

And to access the Ironic inspector instance running in the baremetal-operator pod:

export OS_CLOUD=metal3-inspector
baremetal introspection list
...

NOTE: If you use a provisioning network other than the default, you may need to modify the IP addresses used in clouds.yaml.

Cleanup

  • To clean up the ocp deployment run ./ocp_cleanup.sh

  • To clean up the dummy baremetal VMs and associated libvirt resources run ./host_cleanup.sh

e.g. to clean and re-install ocp run:

./ocp_cleanup.sh
rm -fr ocp
./06_create_cluster.sh

Or, you can run make clean which will run all of the cleanup steps.

Troubleshooting

If you're having trouble, try sudo systemctl restart libvirtd.

You can get a console on the VMs with:

sudo virsh console <domain_name>
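
To list the available domains first (the ostest_* name shown is typical of a default dev-scripts setup and may differ in your environment):

sudo virsh list --all
sudo virsh console ostest_master_0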

Determining your filesystem type

If you're not sure what filesystem you have, try df -T and the second column will include the type.

Determining if your filesystem supports d_type

If the above command returns ext4 or btrfs, d_type is supported by default. If not, at the command line, try:

xfs_info /mount-point

If you see ftype=1 then you have d_type support.

Modifying cpu/memory/disk resources

The default cpu/memory/disk resources when using virtual machines are provided by the vm_setup_vars.yml file, which sets some dev-scripts variables that override the defaults in metal3-dev-env.

The VM resources can be overridden by setting the following environment variables in config_$USER.sh:

# Change VM resources for masters
#export MASTER_MEMORY=16384
#export MASTER_DISK=20
#export MASTER_VCPU=8

# Change VM resources for workers
#export WORKER_MEMORY=8192
#export WORKER_DISK=20
#export WORKER_VCPU=4

Testing custom container images

dev-scripts uses an OpenShift release image that contains references to OpenShift containers. Any of these containers can be overridden by setting environment variables of the form <NAME>_LOCAL_IMAGE to build or use a copy of the container image locally. For example, to use a custom Ironic container image and to build a container image from a git repository for the machine-config-operator, you could set:

export IRONIC_LOCAL_IMAGE=quay.io/username/ironic
export MACHINE_CONFIG_OPERATOR_LOCAL_IMAGE=https://github.com/openshift/machine-config-operator

The value of <NAME> needs to match the image tag name (found in the OpenShift release image under /release-manifests/image-references), converted to uppercase and with "-" converted to "_".
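
For example, to list the image tags and their pull specs for the release you are using (assuming OPENSHIFT_RELEASE_IMAGE points at your release payload), so you can derive the <NAME> to override:

oc adm release info --pullspecs "${OPENSHIFT_RELEASE_IMAGE}"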

Testing a custom machine-api-operator image with this deployment

The script run-custom-mao.sh allows the machine-api-operator pod to be re-deployed with a custom image.

For example: ./run-custom-mao.sh <path in quay.io for the custom MAO image with tag> <repo name> <branch name>

The custom MAO image name is a mandatory parameter; the others are optional and have defaults.

Alternatively, all input parameters can be set via CUSTOM_MAO_IMAGE, REPO_NAME and MAO_BRANCH variables respectively, and run-custom-mao.sh can be run automatically if you set TEST_CUSTOM_MAO to true.
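
For example, in your config_$USER.sh (the image path, repo and branch are placeholders):

export TEST_CUSTOM_MAO=true
export CUSTOM_MAO_IMAGE=quay.io/<username>/machine-api-operator:<tag>
export REPO_NAME=<repo name>
export MAO_BRANCH=<branch name>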

Testing customizations to the deployed OS

It is possible to pass additional ignition configuration which will be merged with the installer generated files prior to deployment. This can be useful for debug/testing during development, and potentially also for configuration of networking or storage on baremetal nodes needed before cluster configuration starts (most machine configuration should use the machine-config-operator, but for changes required before that starts, ignition may be an option).

The following adds an additional file /etc/test as an example:

export IGNITION_EXTRA="ignition/file_example.ign"
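
For reference, such an ignition fragment might look roughly like the following (a sketch; my_extra.ign is a hypothetical file, the repo ships ignition/file_example.ign, and the Ignition spec version shown is an assumption that must match what your release expects):

# Write a small Ignition config that adds /etc/test with mode 0644 (420 decimal)
cat > my_extra.ign <<'EOF'
{
  "ignition": { "version": "3.2.0" },
  "storage": {
    "files": [
      { "path": "/etc/test", "mode": 420, "contents": { "source": "data:,hello" } }
    ]
  }
}
EOF
export IGNITION_EXTRA=my_extra.ign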

Testing with extra workers

It is possible to specify additional workers which are not used in the initial deployment and can later be used, e.g., to test scale-out. The default online status of the extra workers is true, but it can be changed to false using EXTRA_WORKERS_ONLINE_STATUS.

export NUM_EXTRA_WORKERS=2
export EXTRA_WORKERS_ONLINE_STATUS=false

After initial deployment, a file containing the BareMetalHost manifests can be applied:

oc apply -f ocp/ostest/extra_host_manifests.yaml

Once completed, it's possible to scale up the machineset to provision the extra workers. The following example shows how to add another worker to the current deployment:

$ oc get machineset -n openshift-machine-api
NAME              DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-worker-0   2         2         2       2           27h

$ oc scale machineset ostest-worker-0 --replicas=3 -n openshift-machine-api
machineset.machine.openshift.io/ostest-worker-0 scaled

Deploying dummy remote cluster nodes

It is possible to add remote site nodes along with their own L2 network. To do so, use the remote_nodes.sh script to create the definitions of VMs and their corresponding network. Additional configuration can be made by altering the environment variables within the script.

Create the remote cluster VMs and their network using the remote_nodes.sh script. The script accepts two arguments: an action (setup, the default, or cleanup) and an optional namespace. If omitted, the namespace defaults to openshift-machine-api.

./remote_nodes.sh [action] [namespace]

In the setup action, the script creates an openshift namespace and applies the manifests generated by the playbook. It also enables watchAllNamespaces in the provisioning CR and enables traffic between the external network and the newly created network.

The cleanup action tears down the libvirt VM definitions and the namespace specific network. It also tears down the openshift namespace and cleans up the generated manifests.

./remote_nodes.sh setup mynamespace
./remote_nodes.sh cleanup mynamespace

How do I...

Use a custom installer?

Check out your custom installer at ~/go/src/github.com/openshift/installer and set export KNI_INSTALL_FROM_GIT=true, then run make to start the deployment. Note: if you don't have the golang compiler installed, the deployment will time out.
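
For example (a sketch, assuming the default GOPATH of ~/go and that the golang toolchain is installed):

mkdir -p ~/go/src/github.com/openshift
git clone https://github.com/openshift/installer ~/go/src/github.com/openshift/installer
# check out your branch or add your fork as a remote, then:
echo 'export KNI_INSTALL_FROM_GIT=true' >> config_$USER.sh
make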

Deploy nodes with extra disks?

Open vm_setup_vars.yml and modify the flavor you want to have extra disks: set extradisks: true and adjust extradisks_size if necessary.

dev-scripts's People

Contributors

achuzhoy, andfasano, ardaguclu, bcrochet, bfournie, celebdor, cybertron, danielerez, dantrainor, danwinship, derekhiggins, dhellmann, dtantsur, elfosardo, honza, jparrill, karmab, markmc, mcornea, mkowalski, osherdp, pawanpinjarkar, russellb, rwsu, sabinaaledort, sadasu, stbenjam, vrutkovs, yboaron, zaneb


dev-scripts's Issues

Cleanup script "Make clean" is not removing /opt/dev-scripts directory

[#Minor] The make clean script is not removing the /opt/dev-scripts directory.

Though this issue is a minor one:

When I tried to re-run the make script as user "user2", my installation failed because user2 does not have permission to write files inside the /opt/dev-scripts/ironic folder, as this folder is owned by the previous user "bkr".

"make" failed with error message:

Writing manifest to image destination
Storing signatures
63c4f7bb2f7917d7397796f2e12dd36a8e6920a8c66716307f36afd55debb6ae
+ pushd /opt/dev-scripts/ironic/html/images
/opt/dev-scripts/ironic/html/images ~/dev-scripts
+ '[' '!' -e rhcos-410.8.20190410.0-compressed.qcow2 ']'
+ '[' '!' -e rhcos-410.8.20190410.0-compressed.qcow2.md5sum -o rhcos-410.8.20190410.0-compressed.qcow2 -nt rhcos-410.8.20190410.0-compressed.qcow2.md5sum ']'
+ ln -sf rhcos-410.8.20190410.0-compressed.qcow2 rhcos-ootpa-latest.qcow2
ln: cannot remove ‘rhcos-ootpa-latest.qcow2’: Permission denied
make: *** [ironic] Error 1

The contents of this directory are:

ls -l /opt/dev-scripts/
total 8
drwxrwxr-x. 4 bkr  bkr    34 Apr  2 14:57 ironic
-rw-r--r--. 1 root root 3865 Apr 11 19:20 ironic_nodes.json
drwxr-xr-x. 2 root root  210 Apr 12 14:02 pool
-rw-r--r--. 1 root root  222 Apr  2 14:50 volume_pool.xml 

Attaching the host_cleanup logs.
host_cleanup-2019-04-12-142513.log

Error while fetching the latest RHCOS images (malformed image URL with extra escape character)

While executing dev-scripts, I observed an error while fetching the latest RHCOS images:

level=debug msg=Fetching RHCOS metadata from "https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/400.7.20190306.0/meta.json\""
level=fatal msg="failed to fetch Bootstrap Ignition Config: failed to fetch dependency of "Bootstrap Ignition Config": failed to fetch dependency of "Master Machines": failed to generate asset "Image": failed to fetch RHCOS metadata: incorrect HTTP response (503 Service Unavailable)"
make: *** [ocp_run] Error 1

Looks like it's due to an extra escape character in the URL:
"https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/400.7.20190306.0/meta.json\""

"Make clean" script failed task "TASK [teardown/user : Remove virt_power_key from remote authorized_keys]"

While executing 'make clean', one of the tasks is failing:

TASK [teardown/user : Remove virt_power_key from remote authorized_keys]
fatal: [localhost]: FAILED! => {
"msg": "could not locate file in lookup: /home/bkr/.quickstart/id_rsa_virt_power.pub"
}

I have also observed that 'make clean' does not delete the baremetal and provisioning bridges, due to which, when I execute 'make' again, it fails with:
"msg": "error creating bridge interface baremetal: File exists"
[bkr@vm220 dev-scripts]$ brctl show
bridge name bridge id STP enabled interfaces
baremetal 8000.000000000000 no
provisioning 8000.000000000000 no
virbr0 8000.5254002e4699 yes virbr0-nic

I need to manually remove the bridges to make the run successful:

[bkr@vm220 dev-scripts]$ sudo ip link delete baremetal type bridge
[bkr@vm220 dev-scripts]$ sudo ip link delete provisioning type bridge

240 pods after installation in +-6 different states, 46GB host swapping

After an installation that passed, there are too many pods in too many different states (I particularly dislike "Terminating" and "OOMKilled" :)).

$ oc --config ocp/auth/kubeconfig get pods --all-namespaces
NAMESPACE                                               NAME                                                              READY   STATUS        RESTARTS   AGE
kube-system                                             etcd-member-master-0                                              1/1     Running       0          17h
kube-system                                             etcd-member-master-1                                              1/1     Running       0          17h
kube-system                                             etcd-member-master-2                                              1/1     Running       0          17h
openshift-apiserver-operator                            openshift-apiserver-operator-7b8c99bb8b-7dnlk                     1/1     Running       1          17h
openshift-apiserver                                     apiserver-jmjrs                                                   1/1     Running       0          17h
openshift-apiserver                                     apiserver-mzkx6                                                   1/1     Running       0          17h
openshift-apiserver                                     apiserver-p2c7x                                                   1/1     Running       0          17h
openshift-authentication-operator                       openshift-authentication-operator-bb8775754-9s258                 1/1     Running       0          17h
openshift-authentication                                openshift-authentication-677f9f678-2djk7                          1/1     Running       0          17h
openshift-authentication                                openshift-authentication-677f9f678-b6p4p                          1/1     Running       0          17h
openshift-cloud-credential-operator                     cloud-credential-operator-5cf49888b5-ccktg                        1/1     Running       0          17h
openshift-cluster-machine-approver                      machine-approver-6cf997dbcc-g25cx                                 1/1     Running       0          17h
openshift-cluster-node-tuning-operator                  cluster-node-tuning-operator-cb6dd6bcb-2c72p                      1/1     Running       0          17h
openshift-cluster-node-tuning-operator                  tuned-6djpt                                                       1/1     Running       0          17h
openshift-cluster-node-tuning-operator                  tuned-dx6sl                                                       1/1     Running       0          17h
openshift-cluster-node-tuning-operator                  tuned-zwkng                                                       1/1     Running       0          17h
openshift-cluster-samples-operator                      cluster-samples-operator-749c4b7dc7-zc5hz                         1/1     Running       0          17h
openshift-cluster-storage-operator                      cluster-storage-operator-7d7fcb7b56-xzhfm                         1/1     Running       0          17h
openshift-cluster-version                               cluster-version-operator-56c74d99b9-4qxhf                         1/1     Running       0          17h
openshift-console-operator                              console-operator-589ddb9775-tll9h                                 1/1     Running       0          17h
openshift-console                                       console-595b47967-nqqkz                                           1/1     Terminating   1          17h
openshift-console                                       console-6d8db4c7df-b5gwh                                          1/1     Running       1          17h
openshift-console                                       console-6d8db4c7df-dqnn7                                          0/1     Terminating   0          17h
openshift-console                                       console-6d8db4c7df-vlkrd                                          1/1     Running       0          17h
openshift-console                                       downloads-7748c8d856-2bhqw                                        1/1     Running       0          17h
openshift-console                                       downloads-7748c8d856-wd56w                                        1/1     Running       0          17h
openshift-controller-manager-operator                   openshift-controller-manager-operator-5f78855946-xxj86            1/1     Running       1          17h
openshift-controller-manager                            controller-manager-fj5hk                                          1/1     Running       0          17h
openshift-controller-manager                            controller-manager-m5vhr                                          1/1     Running       0          17h
openshift-controller-manager                            controller-manager-tpdkm                                          1/1     Running       0          17h
openshift-dns-operator                                  dns-operator-6f9d679b9c-fqj8s                                     1/1     Running       0          17h
openshift-dns                                           dns-default-9t87q                                                 2/2     Running       0          17h
openshift-dns                                           dns-default-d942l                                                 2/2     Running       0          17h
openshift-dns                                           dns-default-pb227                                                 2/2     Running       0          17h
openshift-image-registry                                cluster-image-registry-operator-86885f6c8d-4csbf                  1/1     Running       0          17h
openshift-image-registry                                cluster-image-registry-operator-86885f6c8d-vvpzn                  1/1     Terminating   0          17h
openshift-ingress-operator                              ingress-operator-7f8dcf7bb9-hvk9r                                 1/1     Running       0          17h
openshift-ingress                                       router-default-55f4fcfd66-774t6                                   1/1     Running       0          17h
openshift-ingress                                       router-default-55f4fcfd66-k4gz2                                   0/1     Pending       0          17h
openshift-ingress                                       router-default-55f4fcfd66-phlq7                                   1/1     Terminating   0          17h
openshift-ingress                                       router-default-55f4fcfd66-z555w                                   1/1     Running       0          17h
openshift-kube-apiserver-operator                       kube-apiserver-operator-6976f454fb-r5tkj                          1/1     Running       1          17h
openshift-kube-apiserver                                installer-1-master-0                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-1-master-1                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-2-master-0                                              0/1     OOMKilled     0          17h
openshift-kube-apiserver                                installer-2-master-1                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-2-master-2                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-3-master-0                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-3-master-1                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-4-master-0                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-4-master-1                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-4-master-2                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-5-master-0                                              0/1     OOMKilled     0          17h
openshift-kube-apiserver                                installer-6-master-0                                              0/1     Completed     0          17h
openshift-kube-apiserver                                installer-6-master-1                                              1/1     Running       0          17h
openshift-kube-apiserver                                kube-apiserver-master-0                                           2/2     Running       0          17h
openshift-kube-apiserver                                kube-apiserver-master-1                                           0/2     Init:0/2      0          17h
openshift-kube-apiserver                                kube-apiserver-master-2                                           2/2     Running       0          17h
openshift-kube-apiserver                                revision-pruner-1-master-0                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-1-master-1                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-2-master-0                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-2-master-1                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-2-master-2                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-3-master-0                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-3-master-1                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-4-master-0                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-4-master-1                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-4-master-2                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-5-master-0                                        0/1     Completed     0          17h
openshift-kube-apiserver                                revision-pruner-6-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager-operator              kube-controller-manager-operator-7566b748b8-7wk4l                 1/1     Running       1          17h
openshift-kube-controller-manager                       installer-1-master-0                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-2-master-0                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-3-master-0                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-4-master-0                                              0/1     OOMKilled     0          17h
openshift-kube-controller-manager                       installer-5-master-0                                              0/1     OOMKilled     0          17h
openshift-kube-controller-manager                       installer-5-master-1                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-5-master-2                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-6-master-0                                              0/1     OOMKilled     0          17h
openshift-kube-controller-manager                       installer-6-master-1                                              0/1     Completed     0          17h
openshift-kube-controller-manager                       installer-6-master-2                                              0/1     OOMKilled     0          17h
openshift-kube-controller-manager                       kube-controller-manager-master-0                                  1/1     Running       2          17h
openshift-kube-controller-manager                       kube-controller-manager-master-1                                  1/1     Running       3          17h
openshift-kube-controller-manager                       kube-controller-manager-master-2                                  1/1     Running       0          17h
openshift-kube-controller-manager                       revision-pruner-1-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-2-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-3-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-4-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-5-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-5-master-1                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-5-master-2                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-6-master-0                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-6-master-1                                        0/1     Completed     0          17h
openshift-kube-controller-manager                       revision-pruner-6-master-2                                        0/1     Completed     0          17h
openshift-kube-scheduler-operator                       openshift-kube-scheduler-operator-cd7fd87ff-7jhlk                 1/1     Running       1          17h
openshift-kube-scheduler                                installer-1-master-0                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-1-master-1                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-1-master-2                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-2-master-0                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-2-master-1                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-3-master-0                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-3-master-1                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-3-master-2                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-4-master-0                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-4-master-1                                              0/1     Completed     0          17h
openshift-kube-scheduler                                installer-4-master-2                                              0/1     Completed     0          17h
openshift-kube-scheduler                                openshift-kube-scheduler-master-0                                 1/1     Running       0          17h
openshift-kube-scheduler                                openshift-kube-scheduler-master-1                                 1/1     Running       3          17h
openshift-kube-scheduler                                openshift-kube-scheduler-master-2                                 1/1     Running       3          17h
openshift-kube-scheduler                                revision-pruner-1-master-0                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-1-master-1                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-1-master-2                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-2-master-0                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-3-master-0                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-3-master-1                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-3-master-2                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-4-master-0                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-4-master-1                                        0/1     Completed     0          17h
openshift-kube-scheduler                                revision-pruner-4-master-2                                        0/1     Completed     0          17h
openshift-machine-api                                   cluster-autoscaler-operator-6dd695cc7d-v2rx6                      1/1     Running       1          17h
openshift-machine-api                                   clusterapi-manager-controllers-5cf47db544-vg7dr                   4/4     Running       0          17h
openshift-machine-api                                   machine-api-operator-bd5f59899-z46pj                              1/1     Running       1          17h
openshift-machine-api                                   metalkube-baremetal-operator-8f9c66c86-mpq5t                      0/1     Terminating   0          17h
openshift-machine-api                                   metalkube-baremetal-operator-8f9c66c86-scgzw                      1/1     Running       0          17h
openshift-machine-config-operator                       machine-config-operator-767fcfcf74-5w5mv                          1/1     Running       1          17h
openshift-marketplace                                   certified-operators-5997774686-hkg4h                              1/1     Running       0          17h
openshift-marketplace                                   community-operators-7fbb7c4588-bbklz                              1/1     Running       0          17h
openshift-marketplace                                   marketplace-operator-78d556c764-l2zgv                             1/1     Running       0          17h
openshift-marketplace                                   redhat-operators-6b4f995b78-5csgz                                 1/1     Running       0          17h
openshift-monitoring                                    alertmanager-main-0                                               0/3     Pending       0          17h
openshift-monitoring                                    alertmanager-main-1                                               3/3     Running       0          17h
openshift-monitoring                                    cluster-monitoring-operator-5c68c9d967-th5qk                      1/1     Running       0          17h
openshift-monitoring                                    grafana-74876d8b8d-4qwqf                                          0/2     Preempting    0          17h
openshift-monitoring                                    grafana-74876d8b8d-6h8zm                                          2/2     Running       0          17h
openshift-monitoring                                    kube-state-metrics-56d947b89d-dfk66                               3/3     Terminating   0          17h
openshift-monitoring                                    kube-state-metrics-56d947b89d-l6jlz                               3/3     Running       0          17h
openshift-monitoring                                    node-exporter-4jjkc                                               2/2     Running       0          17h
openshift-monitoring                                    node-exporter-jsrzm                                               2/2     Running       0          17h
openshift-monitoring                                    node-exporter-kv4bb                                               2/2     Running       0          17h
openshift-monitoring                                    prometheus-adapter-c76d6596f-82vjk                                1/1     Running       0          17h
openshift-monitoring                                    prometheus-adapter-c76d6596f-lcnfv                                1/1     Running       0          17h
openshift-monitoring                                    prometheus-adapter-c76d6596f-q5kwq                                1/1     Terminating   0          17h
openshift-monitoring                                    prometheus-k8s-0                                                  0/6     Pending       0          17h
openshift-monitoring                                    prometheus-k8s-1                                                  6/6     Terminating   1          17h
openshift-monitoring                                    prometheus-operator-6ff74c9976-rmqvq                              1/1     Running       0          17h
openshift-monitoring                                    prometheus-operator-6ff74c9976-x52g6                              1/1     Terminating   1          17h
openshift-monitoring                                    telemeter-client-6579d7cf8-q7mkk                                  3/3     Running       0          17h
openshift-multus                                        multus-8j6kb                                                      1/1     Running       0          17h
openshift-multus                                        multus-98kcc                                                      1/1     Running       0          17h
openshift-multus                                        multus-nzm94                                                      1/1     Running       0          17h
openshift-network-operator                              network-operator-56b8ccdcbb-xd9q5                                 1/1     Running       0          17h
openshift-operator-lifecycle-manager                    catalog-operator-6865f8bd88-hwqth                                 1/1     Running       0          17h
openshift-operator-lifecycle-manager                    olm-operator-fccbd8798-9x77k                                      1/1     Running       0          17h
openshift-operator-lifecycle-manager                    olm-operators-zhkk2                                               1/1     Running       0          17h
openshift-operator-lifecycle-manager                    packageserver-544b89c886-8ldtr                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-54569d7c8f-8j9sv                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-5486cc45fd-nqwvt                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-556df8d845-gcn7w                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-55b74df5bb-m2cqx                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-5647ff4c9b-z78cm                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-5668b7bd5-ssvrf                                     0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-5687977596-db522                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-56ccf9988b-f5rzq                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-576dddbcf6-f4fk8                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-58d68cfc8b-lpz5m                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-58f796cb5-zcfk8                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-596f49576c-zj282                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-599489cfbf-5tng4                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-5bc68c459b-mlpsg                                    1/1     Running       0          13h
openshift-operator-lifecycle-manager                    packageserver-5bc68c459b-mqxvq                                    1/1     Running       0          13h
openshift-operator-lifecycle-manager                    packageserver-5c548f4fb5-h6tpf                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-5c6c4f598b-xh7hv                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-5cc868cff6-kz2gb                                    0/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-5cdfd89cf9-r7bqn                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-5df46449fb-58pdm                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-5dfcd9fb56-fgmlc                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-6484b96c7-lxbnq                                     1/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-64cbcff7cb-chrv9                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-64ff8c5b89-vnpj4                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-658d685bdf-t4j7v                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-65fb76c755-qgh47                                    0/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-66845969c7-qg7ks                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-669b56f59-xz8xb                                     0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-6798cc6565-49dlp                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-68475586b6-tx6wm                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-687768b948-q9l9t                                    0/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-68bdb47898-lh6bm                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-68cf9fd6bf-dlbn9                                    0/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-69488c7665-zbvnt                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-6988754f9c-t7rp2                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-699795f5c6-22lxq                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-6b59bdb4d7-2mgjc                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-6d6548469d-k7jth                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-6d8446c695-z5n4h                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-6d89c547c9-m8bk8                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-6f769fdc56-nqc8g                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-74cd8b5574-cg8gp                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-76648b7798-tpfsn                                    0/1     Terminating   0          17h
openshift-operator-lifecycle-manager                    packageserver-76d4b8dbc4-frmd5                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-77678bbdc-s6pd8                                     0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7869f5c94f-vj6xn                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-78d7685975-jwg4k                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-79d9f749bd-28h7x                                    0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-79ff46ccc5-kwpwg                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-7b5bbc9db6-xhpdb                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7bbf6b9f9c-klrst                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-7bdf97c5d5-nzbfz                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7c58db8f46-kh9db                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7c64c778c7-gzlsd                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7cf476b5b4-xpnm5                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-7dcc548b54-vvt6j                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-845c6bbf44-ppzmv                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-84b6f97575-s962t                                    0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-84c9cbb78c-bjskn                                    0/1     Terminating   0          13h
openshift-operator-lifecycle-manager                    packageserver-85985d844c-prpfw                                    0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-867448d9-f8cfq                                      0/1     Terminating   0          16h
openshift-operator-lifecycle-manager                    packageserver-8b74bd6d8-f8vk9                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-b48c575c8-257jl                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-b89585667-srqhn                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-c4dd86cc9-tppd6                                     0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-c78858db-blkc8                                      0/1     Terminating   0          14h
openshift-operator-lifecycle-manager                    packageserver-c9f694678-fn8cd                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-cc84784b6-nqnwr                                     0/1     Terminating   0          15h
openshift-operator-lifecycle-manager                    packageserver-cf694654d-c724r                                     0/1     Terminating   0          17h
openshift-sdn                                           ovs-7p2qf                                                         1/1     Running       0          17h
openshift-sdn                                           ovs-k8qbx                                                         1/1     Running       0          17h
openshift-sdn                                           ovs-z7wwv                                                         1/1     Running       0          17h
openshift-sdn                                           sdn-7xlxr                                                         1/1     Running       0          17h
openshift-sdn                                           sdn-controller-jfmj5                                              1/1     Running       0          17h
openshift-sdn                                           sdn-controller-nfmkf                                              1/1     Running       0          17h
openshift-sdn                                           sdn-controller-sl2g4                                              1/1     Running       0          17h
openshift-sdn                                           sdn-sjskt                                                         1/1     Running       2          17h
openshift-sdn                                           sdn-t8bc8                                                         1/1     Running       1          17h
openshift-service-ca-operator                           openshift-service-ca-operator-7957dd76c9-p7425                    1/1     Running       0          17h
openshift-service-ca                                    apiservice-cabundle-injector-6589bb696b-phqmw                     1/1     Running       0          17h
openshift-service-ca                                    configmap-cabundle-injector-787f7f684b-z78kn                      1/1     Running       0          17h
openshift-service-ca                                    service-serving-cert-signer-58f9487f4f-5stwv                      1/1     Running       0          17h
openshift-service-catalog-apiserver-operator            openshift-service-catalog-apiserver-operator-84d5b596c7-wghlc     1/1     Running       1          17h
openshift-service-catalog-controller-manager-operator   openshift-service-catalog-controller-manager-operator-77cf59cj8   1/1     Running       1          17h

default values for cpus and memory

Currently, nodes are deployed with 2 vCPUs and 6 GB of RAM, which causes pods to crash-loop or fail to launch after initial bootstrapping.
We should increase those values to perhaps 8 vCPUs and at least 8 GB of RAM.
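
If the VMs are already defined, one stop-gap (a sketch only; the proper fix is to change the defaults in dev-scripts itself) is to grow an existing master with virsh. The domain name and sizes below are illustrative; check `virsh list --all` for the real names:

# bump memory and vCPUs on an already-defined libvirt master (domain name is illustrative)
sudo virsh shutdown ostest_master_0
sudo virsh setmaxmem ostest_master_0 8G --config
sudo virsh setmem ostest_master_0 8G --config
sudo virsh setvcpus ostest_master_0 8 --maximum --config
sudo virsh setvcpus ostest_master_0 8 --config
sudo virsh start ostest_master_0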

ironic-python-agent.tar: tar: This does not look like a tar archive

make fails with following error for me:

$ make
[...]
Logging to ./logs/04_setup_ironic-2019-03-24-174739.log
+++ exec
++++ tee ./logs/04_setup_ironic-2019-03-24-174739.log
++ '[' '!' -f redhat-coreos-maipo-47.284-qemu.qcow2 ']'
++ mkdir -p /opt/dev-scripts/ironic/html/images
++ '[' -d images ']'
++ pushd /opt/dev-scripts/ironic/html/images
/opt/dev-scripts/ironic/html/images ~/dev-scripts
++ '[' '!' -f redhat-coreos-maipo-47.284-openstack.qcow2 ']'
++ curl --insecure --compressed -L -o redhat-coreos-maipo-47.284-openstack.qcow2 https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo//47.284/redhat-coreos-maipo-47.284-openstack.qcow2.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   161  100   161    0     0    325      0 --:--:-- --:--:-- --:--:--   326
100  662M  100  662M    0     0  4257k      0  0:02:39  0:02:39 --:--:-- 8318k
++ '[' '!' -f ironic-python-agent.initramfs ']'
++ tar -xf -
++ curl --insecure --compressed -L https://images.rdoproject.org/master/rdo_trunk/54c5a6de8ce5b9cfae83632a7d81000721d56071_786d88d2/ironic-python-agent.tar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   288  100   288    0     0    539      0 --:--:-- --:--:-- --:--:--   539
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
make: *** [ironic] Error 2
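
The 288-byte download strongly suggests curl fetched an error page or redirect stub rather than a tarball. A quick diagnostic (using the same URL from the trace above; this only confirms what was downloaded, it is not the fix):

curl -sIL https://images.rdoproject.org/master/rdo_trunk/54c5a6de8ce5b9cfae83632a7d81000721d56071_786d88d2/ironic-python-agent.tar
curl -sL -o /tmp/ironic-python-agent.tar https://images.rdoproject.org/master/rdo_trunk/54c5a6de8ce5b9cfae83632a7d81000721d56071_786d88d2/ironic-python-agent.tar
file /tmp/ironic-python-agent.tar    # a valid bundle should be reported as a tar archive
head -c 300 /tmp/ironic-python-agent.tar    # often reveals an HTML error body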

02_configure_host.sh fails due to leftover image pool

After running make clean I ran through the scripts again, and ran into:

+ sudo usermod -a -G libvirt test
+ virsh pool-uuid default
+ virsh pool-define /dev/stdin
error: Failed to define pool from /dev/stdin
error: operation failed: Storage source conflict with pool: 'images'

To get around this, I had to destroy and undefine the leftover images pool.
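
For reference, that workaround boils down to (pool name taken from the error above), then rerunning 02_configure_host.sh:

sudo virsh pool-destroy images
sudo virsh pool-undefine images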

haproxy.sh: unable to start container "api-haproxy"

The api-haproxy.service unit enters an infinite restart loop:

[root@master-2 ~]# systemctl status api-haproxy.service
● api-haproxy.service - Load Balance OpenShift API using HAProxy
   Loaded: loaded (/etc/systemd/system/api-haproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2019-04-10 16:11:02 UTC; 1ms ago
 Main PID: 19361 (bash)
    Tasks: 0 (limit: 26213)
   Memory: 0B
      CPU: 189us
   CGroup: /system.slice/api-haproxy.service
           └─19361 bash /usr/local/bin/haproxy.sh

Apr 10 16:11:02 master-2 systemd[1]: Started Load Balance OpenShift API using HAProxy.
[root@master-2 ~]# systemctl status api-haproxy.service
● api-haproxy.service - Load Balance OpenShift API using HAProxy
   Loaded: loaded (/etc/systemd/system/api-haproxy.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Wed 2019-04-10 16:11:03 UTC; 427ms ago
  Process: 19361 ExecStart=/usr/local/bin/haproxy.sh (code=exited, status=125)
 Main PID: 19361 (code=exited, status=125)
      CPU: 624ms

Apr 10 16:11:03 master-2 systemd[1]: api-haproxy.service: Consumed 624ms CPU time

From journal:

Apr 10 16:11:02 master-2 sudo[19412]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/mkdir --parents /etc/haproxy
Apr 10 16:11:02 master-2 sudo[19412]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 10 16:11:02 master-2 sudo[19412]: pam_unix(sudo:session): session closed for user root
Apr 10 16:11:02 master-2 sudo[19427]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/tee /etc/haproxy/haproxy.cfg
Apr 10 16:11:02 master-2 sudo[19427]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 10 16:11:02 master-2 haproxy.sh[19361]: defaults
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     mode                    tcp
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     log                     global
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     option                  dontlognull
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     retries                 3
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout http-request    10s
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout queue           1m
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout connect 10s
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout client 86400s
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout server 86400s
Apr 10 16:11:02 master-2 haproxy.sh[19361]:     timeout tunnel 86400s
Apr 10 16:11:02 master-2 haproxy.sh[19361]: frontend  main
Apr 10 16:11:02 master-2 haproxy.sh[19361]:   bind :7443
Apr 10 16:11:02 master-2 haproxy.sh[19361]:   default_backend masters
Apr 10 16:11:02 master-2 haproxy.sh[19361]: backend masters
Apr 10 16:11:02 master-2 haproxy.sh[19361]:    option httpchk GET /healthz HTTP/1.0
Apr 10 16:11:02 master-2 haproxy.sh[19361]:    option log-health-checks
Apr 10 16:11:02 master-2 haproxy.sh[19361]:    balance     roundrobin
Apr 10 16:11:02 master-2 sudo[19427]: pam_unix(sudo:session): session closed for user root
Apr 10 16:11:03 master-2 sudo[19511]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/podman ps -a --format {{.Names}}
Apr 10 16:11:03 master-2 sudo[19511]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 10 16:11:03 master-2 sudo[19511]: pam_unix(sudo:session): session closed for user root
Apr 10 16:11:03 master-2 sudo[19557]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/podman start api-haproxy
Apr 10 16:11:03 master-2 sudo[19557]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 10 16:11:03 master-2 haproxy.sh[19361]: unable to start container "api-haproxy": unable to determine if "/var/lib/containers/storage/overlay-containers/3da81e2bb93889c088b4a85a9e862693d059077d8e992089ddf7393f3c49df42/userdata/shm" is mounted: failed to canonicalise path for /var/lib/containers/storage/overlay-containers/3da81e2bb93889c088b4a85a9e862693d059077d8e992089ddf7393f3c49df42/userdata/shm: lstat /var/lib/containers/storage/overlay-containers/3da81e2bb93889c088b4a85a9e862693d059077d8e992089ddf7393f3c49df42: no such file or directory
Apr 10 16:11:03 master-2 sudo[19557]: pam_unix(sudo:session): session closed for user root
Apr 10 16:11:03 master-2 systemd[1]: api-haproxy.service: Main process exited, code=exited, status=125/n/a
Apr 10 16:11:03 master-2 systemd[1]: api-haproxy.service: Failed with result 'exit-code'.
Apr 10 16:11:03 master-2 systemd[1]: api-haproxy.service: Consumed 624ms CPU time
Apr 10 16:11:08 master-2 systemd[1]: api-haproxy.service: Service RestartSec=5s expired, scheduling restart.
Apr 10 16:11:08 master-2 systemd[1]: api-haproxy.service: Scheduled restart job, restart counter is at 575.
Apr 10 16:11:08 master-2 systemd[1]: Stopped Load Balance OpenShift API using HAProxy.
Apr 10 16:11:08 master-2 systemd[1]: api-haproxy.service: Consumed 624ms CPU time
Apr 10 16:11:08 master-2 systemd[1]: Started Load Balance OpenShift API using HAProxy.
Apr 10 16:11:09 master-2 sudo[19792]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/mkdir --parents /etc/haproxy
Apr 10 16:11:09 master-2 sudo[19792]: pam_unix(sudo:session): session opened for user root by (uid=0)
Apr 10 16:11:09 master-2 sudo[19792]: pam_unix(sudo:session): session closed for user root
Apr 10 16:11:09 master-2 sudo[19807]: root : TTY=unknown ; PWD=/etc/haproxy ; USER=root ; COMMAND=/bin/tee /etc/haproxy/haproxy.cfg
Apr 10 16:11:09 master-2 sudo[19807]: pam_unix(sudo:session): session opened for user root by (uid=0)
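
One way to clear the stale container state (a workaround sketch, not a root-cause fix; the container name comes from the log above) is to remove the broken container record so haproxy.sh recreates it on the next start:

sudo podman rm -f api-haproxy
sudo systemctl restart api-haproxy.service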

baremetal-operator deployed by dev-scripts can't reach Ironic

When we deploy the baremetal-operator using dev-scripts, it needs to be configured to talk to Ironic on the provisioning host until we have the baremetal-operator updated to run Ironic itself inside the cluster.

Right now it assumes it can reach Ironic on localhost, which fails when running the BMO as a pod.

{"level":"error","ts":1554831354.3558333,"logger":"kubebuilder.controller","msg":"Reconciler error","controller":"metalkube-baremetalhost-controller","request":"openshift-machine-api/openshift-worker-0","error":"phase validate access failed: failed to validate BMC access: failed to find existing host: failed to find node by name openshift-worker-0: Get http://localhost:6385/v1/nodes/openshift-worker-0: dial tcp [::1]:6385: connect: connection refused","errorVerbose":"Get http://localhost:6385/v1/nodes/openshift-worker-0: dial tcp [::1]:6385: connect: connection refused\nfailed to find node by name openshift-worker-0\ngithub.com/metalkube/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).findExistingHost\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/provisioner/ironic/ironic.go:145\ngithub.com/metalkube/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).ValidateManagementAccess\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/provisioner/ironic/ironic.go:161\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).phaseValidateAccess\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:403\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).(github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.phaseValidateAccess)-fm\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:268\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:283\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361\nfailed to find existing 
host\ngithub.com/metalkube/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).ValidateManagementAccess\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/provisioner/ironic/ironic.go:163\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).phaseValidateAccess\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:403\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).(github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.phaseValidateAccess)-fm\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:268\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:283\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361\nfailed to validate BMC 
access\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).phaseValidateAccess\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:405\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).(github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.phaseValidateAccess)-fm\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:268\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:283\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361\nphase validate access 
failed\ngithub.com/metalkube/baremetal-operator/pkg/controller/baremetalhost.(*ReconcileBareMetalHost).Reconcile\n\t/go/src/github.com/metalkube/baremetal-operator/pkg/controller/baremetalhost/baremetalhost_controller.go:285\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:213\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:2361","stacktrace":"github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:215\ngithub.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:158\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134\ngithub.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/metalkube/baremetal-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
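
As a stop-gap, the pod could be pointed at the Ironic API on the provisioning host. This is only a sketch: the namespace, deployment name, and environment variable are placeholders, assuming the operator reads its Ironic URL from an env var rather than always defaulting to localhost:

# placeholders: <bmo-namespace>, <bmo-deployment>, <provisioning-host-ip>
oc -n <bmo-namespace> set env deployment/<bmo-deployment> IRONIC_ENDPOINT=http://<provisioning-host-ip>:6385/v1/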

`/opt/dev-scripts` not created automatically

On a clean CentOS 7.6 system, following the README.md instructions, make fails because of a missing directory:

[tester@... dev-scripts]$ make
./01_install_requirements.sh
+ source common.sh
++++ dirname common.sh
+++ cd .
+++ pwd
++ SCRIPTDIR=/home/tester/dev-scripts
+++ whoami
++ USER=tester
++ '[' -z '' ']'
++ '[' -f /home/tester/dev-scripts/config_tester.sh ']'
++ echo 'Using CONFIG /home/tester/dev-scripts/config_tester.sh'
Using CONFIG /home/tester/dev-scripts/config_tester.sh
++ CONFIG=/home/tester/dev-scripts/config_tester.sh
++ source /home/tester/dev-scripts/config_tester.sh
+++ set +x
++ ADDN_DNS=
++ EXT_IF=
++ PRO_IF=
++ INT_IF=
++ ROOT_DISK=/dev/vda
++ FILESYSTEM=/
++ WORKING_DIR=/opt/dev-scripts
++ NODES_FILE=/opt/dev-scripts/ironic_nodes.json
++ NODES_PLATFORM=libvirt
++ MASTER_NODES_FILE=ocp/master_nodes.json
++ export RHCOS_IMAGE_URL=https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/
++ RHCOS_IMAGE_URL=https://releases-rhcos.svc.ci.openshift.org/storage/releases/maipo/
++ export RHCOS_IMAGE_VERSION=47.284
++ RHCOS_IMAGE_VERSION=47.284
++ export RHCOS_IMAGE_NAME=redhat-coreos-maipo-47.284
++ RHCOS_IMAGE_NAME=redhat-coreos-maipo-47.284
++ export RHCOS_IMAGE_FILENAME=redhat-coreos-maipo-47.284-qemu.qcow2
++ RHCOS_IMAGE_FILENAME=redhat-coreos-maipo-47.284-qemu.qcow2
++ export RHCOS_IMAGE_FILENAME_OPENSTACK=redhat-coreos-maipo-47.284-openstack.qcow2
++ RHCOS_IMAGE_FILENAME_OPENSTACK=redhat-coreos-maipo-47.284-openstack.qcow2
++ export RHCOS_IMAGE_FILENAME_DUALDHCP=redhat-coreos-maipo-47.284-dualdhcp.qcow2
++ RHCOS_IMAGE_FILENAME_DUALDHCP=redhat-coreos-maipo-47.284-dualdhcp.qcow2
++ export RHCOS_IMAGE_FILENAME_LATEST=redhat-coreos-maipo-latest.qcow2
++ RHCOS_IMAGE_FILENAME_LATEST=redhat-coreos-maipo-latest.qcow2
++ export IRONIC_IMAGE=quay.io/metalkube/metalkube-ironic
++ IRONIC_IMAGE=quay.io/metalkube/metalkube-ironic
++ export IRONIC_INSPECTOR_IMAGE=quay.io/metalkube/metalkube-ironic-inspector
++ IRONIC_INSPECTOR_IMAGE=quay.io/metalkube/metalkube-ironic-inspector
++ export IRONIC_DATA_DIR=/opt/dev-scripts/ironic
++ IRONIC_DATA_DIR=/opt/dev-scripts/ironic
++ export KUBECONFIG=/home/tester/dev-scripts/ocp/auth/kubeconfig
++ KUBECONFIG=/home/tester/dev-scripts/ocp/auth/kubeconfig
++ export 'SSH=ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
++ SSH='ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
++ export LIBVIRT_DEFAULT_URI=qemu:///system
++ LIBVIRT_DEFAULT_URI=qemu:///system
++ '[' tester '!=' root -a '' == /run/user/0 ']'
++ [[ 1001 -eq 0 ]]
++ sudo -n uptime
+++ awk -F= '/^ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ centos != \c\e\n\t\o\s ]]
+++ awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ 7 -ne 7 ]]
+++ df / --output=fstype
+++ grep -v Type
++ FSTYPE=xfs
++ case ${FSTYPE} in
+++ xfs_info /
+++ grep -q ftype=1
++ [[ -n '' ]]
++ '[' 2710 = 0 ']'
++ '[' '!' -d /opt/dev-scripts ']'
++ echo 'Creating Working Dir'
Creating Working Dir
++ mkdir /opt/dev-scripts
mkdir: cannot create directory ‘/opt/dev-scripts’: Permission denied
make: *** [requirements] Error 1

When I create it manually and rerun make, it proceeds further.
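
The manual workaround is simply to create the directory world-readable and owned by the non-root user before rerunning make:

sudo mkdir -p /opt/dev-scripts
sudo chown "$USER" /opt/dev-scripts
sudo chmod 755 /opt/dev-scripts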

Nodes are getting NotReady status

After about 24 hours of the cluster running, the nodes (which are tainted masters) go into NotReady status.
SSHing into a node and running systemctl status kubelet shows this:
kubelet.service - Kubernetes Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2019-04-14 09:33:24 UTC; 22h ago
Process: 1409 ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state (code=exited, status=0/SUCCESS)
Process: 1407 ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests (code=exited, status=0/SUCCESS)
Main PID: 1411 (hyperkube)
Tasks: 57 (limit: 26213)
Memory: 246.6M
CPU: 4h 37min 3.310s
CGroup: /system.slice/kubelet.service
└─1411 /usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --rotate-certificates --kubeconfig=/var/lib/kubelet/kubeconfig --container>

Apr 15 08:18:24 master-1 hyperkube[1411]: E0415 08:18:24.206447 1411 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Unauthorized
Apr 15 08:18:24 master-1 hyperkube[1411]: E0415 08:18:24.404757 1411 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Apr 15 08:18:24 master-1 hyperkube[1411]: W0415 08:18:24.701927 1411 status_manager.go:485] Failed to get status for pod "openshift-authentication-operator-768788f4f7-t85x6_openshift-authentication-operat>
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.001594 1411 webhook.go:106] Failed to make webhook authenticator request: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.001634 1411 server.go:245] Unable to authenticate the request due to an error: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.202356 1411 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.402716 1411 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.602266 1411 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.802347 1411 webhook.go:106] Failed to make webhook authenticator request: Unauthorized
Apr 15 08:18:25 master-1 hyperkube[1411]: E0415 08:18:25.802576 1411 server.go:245] Unable to authenticate the request due to an error: Unauthorized
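
The Unauthorized errors at roughly the 24-hour mark are consistent with the kubelet client certificates rotating while their CSRs sit unapproved. A common check/workaround (a sketch, not a guaranteed fix) is to approve any pending CSRs:

oc get csr
oc get csr -o name | xargs oc adm certificate approve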

Errors when attempting to use tripleo-repos on RHEL

tripleo-repos warnings

+ cd
+ '[' '!' -d tripleo-repos ']'
+ git clone https://git.openstack.org/openstack/tripleo-repos
Cloning into 'tripleo-repos'...
+ pushd tripleo-repos
~/tripleo-repos ~
+ sudo python setup.py install
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'long_description_content_type'
  warnings.warn(msg)
/usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'project_urls'
  warnings.warn(msg)

Attempted a baremetal deploy using some QCT D51B-1U servers.

I was able to install RHEL 7.6 on these servers after my attempts to install CentOS on them failed.

errors from ./01_install_requirements.sh

tripleo-repos states rhel7.6 is unsupported

+ sudo tripleo-repos current-tripleo
WARNING: Unsupported platform 'rhel7.6' detected by tripleo-repos, centos7 will be used unless you use CLI param to change it.
WARNING: --centos-mirror was deprecated in favour of --mirror
Loaded plugins: enabled_repos_upload, package_upload, product-id, search-disabled-repos, subscription-manager
No package yum-plugin-priorities available.
Error: Nothing to do
Uploading Enabled Repositories Report
Loaded plugins: product-id, subscription-manager
Loaded plugins: product-id, subscription-manager
ERROR: Failed to install yum-plugin-priorities
['yum', 'install', '-y', 'yum-plugin-priorities']
None
Traceback (most recent call last):
  File "/bin/tripleo-repos", line 10, in <module>
    sys.exit(main())
  File "/usr/lib/python2.7/site-packages/tripleo_repos/main.py", line 347, in main
    _install_priorities()
  File "/usr/lib/python2.7/site-packages/tripleo_repos/main.py", line 243, in _install_priorities
    'yum-plugin-priorities'])
  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['yum', 'install', '-y', 'yum-plugin-priorities']' returned non-zero exit status 1
make: *** [requirements] Error 1
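
A first diagnostic step on a registered RHEL box is to confirm which repos are enabled and whether yum-plugin-priorities is visible at all (it may require an additional repo to be enabled; this sketch only checks, it does not fix the tripleo-repos platform detection):

sudo subscription-manager repos --list-enabled
sudo yum info yum-plugin-priorities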

Can't get past 06

Ironic timeout/failure/error when running ./06_create_cluster.sh (via make).

level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalApplyPost, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-1: Internal Server Error"
level=debug
level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalSequence, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-1: Internal Server Error"
level=debug
level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalApplyPost, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-2: Internal Server Error"
level=debug
level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalSequence, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-2: Internal Server Error"
level=debug
level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalApplyPost, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-0: Internal Server Error"
level=debug
level=debug msg="2019/04/09 13:03:33 [ERROR] root.masters: eval: *terraform.EvalSequence, err: 1 error occurred:"
level=debug msg="\t* ironic_node_v1.openshift-master-0: Internal Server Error"
level=debug
level=error
level=error msg="Error: Error applying plan:"
level=error
level=error msg="3 errors occurred:"
level=error msg="\t* module.masters.ironic_node_v1.openshift-master-1: 1 error occurred:"
level=error msg="\t* ironic_node_v1.openshift-master-1: Internal Server Error"
level=error
level=error
level=error msg="\t* module.masters.ironic_node_v1.openshift-master-2: 1 error occurred:"
level=error msg="\t* ironic_node_v1.openshift-master-2: Internal Server Error"
level=error
level=error
level=error msg="\t* module.masters.ironic_node_v1.openshift-master-0: 1 error occurred:"
level=error msg="\t* ironic_node_v1.openshift-master-0: Internal Server Error"
level=error
level=error
level=error
level=error
level=error
level=error msg="Terraform does not automatically rollback in the face of errors."
level=error msg="Instead, your Terraform state file has been partially updated with"
level=error msg="any resources that successfully completed. Please address the error"
level=error msg="above and apply again to incrementally change your infrastructure."
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"

Full Log:

http://ix.io/1FM4

Machine:

32GB RAM
856GB Hard Drive Free

Deployment times out: Still waiting for the Kubernetes API: Get https://api.ostest.test.metalkube.org:6443/version?timeout=32s: dial tcp 192.168.123.5:6443: connect: connection refused

Logs of the discovery container on the master nodes reveal that queries against the DNS VIP cannot resolve _etcd-server-ssl._tcp.ostest.test.metalkube.org:

[root@master-0 core]# crictl logs -f $(crictl ps | awk '/discovery/ {print $1}')
I0409 18:33:02.044434       1 run.go:46] Version: 4.0.0-alpha.0-162-g6276485e-dirty
E0409 18:33:02.049842       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:33:02.052979       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:33:32.056696       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:33:32.059669       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:34:02.056626       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:34:02.059876       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:34:32.056434       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:34:32.059145       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:35:02.056681       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:35:02.059569       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:35:32.057436       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:35:32.061139       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:36:02.057464       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:36:02.060341       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:36:32.055545       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:36:32.058743       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:37:02.056657       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such host
E0409 18:37:02.059077       1 run.go:63] error looking up self: lookup _etcd-server-ssl._tcp.ostest.test.metalkube.org on 192.168.123.6:53: no such ho
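
A quick way to confirm the SRV record is missing from the DNS VIP (server address and record name taken from the log above; assumes dig is available):

dig +short @192.168.123.6 _etcd-server-ssl._tcp.ostest.test.metalkube.org SRV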

'jq' not installed

Attempted a baremetal deploy using some QCT D51B-1U servers.

I was able to install RHEL 7.6 on these servers after my attempts to install CentOS on them failed.

errors from ./01_install_requirements.sh

jq is used but not installed, causing several failures; one example:
+++ jq -r '.builds[0]'
common.sh: line 42: jq: command not found
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6067  100  6067    0     0  10056      0 --:--:-- --:--:-- --:--:-- 10044
curl: (23) Failed writing body (0 != 6067)

Installation fails with a machine-config timeout

Using:

commit 489d790e1819e2e1ce489bbac87a4d5bbd5ac47e
Merge: c9bbf54 98ba9fb
Author: Steven Hardy <[email protected]>
Date:   Wed Apr 10 15:22:10 2019 +0100

    Merge pull request #312 from zaneb/fix_certs
    
    Run fix_certs script automatically

Virtual environment:

"level=fatal msg=\"failed to initialize the cluster: Cluster operator machine-config is reporting a failure: Failed to resync 4.0.0-0.alpha-2019-04-10-154442 because: [timed out waiting for the condition during waitForControllerConfigToBeCompleted: controllerconfig is not completed: ControllerConfig has not completed: completed(false) running(false) failing(true), pool master has not progressed to latest configuration: configuration for pool master is empty, retrying]: timed out waiting for the condition\"

Nodes not hosting API VIP IP fail `dial tcp 192.168.123.5:6443: connect: invalid argument`

After the cluster has been up for some time, it starts to fail with errors like:

Apr 12 09:48:31 master-1 hyperkube[23740]: E0412 09:48:31.776526   23740 kubelet.go:2273] node "master-1" not found
Apr 12 09:48:31 master-1 hyperkube[23740]: E0412 09:48:31.843612   23740 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://api.ostest.test.metalkube.org:6443/ap
i/v1/nodes?fieldSelector=metadata.name%3Dmaster-1&limit=500&resourceVersion=0: dial tcp 192.168.123.5:6443: connect: invalid argument
Apr 12 09:48:31 master-1 hyperkube[23740]: E0412 09:48:31.844422   23740 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://api.ostest.test.metalkube.org:
6443/api/v1/pods?fieldSelector=spec.nodeName%3Dmaster-1&limit=500&resourceVersion=0: dial tcp 192.168.123.5:6443: connect: invalid argument
Apr 12 09:48:31 master-1 hyperkube[23740]: E0412 09:48:31.876801   23740 kubelet.go:2273] node "master-1" not found
Apr 12 09:48:31 master-1 hyperkube[23740]: E0412 09:48:31.977082   23740 kubelet.go:2273] node "master-1" not found
Apr 12 09:48:32 master-1 hyperkube[23740]: E0412 09:48:32.054813   23740 reflector.go:125] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://api.ostest.test.metalkube.org:6443
/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.123.5:6443: connect: invalid argument

Attempts to ping this IP from any other master node (except the one that hosts the IP) fail:

[root@master-1 ~]# ip a | grep 192.168.123.5
[root@master-1 ~]# ping 192.168.123.5
connect: Invalid argument
[root@master-1 ~]# curl -4kvL 192.168.123.5:6443
* Rebuilt URL to: 192.168.123.5:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Immediate connect fail for 192.168.123.5: Invalid argument
* Closing connection 0
curl: (7) Couldn't connect to server
[root@master-1 ~]# curl -4kvL api.ostest.test.metalkube.org:6443                                                                                                                                                   
* Rebuilt URL to: api.ostest.test.metalkube.org:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Immediate connect fail for 192.168.123.5: Invalid argument
* Closing connection 0
curl: (7) Couldn't connect to server
[root@master-1 ~]# 
[root@master-2 ~]# ip a | grep 192.168.123.5
[root@master-2 ~]# ping 192.168.123.5
connect: Invalid argument
[root@master-2 ~]# curl -4kvL 192.168.123.5:6443
* Rebuilt URL to: 192.168.123.5:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Immediate connect fail for 192.168.123.5: Invalid argument
* Closing connection 0
curl: (7) Couldn't connect to server
[root@master-2 ~]# curl -4kvL api.ostest.test.metalkube.org:6443                                                                                                                                                   
* Rebuilt URL to: api.ostest.test.metalkube.org:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Immediate connect fail for 192.168.123.5: Invalid argument
* Closing connection 0
curl: (7) Couldn't connect to server
[root@master-2 ~]# 

and from master-0, which hosts the API VIP:

[root@master-0 ~]# ip a |grep 192.168.123.5
    inet 192.168.123.5/32 scope global ens4
[root@master-0 ~]# curl -4klv api.ostest.test.metalkube.org:6443
* Rebuilt URL to: api.ostest.test.metalkube.org:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Connected to api.ostest.test.metalkube.org (192.168.123.5) port 6443 (#0)
> GET / HTTP/1.1
> Host: api.ostest.test.metalkube.org:6443
> User-Agent: curl/7.61.1
> Accept: */*
> 
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 7)
* Closing connection 0
[root@master-0 ~]# curl -4klv 192.168.123.5:6443
* Rebuilt URL to: 192.168.123.5:6443/
*   Trying 192.168.123.5...
* TCP_NODELAY set
* Connected to 192.168.123.5 (192.168.123.5) port 6443 (#0)
> GET / HTTP/1.1
> Host: 192.168.123.5:6443
> User-Agent: curl/7.61.1
> Accept: */*
> 
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 7)
* Closing connection 0
[root@master-0 ~]# 
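
"connect: invalid argument" on the non-VIP masters often points at a problem in the local routing decision for the VIP (for example a stale entry in the local routing table) rather than a normal network failure. A diagnostic sketch to run on a failing master:

ip route get 192.168.123.5
ip route show table local | grep 192.168.123.5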

10_deploy_rook.sh exits with error: no matching resources found

[centos@provisionhost-0 dev-scripts]$ ./10_deploy_rook.sh | awk '{ print strftime("%Y-%m-%d %H:%M:%S |"), $0; fflush(); }'
+ source logging.sh
+++ dirname ./10_deploy_rook.sh
++ LOGDIR=./logs
++ '[' '!' -d ./logs ']'
+++ basename ./10_deploy_rook.sh .sh
+++ date +%F-%H%M%S
++ LOGFILE=./logs/10_deploy_rook-2019-04-13-175424.log
++ echo 'Logging to ./logs/10_deploy_rook-2019-04-13-175424.log'
++ exec
2019-04-13 17:54:24 | Logging to ./logs/10_deploy_rook-2019-04-13-175424.log
+++ tee ./logs/10_deploy_rook-2019-04-13-175424.log
2019-04-13 17:54:24 | + source common.sh
2019-04-13 17:54:24 | ++++ dirname common.sh
2019-04-13 17:54:24 | +++ cd .
2019-04-13 17:54:24 | +++ pwd
2019-04-13 17:54:24 | ++ SCRIPTDIR=/home/centos/dev-scripts
2019-04-13 17:54:24 | +++ whoami
2019-04-13 17:54:24 | ++ USER=centos
2019-04-13 17:54:24 | ++ '[' -z '' ']'
2019-04-13 17:54:24 | ++ '[' -f /home/centos/dev-scripts/config_centos.sh ']'
2019-04-13 17:54:24 | ++ echo 'Using CONFIG /home/centos/dev-scripts/config_centos.sh'
2019-04-13 17:54:24 | Using CONFIG /home/centos/dev-scripts/config_centos.sh
2019-04-13 17:54:24 | ++ CONFIG=/home/centos/dev-scripts/config_centos.sh
2019-04-13 17:54:24 | ++ source /home/centos/dev-scripts/config_centos.sh
2019-04-13 17:54:24 | +++ set +x
2019-04-13 17:54:24 | +++ BOOTSTRAP_SSH_READY=2500
2019-04-13 17:54:24 | +++ NODES_PLATFORM=baremetal
2019-04-13 17:54:24 | +++ INT_IF=eth0
2019-04-13 17:54:24 | +++ PRO_IF=eth1
2019-04-13 17:54:24 | +++ ROOT_DISK=/dev/vda
2019-04-13 17:54:24 | +++ NODES_FILE=/home/centos/instackenv.json
2019-04-13 17:54:24 | +++ MANAGE_BR_BRIDGE=n
2019-04-13 17:54:24 | ++ ADDN_DNS=
2019-04-13 17:54:24 | ++ EXT_IF=
2019-04-13 17:54:24 | ++ PRO_IF=eth1
2019-04-13 17:54:24 | ++ MANAGE_BR_BRIDGE=n
2019-04-13 17:54:24 | ++ MANAGE_PRO_BRIDGE=y
2019-04-13 17:54:24 | ++ MANAGE_INT_BRIDGE=y
2019-04-13 17:54:24 | ++ INT_IF=eth0
2019-04-13 17:54:24 | ++ ROOT_DISK=/dev/vda
2019-04-13 17:54:24 | ++ KAFKA_NAMESPACE=strimzi
2019-04-13 17:54:24 | ++ KAFKA_CLUSTERNAME=strimzi
2019-04-13 17:54:24 | ++ KAFKA_PVC_SIZE=10
2019-04-13 17:54:24 | ++ KAFKA_PRODUCER_TIMER=100
2019-04-13 17:54:24 | ++ KAFKA_PRODUCER_TOPIC=strimzi-topic
2019-04-13 17:54:24 | ++ FILESYSTEM=/
2019-04-13 17:54:24 | ++ WORKING_DIR=/opt/dev-scripts
2019-04-13 17:54:24 | ++ NODES_FILE=/home/centos/instackenv.json
2019-04-13 17:54:24 | ++ NODES_PLATFORM=baremetal
2019-04-13 17:54:24 | ++ MASTER_NODES_FILE=ocp/master_nodes.json
2019-04-13 17:54:24 | ++ export RHCOS_IMAGE_URL=https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/
2019-04-13 17:54:24 | ++ RHCOS_IMAGE_URL=https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/
2019-04-13 17:54:24 | +++ jq -r '.builds[1]'
2019-04-13 17:54:24 | +++ curl https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa//builds.json
2019-04-13 17:54:24 |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-13 17:54:24 |                                  Dload  Upload   Total   Spent    Left  Speed
100   737  100   737    0     0   1032      0 --:--:-- --:--:-- --:--:--  1033
2019-04-13 17:54:25 | ++ export RHCOS_LATEST=410.8.20190412.2
2019-04-13 17:54:25 | ++ RHCOS_LATEST=410.8.20190412.2
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_VERSION=410.8.20190412.2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_VERSION=410.8.20190412.2
2019-04-13 17:54:25 | +++ curl https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa//410.8.20190412.2/meta.json
2019-04-13 17:54:25 | +++ jq -r .images.openstack.path
2019-04-13 17:54:25 |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
2019-04-13 17:54:25 |                                  Dload  Upload   Total   Spent    Left  Speed
100  4778  100  4778    0     0  10485      0 --:--:-- --:--:-- --:--:-- 10478
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_FILENAME_OPENSTACK_GZ=rhcos-410.8.20190412.2-openstack.qcow2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_FILENAME_OPENSTACK_GZ=rhcos-410.8.20190412.2-openstack.qcow2
2019-04-13 17:54:25 | +++ echo rhcos-410.8.20190412.2-openstack.qcow2
2019-04-13 17:54:25 | +++ sed -e 's/-openstack.*//'
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_NAME=rhcos-410.8.20190412.2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_NAME=rhcos-410.8.20190412.2
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_FILENAME_OPENSTACK=rhcos-410.8.20190412.2-openstack.qcow2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_FILENAME_OPENSTACK=rhcos-410.8.20190412.2-openstack.qcow2
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_FILENAME_COMPRESSED=rhcos-410.8.20190412.2-compressed.qcow2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_FILENAME_COMPRESSED=rhcos-410.8.20190412.2-compressed.qcow2
2019-04-13 17:54:25 | ++ export RHCOS_IMAGE_FILENAME_LATEST=rhcos-ootpa-latest.qcow2
2019-04-13 17:54:25 | ++ RHCOS_IMAGE_FILENAME_LATEST=rhcos-ootpa-latest.qcow2
2019-04-13 17:54:25 | ++ export IRONIC_IMAGE=quay.io/metalkube/metalkube-ironic
2019-04-13 17:54:25 | ++ IRONIC_IMAGE=quay.io/metalkube/metalkube-ironic
2019-04-13 17:54:25 | ++ export IRONIC_INSPECTOR_IMAGE=quay.io/metalkube/metalkube-ironic-inspector
2019-04-13 17:54:25 | ++ IRONIC_INSPECTOR_IMAGE=quay.io/metalkube/metalkube-ironic-inspector
2019-04-13 17:54:25 | ++ export IRONIC_DATA_DIR=/opt/dev-scripts/ironic
2019-04-13 17:54:25 | ++ IRONIC_DATA_DIR=/opt/dev-scripts/ironic
2019-04-13 17:54:25 | ++ export KUBECONFIG=/home/centos/dev-scripts/ocp/auth/kubeconfig
2019-04-13 17:54:25 | ++ KUBECONFIG=/home/centos/dev-scripts/ocp/auth/kubeconfig
2019-04-13 17:54:25 | ++ export 'SSH=ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
2019-04-13 17:54:25 | ++ SSH='ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ConnectTimeout=5'
2019-04-13 17:54:25 | ++ export LIBVIRT_DEFAULT_URI=qemu:///system
2019-04-13 17:54:25 | ++ LIBVIRT_DEFAULT_URI=qemu:///system
2019-04-13 17:54:25 | ++ '[' centos '!=' root -a /run/user/1000 == /run/user/0 ']'
2019-04-13 17:54:25 | ++ sudo -n uptime
2019-04-13 17:54:25 | +++ awk -F= '/^ID=/ { print $2 }' /etc/os-release
2019-04-13 17:54:25 | +++ tr -d '"'
2019-04-13 17:54:25 | ++ [[ ! centos =~ ^(centos|rhel)$ ]]
2019-04-13 17:54:25 | +++ awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release
2019-04-13 17:54:25 | +++ tr -d '"'
2019-04-13 17:54:25 | +++ cut -f1 -d.
2019-04-13 17:54:25 | ++ [[ 7 -ne 7 ]]
2019-04-13 17:54:25 | +++ df / --output=fstype
2019-04-13 17:54:25 | +++ grep -v Type
2019-04-13 17:54:25 | ++ FSTYPE=xfs
2019-04-13 17:54:25 | ++ case ${FSTYPE} in
2019-04-13 17:54:25 | +++ xfs_info /
2019-04-13 17:54:25 | +++ grep -q ftype=1
2019-04-13 17:54:25 | ++ [[ -n '' ]]
2019-04-13 17:54:25 | ++ '[' 477 = 0 ']'
2019-04-13 17:54:25 | ++ '[' '!' -d /opt/dev-scripts ']'
2019-04-13 17:54:25 | + figlet 'Deploying rook'
2019-04-13 17:54:25 | + lolcat
2019-04-13 17:54:25 |  ____             _             _                               _
2019-04-13 17:54:25 | |  _ \  ___ _ __ | | ___  _   _(_)_ __   __ _   _ __ ___   ___ | | __
2019-04-13 17:54:25 | | | | |/ _ \ '_ \| |/ _ \| | | | | '_ \ / _` | | '__/ _ \ / _ \| |/ /
2019-04-13 17:54:25 | | |_| |  __/ |_) | | (_) | |_| | | | | | (_| | | | | (_) | (_) |   <
2019-04-13 17:54:25 | |____/ \___| .__/|_|\___/ \__, |_|_| |_|\__, | |_|  \___/ \___/|_|\_\
2019-04-13 17:54:25 |            |_|            |___/         |___/
2019-04-13 17:54:25 | ++ go env
2019-04-13 17:54:25 | + eval 'GOARCH="amd64"
2019-04-13 17:54:25 | GOBIN=""
2019-04-13 17:54:25 | GOCACHE="/home/centos/.cache/go-build"
2019-04-13 17:54:25 | GOEXE=""
2019-04-13 17:54:25 | GOFLAGS=""
2019-04-13 17:54:25 | GOHOSTARCH="amd64"
2019-04-13 17:54:25 | GOHOSTOS="linux"
2019-04-13 17:54:25 | GOOS="linux"
2019-04-13 17:54:25 | GOPATH="/home/centos/go"
2019-04-13 17:54:25 | GOPROXY=""
2019-04-13 17:54:25 | GORACE=""
2019-04-13 17:54:25 | GOROOT="/usr/lib/golang"
2019-04-13 17:54:25 | GOTMPDIR=""
2019-04-13 17:54:25 | GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
2019-04-13 17:54:25 | GCCGO="gccgo"
2019-04-13 17:54:25 | CC="gcc"
2019-04-13 17:54:25 | CXX="g++"
2019-04-13 17:54:25 | CGO_ENABLED="1"
2019-04-13 17:54:25 | GOMOD=""
2019-04-13 17:54:25 | CGO_CFLAGS="-g -O2"
2019-04-13 17:54:25 | CGO_CPPFLAGS=""
2019-04-13 17:54:25 | CGO_CXXFLAGS="-g -O2"
2019-04-13 17:54:25 | CGO_FFLAGS="-g -O2"
2019-04-13 17:54:25 | CGO_LDFLAGS="-g -O2"
2019-04-13 17:54:25 | PKG_CONFIG="pkg-config"
2019-04-13 17:54:25 | GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build160591485=/tmp/go-build -gno-record-gcc-switches"'
2019-04-13 17:54:25 | ++ GOARCH=amd64
2019-04-13 17:54:25 | ++ GOBIN=
2019-04-13 17:54:25 | ++ GOCACHE=/home/centos/.cache/go-build
2019-04-13 17:54:25 | ++ GOEXE=
2019-04-13 17:54:25 | ++ GOFLAGS=
2019-04-13 17:54:25 | ++ GOHOSTARCH=amd64
2019-04-13 17:54:25 | ++ GOHOSTOS=linux
2019-04-13 17:54:25 | ++ GOOS=linux
2019-04-13 17:54:25 | ++ GOPATH=/home/centos/go
2019-04-13 17:54:25 | ++ GOPROXY=
2019-04-13 17:54:25 | ++ GORACE=
2019-04-13 17:54:25 | ++ GOROOT=/usr/lib/golang
2019-04-13 17:54:25 | ++ GOTMPDIR=
2019-04-13 17:54:25 | ++ GOTOOLDIR=/usr/lib/golang/pkg/tool/linux_amd64
2019-04-13 17:54:25 | ++ GCCGO=gccgo
2019-04-13 17:54:25 | ++ CC=gcc
2019-04-13 17:54:25 | ++ CXX=g++
2019-04-13 17:54:25 | ++ CGO_ENABLED=1
2019-04-13 17:54:25 | ++ GOMOD=
2019-04-13 17:54:25 | ++ CGO_CFLAGS='-g -O2'
2019-04-13 17:54:25 | ++ CGO_CPPFLAGS=
2019-04-13 17:54:25 | ++ CGO_CXXFLAGS='-g -O2'
2019-04-13 17:54:25 | ++ CGO_FFLAGS='-g -O2'
2019-04-13 17:54:25 | ++ CGO_LDFLAGS='-g -O2'
2019-04-13 17:54:25 | ++ PKG_CONFIG=pkg-config
2019-04-13 17:54:25 | ++ GOGCCFLAGS='-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build160591485=/tmp/go-build -gno-record-gcc-switches'
2019-04-13 17:54:25 | + export ROOKPATH=/home/centos/go/src/github.com/rook/rook
2019-04-13 17:54:25 | + ROOKPATH=/home/centos/go/src/github.com/rook/rook
2019-04-13 17:54:25 | + cd /home/centos/go/src/github.com/rook/rook/cluster/examples/kubernetes/ceph
2019-04-13 17:54:25 | + oc create -f common.yaml
2019-04-13 17:54:26 | namespace/rook-ceph created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephnfses.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
2019-04-13 17:54:26 | customresourcedefinition.apiextensions.k8s.io/volumes.rook.io created
2019-04-13 17:54:26 | clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
2019-04-13 17:54:26 | role.rbac.authorization.k8s.io/rook-ceph-system created
2019-04-13 17:54:26 | clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
2019-04-13 17:54:26 | clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
2019-04-13 17:54:26 | serviceaccount/rook-ceph-system created
2019-04-13 17:54:26 | rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
2019-04-13 17:54:26 | clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
2019-04-13 17:54:26 | serviceaccount/rook-ceph-osd created
2019-04-13 17:54:26 | serviceaccount/rook-ceph-mgr created
2019-04-13 17:54:26 | role.rbac.authorization.k8s.io/rook-ceph-osd created
2019-04-13 17:54:26 | clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-system created
2019-04-13 17:54:26 | role.rbac.authorization.k8s.io/rook-ceph-mgr created
2019-04-13 17:54:26 | rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
2019-04-13 17:54:26 | rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
2019-04-13 17:54:26 | rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
2019-04-13 17:54:26 | rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
2019-04-13 17:54:26 | clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
2019-04-13 17:54:26 | + sed '/FLEXVOLUME_DIR_PATH/!b;n;c\          value: "\/etc/kubernetes\/kubelet-plugins\/volume\/exec"' operator-openshift.yaml
2019-04-13 17:54:26 | + sed -i 's/# - name: FLEXVOLUME_DIR_PATH/- name: FLEXVOLUME_DIR_PATH/' operator-openshift-modified.yaml
2019-04-13 17:54:26 | + oc create -f operator-openshift-modified.yaml
2019-04-13 17:54:26 | securitycontextconstraints.security.openshift.io/rook-ceph created
2019-04-13 17:54:26 | deployment.apps/rook-ceph-operator created
2019-04-13 17:54:26 | + oc wait --for condition=ready pod -l app=rook-ceph-operator -n rook-ceph --timeout=120s
2019-04-13 17:54:26 | error: no matching resources found
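
The oc wait fails here because it runs before the operator pod object has been created. A possible workaround (a sketch, not what the deploy script currently does) is to poll until the pod exists and only then wait on readiness:

# sketch: poll until the operator pod exists, then wait for it to become ready
until oc get pod -l app=rook-ceph-operator -n rook-ceph --no-headers 2>/dev/null | grep -q rook-ceph-operator; do
    sleep 5
done
oc wait --for condition=ready pod -l app=rook-ceph-operator -n rook-ceph --timeout=120s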

There are no 'baremetal' hosts

$ oc explain baremetalhosts.metalkube.org
KIND:     BareMetalHost
VERSION:  metalkube.org/v1alpha1

DESCRIPTION:
     <empty>

$ oc get baremetalhosts --all-namespaces
No resources found.
$ oc version
Client Version: version.Info{Major:"4", Minor:"0+", GitVersion:"v4.0.22", GitCommit:"219bbe2f0c", GitTreeState:"", BuildDate:"2019-03-10T22:23:11Z", GoVersion:"", Compiler:"", Platform:""}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.4+eae8710", GitCommit:"eae8710", GitTreeState:"clean", BuildDate:"2019-03-30T18:01:26Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
$ git rev-parse HEAD
31b1359f92750d290fcc972350bc3a1c2dd6ebda
$ oc get nodes
NAME       STATUS   ROLES           AGE     VERSION
master-0   Ready    master,worker   3h18m   v1.12.4+50c2f2340a
master-1   Ready    master,worker   3h18m   v1.12.4+50c2f2340a
master-2   Ready    master,worker   3h18m   v1.12.4+50c2f2340a

Is the baremetal CRD only for workers, or also for nodes deployed with the baremetal operator?

Version checks for RHEL fail in common.sh

The check for RHEL 7 fails on RHEL 7.6 with several errors; one example:

++ sudo -n uptime
+++ awk -F= '/^ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ ! rhel =~ ^(centos|rhel)$ ]]
+++ awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release
+++ tr -d '"'
++ [[ 7.6 -ne 7 ]]
common.sh: line 84: [[: 7.6: syntax error: invalid arithmetic operator (error token is ".6")
+++ df / --output=fstype
+++ grep -v Type
++ FSTYPE=xfs
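
The failing check is the bash arithmetic test [[ 7.6 -ne 7 ]], which cannot handle a dotted version string. One way to make the comparison tolerate minor releases (a sketch, not the exact common.sh code) is to strip everything after the first dot before comparing:

# sketch: compare only the major version from /etc/os-release
os_version=$(awk -F= '/^VERSION_ID=/ { print $2 }' /etc/os-release | tr -d '"')
if [[ ${os_version%%.*} -ne 7 ]]; then
    echo "Unsupported OS version: ${os_version}" >&2
    exit 1
fi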

Network interface reordering

I'm not sure whether this is related to the RHCOS version (OSTREE_VERSION=47.284) or whether it can be fixed in dev-scripts, but in my environment (3 masters, 4 NICs: eth0 = provisioning network, eth1 = baremetal network, eth2 & eth3 = unused) the ironic-python-agent and Fedora see the NICs in the expected order, while RHCOS brings up eth2 as eth0 and eth3 as eth1, so the installation fails.
As a workaround, modifying the 04 script to use eth2 and eth3 instead seems to work.

Error on DNS route resolution on the Libvirt node

Regarding PR #245

I've been facing an error on DNS resolution and also on TLS validation:

Logs from openshift-authentication pod:

entication.svc.cluster.local] issuer="openshift-service-serving-signer@1554149617" (2019-04-01 20:21:17 +0000 UTC to 2021-03-31 20:21:18 +0000 UTC (now=2019-04-01 20:21:48.182068939 +0000 UTC))
I0401 20:21:48.182239       1 serving.go:195] [1] "/var/config/system/secrets/v4-0-config-system-serving-cert/tls.crt" serving certificate: "openshift-service-serving-signer@1554149617" [] issuer="<self>" (2019-04-01 20:13:36 +0000 UTC to 2020-03-31 20:13:37 +0000 UTC (now=2019-04-01 20:21:48.182226129 +0000 UTC))
I0401 20:21:48.182379       1 secure_serving.go:125] Serving securely on 0.0.0.0:6443
I0401 20:21:48.182483       1 serving.go:77] Starting DynamicLoader
I0401 21:30:50.704480       1 log.go:172] http: TLS handshake error from 10.128.0.1:59540: remote error: tls: unknown certificate
I0401 21:47:35.891348       1 log.go:172] http: TLS handshake error from 10.128.0.1:47572: remote error: tls: unknown certificate
I0401 22:11:13.559976       1 log.go:172] http: TLS handshake error from 10.128.0.1:42264: remote error: tls: unknown certificate
I0401 22:15:50.470017       1 log.go:172] http: TLS handshake error from 10.128.0.1:46784: remote error: tls: unknown certificate
I0401 22:19:35.864342       1 log.go:172] http: TLS handshake error from 10.128.0.1:50434: remote error: tls: unknown certificate
I0401 22:23:27.825915       1 log.go:172] http: TLS handshake error from 10.128.0.1:54202: remote error: tls: unknown certificate
I0401 22:26:29.398364       1 log.go:172] http: TLS handshake error from 10.128.0.1:57154: remote error: tls: unknown certificate

Trying to reach openshift-console:

[jparrill@sjr3 dev-scripts]$ until wget --no-check-certificate https://console-openshift-console.apps.apps.ostest.test.metalkube.org; do sleep 10 && echo "rechecking"; done
--2019-04-01 23:00:25--  https://console-openshift-console.apps.apps.ostest.test.metalkube.org/
Resolving console-openshift-console.apps.apps.ostest.test.metalkube.org (console-openshift-console.apps.apps.ostest.test.metalkube.org)... 192.168.111.21
Connecting to console-openshift-console.apps.apps.ostest.test.metalkube.org (console-openshift-console.apps.apps.ostest.test.metalkube.org)|192.168.111.21|:443... connected.
WARNING: cannot verify console-openshift-console.apps.apps.ostest.test.metalkube.org's certificate, issued by ‘/CN=cluster-ingress-operator@1554150077’:
  Self-signed certificate encountered.
WARNING: no certificate subject alternative name matches
	requested host name ‘console-openshift-console.apps.apps.ostest.test.metalkube.org’.
HTTP request sent, awaiting response... 503 Service Unavailable
2019-04-01 23:00:25 ERROR 503: Service Unavailable.

rechecking
--2019-04-01 23:00:35--  https://console-openshift-console.apps.apps.ostest.test.metalkube.org/
Resolving console-openshift-console.apps.apps.ostest.test.metalkube.org (console-openshift-console.apps.apps.ostest.test.metalkube.org)... 192.168.111.21
Connecting to console-openshift-console.apps.apps.ostest.test.metalkube.org (console-openshift-console.apps.apps.ostest.test.metalkube.org)|192.168.111.21|:443... connected.
WARNING: cannot verify console-openshift-console.apps.apps.ostest.test.metalkube.org's certificate, issued by ‘/CN=cluster-ingress-operator@1554150077’:
  Self-signed certificate encountered.
WARNING: no certificate subject alternative name matches
	requested host name ‘console-openshift-console.apps.apps.ostest.test.metalkube.org’.
HTTP request sent, awaiting response... 503 Service Unavailable
2019-04-01 23:00:35 ERROR 503: Service Unavailable.
  • No csrs to approve
[jparrill@sjr3 dev-scripts]$ oc get csr
No resources found.

These traces come from the Libvirt node that hosts the master nodes.

Ideas?

04_setup_ironic.sh fails on creating directory

Permission denied

Logging to ./logs/04_setup_ironic-2019-03-05-150932.log
+++ exec
++++ tee ./logs/04_setup_ironic-2019-03-05-150932.log
++ '[' '!' -f redhat-coreos-maipo-47.284-qemu.qcow2 ']'
++ mkdir -p /opt/dev-scripts/ironic/html/images
mkdir: cannot create directory ‘/opt/dev-scripts/ironic’: Permission denied
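
This is the usual symptom of the working directory not existing with the right ownership before the scripts run. A minimal sketch of pre-creating it by hand (assuming the default /opt/dev-scripts location):

# pre-create the working directory, owned by the non-root deployment user
sudo mkdir -p /opt/dev-scripts/ironic/html/images
sudo chown -R "$USER" /opt/dev-scripts
chmod 755 /opt/dev-scripts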

ironic is shipping with unbounded API workers

Currently we ship ironic with an unbounded number of API workers. This could become an issue on the platform. We should consider bounding ironic to 12 workers (similar to how we bound workers in TripleO today, although not for ironic itself).
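
If I remember correctly, ironic exposes this through an api_workers option in the [api] section of ironic.conf; a fragment like the following (illustrative only; 12 is just the number proposed above) would cap it:

[api]
# limit the number of API worker processes
api_workers = 12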

Applying resources to kubevirt namespace fails with "caches not synchronized"

When applying the kubevirt manifests from manifests/ directory, oc fails
to post the resources with the following errors:

[root@zeus07 manifests]# oc --config ../ocp/auth/kubeconfig apply -f 110_cnv_kubevirt_op.yaml 
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
customresourcedefinition.apiextensions.k8s.io/kubevirts.kubevirt.io configured
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
clusterrole.rbac.authorization.k8s.io/kubevirt.io:operator configured
Warning: oc apply should be used on resource created by either oc create --save-config or oc apply
clusterrolebinding.rbac.authorization.k8s.io/kubevirt-operator configured
Error from server (Forbidden): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"labels\":{\"kubevirt.io\":\"\"},\"name\":\"kubevirt\",\"namespace\":\"\"}}\n"},"namespace":""}}
to:
Resource: "/v1, Resource=namespaces", GroupVersionKind: "/v1, Kind=Namespace"
Name: "kubevirt", Namespace: ""
Object: &{map["kind":"Namespace" "apiVersion":"v1" "metadata":map["name":"kubevirt" "selfLink":"/api/v1/namespaces/kubevirt" "uid":"cfc4fdce-41bb-11e9-a9c0-52fdfc072182" "resourceVersion":"904" "creationTimestamp":"2019-03-08T16:04:11Z" "labels":map["kubevirt.io":""]] "spec":map["finalizers":["kubernetes"]] "status":map["phase":"Active"]]}
for: "110_cnv_kubevirt_op.yaml": namespaces "kubevirt" is forbidden: caches not synchronized
Error from server (Forbidden): error when creating "110_cnv_kubevirt_op.yaml": serviceaccounts "kubevirt-operator" is forbidden: caches not synchronized
Error from server (Forbidden): error when creating "110_cnv_kubevirt_op.yaml": deployments.apps "virt-operator" is forbidden: caches not synchronized

When we switch to the kube-system namespace, it works fine.

The same error happens when manifests are applied by openshift-install
itself, as in #127.

I plan to post a PR that switches to kube-system to make progress on deployment. This issue will be used to track the investigation into the reasons behind the failure, and to revert back to the kubevirt namespace once we have the root cause.

Cluster not registered in cloud.openshift.com

It looks like the cluster deployed with the dev-scripts is not registered in cloud.openshift.com.

From discussions in the #forum-monitoring channel, it seems the cluster is not being registered because the monitoring operator is failing. That component is responsible for sending metrics to telemetry, so if it is failing no metrics are sent and the cluster is not registered.

From the events in the openshift-monitoring namespace, it looks like Prometheus is not deployed because no nodes are available (insufficient CPU).
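
A sketch of how to look at the same signals (not necessarily the exact commands that were used here):

oc -n openshift-monitoring get events --sort-by=.lastTimestamp
oc -n openshift-monitoring get pods -o wide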

ceph status is HEALTH_WARN

[root@f13-h02-000-1029u dev-scripts]# oc -n rook-ceph exec rook-ceph-tools-76c7d559b6-95l8p -- ceph -s                                                                                                              
  cluster:
    id:     2661923a-db3a-405a-b07a-ceb6e33d86ce
    health: HEALTH_WARN
            367 slow ops, oldest one blocked for 2288 sec, mon.c has slow ops
            clock skew detected on mon.b, mon.c
 
  services:
    mon: 3 daemons, quorum a,b,c
    mgr: a(active)
    mds: myfs-1/1/1 up  {0=myfs-b=up:active}, 1 up:standby-replay
    osd: 3 osds: 3 up, 3 in
 
  data:
    pools:   3 pools, 300 pgs
    objects: 22  objects, 2.2 KiB
    usage:   3.0 GiB used, 2.6 TiB / 2.6 TiB avail
    pgs:     300 active+clean
 
  io:
    client:   1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

This seems to be due to the node clocks being out of sync:

debug 2019-04-15 17:26:12.093 7f10f49ff700 -1 mon.c@2(peon).paxos(paxos updating c 1005..1751) lease_expire from mon.0 172.30.19.151:6789/0 is 123.467528 seconds in the past; mons are probably laggy (or possibly clocks are too skewed)

[root@f13-h02-000-1029u dev-scripts]# for n in 20 21 22; do                                                                                                                                                         
> ssh [email protected].$n date;
> done
Mon Apr 15 17:47:01 UTC 2019
Mon Apr 15 17:47:00 UTC 2019
Mon Apr 15 17:49:10 UTC 2019

This seems similar to issue #354.
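
A possible quick fix (a sketch; it assumes the masters are reachable as core@192.168.111.2x, which is only a guess based on the date loop above) is to force a chrony step on each node and confirm the clocks agree again:

for n in 20 21 22; do
    ssh core@192.168.111.$n "sudo chronyc makestep && date"
done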

Certs not getting auto approved, stuck in pending state

All of our deployments have a problem where certs aren't being approved automatically, and the following command is needed as a workaround:

oc get csr -o name | xargs -n 1 oc adm certificate approve
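
Until the root cause is found, a small watch loop (a sketch, not part of dev-scripts) can keep approving pending CSRs while a deployment is in progress:

# approve any pending CSRs every 30 seconds until interrupted
while true; do
    oc get csr -o name | xargs -r -n 1 oc adm certificate approve
    sleep 30
done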

Other PRs / issues tracked through this investigation:

Deployments failing due to rendering bootstrap manifests

Something recently introduced is causing deployments to fail.

Apr 03 18:16:08 localhost.localdomain systemd[1]: Started Bootstrap a Kubernetes cluster.
Apr 03 18:16:27 localhost.localdomain bootkube.sh[11222]: Rendering MCO manifests...
Apr 03 18:16:28 localhost.localdomain bootkube.sh[11222]: I0403 18:16:28.454325       1 bootstrap.go:84] Version: 4.0.0-alpha.0-154-gc1ba218c-dirty
Apr 03 18:16:28 localhost.localdomain bootkube.sh[11222]: F0403 18:16:28.456056       1 bootstrap.go:104] error rendering bootstrap manifests: open /assets/tls/etcd-metric-ca-bundle.crt: no such file or directory
Apr 03 18:16:28 localhost.localdomain systemd[1]: bootkube.service: main process exited, code=exited, status=255/n/a
Apr 03 18:16:28 localhost.localdomain systemd[1]: Unit bootkube.service entered failed state.
Apr 03 18:16:28 localhost.localdomain systemd[1]: bootkube.service failed.

Master deployment fails with terraform - Server-side error: "(sqlite3.OperationalError) database is locked"

From #135

With this PR, Yurii is seeing:

OperationalError: (sqlite3.OperationalError) database is locked [SQL: u'SELECT nodes.created_at

even with:

$ podman exec -it ironic sqlite3 /var/lib/ironic/ironic.db "PRAGMA journal_mode"
wal

Dmitri suggests:

So, our last resort (?) option with sqlite is to try setting busy_timeout.
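
For reference, the pragma can be exercised with the same sqlite3 CLI as above, but it is per-connection, so to actually help it would have to be set on ironic's own database connections (e.g. via its SQLAlchemy engine):

# sets and then prints busy_timeout for this CLI connection only (illustrative)
podman exec -it ironic sqlite3 /var/lib/ironic/ironic.db "PRAGMA busy_timeout = 60000; PRAGMA busy_timeout;"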

Failed like:

ironic_node_v1.openshift-master-2: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (2m50s elapsed)
ironic_node_v1.openshift-master-1: Still creating... (2m50s elapsed)
2019/03/15 10:29:45 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/15 10:29:45 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
2019/03/15 10:29:45 [TRACE] [walkApply] Exiting eval tree: ironic_node_v1.openshift-master-1
2019/03/15 10:29:48 [ERROR] root: eval: *terraform.EvalApplyPost, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Internal Server Error
2019/03/15 10:29:48 [ERROR] root: eval: *terraform.EvalSequence, err: 1 error(s) occurred:

* ironic_node_v1.openshift-master-2: Internal Server Error

2019/03/15 10:30:02 [DEBUG] plugin: waiting for all plugin processes to complete...
Error: Error applying plan:

3 error(s) occurred:

* ironic_node_v1.openshift-master-0: 1 error(s) occurred:

* ironic_node_v1.openshift-master-0: Internal Server Error
* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error
* ironic_node_v1.openshift-master-2: 1 error(s) occurred:

2019-03-15T10:30:02.972+0200 [DEBUG] plugin.terraform-provider-ironic: 2019/03/15 10:30:02 [ERR] plugin: stream copy 'stderr' error: stream closed
* ironic_node_v1.openshift-master-2: Internal Server Error

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


2019-03-15T10:30:03.007+0200 [DEBUG] plugin.terraform-provider-ironic: 2019/03/15 10:30:03 [ERR] plugin: plugin server: accept unix /tmp/plugin005349256: use of closed network connection
2019-03-15T10:30:03.008+0200 [DEBUG] plugin: plugin process exited: path=/root/.terraform.d/plugins/terraform-provider-ironic
2019-03-15 08:29:25.649 44 ERROR wsme.api [req-747fc5e4-6050-463e-9d5d-8b7fa79a00f3 - - - - -] Server-side error: "(sqlite3.OperationalError) database is locked [SQL: u'SELECT anon_1.nodes_created_at AS anon_1_nodes_created_at, anon_1.nodes_updated_at AS anon_1_nodes_updated_at, anon_1.nodes_version AS anon_1_nodes_version, anon_1.nodes_id AS anon_1_nodes_id, anon_1.nodes_uuid AS anon_1_nodes_uuid, anon_1.nodes_instance_uuid AS anon_1_nodes_instance_uuid, anon_1.nodes_name AS anon_1_nodes_name, anon_1.nodes_chassis_id AS anon_1_nodes_chassis_id, anon_1.nodes_power_state AS anon_1_nodes_power_state, anon_1.nodes_target_power_state AS anon_1_nodes_target_power_state, anon_1.nodes_provision_state AS anon_1_nodes_provision_state, anon_1.nodes_target_provision_state AS anon_1_nodes_target_provision_state, anon_1.nodes_provision_updated_at AS anon_1_nodes_provision_updated_at, anon_1.nodes_last_error AS anon_1_nodes_last_error, anon_1.nodes_instance_info AS anon_1_nodes_instance_info, anon_1.nodes_properties AS anon_1_nodes_properties, anon_1.nodes_driver AS anon_1_nodes_driver, anon_1.nodes_driver_info AS anon_1_nodes_driver_info, anon_1.nodes_driver_internal_info AS anon_1_nodes_driver_internal_info, anon_1.nodes_clean_step AS anon_1_nodes_clean_step, anon_1.nodes_deploy_step AS anon_1_nodes_deploy_step, anon_1.nodes_resource_class AS anon_1_nodes_resource_class, anon_1.nodes_raid_config AS anon_1_nodes_raid_config, anon_1.nodes_target_raid_config AS anon_1_nodes_target_raid_config, anon_1.nodes_reservation AS anon_1_nodes_reservation, anon_1.nodes_conductor_affinity AS anon_1_nodes_conductor_affinity, anon_1.nodes_conductor_group AS anon_1_nodes_conductor_group, anon_1.nodes_maintenance AS anon_1_nodes_maintenance, anon_1.nodes_maintenance_reason AS anon_1_nodes_maintenance_reason, anon_1.nodes_fault AS anon_1_nodes_fault, anon_1.nodes_console_enabled AS anon_1_nodes_console_enabled, anon_1.nodes_inspection_finished_at AS anon_1_nodes_inspection_finished_at, anon_1.nodes_inspection_started_at AS anon_1_nodes_inspection_started_at, anon_1.nodes_extra AS anon_1_nodes_extra, anon_1.nodes_automated_clean AS anon_1_nodes_automated_clean, anon_1.nodes_protected AS anon_1_nodes_protected, anon_1.nodes_protected_reason AS anon_1_nodes_protected_reason, anon_1.nodes_owner AS anon_1_nodes_owner, anon_1.nodes_allocation_id AS anon_1_nodes_allocation_id, anon_1.nodes_description AS anon_1_nodes_description, anon_1.nodes_bios_interface AS anon_1_nodes_bios_interface, anon_1.nodes_boot_interface AS anon_1_nodes_boot_interface, anon_1.nodes_console_interface AS anon_1_nodes_console_interface, anon_1.nodes_deploy_interface AS anon_1_nodes_deploy_interface, anon_1.nodes_inspect_interface AS anon_1_nodes_inspect_interface, anon_1.nodes_management_interface AS anon_1_nodes_management_interface, anon_1.nodes_network_interface AS anon_1_nodes_network_interface, anon_1.nodes_raid_interface AS anon_1_nodes_raid_interface, anon_1.nodes_rescue_interface AS anon_1_nodes_rescue_interface, anon_1.nodes_storage_interface AS anon_1_nodes_storage_interface, anon_1.nodes_power_interface AS anon_1_nodes_power_interface, anon_1.nodes_vendor_interface AS anon_1_nodes_vendor_interface, node_traits_1.created_at AS node_traits_1_created_at, node_traits_1.updated_at AS node_traits_1_updated_at, node_traits_1.version AS node_traits_1_version, node_traits_1.node_id AS node_traits_1_node_id, node_traits_1.trait AS node_traits_1_trait, node_tags_1.created_at AS node_tags_1_created_at, node_tags_1.updated_at AS node_tags_1_updated_at, 
node_tags_1.version AS node_tags_1_version, node_tags_1.node_id AS node_tags_1_node_id, node_tags_1.tag AS node_tags_1_tag \nFROM (SELECT nodes.created_at AS nodes_created_at, nodes.updated_at AS nodes_updated_at, nodes.version AS nodes_version, nodes.id AS nodes_id, nodes.uuid AS nodes_uuid, nodes.instance_uuid AS nodes_instance_uuid, nodes.name AS nodes_name, nodes.chassis_id AS nodes_chassis_id, nodes.power_state AS nodes_power_state, nodes.target_power_state AS nodes_target_power_state, nodes.provision_state AS nodes_provision_state, nodes.target_provision_state AS nodes_target_provision_state, nodes.provision_updated_at AS nodes_provision_updated_at, nodes.last_error AS nodes_last_error, nodes.instance_info AS nodes_instance_info, nodes.properties AS nodes_properties, nodes.driver AS nodes_driver, nodes.driver_info AS nodes_driver_info, nodes.driver_internal_info AS nodes_driver_internal_info, nodes.clean_step AS nodes_clean_step, nodes.deploy_step AS nodes_deploy_step, nodes.resource_class AS nodes_resource_class, nodes.raid_config AS nodes_raid_config, nodes.target_raid_config AS nodes_target_raid_config, nodes.reservation AS nodes_reservation, nodes.conductor_affinity AS nodes_conductor_affinity, nodes.conductor_group AS nodes_conductor_group, nodes.maintenance AS nodes_maintenance, nodes.maintenance_reason AS nodes_maintenance_reason, nodes.fault AS nodes_fault, nodes.console_enabled AS nodes_console_enabled, nodes.inspection_finished_at AS nodes_inspection_finished_at, nodes.inspection_started_at AS nodes_inspection_started_at, nodes.extra AS nodes_extra, nodes.automated_clean AS nodes_automated_clean, nodes.protected AS nodes_protected, nodes.protected_reason AS nodes_protected_reason, nodes.owner AS nodes_owner, nodes.allocation_id AS nodes_allocation_id, nodes.description AS nodes_description, nodes.bios_interface AS nodes_bios_interface, nodes.boot_interface AS nodes_boot_interface, nodes.console_interface AS nodes_console_interface, nodes.deploy_interface AS nodes_deploy_interface, nodes.inspect_interface AS nodes_inspect_interface, nodes.management_interface AS nodes_management_interface, nodes.network_interface AS nodes_network_interface, nodes.raid_interface AS nodes_raid_interface, nodes.rescue_interface AS nodes_rescue_interface, nodes.storage_interface AS nodes_storage_interface, nodes.power_interface AS nodes_power_interface, nodes.vendor_interface AS nodes_vendor_interface \nFROM nodes ORDER BY nodes.id ASC\n LIMIT ? OFFSET ?) AS anon_1 LEFT OUTER JOIN node_traits AS node_traits_1 ON node_traits_1.node_id = anon_1.nodes_id LEFT OUTER JOIN node_tags AS node_tags_1 ON node_tags_1.node_id = anon_1.nodes_id ORDER BY anon_1.nodes_id ASC'] [parameters: (1000, 0)] (Background on this error at: http://sqlalche.me/e/e3q8)". Detail:
Traceback (most recent call last):

  File "/usr/lib/python2.7/site-packages/wsmeext/pecan.py", line 85, in callfunction
    result = f(self, *args, **kwargs)

  File "/usr/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 1872, in get_all
    **extra_args)

  File "/usr/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 1684, in _get_nodes_collection
    filters=filters)

  File "/usr/lib/python2.7/site-packages/ironic/objects/node.py", line 313, in list
    sort_dir=sort_dir)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 400, in get_node_list
    sort_key, sort_dir, query)

  File "/usr/lib/python2.7/site-packages/ironic/db/sqlalchemy/api.py", line 229, in _paginate_query
    return query.all()

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 2925, in all
    return list(self)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3081, in __iter__
    return self._execute_and_instances(context)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/query.py", line 3106, in _execute_and_instances
    result = conn.execute(querycontext.statement, self._params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 980, in execute
    return meth(self, multiparams, params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
    distilled_params,

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context
    e, statement, parameters, cursor, context

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1456, in _handle_dbapi_exception
    util.raise_from_cause(newraise, exc_info)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context

  File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)

allow defining rook YAML files

In 10_deploy_rook.sh there is no way to provision rook with other YAML files. This is problematic for CI systems, where specific options may be needed to verify particular behavior.

Although it is helpful to have some sort of "vanilla" setup, there should also be a way to define a path to these YAML files so that rook can be configured differently.
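
One possible shape for this (a sketch; ROOK_YAML_DIR is a hypothetical variable, not something dev-scripts defines today) would be an overridable path that 10_deploy_rook.sh falls back from:

# hypothetical override: use custom manifests if ROOK_YAML_DIR is set, else the in-tree examples
ROOK_YAML_DIR=${ROOK_YAML_DIR:-"$ROOKPATH/cluster/examples/kubernetes/ceph"}
oc create -f "$ROOK_YAML_DIR/common.yaml"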

infinite loop waiting for etcd SRV records

If there is a networking hiccup with the setup, this loop could cause the scripts to wait forever[1].

+ host -t SRV _etcd-server-ssl._tcp.ostest.test.metalkube.org api.ostest.test.metalkube.org
+ echo -n .
.+ sleep 1
+ host -t SRV _etcd-server-ssl._tcp.ostest.test.metalkube.org api.ostest.test.metalkube.org
+ echo -n .
.+ sleep 1
+ host -t SRV _etcd-server-ssl._tcp.ostest.test.metalkube.org api.ostest.test.metalkube.org
+ echo -n .
.+ sleep 1

[1] https://github.com/openshift-metalkube/dev-scripts/blob/master/utils.sh#L162
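
A bounded variant of that loop could look something like this (a sketch; the retry count is arbitrary and not an existing dev-scripts knob):

# give up after ~10 minutes instead of spinning forever
retries=600
until host -t SRV _etcd-server-ssl._tcp.ostest.test.metalkube.org api.ostest.test.metalkube.org > /dev/null; do
    echo -n .
    sleep 1
    retries=$((retries - 1))
    if [ "$retries" -le 0 ]; then
        echo "Timed out waiting for the etcd SRV records" >&2
        exit 1
    fi
done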

plugin.terraform-provider-ironic: 2019/03/27 18:19:30 [ERR] plugin: plugin server: accept unix /tmp/plugin204986381: use of closed network connection

Installation failed:

$ make
[...]
ironic_node_v1.openshift-master-2: Still creating... (6m20s elapsed)
ironic_node_v1.openshift-master-0: Still creating... (6m20s elapsed)
2019-03-27T18:19:29.662-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node current state is 'active'
2019-03-27T18:19:29.662-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node 8f0943d0-839c-45da-a548-c57e69501fdf is 'active', we are done.
ironic_node_v1.openshift-master-2: Creation complete after 6m25s (ID: 8f0943d0-839c-45da-a548-c57e69501fdf)
2019-03-27T18:19:29.878-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node current state is 'active'
2019-03-27T18:19:29.878-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:29 [DEBUG] Node b8e484a7-bee9-49f3-a203-a3f06a474b74 is 'active', we are done.
ironic_node_v1.openshift-master-0: Creation complete after 6m25s (ID: b8e484a7-bee9-49f3-a203-a3f06a474b74)
2019/03/27 18:19:30 [DEBUG] plugin: waiting for all plugin processes to complete...

Error: Error applying plan:

1 error(s) occurred:

* ironic_node_v1.openshift-master-1: 1 error(s) occurred:

* ironic_node_v1.openshift-master-1: Internal Server Error

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.


2019-03-27T18:19:30.101-0400 [DEBUG] plugin.terraform-provider-ironic: 2019/03/27 18:19:30 [ERR] plugin: plugin server: accept unix /tmp/plugin204986381: use of closed network connection
2019-03-27T18:19:30.102-0400 [DEBUG] plugin: plugin process exited: path=/home/kni/.terraform.d/plugins/terraform-provider-ironic
make: *** [ocp_run] Error 1

Some random commands as I basically do not know what I'm doing :-)

[kni@intel-canoepass-09 dev-scripts]$ oc --config /home/kni/dev-scripts/ocp/auth/kubeconfig get nodes
error: the server doesn't have a resource type "nodes"
[kni@intel-canoepass-09 dev-scripts]$ export OS_TOKEN=fake-token
[kni@intel-canoepass-09 dev-scripts]$ export OS_URL=http://localhost:6385/
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node list
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name               | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
| c9c0a601-86a7-41ad-bf84-96c429b93fd9 | openshift-master-1 | None          | power on    | active             | False       |
| b8e484a7-bee9-49f3-a203-a3f06a474b74 | openshift-master-0 | None          | power on    | active             | False       |
| 8f0943d0-839c-45da-a548-c57e69501fdf | openshift-master-2 | None          | power on    | active             | False       |
+--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node show c9c0a601-86a7-41ad-bf84-96c429b93fd9
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                  | Value                                                                                                                                                                                                                                                                           |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| allocation_uuid        | None                                                                                                                                                                                                                                                                            |
| automated_clean        | None                                                                                                                                                                                                                                                                            |
| bios_interface         | no-bios                                                                                                                                                                                                                                                                         |
| boot_interface         | ipxe                                                                                                                                                                                                                                                                            |
| chassis_uuid           | None                                                                                                                                                                                                                                                                            |
| clean_step             | {}                                                                                                                                                                                                                                                                              |
| conductor              | intel-canoepass-09.khw1.lab.eng.bos.redhat.com                                                                                                                                                                                                                                  |
| conductor_group        |                                                                                                                                                                                                                                                                                 |
| console_enabled        | False                                                                                                                                                                                                                                                                           |
| console_interface      | no-console                                                                                                                                                                                                                                                                      |
| created_at             | 2019-03-27T22:13:04.839047+00:00                                                                                                                                                                                                                                                |
| deploy_interface       | direct                                                                                                                                                                                                                                                                          |
| deploy_step            | {}                                                                                                                                                                                                                                                                              |
| description            | None                                                                                                                                                                                                                                                                            |
| driver                 | ipmi                                                                                                                                                                                                                                                                            |
| driver_info            | {u'ipmi_port': u'6231', u'ipmi_username': u'admin', u'deploy_kernel': u'http://172.22.0.1/images/ironic-python-agent.kernel', u'ipmi_address': u'192.168.111.1', u'deploy_ramdisk': u'http://172.22.0.1/images/ironic-python-agent.initramfs', u'ipmi_password': u'******'}     |
| driver_internal_info   | {u'deploy_boot_mode': u'bios', u'is_whole_disk_image': True, u'root_uuid_or_disk_id': u'0xbaef8236', u'agent_url': u'http://172.22.0.78:9999', u'deploy_steps': None, u'agent_version': u'3.7.0.dev3', u'agent_last_heartbeat': u'2019-03-27T22:18:08.408711'}                  |
| extra                  | {}                                                                                                                                                                                                                                                                              |
| fault                  | None                                                                                                                                                                                                                                                                            |
| inspect_interface      | inspector                                                                                                                                                                                                                                                                       |
| inspection_finished_at | None                                                                                                                                                                                                                                                                            |
| inspection_started_at  | None                                                                                                                                                                                                                                                                            |
| instance_info          | {u'root_gb': u'25', u'image_source': u'http://172.22.0.1/images/redhat-coreos-maipo-latest.qcow2', u'image_type': u'whole-disk-image', u'root_device': u'/dev/vda', u'image_checksum': u'308f00a5cb04c5aaf0f15073dabe335f', u'image_url': u'******', u'configdrive': u'******'} |
| instance_uuid          | None                                                                                                                                                                                                                                                                            |
| last_error             | None                                                                                                                                                                                                                                                                            |
| maintenance            | False                                                                                                                                                                                                                                                                           |
| maintenance_reason     | None                                                                                                                                                                                                                                                                            |
| management_interface   | ipmitool                                                                                                                                                                                                                                                                        |
| name                   | openshift-master-1                                                                                                                                                                                                                                                              |
| network_interface      | noop                                                                                                                                                                                                                                                                            |
| owner                  | None                                                                                                                                                                                                                                                                            |
| power_interface        | ipmitool                                                                                                                                                                                                                                                                        |
| power_state            | power on                                                                                                                                                                                                                                                                        |
| properties             | {}                                                                                                                                                                                                                                                                              |
| protected              | False                                                                                                                                                                                                                                                                           |
| protected_reason       | None                                                                                                                                                                                                                                                                            |
| provision_state        | active                                                                                                                                                                                                                                                                          |
| provision_updated_at   | 2019-03-27T22:19:20.972264+00:00                                                                                                                                                                                                                                                |
| raid_config            | {}                                                                                                                                                                                                                                                                              |
| raid_interface         | no-raid                                                                                                                                                                                                                                                                         |
| rescue_interface       | no-rescue                                                                                                                                                                                                                                                                       |
| reservation            | None                                                                                                                                                                                                                                                                            |
| resource_class         | None                                                                                                                                                                                                                                                                            |
| storage_interface      | noop                                                                                                                                                                                                                                                                            |
| target_power_state     | None                                                                                                                                                                                                                                                                            |
| target_provision_state | None                                                                                                                                                                                                                                                                            |
| target_raid_config     | {}                                                                                                                                                                                                                                                                              |
| traits                 | []                                                                                                                                                                                                                                                                              |
| updated_at             | 2019-03-27T22:19:21.011055+00:00                                                                                                                                                                                                                                                |
| uuid                   | c9c0a601-86a7-41ad-bf84-96c429b93fd9                                                                                                                                                                                                                                            |
| vendor_interface       | ipmitool                                                                                                                                                                                                                                                                        |
+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[kni@intel-canoepass-09 dev-scripts]$ openstack baremetal node show c9c0a601-86a7-41ad-bf84-96c429b93fd9 -f value -c last_error
None

ironic_node_v1.openshift-master-0: cannot go from state 'deploy failed' to state 'manageable'"

On commit 0fb6eb9, using nested virt, this happened while running the first step, "make".

level=debug msg="module.masters.ironic_node_v1.openshift-master-0: Still creating... (31m31s elapsed)"
level=debug msg="module.masters.ironic_node_v1.openshift-master-0: Still creating... (31m41s elapsed)"
level=error
level=error msg="Error: Error applying plan:"
level=error
level=error msg="1 error occurred:"
level=error msg="\t* module.masters.ironic_node_v1.openshift-master-0: 1 error occurred:"
level=error msg="\t* ironic_node_v1.openshift-master-0: cannot go from state 'deploy failed' to state 'manageable'"
level=error
level=error
level=error
level=error
level=error
level=error msg="Terraform does not automatically rollback in the face of errors."
level=error msg="Instead, your Terraform state file has been partially updated with"
level=error msg="any resources that successfully completed. Please address the error"
level=error msg="above and apply again to incrementally change your infrastructure."
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
make: *** [ocp_run] Error 1
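
In case it helps anyone hitting the same state, one way to reset such a node before re-running the installer (a sketch, reusing the fake-token ironic access shown elsewhere in these notes) is to undeploy it so that it leaves 'deploy failed':

export OS_TOKEN=fake-token
export OS_URL=http://localhost:6385/
# move the node out of 'deploy failed'; it should clean and return to 'available'
openstack baremetal node undeploy openshift-master-0
openstack baremetal node show openshift-master-0 -f value -c provision_state -c last_error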

410.8.20190412.1 rhcos release doesn't include the openstack image

curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/builds.json | jq -r ".builds[0]"
410.8.20190412.1


curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/410.8.20190412.1/meta.json | jq -r .images
{
  "qemu": {
    "path": "rhcos-410.8.20190412.1-qemu.qcow2",
    "sha256": "b2b80605be197d988816df347cfe76c652db556fc47830df7c74c66e2ac97d28",
    "size": "765151729",
    "uncompressed-sha256": "1a9394ed8383cebb41228e49e4b3b009d7e6a6cc0045061ce0549d3ed860ccc0",
    "uncompressed-size": "2165243904"
  }
}

curl -s https://releases-rhcos.svc.ci.openshift.org/storage/releases/ootpa/410.8.20190412.1/meta.json | jq -r .images.openstack
null

Affected code: https://github.com/openshift-metalkube/dev-scripts/blob/master/common.sh#L54

02_configure_host.sh fails on ovs?

I set the following variables, and 02_configure_host.sh then fails during the OVS setup task:

WORKING_DIR=${WORKING_DIR:-"/opt/dev-scripts"}
NODES_FILE=${NODES_FILE:-"/home/test/rook.json"}
NODES_PLATFORM=${NODES_PLATFORM:-"baremetal"}
MASTER_NODES_FILE=${MASTER_NODES_FILE:-"/home/test/rook.json"}
TASK [include_role : parts/ovs] *********************************************************************************************************************************************************************************************************
task path: /home/test/dev-scripts/tripleo-quickstart-config/roles/environment/setup/tasks/main.yml:6
fatal: [localhost]: FAILED! => {
    "msg": "The conditional check 'networks|selectattr('virtualport_type', 'defined')|map(attribute='name')|list|length > 0' failed. The error was: error while evaluating conditional (networks|selectattr('virtualport_type', 'defined')|map(attribute='name')|list|length > 0): 'networks' is undefined\n\nThe error appears to have been in '/home/test/dev-scripts/tripleo-quickstart-config/roles/environment/setup/tasks/main.yml': line 6, column 5, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n  # Install OVS dependencies\n  - name: Install OVS dependencies\n    ^ here\n"
}
        to retry, use: --limit @/home/test/dev-scripts/tripleo-quickstart-config/metalkube-setup-playbook.retry

PLAY RECAP ******************************************************************************************************************************************************************************************************************************
localhost                  : ok=12   changed=2    unreachable=0    failed=1   


RFE: Add a NODES_FILE example

The README says that for baremetal testing a NODES_FILE variable pointing to a JSON file with the nodes' details should be provided.
It would be nice to have a sample json file with the required content.

Kubelet on bootstrap node fails to start, masters error out with: Still waiting for the Kubernetes API: Get https://api.ostest.test.metalkube.org:6443/version?timeout=32s: dial tcp 192.168.111.5:6443: connect: connection refused

Running make failed because of this:

level=debug msg="Still waiting for the Kubernetes API: Get https://api.ostest.test.metalkube.org:6443/version?timeout=32s: dial tcp 192.168.111.5:6443: connect: connection refused"
level=debug msg="Still waiting for the Kubernetes API: Get https://api.ostest.test.metalkube.org:6443/version?timeout=32s: dial tcp 192.168.111.5:6443: connect: connection refused"
level=debug msg="Still waiting for the Kubernetes API: Get https://api.ostest.test.metalkube.org:6443/version?timeout=32s: dial tcp 192.168.111.5:6443: connect: connection refused"

I have ssh-ed to the bootstrap node and found this in the logs:

Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --runtime-request-timeout has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --pod-manifest-path has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --allow-privileged has been deprecated, will be removed in a future version
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --minimum-container-ttl-duration has been deprecated, Use --eviction-hard or --eviction-soft instead. Will be removed in a future version.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --cluster-domain has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: Flag --serialize-image-pulls has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.169640    3349 flags.go:33] FLAG: --address="0.0.0.0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.169967    3349 flags.go:33] FLAG: --allow-privileged="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.169978    3349 flags.go:33] FLAG: --allowed-unsafe-sysctls="[]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.169996    3349 flags.go:33] FLAG: --alsologtostderr="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170006    3349 flags.go:33] FLAG: --anonymous-auth="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170014    3349 flags.go:33] FLAG: --application-metrics-count-limit="100"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170035    3349 flags.go:33] FLAG: --authentication-token-webhook="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170044    3349 flags.go:33] FLAG: --authentication-token-webhook-cache-ttl="2m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170055    3349 flags.go:33] FLAG: --authorization-mode="AlwaysAllow"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170065    3349 flags.go:33] FLAG: --authorization-webhook-cache-authorized-ttl="5m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170073    3349 flags.go:33] FLAG: --authorization-webhook-cache-unauthorized-ttl="30s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170081    3349 flags.go:33] FLAG: --azure-container-registry-config=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170089    3349 flags.go:33] FLAG: --boot-id-file="/proc/sys/kernel/random/boot_id"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170128    3349 flags.go:33] FLAG: --bootstrap-checkpoint-path=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170137    3349 flags.go:33] FLAG: --bootstrap-kubeconfig=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170144    3349 flags.go:33] FLAG: --cert-dir="/var/lib/kubelet/pki"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170153    3349 flags.go:33] FLAG: --cgroup-driver="systemd"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170160    3349 flags.go:33] FLAG: --cgroup-root=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170168    3349 flags.go:33] FLAG: --cgroups-per-qos="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170176    3349 flags.go:33] FLAG: --chaos-chance="0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170187    3349 flags.go:33] FLAG: --client-ca-file=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170195    3349 flags.go:33] FLAG: --cloud-config=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170202    3349 flags.go:33] FLAG: --cloud-provider=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170210    3349 flags.go:33] FLAG: --cluster-dns="[]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170225    3349 flags.go:33] FLAG: --cluster-domain="cluster.local"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170233    3349 flags.go:33] FLAG: --cni-bin-dir="/opt/cni/bin"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170241    3349 flags.go:33] FLAG: --cni-conf-dir="/etc/cni/net.d"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170249    3349 flags.go:33] FLAG: --config=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170257    3349 flags.go:33] FLAG: --container-hints="/etc/cadvisor/container_hints.json"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170268    3349 flags.go:33] FLAG: --container-log-max-files="5"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170278    3349 flags.go:33] FLAG: --container-log-max-size="10Mi"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170286    3349 flags.go:33] FLAG: --container-runtime="remote"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170293    3349 flags.go:33] FLAG: --container-runtime-endpoint="/var/run/crio/crio.sock"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170315    3349 flags.go:33] FLAG: --containerd="unix:///var/run/containerd.sock"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170324    3349 flags.go:33] FLAG: --containerized="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170332    3349 flags.go:33] FLAG: --contention-profiling="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170340    3349 flags.go:33] FLAG: --cpu-cfs-quota="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170347    3349 flags.go:33] FLAG: --cpu-cfs-quota-period="100ms"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170355    3349 flags.go:33] FLAG: --cpu-manager-policy="none"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170363    3349 flags.go:33] FLAG: --cpu-manager-reconcile-period="10s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170371    3349 flags.go:33] FLAG: --docker="unix:///var/run/docker.sock"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170379    3349 flags.go:33] FLAG: --docker-endpoint="unix:///var/run/docker.sock"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170387    3349 flags.go:33] FLAG: --docker-env-metadata-whitelist=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170409    3349 flags.go:33] FLAG: --docker-only="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170417    3349 flags.go:33] FLAG: --docker-root="/var/lib/docker"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170426    3349 flags.go:33] FLAG: --docker-tls="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170434    3349 flags.go:33] FLAG: --docker-tls-ca="ca.pem"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170441    3349 flags.go:33] FLAG: --docker-tls-cert="cert.pem"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170449    3349 flags.go:33] FLAG: --docker-tls-key="key.pem"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170456    3349 flags.go:33] FLAG: --dynamic-config-dir=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170466    3349 flags.go:33] FLAG: --enable-controller-attach-detach="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170473    3349 flags.go:33] FLAG: --enable-debugging-handlers="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170481    3349 flags.go:33] FLAG: --enable-load-reader="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170488    3349 flags.go:33] FLAG: --enable-server="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170496    3349 flags.go:33] FLAG: --enforce-node-allocatable="[pods]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170512    3349 flags.go:33] FLAG: --event-burst="10"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170520    3349 flags.go:33] FLAG: --event-qps="5"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170527    3349 flags.go:33] FLAG: --event-storage-age-limit="default=0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170536    3349 flags.go:33] FLAG: --event-storage-event-limit="default=0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170543    3349 flags.go:33] FLAG: --eviction-hard="imagefs.available<15%,memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170566    3349 flags.go:33] FLAG: --eviction-max-pod-grace-period="0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170576    3349 flags.go:33] FLAG: --eviction-minimum-reclaim=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170586    3349 flags.go:33] FLAG: --eviction-pressure-transition-period="5m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170594    3349 flags.go:33] FLAG: --eviction-soft=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170601    3349 flags.go:33] FLAG: --eviction-soft-grace-period=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170609    3349 flags.go:33] FLAG: --exit-on-lock-contention="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170616    3349 flags.go:33] FLAG: --experimental-allocatable-ignore-eviction="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170623    3349 flags.go:33] FLAG: --experimental-bootstrap-kubeconfig=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170630    3349 flags.go:33] FLAG: --experimental-check-node-capabilities-before-mount="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170637    3349 flags.go:33] FLAG: --experimental-dockershim="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170644    3349 flags.go:33] FLAG: --experimental-dockershim-root-directory="/var/lib/dockershim"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170652    3349 flags.go:33] FLAG: --experimental-fail-swap-on="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170659    3349 flags.go:33] FLAG: --experimental-kernel-memcg-notification="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170666    3349 flags.go:33] FLAG: --experimental-mounter-path=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170673    3349 flags.go:33] FLAG: --fail-swap-on="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170680    3349 flags.go:33] FLAG: --feature-gates=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170690    3349 flags.go:33] FLAG: --file-check-frequency="20s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170700    3349 flags.go:33] FLAG: --global-housekeeping-interval="1m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170708    3349 flags.go:33] FLAG: --google-json-key=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170715    3349 flags.go:33] FLAG: --hairpin-mode="promiscuous-bridge"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170723    3349 flags.go:33] FLAG: --healthz-bind-address="127.0.0.1"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170730    3349 flags.go:33] FLAG: --healthz-port="10248"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170738    3349 flags.go:33] FLAG: --help="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170745    3349 flags.go:33] FLAG: --host-ipc-sources="[*]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170758    3349 flags.go:33] FLAG: --host-network-sources="[*]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170771    3349 flags.go:33] FLAG: --host-pid-sources="[*]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170782    3349 flags.go:33] FLAG: --hostname-override=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170790    3349 flags.go:33] FLAG: --housekeeping-interval="10s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170798    3349 flags.go:33] FLAG: --http-check-frequency="20s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170805    3349 flags.go:33] FLAG: --image-gc-high-threshold="85"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170812    3349 flags.go:33] FLAG: --image-gc-low-threshold="80"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170819    3349 flags.go:33] FLAG: --image-pull-progress-deadline="1m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170827    3349 flags.go:33] FLAG: --image-service-endpoint=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170836    3349 flags.go:33] FLAG: --iptables-drop-bit="15"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170844    3349 flags.go:33] FLAG: --iptables-masquerade-bit="14"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170851    3349 flags.go:33] FLAG: --keep-terminated-pod-volumes="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170858    3349 flags.go:33] FLAG: --kube-api-burst="10"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170865    3349 flags.go:33] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170873    3349 flags.go:33] FLAG: --kube-api-qps="5"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170880    3349 flags.go:33] FLAG: --kube-reserved=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170888    3349 flags.go:33] FLAG: --kube-reserved-cgroup=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170895    3349 flags.go:33] FLAG: --kubeconfig=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170902    3349 flags.go:33] FLAG: --kubelet-cgroups=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170910    3349 flags.go:33] FLAG: --lock-file=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170917    3349 flags.go:33] FLAG: --log-backtrace-at=":0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170925    3349 flags.go:33] FLAG: --log-cadvisor-usage="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170933    3349 flags.go:33] FLAG: --log-dir=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170939    3349 flags.go:33] FLAG: --log-flush-frequency="5s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170947    3349 flags.go:33] FLAG: --logtostderr="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170956    3349 flags.go:33] FLAG: --machine-id-file="/etc/machine-id,/var/lib/dbus/machine-id"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170965    3349 flags.go:33] FLAG: --make-iptables-util-chains="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170972    3349 flags.go:33] FLAG: --manifest-url=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170979    3349 flags.go:33] FLAG: --manifest-url-header=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170990    3349 flags.go:33] FLAG: --master-service-namespace="default"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.170998    3349 flags.go:33] FLAG: --max-open-files="1000000"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171008    3349 flags.go:33] FLAG: --max-pods="110"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171015    3349 flags.go:33] FLAG: --maximum-dead-containers="-1"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171023    3349 flags.go:33] FLAG: --maximum-dead-containers-per-container="1"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171030    3349 flags.go:33] FLAG: --minimum-container-ttl-duration="6m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171037    3349 flags.go:33] FLAG: --minimum-image-ttl-duration="2m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171045    3349 flags.go:33] FLAG: --network-plugin=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171051    3349 flags.go:33] FLAG: --network-plugin-mtu="0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171058    3349 flags.go:33] FLAG: --node-ip=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171065    3349 flags.go:33] FLAG: --node-labels=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171075    3349 flags.go:33] FLAG: --node-status-max-images="50"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171084    3349 flags.go:33] FLAG: --node-status-update-frequency="10s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171108    3349 flags.go:33] FLAG: --non-masquerade-cidr="10.0.0.0/8"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171119    3349 flags.go:33] FLAG: --oom-score-adj="-999"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171127    3349 flags.go:33] FLAG: --pod-cidr=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171133    3349 flags.go:33] FLAG: --pod-infra-container-image="k8s.gcr.io/pause:3.1"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171141    3349 flags.go:33] FLAG: --pod-manifest-path="/etc/kubernetes/manifests"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171149    3349 flags.go:33] FLAG: --pod-max-pids="-1"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171157    3349 flags.go:33] FLAG: --pods-per-core="0"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171164    3349 flags.go:33] FLAG: --port="10250"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171172    3349 flags.go:33] FLAG: --protect-kernel-defaults="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171180    3349 flags.go:33] FLAG: --provider-id=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171188    3349 flags.go:33] FLAG: --qos-reserved=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171195    3349 flags.go:33] FLAG: --read-only-port="10255"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171203    3349 flags.go:33] FLAG: --really-crash-for-testing="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171210    3349 flags.go:33] FLAG: --redirect-container-streaming="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171217    3349 flags.go:33] FLAG: --register-node="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171225    3349 flags.go:33] FLAG: --register-schedulable="true"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171233    3349 flags.go:33] FLAG: --register-with-taints=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171242    3349 flags.go:33] FLAG: --registry-burst="10"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171249    3349 flags.go:33] FLAG: --registry-qps="5"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171256    3349 flags.go:33] FLAG: --resolv-conf="/etc/resolv.conf"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171264    3349 flags.go:33] FLAG: --root-dir="/var/lib/kubelet"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171271    3349 flags.go:33] FLAG: --rotate-certificates="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171278    3349 flags.go:33] FLAG: --rotate-server-certificates="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171285    3349 flags.go:33] FLAG: --runonce="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171292    3349 flags.go:33] FLAG: --runtime-cgroups=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171314    3349 flags.go:33] FLAG: --runtime-request-timeout="10m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171322    3349 flags.go:33] FLAG: --seccomp-profile-root="/var/lib/kubelet/seccomp"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171330    3349 flags.go:33] FLAG: --serialize-image-pulls="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171337    3349 flags.go:33] FLAG: --stderrthreshold="2"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171345    3349 flags.go:33] FLAG: --storage-driver-buffer-duration="1m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171352    3349 flags.go:33] FLAG: --storage-driver-db="cadvisor"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171362    3349 flags.go:33] FLAG: --storage-driver-host="localhost:8086"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171369    3349 flags.go:33] FLAG: --storage-driver-password="root"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171376    3349 flags.go:33] FLAG: --storage-driver-secure="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171384    3349 flags.go:33] FLAG: --storage-driver-table="stats"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171391    3349 flags.go:33] FLAG: --storage-driver-user="root"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171398    3349 flags.go:33] FLAG: --streaming-connection-idle-timeout="4h0m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171405    3349 flags.go:33] FLAG: --sync-frequency="1m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171413    3349 flags.go:33] FLAG: --system-cgroups=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171420    3349 flags.go:33] FLAG: --system-reserved=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171427    3349 flags.go:33] FLAG: --system-reserved-cgroup=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171436    3349 flags.go:33] FLAG: --tls-cert-file=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171443    3349 flags.go:33] FLAG: --tls-cipher-suites="[]"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171460    3349 flags.go:33] FLAG: --tls-min-version=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171467    3349 flags.go:33] FLAG: --tls-private-key-file=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171474    3349 flags.go:33] FLAG: --v="2"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171482    3349 flags.go:33] FLAG: --version="false"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171495    3349 flags.go:33] FLAG: --vmodule=""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171502    3349 flags.go:33] FLAG: --volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171510    3349 flags.go:33] FLAG: --volume-stats-agg-period="1m0s"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171551    3349 feature_gate.go:206] feature gates: &{map[]}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.171613    3349 feature_gate.go:206] feature gates: &{map[]}
Apr 04 09:09:32 localhost.localdomain systemd[1]: Started Kubernetes systemd probe.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.684901    3349 mount_linux.go:179] Detected OS with systemd
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.685110    3349 server.go:415] Version: v1.12.4+2a194a0f02
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.685195    3349 feature_gate.go:206] feature gates: &{map[]}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.685276    3349 feature_gate.go:206] feature gates: &{map[]}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.685472    3349 plugins.go:99] No cloud provider specified.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.685490    3349 server.go:531] No cloud provider specified: "" from the config file: ""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.685632    3349 server.go:554] standalone mode, no API client
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.687831    3349 manager.go:155] cAdvisor running in container: "/sys/fs/cgroup/cpu,cpuacct/system.slice/kubelet.service"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.692798    3349 fs.go:142] Filesystem UUIDs: map[125df5c3-f244-4ce5-ae4e-562e2ccc3b3a:/dev/vda2 dd7e4c4e-77dc-44de-8559-e1c8f77541e8:/dev/vda1]
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.692819    3349 fs.go:143] Filesystem partitions: map[tmpfs:{mountpoint:/dev/shm major:0 minor:19 fsType:tmpfs blockSize:0} /dev/vda2:{mountpoint:/var major:252 minor:2 fsType:xfs blockSize:0} /dev/vda1:{mountpoint:/boot major:252 minor:1 fsType:xfs blockSize:0}]
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.698649    3349 manager.go:229] Machine: {NumCores:4 CpuFrequency:1999999 MemoryCapacity:4142034944 HugePages:[{PageSize:1048576 NumPages:0} {PageSize:2048 NumPages:0}] MachineID:c49cdfb46f024e0a99cf775dadd55056 SystemUUID:C49CDFB4-6F02-4E0A-99CF-775DADD55056 BootID:bcf913e9-f83f-48f2-a067-4892b26be5b8 Filesystems:[{Device:tmpfs DeviceMajor:0 DeviceMinor:19 Capacity:2071015424 Type:vfs Inodes:505619 HasI
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.699868    3349 manager.go:235] Version: {KernelVersion:3.10.0-957.5.1.el7.x86_64 ContainerOsVersion:Red Hat Enterprise Linux CoreOS 400.7.20190306.0 DockerVersion:Unknown DockerAPIVersion:Unknown CadvisorVersion: CadvisorRevision:}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.699966    3349 server.go:472] No api server defined - no events will be sent to API server.
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.699980    3349 server.go:634] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.700431    3349 container_manager_linux.go:247] container manager verified user specified cgroup-root exists: []
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.700452    3349 container_manager_linux.go:252] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[pods:{
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.700574    3349 container_manager_linux.go:271] Creating device plugin manager: true
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.700595    3349 manager.go:108] Creating Device Plugin manager at /var/lib/kubelet/device-plugins/kubelet.sock
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.700783    3349 state_mem.go:36] [cpumanager] initializing new in-memory state store
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.702985    3349 server.go:989] Using root directory: /var/lib/kubelet
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.703480    3349 kubelet.go:279] Adding pod path: /etc/kubernetes/manifests
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.703934    3349 file.go:68] Watching path "/etc/kubernetes/manifests"
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.710117    3349 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.711264    3349 resolver_conn_wrapper.go:64] dialing to target with scheme: ""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.711305    3349 resolver_conn_wrapper.go:70] could not get resolver for scheme: ""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.711328    3349 util_unix.go:75] Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.711364    3349 resolver_conn_wrapper.go:64] dialing to target with scheme: ""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: I0404 09:09:32.711376    3349 resolver_conn_wrapper.go:70] could not get resolver for scheme: ""
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.712239    3349 clientconn.go:944] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"; Reconnecting to {/var/run/crio/crio.sock 0  <nil>}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: W0404 09:09:32.712568    3349 clientconn.go:944] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory"; Reconnecting to {/var/run/crio/crio.sock 0  <nil>}
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: E0404 09:09:32.712714    3349 remote_runtime.go:72] Version from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: E0404 09:09:32.712787    3349 kuberuntime_manager.go:183] Get runtime version failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Apr 04 09:09:32 localhost.localdomain hyperkube[3349]: F0404 09:09:32.712806    3349 server.go:262] failed to run Kubelet: failed to create kubelet: rpc error: code = Unavailable desc = grpc: the connection is unavailable
Apr 04 09:09:32 localhost.localdomain systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
Apr 04 09:09:32 localhost.localdomain systemd[1]: Failed to start Kubernetes Kubelet.
Apr 04 09:09:32 localhost.localdomain systemd[1]: Unit kubelet.service entered failed state.
Apr 04 09:09:32 localhost.localdomain systemd[1]: kubelet.service failed.
Apr 04 09:09:32 localhost.localdomain systemd[1]: Starting Preprocess NFS configuration...
Apr 04 09:09:32 localhost.localdomain systemd[1]: Started Bootstrap a Kubernetes cluster.
Apr 04 09:09:32 localhost.localdomain systemd[1]: Started Bootstrap an OpenShift cluster.
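
The fatal F0404 line above shows the kubelet exiting because the CRI-O socket at /var/run/crio/crio.sock is not available yet. Two generic checks on the affected host (a diagnostic sketch, not part of the original log) show whether CRI-O is actually up and has created its socket:

# Sketch: confirm the CRI-O service state and its socket on the host from the log above.
sudo systemctl status crio
sudo ls -l /var/run/crio/crio.sock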

Required 'stack' user could cause issues if not checked

The scripts assume that the 'stack' user runs the deployment, but there is no check to verify or enforce that this is the user actually running the commands.
This can cause problems during the process, e.g.

cat: /home/stack/ironic_nodes.json: Permission denied

because the stack home directory is created with mode 700 by default.
A workaround exists for this particular failure, but others could arise.
In general, hardcoding the stack user is not ideal.
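
A minimal guard at the top of the scripts would at least fail fast with a clear message. The check below is only a sketch based on the assumption, taken from this report, that 'stack' really is required; it is not part of dev-scripts:

# Sketch: abort early unless running as the expected user.
if [ "$(whoami)" != "stack" ]; then
    echo "ERROR: these scripts expect to be run as the 'stack' user" >&2
    exit 1
fi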

Baremetal installation requires running 'patch_ep_host_etcd' during the install

It seems that in baremetal deployments it is necessary to run the following:

#!/usr/bin/env bash
set -x
set -e

source logging.sh
source utils.sh
source common.sh
source ocp_install_env.sh

# Update kube-system ep/host-etcd used by cluster-kube-apiserver-operator to
# generate storageConfig.urls
patch_ep_host_etcd "$CLUSTER_DOMAIN"

The script must be run while the cluster is still installing, during the 'Waiting up to 1h0m0s for the bootstrap-complete event...' step, i.e. when the installer log shows:

level=debug msg="Still waiting for the Kubernetes API: Get https://api.kni1.cloud.lab.eng.bos.redhat.com:6443/version?timeout=32s: dial tcp 10.19.138.14:6443: connect: connection refused"
level=info msg="API v1.13.4+938b976 up"
level=info msg="Waiting up to 1h0m0s for the bootstrap-complete event..."
level=debug msg="added kube-controller-manager.1597d0ec8231f284: localhost.localdomain_a4283191-6507-11e9-93be-2af39db2a27a became leader"
level=debug msg="added kube-scheduler.1597d0ecbce9cf55: localhost.localdomain_a426c179-6507-11e9-ada6-2af39db2a27a became leader"
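
With KUBECONFIG pointed at the new cluster, one way to confirm the patch took effect is to inspect the endpoints object the script updates (a quick check, not part of the original report):

# Sketch: the ep/host-etcd object in kube-system should now list the etcd member
# addresses that cluster-kube-apiserver-operator uses to generate storageConfig.urls.
oc get endpoints host-etcd -n kube-system -o yaml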
