
ocp4-vsphere-upi-automation's Introduction

OCP 4.x VMware vSphere and Hybrid UPI Automation

Note
This repository was derived from the original works of Mike Allmen and Vijay Chintalapati located in the official Red Hat GitHub repo.

The goal of this repo is to automate the deployment (and redeployment) of OpenShift v4 clusters. Using the same repo and with minor tweaks, it can be applied to any version of OpenShift higher than 4.4. As it stands right now, the repo works for several installation use cases:

  • vSphere cluster (3 node master only or traditional 5+ node clusters with worker nodes)

  • Hybrid cluster (vSphere masters and baremetal workers)

  • Static IPs for nodes (for when you lack an isolated network in which the helper can run a DHCP server)

  • DHCP/Dynamic IPs for nodes (requires reservations in DHCP server config)

  • With or without a cluster-wide proxy (HTTP and SSL/TLS with certs supported)

  • Restricted network (with or without DHCP)

  • No Cloud Provider (Useful for mixed clusters with both virtual and physical Nodes)

This repo is most ideal for Home Lab and Proof-of-Concept scenarios. Having said that, if prerequisites (below) can be met and if the vCenter service account can be locked down to access only certain resources and perform only certain actions, the same repo can then be used for DEV or higher environments. Refer to the Required vCenter account privileges section in the OCP documentation for more details on required permissions for a vCenter service account.

Quickstart

The quickstart section is a brief summary of everything you need to do to use this repo. There are more details later in this document.

  1. Setup helper node or ensure appropriate services (DNS/DHCP/LB/etc.) are available and properly referenced.

  2. Copy group_vars/all.yml into a new file under the clusters folder, named the same as your cluster with a .yaml extension, and change only the parts that need overriding

  3. Customize ansible.cfg and use/copy/modify the staging inventory file as required

  4. Run one of the several install options

Note
In your cluster vars file created in step 2 you only need to add override vars; group_vars/all.yml provides the defaults for anything not overridden in the cluster file.
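As a minimal illustration of such an override file (the cluster name, domain, and values below are hypothetical; the real schema is defined by group_vars/all.yml and the example cluster file in the repo):

```yaml
# clusters/mycluster.yaml (hypothetical): only the overrides;
# everything else falls back to group_vars/all.yml
config:
  base_domain: example.com
  cluster_name: mycluster
vcenter:
  ip: 192.168.1.10
  datacenter: MyDatacenter
  datastore: datastore1
```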

Prerequisites

  1. vSphere ESXi and vCenter 6.7 (or higher) installed

  2. A datacenter created with a vSphere host added to it, and a datastore that exists and has adequate capacity

  3. The playbooks assume you are running a helper node in the same network to provide all the necessary services (DHCP, DNS, HAProxy as LB). The MAC addresses for the machines should also match between the helper repo and this one. If not using the helper node, the minimum expectation is that the webserver and TFTP server (for PXE boot) run on the same external host, which is then treated as a helper node.

  4. The necessary services such as [DNS/DHCP/LB(Load Balancer)] must be up and running before this repo can be used

  5. Python 3+ and the following modules installed

    • openshift

  6. Ansible 2.11+

  7. Ansible Galaxy modules

    • kubernetes.core

    • community.general

    • community.crypto

    • community.vmware

    • ansible.posix
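The collection list above can be captured in a requirements file and installed in one shot (the filename is our choice, not mandated by the repo):

```yaml
# requirements.yml (hypothetical filename), installable with:
#   ansible-galaxy collection install -r requirements.yml
collections:
  - name: kubernetes.core
  - name: community.general
  - name: community.crypto
  - name: community.vmware
  - name: ansible.posix
```

The Python module is installed separately, e.g. with pip3 install openshift.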

Installation Steps

Variables

Pre-populated entries in group_vars/all.yml are used as default values; to customize further, create a cluster file under the clusters folder. Any updates described below refer to changes made in cluster files (See: example cluster file) unless otherwise specified.

Default Values
  • The helper_vm_ip and helper_vm_port are used to build the bootstrap_ignition_url and the no_proxy values if there is a proxy in the environment.

  • The config key and its child keys are for cluster settings

  • The nodes key is how you define the nodes; this array will get further split by type as set in each node object.

    • If you delete macaddr from the node dictionaries VMware will auto-generate your MAC addresses. If you are using DHCP, defining macaddr will allow you to reserve the specified IP addresses on your DHCP server to ensure the OpenShift nodes always get the same IP address.

  • The vm_mods key allows you to specify hotadd and core_per_socket options on the VMs. These settings are optional.

  • The static_ips key and its child keys are used for non-DHCP configurations.

  • The network_modifications key: network CIDRs default to sensible ranges. If a conflict exists (these address ranges are assigned elsewhere in the organization), you may select other non-conflicting CIDR ranges by changing "enabled: false" to "enabled: true" and entering the new ranges. The ranges shown in the repository are the ones used by default while "enabled: false" is left as is.

    • The machine network is the network on which the VMs are created. Be sure to specify the right machine network if you set enabled: true.

  • The proxy key and its child keys are for configuring cluster-wide proxy settings

  • The registry key and its child keys are for configuring offline or disconnected registries for clusters in restricted networks

  • The ntp key and its child keys are for configuring time servers to keep the cluster in sync

  • The f5 key and its child keys are for configuring the F5 Load Balancer (if applicable)
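As a sketch of the nodes key described above (names, MAC addresses, and IPs here are invented; the exact fields are defined by the example cluster file in the repo):

```yaml
# Hypothetical excerpt from a cluster file: each node carries a `type`
# that the playbooks use to split the array into masters/workers
nodes:
  - { name: "master0", type: "master", macaddr: "00:50:56:aa:bb:01", ipaddr: "192.168.1.21", cpu: 4, ram: 16384 }
  - { name: "master1", type: "master", macaddr: "00:50:56:aa:bb:02", ipaddr: "192.168.1.22", cpu: 4, ram: 16384 }
  - { name: "worker0", type: "worker", macaddr: "00:50:56:aa:bb:03", ipaddr: "192.168.1.31", cpu: 4, ram: 8192 }
```

Omitting macaddr lets VMware auto-generate the MAC addresses, as noted above.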

Set Ansible Inventory and Configuration

Now configure ansible.cfg and the staging inventory file based on your environment before picking one of the install options listed below.

Update the staging inventory file

Under the webservers.hosts entry, use one of two options below:

  1. localhost : if the ansible-playbook is being run on the same host as the webserver that will eventually host the bootstrap.ign file

  2. the IP address or FQDN of the machine that would run the webserver.
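For the second option, the webservers.hosts entry might look like this (the address is illustrative; a YAML inventory is assumed since the text refers to a webservers.hosts entry):

```yaml
# staging inventory (hypothetical values)
all:
  children:
    webservers:
      hosts:
        192.168.1.5:   # or localhost, if this host serves bootstrap.ign
```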

Update the ansible.cfg based on your needs

  • Running the playbook as a root user

    • If the localhost runs the webserver

    [defaults]
    host_key_checking = False
    • If the remote host runs the webserver

    [defaults]
    host_key_checking = False
    remote_user = root
    ask_pass = True
  • Running the playbook as a non-root user

    • If the localhost runs the webserver

    [defaults]
    host_key_checking = False

    [privilege_escalation]
    become_ask_pass = True
    • If the remote host runs the webserver

    [defaults]
    host_key_checking = False
    remote_user = root
    ask_pass = True

    [privilege_escalation]
    become_ask_pass = True

Run Installation Playbook

Static IPs
# Option 1: Static IPs + use of OVA template
ansible-playbook -i staging -e cluster=[cluster_name] static_ips_ova.yml

# Option 2: ISO + Static IPs
ansible-playbook -i staging -e cluster=[cluster_name] static_ips.yml
DHCP - Refer to the restricted.adoc file for more details
# Option 3: DHCP + use of OVA template
ansible-playbook -i staging -e cluster=[cluster_name] dhcp_ova.yml

# Option 4: DHCP + PXE boot
ansible-playbook -i staging -e cluster=[cluster_name] dhcp_pxe.yml
Restricted Networks - Refer to restricted.adoc file for more details
# Option 5: DHCP + use of OVA template in a Restricted Network
ansible-playbook -i staging -e cluster=[cluster_name] restricted_dhcp_ova.yml

# Option 6: Static IPs + use of ISO images in a Restricted Network
ansible-playbook -i staging -e cluster=[cluster_name] restricted_static_ips.yml


# Option 7: Static IPs + use of OVA template in a Restricted Network
# Note: OpenShift 4.6 or higher required
ansible-playbook -i staging -e cluster=[cluster_name] restricted_static_ips_ova.yml

Miscellaneous

  • If you are re-running the installation playbook, make sure to delete any existing VMs (in the ocp4 folder) listed below:

    • bootstrap

    • masters

    • workers

    • rhcos-vmware template (if not using the extra param as shown below)

  • If a template named rhcos-vmware already exists in vCenter and you want to reuse it, skipping the OVA download from Red Hat and the upload into vCenter, use the following extra param:

  -e skip_ova=true
  • If you would rather clean all folders (bin, downloads, install-dir) and re-download all the artifacts, append the following to the command you chose above:

  -e clean=true

Expected Outcome

  1. Necessary Linux packages installed for the installation. NOTE: support for Mac client to run this automation has been added but is not guaranteed to be complete

  2. SSH key-pair generated, with key ~/.ssh/ocp4 and public key ~/.ssh/ocp4.pub

  3. Necessary folders [bin, downloads, downloads/ISOs, install-dir] created

  4. OpenShift client, install and .ova binaries downloaded to the downloads folder

  5. Unzipped versions of the binaries installed in the bin folder

  6. In the install-dir folder:

    • append-bootstrap.ign file with the HTTP URL of the bootstrap.ign file

    • master.ign and worker.ign

    • base64-encoded files (append-bootstrap.64, master.64, worker.64) for (append-bootstrap.ign, master.ign, worker.ign) respectively. This step assumes you have base64 installed and in your $PATH

  7. The bootstrap.ign is copied over to the web server in the designated location

  8. A folder is created in the vCenter under the mentioned datacenter and the template is imported

  9. The template file is edited to carry certain default settings and runtime parameters common to all the VMs

  10. VMs (bootstrap, master0-2, worker0-2) are created in the designated folder and powered on
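The base64 encoding step is equivalent to something like the following (a sketch with a stand-in ignition payload; the playbook operates on the real .ign files in install-dir):

```shell
# Stand-in ignition content; the real master.ign is produced by openshift-install
echo '{"ignition":{"version":"3.1.0"}}' > master.ign
# -w0 disables line wrapping so the output is a single base64 line (GNU coreutils)
base64 -w0 master.ign > master.64
cat master.64
```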

Post Install (Hybrid clusters)

In the event that you need to add nodes to a hybrid cluster post-install, there is a new_worker_isos.yml playbook that can generate additional ISOs for new nodes. The requirements for this playbook are the same as for the other playbooks here, with one exception: you need to create a new {{ clusters_folder }}/{{ cluster }}_additional_nodes.yaml file. The format of that file is as follows:

Example 1. Additional node file
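A hedged sketch of what such a file could contain (key names follow the node-type arrays used elsewhere in this document; the values are invented):

```yaml
# clusters/ocp-example_additional_nodes.yaml (hypothetical)
master_vms: []          # empty array: nothing to (re)create for masters
worker_vms:
  - { name: "worker3", ipaddr: "192.168.1.34", cpu: 4, ram: 8192 }
```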

This file overrides the node type arrays found in the main cluster file with either an empty array [] or an array of new nodes. That way the playbook creates only the new ISOs, without re-creating any ISOs you already generated using the static_ips playbook and do not wish to re-create.

Note
If you wish to re-create any previously created ISOs then make sure that the node is represented in this file as well when calling this playbook.
Note
The role that we use for this playbook is a shared role and is used by the static_ips playbook as well. This means that we need the same variables defined in this playbook as we had defined in the static_ips playbook.
Example run
ansible-playbook -i staging -e "cluster=ocp-example" new_worker_isos.yml

Final Check:

If everything goes well, you should be able to validate the cluster using the included validateCluster.yml playbook.

$ ansible-playbook -i staging -e 'cluster=mycluster' -e "username=kubeadmin" -e "password=$(cat install-dir/auth/kubeadmin-password)" validateCluster.yml

You can also manually review with the following commands:

Manually review the cluster objects after install
oc --kubeconfig=$(pwd)/install-dir/auth/kubeconfig get nodes
oc --kubeconfig=$(pwd)/install-dir/auth/kubeconfig co
oc --kubeconfig=$(pwd)/install-dir/auth/kubeconfig get mcp
oc --kubeconfig=$(pwd)/install-dir/auth/kubeconfig get csr
Note
You can also export KUBECONFIG=$(pwd)/install-dir/auth/kubeconfig rather than using --kubeconfig= on oc commands. Always remember to unset KUBECONFIG when done, though, to avoid corrupting your system:admin kubeconfig; it is the only copy of this special user's kubeconfig.

In the works and wishlist (Call to arms)

Note
Contributions are welcome!

This repo is always in a state of development, and as we all know, OpenShift updates and changes can often break automation code. This means that we will from time to time need to update plays, tasks, and even vars to reflect these changes. Also, this is a derived work and not all of the code has been thoroughly tested (specifically, restricted and dhcp require updating). So please, do feel free to fork this code and contribute changes where needed!

Actively in development

  • Code cleanup/refactoring

Wishlist

  • More common roles and tasks and less duplication of code

  • One playbook to rule them all (using tags?)

ocp4-vsphere-upi-automation's People

Contributors

bmarlow, christianh814, cptmorgan-rh, ddreggors, dlbewley, gauthiersiri, jimbarlow, kllkss, mallmen, marcno, mj12301, therevoman, thoward-rh, vchintal


ocp4-vsphere-upi-automation's Issues

/var/www/html/install does not exist

It seems the web server install directory was never created.

  • Log Output
PLAY [webservers] *****************************************************************************

TASK [Copy the all the ignition files over to the webserver] **********************************
changed: [localhost] => (item=/root/ocp4-vsphere-upi-automation-1/install-dir/master.ign)
changed: [localhost] => (item=/root/ocp4-vsphere-upi-automation-1/install-dir/worker.ign)
changed: [localhost] => (item=/root/ocp4-vsphere-upi-automation-1/install-dir/bootstrap.ign)

TASK [webserver : Downloading OpenShift installer raw image] **********************************
fatal: [localhost]: FAILED! => {"changed": false, "checksum_dest": null, "checksum_src": "a2219eafe19be21b3556d58734f138a6991d3221", "dest": "/var/www/html/install/rhcos.raw.gz", "elapsed": 193, "msg": "Destination /var/www/html/install does not exist", "src": "/root/.ansible/tmp/ansible-tmp-1595844691.17-4275-200989382286213/tmpLJnDis", "url": "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/latest/latest/rhcos-4.5.2-x86_64-metal.x86_64.raw.gz"}

PLAY RECAP ************************************************************************************
localhost                  : ok=24   changed=9    unreachable=0    failed=1    skipped=11   rescued=0    ignored=0

static ips using iso is broken

As a result of changes in how the ISO image is laid out now in 4.6, the way the custom ISO images are built needs to be updated.

Copying ignition files needs root permission

The install should be able to run without root privileges on the control host/helper node. Copying the ignition files to /var/www/html/ignition requires become: true in order for this to run as a non-root user.
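A hedged sketch of the fix being requested (task name and paths follow the issue text; this is not the repo's actual task):

```yaml
- name: Copy ignition files to the webserver root
  become: true            # privilege escalation so a non-root user can write here
  ansible.builtin.copy:
    src: "{{ item }}"
    dest: /var/www/html/ignition/
    mode: "0644"
  loop:
    - install-dir/master.ign
    - install-dir/worker.ign
    - install-dir/bootstrap.ign
```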

correct location for install-config?

TASK [common : Generate the ignition manifests] *************************************************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["openshift-install", "create", "manifests", "--dir=/ocp/ocp4-vsphere-upi-automation/install-dir"], "delta": "0:00:00.106394", "end": "2020-04-09 14:31:55.273405", "msg": "non-zero return code", "rc": 1, "start": "2020-04-09 14:31:55.167011", "stderr": "level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: pullSecret: Invalid value: \"{\\\"auths\\\": \\\"...\\\"}\": json: cannot unmarshal string into Go struct field imagePullSecret.auths of type map[string]map[string]interface {}"", "stderr_lines": ["level=fatal msg="failed to fetch Master Machines: failed to load asset \"Install Config\": invalid \"install-config.yaml\" file: pullSecret: Invalid value: \"{\\\"auths\\\": \\\"...\\\"}\": json: cannot unmarshal string into Go struct field imagePullSecret.auths of type map[string]map[string]interface {}""], "stdout": "", "stdout_lines": []}

My install is suffering a fatal error as seen above. I'm executing from the /ocp/ocp4-vsphere-upi-automation directory and my /root/ocp4/install-config.yaml is as per the directions for the helper node. Where does this need to be?

2 issues, I will put them both in here

2 issues, but I used a workaround on the first one and then got stuck on the second one.
It fails here:
TASK [vmware : Create the vCenter folder by the same name as the cluster, only if it doesn't exist] ******************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "govc folder.create /Lop_OCP_Datacenter/vm/ocp4-xdfh4", "msg": "[Errno 8] Exec format error: b'govc'", "rc": 8}

So I had to go and untar the file, then comment out the govc line in all.yml and comment out the download/unarchive and "make govc executable" lines in pre-install.yml.

Environment
OS: rhel 8.4
Ansible Version: 2.9.21-1.el8
OpenShift Version: 4.7
ESXi Version: 7

Commands used to run ansible-playbook
ansible-playbook -vvv -i staging static_ips_ova.yml

Second problem that I am stuck on: I think it needs MAC addresses, but how do you get MAC addresses for VMs not created yet?

Ansible Playbook Output

see ErrorCreateBootstrap.txt

Ansible group_vars

[root@x150n151 ocp4-vsphere-upi-automation]# cat group_vars/all.yml
helper_vm_ip: 129.40.80.151
bootstrap_ignition_url: "http://{{helper_vm_ip}}:8080/ignition/bootstrap.ign"
config:
  provider: vsphere
  base_domain: ocphclvmware.com
  cluster_name: ocp4
  fips: false
  isolationMode: NetworkPolicy
  installer_ssh_key: "{{ lookup('file', '/.ssh/ocp4.pub') }}"
  pull_secret: "{{ lookup('file', '/pull-secret.json') }}"
vcenter:
  ip: 129.40.80.150
  datastore: OCP_vms_Datastore_Internal
  network: VM Network
  service_account_username: [email protected]
  service_account_password: pw4LOPlocalteam!
  admin_username: [email protected]
  admin_password: pw4LOPlocalteam!
  datacenter: Lop_OCP_Datacenter
  folder_absolute_path:
  vm_power_state: poweredon
  template_name: rhcos-vmware
download:
  clients_url: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.7.13/
  dependencies_url: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/4.7/4.7.13/
  govc: https://github.com/vmware/govmomi/releases/download/v0.26.0/govc_Linux_x86_64.tar.gz
bootstrap_vms:
  - { name: "bootstrap", ipaddr: "129.40.80.157", cpu: 4, ram: 16384}
master_vms:
  - { name: "master0", ipaddr: "129.40.80.152", cpu: 4, ram: 16384}
  - { name: "master1", ipaddr: "129.40.80.153", cpu: 4, ram: 16384}
  - { name: "master2", ipaddr: "129.40.80.154", cpu: 4, ram: 16384}
worker_vms:
  - { name: "worker0", ipaddr: "129.40.80.155", cpu: 4, ram: 16384}
  - { name: "worker1", ipaddr: "129.40.80.156", cpu: 4, ram: 16384}
static_ip:
  gateway: 129.40.80.190
  netmask: 255.255.255.192
  dns: "{{ helper_vm_ip }}"
  network_interface_name: ens192
proxy:
  enabled: false
  http_proxy: http://helper.ocp4.example.com:3129
  https_proxy: http://helper.ocp4.example.com:3129
  no_proxy: example.com
  cert_content: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
registry:
  enabled: false
  product_repo: openshift-release-dev
  product_release_name: ocp-release
  product_release_version: 4.4.0-x86_64
  username: ansible
  password: ansible
  email: [email protected]
  cert_content:
  host: registry.ocp4.example.com
  port: 5000
  repo: ocp4/openshift4
ntp:
  custom: false
  ntp_server_list:
    - 0.rhel.pool.ntp.org
    - 1.rhel.pool.ntp.org


vSphere Components - details !!

Team,

By the looks of it, vSphere is a pretty huge product suite and perhaps has many components under it.
I know for sure we need ESXi; can you point to or document what else would be needed?

Apply patch /root/ocp4-vsphere-upi-automation/roles/static_ips/files/bootstrap-isolinux.cfg.patch error

Hi Experts

I got the error below when building an offline OCP 4 cluster on vSphere 6.7, running the offline registry on the helper node.

error msg:

TASK [Apply patch /root/ocp4-vsphere-upi-automation/roles/static_ips/files/bootstrap-isolinux.cfg.patch] ***************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: PatchError: 2 out of 2 hunks FAILED
fatal: [localhost]: FAILED! => {"changed": false, "msg": "2 out of 2 hunks FAILED\n"}

How I setup the helper node:

ansible-playbook -e @vars.yaml tasks/main.yml
ansible-playbook -i staging restricted_static_ips.yml

My vars.yaml for helper node setup:

---
disk: vda
helper:
  name: "helper"
  ipaddr: "192.168.7.77"
dns:
  domain: "homelab.kdkd"
  clusterid: "ocp4"
  forwarder1: "8.8.8.8"
  forwarder2: "8.8.4.4"
dhcp:
  router: "192.168.7.1"
  bcast: "192.168.7.255"
  netmask: "255.255.255.0"
  poolstart: "192.168.7.10"
  poolend: "192.168.7.30"
  ipid: "192.168.7.0"
  netmaskid: "255.255.255.0"
bootstrap:
  name: "bootstrap"
  ipaddr: "192.168.7.20"
  macaddr: "52:54:00:60:72:67"
masters:
  - name: "master0"
    ipaddr: "192.168.7.21"
    macaddr: "52:54:00:e7:9d:67"
  - name: "master1"
    ipaddr: "192.168.7.22"
    macaddr: "52:54:00:80:16:23"
  - name: "master2"
    ipaddr: "192.168.7.23"
    macaddr: "52:54:00:d5:1c:39"
workers:
  - name: "worker0"
    ipaddr: "192.168.7.11"
    macaddr: "52:54:00:f4:26:a1"
  - name: "worker1"
    ipaddr: "192.168.7.12"
    macaddr: "52:54:00:82:90:00"
  - name: "worker2"
    ipaddr: "192.168.7.13"
    macaddr: "52:54:00:8e:10:34"
other:
  - name: "registry"
    ipaddr: "192.168.7.77"
    macaddr: "00:50:56:82:a8:ec"

The group_vars/all.yaml:

helper_vm_ip: 192.168.7.77
bootstrap_ignition_url: "http://{{helper_vm_ip}}:8080/ignition/bootstrap.ign"
config:
  provider: vsphere
  base_domain: homelab.kdkd
  cluster_name: ocp4
  fips: false
  isolationMode: NetworkPolicy
  installer_ssh_key: "{{ lookup('file', '~/.ssh/ocp4.pub') }}"
  pull_secret: {"auths"...}}}
vcenter:
  ip: 192.168.31.140
  datastore: datastore1
  network: Internal Network
  service_account_username: [email protected]
  service_account_password: I3core1024m!@#
  admin_username: [email protected]
  admin_password: I3core1024m!@#
  datacenter: homelab
  folder_absolute_path:
  vm_power_state: poweredon
  template_name: rhcos-vmware
download:
  clients_url: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest
  dependencies_url: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/latest/latest
  govc: https://github.com/vmware/govmomi/releases/download/v0.23.0/govc_linux_amd64.gz
bootstrap_vms:
  - { name: "bootstrap", macaddr: "00:50:56:a8:aa:a1", ipaddr: "192.168.7.181", cpu: 4, ram: 16384}
master_vms:
  - { name: "master1", macaddr: "00:50:56:a8:aa:a2", ipaddr: "192.168.7.182", cpu: 4, ram: 16384}
  - { name: "master2", macaddr: "00:50:56:a8:aa:a3", ipaddr: "192.168.7.183", cpu: 4, ram: 16384}
  - { name: "master3", macaddr: "00:50:56:a8:aa:a4", ipaddr: "192.168.7.184", cpu: 4, ram: 16384}
worker_vms:
  - { name: "worker1", macaddr: "00:50:56:a8:aa:a5", ipaddr: "192.168.7.185", cpu: 4, ram: 8192}
  - { name: "worker2", macaddr: "00:50:56:a8:aa:a6", ipaddr: "192.168.7.186", cpu: 4, ram: 8192}
  - { name: "worker3", macaddr: "00:50:56:a8:aa:a7", ipaddr: "192.168.7.187", cpu: 4, ram: 8192}
static_ip:
  gateway: 192.168.7.1
  netmask: 255.255.255.0
  dns: "{{ helper_vm_ip }}"
  network_interface_name: ens224
proxy:
  enabled: false
  http_proxy: http://helper.ocp4.example.com:3129
  https_proxy: http://helper.ocp4.example.com:3129
  no_proxy: example.com
  cert_content: |
    -----BEGIN CERTIFICATE-----
        <certficate content>
    -----END CERTIFICATE-----
registry:
  enabled: true
  product_repo: openshift-release-dev
  product_release_name: ocp-release
  product_release_version: 4.6.0-x86_64
  username: ansible
  password: ansible
  email: [email protected]
  cert_content:
  host: registry.ocp4.homelab.kdkd
  port: 5000
  repo: ocp4/openshift4
ntp:
  custom: true
  ntp_server_list:
    - 0.rhel.pool.ntp.org
    - 1.rhel.pool.ntp.org

Expected the playbook to run well and return an OCP cluster.
Thank you

In OCP 4.5 the VMware StorageClass now looks for the infraID not the Cluster Name

Starting in OCP 4.5, the cloud.conf configuration in the kube-cloud-config configmap now points to /<datacenter>/vm/<infraID>, e.g. "/LAB/vm/openshift-hkzhw"

Currently the playbook creates a folder in VMware using the Cluster name e.g. "/LAB/vm/openshift".

This results in a failure when trying to create a PVC using the default StorageClass created by the UPI installer.

I have created the following PR #47, which moves when that fact is created.

No check for rsync to be installed

static_ips role uses Ansible synchronize module which requires rsync to be installed. Should ensure rsync is installed before using it.
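A pre-check along these lines would address it (a sketch, not the repo's actual code):

```yaml
- name: Ensure rsync is present before using the synchronize module
  become: true
  ansible.builtin.package:
    name: rsync
    state: present
```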

Failed to create a virtual machine

Hi, I hit a bug on vSphere 6.7u3; the error msg is:

 msg: 'Failed to create a virtual machine : Customization of the guest operating system ''rhel7_64Guest'' is not supported in this configuration. Microsoft Vista (TM) and Linux guests with Logical Volume Manager are supported only for recent ESX host and VMware Tools versions. Refer to vCenter documentation for supported configurations.'

Below is my group_vars/all.yml file

helper_vm_ip: 192.168.87.180
bootstrap_ignition_url: "http://{{helper_vm_ip}}:8080/ignition/bootstrap.ign"
config:
  provider: vsphere
  base_domain: ocp.com
  cluster_name: ocp4
  fips: false
  networkType: OVNKubernetes
  isolationMode: Multitenant
  installer_ssh_key: "{{ lookup('file', '~/.ssh/helper_rsa.pub') }}"
  pull_secret: "{{ lookup('file', '~/pull-secret.yml') }}"
vcenter:
  ip: 192.168.87.140
  datastore: ssd
  network: Internal Network
  service_account_username: [email protected]
  service_account_password: 123456
  admin_username: [email protected]
  admin_password: 123456
  datacenter: homelab
  folder_absolute_path:
  vm_power_state: poweredon
  template_name: rhcos-vmware
  hw_version: 14
download:
  clients_url: https://mirror.openshift.com/pub/openshift-v4/clients/ocp/latest
  dependencies_url: https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/latest
  govc: https://github.com/vmware/govmomi/releases/download/v0.27.4
bootstrap_vms:
  - { name: "bootstrap", macaddr: "52:54:00:60:72:67", ipaddr: "192.168.87.20", cpu: 4, ram: 16384}
master_vms:
  - { name: "master0", macaddr: "52:54:00:e7:9d:67", ipaddr: "192.168.87.21", cpu: 4, ram: 16384}
  - { name: "master1", macaddr: "52:54:00:80:16:23", ipaddr: "192.168.87.22", cpu: 4, ram: 16384}
  - { name: "master2", macaddr: "52:54:00:d5:1c:39", ipaddr: "192.168.87.23", cpu: 4, ram: 16384}
worker_vms:
  - { name: "worker0", macaddr: "52:54:00:f4:26:a1", ipaddr: "192.168.87.11", cpu: 4, ram: 16384}
  - { name: "worker1", macaddr: "52:54:00:82:90:00", ipaddr: "192.168.87.12", cpu: 4, ram: 16384}
static_ip:
  gateway: 192.168.87.1
  netmask: 255.255.255.0
  dns: "{{ helper_vm_ip }}"
  network_interface_name: ens192
network_modifications:
  enabled: true
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - cidr: 172.30.0.0/16
  machineNetwork:
  - cidr: 192.168.29.0/24
proxy:
  enabled: false
  http_proxy: http://helper.ocp4.example.com:3129
  https_proxy: http://helper.ocp4.example.com:3129
  no_proxy: example.com
  cert_content: |
    -----BEGIN CERTIFICATE-----
        <certficate content>
    -----END CERTIFICATE-----
registry:
  enabled: true
  product_repo: openshift-release-dev
  product_release_name: ocp-release
  product_release_version: 4.11.1-x86_64
  username: ansible
  password: ansible
  email: [email protected]
  cert_content:
  host: registry.ocp4.ocp.com
  port: 5000
  repo: ocp4/openshift4
ntp:
  custom: enable
  ntp_server_list:
    - 0.rhel.pool.ntp.org
    - 1.rhel.pool.ntp.org

I found that the OVF is always deployed as version 13 and RHEL 7 rather than version 14 and RHEL 8; may I know whether that is correct?

❯ govc vm.info /homelab/vm/rhcos-vmware
Name:           rhcos-vmware
  Path:         /homelab/vm/rhcos-vmware
  UUID:         42020eff-63b1-ad74-b97a-0e6022e8750f
  Guest name:   Red Hat Enterprise Linux 7 (64-bit)

I tried to manually change the OVF to RHEL 8 and version 14, but it still fails to create the new VM.

OCP Upgrade

Hi Team,

Do we have any playbook for upgrading the OCP cluster, e.g. 4.4 to 4.6?

Node / Worker Scaling

Hello Everyone,

Is there anything new on the playbook for adding new nodes to the cluster?

Best regards

Update govc download link

The version of govc has changed which breaks the download.govc link. I don't see a way to get a link to the "latest" binary version available. The download.govc variable needs to be updated to the current download link.
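One low-maintenance option is to pin a known release in the cluster vars (this URL already appears elsewhere on this page; whether to pin or chase the latest version is a maintainer decision):

```yaml
download:
  govc: https://github.com/vmware/govmomi/releases/download/v0.26.0/govc_Linux_x86_64.tar.gz
```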

Accommodate worker VM groups, a.k.a. MachineConfigPools, with custom hardware specs

It would be useful to define groups of worker vms, i.e., infra, storage, app, etc., at provisioning time. Each of these groups might require its own hardware specs, i.e., memory_mb, num_cpus, etc. Finally, it might be useful to optionally automate the creation of OpenShift MachineConfigPools for these worker_vms subgroups.

If the VMs were defined as traditional Ansible Hosts in an inventory file that belonged to a group bootstrap_vms, master_vms, worker_vms, or a worker_vms subgroup, then it would be easy for deployers to customize host properties on a per-group basis in the inventory. This would also be familiar to OpenShift 3.11 on vSphere users moving to this repo from https://github.com/openshift/openshift-ansible-contrib/tree/master/reference-architecture/vmware-ansible.

RHEL 7.9 static_ips_ova playbook issues

When running the static_ips_ova.yml playbook on a RHEL 7.9 VM using Ansible 2.9 and python2, I get this error.

TASK [common : Install pyvmomi] ********************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": ["/bin/pip2", "install", "pyvmomi"], "msg": "stdout: Collecting pyvmomi\n  Downloading https://files.pythonhosted.org/packages/da/ec/38044d41138a687930687f6b990c9dfe8d863dfcbae3ea16f346d3a78131/pyvmomi-7.0.2.tar.gz (589kB)\n    Complete output from command python setup.py egg_info:\n    /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'python_requires'\n      warnings.warn(msg)\n    error in pyvmomi setup command: 'extras_require' must be a dictionary whose values are strings or lists of strings containing valid project/version requirement specifiers.\n    \n    ----------------------------------------\n\n:stderr: Command \"python setup.py egg_info\" failed with error code 1 in /tmp/pip-build-HWMnxa/pyvmomi/\nYou are using pip version 8.1.2, however version 21.2.4 is available.\nYou should consider upgrading via the 'pip install --upgrade pip' command.\n"}

Looking at https://pypi.org/project/pyvmomi/ I can see python2 is deprecated.

So on the RHEL 7.9 VM, python3 is installed and yum remove ansible is run. Then Ansible is installed under python3 with pip3 install ansible.

ansible_python_interpreter: /usr/bin/python3 is added to the inventory file and the static_ips_ova.yml playbook is run again. Things get further but I'm stuck at this error.

TASK [static_ips_ova : Create bootstrap VM from the template] ************************************************************************************************************************************
failed: [localhost] (item={'name': 'bootstrap', 'ipaddr': '10.0.200.110', 'cpu': 4, 'ram': 16384, 'datastore': 'datastore1'}) => {"ansible_loop_var": "item", "changed": false, "item": {"cpu": 4, "datastore": "datastore1", "ipaddr": "10.0.200.110", "name": "bootstrap", "ram": 16384}, "msg": "Failed to create a virtual machine : Customization of the guest operating system 'rhel7_64Guest' is not supported in this configuration. Microsoft Vista (TM) and Linux guests with Logical Volume Manager are supported only for recent ESX host and VMware Tools versions. Refer to vCenter documentation for supported configurations."}

This appears to be related to the Ansible vmware_guest module and its customvalues options.
https://github.com/RedHatOfficial/ocp4-vsphere-upi-automation/blob/master/roles/static_ips_ova/tasks/main.yml#L71

If we run the same playbook with the same variables on a RHEL 8.4 VM things work fine.

My guess is that this has something to do with pyvmomi and the Python module versions.

Any idea what the issue could be? Should this work from RHEL 7.9? If this is only tested or known to work on RHEL 8, maybe call that out in the README.

Thanks!

VMware folder paths with spaces do not work

vmware.folder_absolute_path is used in several places with the command module, which breaks when the path contains spaces. The tasks should be refactored to use the command module's argv form so spaces are handled properly.
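A sketch of the refactor, using a govc-style task as the example (task names and variables are illustrative, not the repo's exact tasks):

```yaml
# Before: fails when the folder path contains spaces
# - command: "govc folder.info {{ vmware.folder_absolute_path }}"

# After: argv passes each element as a discrete argument, so spaces are safe
- name: Check if the vCenter folder already exists
  command:
    argv:
      - govc
      - folder.info
      - "{{ vmware.folder_absolute_path }}"
```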

"govc: please specify a datacenter"

Hello,

While running the playbook, we encounter the following problem during the task "TASK [static_ips : Upload all the custom generated ISOs to the datastore]".

Problem:

/ocp4-vsphere-upi-automation/downloads/ISOs/okdmaster03.iso", "okdmaster03.iso"], "delta": "0:00:00.025456", "end": "2020-09-21 16:31:38.623916", "item": {"ipaddr": "OUR IP", "macaddr": "OUR MAC", "name": "okdmaster03"}, "msg": "non-zero return code", "rc": 1, "start": "2020-09-21 16:31:38.598460", "stderr": "govc: please specify a datacenter", "stderr_lines": ["govc: please specify a datacenter"], "stdout": "", "stdout_lines": []}

We encountered the problem on a CentOS 7.7 machine as well as a CentOS 8.2 machine, both running the latest version of Ansible.

We defined the datacenter in the group_vars/all.yml file, which doesn't help. The datastore, which is also defined in that file, works flawlessly.
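One possible explanation (an assumption, not a confirmed root cause) is that govc does not see the Ansible variable at all; it resolves the datacenter only from its -dc flag or the GOVC_DATACENTER environment variable. A sketch of passing it through the task environment (variable names are illustrative):

```yaml
# Sketch: export GOVC_DATACENTER so govc can resolve the upload target
- name: Upload the custom ISO to the datastore
  command:
    argv:
      - govc
      - datastore.upload
      - "-ds={{ vcenter.datastore }}"
      - "{{ iso_path }}"
      - "{{ item.name }}.iso"
  environment:
    GOVC_DATACENTER: "{{ vcenter.datacenter }}"
```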

Failed to transfer rhcos-vmware template to vSphere

vSphere 6.5

The ocp4 folder is created successfully on vSphere, but the rhcos-vmware template fails to transfer.
Could you help me and explain where I am going wrong?

fatal: [localhost]: FAILED! => {"changed": false, "dest": "/root/ocp4-vsphere-upi-automation/downloads/rhcos-vmware.ova", "elapsed": 10, "gid": 0, "group": "root", "mode": "0644", "msg": "Request failed: ", "owner": "root", "secontext": "system_u:object_r:admin_home_t:s0", "size": 831590400, "state": "file", "uid": 0, "url": "https://mirror.openshift.com/pub/openshift-v4/x86_64/dependencies/rhcos/latest/latest/rhcos-4.3.8-x86_64-vmware.x86_64.ova"}

thank you

SSH Key needs to be created sooner

On a fresh install, the run fails if the ~/.ssh/ocp4 SSH key does not already exist; see the error below. The two steps that create the SSH key need to move to the top of the pre_install.yml task file in the common role, so the key is created before it gets referenced.
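The fix could look like the following tasks placed at the top of pre_install.yml (module choice and paths are a sketch; the repo's actual key-creation steps may differ):

```yaml
# Sketch: ensure the SSH keypair exists before anything templates it
- name: Ensure ~/.ssh exists
  file:
    path: "{{ ansible_env.HOME }}/.ssh"
    state: directory
    mode: "0700"

- name: Generate the ocp4 SSH keypair if missing
  openssh_keypair:
    path: "{{ ansible_env.HOME }}/.ssh/ocp4"
    type: rsa
```

openssh_keypair is idempotent, so rerunning the playbook leaves an existing key untouched.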

TASK [common : Set the vcenter.folder_absolute_path if not provided] *************************************************************************************************************************************************************************************
[WARNING]: Unable to find '~/.ssh/ocp4.pub' in expected paths (use -vvvvv to see paths)
fatal: [localhost]: FAILED! => {"msg": "An unhandled exception occurred while templating '{'provider': 'vsphere', 'base_domain': 'local.lab', 'cluster_name': 'ocp4', 'fips': False, 'installer_ssh_key': "{{ lookup('file', '~/.ssh/ocp4.pub') }}", 'pull_secret': {'auths': '...'}}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'file'. Error was a <class 'ansible.errors.AnsibleError'>, original message: could not locate file in lookup: ~/.ssh/ocp4.pub"}

OVF deploy timedout

I am running ansible-playbook -vvv -i staging static_ips_ova.yml and get the error below. It happens intermittently; sometimes the playbook works fine, and I am not sure why it fails at other times.

The full traceback is:
  File "/tmp/ansible_vmware_deploy_ovf_payload_4zu7qy9c/ansible_vmware_deploy_ovf_payload.zip/ansible/modules/cloud/vmware/vmware_deploy_ovf.py", line 292, in run
  File "/tmp/ansible_vmware_deploy_ovf_payload_4zu7qy9c/ansible_vmware_deploy_ovf_payload.zip/ansible/modules/cloud/vmware/vmware_deploy_ovf.py", line 286, in _open_url
  File "/tmp/ansible_vmware_deploy_ovf_payload_4zu7qy9c/ansible_vmware_deploy_ovf_payload.zip/ansible/module_utils/urls.py", line 1390, in open_url
    unredirected_headers=unredirected_headers)
  File "/tmp/ansible_vmware_deploy_ovf_payload_4zu7qy9c/ansible_vmware_deploy_ovf_payload.zip/ansible/module_utils/urls.py", line 1294, in open
    r = urllib_request.urlopen(*urlopen_args)
  File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib64/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/tmp/ansible_vmware_deploy_ovf_payload_4zu7qy9c/ansible_vmware_deploy_ovf_payload.zip/ansible/module_utils/urls.py", line 467, in https_open
    return self.do_open(self._build_https_connection, req)
  File "/usr/lib64/python3.6/urllib/request.py", line 1351, in do_open
    raise URLError(err)
fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "allow_duplicates": false,
            "cluster": null,
            "datacenter": "homelab",
            "datastore": "datastore1",
            "deployment_option": null,
            "disk_provisioning": "thin",
            "fail_on_spec_warnings": false,
            "folder": "/homelab/vm/ocp4-sddll",
            "hostname": "192.168.31.140",
            "inject_ovf_env": false,
            "name": "rhcos-vmware",
            "networks": {
                "VM Network": "Internal Network"
            },
            "ova": "/root/ocp4-vsphere-upi-automation/downloads/rhcos-vmware.ova",
            "ovf": "/root/ocp4-vsphere-upi-automation/downloads/rhcos-vmware.ova",
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "port": 443,
            "power_on": false,
            "properties": null,
            "proxy_host": null,
            "proxy_port": null,
            "resource_pool": "Resources",
            "username": "[email protected]",
            "validate_certs": false,
            "wait": true,
            "wait_for_ip_address": false
        }
    },
    "msg": "<urlopen error The write operation timed out>"
}

Looking toward the 4.6 release, a change to the Ignition spec should be considered

I guess this was already considered, but just in case: up to version 4.5 the Ignition config is based on spec 2.1.0, but from 4.6 onwards it changes to spec 3.x.
Eg: roles/dhcp_ova/templates/append-bootstrap.ign.j2
{ "ignition": { "config": { "merge": [ { "source": "{{ bootstrap_ignition_url }}" } ] }, "version": "3.1.0" } }

I've been using this project to deploy 4.6 by tweaking the aforementioned file, and I had no issues. Either way, this change will not be trivial, since it has to be conditional while both spec versions are in use.
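For reference, the same bootstrap pointer differs between the two spec families: spec 2.x uses `append`, spec 3.x uses `merge`. Both forms below follow the repo's append-bootstrap.ign.j2 shape:

```yaml
# Spec 2.x (OCP <= 4.5):
#   { "ignition": { "config": { "append": [ { "source": "{{ bootstrap_ignition_url }}" } ] }, "version": "2.1.0" } }
# Spec 3.x (OCP >= 4.6):
#   { "ignition": { "config": { "merge":  [ { "source": "{{ bootstrap_ignition_url }}" } ] }, "version": "3.1.0" } }
```

A conditional in the template (or a per-version template file) keyed off the target OCP version could emit the right form for both cases.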

thanks!

Issue with govc download

TASK [vmware : Create the vCenter folder by the same name as the cluster, only if it doesn't exist] ******************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "govc folder.create /Lop_OCP_Datacenter/vm/ocp4-xdfh4", "msg": "[Errno 8] Exec format error: b'govc'", "rc": 8}

As a workaround, I had to extract the archive manually, comment out the govc URL in all.yml, and comment out the download, unarchive, and "make govc executable" lines in pre-install.yml.

Environment
OS: rhel 8.4
Ansible Version: 2.9.21-1.el8
OpenShift Version: 4.7
ESXi Version: 7
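An "Exec format error" usually means the file on disk is still the gzipped release artifact rather than the extracted binary (an assumption based on the error, not confirmed for this report). A sketch of tasks that would handle the extraction explicitly (URL variable and paths are illustrative):

```yaml
# Sketch: download the gzipped govc release, extract it, and mark it executable
- name: Download govc
  get_url:
    url: "{{ govc_url }}"   # e.g. a govc_linux_amd64.gz release asset
    dest: /tmp/govc.gz

- name: Extract the govc binary
  command:
    argv: [gunzip, -f, /tmp/govc.gz]

- name: Make govc executable
  file:
    path: /tmp/govc
    mode: "0755"
```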

dhcp_ova.yaml fails because of latency=high and no cpu_reservation

While running
ansible-playbook -i staging dhcp_ova.yml

The following error occurred with the latest code from main/master.

It appears that govc sets -latency high on the VM template, but the Ansible vmware_guest module is then unable to create a VM without specifying a high enough cpu_reservation. I am not completely sure, though.

TASK [dhcp_ova : Create bootstrap VM from the template] *******************************************************************************************************************************************************
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ansible.module_utils.vmware.TaskError: ('Invalid CPU reservation for the latency-sensitive VM, (sched.cpu.min) should be at least 8396 MHz. ', None)
failed: [localhost] (item={u'macaddr': u'00:50:56:83:85:6d', u'ipaddr': u'172.16.21.10', u'name': u'bootstrap'}) => {"ansible_loop_var": "item", "changed": false, "item": {"ipaddr": "172.16.21.10", "macaddr": "00:50:56:83:85:6d", "name": "bootstrap"}, "module_stderr": "Traceback (most recent call last):\n File "/root/.ansible/tmp/ansible-tmp-1594711610.26-28980-210325313588722/AnsiballZ_vmware_guest.py", line 102, in \n _ansiballz_main()\n File "/root/.ansible/tmp/ansible-tmp-1594711610.26-28980-210325313588722/AnsiballZ_vmware_guest.py", line 94, in _ansiballz_main\n invoke_module(zipped_mod, temp_path, ANSIBALLZ_PARAMS)\n File "/root/.ansible/tmp/ansible-tmp-1594711610.26-28980-210325313588722/AnsiballZ_vmware_guest.py", line 40, in invoke_module\n runpy.run_module(mod_name='ansible.modules.cloud.vmware.vmware_guest', init_globals=None, run_name='main', alter_sys=True)\n File "/usr/lib64/python2.7/runpy.py", line 176, in run_module\n fname, loader, pkg_name)\n File "/usr/lib64/python2.7/runpy.py", line 82, in _run_module_code\n mod_name, mod_fname, mod_loader, pkg_name)\n File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code\n exec code in run_globals\n File "/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/modules/cloud/vmware/vmware_guest.py", line 2834, in \n File "/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/modules/cloud/vmware/vmware_guest.py", line 2823, in main\n File "/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/modules/cloud/vmware/vmware_guest.py", line 2479, in deploy_vm\n File "/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/module_utils/vmware.py", line 797, in set_vm_power_state\n File "/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/module_utils/vmware.py", line 82, in wait_for_task\n File 
"/tmp/ansible_vmware_guest_payload_2WL_69/ansible_vmware_guest_payload.zip/ansible/module_utils/six/init.py", line 748, in raise_from\nansible.module_utils.vmware.TaskError: ('Invalid CPU reservation for the latency-sensitive VM, (sched.cpu.min) should be at least 8396 MHz. ', None)\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

Unmount iso mount path change state

The Ansible mount module adds an entry to /etc/fstab. When unmounting with state: unmounted, it does not remove that entry, so if the control node is rebooted the ISO path is mounted again, and if the automation is rerun, the task that creates the mount path fails because of the active mount. Using state: absent removes the entry from /etc/fstab, which improves idempotency.
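A sketch of the change (the mount path is illustrative):

```yaml
# Before: unmounts but leaves the /etc/fstab entry behind
# - mount: { path: /mnt/iso, state: unmounted }

# After: unmounts AND removes the /etc/fstab entry, so reruns stay idempotent
- name: Unmount the ISO and clean up /etc/fstab
  mount:
    path: /mnt/iso
    state: absent
```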

Some tasks need become to run

Downloading the initramfs, installer kernel, and raw image requires root privileges. These tasks should be allowed to run as a non-root user with sudo privileges.
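A sketch of scoping privilege escalation to just those tasks rather than the whole play (URL variable and destination are illustrative):

```yaml
# Sketch: escalate only for downloads that write to privileged paths
- name: Download the installer kernel
  get_url:
    url: "{{ kernel_url }}"
    dest: /var/lib/tftpboot/rhcos-kernel   # illustrative destination
  become: true
```

With per-task become, the rest of the run can stay under the unprivileged user.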

task failed

TASK [common : Display the absolute folder path of the vCenter folder] ***************************
skipping: [localhost]

PLAY [localhost] *********************************************************************************

TASK [vmware : Check if the vCenter folder already exists] ***************************************
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "'govc folder.info /vsphere.com/vm/ocp4-t4kb5'", "msg": "[Errno 2] No such file or directory: b'govc folder.info /vsphere.com/vm/ocp4-t4kb5': b'govc folder.info /vsphere.com/vm/ocp4-t4kb5'", "rc": 2}
...ignoring

TASK [vmware : Create the vCenter folder by the same name as the cluster, only if it doesn't exist] ***
fatal: [localhost]: FAILED! => {"changed": false, "cmd": "'govc folder.create /vsphere.com/vm/ocp4-t4kb5'", "msg": "[Errno 2] No such file or directory: b'govc folder.create /vsphere.com/vm/ocp4-t4kb5': b'govc folder.create /vsphere.com/vm/ocp4-t4kb5'", "rc": 2}

PLAY RECAP ***************************************************************************************
localhost : ok=33 changed=12 unreachable=0 failed=1 skipped=12 rescued=0 ignored=1

[root@Orchestrator upi]#

customvalues are ignored

It seems that customvalues is now ignored in newer versions of Ansible / vmware_guest. There are no errors; it just does not set the values.

The advanced_settings key now successfully replaces the customvalues key:

advanced_settings:
     - key: guestinfo.ignition.config.data
       value: "{{ BootstrapContent }}"
     - key: guestinfo.ignition.config.data.encoding
       value: base64
     - key: guestinfo.afterburn.initrd.network-kargs
       value: "ip={{ item.ipaddr }}::{{ static_ip.gateway }}:{{ static_ip.netmask }}:{{ item.name }}:ens192:off:{{ static_ip.dns }}"

pip installation error

Installation fails on the error below.

ASK [Gathering Facts] *********************************************************************************************************************************************************
ok: [localhost]

TASK [common : Install the necessary linux packages which will be needed later in the ansible run] *****************************************************************************
ok: [localhost]

TASK [common : Install python3-pyvmomi] ****************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unable to find any of pip2, pip to use.  pip needs to be installed."}

I guess it's because of 198aa81. python-pip is the culprit.
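A hedged sketch of a fix that targets the Python 3 packaging instead (package and module names are assumptions about the intended change, not the repo's committed tasks):

```yaml
# Sketch: install pip for Python 3 instead of the removed python-pip
- name: Install pip for Python 3
  package:
    name: python3-pip
    state: present

- name: Install pyvmomi via pip3
  pip:
    name: pyvmomi
    executable: pip3
```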

Post cluster creation - addition of more worker nodes?

What is the best way to adapt the automation in this repo to add more worker nodes to a cluster that was created using same automations in this repo?

Here's what I tried:

1. Updated and reran the helper node config with info for the new worker nodes.
2. Added matching entries for the new worker nodes in group_vars/all.yml, then ran ansible-playbook -i staging dhcp_ova.yml.

I quickly realized that this wasn't going to work. Instead of only processing changes (the Ansible way, right?), the play started to build another entire cluster from scratch in a new vCenter folder. I killed the play execution and cleaned up, then started researching how I can add worker nodes to this cluster. No plan on that yet...
