
vsphere-tanzu-kubernetes-grid-image-builder's Introduction

vSphere Tanzu Kubernetes Grid Image Builder

vSphere Tanzu Kubernetes Grid Image Builder provides tooling that can be used to build node images for use with vSphere with Tanzu.

Content

Prerequisites

Below are the prerequisites for building the node images

  • vSphere Environment version >= 8.0
  • DHCP configured for vCenter (required by Packer)
  • jq version >= 1.6
  • make version >= 4.2.1
  • docker version >= 20.10.21
  • A Linux environment with the utilities listed above (jq, make, and docker) available on the system

Building Images

Demo

  • Clone this repository on the Linux environment used for building the image.
  • Update the vSphere environment details like vCenter IP, Username, Password, etc. in vsphere.j2
  • Select the Kubernetes version.
    • Use make list-versions to list supported Kubernetes versions.
  • Run the artifacts container for the selected Kubernetes version using make run-artifacts-container KUBERNETES_VERSION=v1.22.13+vmware.1.
    • Default port used by the artifacts container is 8081 but this can be configured using the ARTIFACTS_CONTAINER_PORT parameter.
  • Run the image-builder container to build the node image (use the make build-node-image target).
    • The default port used by the image-builder container is 8082, but this can be configured using the PACKER_HTTP_PORT parameter.
  • Once the OVA is generated, upload it to a content library used by the supervisor.
  • To clean the containers and artifacts use the make clean target.
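
The steps above can be read as a short sequence of make invocations. The sketch below only assembles those command lines and does not run them. The helper name `build_workflow` and the example values are hypothetical; the targets and parameter names are the ones used in this README, and it is an assumption that PACKER_HTTP_PORT is passed to make build-node-image in the same way as the other parameters.

```python
import shlex


def build_workflow(kubernetes_version, os_target, host_ip, artifacts_path,
                   tkr_suffix="byoi", artifacts_port=8081, packer_port=8082):
    """Return the make command lines for one image build, in the order they are run."""
    version = f"KUBERNETES_VERSION={kubernetes_version}"
    return [
        # Step 1: check which Kubernetes versions and OS targets are supported.
        ["make", "list-versions"],
        # Step 2: start the artifacts container for the chosen version.
        ["make", "run-artifacts-container", version,
         f"ARTIFACTS_CONTAINER_PORT={artifacts_port}"],
        # Step 3: build the node image (assumed parameter passing for PACKER_HTTP_PORT).
        ["make", "build-node-image", f"OS_TARGET={os_target}", version,
         f"TKR_SUFFIX={tkr_suffix}", f"HOST_IP={host_ip}",
         f"IMAGE_ARTIFACTS_PATH={artifacts_path}",
         f"ARTIFACTS_CONTAINER_PORT={artifacts_port}",
         f"PACKER_HTTP_PORT={packer_port}"],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand.
    for cmd in build_workflow("v1.23.15+vmware.1", "photon-3", "1.2.3.4", "/root/artifacts"):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to try on a machine that has no vSphere environment configured.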

Supported Kubernetes Versions

supported-versions.json holds information about the supported Kubernetes versions and their corresponding supported OS targets along with the artifacts container image URL. This file will be updated when a new Kubernetes version is supported by the vSphere Tanzu team.
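
As a rough illustration of how this file can be consumed, the sketch below lists the versions and checks whether an OS target is supported. The layout (a version key mapping to `supported_os` and `artifacts_image`) is an assumption based on the example shown in one of the issues further down this page, the image URL is a placeholder, and the function names are hypothetical.

```python
import json

# Sample content. The layout is an assumption and the image URL is a placeholder.
SAMPLE = """
{
    "v1.24.9+vmware.1": {
        "supported_os": ["photon-3", "ubuntu-2004-efi"],
        "artifacts_image": "<artifacts_image_url>"
    }
}
"""


def list_versions(text):
    """Return (kubernetes_version, supported_os) pairs, sorted by version key."""
    data = json.loads(text)
    return [(version, entry["supported_os"]) for version, entry in sorted(data.items())]


def is_supported(text, kubernetes_version, os_target):
    """True if the version is listed and the OS target is supported for it."""
    entry = json.loads(text).get(kubernetes_version)
    return entry is not None and os_target in entry["supported_os"]


if __name__ == "__main__":
    for version, targets in list_versions(SAMPLE):
        print(f"{version}  |  [{','.join(targets)}]")
```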

Make targets

Help

  • make help Provides help information about different make targets
make
make help
  • make list-versions gives information about supported Kubernetes versions and the corresponding OS targets
make list-versions PRINT_HELP=y # To show the help information for this target.
make list-versions              # Retrieves information from supported-versions.json file.

Clean

There are three different clean targets to clean the containers or artifacts generated during the process or both.

  • make clean-containers is used to stop and remove the artifacts containers, the image builder containers, or both.
    • During container creation, all containers related to BYOI are labelled byoi.
      • The artifacts container has byoi_artifacts and the Kubernetes version as labels.
      • The image builder container has byoi_image_builder, the Kubernetes version, and the OS target as labels.
make clean-containers PRINT_HELP=y         # To show the help information for this target
make clean-containers                      # To clean all the artifacts and image-builder containers
make clean-containers LABEL=byoi_artifacts # To remove artifact containers
  • make clean-image-artifacts is used to remove image artifacts such as OVAs and Packer log files
make clean-image-artifacts PRINT_HELP=y                           # To show help information for this target
make clean-image-artifacts IMAGE_ARTIFACTS_PATH=/root/artifacts/  # To clean the image artifacts in a folder
  • make clean is a combination of clean-containers and clean-image-artifacts that cleans both containers and image artifacts
make clean PRINT_HELP=y                                                   # To show the help information for this target
make clean IMAGE_ARTIFACTS_PATH=/root/artifacts/                          # To clean image artifacts and containers
make clean IMAGE_ARTIFACTS_PATH=/root/artifacts/ LABEL=byoi_image_builder # To clean image artifacts and image builder containers

Image Building

  • make run-artifacts-container is used to run the artifacts container for a Kubernetes version at a particular port
    • artifacts image URL will be fetched from the supported-versions.json based on the Kubernetes version selected.
    • By default, the artifacts container uses port 8081; this can be configured through the ARTIFACTS_CONTAINER_PORT parameter.
make run-artifacts-container PRINT_HELP=y                                                       # To show the help information for this target
make run-artifacts-container KUBERNETES_VERSION=v1.22.13+vmware.1 ARTIFACTS_CONTAINER_PORT=9090 # To run 1.22.13 Kubernetes artifacts container on port 9090
  • make build-image-builder-container is used to build the image builder container locally with all the dependencies like Packer, Ansible, and OVF Tool.
make build-image-builder-container PRINT_HELP=y # To show the help information for this target.
make build-image-builder-container KUBERNETES_VERSION=v1.23.15+vmware.1 # To create the image builder container.
  • make build-node-image is used to build the vSphere Tanzu compatible node image for a Kubernetes version.
    • Host IP is required to pull the required Carvel packages during the image build process. The default artifacts container port is 8081, which can be configured through ARTIFACTS_CONTAINER_PORT.
    • The TKR (Tanzu Kubernetes Release) suffix is used to distinguish images built on the same version for different purposes. The suffix can be at most 8 characters long.
make build-node-image PRINT_HELP=y # To show the help information for this target.
make build-node-image OS_TARGET=photon-3 KUBERNETES_VERSION=v1.23.15+vmware.1 TKR_SUFFIX=byoi HOST_IP=1.2.3.4 IMAGE_ARTIFACTS_PATH=/Users/image ARTIFACTS_CONTAINER_PORT=9090 # Create photon-3 1.23.15 Kubernetes node image
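
Because the suffix limit is easy to overlook, it can be checked before starting a build. The following is a minimal sketch with a hypothetical helper name; it enforces only the 8-character limit stated above and makes no assumption about which characters are allowed.

```python
MAX_TKR_SUFFIX_LENGTH = 8  # limit stated in this README


def validate_tkr_suffix(suffix):
    """Return the suffix unchanged, or raise ValueError if it exceeds the documented limit."""
    if len(suffix) > MAX_TKR_SUFFIX_LENGTH:
        raise ValueError(
            f"TKR_SUFFIX {suffix!r} has {len(suffix)} characters; "
            f"the maximum is {MAX_TKR_SUFFIX_LENGTH}"
        )
    return suffix


if __name__ == "__main__":
    print(validate_tkr_suffix("byoi"))
```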

Customizations Examples

Sample customization examples can be found here

Debugging

  • To enable debugging for the Makefile scripts, export DEBUGGING=true.
  • Debug logs are enabled by default on the image builder container which can be viewed through the docker logs -f <container_name> command.
  • Packer logs can be found at <artifacts-folder>/logs/packer-<random_id>.log which will be helpful when debugging issues.
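
Since the Packer log name contains a random ID, it can be convenient to locate the newest log programmatically. The sketch below assumes only the `<artifacts-folder>/logs/packer-<random_id>.log` layout described above; the function name is hypothetical.

```python
from pathlib import Path


def latest_packer_log(artifacts_folder):
    """Return the most recently modified packer-*.log under <artifacts_folder>/logs, or None."""
    log_dir = Path(artifacts_folder, "logs")
    if not log_dir.is_dir():
        return None
    # Sort by modification time so the last entry is the newest log.
    logs = sorted(log_dir.glob("packer-*.log"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


if __name__ == "__main__":
    print(latest_packer_log("/root/artifacts"))
```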

Contributing

The vSphere Tanzu Kubernetes Grid Image Builder project team welcomes contributions from the community. Before you start working with VMware Image Builder, please read our Developer Certificate of Origin. All contributions to this repository must be signed as described on that page. Your signature certifies that you wrote the patch or have the right to pass it on as an open-source patch. For more detailed information, please refer to CONTRIBUTING.

License

This project is available under the Mozilla Public License, V2.0.

Support

VMware will support issues with the vSphere Tanzu Kubernetes Grid Image Builder, but you are responsible for any issues relating to your image customizations and custom applications. You can open VMware Support cases for TKG clusters built with a custom Tanzu Kubernetes release image; however, VMware Support will be limited to best effort only, with VMware Support having full discretion over how much effort to put into troubleshooting. When you open a case with VMware Support regarding any issue with a cluster built with a custom Tanzu Kubernetes release image, VMware Support asks that you provide support staff with the exact changes made to the base image.

vsphere-tanzu-kubernetes-grid-image-builder's People

Contributors

akutz, dimplerajavamsi, jvrahav, lgayatri, lparis, mohdwaquar, ridaz, vmwghbot, zyiyi11


vsphere-tanzu-kubernetes-grid-image-builder's Issues

Revisit on enabling goss test for sshd.service

Currently, the goss test for sshd.service on photon3 expects the values
enabled: true and running: true

With Photon3 STIG compliance support for k8s versions 1.25 and above,
sshd.service is set to enabled: true and running: false.
For now, the test is skipped while building the OVA.

A condition check needs to be added so that the expected value is set accordingly.
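
One way to express such a condition is sketched below. It is not the project's goss configuration, only an illustration of the rule described in this issue: photon-3 images for Kubernetes 1.25 and above expect sshd.service not to be running. The function names are hypothetical, and treating every other OS target like the non-STIG case is an assumption.

```python
import re


def stig_hardened(os_target, kubernetes_version):
    """True for photon-3 images built for Kubernetes 1.25 or newer."""
    match = re.match(r"v?(\d+)\.(\d+)", kubernetes_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {kubernetes_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return os_target == "photon-3" and (major, minor) >= (1, 25)


def expected_sshd_state(os_target, kubernetes_version):
    """Return the values a check for sshd.service would expect."""
    # Assumption: only STIG-hardened photon-3 images leave sshd stopped.
    running = not stig_hardened(os_target, kubernetes_version)
    return {"enabled": True, "running": running}


if __name__ == "__main__":
    print(expected_sshd_state("photon-3", "v1.25.7+vmware.3-fips.1"))
```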

[Documentation] Failed to prepare build: "vsphere" when following image builder tutorial

Attempting to follow the tutorial in docs/examples "Tutorial for using vSphere Tanzu Kubernetes Image Builder" (https://github.com/vmware-tanzu/vsphere-tanzu-kubernetes-grid-image-builder/blob/main/docs/examples/tutorial_building_an_image.md). Prereqs are met, running vSphere 8.0.1, etc.

Everything in the tutorial works fine up until the make build-node-image step. The build process fails to create any artifacts and returns an error after packer begins Preparing build: vsphere:

Docker container log:

Error: Failed to prepare build: "vsphere"

4 error(s) occurred: 

* 'vcenter_server' is required
* 'username' is required
* 'password' is required
* 'host' or 'cluster' is required

This is despite the fact that all of the above fields are accurately defined in the packer-variables/vsphere.j2 configuration file, as are the ARTIFACTS_CONTAINER_IP and ARTIFACTS_CONTAINER_PORT parameters in the make build-node-image command.

I am wondering if it is failing to locate/parse the packer-variables/vsphere.j2 configuration file properly.

Update the Project name in the docs

Describe the bug

Update the Project name in README, Code of Conduct, contributing, and tutorial docs.

Reproduction steps

NA

Expected behavior

NA

Additional context

No response

Update apparmor parameter in Photon Boot config

Describe the bug

Currently, /boot/photon.cfg contains both the apparmor=0 and apparmor=1 flags. apparmor=0 is set by the upstream image-builder, while the downstream code sets apparmor=1, so both flags end up in the file.

Reproduction steps

Build a Photon image, create a cluster, and check /boot/photon.cfg.

Expected behavior

/boot/photon.cfg should contain only apparmor=1
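
A quick way to verify the expected behavior is to scan the file content for apparmor= assignments. The sketch below works on the text of the file. The function names are hypothetical, and the sample line in the usage example is illustrative rather than copied from a real photon.cfg.

```python
import re


def apparmor_values(cfg_text):
    """Return every value assigned to apparmor= in the given config text, in order."""
    return re.findall(r"\bapparmor=(\S+)", cfg_text)


def has_single_apparmor_enabled(cfg_text):
    """True only when apparmor= appears exactly once and is set to 1."""
    return apparmor_values(cfg_text) == ["1"]


if __name__ == "__main__":
    # Illustrative line with the conflicting flags described in this issue.
    sample = "photon_cmdline=ro apparmor=0 quiet apparmor=1"
    print(apparmor_values(sample), has_single_apparmor_enabled(sample))
```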

Additional context

No response

Add pos flag `pos=1` to kernel parameters to mitigate regression in Pod's Datapath Performance on TKG

Describe the bug

Currently, there's a regression in Pod's Datapath performance for the TKG 2.0 TKRs:
  • Redis throughput degrades by 55%
  • intra-node TCP bandwidth degrades by 44%
  • intra-node UDP throughput degrades by 57%

Reproduction steps

  1. Build Photon image, Deploy cluster and check /boot/photon.cfg
  2. To check the throughput, deploy a redis pod using
    https://github.com/vmware-tanzu/k-bench/blob/master/config/dp_redis/redis_pod.yaml
  3. Run the following command:
mkdir /tmp/redisoutput; redis-server > /tmp/redisoutput/redisserver.out 2> /tmp/redisoutput/redisserver.err &
cd /memtier_benchmark; ./memtier_benchmark

Expected behavior

Pod's Datapath Performance on TKG should not degrade when deployed with TKG2.0

Additional context

No response

Auto Clean VM on failed builds

When a build fails, the VM is preserved, which is good for debugging. However, for simple failures it would be nice to have an option to automatically clean up the VM so that the next run can succeed. Right now, the user has to manually delete the VM, clean up the container, and then re-run the build command.

Fix the logic for updating the secretRef using update_cbt function's log in tkg_byoi.py and remove the --rm flag in docker start command

  1. With the change in schema for ClusterBootstrapTemplate.yaml,
    the logic for updating the secretRef present in additionalPackages in the cbt config needs to be updated.

Currently, the code expects that if an entry in additionalPackages has valuesFrom as a key, it also has secretRef as a child key:

cbt_data["spec"]["additionalPackages"][index]["valuesFrom"]["secretRef"].replace(old_tkr_name,new_tkr_name)

This does not hold true for some additional packages, for example:

valuesFrom:
      inline:
        abcController:
          securityContext:
            seccompProfile:
              type: RuntimeDefault
  2. With containerd version v1.6.18 and above, the --rm and -d flags can't be used together in the ctr command. The --rm flag needs to be removed from the ctr command.
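
For the first item, a more defensive update could look like the sketch below. It is not the actual code from tkg_byoi.py. The function name `rename_secret_refs` is hypothetical, and it assumes that `cbt_data` is the parsed ClusterBootstrapTemplate as nested dicts and lists and that a secretRef, when present, is a plain string.

```python
def rename_secret_refs(cbt_data, old_tkr_name, new_tkr_name):
    """Replace old_tkr_name in every string secretRef under spec.additionalPackages.

    Packages without valuesFrom, or whose valuesFrom holds something other than
    a secretRef (for example inline values), are left untouched.
    """
    for package in cbt_data.get("spec", {}).get("additionalPackages", []):
        values_from = package.get("valuesFrom")
        if not isinstance(values_from, dict):
            continue
        secret_ref = values_from.get("secretRef")
        if isinstance(secret_ref, str):
            # str.replace returns a new string, so the result is assigned back.
            values_from["secretRef"] = secret_ref.replace(old_tkr_name, new_tkr_name)
    return cbt_data


if __name__ == "__main__":
    sample = {"spec": {"additionalPackages": [
        {"valuesFrom": {"secretRef": "old-tkr-package"}},
        {"valuesFrom": {"inline": {"abcController": {}}}},
    ]}}
    print(rename_secret_refs(sample, "old-tkr", "new-tkr"))
```

Checking for the key before touching it means entries that use inline values pass through unchanged instead of raising a KeyError.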

STIG support for Photon TKR versions 1.25 and above

Consume the photon-3-stig-hardening.tar.gz from the artifacts bundle,
and deliver Photon TKRs for kubernetes version 1.25 and above to be STIG compliant.

This would be an enhancement for the Photon TKRs.
Photon VA hardening should be disabled, since the TKRs would be hardened using STIG.

[TKG]Improve documentation about the TKR suffix

Describe the bug

Improve documentation about the TKR suffix

  • How it affects the TKR versioning if two images are built on the same Kubernetes release?
  • Any known limitations to the TKR suffix like max character length etc.

Reproduction steps

NA

Expected behavior

NA

Additional context

No response

Ability to build Photon/Ubuntu images on Mac/Linux with multiple interfaces

History

When building Photon images, Packer starts an HTTP server during the build process and hosts the preseed file. Packer picks the localhost IP and uses a random available port between 8000 and 9000. On a system that has multiple interfaces, Packer can bind to an IP where there is no connection.

  • The Packer HTTP port can be configured using http_port_max and http_port_min.
  • The http_ip Packer setting can be used to make Packer use an IP that is reachable from the vCenter/VM (this requires an upstream change to add http_ip in kubernetes-sigs image-builder).

Approach

When building the TKR we can take Packer HTTP port as an input parameter and by default use ARTIFACTS_CONTAINER_IP as the http_ip.

packer-http-config.j2

{
    "http_port_max": "{{ packer_http_port }}",
    "http_port_min": "{{ packer_http_port }}",
    "http_ip": "{{ artifacts_container_ip }}"
}
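
To make the effect of the template concrete, the sketch below produces the same three settings from plain Python instead of Jinja. The function name is hypothetical and the port range check is an added assumption; the keys and the idea of setting the minimum and maximum to the same port come from the template above.

```python
import json


def packer_http_config(packer_http_port, artifacts_container_ip):
    """Build the settings that pin Packer's HTTP server to one port and one IP."""
    port = int(packer_http_port)
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {packer_http_port}")
    return {
        # Equal min and max leave Packer a single port to choose from.
        "http_port_max": str(port),
        "http_port_min": str(port),
        "http_ip": artifacts_container_ip,
    }


if __name__ == "__main__":
    print(json.dumps(packer_http_config(8082, "1.2.3.4"), indent=4))
```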

Add Support statement

Add BYOI Cluster VMware Support statement.

Below is the Support statement that needs to be added to the repo.

VMware will fully support issues with BYOI, but the customer is responsible for any issues relating to their custom application/customizations. Customers can open support tickets regarding their BYOI Clusters; however, VMware support will be limited to best effort support basis only, with support having full discretion of how much effort to put in to troubleshooting. Upon opening a case with VMware Support regarding any issue with a BYOI Cluster, VMware Support asks that the customer supplies support with the exact changes made to the base image.

[TKG Documentation]Add a sample customization doc on how to add new packages and configure sources

Is your feature request related to a problem? Please describe.

Add a sample customization document on how to add new OS packages like pciutils and also add examples on how to configure sources or repos for both Photon and Ubuntu using packer variables provided by Kubernetes image builder

Describe the solution you'd like

NA as this is an issue created for Documentation improvement.

Describe alternatives you've considered

No response

Additional context

No response

Question on customization

Is it possible to edit the hardware options of the image? Specifically I'm interested in 'Expose hardware assisted virtualization to the guest OS' in the VM. Or would that be modified somewhere else?

Thanks,
John

Add 1.26.5 TKR Support

Add 1.26 TKR Support. Update the supported-versions.json with the new artifacts bundle for 1.26

Delete the STIG ansible role after the image build process

When building the 1.25.7 image, I observed that the tanzu-compliance folder remains in the ansible folder. Raising this issue to remove the tanzu-compliance folder once the Photon 1.25 image build has completed or failed.

kosarajud@kosarajudDXJW3 vsphere-tanzu-kubernetes-grid-image-builder % tree ansible -L 1               
ansible
├── defaults
├── files
├── tanzu-compliance --------> Needs to be removed
├── tasks
└── templates

6 directories, 0 files

[TKG]Support custom k8s image builder commit ID per supported Kubernetes version

Is your feature request related to a problem? Please describe.

As we proceed to support multiple Kubernetes versions with different OS support, there is a possibility that the upstream Kubernetes image builder stops supporting some of the older Kubernetes versions/OSes. Because we maintain a single upstream Kubernetes image builder commit ID for all the supported Kubernetes versions, this will cause failures.

Describe the solution you'd like

To avoid any breaking changes, we propose a separate field in supported-versions.json so that a custom upstream Kubernetes image builder commit ID can be passed per supported Kubernetes version. As a further enhancement, we can pass all the key-value pairs as docker variables to the image-builder container.

Example supported-versions.json

{
    "v1.24.9+vmware.1" : {
        "supported_os": ["photon-3", "ubuntu-2004-efi"],
        "artifacts_image": "projects-stg.registry.vmware.com/tkg/tkg-vsphere-linux-resource-bundle:v1.24.9_vmware.1-tkg.1",
        "extra_params": {
            "image_builder_commit": "<commit_hash>"
        }
    },
    "v1.25.9+vmware.1" : {
        "supported_os": ["photon-3", "ubuntu-2204-efi"],
        "artifacts_image": "projects-stg.registry.vmware.com/tkg/tkg-vsphere-linux-resource-bundle:v1.25.9_vmware.1-tkg.1",
        "extra_params": {
            "image_builder_commit": "<different_commit_hash>"
        }
    }
}
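
Reading the proposed field could look like the sketch below. It assumes the `extra_params.image_builder_commit` layout from the example above; the function name and the idea of falling back to a single default commit ID when the field is absent are illustrative assumptions.

```python
def image_builder_commit(supported_versions, kubernetes_version, default_commit):
    """Return the per-version image builder commit, or default_commit if none is set."""
    entry = supported_versions[kubernetes_version]
    return entry.get("extra_params", {}).get("image_builder_commit", default_commit)


if __name__ == "__main__":
    versions = {
        "v1.24.9+vmware.1": {"extra_params": {"image_builder_commit": "<commit_hash>"}},
        "v1.25.9+vmware.1": {},
    }
    for version in versions:
        print(version, image_builder_commit(versions, version, "<default_commit_hash>"))
```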

Describe alternatives you've considered

No response

Additional context

No response

Use 1.25.7 Kubernetes version as the key in the supported versions json file

What steps did you take and what happened:
When I ran make list-versions, it showed the 1.25 K8s version as v1.25.7+vmware.3, but after the OVA was generated, the OVA name was ubuntu-2004-amd64-v1.25.7---vmware.3-fips.1-delete.ova. We use <OS>-<version>-<arch>-<TKR k8s version>-<suffix>.ova, so the version should be v1.25.7+vmware.3-fips.1 instead of v1.25.7+vmware.3.

What did you expect to happen:
k8s version for 1.25.7 should be v1.25.7+vmware.3-fips.1
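
The naming pattern described above can be sketched as follows. The function name is hypothetical, and the rule that "+" in the version is written as "---" is inferred from the example OVA name in this issue rather than taken from the build scripts.

```python
def ova_name(os_name, os_version, arch, tkr_k8s_version, suffix):
    """Compose <OS>-<version>-<arch>-<TKR k8s version>-<suffix>.ova."""
    # Inferred from the example name: "+" in the version appears as "---".
    safe_version = tkr_k8s_version.replace("+", "---")
    return f"{os_name}-{os_version}-{arch}-{safe_version}-{suffix}.ova"


if __name__ == "__main__":
    print(ova_name("ubuntu", "2004", "amd64", "v1.25.7+vmware.3-fips.1", "delete"))
```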

Now

kosarajud@kosarajudDXJW3 vsphere-tanzu-kubernetes-grid-image-builder % make list-versions
            Kubernetes Version  |  Supported OS
              v1.24.9+vmware.1  |  [photon-3,ubuntu-2004-efi]
              v1.25.7+vmware.3  |  [photon-3,ubuntu-2004-efi]

 Hint: Use "make run-artifacts-container KUBERNETES_VERSION=<version>" to run the artifacts container.

Expected

kosarajud@kosarajudDXJW3 vsphere-tanzu-kubernetes-grid-image-builder % make list-versions
            Kubernetes Version  |  Supported OS
              v1.24.9+vmware.1  |  [photon-3,ubuntu-2004-efi]
       v1.25.7+vmware.3-fips.1  |  [photon-3,ubuntu-2004-efi]

 Hint: Use "make run-artifacts-container KUBERNETES_VERSION=<version>" to run the artifacts container.

Is there anything else you would like to add?
I looked at the code: we don't use this data apart from checking [1] whether the k8s version is greater than 1.25 to decide whether to enable the STIG role. So this should be a cosmetic change (better UX, as it makes clear that this is a FIPS version).

[1] https://github.com/vmware-tanzu/vsphere-tanzu-kubernetes-grid-image-builder/blob/main/build-ova.sh#L72

Please tell us about your environment.

                    |  Value                  |  How to Obtain
Commit ID           |  a21891e                |  Run git log -1
Kubernetes Version  |  1.25.7                 |  Kubernetes version that you are trying to build the image
OS Type             |  Photon-3/Ubuntu-20.04  |  OS Type and version that you are trying to build the image

Pull networkd-dispatcher from the artifacts bundle

Describe the bug

networkd-dispatcher is pulled from the internet (https://gitlab.com/craftyguy/networkd-dispatcher/-/archive/2.1/networkd-dispatcher-2.1.tar.bz2), and there is a possibility that this URL might expire or change. To avoid failures and to support air-gapped scenarios, pull the networkd-dispatcher service from the artifacts bundle. This can be done by setting networkd_dispatcher_download_url in the ansible_user_vars packer variable.

Refer to the upstream Kubernetes image builder code that downloads and installs networkd-dispatcher.

Reproduction steps

Build Photon image

Expected behavior

NA

Additional context

No response

Add logging for error cases while building the node image

Building the node image does not log error conditions; it simply stops without providing any information about the actual error.
For example, if we build with an incorrect HOST_IP parameter:

$ make build-node-image KUBERNETES_VERSION=v1.26.5+vmware.2-fips.1 OS_TARGET=photon-3 TKR_SUFFIX=byoi IMAGE_ARTIFACTS_PATH=$HOME/images HOST_IP=1.2.3.4
sha256:d64a60cc9a0937d1e03796809e3386aaff2ef2bc26b1f978c1f61a3edb49a4a9
Using default port for artifacts container 8081
Using default Packer HTTP port 8082
v1.26.5---vmware.2-fips.1-photon-3-image-builder
620dce058b414f31f7820c215d7cbfaf2cd5049eaf2b59533d4018ca656ba7ed

 Hint: Use "docker logs -f v1.26.5---vmware.2-fips.1-photon-3-image-builder" to see logs and status
 Hint: Node Image OVA can be found at /root/images/ovas

The docker logs only contain the following messages:

$ docker logs -f v1.26.5---vmware.2-fips.1-photon-3-image-builder
+ image_builder_root=/image-builder/images/capi
+ default_packer_variables=/image-builder/images/capi/image/packer-variables/
+ packer_configuration_folder=/image-builder/images/capi
+ tkr_metadata_folder=/image-builder/images/capi/tkr-metadata/
+ custom_ovf_properties_file=/image-builder/images/capi/custom_ovf_properties.json
+ artifacts_output_folder=/image-builder/images/capi/artifacts
+ ova_destination_folder=/image-builder/images/capi/artifacts/ovas
+ photon3_stig_compliance=false
+ main
+ copy_custom_image_builder_files
+ cp image/hack/tkgs-image-build-ova.py hack/image-build-ova.py
+ cp image/hack/tkgs_ovf_template.xml hack/ovf_template.xml
+ download_configuration_files
+ wget -q http://1.2.3.4:8081/artifacts/metadata/kubernetes_config.json

From the above logs, it is not evident why the build did not generate any OVA.
The ask is to provide relevant error log messages for any failures reported in build-ova.sh.
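
build-ova.sh is a shell script, so the following Python sketch is only an illustration of the requested behavior, not a patch for that script: when a download fails, log which URL failed and why before stopping. The function name, logger name, and hint text are hypothetical, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import logging
import urllib.error
import urllib.request

log = logging.getLogger("build-ova")


def download(url, timeout=30, opener=urllib.request.urlopen):
    """Fetch url and return its bytes; log a clear error and re-raise on failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        log.error(
            "Failed to download %s: %s (check HOST_IP and ARTIFACTS_CONTAINER_PORT)",
            url, err,
        )
        raise
```

With a message like this in the container log, an unreachable HOST_IP would be visible right after the wget step instead of the output ending silently.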

In addition to the above, there is a minor request to update the README with the HOST_IP parameter description, which is available in the make help command.

Add 1.25.7 TKR Support

Add 1.25.7 TKR Support. Update the supported-versions.json with the new artifacts bundle for 1.25.7

[TKG] Add support for 1.24.9 TKR and remove 1.22.13

Is your feature request related to a problem? Please describe.

Add support for building 1.24.9 TKR TKG on the vSphere node image and remove 1.22.13 TKR support.

Describe the solution you'd like

NA

Describe alternatives you've considered

No response

Additional context

No response

make build-node-image fails with Ansible requires 3.9.0 or greater.

What steps did you take and what happened:

# make build-node-image OS_TARGET=ubuntu-2004-efi KUBERNETES_VERSION=v1.26.5+vmware.2-fips.1 TKR_SUFFIX=gpu ARTIFACTS_CONTAINER_IP=192.168.30.95 IMAGE_ARTIFACTS_PATH=/root/image ARTIFACTS_CONTAINER_PORT=8081
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon   31.2MB
Step 1/26 : FROM photon:3.0
 ---> 42ffa214b8d3
<snip>
....

Cloning into 'image-builder'...
Removing intermediate container 1324a8a3ce48
 ---> fe4910d02848
Step 18/26 : WORKDIR $IMAGE_BUILDER_REPO_NAME
 ---> Running in 994e9b92f18b
Removing intermediate container 994e9b92f18b
 ---> c3f9ed5b598f
Step 19/26 : WORKDIR images/capi
 ---> Running in 2b8c0347c5d3
Removing intermediate container 2b8c0347c5d3
 ---> 816a749a884b
Step 20/26 : RUN make deps-ova
 ---> Running in 86e1292f295f
hack/ensure-python.sh
Checking if python is available
Detected python version: Python 3.7.5.
Ansible requires 3.9.0 or greater.
Please install 3.9.0 or later.
make: *** [Makefile:59: deps-common] Error 2
The command '/bin/bash -c make deps-ova' returned a non-zero code: 2
make: *** [Makefile:69: build-image-builder-container] Error 2

What did you expect to happen:
Build to be successful

Please tell us about your environment.

[Documentation]Add the docker, jq, and make versions

Add the docker, jq, and make versions to the documentation, as in some environments the installed docker/jq/make versions are very old and the commands do not work as expected.

Attaching the snippet with the versions where everything is working fine.
Versions
