metal3-io / project-infra
Metal3 testing infrastructure configuration
Home Page: https://prow.apps.test.metal3.io
License: Apache License 2.0
As of now, project-infra only has the check-prow-config tests as default and mandatory checks; other integration tests can be triggered as needed. We should add a YAML linter job to CI that catches unexpected formatting, stray blank spaces, and similar problems in YAML files that are changed or added.
This is needed because there have been multiple cases where changes to Prow config files left badly formatted YAML (mostly extra blank spaces) that was merged unnoticed, resulting in unformatted, hard-to-read Kubernetes resource definitions being created in the Prow cluster.
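The kind of check such a linter job performs can be sketched as follows. This is only an illustration of the idea, not the proposed CI job: the function name find_whitespace_issues is hypothetical, and in practice an existing linter such as yamllint would likely be used instead of hand-written code.

```python
from typing import List, Tuple


def find_whitespace_issues(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for whitespace problems in YAML text.

    Hypothetical helper for illustration; it only looks at whitespace and
    does not parse or validate the YAML itself.
    """
    issues = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        # Extra blank space at the end of a line (also flags whitespace-only lines).
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # YAML does not allow tab characters for indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            issues.append((number, "tab in indentation"))
    # Flag a non-empty file whose last line is not newline-terminated.
    if text and not text.endswith("\n"):
        issues.append((len(lines), "no newline at end of file"))
    return issues
```

A CI job could run a check like this over every changed YAML file and fail the pull request if the returned list is not empty.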
These parameters are always set to either ubuntu or centos, with the difference that TARGET_NODE_OS has it capitalized and DISTRIBUTION does not.
TARGET_NODE_OS=Ubuntu
DISTRIBUTION=ubuntu
We never mix them (for example, Ubuntu for one and centos for the other), which means that we could drop one of them. This would simplify the pipelines and reduce the number of variables. Note that TARGET_NODE_OS is translated to IMAGE_OS in the pipeline env.
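Since the two values differ only in capitalization, one could be derived from the other. A minimal sketch, assuming only the two values named above are supported; the function name and the SUPPORTED set are hypothetical and not part of the existing pipelines:

```python
# Values taken from the issue text; anything else is rejected in this sketch.
SUPPORTED = {"ubuntu", "centos"}


def distribution_from_target_os(target_node_os: str) -> str:
    """Derive the lowercase DISTRIBUTION value from TARGET_NODE_OS."""
    distribution = target_node_os.strip().lower()
    if distribution not in SUPPORTED:
        raise ValueError(f"unsupported OS: {target_node_os!r}")
    return distribution
```

With a helper like this, a pipeline would only need to set TARGET_NODE_OS and compute the other value where it is needed.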
To get UEFI to work for metal3-dev-env on Ubuntu, we need a recent version of libvirt that is available in Focal but not in Xenial. We should therefore upgrade Ubuntu to support this new feature.
The markdownlint image used in our tests does not honor markdownlint-disable rules in the form we are currently using them. They used to work, but as noted in metal3-io/community#3, we actually have a global ignore in each of the repos where markdownlint-disable is used in the documents, so the usage is masked out right now.
Recently, we have started experiencing Docker rate limit issues in our CI runs and in integration test runs on pull requests:
toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
make: *** [Makefile:4: install_requirements] Error 1
Possible solutions, as discussed in the Metal3 community meeting, could be:
When prow leaves a comment on a PR about a test failure, the "Details" link just links back to the PR itself instead of to something more useful. I suspect this is missing configuration of a base URL somewhere.
Example: #14
It would be useful to collect BareMetalHost, Machine, and Node YAML as part of the artifacts created in CI.
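One way this collection step could look is sketched below. It assumes kubectl is available and already pointed at the test cluster; the resource names, the artifact directory, and the function names are illustrative assumptions rather than existing CI code. The runner parameter exists only so the logic can be exercised without a cluster.

```python
import subprocess
from pathlib import Path
from typing import Dict, List, Sequence

# Plural resource names assumed for BareMetalHost, Machine, and Node.
RESOURCES = ("baremetalhosts", "machines", "nodes")


def build_commands(resources: Sequence[str] = RESOURCES) -> Dict[str, List[str]]:
    """Map each resource to a kubectl command that dumps it as YAML."""
    return {
        resource: ["kubectl", "get", resource, "--all-namespaces", "-o", "yaml"]
        for resource in resources
    }


def collect(artifact_dir: Path, runner=subprocess.run) -> List[Path]:
    """Write one <resource>.yaml file per resource into artifact_dir."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for resource, command in build_commands().items():
        # check=False: a missing CRD should not abort log collection.
        result = runner(command, capture_output=True, text=True, check=False)
        target = artifact_dir / f"{resource}.yaml"
        target.write_text(result.stdout)
        written.append(target)
    return written
```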
test
Currently everything that runs on CentOS images is failing, because the base image has RPM packages that conflict with newer packages. As a result, everything related to upgrading fails in the tests, such as sudo dnf upgrade -y.
https://jenkins.nordix.org/blue/organizations/jenkins/metal3-centos-e2e-integration-test-main/detail/metal3-centos-e2e-integration-test-main/150/pipeline/
The error is the following:
[2024-05-07T05:29:39.094Z] Error:
[2024-05-07T05:29:39.094Z] Problem 1: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - cannot install the best update candidate for package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z] - cannot install the best update candidate for package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z] Problem 2: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - problem with installed package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z] - problem with installed package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z] - package selinux-policy-targeted-38.1.35-2.el9.noarch from @System requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package selinux-policy-targeted-38.1.35-2.el9.noarch from baseos requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package nbdkit-1.36.2-1.el9.x86_64 from @System requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package nbdkit-1.36.2-1.el9.x86_64 from appstream requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from @System
[2024-05-07T05:29:39.095Z] - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from baseos
[2024-05-07T05:29:39.095Z] - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from @System
[2024-05-07T05:29:39.095Z] - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from appstream
[2024-05-07T05:29:39.095Z] - cannot install the best update candidate for package selinux-policy-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.095Z] - cannot install the best update candidate for package nbdkit-basic-filters-1.36.2-1.el9.x86_64
In order to prevent issues of old branches not including some fix commits, and to test the commit as if merged, the CI should rebase all commits on top of the target branch before running the tests.
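The rebase step could be expressed as the following git commands. This is a sketch under assumptions: the remote name origin and the function name are hypothetical, and the actual CI may prefer a different strategy (for example, a merge instead of a rebase).

```python
from typing import List


def rebase_commands(target_branch: str, remote: str = "origin") -> List[List[str]]:
    """Return the git commands that replay the PR commits on the target branch."""
    return [
        # Fetch the current tip of the target branch; git records it as FETCH_HEAD.
        ["git", "fetch", remote, target_branch],
        # Replay the checked-out PR commits on top of that tip.
        ["git", "rebase", "FETCH_HEAD"],
    ]
```

Each command list can be passed to subprocess.run with check=True so that a rebase conflict fails the job before any tests start.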
When building an image with DiB, it accepts an environment variable called DIB_DEV_USER_AUTHORIZED_KEYS. This takes a file path, and the file is copied to the created image as the authorized_keys file for the user defined in the DIB_DEV_USER_USERNAME environment variable.
To rotate the current key, a new ed25519 key should be generated and added to the authorized keys file so that it is accepted for login. Once the new key has been validated as working, the old key should be rotated out and removed.
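The two-phase rotation described above can be sketched as plain edits to the contents of the authorized keys file. The helper names are hypothetical, and the key strings in the usage note are placeholders, not real keys; generating the new key itself (for example with ssh-keygen -t ed25519) happens outside this sketch.

```python
def add_key(authorized_keys: str, new_key: str) -> str:
    """Phase 1: append the new public key so both old and new keys are accepted."""
    lines = [line.strip() for line in authorized_keys.splitlines() if line.strip()]
    if new_key.strip() not in lines:
        lines.append(new_key.strip())
    return "\n".join(lines) + "\n"


def remove_key(authorized_keys: str, old_key: str) -> str:
    """Phase 2: drop the old public key once the new one is validated."""
    lines = [
        line.strip()
        for line in authorized_keys.splitlines()
        if line.strip() and line.strip() != old_key.strip()
    ]
    return "\n".join(lines) + "\n" if lines else ""
```

For example, starting from a file containing only a placeholder old key "ssh-rsa AAAAold ci@example", calling add_key with "ssh-ed25519 AAAAnew ci@example" yields a file with both keys, and a later remove_key call with the old key leaves only the new one.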
Right now only a small group has access to the CI cluster itself. It would be nice to allow access to anyone in the metal3-io GitHub org, at least read-only access to the test-pods namespace, to view the Pods for test jobs and to inspect their logs directly.
We don't clean /tmp/manifests. This is not normally a problem, as we start from a fresh VM every time, but for BML it builds up.
Addresses this.
This has already been done in CAPM3 and IPAM. Check these repos for reference.
To test transfer issue plugin
When trying to build the ci-image with the Jenkins pipeline, there is a conflict between the RPM packages nbdkit and selinux-policy-targeted. The same error was seen in the CentOS tests when setting up metal3-dev-env (#738).
[2024-05-07T05:29:39.094Z] Error:
[2024-05-07T05:29:39.094Z] Problem 1: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - cannot install the best update candidate for package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z] - cannot install the best update candidate for package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z] Problem 2: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - problem with installed package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z] - problem with installed package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z] - package selinux-policy-targeted-38.1.35-2.el9.noarch from @System requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package selinux-policy-targeted-38.1.35-2.el9.noarch from baseos requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package nbdkit-1.36.2-1.el9.x86_64 from @System requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - package nbdkit-1.36.2-1.el9.x86_64 from appstream requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z] - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from @System
[2024-05-07T05:29:39.095Z] - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from baseos
[2024-05-07T05:29:39.095Z] - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from @System
[2024-05-07T05:29:39.095Z] - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from appstream
[2024-05-07T05:29:39.095Z] - cannot install the best update candidate for package selinux-policy-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.095Z] - cannot install the best update candidate for package nbdkit-basic-filters-1.36.2-1.el9.x86_64
A workaround was applied to metal3-dev-env to get the tests passing, but the issue should ultimately be fixed in the image build, and/or we should monitor whether an upstream fix for the RPM package conflict becomes available. After that, the workaround could be reverted to make sure that we are using the latest packages.
Add transfer-issue plugin to Prow so that we can move issues from one repo to another via Prow instead of closing and re-opening it. Ref: https://prow.k8s.io/command-help#transfer_issue
/kind feature
/priority important-longterm
Currently, the retention policy saves the last 5 images, but this isn't very safe. The process to take a new image is manual and lacks visibility into which image is currently in use. You have to be a Jenkins Admin to see and change the image for CI, which means someone with triggering rights to the build can start it without knowing they might erase the actively used image from OpenStack.
This problem also applies to node images. However, everyone currently has visibility into both Artifactory and the dev-env code, allowing them to see what image is used and understand how a new trigger will affect it.
To address these issues, we need to ensure that the actively used image is never deleted. We also need a way to ensure that if the active image is changed, the new image will work properly through some testing. Additionally, any changes to an image build should be testable in the PR before merging.
A potential solution could involve having the active image with a separate naming convention from the candidate images. For promotion, there would be a pipeline that takes a candidate image as input, runs tests on it, and if the tests pass, automatically changes the active image to the candidate.
Note: Jenkins also offers an Artifactory plugin which supports promotion logic out of the box and could be investigated. It is not clear whether the same exists for the OpenStack plugin.
By implementing these changes, we can increase the reliability and safety of our image retention process, improve coordination among team members triggering builds, and reduce the risk of active image overwriting and build failures. Testing new images before they become active will ensure they are reliable and functional, providing a smoother and more predictable CI/CD process.
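The core retention requirement, that the actively used image is never deleted, can be sketched as a small selection function. The function name, the image names, and the idea of passing images as an oldest-first list are assumptions for illustration; the real cleanup would query OpenStack for the image list.

```python
from typing import List


def images_to_delete(images_oldest_first: List[str], active: str, keep: int = 5) -> List[str]:
    """Select images to remove: everything except the newest `keep` and the active one."""
    newest = set(images_oldest_first[-keep:]) if keep > 0 else set()
    return [
        name
        for name in images_oldest_first
        # The active image is protected even when it is older than the newest `keep`.
        if name not in newest and name != active
    ]
```

With eight hypothetical images ci-image-1 through ci-image-8, keep=5, and ci-image-2 as the active image, the function selects ci-image-1 and ci-image-3 for deletion and leaves ci-image-2 in place.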
Currently we only collect logs from the "normal" containers in each Pod (i.e. the ones defined in spec.containers). It would be good to also get the logs for init containers, since they can sometimes get stuck.
Make the run_fetch_logs.sh script collect logs from spec.initContainers as well.
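The change amounts to iterating over both container lists of each Pod. The sketch below shows that logic in Python on a Pod object loaded as a dictionary; it is not the contents of run_fetch_logs.sh, and the function names and example Pod are hypothetical.

```python
from typing import Dict, List


def container_names(pod: Dict) -> List[str]:
    """List init containers first, then regular containers, for one Pod."""
    spec = pod.get("spec", {})
    names = []
    for field in ("initContainers", "containers"):
        # A Pod without init containers simply has no initContainers field.
        for container in spec.get(field) or []:
            names.append(container["name"])
    return names


def log_commands(pod: Dict, namespace: str) -> List[List[str]]:
    """Build one kubectl logs command per container in the Pod."""
    pod_name = pod["metadata"]["name"]
    return [
        ["kubectl", "logs", "-n", namespace, pod_name, "-c", name]
        for name in container_names(pod)
    ]
```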
Prow config is thousands of lines, and hard to manage and read. Split it in the style of https://github.com/kubernetes/test-infra/tree/master/config/jobs
We decided to use dynamic Jenkins workers.
Here is the progress of the move.
Done:
✅ clusterctl upgrade tests
✅ feature tests
✅ dev env integration tests
✅ e2e basic
✅ e2e_integration tests
✅ k8s_upgrade tests
✅ ephemeral tests
✅ bmo e2e
In progress:
Todo:
⚪ job periodic clean
⚪ Nordix clone
⚪ fullstack build
Note! We are also splitting the pipelines for dev_env and e2e_tests; for feature tests we already have a separate pipeline.
*.apps.ci.metal3.io uses an SSL certificate from Let's Encrypt, but api.ci.metal3.io is still using a self-signed SSL cert.
https://docs.openshift.com/container-platform/4.1/authentication/certificates/api-server.html
The CI cluster is currently running a nightly prerelease of OpenShift 4.2. Once 4.2 is officially released, we should upgrade the cluster to an official release.
We have travis-ci running some jobs against baremetal-operator and cluster-api-provider-baremetal. We should be able to replace those with prow jobs.
Now that we have moved most of the JJB jobs and pipelines to trigger tests from Prow, this issue reports the current status and states the next TODOs:
🟢 job_capm3_e2e_basic_tests.yml
🟢 job_capm3_e2e_clusterctl_upgrade_tests_prow.yml
🟢 job_capm3_e2e_feature_tests_prow.yml
🟢 job_capm3_e2e_integration_tests_prow.yml
🟢 job_capm3_e2e_k8s_upgrade_tests_prow.yml
🟢 job_capm3_periodic_e2e_clusterctl_upgrade_tests_prow.yml
🟢 job_capm3_periodic_e2e_feature_tests_prow.yml
🟢 job_capm3_periodic_e2e_integration_tests_prow.yml
🟢 job_capm3_periodic_e2e_k8s_upgrade_tests_prow.yml
🟢 job_periodic_clean.yml
🟢 job_capm3_periodic_e2e_ephemeral_tests.yml
🟢 job_capm3_periodic_integration_tests.yml
🟢 job_dev_env_integration_tests.yml
🟢 job_integration_tests.yml
🟡 job_artifact_cleanup.yml
🟡 job_bml_integration_tests.yml
🟡 job_bml_periodic_integration_tests.yml
🟡 job_ci_image_building.yml
🟡 job_container_image_building.yaml
🟡 job_docker_image_building.yml
🟡 job_fullstack_building_test.yml
🟡 job_fullstack_project-infra_building_test.yml
🟡 job_ironic_image_build_test.yml
🟡 job_metal3_dev_tools_integration_test.yml
🟡 job_openstack_image_building.yml
🟡 job_openstack_node_image_building.yml
🟡 job_periodic_fullstack_building.yml
🟡 job_update_nordix_repos.yml
🟢 CAPM3 https://github.com/metal3-io/cluster-api-provider-metal3
🟢 IPAM https://github.com/metal3-io/ip-address-manager
🟢 BMO https://github.com/metal3-io/baremetal-operator
🟢 DEV_ENV https://github.com/metal3-io/metal3-dev-env
🟢 Project-infra https://github.com/metal3-io/project-infra
🟢 Ironic-image https://github.com/metal3-io/ironic-image
🟢 Mariadb-image https://github.com/metal3-io/mariadb-image
🟢 ironic-ipa-downloader https://github.com/metal3-io/ironic-ipa-downloader
Prow is deployed, but we have no upgrade strategy in place to keep it up to date. We should fix that.