project-infra's People

Contributors

adilghaffardev, derekhiggins, dhellmann, digambar15, dtantsur, elfosardo, fmuyassarov, furkatgofurov7, honza, huutomerkki, jaakko-os, jan-est, kashifest, lentzi90, macaptain, maelk, mboukhalfa, metal3-io-bot, mikkosest, mquhuy, namnx228, nymanrobin, peppi-lotta, rozzii, russellb, smoshiur1237, stbenjam, sunnatillo, tuminoid, wgslr

project-infra's Issues

Add yaml linter job to CI

As of now, the only default, mandatory tests in project-infra are the check-prow-config tests; other integration tests can be triggered as needed. We should add a YAML linter job to CI that catches unexpected formatting, stray whitespace, and similar problems in YAML files that are changed or added.
This is needed because there have been multiple cases where badly formatted YAML (mostly extra whitespace) in changes to the prow config files was merged unnoticed, resulting in unformatted, hard-to-read k8s resource YAML definitions being created in the prow cluster.
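
As a rough illustration of the kind of check such a job could run, here is a minimal sketch. The helper name find_whitespace_issues is hypothetical and not part of project-infra, and it only flags trailing whitespace and tab indentation; a real job would more likely call an existing linter such as yamllint.

```python
def find_whitespace_issues(text):
    """Return (line_number, message) pairs for simple whitespace problems.

    Hypothetical helper for illustration; not part of project-infra.
    """
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
        # Leading whitespace of the line, used to detect tab indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            issues.append((number, "tab used for indentation"))
    return issues


# Made-up YAML snippet with one trailing-space line and one tab-indented line.
sample = "presubmits:\n  - name: check-prow-config   \n\talways_run: true\n"
for number, message in find_whitespace_issues(sample):
    print(f"line {number}: {message}")
```

Run against the changed files of a PR, a non-empty result would fail the job.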

Merge parameters DISTRIBUTION and TARGET_NODE_OS

These parameters are always set to either ubuntu or centos, but with the difference that TARGET_NODE_OS has it capitalized and DISTRIBUTION does not.

TARGET_NODE_OS=Ubuntu
DISTRIBUTION=ubuntu

We never mix them (for example, Ubuntu together with centos), so we could drop one of them. This would simplify the pipelines and reduce the number of variables. Note that TARGET_NODE_OS is translated to IMAGE_OS in the pipeline env.
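
If only DISTRIBUTION were kept, the capitalized value could be derived where it is still needed. The sketch below assumes the capitalized form is simply the lowercase name with its first letter uppercased, as in the Ubuntu/ubuntu pair above; the function name is hypothetical.

```python
SUPPORTED_DISTRIBUTIONS = ("ubuntu", "centos")


def target_node_os(distribution):
    """Derive the capitalized value from the lowercase DISTRIBUTION value.

    Assumption: the capitalized form is just str.capitalize() of the
    lowercase name, as in the Ubuntu/ubuntu pair shown in the issue.
    """
    if distribution not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {distribution!r}")
    return distribution.capitalize()


print(target_node_os("ubuntu"))
```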

Discussion on Docker rate limit

Recently, we have started experiencing Docker rate limit issues in our CI runs and in the integration test runs on pull requests:

toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit
make: *** [Makefile:4: install_requirements] Error 1

Possible ways forward, as discussed in the metal3 community meeting, could be:

Broken link to test failure details

When prow leaves a comment on a PR about a test failure, the "Details" link just links back to the PR itself instead of to something more useful. I suspect this is missing configuration of a base URL somewhere.

Example: #14

All Centos tests are failing

Currently everything run on Centos images is failing, because the base image has RPM packages that conflict with newer package versions. As a result, anything in the tests that involves an upgrade, such as sudo dnf upgrade -y, fails.
https://jenkins.nordix.org/blue/organizations/jenkins/metal3-centos-e2e-integration-test-main/detail/metal3-centos-e2e-integration-test-main/150/pipeline/

The error is the following:

[2024-05-07T05:29:39.094Z] Error: 
[2024-05-07T05:29:39.094Z]  Problem 1: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - cannot install the best update candidate for package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z]   - cannot install the best update candidate for package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z]  Problem 2: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - problem with installed package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z]   - problem with installed package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z]   - package selinux-policy-targeted-38.1.35-2.el9.noarch from @System requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package selinux-policy-targeted-38.1.35-2.el9.noarch from baseos requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package nbdkit-1.36.2-1.el9.x86_64 from @System requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package nbdkit-1.36.2-1.el9.x86_64 from appstream requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from @System
[2024-05-07T05:29:39.095Z]   - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from baseos
[2024-05-07T05:29:39.095Z]   - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from @System
[2024-05-07T05:29:39.095Z]   - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from appstream
[2024-05-07T05:29:39.095Z]   - cannot install the best update candidate for package selinux-policy-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.095Z]   - cannot install the best update candidate for package nbdkit-basic-filters-1.36.2-1.el9.x86_64
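
To see at a glance which packages clash, the "cannot install both" lines can be reduced to unique pairs. This is only a sketch for reading logs like the one above; the regular expression is written against these specific lines and is not a general dnf parser.

```python
import re

# Matches dnf lines such as:
#   cannot install both A from repo1 and B from repo2
CONFLICT = re.compile(
    r"cannot install both (?P<new>\S+) from (?P<new_repo>\S+) "
    r"and (?P<old>\S+) from (?P<old_repo>\S+)"
)


def conflicting_pairs(log_text):
    """Return the unique (first, second) package pairs that cannot coexist."""
    pairs = []
    for match in CONFLICT.finditer(log_text):
        pair = (match.group("new"), match.group("old"))
        if pair not in pairs:
            pairs.append(pair)
    return pairs


# Three lines copied from the log above (timestamps dropped).
log = """\
  - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from @System
  - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from baseos
  - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from @System
"""
for new, old in conflicting_pairs(log):
    print(f"{new} conflicts with {old}")
```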

Rebase all PRs in Jenkins jobs

In order to prevent issues of old branches not including some fix commits, and to test the commit as if merged, the CI should rebase all commits on top of the target branch before running the tests.
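
A minimal sketch of the rebase step, assuming the job has the PR branch checked out and a remote named origin (both are assumptions, as are the function names). It fetches the target branch and rebases onto FETCH_HEAD, so it does not depend on how remote-tracking branches are configured in the CI clone.

```python
import subprocess


def rebase_commands(target_branch, remote="origin"):
    """Build the git commands that replay the PR commits on the target."""
    return [
        ["git", "fetch", remote, target_branch],
        # FETCH_HEAD points at the branch tip that was just fetched.
        ["git", "rebase", "FETCH_HEAD"],
    ]


def rebase_onto_target(repo_dir, target_branch, remote="origin"):
    """Run the commands in repo_dir; raises CalledProcessError on failure."""
    for command in rebase_commands(target_branch, remote):
        subprocess.run(command, cwd=repo_dir, check=True)


print(rebase_commands("main"))
```

With check=True a rebase conflict stops the job before any tests run, which seems preferable to testing a half-rebased tree.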

Rotate the dev key in DiB image building workflow

When building an image, DiB accepts an environment variable called DIB_DEV_USER_AUTHORIZED_KEYS. It takes a file path, and that file is copied into the created image as the authorized_keys file for the user defined in the DIB_DEV_USER_USERNAME environment variable.

To rotate the current key, a new ed25519 key should be generated and added to the authorized keys file so that it is accepted for login. Once the new key is validated to be working, the old key should be rotated out and removed.
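
The two-step rotation can be sketched as plain edits to the authorized keys file content. The helper names and key strings below are placeholders; the new key itself would be generated separately, for example with ssh-keygen -t ed25519.

```python
def add_key(authorized_keys, new_key):
    """Return the file content with new_key appended if it is missing."""
    lines = [line for line in authorized_keys.splitlines() if line.strip()]
    if new_key not in lines:
        lines.append(new_key)
    return "\n".join(lines) + "\n"


def remove_key(authorized_keys, old_key):
    """Return the file content without old_key."""
    lines = [
        line for line in authorized_keys.splitlines()
        if line.strip() and line != old_key
    ]
    return "\n".join(lines) + "\n" if lines else ""


# Placeholder key strings, not real keys.
old = "ssh-ed25519 PLACEHOLDER_OLD dev-old"
new = "ssh-ed25519 PLACEHOLDER_NEW dev-new"

both = add_key(old + "\n", new)   # step 1: images accept both keys
rotated = remove_key(both, old)   # step 2: after the new key is validated
print(rotated, end="")
```

Keeping both keys in the file for a while means images built during the transition stay reachable with either key.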

Set up RBAC so all metal3-io members can view test pods

Right now only a small group has access to the CI cluster itself. It would be nice to give anyone in the metal3-io GitHub org at least read-only access to the test-pods namespace, so they can view the Pods for test jobs and inspect their logs directly.
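
One possible shape for this is a namespaced Role with read-only verbs on Pods and their logs, bound to a group. The sketch below builds the manifests as Python dicts and prints them as JSON. The group name and the object names are hypothetical, and how GitHub org membership maps to a Kubernetes group depends on the cluster's authentication setup, which this issue does not specify.

```python
import json

NAMESPACE = "test-pods"
# Hypothetical group name; the real mapping from GitHub org membership to a
# Kubernetes group depends on how the cluster authenticates users.
GROUP = "metal3-io-members"

role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "test-pod-viewer", "namespace": NAMESPACE},
    "rules": [
        {
            "apiGroups": [""],  # core API group
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch"],  # read-only
        }
    ],
}

role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "test-pod-viewer", "namespace": NAMESPACE},
    "subjects": [
        {"kind": "Group", "name": GROUP, "apiGroup": "rbac.authorization.k8s.io"}
    ],
    "roleRef": {
        "kind": "Role",
        "name": "test-pod-viewer",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

# kubectl accepts JSON manifests as well as YAML.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}
print(json.dumps(manifest, indent=2))
```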

BML: Cleanup /tmp/manifests

We don't clean up /tmp/manifests. This is normally not a problem, as we start from a fresh VM every time, but on BML the files build up.
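
A cleanup step could be as small as the sketch below, which empties a directory but keeps the directory itself. The function name is hypothetical, and the example runs against a temporary directory rather than the real /tmp/manifests.

```python
import shutil
import tempfile
from pathlib import Path


def clean_directory(path):
    """Delete everything inside path but keep the directory itself.

    Returns the number of top-level entries removed.
    """
    directory = Path(path)
    if not directory.is_dir():
        return 0
    removed = 0
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


# Demonstrated on a temporary directory rather than the real /tmp/manifests.
with tempfile.TemporaryDirectory() as demo:
    (Path(demo) / "cluster.yaml").write_text("kind: Cluster\n")
    (Path(demo) / "nested").mkdir()
    print(clean_directory(demo))
```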

Fix the Centos CI openstack image building pipeline

When trying to build the ci-image with the Jenkins pipeline, there is a conflict between the RPM packages nbdkit and selinux-policy-targeted. The same error was seen in the Centos tests when setting up the metal3-dev-env (#738).

[2024-05-07T05:29:39.094Z] Error: 
[2024-05-07T05:29:39.094Z]  Problem 1: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - cannot install the best update candidate for package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z]   - cannot install the best update candidate for package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z]  Problem 2: package nbdkit-1.38.0-1.el9.x86_64 from appstream requires (nbdkit-selinux if selinux-policy-targeted), but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - problem with installed package selinux-policy-targeted-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.094Z]   - problem with installed package nbdkit-1.36.2-1.el9.x86_64
[2024-05-07T05:29:39.094Z]   - package selinux-policy-targeted-38.1.35-2.el9.noarch from @System requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package selinux-policy-targeted-38.1.35-2.el9.noarch from baseos requires selinux-policy = 38.1.35-2.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package nbdkit-1.36.2-1.el9.x86_64 from @System requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - package nbdkit-1.36.2-1.el9.x86_64 from appstream requires nbdkit-basic-filters(x86-64) = 1.36.2-1.el9, but none of the providers can be installed
[2024-05-07T05:29:39.094Z]   - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from @System
[2024-05-07T05:29:39.095Z]   - cannot install both selinux-policy-38.1.36-1.el9.noarch from baseos and selinux-policy-38.1.35-2.el9.noarch from baseos
[2024-05-07T05:29:39.095Z]   - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from @System
[2024-05-07T05:29:39.095Z]   - cannot install both nbdkit-basic-filters-1.38.0-1.el9.x86_64 from appstream and nbdkit-basic-filters-1.36.2-1.el9.x86_64 from appstream
[2024-05-07T05:29:39.095Z]   - cannot install the best update candidate for package selinux-policy-38.1.35-2.el9.noarch
[2024-05-07T05:29:39.095Z]   - cannot install the best update candidate for package nbdkit-basic-filters-1.36.2-1.el9.x86_64

A workaround was applied to metal3-dev-env to get the tests passing, but the issue should ultimately be fixed in the image build, and/or we should monitor whether an upstream fix for the conflict lands in the RPM packages. Once that is done, the workaround can be reverted so that we are using the latest packages.

Sync labels across repos

@stbenjam proposed using label_sync to sync labels across repos here: #12

My first try with it didn't work, so I proposed a revert (#19). This issue is a reminder to come back and make it work later.

Improve retention policy and management of node and ci images

Current Situation

Currently, the retention policy saves the last 5 images, but this isn't very safe. The process to take a new image is manual and lacks visibility into which image is currently in use. You have to be a Jenkins Admin to see and change the image for CI, which means someone with triggering rights to the build can start it without knowing they might erase the actively used image from OpenStack.

This problem also applies to node images. However, everyone currently has visibility into both Artifactory and the dev-env code, allowing them to see what image is used and understand how a new trigger will affect it.

What needs to be fixed

To address these issues, we need to ensure that the actively used image is never deleted. We also need a way to ensure that if the active image is changed, the new image will work properly through some testing. Additionally, any changes to an image build should be testable in the PR before merging.

  • Make the active image separate from the candidate images
  • Add a promotion process for changing the active image
  • Add tests for new image when changes happen to the DiB image workflow
  • Check that all file changes affecting the DiB workflow trigger tests and a new build trigger on merge

Potential solution

A potential solution could involve having the active image with a separate naming convention from the candidate images. For promotion, there would be a pipeline that takes a candidate image as input, runs tests on it, and if the tests pass, automatically changes the active image to the candidate.
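
The two rules described here, never deleting the active image and promoting a candidate only after it passes tests, can be sketched as follows. The function names and image names are hypothetical, and the test run is stubbed with a callable.

```python
def images_to_delete(candidates, active, keep=5):
    """Pick candidate images to delete, never including the active one.

    candidates is ordered oldest first; the newest `keep` are retained.
    """
    if keep <= 0:
        old = list(candidates)
    else:
        old = list(candidates[:-keep])
    return [name for name in old if name != active]


def promote(candidate, active, run_tests):
    """Return the new active image: the candidate only if its tests pass."""
    return candidate if run_tests(candidate) else active


# Hypothetical image names, oldest first.
images = [f"ci-image-{n}" for n in range(1, 9)]
print(images_to_delete(images, active="ci-image-2"))
print(promote("ci-image-8", "ci-image-2", run_tests=lambda image: True))
```

In this example the active image is older than the five newest builds, which is exactly the case where a plain "keep the last 5" policy would delete it.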

Note: Jenkins also offers an Artifactory plugin that supports promotion logic out of the box, which could be investigated. It is not clear whether the same exists for the OpenStack plugin.

By implementing these changes, we can increase the reliability and safety of our image retention process, improve coordination among team members triggering builds, and reduce the risk of active image overwriting and build failures. Testing new images before they become active will ensure they are reliable and functional, providing a smoother and more predictable CI/CD process.

Collect logs from initContainers

Currently we only collect logs from the "normal" containers in each Pod (i.e. the ones defined in spec.containers). It would be good to also get the logs for init containers since it can sometimes happen that they get stuck.

Make the run_fetch_logs.sh script collect logs from spec.initContainers also.
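
The idea, shown here in Python rather than in the shell script itself, is to iterate over spec.initContainers as well as spec.containers and request logs per container. The Pod and container names are hypothetical, and the function is a sketch rather than the script's actual logic.

```python
def log_commands(pod):
    """Build one `kubectl logs` command per container, init ones first."""
    name = pod["metadata"]["name"]
    namespace = pod["metadata"].get("namespace", "default")
    spec = pod["spec"]
    # initContainers is optional in a Pod spec, so default to an empty list.
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    return [
        ["kubectl", "logs", name, "-n", namespace, "-c", container["name"]]
        for container in containers
    ]


# Hypothetical Pod, shaped like the output of `kubectl get pod -o json`.
pod = {
    "metadata": {"name": "example-pod", "namespace": "example-ns"},
    "spec": {
        "initContainers": [{"name": "init-setup"}],
        "containers": [{"name": "main"}, {"name": "sidecar"}],
    },
}
for command in log_commands(pod):
    print(" ".join(command))
```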

Migration to dynamic worker workflow

We decided to use dynamic Jenkins workers.
Here is the progress of the move.

Done:
✅ clusterctl upgrade tests
✅ feature tests
✅ dev env integration tests
✅ e2e basic
✅ e2e_integration tests
✅ k8s_upgrade tests
✅ ephemeral tests
✅ bmo e2e

In progress:

Todo:
⚪ job periodic clean
⚪ Nordix clone
⚪ fullstack build

Note! We are also splitting the pipelines for dev_env and e2e_tests; for feature tests we already have a separate pipeline.

Replace travis CI usage with prow jobs

We have travis-ci running some jobs against baremetal-operator and cluster-api-provider-baremetal. We should be able to replace those with prow jobs.

Migration to prow jenkins operator

Now that we have moved most of the jjb and pipelines to trigger tests from prow, this issue reports the current status and states the next TODOs:

Done:

🟢 job_capm3_e2e_basic_tests.yml
🟢 job_capm3_e2e_clusterctl_upgrade_tests_prow.yml
🟢 job_capm3_e2e_feature_tests_prow.yml
🟢 job_capm3_e2e_integration_tests_prow.yml
🟢 job_capm3_e2e_k8s_upgrade_tests_prow.yml
🟢 job_capm3_periodic_e2e_clusterctl_upgrade_tests_prow.yml
🟢 job_capm3_periodic_e2e_feature_tests_prow.yml
🟢 job_capm3_periodic_e2e_integration_tests_prow.yml
🟢 job_capm3_periodic_e2e_k8s_upgrade_tests_prow.yml
🟢 job_periodic_clean.yml
🟢 job_capm3_periodic_e2e_ephemeral_tests.yml
🟢 job_capm3_periodic_integration_tests.yml
🟢 job_dev_env_integration_tests.yml
🟢 job_integration_tests.yml

In progress:

To do:

🟡 job_artifact_cleanup.yml
🟡 job_bml_integration_tests.yml
🟡 job_bml_periodic_integration_tests.yml
🟡 job_ci_image_building.yml
🟡 job_container_image_building.yaml
🟡 job_docker_image_building.yml
🟡 job_fullstack_building_test.yml
🟡 job_fullstack_project-infra_building_test.yml
🟡 job_ironic_image_build_test.yml
🟡 job_metal3_dev_tools_integration_test.yml
🟡 job_openstack_image_building.yml
🟡 job_openstack_node_image_building.yml
🟡 job_periodic_fullstack_building.yml
🟡 job_update_nordix_repos.yml

Update gh required checks by Admin

🟢 CAPM3 https://github.com/metal3-io/cluster-api-provider-metal3
🟢 IPAM https://github.com/metal3-io/ip-address-manager
🟢 BMO https://github.com/metal3-io/baremetal-operator
🟢 DEV_ENV https://github.com/metal3-io/metal3-dev-env
🟢 Project-infra https://github.com/metal3-io/project-infra
🟢 Ironic-image https://github.com/metal3-io/ironic-image
🟢 Mariadb-image https://github.com/metal3-io/mariadb-image
🟢 ironic-ipa-downloader https://github.com/metal3-io/ironic-ipa-downloader

To do

  • Remove the required Ubuntu test from the BMO config
  • BMO release 0.6 tests are not set at all, apart from bmo e2e
  • For the mariadb image repo, add centos tests in project-infra
