
llvm-premerge-checks's Issues

Provide enough disk space to agents

Some data:

  • The ccache can take up to 20 GB and is shared between jobs on the same machine.
  • The job workspace for llvm is about 46 GB.
  • GCP provides 375 GB of storage per SSD.
  • So we can only keep a few workspaces per pod before running out of disk space.

acceptance criteria:

  • The agents are configured with enough storage to run the assigned number of jobs.
  • We can easily scale up the disk space when adding more jobs to an agent.

run sphinx

acceptance criteria

  • the documentation is checked by running sphinx

move build scripts to some repo, use pipelines

status quo:

  • the build steps are configured in a text box in Jenkins.
  • we do not have them in version control
  • you can't run them locally
  • the Jenkins jobs are configured in the UI

acceptance criteria:

  • build scripts are moved from the Jenkins text box to the LLVM repository
  • build scripts can be checked by build server before merging
  • build scripts can be code-reviewed before merging
  • Jenkins uses the "Pipeline" feature to configure jobs from a SCM
  • A build log is uploaded even when arc patch fails (solve #11)

documentation:

Create separate GCP project

acceptance criteria

  • We have a new Cloud project with defined billing information.
  • The cluster is run from a new, separate "project" in GCP.
  • The create cluster script is cleaned up before setting up the cluster:
    • The "services" node pool is removed
    • The "default-pool" has only one n1-standard-4 machine.
    • The proxy, Jenkins, etc. use the "default-pool".
    • The "jenkins-agents" pool is unchanged (or scaled up if required).

make sure dependencies between patches on Phabricator work

When applying patches with parent diffs set, something seems to go wrong repeatedly. Somehow arc patch is not able to apply these. I'm not really sure what the problem is or how we can work around it.

Maybe this is related to arc applying patches that were already merged. So maybe we need to manually iterate over the parents and check in the git log whether they were applied already.
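One possible workaround, sketched in Python (the helper names are ours, not part of any existing script): since arc adds a "Differential Revision:" line to each commit message when a diff lands, we can scan the git log for those lines and skip parents that are already merged.

```python
import re
import subprocess

# arc appends this line to landed commits, e.g.
# "Differential Revision: https://reviews.llvm.org/D12345"
REV_RE = re.compile(r"Differential Revision: https://reviews\.llvm\.org/(D\d+)")

def extract_landed(log_text: str) -> set:
    """Phabricator revision IDs mentioned in a block of commit messages."""
    return set(REV_RE.findall(log_text))

def landed_revisions(git_dir: str = ".", max_commits: int = 1000) -> set:
    """IDs of diffs already merged on the current branch."""
    log = subprocess.run(
        ["git", "-C", git_dir, "log", f"-{max_commits}", "--format=%B"],
        capture_output=True, text=True, check=True).stdout
    return extract_landed(log)

def parents_to_apply(parent_ids, landed):
    """Skip parent diffs that are already in the git log."""
    return [rid for rid in parent_ids if rid not in landed]
```

The remaining parents would then be applied in order, oldest first.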

Investigate llvm/test/tools/llvm-ar/mri-utf8.test

The premerge check (always) claims this test is failing.
However, it passes locally for me and, AFAICS, on all the buildbots.
So there may be something odd about the configuration of the premerge check workers (or maybe this is a real test failure in a valid though untested configuration, and the test/code needs to be fixed).

announce beta test on mailing list

once we have a reasonable version:

  • announce a public beta test on the mailing list
  • add people to the herald config
  • point to the documentation on GitHub in the email.

Only build and test changed modules

Ideas

  • Right now we're building and testing several projects in LLVM for every change (see script for configured projects).
  • To speed things up it would be nice to only build and test the projects that are affected by a patch. This would also reduce the number of reported errors that are not related to the patch.
  • We could use the ENABLED_PROJECTS flag of cmake to select which projects to build and test.

Problems

  • We do not know the dependencies between files, projects and tests.
  • If we set up a dependency table, this needs to be maintained manually.

technical considerations

  • input options:
    • run git diff and git status --short in the script to get the new and changed files and folders
    • parse the patch from Phabricator for the changes. You can look at apply_patch.py for an example on how to get the diff from Phabricator.
  • output: print string to stdout that can be used for ENABLED_PROJECTS in CMake.
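The first input option above could be sketched in Python roughly as follows (the directory-to-project mapping here is illustrative and incomplete; a real version would need the full dependency information the "Problems" section mentions):

```python
import subprocess

# Illustrative mapping: top-level directory in the monorepo -> project name
# accepted by the CMake projects flag. Incomplete on purpose.
PROJECT_DIRS = {
    "clang": "clang",
    "clang-tools-extra": "clang-tools-extra",
    "lld": "lld",
    "llvm": "llvm",
}

def changed_files(base: str = "origin/master") -> list:
    """New and changed files relative to the base branch."""
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()

def enabled_projects(files) -> str:
    """Semicolon-joined project list for the CMake projects flag."""
    projects = {PROJECT_DIRS[f.split("/", 1)[0]]
                for f in files if f.split("/", 1)[0] in PROJECT_DIRS}
    return ";".join(sorted(projects))
```

The script would print `enabled_projects(changed_files())` to stdout, and the build step would substitute that into the CMake invocation.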

Auto-create a bug for failing tests/builds

acceptance criteria

  • If someone pushes a commit to master that breaks a test, a bug on buganizer is created automatically.
  • The bug is assigned to the person pushing the change.

Gather user feedback

Send out questions to all known beta testers and ask for feedback and a recommendation on turning it on for all users.

make Jenkins log accessible

acceptance criteria:

  • the full Jenkins build log is available on the result server

idea

  • each build triggers another build that gets the results from the server (file system or web UI) and copies them to the result storage
  • look into Pipelines for this
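For the web UI route, Jenkins serves the full plain-text log of a build under the `consoleText` endpoint, so the copy step could be as small as this sketch (server URL and job name are placeholders):

```python
import urllib.request

def console_url(jenkins_base: str, job: str, build: int) -> str:
    """Jenkins exposes the full plain-text build log at .../consoleText."""
    return f"{jenkins_base.rstrip('/')}/job/{job}/{build}/consoleText"

def copy_log(jenkins_base: str, job: str, build: int, dest_path: str):
    """Fetch the build log and write it to the result storage path."""
    url = console_url(jenkins_base, job, build)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
```

Authentication (see the GitHub-auth issue below) would need an API token passed in the request headers.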

create test report

acceptance criteria:

  • results of the tests are written to a test report.
  • test report is available via the web interface.
  • the comment on Phabricator lists the failed tests
  • The test report is nicely readable by a human; maybe we need to post-process it to HTML somehow.

This is also related to #14

Mark unrelated problems

acceptance criteria:

If a test on the parent revision of the patch fails, do not complain in the pre-merge tests based on that revision.

Ideas

  • Build every revision on master and remember which tests failed.
  • When running the tests on a patch, check which tests already failed on parent revision.
  • In the user feedback differentiate between failures on the parent and the patch.
  • Do not fail the build if all failing tests are already broken on the parent.
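The comparison step in the ideas above is essentially a set difference; a minimal sketch (function names are ours):

```python
def triage(parent_failures: set, patch_failures: set):
    """Split a patch build's failing tests into pre-existing failures
    (already broken on the parent revision) and new ones."""
    pre_existing = patch_failures & parent_failures
    new = patch_failures - parent_failures
    return pre_existing, new

def build_should_fail(parent_failures: set, patch_failures: set) -> bool:
    """Only fail the build when the patch introduces new failures."""
    _, new = triage(parent_failures, patch_failures)
    return bool(new)
```

The Phabricator comment would then list the two groups separately.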

Understand performance issue with LIT

background

  • When running the LLVM test suite on a workstation, it takes ~ 70 sec.
  • When running it on the Kubernetes cluster on a 32 core machine...
    • ... via the Jenkins agent it takes 25 min.
    • ... via local login it takes 90 sec (which is what we expected).
  • The problem can be fixed by setting the open-files limit with ulimit -n 1024 (instead of the current value of ~1,000,000) before running the test suite. Values between 512 and 8192 were also tested and resulted in the same execution time as 1024.
  • We have no clue why that solves the performance problem.
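If the workaround ends up inside a Python wrapper rather than a shell script, the same limit can be lowered from Python's resource module before invoking lit (a sketch; it assumes the target does not exceed the hard limit):

```python
import resource

def lower_nofile_limit(target: int = 1024) -> int:
    """Lower the soft open-files limit (RLIMIT_NOFILE) before running lit.
    Returns the new soft limit actually set."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

This only changes the limit for the current process and its children, which matches how the ulimit workaround behaves in the shell.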

acceptance criteria

  • We know why changing the ulimit impacts the LIT performance.
  • If it's a bug in LIT: either there is a bug report in LLVM or the bug in LIT is fixed.

Auto-revert failing patches

acceptance criteria:

  • all patches that cause a failure (failed build or failed test) on master are reverted automatically
  • we have community buy-in for this feature.

Set up monitoring and central logging

acceptance criteria

  • measure build times (from submitting a patch to phabricator until results are in)
  • measure CPU, RAM and disk usage on all machines
  • measure number of rollbacks/day

Build and test on Windows

acceptance criteria

  • Jenkins can build and test LLVM on a Windows machine
  • The Windows machine is running on GCP.
  • get failing tests fixed: https://bugs.llvm.org/show_bug.cgi?id=44151
  • automatically trigger Windows builds in Phabricator diffs
  • also show build results for Windows builds to users

documentation

setup staging for CI changes

acceptance criteria:

  • There is a way to test CI changes before rolling them out to users.
  • There is a workflow how to push changes first to testing and then to production.

extend documentation

acceptance criteria

  • document the vision
  • document the current solution and UI
  • move installation instructions to different file
  • document the limitations

Sign up for public beta testing

If you are interested in participating in the beta tests for the pre-merge checks:
Leave a comment on this issue with your Phabricator user name. We will then add you.

For the Phabricator integration and bug reports, please see the user documentation.

Authenticate with github account

acceptance criteria

  • all Jenkins users are authenticated with their github accounts
  • build results are still publicly accessible

rationale

  • LLVM contributors need a github account anyway.
  • keeping the credentials in files is annoying
  • we might want to give more people access eventually
  • 2-factor-authentication is cool

Manage storage properly

acceptance criteria:

  • ccache is in a volume and configured per machine
  • agent build dir is on a volume with enough storage

Run tests on macOS

At the moment all our resources are on Google Cloud, which does not offer macOS machines. So we need someone else to host and pay for those.

add checks for clang-tidy and clang-format

acceptance criteria:

  • all patches are checked with clang-tidy
  • all patches are checked with clang-format
  • the results are reported to the Phabricator page
  • the checks only consider the modified lines (using clang-format-diff.py / clang-tidy-diff.py)
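Restricting the checks to modified lines boils down to reading the hunk headers of the unified diff; this sketch extracts the same (start, length) ranges on the new side of the diff that clang-format-diff.py and clang-tidy-diff.py use internally:

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,3 +10,4 @@ void f() {"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_line_ranges(diff_text: str):
    """(start, length) pairs of modified lines on the new side of a diff.
    A missing length (e.g. "@@ -1 +1 @@") means a single line."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            length = int(m.group(2) or "1")
            ranges.append((start, length))
    return ranges
```

These ranges map directly onto the `-lines` / `-line-filter` style arguments the two wrapper scripts pass to the underlying tools.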

Improve feedback in Phabricator (even further)

acceptance criteria

  • the feedback in Phabricator looks something like this:
Ran `check-all`, 1 failure:
  LLVM.tools/llvm-ar::mri-utf8.test
Logs: [ninja log], [cmake log], [CMakeCache.txt]
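Rendering that comment is straightforward string assembly; a sketch (the function and argument names are illustrative, not part of the existing scripts):

```python
def format_feedback(target, failed_tests, log_names):
    """Render a Phabricator comment in the shape sketched above."""
    n = len(failed_tests)
    noun = "failure" if n == 1 else "failures"
    lines = [f"Ran `{target}`, {n} {noun}:"]
    lines += [f"  {test}" for test in failed_tests]
    lines.append("Logs: " + ", ".join(f"[{name}]" for name in log_names))
    return "\n".join(lines)
```

The bracketed log names would become links to the artifacts on the result server.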

Jenkins agent failure should lead to restart

If a build or test has failed due to e.g. a machine restart, Jenkins gives up and reports the build as failed.
It would be nice to restart the build automatically in such cases.

An initial search found the Naginator plugin, which should do the trick.

reproduce build locally

acceptance criteria

  • the containers are set up in a way so that users can run the tests locally
  • there is documentation on how to do this
  • We clarified if we can share the containers (e.g. Windows licensing)
  • If we can share the containers: users can access the containers from a public Docker repository

Scale compute power

acceptance criteria

  • All new/changed patches are checked within 2h.
  • We benchmarked the tests on "C" type machines to see if that's faster.
  • We have benchmarks for 16, 32 and 64 cores for:
    • clean build
    • cached build
    • ninja check
    • ninja check-all

Print link to revision on Phabricator in build log

As a user I want to navigate easily from the build log to the revision in Phabricator so that I can see what triggered the build.

acceptance criteria

  • a link to the revision in Phabricator is shown in the build log
  • if possible: also add a link to the build job in Jenkins to the change in Phabricator
