
llvm-premerge-checks's Issues

Provide enough disk space to agents

Some data:

  • The ccache can take up to 20 GB and is shared between jobs on the same machine.
  • The job workspace for llvm is about 46 GB.
  • GCP provides 375 GB of storage per SSD.
  • So we can only keep a few workspaces per pod before running out of disk space.

acceptance criteria:

  • The agents are configured with enough storage to run the assigned number of jobs.
  • We can easily scale up the disk space when adding more jobs to an agent.

run sphinx

acceptance criteria

  • the documentation is checked by running sphinx

move build scripts to some repo, use pipelines

status quo:

  • the build steps are configured in a text box in Jenkins.
  • we do not have them in version control
  • you can't run them locally
  • the Jenkins jobs are configured in the UI

acceptance criteria:

  • build scripts are moved from the Jenkins text box to the LLVM repository
  • build scripts can be checked by build server before merging
  • build scripts can be code-reviewed before merging
  • Jenkins uses the "Pipeline" feature to configure jobs from a SCM
  • A build log is uploaded even when arc patch fails (solve #11)

documentation:

Create separate GCP project

acceptance criteria

  • We have a new Cloud project with defined billing information.
  • The cluster is run from a new, separate "project" in GCP.
  • The create cluster script is cleaned up before setting up the cluster:
    • The "services" node pool is removed
    • The "default-pool" has only one n1-standard-4 machine.
    • The proxy, Jenkins, etc. use the "default-pool".
    • The "jenkins-agents" pool is unchanged (or scaled up if required).

make sure dependencies between patches on Phabricator work

When applying patches with parent diffs set, something seems to go wrong repeatedly. Somehow arc patch is not able to apply these. I'm not really sure what the problem is or how we can work around it.

Maybe this is related to arc applying patches that were already merged. So maybe we need to manually iterate over the parents and check in the git log whether they were applied already.
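One possible workaround, sketched in Python (the helper names are ours, not part of any existing script): since arc adds a "Differential Revision:" line to each commit message when a diff lands, we can scan the git log for those lines and skip parents that are already merged.

```python
import re
import subprocess

# arc appends this line to landed commits, e.g.
# "Differential Revision: https://reviews.llvm.org/D12345"
REV_RE = re.compile(r"Differential Revision: https://reviews\.llvm\.org/(D\d+)")

def extract_landed(log_text: str) -> set:
    """Phabricator revision IDs mentioned in a block of commit messages."""
    return set(REV_RE.findall(log_text))

def landed_revisions(git_dir: str = ".", max_commits: int = 1000) -> set:
    """IDs of diffs already merged on the current branch."""
    log = subprocess.run(
        ["git", "-C", git_dir, "log", f"-{max_commits}", "--format=%B"],
        capture_output=True, text=True, check=True).stdout
    return extract_landed(log)

def parents_to_apply(parent_ids, landed):
    """Skip parent diffs that are already in the git log."""
    return [rid for rid in parent_ids if rid not in landed]
```

The remaining parents would then be applied in order, oldest first.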

Investigate llvm/test/tools/llvm-ar/mri-utf8.test

The premerge check (always) claims this test is failing.
However, it passes locally for me and, AFAICS, on all the buildbots.
So there may be something odd about the configuration of the premerge check workers (or maybe this is a real test failure in a valid though untested configuration, and the test/code needs to be fixed).

announce beta test on mailing list

once we have a reasonable version:

  • announce a public beta test on the mailing list
  • add people to the herald config
  • point to the documentation on GitHub in the email.

Only build and test changed modules

Ideas

  • Right now we're building and testing several projects in LLVM for every change (see script for configured projects).
  • To speed things up it would be nice to only build and test the projects that are affected by a patch. This would also reduce the number of reported errors that are not related to the patch.
  • We could use the ENABLED_PROJECTS flag of cmake to select which projects to build and test.

Problems

  • We do not know the dependencies between files, projects and tests.
  • If we set up a dependency table, this needs to be maintained manually.

technical considerations

  • input options:
    • run git diff and git status --short in the script to get the new and changed files and folders
    • parse the patch from Phabricator for the changes. You can look at apply_patch.py for an example on how to get the diff from Phabricator.
  • output: print string to stdout that can be used for ENABLED_PROJECTS in CMake.
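The first input option above could be sketched in Python roughly as follows (the directory-to-project mapping here is illustrative and incomplete; a real version would need the full dependency information the "Problems" section mentions):

```python
import subprocess

# Illustrative mapping: top-level directory in the monorepo -> project name
# accepted by the CMake projects flag. Incomplete on purpose.
PROJECT_DIRS = {
    "clang": "clang",
    "clang-tools-extra": "clang-tools-extra",
    "lld": "lld",
    "llvm": "llvm",
}

def changed_files(base: str = "origin/master") -> list:
    """New and changed files relative to the base branch."""
    out = subprocess.run(["git", "diff", "--name-only", base],
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()

def enabled_projects(files) -> str:
    """Semicolon-joined project list for the CMake projects flag."""
    projects = {PROJECT_DIRS[f.split("/", 1)[0]]
                for f in files if f.split("/", 1)[0] in PROJECT_DIRS}
    return ";".join(sorted(projects))
```

The script would print `enabled_projects(changed_files())` to stdout, and the build step would substitute that into the CMake invocation.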

Auto-create a bug for failing tests/builds

acceptance criteria

  • If someone pushes a commit to master that breaks a test, a bug on buganizer is created automatically.
  • The bug is assigned to the person pushing the change.

Gather user feedback

Send out questions to all known beta testers and ask for feedback and a recommendation on turning it on for all users.

make Jenkins log accessible

acceptance criteria:

  • the full Jenkins build log is available on the result server

idea

  • each build triggers another build that gets the results from the server (file system or web UI) and copies them to the result storage
  • look into Pipelines for this
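For the web UI route, Jenkins serves the full plain-text log of a build under the `consoleText` endpoint, so the copy step could be as small as this sketch (server URL and job name are placeholders):

```python
import urllib.request

def console_url(jenkins_base: str, job: str, build: int) -> str:
    """Jenkins exposes the full plain-text build log at .../consoleText."""
    return f"{jenkins_base.rstrip('/')}/job/{job}/{build}/consoleText"

def copy_log(jenkins_base: str, job: str, build: int, dest_path: str):
    """Fetch the build log and write it to the result storage path."""
    url = console_url(jenkins_base, job, build)
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
```

Authentication (see the GitHub-auth issue below) would need an API token passed in the request headers.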

create test report

acceptance criteria:

  • results of the tests are written to a test report.
  • test report is available via the web interface.
  • the comment on Phabricator lists the failed tests
  • The test report is nicely readable by a human; maybe we need to post-process it to HTML somehow.

This is also related to #14

Mark unrelated problems

acceptance criteria:

If a test on the parent revision of the patch fails, do not complain in the pre-merge tests based on that revision.

Ideas

  • Build every revision on master and remember which tests failed.
  • When running the tests on a patch, check which tests already failed on parent revision.
  • In the user feedback differentiate between failures on the parent and the patch.
  • Do not fail the build if all failing tests are already broken on the parent.
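The comparison step in the ideas above is essentially a set difference; a minimal sketch (function names are ours):

```python
def triage(parent_failures: set, patch_failures: set):
    """Split a patch build's failing tests into pre-existing failures
    (already broken on the parent revision) and new ones."""
    pre_existing = patch_failures & parent_failures
    new = patch_failures - parent_failures
    return pre_existing, new

def build_should_fail(parent_failures: set, patch_failures: set) -> bool:
    """Only fail the build when the patch introduces new failures."""
    _, new = triage(parent_failures, patch_failures)
    return bool(new)
```

The Phabricator comment would then list the two groups separately.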

Understand performance issue with LIT

background

  • When running the LLVM test suite on a workstation, it takes ~ 70 sec.
  • When running it on the Kubernetes cluster on a 32 core machine...
    • ... via the Jenkins agent it takes 25 min.
    • ... via local login it takes 90 sec (which is what we expected).
  • The problem can be fixed by setting the open-files limit with ulimit -n 1024 (instead of the current value of ~1,000,000) before running the test suite. Values between 512 and 8192 were also tested and resulted in the same execution time as 1024.
  • We have no clue why that solves the performance problem.
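If the workaround ends up inside a Python wrapper rather than a shell script, the same limit can be lowered from Python's resource module before invoking lit (a sketch; it assumes the target does not exceed the hard limit):

```python
import resource

def lower_nofile_limit(target: int = 1024) -> int:
    """Lower the soft open-files limit (RLIMIT_NOFILE) before running lit.
    Returns the new soft limit actually set."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

This only changes the limit for the current process and its children, which matches how the ulimit workaround behaves in the shell.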

acceptance criteria

  • We know why changing the ulimit impacts the LIT performance.
  • If it's a bug in LIT: either there is a bug report in LLVM or the bug in LIT is fixed.

Auto-revert failing patches

acceptance criteria:

  • all patches that cause a failure (failed build or failed test) on master are reverted automatically
  • we have community buy-in for this feature.

Set up monitoring and central logging

acceptance criteria

  • measure build times (from submitting a patch to phabricator until results are in)
  • measure CPU, RAM and disk usage on all machines
  • measure number of rollbacks/day

Build and test on Windows

acceptance criteria

  • Jenkins can build and test LLVM on a Windows machine
  • The Windows machine is running on GCP.
  • get failing tests fixed: https://bugs.llvm.org/show_bug.cgi?id=44151
  • automatically trigger Windows builds in Phabricator diffs
  • also show build results for Windows builds to users

documentation

setup staging for CI changes

acceptance criteria:

  • There is a way to test CI changes before rolling them out to users.
  • There is a workflow how to push changes first to testing and then to production.

extend documentation

acceptance criteria

  • document the vision
  • document the current solution and UI
  • move installation instructions to different file
  • document the limitations

Sign up for public beta testing

If you are interested in participating in the beta tests for the pre-merge checks:
Leave a comment on this issue with your Phabricator user name. We will then add you.

For the Phabricator integration and bug reports, please see the user documentation.

Authenticate with github account

acceptance criteria

  • all Jenkins users are authenticated with their github accounts
  • build results are still publicly accessible

rationale

  • LLVM contributors need a github account anyway.
  • keeping the credentials in files is annoying
  • we might want to give more people access eventually
  • 2-factor-authentication is cool

Manage storage properly

acceptance criteria:

  • ccache is in a volume and configured per machine
  • agent build dir is on a volume with enough storage

Run tests on macOS

At the moment all our resources are on Google Cloud, which does not offer macOS machines. So we need someone else to host and pay for those.

add checks for clang-tidy and clang-format

acceptance criteria:

  • all patches are checked with clang-tidy
  • all patches are checked with clang-format
  • the results are reported to the Phabricator page
  • the checks only consider the modified lines (using clang-format-diff.py / clang-tidy-diff.py)
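Restricting the checks to modified lines boils down to reading the hunk headers of the unified diff; this sketch extracts the same (start, length) ranges on the new side of the diff that clang-format-diff.py and clang-tidy-diff.py use internally:

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,3 +10,4 @@ void f() {"
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")

def changed_line_ranges(diff_text: str):
    """(start, length) pairs of modified lines on the new side of a diff.
    A missing length (e.g. "@@ -1 +1 @@") means a single line."""
    ranges = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            start = int(m.group(1))
            length = int(m.group(2) or "1")
            ranges.append((start, length))
    return ranges
```

These ranges map directly onto the `-lines` / `-line-filter` style arguments the two wrapper scripts pass to the underlying tools.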

Improve feedback in Phabricator (even further)

acceptance criteria

  • the feedback in Phabricator looks something like this:
Ran `check-all`, 1 failure:
  LLVM.tools/llvm-ar::mri-utf8.test
Logs: [ninja log], [cmake log], [CMakeCache.txt]
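Rendering that comment is straightforward string assembly; a sketch (the function and argument names are illustrative, not part of the existing scripts):

```python
def format_feedback(target, failed_tests, log_names):
    """Render a Phabricator comment in the shape sketched above."""
    n = len(failed_tests)
    noun = "failure" if n == 1 else "failures"
    lines = [f"Ran `{target}`, {n} {noun}:"]
    lines += [f"  {test}" for test in failed_tests]
    lines.append("Logs: " + ", ".join(f"[{name}]" for name in log_names))
    return "\n".join(lines)
```

The bracketed log names would become links to the artifacts on the result server.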

Jenkins agent failure should lead to restart

If a build or test has failed due to e.g. a machine restart, Jenkins gives up and reports the build as failed.
It would be nice to restart the build automatically in such cases.

An initial search found the Naginator plugin, which should do the trick.

reproduce build locally

acceptance criteria

  • the containers are set up in a way so that users can run the tests locally
  • there is documentation on how to do this
  • We clarified if we can share the containers (e.g. Windows licensing)
  • If we can share the containers: users can access the containers from a public Docker repository

Scale compute power

acceptance criteria

  • All new/changed patches are checked within 2h.
  • We benchmarked the tests on "C" type machines to see if that's faster.
  • We have benchmarks for 16, 32 and 64 cores for:
    • clean build
    • cached build
    • ninja check
    • ninja check-all

Print link to revision on Phabricator in build log

As a user I want to navigate easily from the build log to the revision in Phabricator so that I can see what triggered the build.

acceptance criteria

  • a link to the revision in Phabricator is shown in the build log
  • if possible: also add a link to the build job in Jenkins to the change in Phabricator
