GitBug-Actions

GitBug-Actions is a tool that builds bug-fix benchmarks by leveraging GitHub Actions. The tool mines GitHub repositories and navigates through their commits, locally executing GitHub Actions using act in each commit considered. Finally, the tool checks if a bug-fix pattern was found by looking at the test results parsed from the GitHub Actions runs. If a bug-fix is found, GitBug-Actions is able to export a Docker image with the reproducible environment for the bug-fix. The reproducible environment will preserve all the dependencies required to run the tests for the bug-fix, avoiding the degradation of the benchmark due to dependencies that become unavailable.
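As a rough illustration of what checking for a "bug-fix pattern" in test results could look like, here is a minimal sketch. The rule it encodes (at least one test fails before the change, and every test passes after it) is an assumption made for illustration, not the rule GitBug-Actions is documented to use. The `TestResults` type and the `looks_like_bug_fix` function are hypothetical names.

```python
from typing import Dict

# Hypothetical representation: test name -> True if the test passed.
TestResults = Dict[str, bool]


def looks_like_bug_fix(before: TestResults, after: TestResults) -> bool:
    """Assumed rule: some test fails before the change and all tests pass after it."""
    if not before or not after:
        # Without test results on both sides there is nothing to compare.
        return False
    failed_before = any(not passed for passed in before.values())
    all_pass_after = all(after.values())
    return failed_before and all_pass_after


if __name__ == "__main__":
    before = {"test_parse": False, "test_render": True}
    after = {"test_parse": True, "test_render": True}
    print(looks_like_bug_fix(before, after))
```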

If you use GitBug-Actions, please cite:

GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions (doi:10.1145/3639478.3640023)

@inproceedings{gitbugactions,
 title = {GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions},
 year = {2024},
 doi = {10.1145/3639478.3640023},
 author = {Saavedra, Nuno and Silva, Andr{\'e} and Monperrus, Martin},
 booktitle = {Proceedings of the ACM/IEEE 46th International Conference on Software Engineering: Companion Proceedings},
}

Requirements

Act

act must be installed and functional. At the moment, GitBug-Actions only works correctly with the modified version of act available at https://github.com/gitbugactions/act. Other versions may run, but issues may arise.

To install this version:

git clone https://github.com/gitbugactions/act
cd act
make build

A binary file dist/local/act will be created. This binary file should be made available in the $PATH of the system:

export PATH="<REPLACE_WITH_PATH_TO_ACT>:$PATH"
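A quick way to confirm that the binary is discoverable after updating `$PATH` is to look it up from Python. This is a small sketch using only the standard library; the helper name `find_executable` is hypothetical.

```python
import shutil
from typing import Optional


def find_executable(name: str = "act") -> Optional[str]:
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print(path if path else "act was not found on PATH")
```

Note that this only checks that some `act` binary is on the path, not that it is the modified version.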

Python dependencies

GitBug-Actions runs on Python 3.10 and above.

Ensure Poetry is installed.

Then, to install the Python dependencies run:

poetry shell
poetry install
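If in doubt about which interpreter the Poetry environment picked up, the Python 3.10 minimum stated above can be checked with a short sketch like this one. `MIN_VERSION` and `python_is_supported` are hypothetical names.

```python
import sys

# Minimum version stated in this README.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Compare the (major, minor) pair against the documented minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print(python_is_supported())
```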

How to run

Ensure the commands are executed inside the Poetry shell:

poetry shell

Set the environment variable GITHUB_ACCESS_TOKEN with your GitHub access token. The token is used to perform calls to GitHub's API.

export GITHUB_ACCESS_TOKEN="<YOUR_ACCESS_TOKEN>"
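Before launching a long run, it can help to check that the variable is actually visible to Python. The sketch below only tests that `GITHUB_ACCESS_TOKEN` is set and non-empty; it does not validate the token against GitHub. The helper name `has_github_token` is hypothetical.

```python
import os
from typing import Mapping


def has_github_token(env: Mapping[str, str] = os.environ) -> bool:
    """True if GITHUB_ACCESS_TOKEN is present and not blank."""
    return bool(env.get("GITHUB_ACCESS_TOKEN", "").strip())


if __name__ == "__main__":
    print(has_github_token())
```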

Use the --help option to obtain the list of options required to run each script.

python collect_repos.py --help
python collect_bugs.py --help
python export_bugs.py --help
python filter_bugs.py --help

Overview of GitBug-Actions

The figure above provides an overview of the pipeline of GitBug-Actions.

The scripts above should be executed in the order shown in the figure: collect_repos, collect_bugs, export_bugs, and then filter_bugs. collect_bugs uses the repositories found by collect_repos as input. export_bugs uses the bug-fixes found by collect_bugs as input. Finally, filter_bugs uses the bug-fixes found by collect_bugs and the containers exported by export_bugs as input. The output of filter_bugs is a file with a list of non-flaky bug-fixes that can be reproduced in the exported containers.
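The input dependencies between the scripts can be summarized as a small table. The sketch below encodes them and checks whether a proposed execution order respects them. It models only the ordering described here and does not invoke the scripts; `STAGE_INPUTS` and `order_is_valid` are hypothetical names.

```python
from typing import Dict, List, Sequence

# Each stage mapped to the stages whose output it consumes, as described above.
STAGE_INPUTS: Dict[str, List[str]] = {
    "collect_repos": [],
    "collect_bugs": ["collect_repos"],
    "export_bugs": ["collect_bugs"],
    "filter_bugs": ["collect_bugs", "export_bugs"],
}


def order_is_valid(order: Sequence[str]) -> bool:
    """True if every stage runs only after all of its inputs were produced."""
    done = set()
    for stage in order:
        if stage not in STAGE_INPUTS:
            return False
        if any(dep not in done for dep in STAGE_INPUTS[stage]):
            return False
        done.add(stage)
    return True


if __name__ == "__main__":
    print(order_is_valid(["collect_repos", "collect_bugs", "export_bugs", "filter_bugs"]))
```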

Tests

To run the tests:

pytest test -s

Practical Challenges

While developing GitBug-Actions, we encountered several challenges in running CI builds at a large scale. Here we enumerate these challenges and explain how we mitigate them and, where that was not possible, how the user should handle them.

Handling Commits without GitHub Actions

One challenge in collecting bug-fix commit pairs by reproducing GitHub Actions is that GitHub Actions was only released in late 2019. Moreover, although it was the most popular option as of 2023, its adoption was not immediate. As a result, the majority of commits found on GitHub do not have any associated workflows.

To increase the number of commits it supports, GitBug-Actions identifies the oldest locally reproducible GitHub Actions workflows for each project. Then, for commits not associated with GitHub Actions, it uses these workflows as an approximation of the intended configuration.
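The fallback described above can be sketched as follows. This is an illustrative model under stated assumptions, not the tool's implementation: the `Workflow` and `Commit` classes, the `commit_index` field (0 being the oldest commit), and the choice to return a single workflow are all hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Workflow:
    path: str
    commit_index: int   # hypothetical: position in history, 0 = oldest commit
    reproducible: bool  # hypothetical: whether it ran successfully locally


@dataclass
class Commit:
    sha: str
    workflows: List[Workflow] = field(default_factory=list)


def oldest_reproducible(workflows: List[Workflow]) -> Optional[Workflow]:
    """Pick the reproducible workflow that appears earliest in the history."""
    candidates = [w for w in workflows if w.reproducible]
    return min(candidates, key=lambda w: w.commit_index, default=None)


def workflow_for(commit: Commit, project_workflows: List[Workflow]) -> Optional[Workflow]:
    """Use the commit's own workflow if it has one, else fall back to the oldest reproducible one."""
    if commit.workflows:
        return commit.workflows[0]  # simplification: take the first one
    return oldest_reproducible(project_workflows)
```

A commit that predates the project's adoption of GitHub Actions would have an empty `workflows` list and therefore receive the fallback.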

Disk Space Management

Build execution has the potential to exhaust available disk space. To mitigate this, we restrict each build's allocation to a maximum of 3 GiB. This restriction is handled by our version of act. However, users are advised to check disk usage frequently and remove dangling Docker containers and images if any are left behind.
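One way to keep an eye on disk usage is a small check before starting a batch of builds. The sketch below uses the 3 GiB per-build cap mentioned above as a budget. The function names and the idea of multiplying the cap by the number of parallel builds are assumptions for illustration; this is not something GitBug-Actions does itself.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str = ".") -> float:
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def enough_space_for_builds(parallel_builds: int, path: str = ".",
                            per_build_gib: float = 3.0) -> bool:
    """Assumed budget: each parallel build may use up to `per_build_gib` GiB."""
    return free_gib(path) >= parallel_builds * per_build_gib


if __name__ == "__main__":
    print(f"{free_gib():.1f} GiB free")
```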

Concurrent File Access

CI builds may initiate concurrent file access operations, which can escalate to the point of surpassing the user-level open-file limit set by Linux. This is exacerbated when running multiple builds in parallel. To overcome this, we recommend setting the open-file limit for your user profile to a higher threshold than the default.

To check the current limit for your user run ulimit -Sn.
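The same limit can be inspected from Python with the standard `resource` module (Unix only). The sketch below also shows how a process could raise its own soft limit up to the hard limit. This affects only the current process and its children, not the user profile, and the helper names are hypothetical.

```python
import resource
from typing import Tuple


def open_file_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Never lowers the limit.
    """
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return open_file_limits()[0]


if __name__ == "__main__":
    print(open_file_limits())
```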

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT
