This repository is a fork of https://bitbucket.org/rjust/fault-localization-data/overview. For analysis data and results, visit http://bit.ly/pr_research_spreadsheet (switch sheets via the bottom bar to see the different result sets).
This repository contains data files, data-collection scripts, and data-analysis scripts of the "Evaluating and Improving Fault Localization Techniques" project. Before exploring this repository, please read the technical report that describes the results.
The experiments evaluate various fault localization techniques on artificial faults and on real faults.
At a high level, here's how it all works:
- The real and artificial faults come from the Defects4J project. (A sketch of how to check out one such fault follows this list.)
- For each D4J fault, the scripts in `d4j_integration/` determine which lines are faulty. The resultant files are "buggy-lines" files, and live in `analysis/pipeline-scripts/buggy-lines/`.
- Many fault localization techniques require coverage information. We use GZoltar to gather coverage information. The resultant files are called "matrix" and "spectra".
- Mutation-based fault localization (MBFL) techniques require mutation analysis. Our Killmap project (which lives in `killmap/`) does mutation analysis on all faults. The resultant files are called "killmaps," and specify how each test behaves on each mutant. (Each killmap also has an associated "mutants-log" file, which describes all the mutants that were analyzed.)
- Our scripts enable you to compute all the mutation and coverage information, but doing so takes a great deal of computation. The resulting mutation/coverage information is available at http://fault-localization.cs.washington.edu.
- The "scoring pipeline" (which lives in `analysis/pipeline-scripts/`) determines how well each FL technique does on each fault -- that is, where the real buggy lines appear in the FL technique's ranking of the lines of the program. The results appear in `data/`.
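For concreteness, a single Defects4J fault can be checked out and tested as shown below. This is a minimal sketch, assuming the Defects4J fork used here exposes the standard `defects4j` command-line interface and that `$D4J_HOME/framework/bin` is on your `PATH`; the scratch directory is arbitrary.

    # Check out the buggy version of Lang fault #37 into a scratch directory.
    defects4j checkout -p Lang -v 37b -w /tmp/lang_37_buggy

    # Compile it and run its developer-written tests.
    cd /tmp/lang_37_buggy
    defects4j compile
    defects4j test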
Before doing anything else, run `./setup.sh`. This:

- clones the appropriate Defects4J fork (unless you've already exported a `D4J_HOME` directory);
- updates your `.bashrc` to export some environment variables:
  - `D4J_HOME` and `DEFECTS4J_HOME`, pointing to the new `defects4j` repository, if needed;
  - `FL_DATA_HOME`, pointing here;
  - `KILLMAP_HOME`, pointing at `./killmap/`;
  - `GZOLTAR_JAR`, pointing to `./gzoltar/gzoltar.jar`.
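Concretely, the exports added to your `.bashrc` look roughly like the following. This is an illustrative sketch that assumes the repository was cloned to `~/fault-localization-data`; `setup.sh` itself is the authoritative source of the exact values.

    # Illustrative only -- setup.sh writes the real values for your machine.
    export FL_DATA_HOME="$HOME/fault-localization-data"   # this repository
    export D4J_HOME="$FL_DATA_HOME/defects4j"             # the Defects4J fork
    export DEFECTS4J_HOME="$D4J_HOME"
    export KILLMAP_HOME="$FL_DATA_HOME/killmap"
    export GZOLTAR_JAR="$FL_DATA_HOME/gzoltar/gzoltar.jar"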
The workflow to score a set of FL techniques on a given fault looks like this (a sketch chaining all three steps appears after this list):

- Various pieces of fault information were generated by the tools in `./d4j_integration/` and then checked in. You don't need to generate them yourself, but if you want to, see the `README.md` in that directory.
- To run GZoltar, use `gzoltar/run_gzoltar.sh`. Example invocation:

      bash run_gzoltar.sh Lang 37 . developer

  Creates the files `./matrix` and `./spectra`.
- To run Killmap, use `killmap/scripts/generate-matrix`. Example invocation:

      killmap/scripts/generate-matrix \
        Lang 37 \
        /tmp/Lang-37 \
        Lang-37.mutants.log \
        | gzip > Lang-37.killmap.csv.gz

  Creates the files `Lang-37.killmap.csv.gz` and `Lang-37.mutants.log`.
- To run the scoring pipeline, use `analysis/pipeline-scripts/do-full-analysis`. Example invocation:

      analysis/pipeline-scripts/do-full-analysis \
        Lang 37 'developer' \
        ./matrix ./spectra \
        Lang-37.killmap.csv.gz Lang-37.mutants.log \
        /tmp/Lang-37-scoring \
        Lang-37.scores.csv

  Creates the file `Lang-37.scores.csv`.
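Putting the three steps together, here is a minimal end-to-end sketch for a single fault. It chains the invocations exactly as shown above; it assumes `setup.sh` has been run and that everything is executed from the repository root (so GZoltar's `.` output argument resolves there). The `/tmp` scratch paths are arbitrary.

    # End-to-end sketch for Lang fault #37 (run from the repository root).
    set -e

    # Step 1: coverage. Produces ./matrix and ./spectra.
    bash gzoltar/run_gzoltar.sh Lang 37 . developer

    # Step 2: mutation analysis. Produces Lang-37.killmap.csv.gz
    # and Lang-37.mutants.log.
    killmap/scripts/generate-matrix \
      Lang 37 \
      /tmp/Lang-37 \
      Lang-37.mutants.log \
      | gzip > Lang-37.killmap.csv.gz

    # Step 3: scoring. Produces Lang-37.scores.csv.
    analysis/pipeline-scripts/do-full-analysis \
      Lang 37 'developer' \
      ./matrix ./spectra \
      Lang-37.killmap.csv.gz Lang-37.mutants.log \
      /tmp/Lang-37-scoring \
      Lang-37.scores.csv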
For more details on any of these scripts, see the `README.md` in the script's directory.
If you want to skip running GZoltar and Killmap (which can be very computationally expensive), you can download the resulting files from http://fault-localization.cs.washington.edu.
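For orientation once you have these files, here is a toy illustration of what the coverage files typically look like, under the assumption that they follow the usual GZoltar conventions (one program element per spectra line; one row per test in the matrix). The element names and values below are invented; `gzoltar/README.md` is the authoritative reference.

    # spectra: one instrumented program element per line, roughly of the form
    # package$Class#method(argTypes):lineNumber
    org.apache.commons.lang$StringUtils#join(java.lang.Object[],char):3001
    org.apache.commons.lang$StringUtils#join(java.lang.Object[],char):3002

    # matrix: one row per test; the i-th 0/1 flag records whether the test
    # covered the i-th spectra element, and the trailing +/- records whether
    # the test passed or failed
    1 0 +
    1 1 -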
This repository's top-level directories are:

- `analysis/`: Tools for analyzing the output of coverage/mutation analyses.
- `aws/`: Scripts for computing killmaps on AWS.
- `cluster_scripts/`: Scripts for computing killmaps on a Sun Grid Engine cluster.
- `d4j_integration/`: Scripts that build upon or extend Defects4J to populate or query its database.
- `data/`: Data files for the final results and corresponding support scripts.
- `gzoltar/`: Scripts for running the GZoltar tool to collect line coverage information.
- `killmap/`: Mutation-analysis tool whose output is used for the MBFL techniques we study.
- `stats/`: R scripts that crunch the data to produce numbers for the paper.
- `utils/`: Utility programs and libraries for running/analyzing tests and parsing data files.