debiancoinstevol's Introduction

Replication package for MSR 2015

We provide here the information necessary to replicate the historical analysis of coinstallability issues in the Debian distribution described in the article "A historical analysis of Debian package incompatibilities" by Maelick Claes, Tom Mens, Roberto Di Cosmo and Jérôme Vouillon.

This information consists of the code and scripts used to retrieve and analyse the historical information from the snapshot.debian.net archive.

The instructions provided here are intended for a *nix system, ideally a GNU/Linux one and preferably a Debian-based one, where the prerequisite tools are pre-packaged.

Prerequisites

Install or compile the coinst tool, which is described elsewhere. On a Debian system, issue the following command as root:

apt-get install coinst

Also install the R statistical environment. On a Debian-based system, a more recent version of R than the packaged one may be required; in that case, use one of the additional package repositories. Then issue the command

apt-get install r-base
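Before going further, it can help to confirm that both tools are reachable from the shell. The following is a minimal sketch: the helper name missing_tools is hypothetical, and it assumes the coinst package installs an executable named coinst.

```shell
#!/bin/sh
# Print the name of every given command that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Assumption: the coinst package provides an executable called "coinst".
missing=$(missing_tools coinst Rscript)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all prerequisite tools found"
fi
```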

A few R packages are also required. After starting R, install them with:

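# The package names must be passed as a single character vector.
# "parallel" is not listed because it ships with R itself.
install.packages(c("data.table", "logging", "XML", "reshape2",
                   "stringr", "rjson", "igraph", "ggplot2", "survival",
                   "directlabels", "scales", "devtools"))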

Note that this might require a few additional Debian packages, such as libxml2-dev.

Finally, install the two R packages provided here, DebianEvolData and DebianEvolAnalysis, by running the following shell commands from the directory that contains them:

R CMD INSTALL DebianEvolData
R CMD INSTALL DebianEvolAnalysis

Step 1: retrieve and process the historical data

First make sure there is a data folder in the directory where the scripts will be run. Then run the data retrieval and processing scripts with Rscript:

mkdir -p data
Rscript scripts/data/raw.R
Rscript scripts/data/parse.R
Rscript scripts/data/process.R
Rscript scripts/data/aggregate.R
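Since these steps can run for a long time, it may be convenient to chain them so that a failure stops the sequence instead of letting later steps run on incomplete data. This is only a sketch: the function name run_pipeline is hypothetical, and it assumes each script exits with a non-zero status when it fails.

```shell
#!/bin/sh
# Run each script in order with the given runner and stop at the first failure.
# The runner is a parameter so the sequence can be dry-run with "echo".
run_pipeline() {
    runner=$1
    shift
    for script in "$@"; do
        echo "running $script"
        "$runner" "$script" || { echo "failed: $script" >&2; return 1; }
    done
}

# Intended use (left commented out so the sketch has no side effects):
# run_pipeline Rscript scripts/data/raw.R scripts/data/parse.R \
#     scripts/data/process.R scripts/data/aggregate.R
```

Replacing Rscript with echo in the call prints the steps without executing them.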

Note that this requires enough disk space to store the raw and processed data, which can amount to more than 1 TB. It may also require a lot of memory, in particular for the aggregation step. By default, most operations run in parallel on 2, 4 or 6 processes. If you have fewer than 6 cores, you can adjust the number of processes in the scripts. Using more processes may also require more memory.
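A quick pre-flight check along these lines can flag obvious resource problems before starting the retrieval. It is a sketch under stated assumptions: the 1 TiB threshold is only an approximation of the figure quoted above, the helper names free_kb and core_count are hypothetical, and the check looks at the file system of the current directory, where the data folder is created.

```shell
#!/bin/sh
# Assumed threshold: 1 TiB expressed in KiB (approximates "more than 1 TB").
REQUIRED_KB=$((1024 * 1024 * 1024))

# Print the free space, in KiB, of the file system holding the given directory.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the number of online CPU cores, or 1 if it cannot be determined.
core_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

avail=$(free_kb .)
cores=$(core_count)

if [ "$avail" -lt "$REQUIRED_KB" ]; then
    echo "warning: only $avail KiB free here; the data may need more than 1 TB"
fi
if [ "$cores" -lt 6 ]; then
    echo "note: $cores core(s) detected; consider lowering the process counts in the scripts"
fi
```

The check does not cover memory, since the amount needed depends on the number of processes used.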

Step 2: perform the statistical analysis

To avoid having to run the data retrieval and processing steps, we provide the required aggregated data sets as serialized R data.table objects.

Make sure an images folder exists, then run the analysis, which will generate SVG and PDF plots in this folder:

mkdir -p images
Rscript scripts/analysis/history.R
Rscript scripts/analysis/survival.R
