checkpoint - Simple reproducibility for R scripts that depend on packages

Overview

The goal of checkpoint is to solve the problem of package reproducibility in R. Specifically, checkpoint solves the problems that occur when you don't have the correct versions of R packages. Because packages on CRAN are updated all the time, it can be difficult to recreate an environment in which all your packages are consistent with some earlier state.

To solve this, checkpoint allows you to install packages from a specific snapshot date. In other words, checkpoint makes it possible to install package versions from a specific date in the past, as if you had a CRAN time machine.

Checkpoint Features

With the checkpoint package, you can easily:

  • Write R scripts or projects using package versions from a specific point in time;
  • Write R scripts that use older versions of packages, or packages that are no longer available on CRAN;
  • Install packages (or package versions) visible only to a specific project, without affecting other R projects or R users on the same system;
  • Manage multiple projects that use different package versions;
  • Share R scripts with others that will automatically install the appropriate package versions;
  • Write and share R code whose results can be reproduced, even if new (and possibly incompatible) package versions are released later.

Using the checkpoint function

Using checkpoint is simple:

  • The checkpoint package has a single function, checkpoint(), which takes the snapshot date as its argument.
  • Example: checkpoint("2015-01-15") instructs R to install and use only package versions that existed on January 15, 2015.

To write R code for reproducibility, simply begin your master R script as follows:

library(checkpoint)
checkpoint("2015-01-15") ## or any date in YYYY-MM-DD format after 2014-09-17

Choose a snapshot date that includes the package versions you need for your script (or today's date, to get the latest versions). Any package version published since September 17, 2014 is available for use.
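Before passing a date to checkpoint(), it can help to confirm that it is a real calendar date in YYYY-MM-DD format and falls within the range of available snapshots. The following is a minimal sketch in base R; validate_snapshot_date() is a hypothetical helper written for illustration and is not part of the checkpoint package.

```r
# Earliest snapshot date mentioned in this document.
first_snapshot <- as.Date("2014-09-17")

# Hypothetical helper (not part of the checkpoint package): returns the
# snapshot date as a Date object, or stops with an informative error.
validate_snapshot_date <- function(x, today = Sys.Date()) {
  if (!grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) {
    stop("snapshot date must be in YYYY-MM-DD format: ", x)
  }
  d <- as.Date(x, format = "%Y-%m-%d")
  if (is.na(d)) {
    stop("not a valid calendar date: ", x)
  }
  if (d < first_snapshot) {
    stop("no snapshots are available before ", format(first_snapshot))
  }
  if (d > today) {
    stop("snapshot date is in the future: ", x)
  }
  d
}

snapshot <- validate_snapshot_date("2015-01-15")
```

The validated value can then be passed on as checkpoint(format(snapshot)).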

Sharing your scripts for reproducibility

Sharing your R analysis reproducibly can be as easy as emailing a single R script. Begin your script with the following commands:

  • Load the checkpoint package using library(checkpoint)
  • Ensure you specify checkpoint() with your checkpoint date, e.g. checkpoint("2014-10-01")

Then send this script to your collaborators. When they run it on their machine, checkpoint performs the same steps: it creates the checkpoint snapshot folder and installs the necessary packages, so that the script produces the same results.

How checkpoint works

When you create a checkpoint, the checkpoint() function performs the following:

  • Creates a snapshot folder in which to install packages. This library folder is located at ~/.checkpoint.
  • Scans your project folder for all packages used. Specifically, it searches for all instances of library() and require() in your code.
  • Installs these packages from the MRAN snapshot into your snapshot folder using install.packages().
  • Sets the options for your CRAN mirror to point to an MRAN snapshot, i.e. modifies options(repos).

This means the remainder of your script will run with the packages from a specific date.
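To make the scanning step concrete, here is a rough sketch of how library() and require() calls could be found with a regular expression in base R. This is an illustration under stated assumptions, not the scanner that checkpoint itself uses: find_packages() is a hypothetical name, only .R files are read, and only the first call on each line is detected.

```r
# Hypothetical helper: list the package names that appear in library() or
# require() calls in the R scripts under `path`. A regular-expression
# sketch only; the checkpoint package's own scanner may work differently.
find_packages <- function(path = ".") {
  files <- list.files(path, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  code <- as.character(unlist(lapply(files, readLines, warn = FALSE)))

  # Capture group 1 is the function name, group 2 is the package name.
  pattern <- "(library|require)\\([[:space:]]*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"
  hits <- regmatches(code, regexec(pattern, code))

  # Lines without a match yield zero-length entries; drop them.
  pkgs <- vapply(Filter(length, hits), function(m) m[3], character(1))
  sort(unique(pkgs))
}

# Try it on a temporary project containing one small script.
project <- tempfile("checkpoint_scan_")
dir.create(project)
writeLines(c("library(MASS)", "require(\"foreach\")"),
           file.path(project, "analysis.R"))
found <- find_packages(project)
```

Here found contains the two package names MASS and foreach; their order depends on the locale used for sorting.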

Where checkpoint finds historic package versions

To achieve reproducibility, we create a complete snapshot of CRAN once a day on the "Managed R archived network" (MRAN) server. At midnight (UTC), MRAN mirrors all of CRAN and saves a snapshot. (MRAN has been storing daily snapshots since September 17, 2014.) Installing packages from one of these snapshots gives you the packages exactly as they were on that date, in effect "going back in time".

Together, the checkpoint package and the MRAN server act as a CRAN time machine. The checkpoint() function installs the packages to a local library exactly as they were at the specified point in time. Only those packages are available to your session, thereby avoiding any package updates that came later and may have altered your results. In this way, anyone using checkpoint() can ensure the reproducibility of your scripts or projects at any time.
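The idea of pairing a dated repository with a dated local library can be sketched in a few lines of base R. Everything specific here is an assumption for illustration: the base URL is a placeholder rather than the real MRAN address, the folder layout under ~/.checkpoint is invented, and snapshot_url() and snapshot_library() are hypothetical helpers, not functions from the package.

```r
# Hypothetical helpers. The base URL is a placeholder and the folder layout
# is an assumption; neither is taken from the checkpoint package.
snapshot_url <- function(date, base = "https://mran.example.org/snapshot") {
  paste0(base, "/", date)
}

snapshot_library <- function(date, root = "~/.checkpoint") {
  file.path(root, date, "lib")
}

date  <- "2014-10-01"
repos <- c(CRAN = snapshot_url(date))
lib   <- snapshot_library(date)

# With these two values, a snapshot-only session could be set up roughly
# as follows (left commented out because it changes session state and
# needs network access):
#   dir.create(lib, recursive = TRUE, showWarnings = FALSE)
#   options(repos = repos)
#   .libPaths(lib)
#   install.packages("MASS", lib = lib)
```

Keying both the repository URL and the library path on the same date is what keeps packages from different snapshots apart.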

Resetting the checkpoint

To revert to your default CRAN mirror and access globally-installed packages, simply restart your R session.

Worked example


# Create temporary project and set working directory

example_project <- paste0("~/checkpoint_example_project_", Sys.Date())

dir.create(example_project, recursive = TRUE)
oldwd <- setwd(example_project)


# Write dummy code file to project

cat("library(MASS)", "library(foreach)",
    sep="\n", 
    file="checkpoint_example_code.R")


# Create a checkpoint by specifying a snapshot date

library(checkpoint)
checkpoint("2014-10-01")

# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")

# Check that library path is set to ~/.checkpoint
.libPaths()

# Check which packages are installed in checkpoint library
installed.packages()

# Clean up: restore the working directory first, then delete the project
setwd(oldwd)
unlink(example_project, recursive = TRUE)

Installation

To install checkpoint directly from CRAN, use:

install.packages("checkpoint")
library("checkpoint")

To install checkpoint directly from GitHub, use the devtools package. In your R session, try:

install.packages("devtools")
devtools::install_github("RevolutionAnalytics/checkpoint")
library("checkpoint")

Using knitr and rmarkdown with checkpoint

Although checkpoint will scan for dependencies in .Rmd files if knitr is installed, it does not automatically install the knitr or rmarkdown packages.

To build your .Rmd files, you will have to add a script to your project that explicitly loads all the packages required to build them.

A line like the following may be sufficient:

library(rmarkdown)

This should automatically resolve the dependencies on the packages knitr, yaml and htmltools.

To build your rmarkdown file, use a call to rmarkdown::render(). For example, to build a file called example.Rmd, use:

rmarkdown::render("example.Rmd")
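One way to add such a script is to generate it, so that the list of build-time packages lives in a single place. The sketch below uses only base R; write_rmd_dependencies() and the file name rmd_dependencies.R are hypothetical choices for illustration, and any .R file inside the project folder that contains the library() call should serve the same purpose.

```r
# Hypothetical helper: write a small script that loads the packages needed
# to build .Rmd files, so that the library() calls sit in the project folder.
write_rmd_dependencies <- function(project = ".",
                                   packages = "rmarkdown",
                                   file = "rmd_dependencies.R") {
  target <- file.path(project, file)
  lines <- c("# Packages needed to build the .Rmd files in this project.",
             sprintf("library(%s)", packages))
  writeLines(lines, target)
  invisible(target)
}

# Demonstrated on a temporary folder so that no real project is changed.
demo_project <- tempfile("rmd_project_")
dir.create(demo_project)
deps_file <- write_rmd_dependencies(demo_project)
readLines(deps_file)
```

In a real project, call write_rmd_dependencies() from the project folder before creating the checkpoint.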

More information

Issues

Post an issue on the Issue tracker at https://github.com/RevolutionAnalytics/checkpoint/issues

Project website

http://projects.revolutionanalytics.com/rrt/

Checkpoint server

https://github.com/RevolutionAnalytics/checkpoint-server

Made by

Revolution Analytics
