cookiecutter-reproducible-research

This repository provides cookiecutter templates for reproducible research projects. The templates do not attempt to be generic, but have a clear and opinionated focus.

Projects built with these templates aim at full automation and use Python 3.10, mamba/conda, Git, Snakemake, and pandoc to create an HTML report from raw data, code, and Markdown text. If you want to change any of these tools, fork, clone, or download this repository on GitHub.

The template includes a few lines of demo code that let you create an HTML report from made-up simulation results right away. Read the README.md in the generated repository to see how.

Template types

default

This generates the basic structure of a reproducible workflow.

cluster

The cluster template extends the basic template by adding infrastructure to support running on a compute cluster.

Getting Started

Make sure you have cookiecutter installed. If you do not, install it with conda:

conda install cookiecutter -c conda-forge

Then create a repository with the following command, replacing [default/cluster] with either default or cluster:

cookiecutter gh:timtroendle/cookiecutter-reproducible-research --directory=[default/cluster]

You will be asked for the following parameters:

| Parameter | Description |
| --- | --- |
| project_name | The name of your project, used in the documentation and report. |
| project_short_name | An abbreviation, used for environments and such. Avoid special characters and whitespace. |
| author | Your name. |
| institute | The name of your institute, used for report metadata. |
| short_description | A short description of the project, used for documentation and report. |
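
The parameters above can also be supplied without the interactive prompts. The following is a minimal sketch, assuming cookiecutter's `--no-input` flag and its `key=value` extra-context arguments; the helper names `is_safe_short_name` and `build_command`, the exact character rule for `project_short_name`, and all example values are hypothetical and not part of this template.

```python
import re
import shlex

TEMPLATE = "gh:timtroendle/cookiecutter-reproducible-research"


def is_safe_short_name(name: str) -> bool:
    """Accept only letters, digits, hyphens, and underscores (an assumed rule)."""
    return re.fullmatch(r"[A-Za-z0-9_-]+", name) is not None


def build_command(template_type: str, params: dict[str, str]) -> list[str]:
    """Assemble a non-interactive cookiecutter call as an argument list."""
    if template_type not in ("default", "cluster"):
        raise ValueError(f"unknown template type: {template_type!r}")
    if not is_safe_short_name(params["project_short_name"]):
        raise ValueError(
            "project_short_name must not contain special characters or whitespace"
        )
    command = ["cookiecutter", "--no-input", f"--directory={template_type}", TEMPLATE]
    # Parameter values are passed as key=value arguments after the template.
    command += [f"{key}={value}" for key, value in params.items()]
    return command


# Hypothetical example values for the five prompted parameters.
params = {
    "project_name": "Example Project",
    "project_short_name": "example-project",
    "author": "Jane Doe",
    "institute": "Example Institute",
    "short_description": "A demo of the default template.",
}

# Print a shell-quoted command line that could be pasted into a terminal.
print(shlex.join(build_command("default", params)))
```

The sketch only prints the command; running it is left to you, for example by passing the returned list to `subprocess.run`.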

The cluster template additionally requires the following parameters:

| Parameter | Description |
| --- | --- |
| cluster_url | The address of the cluster to allow syncing to and from the cluster. |
| cluster_base_dir | The base path for the project on the cluster (default: `~/<project-short-name>`). |
| cluster_type | The type of job scheduler used on the cluster. Currently, only LSF is supported. |
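
To illustrate how the cluster parameters relate to each other, here is a small sketch that fills in the documented default for `cluster_base_dir` and rejects unsupported schedulers. The function name `cluster_params`, the exact spelling of the `cluster_type` value (`"LSF"`), and the host name are assumptions made for illustration; the template may represent these values differently.

```python
from typing import Optional

# Only LSF is documented as supported; the literal spelling is an assumption.
SUPPORTED_CLUSTER_TYPES = ("LSF",)


def cluster_params(
    project_short_name: str,
    cluster_url: str,
    cluster_base_dir: Optional[str] = None,
    cluster_type: str = "LSF",
) -> dict[str, str]:
    """Return the additional cluster parameters with defaults filled in."""
    if cluster_type not in SUPPORTED_CLUSTER_TYPES:
        raise ValueError(f"unsupported cluster type: {cluster_type!r}")
    if cluster_base_dir is None:
        # Documented default: a directory named after the project in the home directory.
        cluster_base_dir = f"~/{project_short_name}"
    return {
        "cluster_url": cluster_url,
        "cluster_base_dir": cluster_base_dir,
        "cluster_type": cluster_type,
    }


# Hypothetical host name, used only for demonstration.
print(cluster_params("example-project", "cluster.example.org"))
```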

Project Structure

The generated repository will have the following structure:

├── config                  <- Configuration files, e.g., for your model if needed.
│   └── default.yaml        <- Default set of configuration parameter values.
├── data                    <- Raw input data.
├── envs                    <- Execution environments.
│   ├── default.yaml        <- Default execution environment.
│   ├── report.yaml         <- Environment for compilation of the report.
│   └── test.yaml           <- Environment for executing tests.
├── report                  <- All files creating the final report, usually text and figures.
│   ├── apa.csl             <- Citation style definition to be used in the report.
│   ├── literature.yaml     <- Bibliography file for the report.
│   ├── report.md           <- The report in Markdown.
│   └── pandoc-metadata.yaml<- Metadata for the report.
├── rules                   <- The place for all your Snakemake rules.
├── scripts                 <- Scripts go in here.
│   ├── model.py            <- Demo file.
│   └── vis.py              <- Demo file.
├── tests                   <- Automatic tests of the source code go in here.
│   └── test_model.py       <- Demo file.
├── .editorconfig           <- Editor agnostic configuration settings.
├── .flake8                 <- Linting settings for flake8.
├── .gitignore
├── environment.yaml        <- A file to create an environment to execute your project in.
├── LICENSE.md              <- MIT license description
├── Snakefile               <- Description of all computational steps to create results.
└── README.md
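
As a quick sanity check after generating a project, the structure listed above can be compared against what is on disk. This is a sketch only: the helper `missing_paths` is hypothetical and not shipped with the template, and the path lists are copied from the tree above rather than read from the template itself.

```python
import sys
from pathlib import Path

# Directories and files of the default template, as listed in the tree above.
EXPECTED_DIRS = ["config", "data", "envs", "report", "rules", "scripts", "tests"]
EXPECTED_FILES = [
    "config/default.yaml",
    "envs/default.yaml",
    "envs/report.yaml",
    "envs/test.yaml",
    "report/apa.csl",
    "report/literature.yaml",
    "report/report.md",
    "report/pandoc-metadata.yaml",
    "scripts/model.py",
    "scripts/vis.py",
    "tests/test_model.py",
    ".editorconfig",
    ".flake8",
    ".gitignore",
    "environment.yaml",
    "LICENSE.md",
    "Snakefile",
    "README.md",
]


def missing_paths(root) -> list[str]:
    """Return the expected directories and files that do not exist under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Check the directory given on the command line, or the current one.
    project_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in missing_paths(project_root):
        print(f"missing: {path}")
```

Run it from inside the generated repository, or pass the repository path as the first argument; no output means every listed path was found.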

The cluster template additionally contains the following files:

├── config
│   └── cluster                 <- Cluster configuration.
│       ├── cluster-config.yaml <- A Snakemake cluster-config file.
│       └── config.yaml         <- A set of Snakemake command-line parameters for cluster execution.
├── envs
│   └── shell.yaml              <- An environment for shell rules.
├── rules
│   └── sync.yaml               <- Snakemake rules to sync to and from the cluster.
├── .syncignore-receive         <- Build files to ignore when receiving from the cluster.
└── .syncignore-send            <- Local files to ignore when sending to the cluster.

License

Some ideas for this cookiecutter template are taken from cookiecutter-data-science and mkrapp/cookiecutter-reproducible-science. This template itself is MIT licensed.
