Computational Genomics Lab, Genomics Institute, UC Santa Cruz

Toil RNA-Seq Pipeline

Scalable, reproducible, and robust RNA-seq expression quantification.

The Toil RNA-seq workflow converts RNA sequencing data into gene- and transcript-level expression quantification.

Please open issues for any bugs, errors, corrections, or feature requests.

If there are any questions not answered by this README or the wiki, contact John Vivian.

Appendix

For detailed information and troubleshooting, see the Wiki

Overview

Toil-RNAseq DAG

This workflow takes RNA sequencing reads (fastq / BAM) as input and outputs the following (if all options enabled):

<SAMPLE>
├── Kallisto
│   ├── abundance.h5
│   ├── abundance.tsv
│   ├── fusion.txt
│   └── run_info.json
├── QC
│   ├── fastQC
│   │   ├── R1_fastqc.html
│   │   ├── R1_fastqc.zip
│   │   ├── R2_fastqc.html
│   │   └── R2_fastqc.zip
│   └── STAR
│       ├── Log.final.out
│       └── SJ.out.tab
├── Hera
│   ├── abundance.h5
│   ├── abundance.tsv
│   ├── fusion.bedpe (paired-end data only)
│   └── summary
└── RSEM
    ├── Hugo
    │   ├── rsem_genes.hugo.results
    │   └── rsem_isoforms.hugo.results
    ├── rsem_genes.results
    └── rsem_isoforms.results

If the user selects options such as save-bam or wiggle, additional files will appear in the output directory:

  • SAMPLE.sorted.bam
  • SAMPLE.wiggle.bg

The output tarball's filename is prefixed with the sample's unique name (e.g. SAMPLE.tar.gz).

Dependencies and Installation

This workflow has been tested on Ubuntu 14.04, Ubuntu 16.04, and Mac OS X, but should also run on other Unix-based systems.
apt-get and pip often require sudo privileges, so if the commands below fail, try prepending sudo.
If you do not have sudo privileges, you will need to build these tools from source or bug a sysadmin about how to get them (they don't mind).

General Dependencies

1. Python 2.7
2. Curl         apt-get install curl
3. Docker       http://docs.docker.com/engine/installation/
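
One way to confirm that the command-line dependencies are available before installing is sketched below. It assumes the executables are named curl and docker on the PATH. The helper missing_tools is hypothetical, and it uses shutil.which, which requires Python 3 even though the workflow itself lists Python 2.7.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


# An empty list means both executables were found.
missing = missing_tools(["curl", "docker"])
```

The which parameter is injectable only so the check can be exercised without depending on what is installed on the machine.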

Python Dependencies

1. Toil         pip install toil
2. S3AM         pip install s3am (optional, needed for uploading output to S3)

System Dependencies

The workflow requires approximately 50-60 GB of RAM to run STAR alignment.
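
A rough pre-flight check for this memory requirement might look like the following sketch. The function names are hypothetical, the 60 GiB default is simply the upper end of the range quoted above, and total_ram_bytes relies on POSIX sysconf keys that are available on Linux but not guaranteed on every platform.

```python
import os

GIB = 1024 ** 3


def total_ram_bytes():
    """Total physical memory in bytes, via POSIX sysconf (assumes Linux-like support)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def enough_ram_for_star(total_bytes, required_gib=60):
    """True if the machine has at least `required_gib` GiB of physical memory."""
    return total_bytes >= required_gib * GIB


# Usage on a supported system:
#     enough_ram_for_star(total_ram_bytes())
```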

Installation

The Toil RNA-seq workflow is pip installable!

For most users, the preferred installation method is inside a virtualenv to avoid dependency conflicts:

  • virtualenv ~/toil-rnaseq
  • source ~/toil-rnaseq/bin/activate
  • pip install toil-rnaseq

After installation, the workflow can be executed by typing toil-rnaseq into the terminal.

Quickstart

First, obtain all of the necessary workflow inputs.

Then type toil-rnaseq to see the basic help menu and instructions:

  1. Type toil-rnaseq generate to create an editable manifest and config in the current working directory.
  2. Parameterize the workflow by editing the config.
  3. Fill in the manifest with information pertaining to your samples.
  4. Type toil-rnaseq run [jobStore] to execute the workflow.
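
For scripted use, the two commands in the steps above can be assembled as argument lists, as in the sketch below. The job store path ./jobStore and the helper names are hypothetical. Steps 2 and 3 (editing the config and manifest) remain manual and happen between the two commands.

```python
import subprocess


def quickstart_commands(job_store):
    """Return the generate and run commands from the Quickstart, in order."""
    return [
        ["toil-rnaseq", "generate"],       # step 1: write manifest and config
        ["toil-rnaseq", "run", job_store],  # step 4: execute the workflow
    ]


def run_step(argv):
    """Run one command and raise CalledProcessError if it exits non-zero."""
    subprocess.check_call(argv)


# "./jobStore" is a placeholder location for the Toil job store.
commands = quickstart_commands("./jobStore")
```

Call run_step(commands[0]), edit the generated config and manifest, then call run_step(commands[1]).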

Citation

If you use this workflow to produce data for published research, please cite the Toil white paper:

Vivian, J. et al. 
Toil enables reproducible, open source, big biomedical data analyses. 
Nat Biotech 35, 314–316 (2017).

Contributors

jvivian, briandoconnor, wshands, hannes-ucsc, cket
