SingleCell

SingleCell is a Python package for processing single-cell RNA-Seq data.

Requirements

  • Python 3 (tested with Python 3.5)
  • STAR (tested with version 2.5.3a)
  • samtools (tested with version 1.4.1)

The STAR and samtools executables must both be in the PATH. To test this, you can run the following commands, and check that they return the respective version identifiers:

$ STAR --version
STAR_2.5.3a

$ samtools --version
samtools 1.4.1
Using htslib 1.4.1
Copyright (C) 2017 Genome Research Ltd.
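
The same PATH check can be done from Python before starting a long run. This is only a minimal sketch using the standard library; the helper name find_tools is hypothetical and not part of SingleCell.

```python
import shutil


def find_tools(names=("STAR", "samtools")):
    """Map each executable name to its resolved path, or None if it is not in the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so a missing executable is easy to spot.
    for name, path in find_tools().items():
        print("{}: {}".format(name, path or "NOT FOUND in PATH"))
```

A value of None for either tool means that executable is not reachable through the PATH.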

Installation

$ cd singlecell
$ pip install -e .

Creating a STAR index (only once)

To run the inDrop pipeline on your data, you first need a STAR genome index for the species your data comes from. A STAR index is a directory containing a set of index files. For the human genome, these files total about 25 GB. You only need to create an index once per species; all future runs of the inDrop pipeline then reuse it.

To generate an index, download the genome (in FASTA format) and the genome annotations (in GTF format) for the species from the Ensembl FTP server, and decompress both with gunzip. For example, for human:

$ curl -O http://ftp.ensembl.org/pub/release-88/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
$ gunzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz

$ curl -O http://ftp.ensembl.org/pub/release-88/gtf/homo_sapiens/Homo_sapiens.GRCh38.88.gtf.gz
$ gunzip -c Homo_sapiens.GRCh38.88.gtf.gz > Homo_sapiens.GRCh38.88.gtf

For the genome annotation (GTF) file, also keep the compressed version (hence gunzip -c above), because that is the version the inDrop pipeline uses afterwards.
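
If you prefer to do the decompression from Python, the following sketch mirrors gunzip -c: it writes the decompressed copy and leaves the .gz file in place. The function name gunzip_keep is hypothetical and not part of SingleCell.

```python
import gzip
import shutil


def gunzip_keep(gz_path, out_path):
    """Decompress gz_path to out_path, leaving gz_path untouched."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Stream in chunks so large annotation files are not loaded into memory.
        shutil.copyfileobj(src, dst)
    return out_path

# Example call, using the file names from the commands above:
# gunzip_keep("Homo_sapiens.GRCh38.88.gtf.gz", "Homo_sapiens.GRCh38.88.gtf")
```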

Now that you have those files ready, you can run the following:

$ indrop_generate_star_index.py -g Homo_sapiens.GRCh38.dna.primary_assembly.fa \
        -n Homo_sapiens.GRCh38.88.gtf \
        -od star_index_human -os build_star_index_human.sh \
        -ol build_star_index_human_log.txt \
        -t 16

This writes the STAR index to the directory "star_index_human" (see the -od parameter) and uses 16 threads in parallel (-t), making the build significantly faster than a single-threaded run.

Running the inDrop pipeline

To run the inDrop pipeline, you need to first create a configuration file (in YAML format), which contains the locations (paths) of all the input files, specifies an output directory, and sets a few parameters (e.g., how many cells you want to include in the expression matrix). To generate a configuration file template that you can then modify according to your setup, run the following:

$ indrop_create_config_file.py -o my_configuration.yaml

After adjusting the parameters in the configuration file, you can check if everything is configured correctly:

$ indrop_check_pipeline.py -o my_configuration.yaml

If there are no errors, you can run the pipeline:

$ indrop_pipeline.py -c my_configuration.yaml
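
The check-then-run sequence above could also be scripted. The sketch below uses the two commands exactly as written in this README and assumes, without confirmation from the source, that indrop_check_pipeline.py exits with a non-zero status when it finds a configuration error. The helper names build_commands and run_pipeline are hypothetical.

```python
import subprocess


def build_commands(config_path):
    """Return the check and run commands, with the flags shown in this README."""
    return [
        ["indrop_check_pipeline.py", "-o", config_path],
        ["indrop_pipeline.py", "-c", config_path],
    ]


def run_pipeline(config_path, runner=subprocess.run):
    """Run the check first, then the pipeline, stopping at the first failure.

    Assumption: a failed check is signalled by a non-zero exit status, which
    makes subprocess.run(..., check=True) raise CalledProcessError.
    """
    for cmd in build_commands(config_path):
        runner(cmd, check=True)

# Example call:
# run_pipeline("my_configuration.yaml")
```

The runner parameter exists only so the sequence can be exercised without the scripts installed; in normal use the default subprocess.run is what executes the commands.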
