Giter Club home page Giter Club logo

migec's Introduction

MIGEC is free for academic use. This is a legacy version, the software is no longer supported, if you need similar software solutions please visit MiLaboratory.

JitPack

MiGEC: Molecular Identifier Guided Error Correction pipeline

This pipeline provides several useful tools for analysis of immune repertoire sequencing data. Its main feature is the ability to use information from unique nucleotide tags (UMIs, see this paper for details), which are attached to molecules before sequencing library preparation and allow to backtrack the original sequence of molecule. UMIs make it possible to computationally filter nearly all experimental errors from resulting immune receptor sequences.

This pipeline was designed for libraries sequenced using Illumina MiSeq and HiSeq and the main requirement for sequencing reads is that they should contain the entire CDR3 region of immune receptor gene. Sequencing libraries with high over-sequencing, i.e. ones that have 5+ reads per starting molecule (unique UMI tag), should be used for optimal error elimination.

Several modules of the pipeline, such as de-multiplexing and CDR3 extraction could be utilized for a wider range of datasets.

Compiled binaries are available from here. You can download them and execute as

java -jar migec.jar ...

Make sure that you've specified the full/correct path to jar file. In case of Java Heap Space exception, you can increase the JVM memory limit by adding -Xmx20G (for extra 20G) after the -jar argument.

The software is cross-platform and requires Java v1.8+ to run.

Easy installation on MacOS/Linux via Homebrew or Linuxbrew:

brew tap homebrew/science
brew tap mikessh/repseq
brew install migec
migec Checkout ...

See homebrew-repseq for other RepSeq analysis software Homebrew installers.

For more details please see the paper describing MiGEC.

Full documentation is provided via ReadTheDocs. You might be also interested in taking the following tutorial.

Please cite the tool as:

Shugay M et al. Towards error-free profiling of immune repertoires. Nature Methods 11, 653โ€“655 (2014)

You may be also interested in the following Nature Protocol describing MIGEC applications for full-length immunoglobulin sequencing.

Some notes on MiGEC study

The basic idea behind this project is to incorporate unique molecular identifier tags to trace individual cDNA molecules during library preparation and sequencing in order to eliminate experimental errors. This is especially crucial for B-cell studies as it is necessary to tell real hypermutations from hot-spot PCR and sequencing errors.

Here we provide a modified protocol for performing nearly error-free high-throughput sequencing of TCR and antibody repertoires, as well as a novel error-correction algorithm that successfully handles hot-spot PCR errors.

A detailed description of the study and protocol used could be found in our Nature Methods paper.

A software pipeline that allows analysis of immune repertoire sequencing data (for both T- and B-cells) prepared using our protocol is available at GitHub. Here are the links to readme and a package containing the executable Jar. The pipeline is written using Java/Groovy and has moderate system requirements for large datasets: it will analyze a HiSeq lane in 4-5 hours on a commodity server with 8 core Intel Xeon and 36 Gb RAM.

Raw datasets used in the study are deposed in SRA under accessions SRP040329. The barcode sequences that are necessary for de-multiplexing and UMI extraction with Checkout utility are available here. RNA for immunoglobulin heavy chain (IGH) from more than 600,000 B-cells from blood of a healthy individual was sequenced, yielding a repertoire of more than 100,000 clonotypes.

A set of spike-ins EHEB, EHEB-V1 (with one mismatch difference from EHEB-V0) and EHEB-V2 (with two mismatches) were added in order to monitor the efficiency of error filtration.

Have a glance at the analyzed data (done with legacy MIGEC version):

  • The spreadsheet demonstrating efficient error elimination for spike-in clonotypes with known sequences [Download]
  • The spreadsheet with whole repertoire [Download]
  • An interactive network of B-cell clonal trees (created using Cytoscape) [Download]

migec's People

Contributors

dbolotin avatar entropicus96 avatar fabio-t avatar mikessh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.