

What is RapMap?

Join the chat at https://gitter.im/COMBINE-lab/RapMap

RapMap is a testing ground for ideas in read mapping. That means that, at this point, it is somewhat experimental and there are no guarantees on stability / compatibility between commits. Currently, RapMap is a stand-alone quasi-mapper (and pseudo-aligner) that can be used with other tools. It is also being used as part of Sailfish and Salmon. Eventually, the hope is to create and stabilize an API so that it can be used as a library from other tools.

Lightweight alignment / quasi-mapping / pseudo-alignment are the terms I'm using here for the type of information required for certain tasks (e.g. transcript quantification) that is less "heavyweight" than what is provided by traditional alignment. For example, one may only need to know the transcripts / contigs to which a read aligns and, perhaps, the position within those transcripts, rather than the optimal alignment and the base-to-base CIGAR string that aligns the read to a substring of the transcript. For details on RapMap (quasi-mapping in particular), please check out the pre-print on bioRxiv.
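To make the distinction concrete, here is a minimal C++ sketch of the information a quasi-mapping reports versus a full alignment. The types `QuasiHit` and `FullAlignment` and the helper `compatibleTranscripts` are hypothetical names chosen for illustration; they are not RapMap's actual data structures or API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical types for illustration only; they are not RapMap's API.
// A quasi-mapping hit records *where* a read lands, not how its bases align.
struct QuasiHit {
    uint32_t transcriptID;  // index of the target transcript
    int32_t pos;            // mapped position within that transcript
    bool fwd;               // true if the read maps to the forward strand
};

// A traditional alignment also carries a base-to-base CIGAR string
// (e.g. "70M2I4M") and a score, which are costlier to compute.
struct FullAlignment {
    QuasiHit hit;
    std::string cigar;
    int score;
};

// For tasks such as transcript quantification, the set of transcripts a
// read is compatible with is often enough; duplicate IDs are collapsed.
std::vector<uint32_t> compatibleTranscripts(const std::vector<QuasiHit>& hits) {
    std::vector<uint32_t> ids;
    for (const auto& h : hits) ids.push_back(h.transcriptID);
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```

A read with hits on transcripts 7, 3 and 7 (at two different positions) yields the compatible set {3, 7}, with no CIGAR string ever computed.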

There are a number of different ways to collect such information, and the idea of RapMap (as the repository grows) will be to explore multiple strategies for most rapidly determining all feasible / compatible locations for a read within the transcriptome. In this sense, it is like an all-mapper; the alignments it outputs are intended to be (eventually) disambiguated. Really, it's more like an "all-best" mapper, since it returns all hits in the top "stratum" of lightweight/quasi/pseudo mappings. If there is a need for it, best-mapper functionality may be added in the future.
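The "all-best" behavior can be sketched as a filter that keeps every hit tied for the best score. This is a hedged illustration: `ScoredHit`, its `score` field, and `topStratum` are hypothetical, and the criterion RapMap actually uses to define the top stratum is not described here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical hit with a generic quality score (higher is better, assumed).
struct ScoredHit {
    uint32_t transcriptID;
    int32_t pos;
    int score;
};

// "All-best" selection: return every hit tied for the best score, rather
// than a single best hit (best-mapper) or every hit (all-mapper).
std::vector<ScoredHit> topStratum(const std::vector<ScoredHit>& hits) {
    if (hits.empty()) return {};
    const int best = std::max_element(hits.begin(), hits.end(),
        [](const ScoredHit& a, const ScoredHit& b) { return a.score < b.score; })->score;
    std::vector<ScoredHit> out;
    std::copy_if(hits.begin(), hits.end(), std::back_inserter(out),
        [best](const ScoredHit& h) { return h.score == best; });
    return out;
}
```

A best-mapper would instead return just one element of this set, and a plain all-mapper would return `hits` unfiltered.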

Building RapMap

To build RapMap, you need a C++11-compliant compiler (g++ >= 4.7 or clang >= 3.4) and CMake. RapMap is built with the following steps (assuming that path_to_rapmap is the top-level directory where you have cloned this repository):

[path_to_rapmap] > mkdir build && cd build
[path_to_rapmap/build] > cmake ..
[path_to_rapmap/build] > make
[path_to_rapmap/build] > make install
[path_to_rapmap/build] > cd ../bin
[path_to_rapmap/bin] > ./rapmap -h

This should output the standard help message for rapmap.

Can I use RapMap for genomic alignment?

No, at least not right now. The index and mapping strategy employed by RapMap are highly geared toward mapping to transcriptomes. It may be the case that some of these ideas can be successfully applied to genomic alignment, but this functionality is not currently supported (and is not a high priority right now).

How fast is RapMap?

RapMap is very fast (hence the name). On a synthetic test dataset consisting of 75 million 76bp paired-end reads, mapped against a human transcriptome with ~213,000 transcripts, RapMap takes ~10 minutes to map all of the reads on a single core (an Intel Xeon E5-2690 @ 3.00 GHz). If you actually want to write out the alignments, the time depends on your disk speed; for us it's ~15 minutes. Again, these mapping times are on a single core, and significant optimizations are still possible. However, RapMap is trivially parallelizable and can already be run with multiple threads.
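Mapping is trivially parallelizable because each read (or read pair) is processed independently. The following sketch, which is not RapMap's threading code, splits the reads into contiguous chunks and gives each `std::thread` its own disjoint slice of the output vector, so no locking is needed. `parallelMap` and the `mapRead` callback are hypothetical stand-ins for a real per-read mapping function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: apply mapRead to every read using nThreads threads.
// Each thread writes only to its own index range of `results`.
// Note: avoid Result = bool, since std::vector<bool> packs bits and
// concurrent writes to neighbouring elements would race.
template <typename Result, typename Fn>
std::vector<Result> parallelMap(const std::vector<std::string>& reads,
                                std::size_t nThreads, Fn mapRead) {
    std::vector<Result> results(reads.size());
    nThreads = std::max<std::size_t>(1, nThreads);
    const std::size_t chunk = (reads.size() + nThreads - 1) / nThreads;

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nThreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(reads.size(), begin + chunk);
        if (begin >= end) break;  // fewer reads than threads
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) results[i] = mapRead(reads[i]);
        });
    }
    for (auto& w : workers) w.join();  // wait for all chunks to finish
    return results;
}
```

For example, `parallelMap<std::size_t>(reads, 4, [](const std::string& r) { return r.size(); })` returns the length of each read in input order, regardless of which thread processed it.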

OK, that's fast, but is it accurate?

Yes, in testing on synthetic data from multiple simulators, we find RapMap to be highly accurate. Moreover, when comparing RapMap against more traditional aligners, we find that it gives highly concordant results. For more details, please check out the pre-print.

Caveats

RapMap is experimental, and the code, at this point, is subject to me testing out new ideas. This also means that little effort has been put into size or speed optimization (though it's already very fast; see above). There are numerous ways that the code can be sped up and the memory footprint reduced, but that hasn't been the focus yet. It will be eventually. All of this being said, RapMap is open to the community because I'd like feedback / help / thoughts. So, if you're not scared off by any of the above, dig in!

External dependencies

tclap

cereal

Jellyfish

Google Sparse Hash

License

Since RapMap uses Jellyfish, it must be released under the GPL. However, this is currently the only GPL dependency. If it can be replaced, I'd like to re-license RapMap under the BSD license. I'd be happy to accept pull-requests that replace the Jellyfish components with a library released under a more liberal license (BSD-compatible), but note that I will not accept such pull requests if they reduce the speed or increase the memory consumption over the Jellyfish-based version.

Contributors

rob-p, geetduggal, keyavi, vals, lynxoid, gitter-badger
