From DNAnexus R&D: a scalable datastore for population genome sequencing, with on-demand joint genotyping. (GL, genotype likelihood)
This is an early-stage R&D project we're developing openly. The code doesn't yet do anything useful! There's a wiki project roadmap, which should be read in the spirit of "plans are worthless, but planning is indispensable."
First install gcc 4.9 or higher, cmake, libjemalloc-dev, libboost-dev, libzip-dev, libsnappy-dev, liblz4-dev, libbz2-dev, and python-pyvcf. Then:
cmake -Dtest=ON . && make && ./unit_tests
Other dependencies (should be set up automatically by CMake):
Evolving developer documentation can be found on the project GitHub page.
- C++14 - take advantage of the goodies
- Use smart pointers to avoid passing resources needing manual deallocation across function/class boundaries
- Prefer references over pointers for values that should never be null and never change.
- Avoid exceptions; prefer returning a Status, defined early in types.h. Note the frequently used convenience macro S(), defined just below Status.
- Avoid public constructors with nontrivial bodies; prefer a static initializer function returning Status
- Avoid elaborate templated class hierarchies
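
The conventions above can be sketched as follows. This is a minimal illustration, not the project's code: the Status class here is a simplified stand-in for the one in types.h (whose actual members may differ), RETURN_IF_BAD is a hypothetical macro written in the spirit of S(), and SampleIndex is an invented class used only to show the static-initializer pattern.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the project's Status type; the real definition
// in types.h may use different names and carry more information.
enum class StatusCode { OK, Invalid };

class Status {
    StatusCode code_;
    std::string msg_;
public:
    Status(StatusCode code = StatusCode::OK, std::string msg = "")
        : code_(code), msg_(std::move(msg)) {}
    bool ok() const { return code_ == StatusCode::OK; }
    bool bad() const { return !ok(); }
    const std::string& msg() const { return msg_; }
    static Status OK() { return Status(); }
    static Status Invalid(std::string msg) {
        return Status(StatusCode::Invalid, std::move(msg));
    }
};

// Hypothetical convenience macro (an assumption, not the project's S()):
// evaluate an expression yielding a Status and return early if it is bad.
#define RETURN_IF_BAD(expr) \
    do { Status s_ = (expr); if (s_.bad()) return s_; } while (0)

// Invented example class: the constructor is private and trivial, and all
// work that can fail happens in the static initializer.
class SampleIndex {
    std::string path_;

    explicit SampleIndex(std::string path) : path_(std::move(path)) {}

    Status Validate() const {
        if (path_.empty()) return Status::Invalid("empty path");
        return Status::OK();
    }

public:
    // Static initializer returning Status. The new object is handed back
    // through a smart pointer, so the caller never deallocates manually.
    static Status Open(const std::string& path,
                       std::unique_ptr<SampleIndex>& ans) {
        std::unique_ptr<SampleIndex> p(new SampleIndex(path));
        RETURN_IF_BAD(p->Validate());
        ans = std::move(p);
        return Status::OK();
    }

    const std::string& path() const { return path_; }
};
```

A caller would declare a std::unique_ptr<SampleIndex>, pass it to SampleIndex::Open, and check the returned Status with ok() or bad() before using the pointer. On failure the output pointer is left untouched, so no partially constructed object escapes.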
The code has some hooks for performance profiling using perf and FlameGraph. To profile performance within the DNAnexus applet, run the applet as usual plus -i enable_perf=true. This produces an output file, genotype.stacks, containing sampling observation counts for common call stacks. To generate an SVG visualization with FlameGraph:
git clone https://github.com/brendangregg/FlameGraph
FlameGraph/flamegraph.pl < genotype.stacks > genotype.svg