svaha - generate variation graphs for structural variants.

Make variation graphs from structural variants:
[x] Deletions
[x] Inversions
[x] Insertions
[x] SNPs
[ ] Duplications
[ ] Transversions
[ ] Breakpoints

Don't worry: we'll be adding the unchecked types as time permits.

What is svaha?

svaha is a small program that converts Variant Call Format (VCF) records into Graphical Fragment Assembly (GFA) format, i.e. sequence graphs like those in vg. It does so using a minimal single-base graph representation, the world's smallest and least-safe VCF parser (well, probably), and almost no dependencies.

Build it

svaha brings in its own libraries, except for zlib, so make sure zlib is installed. It uses a frozen version of htslib and a floating version of gfakluge. To build svaha:

            git clone --recursive https://github.com/edawson/svaha
            cd svaha
            make

and that should do it.

Run svaha

svaha takes a FASTA file and a VCF as arguments:

            ./svaha -r MYFASTA.fa -v MYVARIATION.vcf

and outputs sorted GFA, which is text-based and easy to pass to other, more useful programs (like vg).
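
For readers new to GFA, here is a minimal Python sketch that reads GFA 1 text and tallies segments (S lines) and links (L lines). The sample records are hand-written for illustration and are not real svaha output, and `summarize_gfa` is a hypothetical helper name.

```python
from typing import Dict, Iterable


def summarize_gfa(lines: Iterable[str]) -> Dict[str, int]:
    """Count segments, links, and total sequence length in GFA 1 text."""
    summary = {"segments": 0, "links": 0, "bases": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            summary["segments"] += 1
            # GFA 1 allows "*" when the sequence is not stored inline.
            if fields[2] != "*":
                summary["bases"] += len(fields[2])
        elif fields[0] == "L":
            summary["links"] += 1
    return summary


# Hand-written example records (an assumption, not real svaha output).
EXAMPLE_GFA = [
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tTT",
    "L\t1\t+\t2\t+\t0M",
]

if __name__ == "__main__":
    print(summarize_gfa(EXAMPLE_GFA))
```

To run this against a real file, pass an open file handle instead of the example list.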

Options

-r: a FASTA reference
-v: a VCF containing variants (must be relative to the given FASTA)
-m: maximum node size. When creating graphs for vg, use a maximum node size between 32 and 1023. 1023 is a hard limit (nothing 1024 or over will be indexable), and below 32 the graph begins to eat tons of memory. I tend to use -m 64 or -m 128.
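
To make the idea of a maximum node size concrete, the following Python sketch cuts a sequence into consecutive pieces of at most `max_node_size` bases. It only illustrates the concept and is not svaha's implementation; `split_sequence` is a hypothetical helper.

```python
from typing import List


def split_sequence(seq: str, max_node_size: int) -> List[str]:
    """Cut seq into consecutive chunks of at most max_node_size bases."""
    if max_node_size < 1:
        raise ValueError("max_node_size must be at least 1")
    return [seq[i:i + max_node_size] for i in range(0, len(seq), max_node_size)]


if __name__ == "__main__":
    # A 10-base sequence with a maximum node size of 4 yields three nodes.
    print(split_sequence("ACGTACGTAC", 4))
```

A smaller maximum gives more, shorter nodes, which fits the memory growth described above for values below 32.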

Workflows

  1. Build a variation graph with svaha containing structural variants
  2. Reduce node size so that the resulting graph is indexable with GCSA2: cat result.gfa | vg view -F -v - | vg mod -X 1000 - > new_graph.vg
  3. Map reads to that graph using vg map
  4. Call variants using vg call or vg genotype

Get help

Reach out to me (@edawson) on GitHub and I'll do my best to help!

svaha's Issues

make errors

/usr/bin/ld: libhts.a(rANS_static.o): relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(thread_pool.o): relocation R_X86_64_32 against `.text' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(vlen.o): relocation R_X86_64_32S against `.rodata' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(zfio.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(knetfile.o): relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(md5.o): relocation R_X86_64_32S against `.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(sam.o): relocation R_X86_64_32 against `.rodata.str1.8' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: libhts.a(files.o): relocation R_X86_64_PC32 against symbol `__xstat@@GLIBC_2.2.5' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:294: bgzip] Error 1
make[1]: Leaving directory '/home/dnanexus/svaha/deps/htslib'
make: *** [Makefile:33: lib/libhts.a] Error 2

Update to gfakluge required

GFAKluge itself needs updates, and then we need to update svaha to handle those updates as we move vg to GFA2.

No option to output flat alleles

svaha doesn't support flat alleles, which we use for variant recall in vg right now. It would be nice to have it as an option, even if the new SnarlTraversal-based variant pipeline should remove our dependencies on nice, flat alleles.

Anaconda package

Hello,
would it be possible to have svaha available as an Anaconda package?
Thank you in advance!
Andrea

SNPs and SVs cannot share a start position.

When inserting a SNP and an SV into the graph, the two variants may not share a start position. This is because of the way we do indexing. We also have the problem that SNP IDs are non-contiguous in the graph and break the monotonically increasing nature of SV IDs. Further, SNPs currently can't start a contig graph.
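
Until this is resolved, a VCF can be screened for such clashes before it is handed to svaha. The Python sketch below reports every (contig, position) pair where a SNP and an SV start together. The classification rule is an assumption made for this example (a record with an SVTYPE entry in INFO counts as an SV, a single-base REF with single-base ALTs counts as a SNP) and is not svaha's own logic; `classify` and `shared_starts` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def classify(ref: str, alt: str, info: str) -> str:
    """Rough record classification (an assumption, not svaha's own logic)."""
    if any(entry.startswith("SVTYPE=") for entry in info.split(";")):
        return "SV"
    alts = alt.split(",")
    if len(ref) == 1 and all(len(a) == 1 and a.upper() in "ACGTN" for a in alts):
        return "SNP"
    return "OTHER"


def shared_starts(vcf_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return sorted (contig, position) pairs where a SNP and an SV both start."""
    kinds: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos = fields[0], int(fields[1])
        kinds[(chrom, pos)].add(classify(fields[3], fields[4], fields[7]))
    return sorted(site for site, seen in kinds.items() if {"SNP", "SV"} <= seen)


# Hand-written example records (an assumption, for illustration only).
EXAMPLE_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.",
    "chr1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=250",
    "chr1\t300\t.\tC\tT\t.\tPASS\t.",
]

if __name__ == "__main__":
    print(shared_starts(EXAMPLE_VCF))
```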

ID space is not compacted

lasso uses the IDs as a map (basepair : node) in the graph. Unfortunately, this means that our ID space grows quickly across multiple contigs, and that there are collisions in the GFA output if we have multiple variant contigs (though we could construct each contig graph individually and then join them with vg ids -j).

Ideally, we should maintain our map outside of ID space and increment IDs by one every time a new node is added. This is in line with what vg does.
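
One way to picture the proposed fix is the Python sketch below: the position lookup lives in its own dictionary, and node IDs are handed out consecutively regardless of coordinates. It is a sketch of the idea only, not code from svaha or vg; `NodeIdAllocator` and `node_for` are hypothetical names.

```python
from typing import Dict, Tuple


class NodeIdAllocator:
    """Hands out consecutive node IDs and keeps the position lookup separate."""

    def __init__(self, first_id: int = 1) -> None:
        self._next_id = first_id
        # (contig, position) -> node ID, kept outside of the ID space itself.
        self._by_position: Dict[Tuple[str, int], int] = {}

    def node_for(self, contig: str, position: int) -> int:
        """Return the node ID for a breakpoint, allocating the next ID if new."""
        key = (contig, position)
        if key not in self._by_position:
            self._by_position[key] = self._next_id
            self._next_id += 1
        return self._by_position[key]


if __name__ == "__main__":
    alloc = NodeIdAllocator()
    # IDs stay compact even though positions are far apart and span contigs.
    print(alloc.node_for("chr1", 1))
    print(alloc.node_for("chr1", 5000))
    print(alloc.node_for("chr2", 1))
```

Because the key includes the contig name, identical positions on different contigs get distinct IDs, which avoids the collisions described above.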
