
svimmer's Introduction

svimmer - SV merging tool

Merges similar SVs from multiple single-sample VCF files. The tool was written for merging SVs discovered with Manta, but it should support (almost) any SV VCF. The output is a VCF file containing all merged SV sites (with no calls). This output can be passed to GraphTyper to genotype the sites.

Requirements

  • Python 3.4+
  • pysam

Usage

python3 svimmer input_vcfs chrA chrB chrC ...

where input_vcfs is a list of bgzipped, tabix-indexed VCF files and chrA, chrB, chrC, ... are the chromosomes to merge. For further details see the help page:

python3 svimmer -h

Test data example

python3 svimmer test_vcfs chr20 > test/actual_output.vcf
diff test/actual_output.vcf test/expected_output.vcf

License

GNU GPLv3

svimmer's People

Contributors

hannespetur

svimmer's Issues

svimmer

Dear,
Does svimmer output only the SVs that are detected by all tools (the intersection of SVs from different software), or does it merge all SVs from the different files (the union of SVs from different software)?
Many thanks!

Coordinate <= 0 detected

Dear developers.
When I use svimmer to merge the VCFs created by Manta, I find some warnings in the log file.

[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?
[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?

This is probably because the position becomes 0 after converting the INV variant type with the convertInversion.py script shipped with Manta.

Here is an example created by manta (diploidSV.vcf.gz)

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1
 1       1       MantaBND:82572:0:1:0:0:0:0      G       [1:5440[G       279     PASS    SVTYPE=BND;MATEID=MantaBND:82572:0:1:0:0:0:1;CIPOS=0,2;HOMLEN=2;HOMSEQ=CT;BND_DEPTH=12;MATE_BND_DEPTH=14
 1       5438    MantaBND:82572:0:1:0:0:0:1      G       [1:3[G  279     PASS    SVTYPE=BND;MATEID=MantaBND:82572:0:1:0:0:0:0;CIPOS=0,2;HOMLEN=2;HOMSEQ=CA;BND_DEPTH=14;MATE_BND_DEPTH=12

After converting, the position changed to 0:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1
  1       0       MantaINV:82572:0:1:0:0:0                <INV>   279     PASS    END=5439;SVTYPE=INV;SVLEN=5439;CIPOS=0,2;CIEND=-2,0;HOMLEN=2;HOMSEQ=GC;INV5     GT:FT:GQ:PL:PR:SR

This record may have caused the warning "[W::tbx_parse1] Coordinate <= 0 detected. Did you forget to use the -0 option?"
Does this warning have a bad effect on the merged.vcf.gz results?
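
To see which records are behind the warning, one option is to scan the input for POS values of 0 or less before merging. The following is a minimal sketch, not part of svimmer: the function name find_nonpositive_positions is hypothetical, and the example records are toy data (the second ID and the placeholder REF "N" are made up for illustration).

```python
import io


def find_nonpositive_positions(vcf_lines):
    """Yield (line_number, chrom, pos, record_id) for records with POS <= 0.

    Hypothetical helper for illustration; it only inspects the first
    three tab-separated columns of each non-header line.
    """
    for line_number, line in enumerate(vcf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        if pos <= 0:
            yield line_number, fields[0], pos, fields[2]


# Toy input: the first record mimics the converted INV record with POS 0.
example = io.StringIO(
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t0\tMantaINV:82572:0:1:0:0:0\tN\t<INV>\t279\tPASS\tEND=5439;SVTYPE=INV\n"
    "1\t5438\tMantaDEL:1:0:0:0:0:0\tG\t<DEL>\t100\tPASS\tEND=6000;SVTYPE=DEL\n"
)
hits = list(find_nonpositive_positions(example))
print(hits)
```

For a bgzipped file, the same function should work on a handle opened with gzip.open(path, "rt") from the standard library, since bgzip output is gzip-compatible.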

Error when Merging VCFs

Hi,

I am trying to merge a list of VCFs of SVs using svimmer, but receive the following error:

Traceback (most recent call last):
  File "../../../svimmer/svimmer", line 140, in <module>
    header = read_header(vcf_f) # Read the header of the first VCF file
  File "../../../svimmer/svimmer", line 32, in read_header
    line = vcf_f.readline().decode("utf-8")
AttributeError: 'str' object has no attribute 'decode'

Thanks
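
The traceback shows that readline() returned a str, which happens when the file handle was opened in text mode, so there is nothing left to decode. As a rough sketch of one way to tolerate both cases (this is not svimmer's actual code; read_line_as_text is a hypothetical helper and the body of read_header here is an assumption), the decode step can be made conditional:

```python
import io


def read_line_as_text(handle):
    """Read one line and return it as str, whether the handle is binary or text."""
    line = handle.readline()
    if isinstance(line, bytes):
        return line.decode("utf-8")
    return line


def read_header(handle):
    """Collect the leading '#' lines of a VCF stream (illustrative sketch only).

    Note that this consumes the first non-header line as well.
    """
    header = []
    while True:
        line = read_line_as_text(handle)
        if not line.startswith("#"):
            break  # end of header, or end of file (empty string)
        header.append(line.rstrip("\n"))
    return header


vcf_text = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\n1\t100\t.\n"
header_from_text = read_header(io.StringIO(vcf_text))
header_from_bytes = read_header(io.BytesIO(vcf_text.encode("utf-8")))
print(header_from_text == header_from_bytes)
```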

Warnings into STDOUT instead of STDERR

Dear Hannes,

First of all, thank you for writing and maintaining such great resources as graphtyper and svimmer!

I have noticed that svimmer writes warnings into STDOUT, messing up piping into bgzip / a VCF file if no variant is present for one of the chromosomes in one of the input VCF files. This is only a minor complication as warnings could be grepped out before bgzip.

Nevertheless, it would be nicer to stream into STDERR instead, by adjusting line 77 to:
print("Warning: Contig '%s' was not found in file %s!" % (chrom, vcf_filename.rstrip("\n")), file=sys.stderr)

Thanks and best,
David
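
The effect of the suggested change can be checked in isolation. In this small sketch the warning text is taken from the line quoted above, while the wrapper function warn_missing_contig and the file name sample1.vcf.gz are made up for the demonstration; the two streams are captured separately to show that only the VCF content reaches STDOUT.

```python
import contextlib
import io
import sys


def warn_missing_contig(chrom, vcf_filename):
    """Print the missing-contig warning to STDERR (hypothetical wrapper)."""
    print(
        "Warning: Contig '%s' was not found in file %s!"
        % (chrom, vcf_filename.rstrip("\n")),
        file=sys.stderr,
    )


stdout_buf, stderr_buf = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(stdout_buf), contextlib.redirect_stderr(stderr_buf):
    print("##fileformat=VCFv4.2")  # stands in for the VCF output
    warn_missing_contig("chr20", "sample1.vcf.gz\n")

# STDOUT holds only the VCF line, so piping into bgzip stays clean.
print(repr(stdout_buf.getvalue()))
print(repr(stderr_buf.getvalue()))
```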

Can you support vcf files of csi index

Dear author,
Some of my VCF files can only be indexed with a csi index, not a tbi index, because of the length of the chromosomes. Could you update the code to support csi indexes?

Support for .csi indexed vcfs

Hi Hannes,

I am trying to merge SV calls from marsupial chromosomes. Some of these are >> 512 Mb in length, and hence need to be indexed via tabix -C -p vcf. This doesn't create a .tbi index, but a .csi one instead (a bit more explanation here).

However, svimmer currently relies on .tbi inputs only. Could you possibly add .csi support? Happy to provide you with log files & tests if this helps.

Many thanks,
Max

support SVIM VCF

Hi,

I found that SVIM VCFs annotate duplications with three different SVTYPE values: "DUP", "DUP:TANDEM", and "DUP:INT". Maybe I need to add info_dict["SVTYPE"] == "DUP:TANDEM" or info_dict["SVTYPE"] == "DUP:INT" to the following part of the code. Also, in the following code, why is INV treated as INS?

# Join related SV types
if "SVTYPE" in info_dict:
    if info_dict["SVTYPE"] == "DEL_ALU" or info_dict["SVTYPE"] == "DEL_LINE1":
        info_dict["SVTYPE"] = "DEL"
    elif info_dict["SVTYPE"] == "ALU" or info_dict["SVTYPE"] == "LINE1" or info_dict["SVTYPE"] == "SVA" or \
         info_dict["SVTYPE"] == "DUP" or info_dict["SVTYPE"] == "CNV" or info_dict["SVTYPE"] == "INVDUP" or \
         info_dict["SVTYPE"] == "INV":
        info_dict["SVTYPE"] = "INS"
    elif info_dict["SVTYPE"] == "TRA":
        info_dict["SVTYPE"] = "BND"

Sincerely,
Zheng Zhuqing
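
As a sketch of what the proposed change could look like, the chain of comparisons quoted above can be written as a lookup table, which makes adding new types a one-line change. The first three rows mirror the quoted code. The last row is the hypothetical addition from this issue, treating "DUP:TANDEM" and "DUP:INT" the same way the quoted code treats "DUP". The names SVTYPE_ALIASES and join_related_sv_type are made up here and are not svimmer's.

```python
# Mapping taken from the quoted snippet, plus two hypothetical entries.
SVTYPE_ALIASES = {
    "DEL_ALU": "DEL", "DEL_LINE1": "DEL",
    "ALU": "INS", "LINE1": "INS", "SVA": "INS",
    "DUP": "INS", "CNV": "INS", "INVDUP": "INS", "INV": "INS",
    "TRA": "BND",
    # Assumption for illustration: handle the SVIM subtypes like "DUP".
    "DUP:TANDEM": "INS", "DUP:INT": "INS",
}


def join_related_sv_type(info_dict):
    """Rewrite info_dict["SVTYPE"] in place using the alias table.

    Types that are not in the table are left unchanged.
    """
    if "SVTYPE" in info_dict:
        sv_type = info_dict["SVTYPE"]
        info_dict["SVTYPE"] = SVTYPE_ALIASES.get(sv_type, sv_type)
    return info_dict


print(join_related_sv_type({"SVTYPE": "DUP:TANDEM"}))
print(join_related_sv_type({"SVTYPE": "DEL"}))
```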

format problem

I merged VCFs using svimmer, but the format of the DEL records looks strange: the REF is G while the ALT is <DEL>.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    10649859        .       G       <DEL>   0       .       END=10653389;SVTYPE=DEL

But I want to get the sequence-resolved format for DEL, for example:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
chr1    10649859        .      GTGTCTCATGTCCGCGTCCCGTGTCTC.....  G          0       .       END=10653389;SVTYPE=DEL

Here the REF is GTGTCTCATGTCCGCGTCCCGTGTCTC..... while the ALT is G. How can I get this? Thanks.
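
One possible way to get from the symbolic form to the sequence-resolved form is to look up the deleted bases in the reference genome. The sketch below assumes the usual VCF convention for symbolic deletions, where POS is the base just before the deleted sequence and END is the last deleted base (both 1-based). The function resolve_symbolic_deletion is hypothetical, and the reference is a short toy string rather than a real genome.

```python
def resolve_symbolic_deletion(reference_sequence, pos, end):
    """Return (REF, ALT) alleles for a symbolic <DEL> record.

    pos and end are 1-based and inclusive, as in the VCF POS and END fields.
    REF spans pos..end; ALT is the single anchoring base at pos.
    """
    if not 1 <= pos < end <= len(reference_sequence):
        raise ValueError("pos/end fall outside the reference sequence")
    ref_allele = reference_sequence[pos - 1:end]
    alt_allele = reference_sequence[pos - 1]
    return ref_allele, alt_allele


toy_reference = "ACGTACGTAC"  # toy data, not a real chromosome
ref, alt = resolve_symbolic_deletion(toy_reference, pos=3, end=7)
print(ref, alt)
```

With a real genome, the sequence could be fetched from an indexed FASTA file, for example with pysam's FastaFile.fetch, which uses 0-based half-open coordinates.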

svimmer and graphtyper for forced genotyping of UNION of Manta and SVIM-ASM discovered SVs

Dear @hannespetur

Thank you and colleagues for the very nice svimmer and graphtyper software.

I would like to use svimmer and graphtyper for forced genotyping of the UNION of SVs discovered by Manta (many WGS samples) and SVIM-ASM (a few assemblies) in many WGS samples.

SVIM-ASM github
https://github.com/eldariont/svim-asm

The versions that I am using are svimmer/20211209 and graphtyper/2.7.3

When I try to get the (merged) UNION of SVs via svimmer I get this error.

Traceback (most recent call last):
  File "/tools/eb/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/tools/eb/software/Python/3.8.6-GCCcore-10.2.0/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/tools/eb/software/svimmer/20211209-GCC-10.2.0/svimmer", line 82, in append_svs_from_vcf
    svs.append(SV(record, check_type=not args.ignore_types, join_mode=args.join_mode, output_ids=args.ids))
  File "/tools/eb/software/svimmer/20211209-GCC-10.2.0/sv.py", line 75, in __init__
    assert False
AssertionError

svimmer/sv.py

Line 75 in f2d78b2

assert False

This is caused by svimmer not recognizing the DUP:TANDEM and DUP:INT types that SVIM-ASM outputs.

svimmer/sv.py

Line 41 in f2d78b2

# Join related SV types

I can use the svimmer argument --ignore-types to get svimmer to work.
But then graphtyper complains about an unknown SV type, and I guess it also drops the SVs of unknown type?

<warning> constructor.cpp:106 Unknown SV type DUP:TANDEM
<warning> constructor.cpp:106 Unknown SV type DUP:TANDEM

Would it be possible to add a mapping for DUP:TANDEM and DUP:INT in the main branch of the svimmer code here?

svimmer/sv.py

Line 41 in f2d78b2

# Join related SV types

Then the combination of SVIM-ASM and svimmer/graphtyper would work for me and for others with the same use case and combination of tools.

I also don't understand why SVs of type DUP, CNV and INV are mapped to type INS here

svimmer/sv.py

Line 45 in f2d78b2

elif info_dict["SVTYPE"] == "ALU" or info_dict["SVTYPE"] == "LINE1" or info_dict["SVTYPE"] == "SVA" or \

That does not make sense to me. INS is a novel sequence, whereas DUP, CNV and INV are sequences already found in the reference genome and therefore also need to be genotyped differently in graphtyper?

Also what I find strange is that both svimmer and graphtyper do output SVs of type DUP.
That I can't square with the mapping of DUP, CNV and INV to INS. Or maybe the SV type is re-calculated again somewhere else in svimmer/graphtyper?

Thank you for your thoughts and help on this.

join-mode

@hannespetur

Just a couple questions on join-mode...

What is the difference between the two join-modes? When should the non-default join-mode be used (eg: join_mode=false)?

Is the output different or is it related to some optimizing of the run-time?

--join-mode",
dest="join_mode",
action="store_true",
help="""Set if the merging should join VCFs from the first file to the other files."""
