illumina / paragraph
Graph realignment tools for structural variants
License: Other
I ran a test of Paragraph and got this error:
$ python3 paragraph/bin/multigrmpy.py -i test.vcf \
-m HG03097.manifest.txt \
-r GRCh38_full_analysis_set_plus_decoy_hla.fa \
-o paragraph_test
Traceback (most recent call last):
File "paragraph/bin/multigrmpy.py", line 353, in <module>
main()
File "paragraph/bin/multigrmpy.py", line 349, in main
run(args)
File "paragraph/bin/multigrmpy.py", line 249, in run
raise Exception("Illegal header name %s. Allowed headers:\n%s" % (field, header_str))
Exception: Illegal header name depth sd. Allowed headers:
id,path,idxdepth,depth,read length,sex,depth variance
In the README, "depth sd" is one of the options. Does it correspond to "idxdepth" in the allowed headers? If I change "depth sd" to "idxdepth", the program begins to run.
Thanks!
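For anyone hitting the same exception, here is a minimal sketch of a pre-flight check on the manifest header. The allowed names are copied from the error message above. The function name `illegal_manifest_headers`, the tab delimiter, and the example values are assumptions made for illustration; this is not code from Paragraph itself.

```python
import csv
import io

# Header names copied from the "Allowed headers" line of the error message above.
ALLOWED_HEADERS = {"id", "path", "idxdepth", "depth", "read length", "sex", "depth variance"}


def illegal_manifest_headers(manifest_text, delimiter="\t"):
    """Return the names in the first manifest line that are not in ALLOWED_HEADERS.

    Hypothetical helper; the tab delimiter is an assumption about the manifest layout.
    """
    reader = csv.reader(io.StringIO(manifest_text), delimiter=delimiter)
    header = next(reader)
    return [name.strip() for name in header if name.strip() not in ALLOWED_HEADERS]


# Hypothetical manifest content; the sample values are made up for the example.
example = "id\tpath\tdepth\tread length\tdepth sd\nHG03097\tHG03097.bam\t30\t150\t5\n"
print(illegal_manifest_headers(example))  # ['depth sd']
```

Running this before `multigrmpy.py` only reports which column names would be rejected; it does not say which allowed name is the right replacement for a rejected one.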
I'm trying to install Paragraph on Linux CentOS 6.9. I am currently using gcc 5.4.0, though I get the same error with 5.1.0.
After running cmake (version 3.8.2) successfully, I get the following error when running 'make' to compile.
Scanning dependencies of target graphIO
[ 64%] Building CXX object external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/GraphJson.cpp.o
In file included from /home-4/[email protected]/bin/packages/paragraph-tools-build/external/graphtools-src/src/graphIO/../../include/graphIO/GraphJson.hh:30:0,
from /home-4/[email protected]/bin/packages/paragraph-tools-build/external/graphtools-src/src/graphIO/GraphJson.cpp:25:
...
/home-4/[email protected]/bin/packages/paragraph-tools-build/external/graphtools-src/src/graphIO/../../external/include/nlohmann/json.hpp:17216:25: required from here
/home-4/[email protected]/bin/packages/paragraph-tools-build/external/graphtools-src/src/graphIO/../../external/include/nlohmann/json.hpp:8678:43: error: logical ‘and’ of mutually exclusive tests is always false [-Werror=logical-op]
const bool is_negative = (x <= 0) and (x != 0); // see issue #755
^
cc1plus: all warnings being treated as errors
make[2]: *** [external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/GraphJson.cpp.o] Error 1
make[1]: *** [external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/all] Error 2
make: *** [all] Error 2
I found issue #755 in the json repo, but they seem to have fixed it in a release in 2017, and it seems to be related to the Intel icpc compiler, which I am not using. The warning may also occur with gcc 5.2. I tried using gcc 4.9.2 instead of 5.1.0 or 5.4.0, but gcc 4.9.0 yielded other errors earlier in the compilation process. If this is related to the gcc version, what version(s) has Paragraph been successfully compiled with?
Will include this fix in v2.2c
Can Paragraph be used for indels from 2 bp to 30 bp?
For non-symbolic alleles Paragraph seems to be stripping the END tag from the INFO field (see below). This isn't desired behavior as it can impact tools that rely on this tag (for example vcfToBedpe).
Preserve the original info fields as they were in the input and only append the GRMPY_ID tag.
Manta
chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS END=66613;SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT
Paragraph
chr1 66160 MantaDEL:8:0:0:0:1:0 TTATATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATATATTATATAATATATATTATATTATATAATATATAATATAAATATAATATAAATTATATTATATAATATATAATATAAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAAATATAATATATAAATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTATAATATATATTATATAAATATATTTATATATTATATAAA T 196 PASS SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT;GRMPY_ID=chr1.vcf@a66f377e14617d867835ed906c5d6b272b1c404e2263781380e6c6c1da4e9267:1 GT:DP:FT:AD:ADF:ADR:PL 0/0:54:PASS:119,0:70,0:49,0:0,167,781
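To make the difference between the two records above easy to spot, here is a small sketch that compares INFO keys between an input and an output record. The helper names `info_keys` and `dropped_info_keys` are made up for this example; the INFO strings are copied from the Manta and Paragraph lines above.

```python
def info_keys(info_field):
    """Return the set of keys in a VCF INFO string (flag entries have no '=')."""
    return {
        entry.split("=", 1)[0]
        for entry in info_field.split(";")
        if entry and entry != "."
    }


def dropped_info_keys(input_info, output_info):
    """Return the keys present in the input INFO but missing from the output INFO."""
    return sorted(info_keys(input_info) - info_keys(output_info))


manta_info = "END=66613;SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT"
paragraph_info = (
    "SVTYPE=DEL;SVLEN=-453;CIGAR=1M453D;CIPOS=0,9;HOMLEN=9;HOMSEQ=TATATATAT;"
    "GRMPY_ID=chr1.vcf@a66f377e14617d867835ed906c5d6b272b1c404e2263781380e6c6c1da4e9267:1"
)
print(dropped_info_keys(manta_info, paragraph_info))  # ['END']
```

Swapping the two arguments lists the keys that were added instead, which for this record is only GRMPY_ID.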
I have a VCF of SVs in GRCh38, but I need to genotype a number of samples mapped to GRCh37. Remapping the samples is not an option. I converted my original variants from VCF to BED by just keeping the CHROM, BEGIN and END (from INFO) fields. I then lifted the BED file using UCSC and updated the VCF file with new coordinates (only BEGIN and END will change).
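A rough sketch of the VCF-to-BED step described above, for reference. The function name `vcf_line_to_bed` is hypothetical, and the post does not say whether a coordinate offset was applied; the sketch assumes the standard conventions, where VCF POS is 1-based and BED uses a 0-based start with an exclusive end. The example record is DEL0000 from the listing below with SEQ left out for brevity.

```python
def vcf_line_to_bed(vcf_line):
    """Convert one tab-separated VCF data line to (chrom, start, end, id) in BED coordinates.

    Hypothetical helper: assumes the INFO column carries an END key, as in the SV records here.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, variant_id = fields[0], int(fields[1]), fields[2]
    # Keep only key=value INFO entries; flags such as IMPRECISE are skipped.
    info = dict(entry.split("=", 1) for entry in fields[7].split(";") if "=" in entry)
    end = int(info["END"])
    # The 1-based inclusive interval [pos, end] becomes the 0-based half-open [pos - 1, end).
    return chrom, pos - 1, end, variant_id


record = "chr1\t181325\tDEL0000\tG\t<DEL>\t30\t.\tEND=181448;SVTYPE=DEL;CIPOS=-10,10;IMPRECISE\tGT\t./."
print(vcf_line_to_bed(record))  # ('chr1', 181324, 181448, 'DEL0000')
```

When writing lifted coordinates back into the VCF, the same offset has to be undone (POS = BED start + 1), otherwise every record shifts by one base.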
Paragraph could genotype the original calls without problems, but it crashes when trying to genotype the lifted calls. I created a new VCF file with only 10 of the SVs and passed that to Paragraph to try to figure out what's wrong. Here's the error I get:
2020-10-13 16:44:49,884 WARNING chr1:114350134 Padding base in genome is different from VCF. Use the one from genome.
2020-10-13 16:44:54,586 ERROR Traceback (most recent call last):
2020-10-13 16:44:54,586 ERROR File "/share/binaries/Paragraph/bin/multigrmpy.py", line 315, in run subprocess.check_call(commandline, shell=True, stderr=subprocess.STDOUT)
2020-10-13 16:44:54,587 ERROR File "/software/anaconda3/4.5.12/lssc0-linux/lib/python3.6/subprocess.py", line 311, in check_call raise CalledProcessError(retcode, cmd)
2020-10-13 16:44:54,587 ERROR subprocess.CalledProcessError: Command '/share/binaries/Paragraph/bin/grmpy --response-file=/tmp/tmpzmne8osh.txt' returned non-zero exit status 1.
Traceback (most recent call last):
File "/share/binaries/Paragraph/bin/multigrmpy.py", line 353, in <module>
main()
File "/share/binaries/Paragraph/bin/multigrmpy.py", line 349, in main
run(args)
File "/share/binaries/Paragraph/bin/multigrmpy.py", line 315, in run
subprocess.check_call(commandline, shell=True, stderr=subprocess.STDOUT)
File "/software/anaconda3/4.5.12/lssc0-linux/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '/share/hormozdiarilab/Codes/NebulousSerendipity/binaries/Paragraph/bin/grmpy --response-file=/tmp/tmpzmne8osh.txt' returned non-zero exit status 1.
The SVs being genotyped (original coordinates):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE 2:GENOTYPE 3:GENOTYPE
chr1 59605 INS0000 C <INS> 30 . END=59605;SVTYPE=INS;SEQ=tttcttttttttttttttttttttttgaggagttccttgtcgccgctgggtggcggcgcgattgctcctgcagctccgcccccgtccccattcctgcctcgcctcccaagtactggactcagcgccccctcgcccggctaatttttgtatttttagtaagacgtttccgtttagcggggttcgatctctgacttcgtgtcctccgcctcgctcccagtgtgattacagCTGACCACCCCCCCAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 181325 DEL0000 G <DEL> 30 . END=181448;SVTYPE=DEL;SEQ=GCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGAGGAGGCGTGGCACAGGCGCAGAGACACATGCTAGCGCGCCCAGGGGAGGAGGCGTggcgcaggcgcagagaggcgcgCCGTGCTG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 191407 INS0001 A <INS> 30 . END=191407;SVTYPE=INS;SEQ=TGTCTCTAGCACCTGGGATGGGCCTGATGTGTAACAGCTGCTGGCTGAACAGAAAGtgacagatgagcaaacatctcaaggaggtgatgaggatggtgatgagtgagaactcccgacatgtgaagataactgaagatgttctggctaaagatccgaagactctaagaatatgatcattccctttgaatatcaaatatcaaaagggctgtcaggtggagaagtgagtaaacttgtatcagaatagcggcagagTTGCAAGGAAACAGATCTCTGTTCTGTTAAAAAAAAAAAATTCCATAAACAACTGCATCACTTTGCATAGCAATTAGGTTCCAGCTCACAAGCGCCTTCCGGGGTGCCCCAAGGGTGAATCCTGCTAAGGTGGAGGTAGAAGACATGACCCTGGGGCTCTTTCCTTAGCCAAGAGCCCATGAGACTAAGGAACATCGTGCTTGTTGACAAAGACCCCGGACAGTCTATTCTCTTACGGTCACAGGCTATGGTGCCAAGGACAAGTGCAGACTCAGGATCAGAAAGCTTGCAGCATATCTGCTATCTCCATGGATAGCAGGATGGTCTGGAAGGCTGTGTCGGAAGGCCCTTAGGCCTCACTGGGGCCAGGCCGTTGATGAACAATGTCCACCCTGAGGGTCGGGAATGGTGCCATTTGTTTGTCATTCCTGGTCCAGACGCCCTTGGCTTGGTGGCTACTCAAGTAGGTCAGTTTACAAGCTCAGTGCTGAACCCATACCCTATGGCACGCTCGCCAGCACTAGAGAGGAAGCTGCCTCTGTGGACATCAGGGACGGAAGTGGCTCACCCAGCCTGTTCTGCGCGTGTCTCACTAAGGGTCCATCTTCCTCTATCTGCCCCGGAGGGGACCATCTCCAAGCATCCCTTGCTTTCCTTCTCCCCCCTCCACCCTCACTGTTCAATAACTTGAGTGCATCCCATTTGTAGAGCACATGCTGGGCCGTGGAGTGAAAGACAATCAGACGACACACATCCACATTCAAAGGGACTCAGGGCTCCTGGGAAGTAGAAATGAATATCAATAACCAAACATCCCacagcctgggtttcacatctgtttagcagcaggtgaccctggggaggtcactaactggtctctgcctcagcttcttccactgaaaaacaggaatggtcccttctacatcatggatactgtgaggTGAGAAGGAGCTGATCATGGCCCATCAACCTCAGCACACCAGTCCCCCTAGAGGCTGCTGGGAGAAGAAGCAGGGAGCACCCACTCCTGACCCAGATTCACATTCACTGCTCTCCTCCCCTGCCTCTGTCATGACCCCAGGGAAGCAGACGCTGAACCTGGGCTCTTGCCTTCATCTTTATCTTCTCCACTCTGGGATAATTAAGAATGACTTGCTAATTATGCAGATCTAGTGCAATGTGTAACTTCGGGCCACCAGTGCCAATCAGTAGAGCGGAGATGACGaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaTCAAATCAATTTAAAAAACAATAAACTCCACCACCTCCCCCCTCACCCTCCCGTCATCTGCACTGATTTGTTCTCCCGGGAGCTGGAGAGGAGGGGGGGGGGGCAGCG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 710579 INS0002 T <INS> 30 . END=710579;SVTYPE=INS;SEQ=AAAGAACTGCCCGCCggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcggatcacgaggtcaggagatcgagaccatcccggctaaaacggtgaaacccgtctctactaaaaatacaaaaattagccgggcgtagtggcggcgcctgtagtcccagctacttgggaggctgaggcaggagaatggcgtgaacccgggaggtggagcttgcagtgagccgagatcccgccactgcactccagcctgggcgacagagcgagactccgtctcaaaaaaaaaaaaaaaaaaaaaaaaaa;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 732377 INS0003 A <INS> 30 . END=732377;SVTYPE=INS;SEQ=GACAGAGAGTAAAAAGAGAAATTAGGAAAGCATTCTACATGTTGAATAGGAAGACACTGGCCATGTTCGTGCAGCAGCAGTATGTCGTGACATGACATACCTTGGAGAGAAGTTAACAGATGAGGAAGTTGATAAAAATCATCAGAGAAGCAAAATACTGGTAGCGACACTCAAGTAAACCATGAAATTTCCATAACTTATGTCAGCAAAGTGGGAATATTGTACAGTGTGTGTTGAAGTTCCTATACAACATTGTTTATCTGCCTTTTGTTTGTTTGTAAGGAATGTACATACTAAAAGTTCTTCTTGCTGTCAAAAGAATATGCGTGAATAAGTCATTTTAACTTATTCTTCTGTTTTTCTTTTATCTTCCTGCCATCATCCCACAGCCTTACTTTAGAAATTTCTTTTTTAGAAAATTGAACAAGTGCTCCCTGTGGTGGCACATACCTCGAGGAtgggaggcagggtggaagggtcacttgaggccattagtttgacaccagcctggccaacaaagtgagaccccgtgtctacaaacaatttaaaaattagccaagtatcgtcatgtatacctacagtcccagctaTCTGAACTTACTGAGAATGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGGCTCATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCTGGGAGGTAGGAGCTGGGCCAAAAgatgtaagcacatttgcatttattaggcactttatttgcattattacactgtaatatataataaaataattatagaactcaccataatgtagaatcagtgggcgtgttaagcttgttttcctgcaactggatggtcccacctgagcgtgatgggagaaagtgacagatcaataggtattagattctcataaggacagcgcaacctagatccctcacatgcacggttcacaacagggtgcgttctcctatgagaatctaacgctgctgctcatctgagaaggtggagctcaggcgggaatgtgagcaaaggggagtggctgtaaatacagacgaagcttccctcactccctcactcgacaccgctcacctcctgctgtgtggctccttgcggctccatggctcaggggttggggacccctgCTCAAGTGCATCCAAAGCGACCCTTCCCACACCAGTCTTCACAGTGGTCAAGGGCAGCAACCACTTAGCTCCCAAGGCATGTGCCTCAGCTGGCATTTCGTCACAATCAACAGTAAGTGGTAGCTTGAGTCACTGTGAGGTCACCTACTGGAAATCACCAGCATCCCATTTCCCACTGGCAAAGAGCTCAGCACTGCCCCCGGGAAACCAAACCTATGCCCAAATCCATCTGTGTGGGTGTATCTCCTGGGACCCTTCCTAACAtattagtcagagtccaatcaggaagcataaaccactcaaaagtttaaagtggtaaaatttaatacagagaattattcattgtaacaggtgaacagcataatgagagattggctagcacaaagtaaacagaactctagagaatataggactagcCCAggccaggcatggtggctcaggcctgaaattccagcaatttgagaagctaatgcaggaggattgcttaaggccaggagctagagaccggtctggacgacacagtgagaccctgtctctatccaaaagaagaaaaaagttagctgggggtggtagtgcacacttgtagtcccagctactcggaatgcggaagtttgagcctgggaggtcaaggctgcagtgaggcatgattatgccactacagtccagcctggtgaca
gagcaagaccctgtctcaaagaacaaaaCAACAACAACCATTTACAGACAGAAAAGAAATAGAGCTAATAAGCTGAGGAAAGATGTTgaaatgtgacaagtaaagtaatatgagttcttttgtctatgtaaaataatcaaacaaaaaatgacttactaaattataataccctgtgctggcaaaggtgcagtgaaatgggcaccttcttatactatgaggggtgtttaaattgtgtataagccttcccgggtaaagcctgtcaattttttaaaataatggagacagggtctcaccatactgccatactgcctcctccaactcttggcctcaagcaatcctcctctcttagcctcccaaagtgctaagattatagctgggaggcaccCAAAACCCTGTCAATTTACATCAAGGGTAAGGAGAATGTCCATTCACCATGACTCACAGTAATCTTACTTCTGGGGAGACAATTCAATCTAAACAAAAGGTCATCTGTACACACACAGTAAAAATCTGGGAGTAACTGAAGACAGAGTTGGTAAGTGAAATAAGAAACAGTTATAAGAAATTAAACTATGGTATCAATAGGCACCTGGTAAAAGGTCAGTTGATGTTAGCTGCTACttttttgttgttttgagacagggtctcactctgtcacccaggctggagtgcagaggcctgatcatgactcactgcagtctcagcctccctgggctcaagtgatcctcccacctcagcctcccaagtagctgggactacaggaacatgccaccacactaggctaattcatgtatttttctgtagggatggtgactccccctttgtttccaaggcctatcgcaaactcttggcctcaagccatcctcctgcctcagcctcccaaagtgttgcgattaccagtgtgagccaccacacctggccAGCTGCTACTTTTATCAATATTATTCTTATTCCACTCAATTAAAAATTATTATTTTCAAGGCTATGCAACAGTATGTATCCCACAGCATAATTGTAAAAACATATAGTCgtcgtccctcagtatacagaattagttccagccccccatctctgcatataccaaaatccatgcttactcacgtttcgctgtcacccctctagaatccacgtatacgaaaattccaaatgttagttgggcatagtggcaagcacctgtagtctcagccacgtgggaggttgaggtgggaggatcgcttcagcctggaaggttgaggctgcagtcagctgcgatagcactactacactccagccttggacaacagagggagaccctgtctcagaaaaaaaacaaaataaaaCAGGTTAGAAATTGTAATGAGGTCTGCTGGGCAAAATTCCATATAAGCAAAGTATAAATTAATAAAGCAAATCGTGATAAATTAGTACGATTGACTTTCTGGAGTTTCTGACAATAAAAGTAAGGAAAATGCAGAACACAAA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 737102 INS0004 G <INS> 30 . END=737102;SVTYPE=INS;SEQ=GGCAGCAACCACTTAGCTCCCAAGGCATGTGCCTCAGCTGGCATTTCGTCACAATCAACAGTAAGTGGTAGCTTGAGTCACTGTGAGGTCACCTACTGGAAATCACCAGCATCCCATTTCCCACTGGCAAAGAGCTCAGCACTGCCCCCGGGAAACCAAACCTATGCCCAAATCCCATCTGTGTGGGTGTATCTCCTGGGACCCTTCCTAACAtattagtcagagtccaatcaggaagcataaaccactcaaaagtttaaagtggtaaaatttaatacagagaattattcattataacaggtgaacagcataatgagagattggctagcacaaagtaaacagaactctagagaatatggactagcCCAggccaggcatggtggctcagcctgaaattccagcaatttgagaagctaatgcaggaggattgcttaaggccaggagctagagaccggtctggacgacacagtgagaccctgtctctatccaaaagaagaaaaaagttagctgggggtggtagtgcacacttgtagtcccagctactcggaatgcgaagtttgagcctgggaggtcaaggctgcagtgaggcatgattatgccactacagtccagcctggtgacagagcaagacctgtctcaaagaacaaaacaacaacaaCCATTTACAGACAGAAAAGAAATAGAGCTAATAAGCTGAGGAAAGATGTTgaaatgtgacaagtaaagtaatatgagttcttttgtctatgtaaaataatcaaacaaaaaatgacttactaaattataataccctgtgctggcaaaggtgcagtgaaatgggcaccttcttatactatgaggggtgtttaaattgtgtataagccttccgggtaaagcCTGTCAATTTTTTAAAATAAtggagacagggtctcaccatactgccatactgcctcctccaactcttggcctcaagcaatcctcctctcttagcctcccaaagtgctaagattatagctgggaggcaccCAAAACCCTGTCAATTTACATCAAGGGTAAGGAGAATGTCCATTCACCATGACTCACAGTAATCTTACTTCTGGGGAGACAATTCAATCTAAACAAAAGGTCATCTGTACACACACAGTAAAAATCTGGGAGTAACTGAAGACAGAGTTGGTAAGTGAAATAAGAAACAGTTATAAGAAATTAAACTATGGTATCAATAGGCACCTGGTAAAAGGTCAGTTGATGTTAGCTGCTACttttttgttgttttgagacagggtctcactctgtcacccaggctggagtgcagaggcctgatcatgactcactgcagtctcagcctccctgggctcaagtgatcctcccacctcagcctcccaagtagctgggactacaggaacatgccaccacactaggctaattcatgtatttttctgtagggatggtgactccccctttgttccaaggcctatcgcaaactcttggcctcaagccatcctcctgcctcagcctcccaaagtgttgcgattaccagtgtgagccaccacacctggccAGCTGCTACTTTTATCAATATTATTCTTATTCCACTCAATTAAAAATTATTATTTTCAAGGCTATGCAACAGTATGTATCCACAGCATAATTGTAAAAACATATagtcgtcgtcctcagtatacagaattagttccagccccccatctctgcatataccaaaatccatgcttactcacgtttgctgtcacccctctggaatccacgtatacgaaaattccaaatttagttgggcatagtggcaagcacctgtagtctcagccacgtgggaggttgaggtgggaggatcgcttcagcctggaaggttgaggctgcagtcagctgcgatagcactactacactccagccttggacaacagagggagaccctgtctcagaaaaaaaaaaaaataaaaCAGGTTA
GAAACTGTAATGAGGTCTGCTGGGCAAAATTCCATATAAGCAAAGTATAAATTAATAAAGCAAATCGTGATAAATTAGTACGATTGGCTTTCTGGAGTTTCTGACAATAAAAGTAAGGAAAATGCAGAACACAAAGACAGAGAGTAAAAAGAGAAATTAGGAAAGCATTCTACATGTTGAATAGGAAGACACTGGCCATGTTCGTGCAGCGGCAGTATGTCGTGACATGACATACCTTGGAGAGAAGTTAACAGATGAGGAAGTTGATAAAAATCATCAGAGAAGCAAAATACTGGTAGCGACACTCAAGTAAACCATGAAATTTCCATAACTTATGTCAGCAAAGTGGGAATATTGTACAGTGTGTGTTGAAGTTCCTATACAACATTGTTTATCTGCCTTTTGTTTGTTTGTAAGGAATGTAATACTAAAAGTTCTTCTTGCTGTCAAAAGAATATGGTGAATAAGTCATTTTAACTTATTCTTCTGTTTTTCTTTATCTTCCTGCCATCATCCCACAGCCTTACTTTAGAAATTTTTTTTTTAGAAAATTGAACAAGTGCTCCTgtggtggcacatgcctcgaggatgggaggcaggggtggaagggtcacttgaggccattagtttgacaccagcctggccaacaaagtgagaccccgtgtctacaaaacaatttaaaaattagccaagtatcatcatgtatacctacagtcccagctacCTGAACTTACTGAGAAAGTTCAGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCTCATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGCCTGGAGAGGCTGCCGGGAGGTAGGAGCTGGGCCAAAAgatgtaagcacatttgcatttattaggcactttatttccattattacactgtaatatataataaaataattatagaactcaccataatgtagaatcagtgggcgtgttaagcttgttttcctgcaactggatgtcccacctgagcgtgatgggagaaagtaacagatcaataggtattagattctcataaggacagcgcaacctgatccctcacatgcacggttcacaacagggtgcgttctcctatgagaatctaacgctgctgctcatctgagaaggtggagctcaggcgggaatgtgagcaaaggggagtggctgtaaatacagacgaagcttccctcactccctcactcgacaccgctcacctcctgctgtgtgctccttgcggctccatggctcaggggttggggacccctgCTCAAGTGCATCCAAAGCGACCCTTCCCACACCAGTCTTCACAGTGGTCAA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 814625 INS0005 T <INS> 30 . END=814625;SVTYPE=INS;SEQ=GGAAATGTTAATTCTGAAAATAGGTTTCACATCTTTTTTTTAACTTATATAAAATTGACTGGATTTCTCTTCTGTGTGTTGTGTTAGATATTTAGGA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 831217 DEL0001 T <DEL> 30 . END=833736;SVTYPE=DEL;SEQ=TTGtcttatgtttaaaaatgtccttcagtcattgcaggtcacaagcaggctatcagctcagtaattaaaataattcggttcttcatagtgaatgtaattctaaattagattttaagttgtaactccctgcttcagcAATGGTGATGGGGCCTAGAAACCAGAGCACCTGAGCTCCATCCTACAGGGGGCCATACCGGGATCTTTCCATTTTCAGAGGCTTCTCTCTGACAGTGAAGTGTGATGACAGACTTGGGGGCAGGGCAATGGCTAGCTTCTGAAAGCCGCTGGCACTTTAGTGATAAATTTAAATTAAGTGACGGGTAGTGAGGTGTTTGTCAAGGAAAGTGCCGTCCAAATGCTAAATACTGATTATTTCTGCAGCAGTGACTGCAATACCTCACTCAATCTCTGTCTTTCTTGAAGAAGTCATAAATAAACACGATGAATCTATGTAGAAGCGGTAAGTCAGAAAAATCTGTGTGTTTCATTACATAAACAACGGTTTATCATTAATTGACAGGCTTGGATTGGGAGTTGTTAATGAAACTGATGAGATGTTGGACAGATGAGCTCCCTCTTATTTCGAAGAGCTTATCTAGGGCTGAGTCATGGGACCTGATAGCGTCTTGTGGTGCTGTCTTCTTGTAGATATATCCGTGTTTTAGAGGATTTAGTTTTTTAAAATTTCTCTTAGAATGTGAATTTTACAAAAAAGCACTTCCCAAATGGATGATTATTTGAAAAATGAATTGTCAGACAAAACTGACACATCAGTTATGGAGAAAACCCTTCAAGAACTGGCTTTAAATGTGTTTTAGTGGGAGCCACAGTGTGGAGAGAAACAGAAGAGGGAGGAGAGGGCGCCCCTTGTTTCTTCTCTCCACAGCCAGGCCTTCGCCACCTTTCTCAGTGTCTTCAAGAATAAAATGCCTCCGTTGTTGGTTTTAGCTGCTTTTCTCCCTCGGGGTAGGTAAAGTGGTTCCAAAACGACAAGCATCCTGTAAAGTCGGAAGAGCTGTGTCAACATTAAGCTGCGTGACTTTGGCTATGAGGGAAAAAAGGCTGGTGAGTGCAGAGAAGACAGAGCTGTGGCAGGGCTCCTCCCGCCAAGTCGCCATGGAGAGGGGCTGTGAGGTGTCCTTAAACGGCCTGGTCTCCAGGGTGACTCAGGAAGGGCTGAGAGTGGTCAGCTCCCTCACCTGCTAAACCCGCAGCGCCCCGCTCAGCACACACCCTCCACTCTCCAACCTTGCCCAAGTGCTGGTCCGTCACGGCACCAGGACAGGGCATGGAGACTTGGGCTGAttcttttctctcccttcctccctcttttttttcttctctcactcctccttttcctttcctgctgtttcctgctctcctgtttctGTCCTGCAGTGTCTGGAGCTCCAGAGAGGCTGGCCCTGGGGTGGGGTCCACATGGACATGGGCGTAAGCAGGTTTGATGGTCATGGGCATAGGCAGGTTCGATGGCCAGAGTTCTTTCAGCTCACAGTAAgttttgttttgttttgttttgttttgttttgttttgttttgttttagatggagtcttgctttgtcgcccaggctgtagtgcagtggcgtgatcttggctcactgcagcctccaccttagagcaatcctcttgcctcatcctcccgggtagttgggactacatgtgcatgccacatgcctggctaatttttgtatttttagtagagacacggtttcaccatgttggccaggctggtgtccaactcctgacctcaggtgatccatccgcctcagcctcccaaagtgccgggattacaggtatgagccactgcacctggccTCAGCTGACAGTAGGTTTTAGAGCCAGATATTTACACACTAACTTGCCAGAAACATATATGACTTTATTATTCTAATTGATTTTAAGAGATATTATGAACTCAA
ATCCAAAGTTACGTCCCACCTATCATGACAATTTCATTAAGGAAAAAGTCAAACCATTTTGGAAATGATTTAAGTGAGCAACTTGGAAAAATTTTCTACATTCCTAACTTACTTTCCAGGGGATCGTTCCTGACTTAACATCTATCAGGTGTCTTAGCTTAGCTCTCTTTTTACTTCAGGTTTTTCTTGCCTCCTCAGTGTGCTGGGAGTCCCACTCCACTCAAATGCCCTCAGGTCTAATAATTAACTTCATTGCAGGCTCCTGGCAGGCCTGGGTGGGCGGCAGCTGCATTGTGCTCCTGAAGAAGATTTAAGTTGGGTTTGGTGAACTGGTAGAATTTGCATTTTGCTGTTTCTTTCCCTCTCCCAGAATTTGTACCTTTAAATAGGTTTTTTAGTGTCATTAAGTATATCAAAAGGAAACCCAGTGGGGCAAATTGGCCGGGCTccatagaggtggccttgtctaagcctttcatcttatcgataaggaaagacaggaccagagaagtCGCCGACTGTCCCTGGTCCCACTGCTTGGTTTGGGGCAATTTCCTGAAAATAATATCCAAGATGCA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1195963 INS0006 A <INS> 30 . END=1195963;SVTYPE=INS;SEQ=TGGGGTCTCACCATGTTGGCCAGGCTGgtctcaaactcctgagctcaagcgatcctcctgcctcagcctcccaaagtgctgggactacaggtgtgagccatgcgcccgaccaatttgtgtatttttagtagagatggggtctcaccatgttggccaggctggtctcaaactcctgagctcaagcgatcctcctgcctcagcctcccaaagtgctgggactacaggtgtgagccacgcgcctgaccAACTTGTGTATTTCTAGTAGAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1240675 INS0007 C <INS> 30 . END=1240675;SVTYPE=INS;SEQ=CAGCcccccgcccccattcaccccggccgtggtccctgccccagcccccgccgcccccattcaccccggccgtggtccctgccc;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1248055 DEL0002 A <DEL> 30 . END=1248319;SVTYPE=DEL;SEQ=GGCTGGATCTCCAACTCTGACCTACAGGCAGGAAAGTGGGCAGCCCTGGGAGGCTGGACTGAGGGAGGCTGGACTTCCCACTCAGGCCTACACGCAGGAAAATGGGCAGCCCTGGGAGGCTGGACCGAGGGAGGCTGGGCCTCCCACTCCACCCTACAGGCCAGGACACGGGCAGCCCTGGGAGGCTAGACCGAGGGAGGCTGGGCCTCCCATCTACCCTACAGGCCGGGACACAGGCAGCCCTGGGAGGCTGTACCGAGGGAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1477854 INS0008 C <INS> 30 . END=1477854;SVTYPE=INS;SEQ=CccaccacgcctggctaatgttgtattttagtagagacggggtttctccatgttggtcaggctggtctctaactcccgacctcaggtgatccacccgcctcggcctctcaaactgttgggattacaggcatgT;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1494665 INS0009 A <INS> 30 . END=1494665;SVTYPE=INS;SEQ=TGGTGTGCTGCTGCCCCTGCACCCCGTGAGATGAATCCTGCCTCTGGGAGGTACAGCTTCCTGGAGGGGTGGCCCTGTGAGCATCTGCGTAGCCCCTCTCCTCTGCTGGGCCCTGGGTGACGTGCAGCCACTCGGGTGGACCCTGAGGGTCCCTGCACCTGTTTGCCCTCTCTTGGGTGGGCTCAAGACCAAAAATGATGTTGAGCAGTCCTGGGCCCCTGAGCCACAGTGGCGGTGCGGCTCCGGTCAGTGTCTCCTGCGCTCCCGGGCCCCCGACCCACAGTGGCGGTCCGGCTCTGGTCAGTGTCTCCTGCGCTCCCGGGCCCCCGACCCACAGTGGCGGTCCGGCTCCGGTCGGTGTCTCCCCACACAGTGGCTCTTGGCGAGGGGTGGGCGCTGGCAGAGGGGACGGGCACCACGTGGTCATCCCCATGACAGGTTCTGTCATGGTGACAGTGTTGTGGAGGA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
chr1 1565684 INS00010 T <INS> 30 . END=1565684;SVTYPE=INS;SEQ=GGTGCAGGCAGAGAACAGACGTCGCGATGGGCCCGACGGTGCTGGCTCCATGGGAACCGAGACCCAACACCCAAAGGAGTCCCACAGGCTCAGGGG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./. ./. ./.
After lifting to GRCh37, the SVs look like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GENOTYPE
chr1 59605 INS0000 C <INS> 30 . END=59605;SVTYPE=INS;SEQ=tttcttttttttttttttttttttttgaggagttccttgtcgccgctgggtggcggcgcgattgctcctgcagctccgcccccgtccccattcctgcctcgcctcccaagtactggactcagcgccccctcgcccggctaatttttgtatttttagtaagacgtttccgtttagcggggttcgatctctgacttcgtgtcctccgcctcgctcccagtgtgattacagCTGACCACCCCCCCAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 114350134 INS0001 A <INS> 30 . END=114350134;SVTYPE=INS;SEQ=TGTCTCTAGCACCTGGGATGGGCCTGATGTGTAACAGCTGCTGGCTGAACAGAAAGtgacagatgagcaaacatctcaaggaggtgatgaggatggtgatgagtgagaactcccgacatgtgaagataactgaagatgttctggctaaagatccgaagactctaagaatatgatcattccctttgaatatcaaatatcaaaagggctgtcaggtggagaagtgagtaaacttgtatcagaatagcggcagagTTGCAAGGAAACAGATCTCTGTTCTGTTAAAAAAAAAAAATTCCATAAACAACTGCATCACTTTGCATAGCAATTAGGTTCCAGCTCACAAGCGCCTTCCGGGGTGCCCCAAGGGTGAATCCTGCTAAGGTGGAGGTAGAAGACATGACCCTGGGGCTCTTTCCTTAGCCAAGAGCCCATGAGACTAAGGAACATCGTGCTTGTTGACAAAGACCCCGGACAGTCTATTCTCTTACGGTCACAGGCTATGGTGCCAAGGACAAGTGCAGACTCAGGATCAGAAAGCTTGCAGCATATCTGCTATCTCCATGGATAGCAGGATGGTCTGGAAGGCTGTGTCGGAAGGCCCTTAGGCCTCACTGGGGCCAGGCCGTTGATGAACAATGTCCACCCTGAGGGTCGGGAATGGTGCCATTTGTTTGTCATTCCTGGTCCAGACGCCCTTGGCTTGGTGGCTACTCAAGTAGGTCAGTTTACAAGCTCAGTGCTGAACCCATACCCTATGGCACGCTCGCCAGCACTAGAGAGGAAGCTGCCTCTGTGGACATCAGGGACGGAAGTGGCTCACCCAGCCTGTTCTGCGCGTGTCTCACTAAGGGTCCATCTTCCTCTATCTGCCCCGGAGGGGACCATCTCCAAGCATCCCTTGCTTTCCTTCTCCCCCCTCCACCCTCACTGTTCAATAACTTGAGTGCATCCCATTTGTAGAGCACATGCTGGGCCGTGGAGTGAAAGACAATCAGACGACACACATCCACATTCAAAGGGACTCAGGGCTCCTGGGAAGTAGAAATGAATATCAATAACCAAACATCCCacagcctgggtttcacatctgtttagcagcaggtgaccctggggaggtcactaactggtctctgcctcagcttcttccactgaaaaacaggaatggtcccttctacatcatggatactgtgaggTGAGAAGGAGCTGATCATGGCCCATCAACCTCAGCACACCAGTCCCCCTAGAGGCTGCTGGGAGAAGAAGCAGGGAGCACCCACTCCTGACCCAGATTCACATTCACTGCTCTCCTCCCCTGCCTCTGTCATGACCCCAGGGAAGCAGACGCTGAACCTGGGCTCTTGCCTTCATCTTTATCTTCTCCACTCTGGGATAATTAAGAATGACTTGCTAATTATGCAGATCTAGTGCAATGTGTAACTTCGGGCCACCAGTGCCAATCAGTAGAGCGGAGATGACGaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaTCAAATCAATTTAAAAAACAATAAACTCCACCACCTCCCCCCTCACCCTCCCGTCATCTGCACTGATTTGTTCTCCCGGGAGCTGGAGAGGAGGGGGGGGGGGCAGCG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 645959 INS0002 T <INS> 30 . END=645959;SVTYPE=INS;SEQ=AAAGAACTGCCCGCCggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcggatcacgaggtcaggagatcgagaccatcccggctaaaacggtgaaacccgtctctactaaaaatacaaaaattagccgggcgtagtggcggcgcctgtagtcccagctacttgggaggctgaggcaggagaatggcgtgaacccgggaggtggagcttgcagtgagccgagatcccgccactgcactccagcctgggcgacagagcgagactccgtctcaaaaaaaaaaaaaaaaaaaaaaaaaa;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 667757 INS0003 A <INS> 30 . END=667757;SVTYPE=INS;SEQ=GACAGAGAGTAAAAAGAGAAATTAGGAAAGCATTCTACATGTTGAATAGGAAGACACTGGCCATGTTCGTGCAGCAGCAGTATGTCGTGACATGACATACCTTGGAGAGAAGTTAACAGATGAGGAAGTTGATAAAAATCATCAGAGAAGCAAAATACTGGTAGCGACACTCAAGTAAACCATGAAATTTCCATAACTTATGTCAGCAAAGTGGGAATATTGTACAGTGTGTGTTGAAGTTCCTATACAACATTGTTTATCTGCCTTTTGTTTGTTTGTAAGGAATGTACATACTAAAAGTTCTTCTTGCTGTCAAAAGAATATGCGTGAATAAGTCATTTTAACTTATTCTTCTGTTTTTCTTTTATCTTCCTGCCATCATCCCACAGCCTTACTTTAGAAATTTCTTTTTTAGAAAATTGAACAAGTGCTCCCTGTGGTGGCACATACCTCGAGGAtgggaggcagggtggaagggtcacttgaggccattagtttgacaccagcctggccaacaaagtgagaccccgtgtctacaaacaatttaaaaattagccaagtatcgtcatgtatacctacagtcccagctaTCTGAACTTACTGAGAATGTTCAGGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGGCTCATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGGCCTGGAGAGGCTGCTGGGAGGTAGGAGCTGGGCCAAAAgatgtaagcacatttgcatttattaggcactttatttgcattattacactgtaatatataataaaataattatagaactcaccataatgtagaatcagtgggcgtgttaagcttgttttcctgcaactggatggtcccacctgagcgtgatgggagaaagtgacagatcaataggtattagattctcataaggacagcgcaacctagatccctcacatgcacggttcacaacagggtgcgttctcctatgagaatctaacgctgctgctcatctgagaaggtggagctcaggcgggaatgtgagcaaaggggagtggctgtaaatacagacgaagcttccctcactccctcactcgacaccgctcacctcctgctgtgtggctccttgcggctccatggctcaggggttggggacccctgCTCAAGTGCATCCAAAGCGACCCTTCCCACACCAGTCTTCACAGTGGTCAAGGGCAGCAACCACTTAGCTCCCAAGGCATGTGCCTCAGCTGGCATTTCGTCACAATCAACAGTAAGTGGTAGCTTGAGTCACTGTGAGGTCACCTACTGGAAATCACCAGCATCCCATTTCCCACTGGCAAAGAGCTCAGCACTGCCCCCGGGAAACCAAACCTATGCCCAAATCCATCTGTGTGGGTGTATCTCCTGGGACCCTTCCTAACAtattagtcagagtccaatcaggaagcataaaccactcaaaagtttaaagtggtaaaatttaatacagagaattattcattgtaacaggtgaacagcataatgagagattggctagcacaaagtaaacagaactctagagaatataggactagcCCAggccaggcatggtggctcaggcctgaaattccagcaatttgagaagctaatgcaggaggattgcttaaggccaggagctagagaccggtctggacgacacagtgagaccctgtctctatccaaaagaagaaaaaagttagctgggggtggtagtgcacacttgtagtcccagctactcggaatgcggaagtttgagcctgggaggtcaaggctgcagtgaggcatgattatgccactacagtccagcctggtgaca
gagcaagaccctgtctcaaagaacaaaaCAACAACAACCATTTACAGACAGAAAAGAAATAGAGCTAATAAGCTGAGGAAAGATGTTgaaatgtgacaagtaaagtaatatgagttcttttgtctatgtaaaataatcaaacaaaaaatgacttactaaattataataccctgtgctggcaaaggtgcagtgaaatgggcaccttcttatactatgaggggtgtttaaattgtgtataagccttcccgggtaaagcctgtcaattttttaaaataatggagacagggtctcaccatactgccatactgcctcctccaactcttggcctcaagcaatcctcctctcttagcctcccaaagtgctaagattatagctgggaggcaccCAAAACCCTGTCAATTTACATCAAGGGTAAGGAGAATGTCCATTCACCATGACTCACAGTAATCTTACTTCTGGGGAGACAATTCAATCTAAACAAAAGGTCATCTGTACACACACAGTAAAAATCTGGGAGTAACTGAAGACAGAGTTGGTAAGTGAAATAAGAAACAGTTATAAGAAATTAAACTATGGTATCAATAGGCACCTGGTAAAAGGTCAGTTGATGTTAGCTGCTACttttttgttgttttgagacagggtctcactctgtcacccaggctggagtgcagaggcctgatcatgactcactgcagtctcagcctccctgggctcaagtgatcctcccacctcagcctcccaagtagctgggactacaggaacatgccaccacactaggctaattcatgtatttttctgtagggatggtgactccccctttgtttccaaggcctatcgcaaactcttggcctcaagccatcctcctgcctcagcctcccaaagtgttgcgattaccagtgtgagccaccacacctggccAGCTGCTACTTTTATCAATATTATTCTTATTCCACTCAATTAAAAATTATTATTTTCAAGGCTATGCAACAGTATGTATCCCACAGCATAATTGTAAAAACATATAGTCgtcgtccctcagtatacagaattagttccagccccccatctctgcatataccaaaatccatgcttactcacgtttcgctgtcacccctctagaatccacgtatacgaaaattccaaatgttagttgggcatagtggcaagcacctgtagtctcagccacgtgggaggttgaggtgggaggatcgcttcagcctggaaggttgaggctgcagtcagctgcgatagcactactacactccagccttggacaacagagggagaccctgtctcagaaaaaaaacaaaataaaaCAGGTTAGAAATTGTAATGAGGTCTGCTGGGCAAAATTCCATATAAGCAAAGTATAAATTAATAAAGCAAATCGTGATAAATTAGTACGATTGACTTTCTGGAGTTTCTGACAATAAAAGTAAGGAAAATGCAGAACACAAA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 672482 INS0004 G <INS> 30 . END=672482;SVTYPE=INS;SEQ=GGCAGCAACCACTTAGCTCCCAAGGCATGTGCCTCAGCTGGCATTTCGTCACAATCAACAGTAAGTGGTAGCTTGAGTCACTGTGAGGTCACCTACTGGAAATCACCAGCATCCCATTTCCCACTGGCAAAGAGCTCAGCACTGCCCCCGGGAAACCAAACCTATGCCCAAATCCCATCTGTGTGGGTGTATCTCCTGGGACCCTTCCTAACAtattagtcagagtccaatcaggaagcataaaccactcaaaagtttaaagtggtaaaatttaatacagagaattattcattataacaggtgaacagcataatgagagattggctagcacaaagtaaacagaactctagagaatatggactagcCCAggccaggcatggtggctcagcctgaaattccagcaatttgagaagctaatgcaggaggattgcttaaggccaggagctagagaccggtctggacgacacagtgagaccctgtctctatccaaaagaagaaaaaagttagctgggggtggtagtgcacacttgtagtcccagctactcggaatgcgaagtttgagcctgggaggtcaaggctgcagtgaggcatgattatgccactacagtccagcctggtgacagagcaagacctgtctcaaagaacaaaacaacaacaaCCATTTACAGACAGAAAAGAAATAGAGCTAATAAGCTGAGGAAAGATGTTgaaatgtgacaagtaaagtaatatgagttcttttgtctatgtaaaataatcaaacaaaaaatgacttactaaattataataccctgtgctggcaaaggtgcagtgaaatgggcaccttcttatactatgaggggtgtttaaattgtgtataagccttccgggtaaagcCTGTCAATTTTTTAAAATAAtggagacagggtctcaccatactgccatactgcctcctccaactcttggcctcaagcaatcctcctctcttagcctcccaaagtgctaagattatagctgggaggcaccCAAAACCCTGTCAATTTACATCAAGGGTAAGGAGAATGTCCATTCACCATGACTCACAGTAATCTTACTTCTGGGGAGACAATTCAATCTAAACAAAAGGTCATCTGTACACACACAGTAAAAATCTGGGAGTAACTGAAGACAGAGTTGGTAAGTGAAATAAGAAACAGTTATAAGAAATTAAACTATGGTATCAATAGGCACCTGGTAAAAGGTCAGTTGATGTTAGCTGCTACttttttgttgttttgagacagggtctcactctgtcacccaggctggagtgcagaggcctgatcatgactcactgcagtctcagcctccctgggctcaagtgatcctcccacctcagcctcccaagtagctgggactacaggaacatgccaccacactaggctaattcatgtatttttctgtagggatggtgactccccctttgttccaaggcctatcgcaaactcttggcctcaagccatcctcctgcctcagcctcccaaagtgttgcgattaccagtgtgagccaccacacctggccAGCTGCTACTTTTATCAATATTATTCTTATTCCACTCAATTAAAAATTATTATTTTCAAGGCTATGCAACAGTATGTATCCACAGCATAATTGTAAAAACATATagtcgtcgtcctcagtatacagaattagttccagccccccatctctgcatataccaaaatccatgcttactcacgtttgctgtcacccctctggaatccacgtatacgaaaattccaaatttagttgggcatagtggcaagcacctgtagtctcagccacgtgggaggttgaggtgggaggatcgcttcagcctggaaggttgaggctgcagtcagctgcgatagcactactacactccagccttggacaacagagggagaccctgtctcagaaaaaaaaaaaaataaaaCAGGTTA
GAAACTGTAATGAGGTCTGCTGGGCAAAATTCCATATAAGCAAAGTATAAATTAATAAAGCAAATCGTGATAAATTAGTACGATTGGCTTTCTGGAGTTTCTGACAATAAAAGTAAGGAAAATGCAGAACACAAAGACAGAGAGTAAAAAGAGAAATTAGGAAAGCATTCTACATGTTGAATAGGAAGACACTGGCCATGTTCGTGCAGCGGCAGTATGTCGTGACATGACATACCTTGGAGAGAAGTTAACAGATGAGGAAGTTGATAAAAATCATCAGAGAAGCAAAATACTGGTAGCGACACTCAAGTAAACCATGAAATTTCCATAACTTATGTCAGCAAAGTGGGAATATTGTACAGTGTGTGTTGAAGTTCCTATACAACATTGTTTATCTGCCTTTTGTTTGTTTGTAAGGAATGTAATACTAAAAGTTCTTCTTGCTGTCAAAAGAATATGGTGAATAAGTCATTTTAACTTATTCTTCTGTTTTTCTTTATCTTCCTGCCATCATCCCACAGCCTTACTTTAGAAATTTTTTTTTTAGAAAATTGAACAAGTGCTCCTgtggtggcacatgcctcgaggatgggaggcaggggtggaagggtcacttgaggccattagtttgacaccagcctggccaacaaagtgagaccccgtgtctacaaaacaatttaaaaattagccaagtatcatcatgtatacctacagtcccagctacCTGAACTTACTGAGAAAGTTCAGGCCTGGAGAGAAGGCTGGGAGGCAGGAGCTGGGTCTAAAGAGGCCATTGTAACGATGGAGCTGTGCCTGTGGAGGCTGTTGTGAGGCAGTAGCTCATCTGCGGAGGCTGCCGTGACGTAGGGTATGGGCCTAAATAGGCCATTGTGAGTCATGAGCTTGGTCTGTAGAGGCTGACTGGAGAAAGTTCTGGCCTGGAGAGGCTGCCGGGAGGTAGGAGCTGGGCCAAAAgatgtaagcacatttgcatttattaggcactttatttccattattacactgtaatatataataaaataattatagaactcaccataatgtagaatcagtgggcgtgttaagcttgttttcctgcaactggatgtcccacctgagcgtgatgggagaaagtaacagatcaataggtattagattctcataaggacagcgcaacctgatccctcacatgcacggttcacaacagggtgcgttctcctatgagaatctaacgctgctgctcatctgagaaggtggagctcaggcgggaatgtgagcaaaggggagtggctgtaaatacagacgaagcttccctcactccctcactcgacaccgctcacctcctgctgtgtgctccttgcggctccatggctcaggggttggggacccctgCTCAAGTGCATCCAAAGCGACCCTTCCCACACCAGTCTTCACAGTGGTCAA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 750005 INS0005 T <INS> 30 . END=750005;SVTYPE=INS;SEQ=GGAAATGTTAATTCTGAAAATAGGTTTCACATCTTTTTTTTAACTTATATAAAATTGACTGGATTTCTCTTCTGTGTGTTGTGTTAGATATTTAGGA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 766597 DEL0001 T <DEL> 30 . END=769116;SVTYPE=DEL;SEQ=TTGtcttatgtttaaaaatgtccttcagtcattgcaggtcacaagcaggctatcagctcagtaattaaaataattcggttcttcatagtgaatgtaattctaaattagattttaagttgtaactccctgcttcagcAATGGTGATGGGGCCTAGAAACCAGAGCACCTGAGCTCCATCCTACAGGGGGCCATACCGGGATCTTTCCATTTTCAGAGGCTTCTCTCTGACAGTGAAGTGTGATGACAGACTTGGGGGCAGGGCAATGGCTAGCTTCTGAAAGCCGCTGGCACTTTAGTGATAAATTTAAATTAAGTGACGGGTAGTGAGGTGTTTGTCAAGGAAAGTGCCGTCCAAATGCTAAATACTGATTATTTCTGCAGCAGTGACTGCAATACCTCACTCAATCTCTGTCTTTCTTGAAGAAGTCATAAATAAACACGATGAATCTATGTAGAAGCGGTAAGTCAGAAAAATCTGTGTGTTTCATTACATAAACAACGGTTTATCATTAATTGACAGGCTTGGATTGGGAGTTGTTAATGAAACTGATGAGATGTTGGACAGATGAGCTCCCTCTTATTTCGAAGAGCTTATCTAGGGCTGAGTCATGGGACCTGATAGCGTCTTGTGGTGCTGTCTTCTTGTAGATATATCCGTGTTTTAGAGGATTTAGTTTTTTAAAATTTCTCTTAGAATGTGAATTTTACAAAAAAGCACTTCCCAAATGGATGATTATTTGAAAAATGAATTGTCAGACAAAACTGACACATCAGTTATGGAGAAAACCCTTCAAGAACTGGCTTTAAATGTGTTTTAGTGGGAGCCACAGTGTGGAGAGAAACAGAAGAGGGAGGAGAGGGCGCCCCTTGTTTCTTCTCTCCACAGCCAGGCCTTCGCCACCTTTCTCAGTGTCTTCAAGAATAAAATGCCTCCGTTGTTGGTTTTAGCTGCTTTTCTCCCTCGGGGTAGGTAAAGTGGTTCCAAAACGACAAGCATCCTGTAAAGTCGGAAGAGCTGTGTCAACATTAAGCTGCGTGACTTTGGCTATGAGGGAAAAAAGGCTGGTGAGTGCAGAGAAGACAGAGCTGTGGCAGGGCTCCTCCCGCCAAGTCGCCATGGAGAGGGGCTGTGAGGTGTCCTTAAACGGCCTGGTCTCCAGGGTGACTCAGGAAGGGCTGAGAGTGGTCAGCTCCCTCACCTGCTAAACCCGCAGCGCCCCGCTCAGCACACACCCTCCACTCTCCAACCTTGCCCAAGTGCTGGTCCGTCACGGCACCAGGACAGGGCATGGAGACTTGGGCTGAttcttttctctcccttcctccctcttttttttcttctctcactcctccttttcctttcctgctgtttcctgctctcctgtttctGTCCTGCAGTGTCTGGAGCTCCAGAGAGGCTGGCCCTGGGGTGGGGTCCACATGGACATGGGCGTAAGCAGGTTTGATGGTCATGGGCATAGGCAGGTTCGATGGCCAGAGTTCTTTCAGCTCACAGTAAgttttgttttgttttgttttgttttgttttgttttgttttgttttagatggagtcttgctttgtcgcccaggctgtagtgcagtggcgtgatcttggctcactgcagcctccaccttagagcaatcctcttgcctcatcctcccgggtagttgggactacatgtgcatgccacatgcctggctaatttttgtatttttagtagagacacggtttcaccatgttggccaggctggtgtccaactcctgacctcaggtgatccatccgcctcagcctcccaaagtgccgggattacaggtatgagccactgcacctggccTCAGCTGACAGTAGGTTTTAGAGCCAGATATTTACACACTAACTTGCCAGAAACATATATGACTTTATTATTCTAATTGATTTTAAGAGATATTATGAACTCAA
ATCCAAAGTTACGTCCCACCTATCATGACAATTTCATTAAGGAAAAAGTCAAACCATTTTGGAAATGATTTAAGTGAGCAACTTGGAAAAATTTTCTACATTCCTAACTTACTTTCCAGGGGATCGTTCCTGACTTAACATCTATCAGGTGTCTTAGCTTAGCTCTCTTTTTACTTCAGGTTTTTCTTGCCTCCTCAGTGTGCTGGGAGTCCCACTCCACTCAAATGCCCTCAGGTCTAATAATTAACTTCATTGCAGGCTCCTGGCAGGCCTGGGTGGGCGGCAGCTGCATTGTGCTCCTGAAGAAGATTTAAGTTGGGTTTGGTGAACTGGTAGAATTTGCATTTTGCTGTTTCTTTCCCTCTCCCAGAATTTGTACCTTTAAATAGGTTTTTTAGTGTCATTAAGTATATCAAAAGGAAACCCAGTGGGGCAAATTGGCCGGGCTccatagaggtggccttgtctaagcctttcatcttatcgataaggaaagacaggaccagagaagtCGCCGACTGTCCCTGGTCCCACTGCTTGGTTTGGGGCAATTTCCTGAAAATAATATCCAAGATGCA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1131343 INS0006 A <INS> 30 . END=1131343;SVTYPE=INS;SEQ=TGGGGTCTCACCATGTTGGCCAGGCTGgtctcaaactcctgagctcaagcgatcctcctgcctcagcctcccaaagtgctgggactacaggtgtgagccatgcgcccgaccaatttgtgtatttttagtagagatggggtctcaccatgttggccaggctggtctcaaactcctgagctcaagcgatcctcctgcctcagcctcccaaagtgctgggactacaggtgtgagccacgcgcctgaccAACTTGTGTATTTCTAGTAGAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1176055 INS0007 C <INS> 30 . END=1176055;SVTYPE=INS;SEQ=CAGCcccccgcccccattcaccccggccgtggtccctgccccagcccccgccgcccccattcaccccggccgtggtccctgccc;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1183435 DEL0002 A <DEL> 30 . END=1183699;SVTYPE=DEL;SEQ=GGCTGGATCTCCAACTCTGACCTACAGGCAGGAAAGTGGGCAGCCCTGGGAGGCTGGACTGAGGGAGGCTGGACTTCCCACTCAGGCCTACACGCAGGAAAATGGGCAGCCCTGGGAGGCTGGACCGAGGGAGGCTGGGCCTCCCACTCCACCCTACAGGCCAGGACACGGGCAGCCCTGGGAGGCTAGACCGAGGGAGGCTGGGCCTCCCATCTACCCTACAGGCCGGGACACAGGCAGCCCTGGGAGGCTGTACCGAGGGAG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1413234 INS0008 C <INS> 30 . END=1413234;SVTYPE=INS;SEQ=CccaccacgcctggctaatgttgtattttagtagagacggggtttctccatgttggtcaggctggtctctaactcccgacctcaggtgatccacccgcctcggcctctcaaactgttgggattacaggcatgT;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1430045 INS0009 A <INS> 30 . END=1430045;SVTYPE=INS;SEQ=TGGTGTGCTGCTGCCCCTGCACCCCGTGAGATGAATCCTGCCTCTGGGAGGTACAGCTTCCTGGAGGGGTGGCCCTGTGAGCATCTGCGTAGCCCCTCTCCTCTGCTGGGCCCTGGGTGACGTGCAGCCACTCGGGTGGACCCTGAGGGTCCCTGCACCTGTTTGCCCTCTCTTGGGTGGGCTCAAGACCAAAAATGATGTTGAGCAGTCCTGGGCCCCTGAGCCACAGTGGCGGTGCGGCTCCGGTCAGTGTCTCCTGCGCTCCCGGGCCCCCGACCCACAGTGGCGGTCCGGCTCTGGTCAGTGTCTCCTGCGCTCCCGGGCCCCCGACCCACAGTGGCGGTCCGGCTCCGGTCGGTGTCTCCCCACACAGTGGCTCTTGGCGAGGGGTGGGCGCTGGCAGAGGGGACGGGCACCACGTGGTCATCCCCATGACAGGTTCTGTCATGGTGACAGTGTTGTGGAGGA;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
chr1 1501064 INS00010 T <INS> 30 . END=1501064;SVTYPE=INS;SEQ=GGTGCAGGCAGAGAACAGACGTCGCGATGGGCCCGACGGTGCTGGCTCCATGGGAACCGAGACCCAACACCCAAAGGAGTCCCACAGGCTCAGGGG;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;MAPQ=60 GT ./.
The error isn't really helpful. What am I doing wrong?
Hi all,
I've been running into what I believe is a pysam error message when running multigrmpy.py. Before I go into too much detail, here is some background info on what I'm trying to do.
I've been working on an experiment where I take a mixture of two separate BAMS which have been sampled using samtools view -s and mixed with samtools merge. Then I try to genotype some variants from one of the two samples to see how robust Paragraph is to varying allele balance (by varying the mixture ratios of the two sample bams).
Paragraph seems to run just fine on a mixture BAM, since it seems to output and populate the genotypes.json.gz. In grmpy.log, the log messages do not indicate any errors and it seems to get through the entire genotyping process. It seems that when the time comes to create the corresponding genotypes.vcf.gz, I get a pysam error.
2020-11-03 06:18:01,921 ERROR Traceback (most recent call last):
2020-11-03 06:18:01,921 ERROR File "/mnt/local/paragraph/build/bin/multigrmpy.py", line 340, in run vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
2020-11-03 06:18:01,921 ERROR File "/mnt/local/paragraph/build/lib/python3/grm/vcfgraph/vcfupdate.py", line 232, in update_vcf_from_grmpy set_record_for_sample(record, sample, grmpyRecord, alleleMap)
2020-11-03 06:18:01,921 ERROR File "/mnt/local/paragraph/build/lib/python3/grm/vcfgraph/vcfupdate.py", line 310, in set_record_for_sample record.samples[sample]["PL"] = pls_to_set
2020-11-03 06:18:01,921 ERROR File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
2020-11-03 06:18:01,921 ERROR File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
2020-11-03 06:18:01,921 ERROR File "pysam/libcbcf.pyx", line 597, in genexpr
2020-11-03 06:18:01,921 ERROR File "pysam/libcbcf.pyx", line 597, in genexpr
2020-11-03 06:18:01,921 ERROR File "pysam/libcutils.pyx", line 129, in pysam.libcutils.force_bytes
2020-11-03 06:18:01,921 ERROR TypeError: Argument must be string, bytes or unicode.
Traceback (most recent call last):
File "/mnt/local/paragraph/build/bin/multigrmpy.py", line 353, in <module>
main()
File "/mnt/local/paragraph/build/bin/multigrmpy.py", line 349, in main
run(args)
File "/mnt/local/paragraph/build/bin/multigrmpy.py", line 340, in run
vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
File "/mnt/local/paragraph/build/lib/python3/grm/vcfgraph/vcfupdate.py", line 232, in update_vcf_from_grmpy
set_record_for_sample(record, sample, grmpyRecord, alleleMap)
File "/mnt/local/paragraph/build/lib/python3/grm/vcfgraph/vcfupdate.py", line 310, in set_record_for_sample
record.samples[sample]["PL"] = pls_to_set
File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
File "pysam/libcbcf.pyx", line 597, in genexpr
File "pysam/libcbcf.pyx", line 597, in genexpr
File "pysam/libcutils.pyx", line 129, in pysam.libcutils.force_bytes
TypeError: Argument must be string, bytes or unicode.
My initial thought was that there could have been something wrong with the input vcf format, but I'm not sure where to start based on the error alone.
Thanks for your help, and please let me know if you need more details to clear anything up.
I have been testing Paragraph and have been experiencing longer than expected runtimes. The stated performance is:
It typically takes up to a few seconds to genotype a single event in one sample (single-threaded). It took us 30 minutes to genotype ~20,000 SVs using 20 CPU cores (with I/O).
This works out to roughly 1000 SVs per Core per 30min (1.8 seconds/sv single core).
My setup:
Results:
Threads | Runtime (Min) | Seconds/SV/core |
---|---|---|
1 | 123 | 7.08 |
2 | 64 | 7.37 |
4 | 34 | 7.83 |
8 | 26 | 11.98 |
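The Seconds/SV/core normalisation can be written out explicitly. The sketch below is only arithmetic on the README figures quoted above; the function name is illustrative, and because the number of SVs in this benchmark is not stated here, the table rows themselves are not recomputed.

```python
def seconds_per_sv_per_core(runtime_min, cores, n_svs):
    """Core-seconds spent per genotyped SV."""
    return runtime_min * 60 * cores / n_svs

# README figures: ~20,000 SVs in 30 minutes on 20 cores.
readme_rate = seconds_per_sv_per_core(30, 20, 20000)
print(readme_rate)
```

Applying the same function to each row of the table, with the real SV count, gives numbers that are directly comparable with the README rate.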
Do you have any feedback as to why these runtimes seem significantly slower than your suggested times?
Cheers,
Wayne
I am running paragraph for GIAB dataset and I used the following command
python3 .../bin/multigrmpy.py -i HG002_SVs_Tier1_v0.6_chr22.vcf -m samples.txt -r hg19.chr22.fa -o test
My samples.txt file looks like the following
id path depth read length
sample1 /stornext/snfs5/next-gen/scratch/fritz/projects/Sairam/Proj1_nibSV/TEST/HG002.hg19.chr22.bam 60 250
Could you please check the following error that Paragraph throws?
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".../bin/multigrmpy.py", line 353, in <module>
main()
File ".../bin/multigrmpy.py", line 349, in main
run(args)
File ".../bin/multigrmpy.py", line 261, in run
graph_files = load_graph_description(args)
File ".../bin/multigrmpy.py", line 52, in load_graph_description
header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
File ".../lib/python3/grm/vcf2paragraph/__init__.py", line 156, in convert_vcf_to_json
variants = pool.map(run_vcf2paragraph, zip(to_process, itertools.repeat(params)))
File ".../lib/python3.7/multiprocessing/pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File ".../lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
Exception: Different padding base for REF and ALT at 22:18588640
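The message suggests that the record at 22:18588640 is a sequence-resolved indel whose REF and ALT do not start with the same base. A minimal sketch for locating such records before running Paragraph follows. It assumes a plain-text, tab-separated VCF and the usual convention that an indel's REF and ALT share a leading padding base. The function name is made up for illustration, and this is not Paragraph's own check.

```python
def mismatched_padding(vcf_lines):
    """Yield (chrom, pos) for indels whose REF and ALT start with different bases."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        for alt in fields[4].split(","):
            # Symbolic (<DEL>), breakend and missing ALTs carry no sequence to compare.
            if alt.startswith("<") or "[" in alt or "]" in alt or alt in (".", "*"):
                continue
            if len(ref) != len(alt) and ref[0].upper() != alt[0].upper():
                yield chrom, pos
                break

records = [
    "22\t100\t.\tACGT\tA\t.\t.\t.",          # deletion with a shared padding base
    "22\t200\t.\tACGT\tG\t.\t.\t.",          # deletion whose padding base differs
    "22\t300\t.\tN\t<DEL>\t.\t.\tEND=400",   # symbolic allele, skipped
]
flagged = list(mismatched_padding(records))
print(flagged)
```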
The error below occurs when processing this INV called by Manta. Any suggestions? The variant record supplies REF, ALT, SVLEN and SVINSSEQ.
chr6 2893187 MantaINV:4:20498:20498:4:0:0;MantaINV:61660:0:0:1:0:0 C <INV> 548 PASS END=2893191;SVTYPE=INV;SVLEN=4;CIPOS=0,4;CIEND=-4,0;HOMLEN=4;HOMSEQ=TATA;SVINSLEN=51;SVINSSEQ=ACGTATATATATACGTATATATAATATATATATTATATATACGTATATATA;INV5;AC=2;AN=9578;FIBC_P=-0.000417444;HWE_SLP_P=-0.195618;FIBC_I=-0.000417444;HWE_SLP_I=-0.195618;MAX_IF=0.749985;MIN_IF=0.749985;LLK0=-281075;BETA_IF=0.49997,-3.8073e-06,1.89345e-06,3.9655e-06,2.17351e-06;ANN=<INV>|intron_variant|MODIFIER|SERPINB9|ENSG00000170542|transcript|ENST00000380698.4|protein_coding|5/6|c.567+220_567+223inv||||||;NS=4789;AF=0.000208812;MAF=0.000208812;AC_Het=2;AC_Hom=0;AC_Hemi=0;HWE=1;ExcHet=0.999896
It would be nice to have the chromosome included in the error message below, to help with troubleshooting.
2020-03-09 10:26:55,417 ERROR Exception when running vcf2paragraph on /scratch/tmppmn0a0d6.vcf.gz
2020-03-09 10:26:55,421 ERROR Traceback (most recent call last):
2020-03-09 10:26:55,422 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph alt_paths=params["alt_paths"])
2020-03-09 10:26:55,422 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph)
2020-03-09 10:26:55,422 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 128, in create_from_vcf graph.add_record(record, allele_graph, varId, ins_info_key)
2020-03-09 10:26:55,422 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 202, in add_record self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt_sequence, alt_samples, refSamples)
2020-03-09 10:26:55,423 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 296, in add_alt raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))
2020-03-09 10:26:55,423 ERROR Exception: 2893187:2893191 missing REF or ALT sequence.
2020-03-09 10:26:55,506 ERROR VCF to JSON conversion failed.
2020-03-09 10:26:55,509 ERROR multiprocessing.pool.RemoteTraceback: """Traceback (most recent call last): File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar return list(map(*args)) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph alt_paths=params["alt_paths"]) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 128, in create_from_vcf graph.add_record(record, allele_graph, varId, ins_info_key) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 202, in add_record self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt_sequence, alt_samples, refSamples) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 296, in add_alt raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))Exception: 2893187:2893191 missing REF or ALT sequence."""
2020-03-09 10:26:55,509 ERROR The above exception was the direct cause of the following exception:
2020-03-09 10:26:55,510 ERROR Traceback (most recent call last):
2020-03-09 10:26:55,510 ERROR File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 52, in load_graph_description header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
2020-03-09 10:26:55,510 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 156, in convert_vcf_to_json variants = pool.map(run_vcf2paragraph, zip(to_process, itertools.repeat(params)))
2020-03-09 10:26:55,510 ERROR File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 266, in map return self._map_async(func, iterable, mapstar, chunksize).get()
2020-03-09 10:26:55,510 ERROR File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value
2020-03-09 10:26:55,511 ERROR Exception: 2893187:2893191 missing REF or ALT sequence.
2020-03-09 10:26:55,511 ERROR multiprocessing.pool.RemoteTraceback: """Traceback (most recent call last): File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar return list(map(*args)) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph alt_paths=params["alt_paths"]) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 128, in create_from_vcf graph.add_record(record, allele_graph, varId, ins_info_key) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 202, in add_record self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt_sequence, alt_samples, refSamples) File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 296, in add_alt raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))Exception: 2893187:2893191 missing REF or ALT sequence."""
2020-03-09 10:26:55,511 ERROR The above exception was the direct cause of the following exception:
2020-03-09 10:26:55,511 ERROR Traceback (most recent call last):
2020-03-09 10:26:55,512 ERROR File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 261, in run graph_files = load_graph_description(args)
2020-03-09 10:26:55,512 ERROR File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 52, in load_graph_description header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
2020-03-09 10:26:55,512 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 156, in convert_vcf_to_json variants = pool.map(run_vcf2paragraph, zip(to_process, itertools.repeat(params)))
2020-03-09 10:26:55,512 ERROR File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 266, in map return self._map_async(func, iterable, mapstar, chunksize).get()
2020-03-09 10:26:55,512 ERROR File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value
2020-03-09 10:26:55,513 ERROR Exception: 2893187:2893191 missing REF or ALT sequence.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/share/pkg.7/python3/3.6.9/install/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
return list(map(*args))
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 286, in run_vcf2paragraph
alt_paths=params["alt_paths"])
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcf2paragraph/__init__.py", line 86, in convert_vcf
ref, indexed_vcf.name, ins_info_key, chrom, start, end, ref_node_padding, allele_graph)
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 128, in create_from_vcf
graph.add_record(record, allele_graph, varId, ins_info_key)
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 202, in add_record
self.add_alt(vcf.pos, vcf.stop, ref_sequence, alt_sequence, alt_samples, refSamples)
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfgraph.py", line 296, in add_alt
raise Exception("{}:{} missing REF or ALT sequence.".format(start, end))
Exception: 2893187:2893191 missing REF or ALT sequence.
"""
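For illustration only, a message that also carries the contig could be formatted as below. The helper name and the chrom argument are hypothetical; the log does not show whether the contig name is available at this point in Paragraph's add_alt.

```python
def missing_sequence_message(chrom, start, end):
    """Build an error message that names the contig as well as the interval."""
    return "{}:{}-{} missing REF or ALT sequence.".format(chrom, start, end)

message = missing_sequence_message("chr6", 2893187, 2893191)
print(message)
```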
In v2.4a some SVs with missing genotypes were still labeled as "PASS" in the VCF FILTER field. We're going to revisit missing genotypes in v2.5 and adjust this filter properly...
I have paragraph working using multigrmpy.py on test data using one VCF as input. My use case is actually having two different VCF files for the same sample, made using independent methods. Is it possible for paragraph to use both VCFs, i.e. build a graph from the union of all the variants from both VCF files, and then genotype? I've been trying out the vcf2paragraph.py and addVariants.py scripts, but have not managed to do it.
Hello, thanks a lot for making paragraph!
I am just figuring out how it works now, so I started by taking one .vcf file generated by manta, and I used the exact same .bam file to genotype the variants I called (just to see the consistency of manta and paragraph), but I got back an error
Exception: Distance between vcf position and chrom start is smaller than read length.
I tried to dig a bit, but the code did not make it clear why manta has no trouble calling variants close to the scaffold edges while paragraph does. I removed all variants that started < 150 bases from the start of scaffolds and restarted genotyping, and now it seems to run.
So, I wanted to ask: what is the reason for this? Why is it not possible to genotype SVs close to scaffold borders? And would it be worth making manta and paragraph compatible?
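The workaround described above can be sketched as a small filter. It assumes a plain-text, tab-separated VCF and uses the 150-base cutoff from this post. The function name is illustrative, and whether Paragraph applies the same limit at scaffold ends is not something the error message states.

```python
def far_from_contig_start(vcf_lines, min_distance=150):
    """Yield header lines plus records whose POS is at least min_distance."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        pos = int(line.split("\t", 2)[1])
        if pos >= min_distance:
            yield line

example = [
    "##fileformat=VCFv4.2",
    "scaffold_1\t42\t.\tA\t<DEL>\t.\tPASS\tEND=900",    # too close to the start, dropped
    "scaffold_1\t5000\t.\tC\t<DEL>\t.\tPASS\tEND=6200",  # kept
]
kept = list(far_from_contig_start(example))
print(len(kept))
```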
Thanks
background
I have a bunch of Illumina reseq data (1 reference, 5 reseq individuals) with reasonable coverage (~60x ref, ~15x reseq). I have a non-model species, i.e. without a good library of SVs, but I still think that genotyping individuals is a far smarter idea than just merging SV calls. I am just figuring out the best way to create a library of SVs out of SV calls that I will feed to Paragraph, so that the same data get genotyped on the pool of variants I found in the population.
I was thinking before about using SURVIVOR, but the merging does not explicitly resolve the sequences of SVs (discussed here), so now I am thinking about just pasting the .vcf files of all 6 individuals together while filtering out only the exact overlaps. Not sure what is the best approach here, any input welcome.
An option to set the temp directory manually when running multigrmpy.py would be useful. Currently it defaults to putting files in /tmp and the server I'm using does not have enough space in that location so I keep getting out of disk space errors.
I tried setting --scratch-dir but if that is what it is intended for, it did not solve the issue - the temporary files still get placed in /tmp, though files get placed in the specified scratch directory as well.
Furthermore, when multigrmpy.py errors out, the /tmp and --scratch-dir directories need to be cleared of their temp files manually.
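One possible workaround, assuming the stray files are created through Python's standard tempfile module (not verified against the Paragraph source): tempfile picks its default directory from the TMPDIR, TEMP and TMP environment variables, so exporting TMPDIR before launching multigrmpy.py may redirect them. The sketch below only demonstrates that standard-library behaviour; the directory is a throwaway created for the example.

```python
import os
import tempfile

scratch = tempfile.mkdtemp()        # stand-in for a large scratch directory
os.environ["TMPDIR"] = scratch      # same effect as exporting TMPDIR in the shell
tempfile.tempdir = None             # drop the cached default so TMPDIR is re-read

# Any temporary file created without an explicit dir= now lands in scratch.
with tempfile.NamedTemporaryFile(suffix=".vcf") as handle:
    created_in = os.path.dirname(handle.name)

print(created_in)
```

If the files still appear in /tmp with TMPDIR set, they are presumably written to a hard-coded path or by a component that does not consult these variables.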
Running paragraph using hg38 genomes and ran into this error
Exception: chr3:90549400:<INV> illegal character in reference sequence
Traceback:
Traceback (most recent call last):
File "/home/dantakli/paragraph/lib/python3/grm/vcfgraph/vcfgraph.py", line 199, in add_record
alt_sequence = ref_sequence[0] + reverse_complement(inv_ref)
File "/home/dantakli/paragraph/lib/python3/grm/vcfgraph/vcfgraph.py", line 436, in reverse_complement
return ''.join([complement[x] for x in seq[::-1]])
File "/home/dantakli/paragraph/lib/python3/grm/vcfgraph/vcfgraph.py", line 436, in <listcomp>
return ''.join([complement[x] for x in seq[::-1]])
KeyError: 'W'
$ samtools faidx /home/dantakli/ref/GRCh38_full_analysis_set_plus_decoy_hla.fa chr3:90549400-91081922 | grep "W"
AAGTTTCTGAGAATCATTCTCTCTTGTTTTTCTGTGAAGWTATTGCCTTTTCTACCATAG
For now I will likely skip this SV, but just letting you all know that it seems that Paragraph doesn't support noncanonical bases.
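The KeyError comes from a complement lookup that has no entry for W, the IUPAC code for "A or T". A minimal sketch of a reverse complement that covers the full IUPAC nucleotide alphabet is shown below. It is a standalone illustration, not the function in vcfgraph.py, and whether the rest of Paragraph accepts ambiguity codes in graph sequences is a separate question.

```python
# IUPAC nucleotide codes and their complements, position by position.
_IUPAC = "ACGTRYSWKMBDHVN"
_COMPL = "TGCAYRSWMKVHDBN"
_TABLE = str.maketrans(_IUPAC + _IUPAC.lower(), _COMPL + _COMPL.lower())

def reverse_complement(seq):
    """Reverse-complement a sequence; characters outside the table pass through unchanged."""
    return seq.translate(_TABLE)[::-1]

# Fragment around the W reported by samtools faidx above.
result = reverse_complement("GAAGWTATT")
print(result)
```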
I'm trying to run Paragraph on a small VCF file (~100 entries) with no Insertions, but I'm getting the below error:
File "/share/Codes/binaries/Paragraph/bin/multigrmpy.py", line 34, in <module>
from grm.vcf2paragraph import convert_vcf_to_json
File "/share/Codes/binaries/Paragraph/lib/python3/grm/vcf2paragraph/__init__.py", line 32, in <module>
from grm.vcfgraph import VCFGraph, NoVCFRecordsException
File "/share/Codes/binaries/Paragraph/lib/python3/grm/vcfgraph/__init__.py", line 21, in <module>
from grm.vcfgraph.vcfgraph import VCFGraph, NoVCFRecordsException
File "/share/Codes/binaries/Paragraph/lib/python3/grm/vcfgraph/vcfgraph.py", line 178
f"Missing key {ins_info_key} for <INS> at {self.chrom}:{vcf.pos}; ")
^
SyntaxError: invalid syntax
All VCF entries have END and SEQ in the INFO field. Here's the header and one entry in the file:
##fileformat=VCFv4.2
##source=LUMPY
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=STRANDS,Number=.,Type=String,Description="Strand orientation of the adjacency in BEDPE format (DEL:+-, DUP:-+, INV:++/--)">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS95,Number=2,Type=Integer,Description="Confidence interval (95%) around POS for imprecise variants">
##INFO=<ID=CIEND95,Number=2,Type=Integer,Description="Confidence interval (95%) around END for imprecise variants">
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakends">
##INFO=<ID=EVENT,Number=1,Type=String,Description="ID of event associated to breakend">
##INFO=<ID=SEQ,Number=1,Type=String,Description="Sequence of the structural variation">
##INFO=<ID=SECONDARY,Number=0,Type=Flag,Description="Secondary breakend in a multi-line variants">
##INFO=<ID=SU,Number=.,Type=Integer,Description="Number of pieces of evidence supporting the variant across all samples">
##INFO=<ID=PE,Number=.,Type=Integer,Description="Number of paired-end reads supporting the variant across all samples">
##INFO=<ID=SR,Number=.,Type=Integer,Description="Number of split reads supporting the variant across all samples">
##INFO=<ID=BD,Number=.,Type=Integer,Description="Amount of BED evidence supporting the variant across all samples">
##INFO=<ID=EV,Number=.,Type=String,Description="Type of LUMPY evidence contributing to the variant call">
##INFO=<ID=PRPOS,Number=.,Type=String,Description="LUMPY probability curve of the POS breakend">
##INFO=<ID=PREND,Number=.,Type=String,Description="LUMPY probability curve of the END breakend">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=DUP:TANDEM,Description="Tandem duplication">
##ALT=<ID=INS,Description="Insertion of novel sequence">
##ALT=<ID=CNV,Description="Copy number variable region">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=SU,Number=1,Type=Integer,Description="Number of pieces of evidence supporting the variant">
##FORMAT=<ID=PE,Number=1,Type=Integer,Description="Number of paired-end reads supporting the variant">
##FORMAT=<ID=SR,Number=1,Type=Integer,Description="Number of split reads supporting the variant">
##FORMAT=<ID=BD,Number=1,Type=Integer,Description="Amount of BED evidence supporting the variant">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00514
chr1 180321 . C <DEL> 30 PASS END=180372;SVTYPE=DEL;SVLEN=-51;SEQ=CATAACCCTAAAACGCTAACCCTCATCCTCACCCTCACACCTCACCCTCAC;CIPOS=-10,10;CIEND=-10,10;IMPRECISE;SU=0;PE=0;SR=0 GT:SU:PE:SR ./.:0:0:0
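The caret in the traceback above points at an f-string, and f-strings are only parsed by Python 3.6 and later, so the interpreter version is worth ruling out before looking at the VCF. A small check, written without f-strings so that it also runs on older interpreters (the function name is illustrative):

```python
import sys

def supports_fstrings(version_info=sys.version_info):
    """Return True if this interpreter version can parse f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)

if not supports_fstrings():
    sys.stderr.write(
        "Python %d.%d cannot parse f-strings; 3.6 or newer is needed\n"
        % tuple(sys.version_info[:2])
    )
```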
Hello, I got the error below when running 'make' to build Paragraph as described in Installation.md:
[ 64%] Building CXX object external/graphtools-
/gpfs/home/heyaoxi/boost_1_65_0/paragraph-tools-build/external/graphtools-src/src/graphIO/../../external/include/nlohmann/json.hpp:8678:43: error: logical ‘and’ of mutually exclusive tests is always false [-Werror=logical-op]
const bool is_negative = (x <= 0) and (x != 0); // see issue #755
cc1plus: all warnings being treated as errors
make[2]: *** [external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/build.make:83: external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/GraphJson.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:571: external/graphtools-build/src/graphIO/CMakeFiles/graphIO.dir/all] Error 2
make: *** [Makefile:150: all] Error 2
I looked at issue #755 and changed the code as recommended there: I changed "const bool is_negative = x < 0;" to "const bool is_negative = std::is_same<NumberType, number_integer_t>::value and (x < 0);" but then I got a new error:
collect2: error: ld returned 1 exit status
make[2]: *** [src/c++/main/CMakeFiles/grmpy.dir/build.make:115: bin/grmpy] Error 1
make[1]: *** [CMakeFiles/Makefile2:653: src/c++/main/CMakeFiles/grmpy.dir/all] Error 2
make: *** [Makefile:150: all] Error 2.
Could anyone give some help here?
--Yaoxi
Dear Paragraph team,
Do you have any plans to develop a tool that can produce an alternative reference from Paragraph results?
Something similar to FastaAlternateReferenceMaker from GATK, but with support for structural variants.
Thanks.
Won
I use svtyper in smoove. After reading your paper, I thought I might replace svtyper with paragraph.
I did a separate evaluation using the GiaB truthset from here and using truvari.
I evaluated only on deletions > 300 bases.
When genotyping this large-DEL truthset, I get 81% recall from paragraph and 91% with svtyper.
I realize that you used a different call-set and did not limit to Tier 1 regions, but I am surprised the results are so different. I am wondering if you have any insight on this.
I used paragraph via the docker image (as updated with my pending pull-request) and the code below:
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz
bcftools view -f "PASS,." -O z -o HG002_SVs_Tier1_v0.6.DEL.vcf.gz -i 'SVTYPE == "DEL" & SVLEN < -40' HG002_SVs_Tier1_v0.6.vcf.gz
# paragraph complains about reference so manually change:
zcat HG002_SVs_Tier1_v0.6.DEL.vcf.gz | awk 'BEGIN{FS=OFS="\t"} ($0 ~ /^#/) { print } ($0 !~ /^#/ ) { $4="N"; $5 = "<DEL>"; print }' | bgzip -c > tmp
mv tmp HG002_SVs_Tier1_v0.6.DEL.vcf.gz
docker run -v $(pwd):/pwd -v /data/human:/data/human 5a75c4ae6ebc -m /pwd/manifest.txt -r /data/human/g1k_v37_decoy.fa --threads 4 -o /pwd/ -i /pwd/HG002_SVs_Tier1_v0.6.DEL.vcf.gz
wget ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.bed
truth_del=HG002_SVs_Tier1_v0.6.DEL.vcf.gz
ODIR=evaluate/
sizemax=15000000
sizemin=300
bed=HG002_SVs_Tier1_v0.6.bed
set -euo pipefail
rm -r $ODIR
tabix -f genotypes.vcf.gz
python ~/src/truvari/truvari.py --sizemax $sizemax -s $sizemin -S $((sizemin - 30)) -b $truth_del -c genotypes.vcf.gz -o $ODIR/ --passonly --pctsim=0 -r 20 --giabreport -f /data/human/g1k_v37_decoy.fa --no-ref --includebed $bed -O 0.6
cat $ODIR/summary.txt
zcat $truth_del | ./add_ci
svtyper -B /data/human/hg002.cram -T /data/human/g1k_v37_decoy.fa \
-i with-ci.vcf \
--max_ci_dist 0 \
-o svtyper.genotyped.vcf
perl -pi -e 's/""/"/' svtyper.genotyped.vcf
bgzip svtyper.genotyped.vcf
tabix -f svtyper.genotyped.vcf.gz
ODIR=evaluate-svtyper/
rm -r $ODIR
python ~/src/truvari/truvari.py --sizemax $sizemax -s $sizemin -S $((sizemin - 30)) -b $truth_del -c svtyper.genotyped.vcf.gz -o $ODIR/ --passonly --pctsim=0 -r 20 --giabreport -f /data/human/g1k_v37_decoy.fa --no-ref --includebed $bed -O 0.6
cat $ODIR/summary.txt
I am trying to use the idxdepth utility, as I'm running Paragraph on a large number of already produced cram files. The human reference used included HLA decoy sequences, which have * and : in their names (I did not choose the reference, nor can I change it at this point). I think idxdepth is failing due to the * or : in the names -- it seems to run fine then errors when it gets to the HLA sequences.
...
[2019-06-19 15:04:39.748] [idxdepth] [3725] [info] Thread 47113698678528 estimating depth for chrUn_JTFH01001997v1_decoy
[2019-06-19 15:04:39.752] [idxdepth] [3722] [info] Thread 47113692374784 done estimating depth for chrUn_JTFH01001976v1_decoy ; DP = 35.23 after 182.823 us
[2019-06-19 15:04:39.752] [idxdepth] [3722] [info] Thread 47113692374784 estimating depth for chrUn_JTFH01001998v1_decoy
[2019-06-19 15:04:39.774] [idxdepth] [3733] [info] Thread 47117047957248 done estimating depth for chrUn_JTFH01001980v1_decoy ; DP = 2.38 after 196.527 us
[2019-06-19 15:04:39.774] [idxdepth] [3733] [info] Thread 47117047957248 estimating depth for HLA-A*01:01:01:01
[2019-06-19 15:04:39.815] [idxdepth] [3736] [info] Thread 47117054260992 done estimating depth for chrUn_JTFH01001983v1_decoy ; DP = 0.01 after 233.495 us
[2019-06-19 15:04:39.815] [idxdepth] [3736] [info] Thread 47117054260992 estimating depth for HLA-A*01:01:01:02N
[2019-06-19 15:04:39.815] [idxdepth] [3729] [info] Thread 47113707083520 done estimating depth for chrUn_JTFH01001984v1_decoy ; DP = 0.02 after 230.673 us
[2019-06-19 15:04:39.815] [idxdepth] [3729] [info] Thread 47113707083520 estimating depth for HLA-A*01:01:38L
[2019-06-19 15:04:39.815] [idxdepth] [3730] [info] Thread 47113709184768 done estimating depth for chrUn_JTFH01001985v1_decoy ; DP = 1.47 after 229.729 us
[2019-06-19 15:04:39.815] [idxdepth] [3730] [info] Thread 47113709184768 estimating depth for HLA-A*01:02
[2019-06-19 15:04:39.815] [idxdepth] [3720] [info] Thread 47113688172288 done estimating depth for chrUn_JTFH01001982v1_decoy ; DP = 0.74 after 234.657 us
[2019-06-19 15:04:39.815] [idxdepth] [3720] [info] Thread 47113688172288 estimating depth for HLA-A*01:03
terminate called after throwing an instance of 'std::invalid_argument'
what(): stoll
Aborted
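std::stoll throws std::invalid_argument when asked to convert text that is not a number, which would be consistent with part of a contig name such as HLA-A*01:01:01:01 being treated as the coordinate half of a name:start-end region string. That is a guess from the log, not something confirmed in the idxdepth source. A sketch of a parser that avoids the ambiguity by checking the list of known contig names first (function and variable names are illustrative):

```python
import re

_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")

def parse_region(region, contigs):
    """Split 'name[:start[-end]]' into (name, start, end), tolerating ':' in names."""
    # A string that is itself a contig name is never split, even if it
    # ends in something that looks like a coordinate (e.g. 'HLA-A*01:02').
    if region in contigs:
        return region, None, None
    name, sep, coords = region.rpartition(":")
    match = _RANGE.match(coords) if sep else None
    if match and name in contigs:
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else None
        return name, start, end
    raise ValueError("cannot interpret region %r" % region)

contigs = {"chr1", "HLA-A*01:02", "HLA-A*01:01:01:01"}
whole = parse_region("HLA-A*01:02", contigs)
ranged = parse_region("HLA-A*01:01:01:01:5-10", contigs)
print(whole, ranged)
```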
Hello,
I used the test-data to run the multigrmpy.py
multigrmpy.py \
-i ./candidates.vcf \
-m ./samples.txt \
-r ./dummy.fa \
-o test
[E::idx_find_and_load] Could not retrieve index file for 'test/variants.vcf.gz'
In the end I do get the result file genotypes.vcf.gz. Does the error have any effect, and how can I solve it?
Best wishes~
Hi @traxexx
I wonder how you generate the candidate SVs that are used as input for paragraph when you have many short-read samples and several representative long-read samples. My plan would be to call SVs using both the long-read and the short-read samples and then merge them together. During merging, I think the breakpoints will become coarse. Does this affect the genotyping results when I do not have precise breakpoints?
Sincerely,
Zheng Zhuqing
The repository README recommends running multigrmpy.py independently for several samples when running an analysis at population scale.
My understanding is that multigrmpy.py first converts the input vcf to a set of .json files written to a temporary directory. These .json files are then used by the grmpy program to carry out genotyping.
If my understanding is right, then the conversion step is conducted as many times as multigrmpy.py is launched, whereas we only really need conversion to happen once for a given vcf. This results in wasted computing time and a lot more temporary files than what is needed, which causes problems on my system because the large number of temporary files becomes hard to manage.
Therefore, I would like to know if there is a way that I could use the tools provided by Paragraph to first convert the vcf file to a set of .json files, and then use those as input for genotyping. I believe this would not be too complicated, but I can't figure out how to do this based on the information that is provided.
I am attempting to build Paragraph (not from the Dockerfile). I am using CentOS 7, Python 3.6, gcc/g++ 6.4.0, and cmake 3.12.1. I have pointed to the correct boost installation, installed as instructed, by setting $BOOST_ROOT, and cmake seems to recognize this based on the version number it reports finding. Note that below I am also setting -DCMAKE_INCLUDE_PATH, as otherwise I get the error "-- Could NOT find LibLZMA (missing: LIBLZMA_INCLUDE_DIR)". The directory I include has lzma.h and an lzma folder, and including it allows cmake to find them.
export BOOST_ROOT=/home-4/[email protected]/lib/boost_1_65_0_install
cmake ../paragraph_v2.2 -DCMAKE_CXX_COMPILER=`which g++` -DCMAKE_C_COMPILER=`which gcc` -DBOOST_ROOT=$BOOST_ROOT -DCMAKE_SYSTEM_LIBRARY_PATH=/software/centos7/usr/lib64 -DCMAKE_INCLUDE_PATH=/software/centos7/usr/include
-- using compiler: g++ version 6.4.0
-- Found LibLZMA: /software/centos7/usr/include (found version "5.2.2")
Using included htslib
-- Configuring done
-- Generating done
-- Build files have been written to: /home-net/home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/htslib-build
[100%] Built target htslib
-- Configuring done
-- Generating done
-- Build files have been written to: /home-net/home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/googletest-build
Scanning dependencies of target googletest
[100%] Built target googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home-net/home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/graphtools-build
Scanning dependencies of target graphtools
[100%] Built target graphtools
-- Boost version: 1.65.0
-- Found the following Boost libraries:
-- program_options
-- filesystem
-- system
-- Configuring done
-- Generating done
-- Build files have been written to: /home-net/home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-build
[100%] Built target spdlog
-- Boost version: 1.65.0
-- Found the following Boost libraries:
-- iostreams
-- program_options
-- filesystem
-- system
-- regex
-- Configuring done
-- Generating done
-- Build files have been written to: /home-4/[email protected]/bin/packages/paragraph_v2.2_build
However, when I run make, I get the following errors. This is just a snippet: the many errors are all similar, and it seems to me that the build is picking up things it should not from my included /software/centos7/usr/include/ folder, including an older boost installation located there:
[ 2%] Built target external
[ 3%] Building CXX object src/c++/lib/CMakeFiles/grmpy_common.dir/common/Alignment.cpp.o
In file included from /software/centos7/usr/include/boost/assert.hpp:50:0,
from /software/centos7/usr/include/boost/system/error_code.hpp:16,
from /software/centos7/usr/include/boost/filesystem/path_traits.hpp:23,
from /software/centos7/usr/include/boost/filesystem/path.hpp:25,
from /software/centos7/usr/include/boost/filesystem.hpp:16,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Error.hh:176,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/BCFHelpers.hh:58,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/variant/RefVar.hh:46,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Alignment.hh:43,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/lib/common/Alignment.cpp:36:
/software/centos7/usr/include/assert.h:68:13: error: redundant redeclaration of ‘void __assert_fail(const char*, const char*, unsigned int, const char*)’ in same scope [-Werror=redundant-decls]
extern void __assert_fail (const char *__assertion, const char *__file,
^~~~~~~~~~~~~
In file included from /software/apps/compilers/gcc/6.4.0/include/c++/6.4.0/cassert:44:0,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/bundled/format.h:31,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/fmt.h:21,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/ostr.h:11,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Error.hh:44,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/BCFHelpers.hh:58,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/variant/RefVar.hh:46,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Alignment.hh:43,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/lib/common/Alignment.cpp:36:
/software/centos7/usr/include/assert.h:68:13: note: previous declaration of ‘void __assert_fail(const char*, const char*, unsigned int, const char*)’
extern void __assert_fail (const char *__assertion, const char *__file,
^~~~~~~~~~~~~
In file included from /software/centos7/usr/include/boost/assert.hpp:50:0,
from /software/centos7/usr/include/boost/system/error_code.hpp:16,
from /software/centos7/usr/include/boost/filesystem/path_traits.hpp:23,
from /software/centos7/usr/include/boost/filesystem/path.hpp:25,
from /software/centos7/usr/include/boost/filesystem.hpp:16,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Error.hh:176,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/BCFHelpers.hh:58,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/variant/RefVar.hh:46,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Alignment.hh:43,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/lib/common/Alignment.cpp:36:
/software/centos7/usr/include/assert.h:73:13: error: redundant redeclaration of ‘void __assert_perror_fail(int, const char*, unsigned int, const char*)’ in same scope [-Werror=redundant-decls]
extern void __assert_perror_fail (int __errnum, const char *__file,
^~~~~~~~~~~~~~~~~~~~
In file included from /software/apps/compilers/gcc/6.4.0/include/c++/6.4.0/cassert:44:0,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/bundled/format.h:31,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/fmt.h:21,
from /home-4/[email protected]/bin/packages/paragraph_v2.2_build/external/spdlog-src/include/spdlog/fmt/ostr.h:11,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Error.hh:44,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/BCFHelpers.hh:58,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/variant/RefVar.hh:46,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/include/common/Alignment.hh:43,
from /home-4/[email protected]/bin/packages/paragraph_v2.2/src/c++/lib/common/Alignment.cpp:36:
/software/centos7/usr/include/assert.h:73:13: note: previous declaration of ‘void __assert_perror_fail(int, const char*, unsigned int, const char*)’
...
/software/centos7/usr/include/assert.h:80:13: note: previous declaration of ‘void __assert(const char*, const char*, int)’
extern void __assert (const char *__assertion, const char *__file, int __line)
^~~~~~~~
cc1plus: all warnings being treated as errors
make[2]: *** [src/c++/lib/CMakeFiles/grmpy_common.dir/common/Alignment.cpp.o] Error 1
make[1]: *** [src/c++/lib/CMakeFiles/grmpy_common.dir/all] Error 2
make: *** [all] Error 2
Documentation states:
The default entry point is the multigrmpy.py script
However the Dockerfile contains
CMD ["/opt/paragraph/bin/runGraphTyping.py", "-h"]
Dear paragraph developers,
I am trying to compile paragraph on Ubuntu 14.04 using g++ 7.3.0 and cmake 3.14.0.
When I run:
cd /mnt/cifs01/simone/software/paragraph-build
/mnt/cifs01/simone/software/cmake-3.14.0/bin/cmake ../paragraph
make
I get this error message:
[ 22%] Building CXX object src/c++/lib/CMakeFiles/grmpy_common.dir/grm/GraphAligner.cpp.o
In file included from /home/simone/home_disk/software/paragraph/external/gssw/gssw.h:19:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grm/GraphAligner.cpp:30:
/usr/lib/gcc/x86_64-linux-gnu/4.8/include/smmintrin.h:31:3: error: #error "SSE4.1 instruction set not enabled"
# error "SSE4.1 instruction set not enabled"
^
In file included from /home/simone/home_disk/software/paragraph/src/c++/lib/grm/GraphAligner.cpp:30:0:
/home/simone/home_disk/software/paragraph/external/gssw/gssw.h:67:5: error: ‘__m128i’ does not name a type
__m128i* pvE;
^
/home/simone/home_disk/software/paragraph/external/gssw/gssw.h:68:5: error: ‘__m128i’ does not name a type
__m128i* pvHStore;
^
/home/simone/home_disk/software/paragraph/external/gssw/gssw.h:138:2: error: ‘__m128i’ does not name a type
__m128i* profile_byte; // 0: none
^
/home/simone/home_disk/software/paragraph/external/gssw/gssw.h:140:2: error: ‘__m128i’ does not name a type
__m128i* profile_word; // 0: none
^
make[2]: *** [src/c++/lib/CMakeFiles/grmpy_common.dir/build.make:414: src/c++/lib/CMakeFiles/grmpy_common.dir/grm/GraphAligner.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:486: src/c++/lib/CMakeFiles/grmpy_common.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
I saw a post on GitHub (vgteam/vg#99) where a similar error was solved by setting the CXXFLAGS environment variable, so I tried it out.
export CXXFLAGS=-msse4.1
/mnt/cifs01/simone/software/cmake-3.14.0/bin/cmake ../paragraph
make
That error seems to be partially solved, but then it stops again:
[ 26%] Building CXX object src/c++/lib/CMakeFiles/grmpy_common.dir/grmpy/AlignSamples.cpp.o
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp: In function ‘void grmpy::writeAlignments(Json::Value&, const grmpy::Parameters&, const paragraph::Parameters&, const string&, genotyping::SampleInfo&)’:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:76:105: error: no matching function for call to ‘regex_replace(const string&, const regex&, const char [2])’
const std::string safe_sample_name = std::regex_replace(sample.sample_name(), unsafe_characters, "_");
^
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:76:105: note: candidates are:
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template<class _Out_iter, class _Bi_iter, class _Rx_traits, class _Ch_type> _Out_iter std::regex_replace(_Out_iter, _Bi_iter, _Bi_iter, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
^
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:76:105: note: deduced conflicting types for parameter ‘_Bi_iter’ (‘std::basic_regex<char>’ and ‘const char*’)
const std::string safe_sample_name = std::regex_replace(sample.sample_name(), unsafe_characters, "_");
^
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template<class _Rx_traits, class _Ch_type> std::basic_string<_Ch_type> std::regex_replace(const std::basic_string<_Ch_type>&, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(const basic_string<_Ch_type>& __s,
^
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:76:105: note: mismatched types ‘const std::basic_string<_Ch_type>’ and ‘const char [2]’
const std::string safe_sample_name = std::regex_replace(sample.sample_name(), unsafe_characters, "_");
^
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:82:31: error: no matching function for call to ‘regex_replace(boost::iterators::iterator_value<boost::iterators::transform_iterator<boost::range_detail::default_constructible_unary_fn_wrapper<grmpy::writeAlignments(Json::Value&, const grmpy::Parameters&, const paragraph::Parameters&, const string&, genotyping::SampleInfo&)::__lambda3, std::basic_string<char> >, std::_List_const_iterator<common::Region>, boost::iterators::use_default, boost::iterators::use_default> >::type, const regex&, const char [2])’
unsafe_characters, "_");
^
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:82:31: note: candidates are:
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template<class _Out_iter, class _Bi_iter, class _Rx_traits, class _Ch_type> _Out_iter std::regex_replace(_Out_iter, _Bi_iter, _Bi_iter, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
^
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:82:31: note: deduced conflicting types for parameter ‘_Bi_iter’ (‘std::basic_regex<char>’ and ‘const char*’)
unsafe_characters, "_");
^
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template<class _Rx_traits, class _Ch_type> std::basic_string<_Ch_type> std::regex_replace(const std::basic_string<_Ch_type>&, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(const basic_string<_Ch_type>& __s,
^
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:82:31: note: mismatched types ‘const std::basic_string<_Ch_type>’ and ‘const char [2]’
unsafe_characters, "_");
^
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:99:90: error: no matching function for call to ‘regex_replace(std::string&, const regex&, const char [2])’
const std::string safe_graph_id = std::regex_replace(graph_id, unsafe_characters, "_");
^
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:99:90: note: candidates are:
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template<class _Out_iter, class _Bi_iter, class _Rx_traits, class _Ch_type> _Out_iter std::regex_replace(_Out_iter, _Bi_iter, _Bi_iter, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(_Out_iter __out, _Bi_iter __first, _Bi_iter __last,
^
/usr/include/c++/4.8/bits/regex.h:2162:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:99:90: note: deduced conflicting types for parameter ‘_Bi_iter’ (‘std::basic_regex<char>’ and ‘const char*’)
const std::string safe_graph_id = std::regex_replace(graph_id, unsafe_characters, "_");
^
In file included from /usr/include/c++/4.8/regex:62:0,
from /home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:49:
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template<class _Rx_traits, class _Ch_type> std::basic_string<_Ch_type> std::regex_replace(const std::basic_string<_Ch_type>&, const std::basic_regex<_Ch_type, _Rx_traits>&, const std::basic_string<_Ch_type>&, std::regex_constants::match_flag_type)
regex_replace(const basic_string<_Ch_type>& __s,
^
/usr/include/c++/4.8/bits/regex.h:2182:5: note: template argument deduction/substitution failed:
/home/simone/home_disk/software/paragraph/src/c++/lib/grmpy/AlignSamples.cpp:99:90: note: mismatched types ‘const std::basic_string<_Ch_type>’ and ‘const char [2]’
const std::string safe_graph_id = std::regex_replace(graph_id, unsafe_characters, "_");
^
make[2]: *** [src/c++/lib/CMakeFiles/grmpy_common.dir/build.make:492: src/c++/lib/CMakeFiles/grmpy_common.dir/grmpy/AlignSamples.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:486: src/c++/lib/CMakeFiles/grmpy_common.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
I know Ubuntu 14.04 is not amongst the distributions you have tested paragraph on.
Do you have any ideas about how I could solve it? Thanks.
Simone
Hi,
I have downloaded the latest version and used the configure file to install it. I am a regular Linux user without root privileges. It showed "Configuring incomplete, errors occurred."
Is that normal?
Cheers
Hello folks,
I think your required version of python is too old. In the README you write
Python 3.4+ is required.
but in many places you use f-strings, which are a feature of Python 3.6+.
We (with @ptranvan) tried to remove all the f-strings and run it on Python 3.5, but we got a super-crazy-long error on the test example you provide. Selected lines of the error log:
2019-06-06 16:15:56,155 ERROR Exception when running vcf2paragraph on /tmp/tmppbauhqg4.vcf.gz
2019-06-06 16:15:56,156 ERROR Exception when running vcf2paragraph on /tmp/tmpk58pllu3.vcf.gz
2019-06-06 16:15:56,191 ERROR Traceback (most recent call last):
2
...
2019-06-06 16:15:56,264 ERROR VCF to JSON conversion failed.
...
Traceback (most recent call last):
File "bin/multigrmpy.py", line 353, in <module>
main()
File "bin/multigrmpy.py", line 349, in main
run(args)
File "bin/multigrmpy.py", line 261, in run
graph_files = load_graph_description(args)
File "bin/multigrmpy.py", line 52, in load_graph_description
header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
File "/stn4/ptranvan/Software/paragraph/paragraph-tools-build/lib/python3/grm/vcf2paragraph/__init__.py", line 156, in convert_vcf_to_json
variants = pool.map(run_vcf2paragraph, zip(to_process, itertools.repeat(params)))
File "/software/lib64/python3.5/multiprocessing/pool.py", line 260, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/software/lib64/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
AssertionError: ref-{refSpan}
I tried to copy the relevant lines. Also, it could be that we broke something when we were removing those bloody f-strings. Either way, I suppose you should either remove the f-strings and test paragraph on Python 3.4, or update the README to require a newer version of Python.
Cheers :-)
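For anyone attempting the same back-port: the final "AssertionError: ref-{refSpan}" above looks like what one would get if the f prefix was dropped from a string while the placeholders were left in place, although that is only a guess from the message. A minimal sketch of the two usual workarounds follows. The function names and the sample value are hypothetical and not taken from the paragraph code base.

```python
import sys


def format_ref_node(ref_span):
    """Build the same text as f"ref-{ref_span}" without using an f-string."""
    # str.format works on Python 3.4 and 3.5, where f-strings are a SyntaxError.
    return "ref-{}".format(ref_span)


def supports_fstrings(version_info=sys.version_info):
    """Return True when the interpreter is new enough for f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


if __name__ == "__main__":
    # Hypothetical span value, used only for illustration.
    print(format_ref_node("chr1:100-200"))
    if not supports_fstrings():
        print("This interpreter predates f-strings; Python 3.6+ is needed.")
```

A version guard like supports_fstrings only helps in an entry script that itself contains no f-strings, because a SyntaxError is raised when a module is compiled, before any of its code runs.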
Dear @traxexx
I ran the following command to genotype candidate deletion variants. The P1_DEL.vcf file was generated by Sniffles. However, the program exited with "Exception: Different padding base for REF and ALT at 1:233140". Maybe I need a custom script to convert the VCF file so that it is compatible with paragraph.
multigrmpy.py -i P1_DEL.vcf -m manifest -o P1_DEL.genotype -r reference.fa
#CHROM POS ID REF ALT QUAL FILTER INFO
1 233140 . TGTCCTGTGTCCGTGTCCCATGGTGTCCGTGTCTCAGTCTGTCCTGTGTCCGGTCCCGTGTCCGTGTCCCGTGTCCCACGTCCATGTCCCGTGTCCGTGTCTCATGTCCGGGTCCCGTGTCCGTGTCCCACGTCCATGTCCCGTGTCCGTGTCTCATGTCTGGGTCCTGTGTCCATGTCCCATGTCCATGTCCCGTGTCCGTGTCTCATGTCTGGGTCCTGTGTCCGTGTCCCATGTCCATGTCCCGTGTCCGTGTCTCATGTCCGCGTCCGTGTCCATGTCCATGTCCGTGTCCGTGTCTCATGTCCGGTCCTGTCCGGTCCCCTGTCCGTGTCCCGTGTCCGTGTCTCATGTCCGTGTCTCATGTCCGGGTCCGTTCCGTGTCCCTGTCCATGTCCCGTGTCCGTGTCTCATGTCTGGGTCCTGTGTCCGTGTCCCGTCCATGTCCCGTGTCCGTGTCCTGTCCGGTCCTGTCCGTGTCCGTGTCCATGTCCCGTGTCCGTGTCCGTGTCCATGTCCCGTGTCCGTGTCTCATGTCCCG N 0 . PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=1;END=233681;ZMW=21;STD_quant_start=0;STD_quant_stop=0;Kurtosis_quant_start=7;Kurtosis_quant_stop=2.0032;SVTYPE=DEL;SUPTYPE=AL;SVLEN=-541;STRANDS=+-;STRANDS2=9,12,9,12;RE=21;REF_strand=13,14;AF=0.4375;MERGED_IDS=1,svim.DEL.5;NUM_JOINED_SVS=2;STDDEV_POS=0,2
1 233793 . CACGTCCATGTCCCGTGTCCGTGTCTCATGTCCGGGTCCTGTGTCCGGTCCGTGTCCCGTGTCCGTGTCCCACGTCCATGTCCCGTGTCCGTGTCTCATGTCCGGTCCCGTGTCCGTGTCCCACGTCCATGTCCCGTGTCCGTGTCTCATGTCTCCGTGTCCTGTGTCCATGTCCGGTCCG N 0 . PRECISE;SVMETHOD=Snifflesv1.0.11;CHR2=1;END=233868;ZMW=26;STD_quant_start=0;STD_quant_stop=0;Kurtosis_quant_start=10;Kurtosis_quant_stop=10;SVTYPE=DEL;SUPTYPE=AL;SVLEN=-75;STRANDS=+-;STRANDS2=13,13,13,13;RE=26;REF_strand=0,0;AF=1;MERGED_IDS=2,svim.DEL.7;NUM_JOINED_SVS=2;STDDEV_POS=0,0
sincerely,
Zheng Zhuqing
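One possible conversion, offered as a guess rather than a confirmed fix: the exception text suggests that the first base of REF and ALT is expected to match, while the records above carry N as ALT. The sketch below rewrites ALT to the first REF base for sequence-resolved deletions. It assumes that REF already starts with the padding base at POS, which should be checked against the reference before relying on it. fix_del_alt is a hypothetical helper, not part of paragraph or Sniffles.

```python
def fix_del_alt(vcf_line):
    """Set ALT to the first REF base for DEL records whose ALT is 'N'.

    Assumption: REF = padding base + deleted sequence, so the VCF-style
    ALT for the deletion is just the padding base.
    """
    line = vcf_line.rstrip("\n")
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    ref, alt, info = fields[3], fields[4], fields[7]
    is_deletion = "SVTYPE=DEL" in info.split(";")
    if alt == "N" and is_deletion and len(ref) > 1:
        fields[4] = ref[0]
    return "\t".join(fields)


if __name__ == "__main__":
    # Shortened, made-up record in the same layout as the ones above.
    record = "1\t100\t.\tTGTCC\tN\t0\t.\tPRECISE;SVTYPE=DEL;END=104;SVLEN=-4"
    print(fix_del_alt(record))
```

Records that are not deletions, or whose ALT is anything other than N, are returned unchanged.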
In rare cases Paragraph returns a genotype that does not match the VCF spec: a single '.' genotype on non-sex chromosomes. For example, I have the following genotypes for 2 samples on chr3. My interpretation is that the second sample should be './.' since it is in a diploid region of the genome.
0/0:40:....:40,0:20,0:20,0:0,124,599 .:0:NO_VALID_GT,UNMATCHED:0,0:0,0:0,0:.,.,.
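As a downstream workaround, the short missing genotype could be rewritten per sample column. The sketch below is an assumption about how one might post-process the output, not part of Paragraph. normalize_missing_gt is a hypothetical helper, and it relies on GT being the first FORMAT key, which the VCF specification requires whenever GT is present.

```python
def normalize_missing_gt(sample_field, ploidy=2):
    """Expand a lone '.' genotype to one '.' per allele, e.g. './.' for diploid."""
    parts = sample_field.split(":")
    if parts[0] == ".":
        # Unphased missing call with one '.' per expected allele.
        parts[0] = "/".join(["."] * ploidy)
    return ":".join(parts)


if __name__ == "__main__":
    # Second sample column from the example above.
    field = ".:0:NO_VALID_GT,UNMATCHED:0,0:0,0:0,0:.,.,."
    print(normalize_missing_gt(field))
```

Calls that already have a genotype, such as the first sample's 0/0, pass through unchanged.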
Hi,
I am just starting to investigate using Paragraph to genotype some SVs within a number of populations. I shall be doing a lot of alignments of short reads to my references. Is there a preferred aligner to use to generate the BAM file for input into paragraph?
thanks.
Hi,
Could the latest Docker image be updated? Running docker pull gives the following error:
Docker pull fails, Error: /usr/bin/python3: can't open file
'/opt/paragraph/bin/runGraphTyping.py': [Errno 2] No such file or directory
Thanks!
for example,
cmake ../paragraph-tools
as in the README has no target for paragraph-tools. Should this be: cmake ../ ?
When I fix that, I see:
/home/brentp/src/paragraph/build-paragraph/external/graphtools-src/src/graphIO/../../external/include/nlohmann/json.hpp:8678:43: error: logical ‘and’ of mutually exclusive tests is always false [-Werror=logical-op]
const bool is_negative = (x <= 0) and (x != 0); // see issue #755
If I manually edit that file, I see:
[ 67%] Linking CXX executable ../../../bin/idxdepth
/usr/bin/ld: cannot find -lBoost::boost
collect2: error: ld returned 1 exit status
But I have built boost as described and exported BOOST_ROOT:
ls $BOOST_ROOT/lib
libboost_atomic.a libboost_exception.a libboost_log_setup.a libboost_math_tr1l.a libboost_signals.a libboost_test_exec_monitor.a libboost_wserialization.a
libboost_chrono.a libboost_filesystem.a libboost_math_c99.a libboost_prg_exec_monitor.a libboost_stacktrace_addr2line.a libboost_thread.a
libboost_container.a libboost_graph.a libboost_math_c99f.a libboost_program_options.a libboost_stacktrace_backtrace.a libboost_timer.a
libboost_context.a libboost_iostreams.a libboost_math_c99l.a libboost_random.a libboost_stacktrace_basic.a libboost_type_erasure.a
libboost_coroutine.a libboost_locale.a libboost_math_tr1.a libboost_regex.a libboost_stacktrace_noop.a libboost_unit_test_framework.a
libboost_date_time.a libboost_log.a libboost_math_tr1f.a libboost_serialization.a libboost_system.a libboost_wave.a
Running docker build . from this repo also fails due to a 404. Any ideas?
Hi, I would be interested in trialling this software. Has it been released yet?
It seems I would need a password to download it. Thanks.
git clone https://github.com/Illumina/paragraph-tools.git
Cloning into 'paragraph-tools'...
Username for 'https://github.com': cxxxxxx
Password for 'https://xxxxxxxxxxxxxxxxxxxxxx':
remote: Repository not found.
fatal: repository 'https://github.com/Illumina/paragraph-tools.git/' not found
Hi,
I'm trying paragraph for genotyping DUP & TDUP with the following command:
python3 ~/miniconda3/pkgs/paragraph-2.3-h8908b6f_0/bin/multigrmpy.py -i TDUP.vcf -m samples.txt -r ~/reference/genome.fa -o TDUP
Here are some of the contents in samples.txt & TDUP.vcf file:
But 75% of genotypes are missing when genotyping DUP & TDUP.
Could you give me some advice? Thanks!
Zhiliang
I'm having trouble with the install, at the htslib build step. I'm using Linux CentOS 6.9 and gcc/g++ 5.1.0.
cmake (I'm using version 3.5.0) seems to find my installed lzma library files just fine:
-- Found ZLIB: /home-4/[email protected]/bin/packages/miniconda2/lib/libz.so (found version "1.2.11")
-- Found BZip2: /home-4/[email protected]/bin/packages/miniconda2/lib/libbz2.so (found version "1.0.6")
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - found
-- Looking for lzma_auto_decoder in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so
-- Looking for lzma_auto_decoder in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so - found
-- Looking for lzma_easy_encoder in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so
-- Looking for lzma_easy_encoder in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so - found
-- Looking for lzma_lzma_preset in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so
-- Looking for lzma_lzma_preset in /home-4/[email protected]/bin/packages/miniconda2/lib/liblzma.so - found
-- Found LibLZMA: /home-4/[email protected]/bin/packages/miniconda2/include (found version "5.2.3")
However, it is then unable to find lzma.h.
[ 75%] Performing build step for 'htslib'
cram/cram_io.c:61:18: fatal error: lzma.h: No such file or directory
compilation terminated.
make[3]: *** [cram/cram_io.o] Error 1
gmake[2]: *** [htslib-prefix/src/htslib-stamp/htslib-build] Error 2
gmake[1]: *** [CMakeFiles/htslib.dir/all] Error 2
gmake: *** [all] Error 2
CMake Error at src/cmake/GetHtslib.cmake:37 (message):
Build step for htslib failed: 2
Call Stack (most recent call first):
CMakeLists.txt:33 (include)
-- Configuring incomplete, errors occurred!
Pointing -DCMAKE_INCLUDE_PATH and -DCMAKE_SYSTEM_LIBRARY_PATH to the correct locations seems to have no effect. Manually modifying the #include <lzma.h> in cram_io.c to point to the correct location does fix the immediate problem, but the following step then fails to find the lzma library.
/usr/bin/ld: cannot find -llzma
collect2: error: ld returned 1 exit status
make[3]: *** [libhts.so] Error 1
gmake[2]: *** [htslib-prefix/src/htslib-stamp/htslib-build] Error 2
gmake[1]: *** [CMakeFiles/htslib.dir/all] Error 2
gmake: *** [all] Error 2
CMake Error at src/cmake/GetHtslib.cmake:37 (message):
Build step for htslib failed: 2
Call Stack (most recent call first):
CMakeLists.txt:33 (include)
Any suggestions on how to get cmake to recognize these installed libraries? I can't figure out why it seems to find them in one step, and then not link to them in subsequent compilation steps.
When running paragraph on a test vcf with just one variant row, this error is triggered. Any suggestions?
2020-03-05 19:38:20,888 ERROR Traceback (most recent call last):
2020-03-05 19:38:20,889 ERROR File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 340, in run vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
2020-03-05 19:38:20,889 ERROR File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfupdate.py", line 218, in update_vcf_from_grmpy record.samples[sample][k] = v
2020-03-05 19:38:20,889 ERROR File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
2020-03-05 19:38:20,889 ERROR File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
2020-03-05 19:38:20,890 ERROR File "pysam/libcbcf.pyx", line 583, in pysam.libcbcf.bcf_check_values
2020-03-05 19:38:20,890 ERROR TypeError: values expected to be 3-tuple, given len=1
Traceback (most recent call last):
File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 353, in <module>
main()
File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 349, in main
run(args)
File "/share/pkg.7/paragraph/2.4a/install/bin/multigrmpy.py", line 340, in run
vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
File "/share/pkg.7/paragraph/2.4a/install/lib/python3/grm/vcfgraph/vcfupdate.py", line 218, in update_vcf_from_grmpy
record.samples[sample][k] = v
File "pysam/libcbcf.pyx", line 3455, in pysam.libcbcf.VariantRecordSample.__setitem__
File "pysam/libcbcf.pyx", line 859, in pysam.libcbcf.bcf_format_set_value
File "pysam/libcbcf.pyx", line 583, in pysam.libcbcf.bcf_check_values
TypeError: values expected to be 3-tuple, given len=1
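A guess at what the message means, not a confirmed diagnosis: for a FORMAT field declared with Number=G, the VCF specification expects one value per possible genotype. For a diploid sample at a biallelic site that is 3, which would match the "3-tuple" in the error if only a single value was supplied, for example for a sample without a valid genotype. The sketch below computes the expected count and pads a short list with None, which is assumed here to be how a missing value is represented. genotype_count and pad_genotype_values are hypothetical helpers, not Paragraph or pysam code.

```python
from math import factorial


def genotype_count(n_alleles, ploidy=2):
    """Number of unordered genotypes: C(n_alleles + ploidy - 1, ploidy)."""
    n = n_alleles + ploidy - 1
    return factorial(n) // (factorial(ploidy) * factorial(n - ploidy))


def pad_genotype_values(values, n_alleles, ploidy=2):
    """Pad a per-genotype value list with None up to the expected length."""
    expected = genotype_count(n_alleles, ploidy)
    values = tuple(values)
    if len(values) > expected:
        raise ValueError("more values than possible genotypes")
    return values + (None,) * (expected - len(values))


if __name__ == "__main__":
    # One REF and one ALT allele, diploid sample: 3 genotypes (0/0, 0/1, 1/1).
    print(genotype_count(2))
    print(pad_genotype_values([None], n_alleles=2))
```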
Hi, I'm trying paragraph for genotyping with the following command:
python ~/benchmark/tools/paragraph/paragraph-tools-build/bin/multigrmpy.py -i ~/benchmark/all_sv_grc37.vcf -m samples.txt -r ~/dataset/human_g1k_v37_gatk.fasta -o 50x
but I receive the following error:
Traceback (most recent call last):
File "/home/asoylev/benchmark/tools/paragraph/paragraph-tools-build/bin/multigrmpy.py", line 34, in <module>
from grm.vcf2paragraph import convert_vcf_to_json
File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcf2paragraph/__init__.py", line 32, in <module>
from grm.vcfgraph import VCFGraph, NoVCFRecordsException
File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcfgraph/__init__.py", line 21, in <module>
from grm.vcfgraph.vcfgraph import VCFGraph, NoVCFRecordsException
File "/mnt/compgen/homes/asoylev/benchmark/tools/paragraph/paragraph-tools-build/lib/python3/grm/vcfgraph/vcfgraph.py", line 178
f"Missing key {ins_info_key} for at {self.chrom}:{vcf.pos}; ")
Below is an INS line in the input VCF:
1 10028610 nssv14474350 A INS:ME:ALU . . DBVARID;SVTYPE=INS;END=10028610;SVLEN=140;EXPERIMENT=1;SAMPLE=HG00733;REGIONID=nsv3326290;SEQ=aggtcaggagtttgagaccagcctggccaacgtggtgaaaccccgactctactaaaaaaaaaagaacaaaaattaggcctggcgcggtggctcacgcctgtaatcccagcactttgggaggccgaggcgggcagatcacG;Eichler
any idea?
Thanks,
Arda
Hi All,
I tried to run paragraph on my test dataset, but got the error below:
$ python3 ../../../Tools/paragraph/bin/multigrmpy.py -i ../pbsv_sample_Bonobo_sv.vcf.gz -m sample.txt -r ../ref.fa -o test &
[1] 39497
$ 2020-02-04 16:37:34,250 ERROR VCF to JSON conversion failed.
2020-02-04 16:37:34,303 ERROR Traceback (most recent call last):
2020-02-04 16:37:34,303 ERROR File "../../../Tools/paragraph/bin/multigrmpy.py", line 52, in load_graph_description header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
2020-02-04 16:37:34,303 ERROR File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/__init__.py", line 133, in convert_vcf_to_json header, records, block_ids = parse_vcf_lines(args.input, args.read_length, args.split_type)
2020-02-04 16:37:34,304 ERROR File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/__init__.py", line 209, in parse_vcf_lines raise Exception("Distance between vcf position and chrom start is smaller than read length.")
2020-02-04 16:37:34,304 ERROR Exception: Distance between vcf position and chrom start is smaller than read length.
2020-02-04 16:37:34,305 ERROR Traceback (most recent call last):
2020-02-04 16:37:34,305 ERROR File "../../../Tools/paragraph/bin/multigrmpy.py", line 261, in run graph_files = load_graph_description(args)
2020-02-04 16:37:34,305 ERROR File "../../../Tools/paragraph/bin/multigrmpy.py", line 52, in load_graph_description header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
2020-02-04 16:37:34,305 ERROR File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/__init__.py", line 133, in convert_vcf_to_json header, records, block_ids = parse_vcf_lines(args.input, args.read_length, args.split_type)
2020-02-04 16:37:34,306 ERROR File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/__init__.py", line 209, in parse_vcf_lines raise Exception("Distance between vcf position and chrom start is smaller than read length.")
2020-02-04 16:37:34,306 ERROR Exception: Distance between vcf position and chrom start is smaller than read length.
Traceback (most recent call last):
File "../../../Tools/paragraph/bin/multigrmpy.py", line 353, in <module>
main()
File "../../../Tools/paragraph/bin/multigrmpy.py", line 349, in main
run(args)
File "../../../Tools/paragraph/bin/multigrmpy.py", line 261, in run
graph_files = load_graph_description(args)
File "../../../Tools/paragraph/bin/multigrmpy.py", line 52, in load_graph_description
header, records, event_list = convert_vcf_to_json(args, alt_paths=True)
File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/init.py", line 133, in convert_vcf_to_json
header, records, block_ids = parse_vcf_lines(args.input, args.read_length, args.split_type)
File "/net/eichler/vol26/projects/primate_sv/nobackups/Tools/paragraph/lib/python3/grm/vcf2paragraph/init.py", line 209, in parse_vcf_lines
raise Exception("Distance between vcf position and chrom start is smaller than read length.")
Exception: Distance between vcf position and chrom start is smaller than read length.
Here is my manifest file:
$ cat sample.txt
id path idxdepth
bonobo_10 aln_realigned_reads.bam aln_realigned_reads.bam.json
$ ls aln_realigned_reads.bam
aln_realigned_reads.bam
$ ls aln_realigned_reads.bam.json
aln_realigned_reads.bam.json
Do you have any idea about it?
Thank you so much.
Best,
Yafei
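The exception above says that at least one VCF record sits closer to the start of its chromosome than the read length. Below is a minimal sketch for listing such records before running multigrmpy.py. It assumes an uncompressed, tab-separated VCF; the function name, the 150 bp read length, and the exact comparison (POS <= read length) are illustrative assumptions, not taken from the Paragraph source.

```python
def records_near_chrom_start(vcf_lines, read_length=150):
    """Yield (chrom, pos, line) for records whose POS is <= read_length.

    The threshold is an assumption; the traceback does not show the
    exact comparison Paragraph performs.
    """
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if pos <= read_length:
            yield chrom, pos, line.rstrip("\n")


# Toy input; with a real file, pass an open file handle instead.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t90\tsv1\tA\t<DEL>\t.\tPASS\tEND=400;SVTYPE=DEL\n",
    "chr1\t5000\tsv2\tA\t<DEL>\t.\tPASS\tEND=5400;SVTYPE=DEL\n",
]
flagged = list(records_near_chrom_start(example, read_length=150))
for chrom, pos, _ in flagged:
    print(f"{chrom}:{pos} is within one read length of the chromosome start")
```

Records reported this way could then be removed or inspected by hand; whether dropping them is acceptable depends on the analysis.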
Hi @traxexx
Thank you for this nice tool. I tried to run the following command to generate general statistics for the BAM file, but the program exited. The warning is also strange, because the BAM file was generated by mapping to the reference.fa that was passed to option -r. Moreover, could an option be added to filter out reads with low mapping quality? Thank you.
idxdepth -b $input.sort.dedup.bam --bam-index $input.sort.dedup.bai -r reference.fa --autosome-regex '[1-9][0-9]?' --sex-chromosome-regex '[XY]?' --threads 1 -o $input --log-level info
[2020-07-13 19:44:55.015] [idxdepth] [7450] [info] BAM: $input.sort.dedup.bam
[2020-07-13 19:44:55.023] [idxdepth] [7450] [info] Reference: reference.fa
[2020-07-13 19:44:55.023] [idxdepth] [7450] [info] Output path: $input
[2020-07-13 19:44:55.202] [idxdepth] [7450] [warning] BAM header only has a subset of the reference chromosomes -- please make sure they match!
[2020-07-13 19:44:55.209] [idxdepth] [7450] [critical] Assertion failed: index
Sincerely,
Zheng Zhuqing
I realigned the 300x Genome in a Bottle AJ Trio samples with bwa since the Illumina paired-end alignments they host were done with novoalign (booo).
When running idxdepth for the samples I get
in the "depth" entries
However using bedtools genome coverage the values are
which is closer to the "advertised" coverage from GiaB.
Hi,
Some SV callers can identify unresolved insertions and report the partially assembled sequence around the breakpoint using the LEFT_SVINSSEQ and RIGHT_SVINSSEQ tags in the VCF. paragraph doesn't seem to be able to genotype such insertions, but since the algorithm uses reads around the breakpoint, it seems like it should be able to. Could I request such a feature in the next release?
Thanks,
Mo
Not so much an issue as a suggestion, but it would be great to have some sort of error message in the following cases regarding references not matching, all of which I have encountered and don't give the most informative errors:
If the reference for the short read bam and vcf use differing notation (eg "chr1" vs "1"), it'd be great if this could either be handled by Paragraph, or if Paragraph could do an initial check and report an error to the user.
If the reference used in multigrmpy.py with -r doesn't match the reference sequences from the header, it yields a very uninformative error of "subprocess.CalledProcessError: grmpy --response-file [tempFile] returned non-zero exit status 1". Having a check which tells the user that the reference used with -r does not match would be extremely helpful; it took me quite a while to figure out I was accidentally using the wrong reference (which used "1" instead of "chr1" etc).
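The kind of up-front check requested above could look roughly like the sketch below. It compares contig names from the reference with contig names from the BAM or VCF and reports a "chr" prefix mismatch separately from genuinely unknown contigs. The function names and messages are hypothetical and not part of Paragraph; obtaining the name lists (for example from a .fai index or a BAM header) is left out.

```python
def strip_chr(name):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def diagnose_contig_names(reference_names, other_names):
    """Classify how contig names in a BAM/VCF relate to the reference."""
    ref, other = set(reference_names), set(other_names)
    if other <= ref:
        return "ok"
    # Same names once the 'chr' prefix is ignored: a notation mismatch.
    if {strip_chr(n) for n in other} <= {strip_chr(n) for n in ref}:
        return "chr-prefix mismatch"
    return "unknown contigs: " + ", ".join(sorted(other - ref))


print(diagnose_contig_names(["chr1", "chr2"], ["1", "2"]))
print(diagnose_contig_names(["chr1", "chr2"], ["chr1"]))
print(diagnose_contig_names(["chr1"], ["chr1", "scaffold_9"]))
```

Running a check like this before genotyping would turn the generic CalledProcessError into a message that names the actual problem.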
Thanks for your contribution. I found that the Python script you uploaded (convertManta2Paragraph_compatible_vcf.py) omits the INSs whose INFO fields contain RIGHT_SVINSSEQ and LEFT_SVINSSEQ. Maybe paragraph cannot genotype these variants, so you omitted them. Although short-read mapping cannot give us enough information about the complete inserted sequences, I have checked some insertions in IGV and found that some with these two INFO fields actually make sense.
To @traxexx
It's difficult to assemble the complete inserted sequences (based on mapping to reference genome) when the INSs are longer than the read lengths. Thus, Manta reported the RIGHT_SVINSSEQ and LEFT_SVINSSEQ in these INSs. I wonder whether paragraph can handle these SVs properly.
Moreover, I found that deviations in breakpoint position greatly affect the genotyping results. These deviations may be over 100 bp or even larger, because I used SURVIVOR to merge all individual VCFs into one population VCF with a maximum allowed distance of 1 kb measured pairwise between breakpoints (begin1 vs begin2, end1 vs end2). Do you have any suggestions for discovering population SVs before genotyping?
Sincerely,
Zheng Zhuqing
The example has effectively this:
python3 bin/multigrmpy.py -i $sites_vcf \
-m $manifest \
-r $reference_fasta \
-o $out_dir
If $manifest has hundreds of samples, can I genotype each sample separately? That is, will paragraph give different results if I split by sample (with the same sites_vcf)?
Or should parallelization only be by site?
Also, what does paragraph do with BND elements?
I have noted a few problems with the way temporary files are handled by multigrmpy.py:
1- vcf.gz files are still written to /tmp or /scratch even when the option --scratch-dir is explicitly set to another directory (the .json files are written to that directory, but not the .vcf.gz ones)
2- The index files of the .vcf.gz files (.vcf.gz.csi files) are not cleaned from the temporary directories, even when multigrmpy.py exits successfully
3- The .json files are also not cleaned from the temporary directory after running multigrmpy.py
I assume this behavior is not the one expected from the program. In my case, I need to clean up the temporary directories after each run, but this prevents me from running several multigrmpy instances in parallel so as not to delete files that are used by another instance.
I saw that there has been an issue raised on this topic in the past and it has been closed, however the behavior of the program has not changed since.
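Until the cleanup behavior changes, one possible workaround is to give every multigrmpy.py run its own private scratch directory and delete it when the run finishes, so that parallel instances never share temporary files. The sketch below is only an illustration: the helper names are hypothetical, and pointing TMPDIR at the same directory is a guess that the stray .vcf.gz files are created through the default temporary-file location, which this thread does not confirm.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_scratch(parent):
    """Create a per-run scratch directory under `parent`; remove it on exit."""
    Path(parent).mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory(prefix="multigrmpy_", dir=parent) as d:
        yield Path(d)


def build_command(vcf, manifest, reference, out_dir, scratch_dir):
    """Assemble the multigrmpy.py call with an explicit --scratch-dir."""
    return ["python3", "bin/multigrmpy.py",
            "-i", vcf, "-m", manifest, "-r", reference, "-o", out_dir,
            "--scratch-dir", str(scratch_dir)]


parent = tempfile.mkdtemp()  # stand-in for a real scratch volume
with private_scratch(parent) as scratch:
    cmd = build_command("sites.vcf", "manifest.txt", "ref.fa", "out", scratch)
    # Assumption: redirecting TMPDIR also captures files written outside
    # --scratch-dir. This is untested against multigrmpy.py itself.
    env = dict(os.environ, TMPDIR=str(scratch))
    # subprocess.run(cmd, env=env, check=True)  # uncomment to actually run
    existed_inside = scratch.is_dir()
removed_after = not scratch.exists()
```

Because each instance removes only its own directory, several runs can be started in parallel without deleting one another's files.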
Hi,
I am trying to use the idxdepth to calculate the depth for the manifest file, but it always gives me a warning:
[warning] BAM header only has a subset of the reference chromosomes -- please make sure they match!
This issue occurs with many of the datasets I tried. I use bwa for alignment and gatk for adding read groups and removing duplicates.
Any hint as to how this might have happened?
Best,
Monica
Hello again,
My testing on a few individuals passed (#42), but when I run it on all the data I have, I run into one more issue:
[2020-05-08 15:03:10.357] [Genotyping] [16979] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1037) ; 213 were filtered]
[2020-05-08 15:03:10.358] [Genotyping] [16979] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:10.360] [Genotyping] [16973] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1051) ; 199 were filtered]
[2020-05-08 15:03:10.360] [Genotyping] [16973] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:10.362] [Genotyping] [16986] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1052) ; 198 were filtered]
[2020-05-08 15:03:10.363] [Genotyping] [16986] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:10.373] [Genotyping] [16977] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1047) ; 203 were filtered]
[2020-05-08 15:03:10.373] [Genotyping] [16977] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:10.379] [Genotyping] [16985] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1051) ; 199 were filtered]
[2020-05-08 15:03:10.379] [Genotyping] [16985] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:10.481] [Genotyping] [16976] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:10.693] [Genotyping] [16975] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:10.863] [Genotyping] [16974] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:11.080] [Genotyping] [16987] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:11.097] [Genotyping] [16982] [info] [Done with alignment step 1250 total aligned (path: 0 [0 anchored] kmers: 0 / ksw: 0 / gssw: 1042) ; 208 were filtered]
[2020-05-08 15:03:11.100] [Genotyping] [16982] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:11.100] [Genotyping] [16980] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:11.302] [Genotyping] [16972] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:11.524] [Genotyping] [16984] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:11.752] [Genotyping] [16981] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:11.961] [Genotyping] [16978] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:12.149] [Genotyping] [16979] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:12.322] [Genotyping] [16973] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:12.543] [Genotyping] [16986] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:12.791] [Genotyping] [16977] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:13.006] [Genotyping] [16985] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:13.246] [Genotyping] [16982] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:13.514] [Genotyping] [16980] [critical] ERROR: This thread also caught an exception
[2020-05-08 15:03:13.514] [Genotyping] [16972] [warning] WARNING: rethrowing a thread exception
[2020-05-08 15:03:13.906] [Genotyping] [16972] [critical] Assertion failed: !reference_sequence.empty()
It complains about an empty reference, but I am not sure which SV is causing the trouble.
When I grepped 16972 out of the log file, the last-mentioned variant was tmpumt4us9n.json.zip, and indeed the JSON contains some NNNNNs.
The corresponding reference sequence is:
>4_Tte_b3v08_scaf031309
TAGCGTAATTAGTACTTAAGCGTATAACCCGAACGGGTAGTAAGACGCGGTGCTAAATAA
TAATAATAATAATAATAATCTAAACAATATACAAATTAAGAAGCGTACTATTCAATTCTT
AGTGGCATGGATTTACAAATGATTTGTTGGGTACAAGACAAACTATAATTCAAAGTTGAA
CTAATTTCTAATTGATGTAATTATGTAATATAAGTTATTACAAAAAAAAAAAGTGTGGCG
TCATTAAAATTAAGTTAGATCGTGTGATAAAATAAGCGTGTGTGTGTGTGTGCAAGACGA
CGCAATAAATCACGGTGTTAAAAATGACCGTTACGTCACTGTTTGTCGGCCAATGAATGA
ATCGCCATAGTCATATTTCTGCTAACGTCCGTGAGATCGGATGAAATATAACCTGAGAAC
CGTAGCCTGTTATTTTAGAACTGGAGACAAATAGATGTTAGTTCAACTCTGATGTAACTT
CTAGTACAGAAACAGGTGGTCAGAACATTCACTAGAATCAGGAGGTCCCTGCCGCTAATA
GCCCCCCTCCCGCGAAAACAAACCTTAACATAATAATACAAAGCGGTGTCTCCCAAACTG
GGGTACGCGTGACCCTGGAGGTATGCTACGCCACATGTGTGTGTTTTTTTTAGCACCGCG
TCTTACTACCAGTTCGGGGTATACGCTTGTATACTAATCATTACAGTAGTGGTTTTAAGG
ATTAGGAAGATTATATTTAGAGGAAGTGTACCCGCATTAACGTGGAGAGAGAGTGAAAAA
CCATTTTGGAAAAATAACCTTGGTACATCCAACAAATATTCGAACTTCGATCGCCGCGTC
ATCGGAAATCTAGTCTATTGCGAGAGTAGAGTAGCGACTTAGACCATGCGGCCAATCGTT
GTATTAAACATCTTTTGAATAACTTGTTGGTATTATGTAACTTTAATTCGATTTTGTTTG
GGTCTGTCAAATTCCACCGCGCGTGATTAAAGTGTCGATAGATTAAATCTAGAAGAATTG
ATGTTTTGTGTAATTTCGCTTTCAATCTTAAGCTTTTTTAAAAAAAAAAAAGATGTTGTA
ATGAAGTAGGACTAAGTATTAGGCCATAATACTGTCCAAACAATAATTTTAAACTAGTCA
TCGGACCTTGGGGGGAGAATCACAAGAGTCAACANTAGGAGTTTTATGCATTTCTACAAA
TCAATGCCTTACTTTAGAGATCATTCCCGATGTTTTATGACATTGGGAACTCTCATAATA
ACATCCATATATATTCGGTGATTAACGTAAACTTTATATGTATGTATACATTTTATATAA
CTAGGCATATATATGTATAATTTTACTATATAAAAATAAAAGGAATTGTTTGTCTGTGTT
TGTTTGTGTGCGATGCCCAGCCAAATTTACGGCACGCAGAGATCTAAAAAAATTTAACAT
AGGTGGACGGAAGGGGGTCCGAATGCACCTCGAAGCAGGATTTTTAAATTTTTAATTAGC
TTTTTAATTAGG
and the nucleotide range of the variant (1176-1325) should be (probably):
TAGGAGTTTTATGCATTTCTACAAATCAATGCCTTACTTTAGAGATCATTCCCGATGTTTTATGACATTGGGAACTCTCATAATAACATCCATATATATTCGGTGATTAACGTAAACTTTATATGTATGTATACATTTTATATAACTAGG
certainly does not seem full of Ns. Originally this sequence was masked (I tried paragraph both with masked and unmasked reference, but I did not try to remap reads on the unmasked ref).
Sorry to bother you again, but I think there is just something rather small I am missing now.
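A quick way to check where the Ns sit relative to a variant is to pull the 1-based, inclusive range out of the FASTA and also look at some flanking bases, since a graph may include sequence next to the variant as well. The sketch below uses plain Python; the function names, the toy record, and the idea that flanks matter here are assumptions for illustration, not a statement about what Paragraph does internally.

```python
def read_fasta_record(lines, name):
    """Return the sequence of the record called `name` as one string."""
    seq, keep = [], False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            keep = line[1:].split()[0] == name
        elif keep:
            seq.append(line)
    return "".join(seq)


def n_positions(seq, start, end, flank=0):
    """1-based positions of N within [start - flank, end + flank]."""
    lo = max(start - 1 - flank, 0)
    hi = min(end + flank, len(seq))
    return [lo + i + 1 for i, base in enumerate(seq[lo:hi]) if base in "Nn"]


# Toy record, not the scaffold from this issue.
toy = [">toy", "ACGTN", "ACGTA"]
seq = read_fasta_record(toy, "toy")
inside = n_positions(seq, 6, 10)               # variant range only
with_flank = n_positions(seq, 6, 10, flank=1)  # one flanking base each side
print(inside, with_flank)
```

Applied to the real scaffold with the range 1176-1325 and a flank of about one read length, this would show whether an N lies just outside the variant itself.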
Hi,
How can I construct a graph for complex structural variants like TRA or INVDUP from sniffles or nanosv? Could you give me an example? And is it now possible to genotype translocations between chromosomes?
Thanks!
In the documentation, it is indicated that the output of paragraph-to-csv.py genotypes.json.gz --genotype-only should use an ID consisting of the chromosome and position, as below:
#FORMAT=GT
#ID SWAPS
chrA:1500-1509 REF/REF
chrB:1500-1509 S1/S1
chrC:1500-1699 REF/S1
Instead, the output looks like this:
#FORMAT=GT
#ID SWAPS
swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:2 swap2:1/swap2:1
swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:3 REF/swap3:1
swaps.vcf@5a0b775f60ed1cd0b938ae09b753ad0207c5ba9f83679f894f17d3d1fd352b6f:1 REF/REF
which, while traceable to the chromosome/position via the vcf, is, I think, not the expected format?