mkirsche / jasmine

Jasmine: SV Merging Across Samples

License: MIT License

Java 85.77% Python 13.05% Shell 1.18%

jasmine's People

aseetharam tw7649116 bgruening genomicsnx jjzhengsysu unique379r janeyang123 bioteksampath jxlabwzz jimlund asleonard distilledchild vthanhquang alexanrna

jasmine's Issues

SV not correctly merged

Hello author:
Jasmine is great software, but I have run into a problem:
I have SV calls for 33 samples, and I merged them using the following parameters:

jasmine file_list=test.vcflist out_file=out.vcf max_dist=200 kd_tree_norm=2 min_seq_id=0.9 min_support=1 max_dup_length=100k min_overlap=0.5 --output_genotypes --nonlinear_dist

However, I found two strange INS calls in the results.

The ALT insertion sequences of the two INS calls are very similar, whether compared by eye or aligned with NW-align.
It is therefore confusing that the two SVs are not merged into one.
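For checking by hand whether two ALT insertion sequences would pass min_seq_id=0.9, here is a minimal sketch of one common identity measure: 1 minus the edit (Levenshtein) distance divided by the length of the longer sequence. This is an assumption: Jasmine's own similarity metric may be computed differently, and sequence identity is only one criterion; distance thresholds such as max_dist must also be satisfied before two calls can merge. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row over the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def sequence_identity(a: str, b: str) -> float:
    """Identity in [0, 1]: 1 - edit_distance / length of the longer sequence.

    Hypothetical helper; not necessarily how Jasmine scores min_seq_id.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.upper(), b.upper()) / longest


def passes_min_seq_id(a: str, b: str, min_seq_id: float = 0.9) -> bool:
    return sequence_identity(a, b) >= min_seq_id
```

For example, two 20 bp sequences differing by a single substitution score 0.95 and pass a 0.9 threshold. Note that a length difference alone costs one edit per missing base, so insertions of quite different sizes can fail this test even when one is a substring of the other.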

Support for different variant caller?

Hi,

Sorry for the silly question, but does Jasmine support input from SV callers other than Sniffles?

changing number of threads from 2 to 4 or 8

Hi,
I am trying to change the number of threads to 4 or 8 with the threads=4 or threads=8 option, but Jasmine only uses the default 2 threads; the threads option does not override the default.
My machine has 16 GB of memory per processor, so to get more memory I need to request more threads. However, even when I use 4 or 8 threads with 64 GB or 128 GB of memory accordingly, Jasmine fails as soon as it reaches 32 GB. It never uses the additional memory, even when it is available.
Could you please tell me the correct way to use the threads option, or how to increase memory usage?

SV not merged correctly

Dear author,

I am trying to merge SVs detected from assemblies and reads.

Here is one call from assemblies:

Here is another call from reads:

The merge command is:

Jasmine file_list={caller_merge_file} out_file={caller_merge_vcf} max_dist=1000 --dup_to_ins genome_file={iMACREF} samtools_path={SAMTOOLS} spec_len=50 spec_reads=1

The output is:

I am confused about why SUPP_VEC=01 when the two calls are the same SV.

Really appreciate your help!

Keep singletons in the merged file

Hi. I want a VCF file with multiple individuals, built from samples that were each genotyped individually. However, because I genotyped them separately, loci that are truly 0|0 in an individual are not recorded in that individual's original VCF file.

Can I incorporate singletons, i.e. polymorphic regions with 0|0 genotypes in some (but, of course, not all) individuals? I guess a truly 0|0 locus would be recorded as ./. in the merged VCF, but I want to convert (re-genotype, or restore) ./. to 0|0 where the locus is likely to be 0|0 rather than a low-quality, ungenotyped locus in that individual. The individual VCF files contain quality information (see below). I also have BAM files and ref.fa.

I thought "--mark_specific" would do what I want, but the following command did not work as I expected.

The command I used is:
jasmine file_list=$vcf_list out_file=./jasmine_six.vcf genome_file=$genome_file bam_list=$bam_list threads=15 --output_genotypes --normalize_type --dup_to_ins --mark_specific
vcftools --vcf jasmine_six.vcf --singletons --out test

It gave me an empty file.
CHROM POS SINGLETON/DOUBLETON ALLELE INDV

I did not find any unique variants on manual inspection either, although I do see some loci that are "0/0" in every individual.

The output file is:

ssa01 181295 0_64 N TCCTGCTACTATAAATATCATAGCTGGTATAATAGCCGCT . PASS IMPRECISE;SVMETHOD=JASMINE;CHR2=ssa01;END=181295;STD_quant_start=13.509256;STD_quant_stop=12.509996;Kurtosis_quant_start=-1.994528;Kurtosis_quant_stop=-1.993620;SVTYPE=INS;RNAMES=c24ce49b-c47c-46a0-b195-7a5507b2a4f3,cf4506b6-c1c0-401f-8acd-e22bee0c5659;SUPTYPE=AL;SVLEN=39;STRANDS=+-;RE=2;REF_strand=1,0;AF=0.666667;CONFLICT=1;OLDTYPE=INS;IS_SPECIFIC=0;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=39.000000;AVG_START=181295.000000;AVG_END=181334.000000;SUPP_VEC_EXT=111111;IDLIST_EXT=64,64,64,64,64,64;SUPP_EXT=6;SUPP_VEC=111111;SUPP=6;IDLIST=64,64,64,64,64,64;REFINEDALT=. GT:IS:OT:OS:DV:DR 0/1:0:INS:.:2:1 0/1:0:INS:.:2:1 0/1:0:INS:.:2:1 0/1:0:INS:.:2:1 0/1:0:INS:.:2:1 0/1:0:INS:.:2:1
ssa01 181496 0_65 N . PASS IMPRECISE;SVMETHOD=JASMINE;CHR2=ssa01;END=216640;STD_quant_start=9.513149;STD_quant_stop=60.502066;Kurtosis_quant_start=-1.988981;Kurtosis_quant_stop=-1.999727;SVTYPE=INV;RNAMES=a09a4a83-767d-4474-8930-5f7893a57e1b,f653e6a1-65f2-48c6-8c6e-72dc654e9adf;SUPTYPE=SR;SVLEN=35144;STRANDS=--;RE=2;REF_strand=27,27;AF=0.0357143;OLDTYPE=INV;IS_SPECIFIC=0;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=35144.000000;AVG_START=181496.000000;AVG_END=216640.000000;SUPP_VEC_EXT=111111;IDLIST_EXT=65,65,65,65,65,65;SUPP_EXT=6;SUPP_VEC=111111;SUPP=6;IDLIST=65,65,65,65,65,65;REFINEDALT=. GT:IS:OT:OS:DV:DR 0/0:0:INV:.:2:54 0/0:0:INV:.:2:54 0/0:0:INV:.:2:54 0/0:0:INV:.:2:54 0/0:0:INV:.:2:54 0/0:0:INV:.:2:54

Thank you very much!
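SUPP_VEC appears to be a string with one character per input file, in file_list order, where 1 means that file contributed a call to the merged record; this reading fits the records above (six samples, SUPP_VEC=111111, SUPP=6). Below is a minimal sketch, assuming uncompressed, tab-delimited VCF text, that lists merged records supported by exactly one input file. It looks at file support rather than genotypes, so it answers a different question from the genotype-based `vcftools --singletons`. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'A=1;B=x;FLAG' into {'A': '1', 'B': 'x', 'FLAG': ''}."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def supporting_files(supp_vec: str) -> List[int]:
    """0-based indices (file_list order) of input files marked '1' in SUPP_VEC."""
    return [i for i, flag in enumerate(supp_vec) if flag == "1"]


def sample_specific(vcf_lines: Iterable[str]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (CHROM, POS, ID, file_index) for records supported by exactly one file."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        support = supporting_files(info.get("SUPP_VEC", ""))
        if len(support) == 1:
            yield cols[0], cols[1], cols[2], support[0]
```

Usage, for example: `with open("jasmine_six.vcf") as fh: print(list(sample_specific(fh)))`. If this returns nothing, no merged record is specific to a single input file.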

How are LEN and AVG_LEN calculated ?

Dear Melanie,
I'm trying to use Jasmine to merge SVs found by different tools. It looks promising, but I am having some trouble with the length output, as lengths may be encoded a bit differently by my assembly-based SV detection (for instance, from a graph).
To better understand what happens, I was wondering how the LEN and AVG_LEN fields in Jasmine's output are calculated.
More specifically, are they based on the length of the sequence in the REF/ALT fields? On the SVLEN given in the input file? On the difference between END and START? And does using the --use_end option make a difference?

Thanks for your help and thanks for this very useful software!!
Claire
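One data point from this page: the merged DRAGEN DEL record quoted in a later issue (three intra-sample deletions merged into one, with AVG_START=207542733.666667, AVG_END=207544294.000000 and AVG_LEN=-1560.333333) is reproduced exactly by plain arithmetic means of the member calls' POS, END and signed SVLEN. The check below is inferred from that output, not taken from Jasmine's source. POS and END for the second and third deletions are read off their DRAGEN IDs (POS = ID start - 1, as in the records shown), and SVLEN is taken as -(END - POS), which matches REFLEN=1038 and SVLEN=-1038 for the first call.

```python
from statistics import mean

# (POS, END, SVLEN) of the three DRAGEN deletions in ALLVARS_EXT of the merged
# record at chr1:207541223. Values for calls 2 and 3 are inferred from their IDs.
members = [
    (207541223, 207542261, -1038),
    (207542261, 207544717, -2456),
    (207544717, 207545904, -1187),
]

avg_start = mean(pos for pos, _, _ in members)
avg_end = mean(end for _, end, _ in members)
avg_len = mean(svlen for _, _, svlen in members)

print(f"AVG_START={avg_start:.6f}")  # AVG_START=207542733.666667
print(f"AVG_END={avg_end:.6f}")      # AVG_END=207544294.000000
print(f"AVG_LEN={avg_len:.6f}")      # AVG_LEN=-1560.333333
```

Because the mean is linear and SVLEN equals -(END - POS) for every member here, AVG_LEN also equals -(AVG_END - AVG_START) in this example. This example therefore cannot distinguish an SVLEN-based definition from an END-minus-START one, and it says nothing about what --use_end changes.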

a sequence name but not an actual sequence

Hi,
When I run Jasmine, I get an error:
Outputting original deletion for GL000009.2:20426:DEL:Sniffles2.DEL.1DS1A (time = 00:00:01:25.032)
Exception in thread "main" java.lang.Exception: samtools faidx produced a sequence name but not an actual sequence: samtools faidx /home/a1lian/hg38.fa GL000009.2:20427-20488
at IrisGenomeQuery.genomeSubstring(IrisGenomeQuery.java:62)
at VcfEditor.run(VcfEditor.java:227)
at Iris.runIris(Iris.java:21)
at PipelineManager.runIris(PipelineManager.java:129)
at Main.preprocess(Main.java:47)
at Main.main(Main.java:17)
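The message says samtools returned a sequence name but no sequence for GL000009.2:20427-20488. One plausible cause, not confirmed in this thread, is that the region does not fit that sequence as it appears in /home/a1lian/hg38.fa: for example, the contig is missing from, or shorter in, the FASTA passed to Jasmine than in the reference used for calling. Below is a minimal pre-flight sketch that checks a region against the FASTA index (`.fai`, whose first two tab-separated columns are the sequence name and its length). The helper names are hypothetical.

```python
from typing import Dict, Optional


def read_fai(fai_path: str) -> Dict[str, int]:
    """Map sequence name -> length from a samtools .fai index (columns 1 and 2)."""
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            cols = line.rstrip("\n").split("\t")
            if len(cols) >= 2:
                lengths[cols[0]] = int(cols[1])
    return lengths


def check_region(lengths: Dict[str, int], region: str) -> Optional[str]:
    """Return a problem description for 'name:start-end' (1-based), or None if OK.

    rpartition splits on the last ':' so contig names containing ':' still work.
    """
    name, _, span = region.rpartition(":")
    start_s, _, end_s = span.partition("-")
    start, end = int(start_s), int(end_s)
    if name not in lengths:
        return f"{name} is not in the reference index"
    if start < 1 or end < start:
        return f"invalid coordinates {start}-{end}"
    if end > lengths[name]:
        return f"{region} extends past the end of {name} (length {lengths[name]})"
    return None
```

Usage, for example: `check_region(read_fai("/home/a1lian/hg38.fa.fai"), "GL000009.2:20427-20488")`. A non-None result points to a mismatch between the VCF and the reference.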

Parameter "out_dir" does not work with absolute path

Hi,

I would like to write my temporary files to a directory other than my current working directory; however, when I pass an absolute path to "out_dir", the working directory's path is prepended to it and Jasmine throws an error.
If I pass just a directory name, the directory is created in the working directory and the program runs normally, but the temporary files then end up where I'm running my job.

Thanks,
Best regards,
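The symptom described (the working directory prepended to an absolute path) matches what string concatenation produces, as opposed to proper path joining. Below is a minimal Python illustration of the difference; Jasmine itself is written in Java, and its actual path handling is not shown here. Both function names are hypothetical. posixpath is used so the behaviour is the same on any OS.

```python
import posixpath


def naive_resolve(cwd: str, out_dir: str) -> str:
    # Hypothetical buggy version: always prefixes the working directory.
    return cwd.rstrip("/") + "/" + out_dir


def resolve_out_dir(cwd: str, out_dir: str) -> str:
    # posixpath.join discards earlier components when a later one is absolute,
    # so absolute out_dir values are kept and relative ones land under cwd.
    return posixpath.normpath(posixpath.join(cwd, out_dir))
```

In Java, `Path.resolve` has the same semantics: if its argument is an absolute path, it returns that argument unchanged.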

Installation issues

This tool looks very interesting, and I would like to try Jasmine on my data. However, I have some installation issues, and it is not immediately obvious to me how to solve them:

(...lots of output omitted)
mkdir -p -m 755 /usr/local/bin /usr/local/include /usr/local/include/htslib /usr/local/lib /usr/local/share/man/man1 /usr/local/share/man/man5 /usr/local/lib/pkgconfig
mkdir: cannot create directory ‘/usr/local/include/htslib’: Permission denied
mkdir: cannot create directory ‘/usr/local/lib/pkgconfig’: Permission denied
make: *** [installdirs] Error 1
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 28: autoheader: command not found
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 29: autoconf: command not found
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 30: ./configure: No such file or directory
gcc  -L./lz4  -o samtools bam_index.o bam_plcmd.o sam_view.o bam_fastq.o bam_cat.o bam_md.o bam_reheader.o bam_sort.o bedidx.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o bam2bcf.o bam2bcf_indel.o sample.o cut_target.o phase.o bam2depth.o coverage.o padding.o bedcov.o bamshuf.o faidx.o dict.o stats.o stats_isize.o bam_flags.o bam_split.o bam_tview.o bam_tview_curses.o bam_tview_html.o bam_lpileup.o bam_quickcheck.o bam_addrprg.o bam_markdup.o tmp_file.o ./lz4/lz4.o libbam.a libst.a ../htslib/libhts.a -lz -lm -lbz2 -llzma -lcurl -lcurses -lm -lz  -lpthread
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
make: *** [samtools] Error 1
gcc  -L./lz4  -o samtools bam_index.o bam_plcmd.o sam_view.o bam_fastq.o bam_cat.o bam_md.o bam_reheader.o bam_sort.o bedidx.o bam_rmdup.o bam_rmdupse.o bam_mate.o bam_stat.o bam_color.o bamtk.o bam2bcf.o bam2bcf_indel.o sample.o cut_target.o phase.o bam2depth.o coverage.o padding.o bedcov.o bamshuf.o faidx.o dict.o stats.o stats_isize.o bam_flags.o bam_split.o bam_tview.o bam_tview_curses.o bam_tview_html.o bam_lpileup.o bam_quickcheck.o bam_addrprg.o bam_markdup.o tmp_file.o ./lz4/lz4.o libbam.a libst.a ../htslib/libhts.a -lz -lm -lbz2 -llzma -lcurl -lcurses -lm -lz  -lpthread
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -lcurses
collect2: error: ld returned 1 exit status
make: *** [samtools] Error 1
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 33: cd: bin: No such file or directory
cp: cannot stat ‘samtools’: No such file or directory
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 37: cd: racon: No such file or directory
mkdir: cannot create directory ‘build’: File exists
/home/wdecoster/repositories/Jasmine/Iris/rebuild_default_external.sh: line 40: cmake: command not found
make: *** No targets specified and no makefile found.  Stop.
cp: cannot stat ‘bin/racon’: No such file or directory
cp: cannot stat ‘rebuilt_external_scripts/*’: No such file or directory

I guess it expects root permissions? Would it be possible to install this as a regular user? Also, the commands often mention Iris, which confuses me, as that is your other tool.
It would also be nice if this could be added to bioconda to make installation easier, if possible.

Thanks!
Wouter

Stack Overflow Error

Hello,

When merging 23 VCFs with about 10,000 CNVs each, I get:

Merging graph ID: Chr01_DEL_
Exception in thread "main" java.lang.StackOverflowError
at KDTree.build(KDTree.java:54)
at KDTree.build(KDTree.java:55) # this one repeats a bunch

The recursion in the k-d tree build doesn't seem to cope with 23 samples at default settings when there are many CNVs. Running fewer samples resolves the issue, and increasing the stack size (-Xss1G) also avoids the overflow. But you'll probably want to either document this issue or rewrite the recursion to handle the data differently. Note: I also ran a test with nearly 300 samples at about 10k CNVs each (-Xmx60G -Xss1G threads=16), and that appeared to work properly, so the problem can be circumvented. The SUPP_VEC_EXT and SUPP_VEC values are crazy high, though.
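One way to remove the dependence on -Xss is to build the tree with an explicit stack instead of recursion. Below is a minimal Python sketch, assuming 2-D points such as (start, end) and median splits by sorting; it is not Jasmine's KDTree class. A related point: if a build partitions points by comparing them against a pivot value, many calls sharing identical coordinates can leave one side nearly empty at every level, so the depth grows toward n. Splitting by index around the median, as below, keeps the depth at about log2(n) even with duplicates. Whether this is what happened in the reported case is speculation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(points: Sequence[Point]) -> Optional[Node]:
    """Build a 2-D k-d tree without recursion.

    Each stack entry (points, depth, parent, side) describes a subtree that is
    still to be built. Points are split by index around the median, so
    duplicate coordinates cannot push every point to one side.
    """
    if not points:
        return None
    root: Optional[Node] = None
    stack: List[Tuple[List[Point], int, Optional[Node], str]] = [
        (list(points), 0, None, "")
    ]
    while stack:
        pts, depth, parent, side = stack.pop()
        axis = depth % 2
        pts.sort(key=lambda p: p[axis])
        mid = len(pts) // 2
        node = Node(pts[mid], axis)
        if parent is None:
            root = node
        elif side == "left":
            parent.left = node
        else:
            parent.right = node
        if mid > 0:
            stack.append((pts[:mid], depth + 1, node, "left"))
        if mid + 1 < len(pts):
            stack.append((pts[mid + 1:], depth + 1, node, "right"))
    return root


def depth_and_size(root: Optional[Node]) -> Tuple[int, int]:
    """Iteratively measure tree height and node count (to check the build)."""
    height, size = 0, 0
    stack = [(root, 1)] if root else []
    while stack:
        node, d = stack.pop()
        size += 1
        height = max(height, d)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, d + 1))
    return height, size
```

With 20,000 identical points, this sketch builds a tree of height 15, with no recursion at all.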

What does "GT" mean in the FILTER field?

Hi, I am using Jasmine to merge SVs according to the suggested pipeline, and it works well. However, in the final dataset, after the "Remove low-confidence or imprecise calls" step with the command cat <mergedvcf> | grep -v 'IMPRECISE;' | grep -v 'IS_SPECIFIC=0', a large number of variants are flagged with "GT" in the FILTER field. The header defines "GT" as "Genotype filter"; could you please explain what "Genotype filter" means in detail? And should these variants be included in downstream analysis?
Thanks very much!

Zhongqu

Exception in thread "Thread-6" java.lang.OutOfMemoryError: Java heap space

Dear Dr,
When I merge 25 SV files, I get a memory error. My command line is:
jasmine file_list=merge1.txt out_file=merge.n.vcf out_dir=./ threads=6
The error output is as follows:

Exception in thread "Thread-6" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3689)
        at java.base/java.util.PriorityQueue.grow(PriorityQueue.java:305)
        at java.base/java.util.PriorityQueue.offer(PriorityQueue.java:344)
        at java.base/java.util.PriorityQueue.add(PriorityQueue.java:326)
        at KDTree.search(KDTree.java:244)
        at KDTree.kNearestNeighbor(KDTree.java:197)
        at VariantMerger.runMerging(VariantMerger.java:205)
        at ParallelMerger$MyThread.run(ParallelMerger.java:89)

Can you give some advice on how to deal with this?
Thank you for your time!
Best wishes

Ji gaoxiang
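The stack trace shows the OutOfMemoryError being raised while a java.util.PriorityQueue grows inside KDTree.search, called from kNearestNeighbor. As a general technique, and not a description of Jasmine's code, a k-nearest-neighbour search only needs to keep the k best candidates seen so far, which caps the queue at O(k) entries. Below is a minimal Python sketch of that bounded queue on a plain linear scan, using heapq as a size-k max-heap via negated distances; in a k-d tree search, the same heap's worst entry would also provide the pruning radius. The function name is hypothetical. On the JVM side, the StackOverflowError report above mentions running with a larger heap (-Xmx60G).

```python
import heapq
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def k_nearest(query: Point, candidates: Iterable[Point], k: int) -> List[Tuple[float, Point]]:
    """Return the k points closest to query as (squared distance, point), nearest first.

    heapq is a min-heap, so distances are stored negated: the root is then the
    worst of the current k and is evicted whenever a closer point arrives.
    Memory stays O(k) no matter how many candidates are scanned.
    """
    if k <= 0:
        return []
    heap: List[Tuple[float, int, Point]] = []  # (-dist2, tie-breaker, point)
    for i, p in enumerate(candidates):
        d2 = (p[0] - query[0]) ** 2 + (p[1] - query[1]) ** 2
        if len(heap) < k:
            heapq.heappush(heap, (-d2, i, p))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, i, p))
    # Largest negated distance first == smallest distance first.
    return [(-neg, p) for neg, _, p in sorted(heap, reverse=True)]
```

The index i acts as a tie-breaker, so tuples never fall through to comparing points when two distances are equal.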

merge not behaving as expected for DUP

Hi Melanie,

I have the following sample input from the GIAB trio processed using DRAGEN:

chr1    144905843       DRAGEN:GAIN:chr1:144905844-144907253    N       <DUP>   .       PASS    SVTYPE=CNV;END=144907253;REFLEN=1410    GT:SM:CN:BC:PE:QS:FT:DN ./1:1.33898:3:1:47,51:8:cnvQual ./1:1.29469:3:1:101,103:6:cnvQual       ./1:1.48387:3:1:50,58:17:PASS:Inherited
chr1    144907253       DRAGEN:GAIN:chr1:144907254-144987033    N       <DUP>   .       PASS    SVTYPE=CNV;END=144987033;REFLEN=79780   GT:SM:CN:BC:PE:QS:FT:DN ./1:2.06779:4:38:51,19:95:PASS  ./1:2.06284:4:38:103,43:105:PASS        ./1:2.06813:4:38:58,25:96:PASS:Inherited
chr1    144987033       DRAGEN:GAIN:chr1:144987034-144998774    N       <DUP>   .       PASS    SVTYPE=CNV;END=144998774;REFLEN=11741   GT:SM:CN:BC:PE:QS:FT:DN ./1:1.41905:3:11:19,7:24:PASS   ./1:1.27235:3:11:43,14:8:cnvQual        ./1:1.52716:3:11:25,11:35:PASS:Inherited

I installed jasmine today using conda, and I am using the following command:

jasmine file_list=GIAB.cnv_filtered_noLowDQ.vcf --normalize_type --allow_intrasample --output_genotypes --comma_filelist --nonlinear_dist --max_dist=5000 --ignore_strand out_file=jasmine_GIAB.cnv_filtered_noLowDQ.vcf

Merging is occurring (I can see successfully merged events in the output), but the only events being merged are DELs. For example:

chr1    207541223       0_DRAGEN:LOSS:chr1:207541224-207542261  N       <DEL>   .       PASS    SVTYPE=;END=207542261;REFLEN=1038;SVLEN=-1038;STARTVARIANCE=2146376.000000;ENDVARIANCE=2301368.000000;AVG_LEN=-1560.333333;AVG_START=207542733.666667;AVG_END=207544294.000000;VARCALLS=3;ALLVARS_EXT=(DRAGEN:LOSS:chr1:207541224-207542261,DRAGEN:LOSS:chr1:207542262-207544717,DRAGEN:LOSS:chr1:207544718-207545904);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:LOSS:chr1:207541224-207542261;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:LOSS:chr1:207541224-207542261;INTRASAMPLE_IDLIST=DRAGEN:LOSS:chr1:207541224-207542261,DRAGEN:LOSS:chr1:207542262-207544717,DRAGEN:LOSS:chr1:207544718-207545904    GT:IS:OT:DV:DR  0/1:.:CNV:0:.   ./.:.:CNV:0:.   0/1:.:CNV:0:.

None of the DUPs that should be merged are merged in the output, for example:

chr1    144905843       0_DRAGEN:GAIN:chr1:144905844-144907253  N       <DUP>   .       PASS    SVTYPE=;END=144907253;REFLEN=1410;SVLEN=1410;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=1410.000000;AVG_START=144905843.000000;AVG_END=144907253.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144905844-144907253);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144905844-144907253;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:GAIN:chr1:144905844-144907253;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144905844-144907253      GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.
chr1    144907253       0_DRAGEN:GAIN:chr1:144907254-144987033  N       <DUP>   .       PASS    SVTYPE=;END=144987033;REFLEN=79780;SVLEN=79780;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=79780.000000;AVG_START=144907253.000000;AVG_END=144987033.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144907254-144987033);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144907254-144987033;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:GAIN:chr1:144907254-144987033;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144907254-144987033   GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.
chr1    144987033       0_DRAGEN:GAIN:chr1:144987034-144998774  N       <DUP>   .       PASS    SVTYPE=;END=144998774;REFLEN=11741;SVLEN=11741;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=11741.000000;AVG_START=144987033.000000;AVG_END=144998774.000000;VARCALLS=1;ALLVARS_EXT=(DRAGEN:GAIN:chr1:144987034-144998774);SUPP_VEC_EXT=1;IDLIST_EXT=DRAGEN:GAIN:chr1:144987034-144998774;SUPP_EXT=1;STRANDS=??;SUPP_VEC=1;SUPP=1;SVMETHOD=JASMINE;IDLIST=DRAGEN:GAIN:chr1:144987034-144998774;INTRASAMPLE_IDLIST=DRAGEN:GAIN:chr1:144987034-144998774   GT:IS:OT:DV:DR  ./1:.:CNV:0:.   ./1:.:CNV:0:.   ./1:.:CNV:0:.

Any ideas? Thank you in advance!
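Below is a worked comparison using the coordinates above. It assumes, without confirmation in this thread, that the merge distance between two calls is the Euclidean (L2) distance between their (start, end) points; the kd_tree_norm=2 parameter in the first issue on this page suggests an L2 norm is at least an option. Under that assumption, consecutive DEL calls are roughly 2.7 kb apart. Consecutive DUP calls are about 80 kb apart, because the DUPs abut end to start rather than overlap. If a 5000 bp limit was in effect, distance alone would explain the difference, independently of SV type. POS values for the second and third DELs are inferred from their DRAGEN IDs.

```python
from math import hypot
from typing import List, Tuple

Call = Tuple[int, int]  # (POS, END) as printed in the VCF records above

dels: List[Call] = [
    (207541223, 207542261),
    (207542261, 207544717),
    (207544717, 207545904),
]
dups: List[Call] = [
    (144905843, 144907253),
    (144907253, 144987033),
    (144987033, 144998774),
]


def l2_distance(a: Call, b: Call) -> float:
    """Euclidean distance between two calls treated as (start, end) points."""
    return hypot(a[0] - b[0], a[1] - b[1])


def consecutive_distances(calls: List[Call]) -> List[float]:
    return [l2_distance(a, b) for a, b in zip(calls, calls[1:])]


for label, calls in (("DEL", dels), ("DUP", dups)):
    print(label, [round(d) for d in consecutive_distances(calls)])
# DEL [2666, 2728]
# DUP [79792, 80639]
```

Only neighbouring calls are compared here. How Jasmine then links a chain of calls into one merged record is not modelled.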

Possible incorrect merging of breakend variants

Hello, thanks for creating and publishing Jasmine!

I've encountered a situation where a breakend variant in one sample does not merge with the identical variant in a second sample, but instead merges with its mate/partner. For example, variant_a and variant_b represent a paired set of breakends and are present in two VCFs/samples:

VCF 1: variant_a (chr1:200) - variant_b (chr2:5000)
VCF 2: variant_a (chr1:200) - variant_b (chr2:5000)

I have found that Jasmine can produce the resulting merged variants:

variant_a_merged (chr1:200): variant_a,variant_b
variant_b_merged (chr2:5000): variant_b,variant_a

I don't think this is the intended behaviour. I've created a minimal example to reproduce it below.

Minimal example

Expected behaviour

first.vcf:

##fileformat=VCFv4.2
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##contig=<ID=chr1,length=100000>
##contig=<ID=chr2,length=200000>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SEQC-II_Tumor_50pc
chr1    200     chr1_200_break_first_vcf        G       [chr2:5000[G    .       PASS    MATEID=chr2_5000_break_first_vcf;SVTYPE=BND     .       .
chr2    5000    chr2_5000_break_first_vcf       A       [chr1:200[A     .       PASS    MATEID=chr1_200_break_first_vcf;SVTYPE=BND      .       .

second.vcf:

##fileformat=VCFv4.2
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##contig=<ID=chr1,length=100000>
##contig=<ID=chr2,length=200000>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SEQC-II_Tumor_50pc
chr1    200     chr1_200_break_second_vcf       G       [chr2:5000[G    .       PASS    MATEID=chr2_5000_break_second_vcf;SVTYPE=BND    .       .
chr2    5000    chr2_5000_break_second_vcf      A       [chr1:200[A     .       PASS    MATEID=chr1_200_break_second_vcf;SVTYPE=BND     .       .

input_files.txt:

first.vcf
second.vcf

Merge variants:

$ jasmine file_list=input_files.txt out_dir=./ out_file=merged_expected.vcf 1>/dev/null 2>&1
$ { 
    echo -e "CHRM\tPOS\tID\tIDLIST";
    bcftools query -f '%CHROM\t%POS\t%ID\t[%IDLIST]\n' merged_expected.vcf;
  } | column -t -s$'\t'
CHRM  POS   ID                           IDLIST
chr1  200   0_chr1_200_break_first_vcf   chr1_200_break_first_vcf,chr1_200_break_second_vcf
chr2  5000  0_chr2_5000_break_first_vcf  chr2_5000_break_first_vcf,chr2_5000_break_second_vcf

Unexpected behaviour

Repeat exactly as above, but add the INFO/DEBUG_1234 field to first.vcf to trigger the unexpected behaviour.

first.vcf:

##fileformat=VCFv4.2
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=DEBUG_1234,Number=1,Type=String,Description="For debug purposes">
##contig=<ID=chr1,length=100000>
##contig=<ID=chr2,length=200000>
#CHROM  POS ID  REF ALT QUAL  FILTER  INFO  FORMAT  SEQC-II_Tumor_50pc
chr1  200 chr1_200_break_first_vcf  G [chr2:5000[G  . PASS  MATEID=chr2_5000_break_first_vcf;SVTYPE=BND;DEBUG_1234=NOVALUE  . .
chr2  5000  chr2_5000_break_first_vcf A [chr1:200[A . PASS  MATEID=chr1_200_break_first_vcf;SVTYPE=BND;DEBUG_1234=NOVALUE . .

second.vcf:

##fileformat=VCFv4.2
##INFO=<ID=MATEID,Number=.,Type=String,Description="ID of mate breakend">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##contig=<ID=chr1,length=100000>
##contig=<ID=chr2,length=200000>
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SEQC-II_Tumor_50pc
chr1    200     chr1_200_break_second_vcf       G       [chr2:5000[G    .       PASS    MATEID=chr2_5000_break_second_vcf;SVTYPE=BND    .       .
chr2    5000    chr2_5000_break_second_vcf      A       [chr1:200[A     .       PASS    MATEID=chr1_200_break_second_vcf;SVTYPE=BND     .       .

input_files.txt:

first.vcf
second.vcf

$ jasmine file_list=input_files.txt out_dir=./ out_file=merged_unexpected.vcf 1>/dev/null 2>&1
$ { 
    echo -e "CHRM\tPOS\tID\tIDLIST";
    bcftools query -f '%CHROM\t%POS\t%ID\t[%IDLIST]\n' merged_unexpected.vcf;
  } | column -t -s$'\t'
CHRM  POS   ID                           IDLIST
chr2  5000  0_chr2_5000_break_first_vcf  chr2_5000_break_first_vcf,chr1_200_break_second_vcf
chr1  200   0_chr1_200_break_first_vcf   chr1_200_break_first_vcf,chr2_5000_break_second_vcf
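To check merged output like this programmatically, it can help to compare each record's own position and the mate position encoded in its ALT bracket notation; for example, [chr2:5000[G points to chr2:5000. Below is a minimal sketch, assuming the VCF breakend ALT forms t[p[, t]p], ]p]t and [p[t. The helper names are hypothetical. It only detects the symptom (a merged record whose members are opposite ends of a junction) and does not change how Jasmine merges.

```python
import re
from typing import Optional, Tuple

# Mate location inside a breakend ALT such as "[chr2:5000[G" or "G]chr1:200]".
# The contig part excludes brackets; greedy matching makes it end at the last
# ':' before the digits, so contig names containing ':' are kept whole.
_MATE = re.compile(r"[\[\]]([^\[\]]+):(\d+)[\[\]]")

Breakend = Tuple[str, int, str]  # (CHROM, POS, ALT)


def mate_of(alt: str) -> Optional[Tuple[str, int]]:
    """Return (mate_chrom, mate_pos) from a breakend ALT, or None if not a breakend."""
    m = _MATE.search(alt)
    return (m.group(1), int(m.group(2))) if m else None


def same_breakend(a: Breakend, b: Breakend) -> bool:
    """True if both records describe the same junction from the same side."""
    return a[0] == b[0] and a[1] == b[1] and mate_of(a[2]) == mate_of(b[2])
```

Applied to the unexpected output above, chr2_5000_break_first_vcf (chr2:5000, mate chr1:200) and chr1_200_break_second_vcf (chr1:200, mate chr2:5000) are not the same breakend, even though they were merged into one record.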

Install info

The above was done using Jasmine 1.1.4 on macOS, installed from conda via:

conda create -p $(pwd -P)/conda_env/ -y -c bioconda -c conda-forge jasminesv
conda activate conda_env/

$ conda list
# packages in environment at /Users/stephen/projects/jasmine_variant_merging/3_debug_variants/conda_env:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.8                hc929b4f_4    conda-forge
c-ares                    1.17.2               h0d85af4_0    conda-forge
ca-certificates           2021.10.8            h033912b_0    conda-forge
certifi                   2021.5.30                pypi_0    pypi
htslib                    1.13                 hc38c3fb_0    bioconda
irissv                    1.0.4                hdfd78af_2    bioconda
jasminesv                 1.1.4                hdfd78af_0    bioconda
k8                        0.2.5                h87af4ef_1    bioconda
krb5                      1.19.2               hcfbf3a7_2    conda-forge
libcurl                   7.79.1               hf45b732_1    conda-forge
libcxx                    12.0.1               habf9029_0    conda-forge
libdeflate                1.7                  h35c211d_5    conda-forge
libedit                   3.1.20210714         h9ed2024_0  
libev                     4.33                 haf1e3a3_1    conda-forge
libffi                    3.4.2                he49afe7_4    conda-forge
libnghttp2                1.43.0               h6f36284_1    conda-forge
libssh2                   1.10.0               h52ee1ee_2    conda-forge
libzlib                   1.2.11            h9173be1_1013    conda-forge
minimap2                  2.22                 h188c3c3_0    bioconda
ncurses                   6.2                  h2e338ed_4    conda-forge
openjdk                   11.0.9.1             hcf210ce_1    conda-forge
openssl                   1.1.1l               h0d85af4_0    conda-forge
pip                       21.2.4             pyhd8ed1ab_0    conda-forge
python                    3.10.0          h1248fe1_1_cpython    conda-forge
racon                     1.4.20               h87af4ef_1    bioconda
readline                  8.1                  h05e3726_0    conda-forge
samtools                  1.13                 h7596a89_0    bioconda
setuptools                58.0.4                   pypi_0    pypi
sqlite                    3.36.0               h23a322b_2    conda-forge
tk                        8.6.11               h5dbffcc_1    conda-forge
tzdata                    2021c                he74cb21_0    conda-forge
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.5                haf1e3a3_1    conda-forge
zlib                      1.2.11            h9173be1_1013    conda-forge

Index 1 out of bounds for length 1

Thanks for this tool!

I was trying to merge two different SV call sets and got the following errors:

Neither VCF has records for chromosome 2 (or for many other chromosomes), and I think that is why I see this error message.
Is there any way to get around this?

Thanks!

Merge across tools instead of merging across samples

Hello, I really like the process and accuracy of Jasmine.
I see in the manual and doc that this was designed to merge SVs across samples.

We would like (and are trying) to use it to merge SVs detected in the same sample by different tools. Would you recommend a different set of parameters in that case? What about merging SVs detected in different samples by different tools? Has anyone tried that, and what do you think?

It seems to work for some merges (e.g. SVs detected by different long-read tools) but less well for others (SVs detected from long reads combined with SVs detected from short reads).

Thanks for your help!
Claire

Index -1 out of bounds for length 0

Thanks for this tool!
I have the same problem. I am trying to merge deletions from different tools, using the new 1.1.5 release.

Problem with output_genotypes parameter

Hi,

I've noticed that when I run Jasmine with the --output_genotypes parameter, I get the following error:

Exception in thread "main" java.lang.NullPointerException
    at AddGenotypes.addGenotypes(AddGenotypes.java:150)
    at PipelineManager.addGenotypes(PipelineManager.java:191)
    at Main.postprocess(Main.java:91)
    at Main.main(Main.java:27)

Jasmine then does not finish the job and writes only part of the calls to the output file. Without --output_genotypes, Jasmine finishes without problems, but the merged output VCF contains no genotypes.

I only have these genotypes in my VCF files (generated by Sniffles):

0/0
0/1
1/1

Any idea what the problem could be?
Thanks,

documentation help

Hi, thanks for developing this tool. I have several issues which I think may just be documentation issues:

It seems the tool does not accept (b)gzipped VCFs. This is unusual, so it would be nice if the error message were clearer (or if gzip were supported).
It's unclear how to use the output. From #7, it seems one should perhaps specify --output_genotypes? I would like to merge SVs, then re-genotype all samples at the merged SVs. Do I need --output_genotypes for that, or should I just sort the output from Jasmine without --output_genotypes?
(This seems to be a real bug, not a documentation issue.) The output includes a PRECISE flag that is not declared in the header (though IMPRECISE is). This causes problems with BCF output.

Thanks for any help.
-Brent
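Until gzip input and the header are handled upstream, a small pre- and post-processing step can work around points 1 and 3. Below is a minimal sketch that relies on Python's gzip module, which reads BGZF files because BGZF is a series of standard gzip members. The Description text for PRECISE is an assumption, not taken from Jasmine, and the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator

# Hypothetical header line; the Description wording is an assumption.
PRECISE_HEADER = '##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variant">'


def open_vcf_text(path: str):
    """Open a plain or (b)gzipped VCF as text, choosing by file extension."""
    if path.endswith((".gz", ".bgz")):
        return gzip.open(path, "rt")
    return open(path)


def decompress_for_jasmine(src: str, dest: str) -> None:
    """Write an uncompressed copy of src (plain, .gz or .bgz) to dest."""
    with open_vcf_text(src) as fin, open(dest, "w") as fout:
        for line in fin:
            fout.write(line)


def ensure_precise_header(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, inserting PRECISE_HEADER before #CHROM if it is missing."""
    seen = False
    for line in lines:
        if line.startswith("##INFO=<ID=PRECISE,"):
            seen = True
        elif line.startswith("#CHROM") and not seen:
            yield PRECISE_HEADER + "\n"
            seen = True
        yield line
```

Typical use: decompress each input listed in file_list, run Jasmine on the plain copies, and pass the merged VCF through ensure_precise_header before converting it to BCF. The header step is idempotent, so running it on an already-fixed file changes nothing.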

Fail to merge SVs when STRANDS not set in one file

Hi,

Thanks for the great tool!

I have multiple representations of SVs for the same sample, generated with Manta and Lumpy (through Smoove). When I try to use Jasmine to merge the calls from the different tools, no overlapping calls are found and the resulting output file is empty. My command is:

jasmine --normalize_type --normalize_chrs min_support=2 file_list=filelist out_file=intersection.vcf

If I repeat this with min_support=1, I can clearly see very similar variants, even identical ones, that Jasmine does not merge. For example, take these two DEL records: they are exactly the same, reported by both tools with the same POS, SVTYPE and END annotations, but Jasmine fails to merge them:

Manta

1	1207339	1_MantaDEL:105478:0:1:0:0:0	GGAGACTGTCCTATGTCTTTCTGAGCCTCAGTTTCCCCTGTGGGCACCGAGGGGTTCTGGGACCCTGCCTCCACCAGGAAGCCTCCCTGGATTGCCCAGCCCTGCTTCTGCGCCGTCCAGCACAGGTGGAGACCCCCATGAATGCTGGGGGTGGGGGCTCTCGGGAACGTGAGCGTGGATGTGGTTCAACACCCTTTTGAGACCTGCAGCCACCGCCTCACCCCGTAAGGCGGTTCCTCCTTTTCCAAGGTAAATGACAGGAATTAGCTGTTTGTGACACCCCGGAGTTCTCAAATCCAAGATGTAGGAGCCTGCCTTGGAGAGGCAGCCCTCAGACACTGCAGAGAAGGAAGGGGTCTCTGCAGCTCCAGGCCGCCCCGACGCTCGGAAGGAAAGGGGTGGGGCCAGCTGGGCCTGGGGGC	G	185	PASS	END=1207760;SVTYPE=DEL;SVLEN=-421;CIGAR=1M421D;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-421.000000;AVG_START=1207339.000000;AVG_END=1207760.000000;SUPP_VEC_EXT=01;IDLIST_EXT=MantaDEL:105478:0:1:0:0:0;SUPP_EXT=1;SUPP_VEC=01;SUPP=1;SVMETHOD=JASMINE;IDLIST=MantaDEL:105478:0:1:0:0:0	GT:FT:GQ:PL:PR:SR	0/1:PASS:140:235,0,137:9,5:6,5

Lumpy

1	1207339	0_1	N	<DEL>	206.71	.	SVTYPE=DEL;SVLEN=-421;END=1207760;STRANDS=+-:5;CIPOS=-10,9;CIEND=-10,9;CIPOS95=0,0;CIEND95=0,0;SU=5;PE=0;SR=5;PRPOS=9.80198e-21,9.80198e-19,9.80198e-17,9.80198e-15,9.80198e-13,9.80198e-11,9.80198e-09,9.80198e-07,9.80198e-05,0.00980198,0.980198,0.00980198,9.80198e-05,9.80198e-07,9.80198e-09,9.80198e-11,9.80198e-13,9.80198e-15,9.80198e-17,9.80198e-19;PREND=9.80198e-21,9.80198e-19,9.80198e-17,9.80198e-15,9.80198e-13,9.80198e-11,9.80198e-09,9.80198e-07,9.80198e-05,0.00980198,0.980198,0.00980198,9.80198e-05,9.80198e-07,9.80198e-09,9.80198e-11,9.80198e-13,9.80198e-15,9.80198e-17,9.80198e-19;AC=1;AN=2;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-421.000000;AVG_START=1207339.000000;AVG_END=1207760.000000;SUPP_VEC_EXT=10;IDLIST_EXT=1;SUPP_EXT=1;SUPP_VEC=10;SUPP=1;SVMETHOD=JASMINE;IDLIST=1	GT:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB	0/1:181:206.71:-24,-3,-21:43:31:12:30:11:14:6:0:16:4:0.27

After some tests, I've realised that the issue is that STRANDS is set to +- by Lumpy but absent from Manta's INFO fields. I suggest either ignoring strands or raising a warning when one of the files does not contain a STRANDS annotation and --ignore_strand is not set. I'm also not sure how Jasmine treats +- STRANDS values, which would mean that no strand was actually determined for the call.
Maybe it would be better to ignore strands by default? I'm not familiar with long-read SV callers, but short-read SV callers usually do not output a strand annotation.
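The check proposed in this issue could look like the minimal sketch below. It treats a missing STRANDS value as unknown, and therefore compatible with anything, and strips Lumpy-style support counts such as +-:5 down to +-. This is a hypothetical helper illustrating the proposal, not Jasmine's behaviour. Within Jasmine itself, the switch that presumably disables strand comparison is the --ignore_strand flag used in the DRAGEN command elsewhere on this page.

```python
from typing import Optional, Set


def strand_set(info: str) -> Optional[Set[str]]:
    """Extract strand pairs from an INFO string; None if STRANDS is absent.

    Lumpy-style values such as 'STRANDS=+-:5' or '+-:3,-+:2' keep only the
    orientation part before each ':' count.
    """
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key == "STRANDS":
            return {part.split(":", 1)[0] for part in value.split(",") if part}
    return None


def strands_compatible(info_a: str, info_b: str) -> bool:
    """Missing STRANDS on either side is treated as a wildcard (hypothetical rule)."""
    a, b = strand_set(info_a), strand_set(info_b)
    if a is None or b is None:
        return True
    return bool(a & b)  # compatible if any orientation is shared
```

Applied to the INFO fields of the two DEL records above, the Manta record has no STRANDS and the Lumpy record has {"+-"}, so under this rule they would be considered compatible.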

end conversion to start+length for INS

Hello! Can you please help me understand how the end attribute for an insertion is converted to start+length if the --use_end flag is not set? I can't see it yet and I'm not sure what I'm missing.

According to src/Variant.java, the end should already be converted by the time a Variant object is created.

If the Variant objects are created in src/VariantInput.java and that class uses getSecondCoord for the end attribute, isn't the result of getLength returned? For INS, getLength returns the SVLEN info field or seq.length.

Thanks in advance for your time.
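The behaviour asked about can be summarised as follows: without --use_end, the second coordinate of an insertion is start + length, where length is taken from SVLEN or, failing that, from the inserted sequence. Below is a minimal Python restatement of that rule, offered as an assumption to check against src/Variant.java and src/VariantInput.java rather than a description of them. The function name is hypothetical, and the use_end branch (take END as given) is likewise assumed from the flag's name. The rule fits the first record in the singletons issue above: that record has END=181295 but AVG_END=181334.000000, which is POS + SVLEN (181295 + 39).

```python
from typing import Optional


def insertion_second_coord(start: int, svlen: Optional[int], inserted_seq: str,
                           end: Optional[int] = None, use_end: bool = False) -> int:
    """Second coordinate for an INS under the rule described above (assumed).

    With use_end, END is taken as given; otherwise start + length, where
    length is |SVLEN| if present, else the length of the inserted sequence.
    """
    if use_end and end is not None:
        return end
    length = abs(svlen) if svlen is not None else len(inserted_seq)
    return start + length
```

Usage, for example: `insertion_second_coord(181295, 39, "")` returns 181334.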

About Jasmine pipeline

Dear Melanie
Thank you for this amazing pipeline.
I have some questions regarding how to run the pipeline (I am new to using Snakemake).

For the pipeline, do you recommend including all samples in the data.yaml file?
How can I start from a specific step, given that I already have the sorted BAM files from the first step?
If I run the pipeline for each sample independently, does the step "mark_specific_in_raw" need the list of other vcf files?

When I tested the pipeline with two samples, the final output was not the merged vcf. Is there any flag I need to use to run this step, or should I run it independently?

Finally, if I run all the steps independently, should I merge all the ism VCF files and all the ism.specific VCF files into a single VCF?

Thank you in advance!

--normalize_type creates invalid VCF for translocations

Hi,

I think --normalize_type on a BND representing a translocation creates an invalid VCF, because it fills in the END INFO field with a position on a different chromosome, which can result in END < POS. Although this does not seem to be explicitly forbidden by the VCF 4.2 specification, it leads to errors with bcftools, and the VCF 4.3 specification clarifies that END is the "End position on CHROM".

See for example also samtools/hts-specs#436

Cheers,
Wouter
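Until this is addressed, one possible post-processing workaround is to drop END from records whose breakend ALT points to a mate on a different chromosome, so that bcftools no longer sees an END on another chromosome (possibly END < POS) attached to the record's own CHROM. Below is a minimal sketch under that assumption. It edits only the INFO column and leaves every other column unchanged, and it does not address whether downstream tools need END on these records. The function name is hypothetical.

```python
import re

# Mate contig inside a breakend ALT such as "[chr2:5000[G" (brackets excluded).
_MATE_CHROM = re.compile(r"[\[\]]([^\[\]]+):\d+[\[\]]")


def strip_interchrom_end(line: str) -> str:
    """Remove END=... from INFO of a breakend record whose mate is on another chromosome."""
    if line.startswith("#"):
        return line  # header lines pass through untouched
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return line
    m = _MATE_CHROM.search(cols[4])
    if m is None or m.group(1) == cols[0]:
        return line  # not a breakend, or intrachromosomal
    kept = [f for f in cols[7].split(";") if not f.startswith("END=")]
    cols[7] = ";".join(kept) if kept else "."
    return "\t".join(cols) + ("\n" if line.endswith("\n") else "")
```

Only keys exactly equal to END are dropped; fields such as AVG_END or ENDVARIANCE are kept, because they do not start with "END=".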

min_overlap

Jasmine can merge SVs across samples and tools.
I want to merge SVs detected in different samples by different tools, and treat two overlapping CNVs with reciprocal overlap (RO) > 50% as the same region, so I wrote this command:
jasmine file_list=/bak01/filelist.txt out_file=/bak01/merge_metasv_delly/merge.vcf --min_overlap 0.5 --output_genotypes --default_zero_genotype --leave_breakpoints
I wonder whether min_overlap can be used on its own, because in #13 you replied that when min_overlap is used, Jasmine still takes "max_dist_linear" or "max_dist" into account when deciding whether to merge.
Thank you!
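For reference, reciprocal overlap is commonly defined as the shared length divided by each interval's length, with both ratios required to pass the threshold. The sketch below uses that common definition with half-open [start, end) intervals; it is not necessarily the exact formula behind Jasmine's min_overlap.

```java
// Sketch of reciprocal overlap (RO) for two half-open intervals [start, end).
// Both overlap/len1 >= t and overlap/len2 >= t hold exactly when
// overlap / max(len1, len2) >= t, so a single ratio is returned.
final class ReciprocalOverlap {
    static double reciprocalOverlap(long s1, long e1, long s2, long e2) {
        long overlap = Math.min(e1, e2) - Math.max(s1, s2);
        if (overlap <= 0) {
            return 0.0; // disjoint or touching intervals
        }
        long longer = Math.max(e1 - s1, e2 - s2);
        return (double) overlap / longer;
    }

    public static void main(String[] args) {
        System.out.println(reciprocalOverlap(100, 200, 150, 250)); // 0.5
        System.out.println(reciprocalOverlap(0, 100, 200, 300));   // 0.0
    }
}
```

With a 50% threshold, two CNVs would be treated as the same region when this value is at least 0.5.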

SUPP INFO tag should be Integer

In jasmine's output VCF file, both SUPP and SUPP_EXT are set as String:

##INFO=<ID=SUPP,Number=1,Type=String,Description="Number of samples supporting the variant">
##INFO=<ID=SUPP_EXT,Number=1,Type=String,Description="Number of samples supporting the variant, potentially extended across multiple merges">

However, shouldn't they be set to Integer?
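Until the header type changes, a downstream filter can parse the value itself. A minimal sketch, assuming tab-separated VCF data lines; it reads SUPP as an integer regardless of the declared Type and keeps records with at least a chosen number of supporting samples.

```java
// Sketch: parse SUPP as an integer regardless of the declared header Type,
// then keep records supported by at least minSupp samples.
final class SuppFilter {
    static int supp(String vcfLine) {
        String[] cols = vcfLine.split("\t");
        if (cols.length < 8) {
            throw new IllegalArgumentException("expected at least 8 VCF columns");
        }
        for (String field : cols[7].split(";")) {
            // "SUPP=" does not match SUPP_VEC= or SUPP_EXT=
            if (field.startsWith("SUPP=")) {
                return Integer.parseInt(field.substring("SUPP=".length()));
            }
        }
        throw new IllegalArgumentException("no SUPP key in INFO");
    }

    static boolean passes(String vcfLine, int minSupp) {
        return supp(vcfLine) >= minSupp;
    }

    public static void main(String[] args) {
        String line = "chr1\t100\tid\tN\t<DEL>\t.\tPASS\tSUPP_VEC_EXT=0101;SUPP_EXT=2;SUPP_VEC=0101;SUPP=2";
        System.out.println(supp(line) + " " + passes(line, 2)); // 2 true
    }
}
```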

Jasmine doesn't work with Smoove

Hello!
I'm using Jasmine to merge VCF files obtained from different tools for the same sample. These are the tools I'm currently using: Manta, Delly, Whamg, Svaba and Smoove.
I can merge all the tools without problems except Smoove. When I include Smoove with the others, Jasmine merges, but the Smoove calls always remain singletons even when there are variants in common.
I tried merging Smoove against itself (with another sample) and that works. I also tried merging Smoove against each of the other tools, and that does not work.
These are the parameters I'm using:

jasmine \
    file_list=ID.tool.txt \
    out_file=ID.vcf \
    genome_file=Homo_sapiens_assembly38.fasta \
    bam_list=ID.bam.txt \
    max_dist_linear=1.0 \
    threads=16 \
    min_overlap=0.75 \
    min_support=2 \
    --ignore_type \
    --normalize_type

The INFO fields differ slightly between the VCFs, but the other tools still merge together.
If you want, I can send you the VCFs by email.

Help to set the best jasmine parameters

Hello Melanie,

Thank you for providing this excellent tool! I love it!

I'm currently using Jasmine to evaluate the presence/absence of SV identified with different SV callers in a small WGS cohort.
Ultimately, I would like to obtain a single non-redundant SV VCF with all the samples of my cohort:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1    sample2    sample3

I'm using Jasmine in 2 steps:

Step 1

For each sample in my cohort:
I merge the SVs called with different SV callers (the idea is to identify all SVs in a sample, of all SV types, without redundancy due to fuzzy coordinates) to obtain a single VCF file per sample.

sample1_SV_smoove.vcf
sample1_SV_delly.vcf
sample1_SV_CNVpytor.vcf
sample1_SV_Manta.vcf
sample1_SV_Mobster.vcf
sample1_SV_ExpansionHunter.vcf

=========Jasmine======> sample1_SV_merge.vcf

Wanted SV clustering criteria:
=> e.g. a maximum of 300bp distance between breakpoints and at least 80% reciprocal overlap by size.

These are the parameters I'm using:

jasmine \
    file_list=files_list.txt \
    out_file=sample1_SV_merge.vcf \
    out_dir=jasmine_tmp \
    threads=8 \
    min_dist=-1 \
    max_dist=150 \
    --nonlinear_dist \
    min_overlap=0.8 \
    --allow_intrasample \
    --ignore_strand \
    --normalize_type

I don't use --output_genotypes, so that the VCF has only one sample column, but I would like to keep the most frequent GT.
Question 1:
How can I obtain that? Is it possible, or should I use --output_genotypes and parse the results myself?
Otherwise (with --output_genotypes), I obtain several sample columns:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  0_sample1       1_sample1   2_sample1     3_sample1     4_sample1     5_sample1
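If parsing the --output_genotypes result yourself, one possible reduction is a majority vote over the per-caller columns. A minimal sketch, assuming GT is the first FORMAT sub-field (as the VCF specification requires when GT is present) and that missing calls should be ignored; the tie-breaking rule (first genotype seen wins) is an arbitrary choice for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: collapse per-caller genotype columns (0_sample1, 1_sample1, ...) into the
// most frequent called GT. Missing calls ("./." or ".") are ignored; ties go to the
// genotype seen first. Phased and unphased forms ("0|1" vs "0/1") count separately.
final class MajorityGenotype {
    static String mostFrequentGt(List<String> sampleFields) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String field : sampleFields) {
            String gt = field.split(":", 2)[0]; // GT is the first FORMAT sub-field
            if (gt.equals("./.") || gt.equals(".")) {
                continue;
            }
            counts.merge(gt, 1, Integer::sum);
        }
        String best = "./.";
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("0/1:.:DEL", "1/1:.:DEL", "0/1:.:DEL", "./.:NA:NA");
        System.out.println(mostFrequentGt(fields)); // 0/1
    }
}
```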

Question 2:
Is there a way to report a tag associated to the input file where the SV comes from?
Something like:

#CHROM  POS      ID      REF     ALT     QUAL    FILTER  INFO                  FORMAT    sample1 
chr6    123605   .       .       <DEL>   .       PASS    SVTYPE=DEL;SVLEN=200  GT:TAG    0/1:smoove,delly

If not, would it be possible to add a --tag option that takes a file listing the tag associated with each VCF to merge (one per line)?

File	Tag
sample1_SV_smoove.vcf	smoove
sample1_SV_delly.vcf	delly
sample1_SV_CNVpytor.vcf	CNVpytor
sample1_SV_Manta.vcf	Manta
sample1_SV_Mobster.vcf	Mobster
sample1_SV_ExpansionHunter.vcf	ExpansionHunter

This would be really useful!
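Until such an option exists, a possible post-processing workaround is to decode SUPP_VEC. The sketch below assumes that position i of SUPP_VEC corresponds to the i-th VCF in file_list (check this against your own output); the tag list in file_list order is supplied by the user and is not read by Jasmine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: turn a SUPP_VEC bit string into the tags of the supporting input files,
// assuming bit i corresponds to the i-th entry of file_list.
final class SuppVecTags {
    static List<String> supportingTags(String suppVec, List<String> tagsInFileOrder) {
        if (suppVec.length() != tagsInFileOrder.size()) {
            throw new IllegalArgumentException("SUPP_VEC length must equal the number of input files");
        }
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < suppVec.length(); i++) {
            if (suppVec.charAt(i) == '1') {
                tags.add(tagsInFileOrder.get(i));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> tools = Arrays.asList("smoove", "delly", "CNVpytor", "Manta", "Mobster", "ExpansionHunter");
        System.out.println(String.join(",", supportingTags("110000", tools))); // smoove,delly
    }
}
```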

Step 2

Then I merge the VCF files from all samples (sample1_SV_merge.vcf, sample2_SV_merge.vcf...).
Of course, this time, I add the --output_genotypes option.

Thank you very much for any advice or thoughts you can provide,

Best regards,

Véronique

Matches to the reference instead of missing data

Hi,

I would like to use Jasmine to merge Sniffles VCF outputs. The major problem I see is that Jasmine mostly outputs missing genotypes for samples in which the SV was not identified.

Is there a way to force Jasmine to check whether a genotype is truly missing or matches the reference (0/0)? Or should I treat any missing genotype as a match to the reference after running Jasmine?

Like this example: Chr3 5797105 0_4943 GTTCAATCTTCGTCTTGTTTCTATTTGAGATTGAAGTTGAAATTATGCTCGATCTGTTCTGTAGCTAGAATT N . PASS PRECISE;SVMETHOD=JASMINE;CHR2=Chr3;END=5797176;STD_quant_start=0.000000;STD_quant_stop=0.000000;Kurtosis_quant_start=4.523800;Kurtosis_quant_stop=1.082952;SVTYPE=DEL;SUPTYPE=AL;SVLEN=-71;STRANDS=+-;RE=81;REF_strand=0,0;AF=1;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-72.000000;AVG_START=5797105.000000;AVG_END=5797177.000000;SUPP_VEC_EXT=1010000010000;IDLIST_EXT=4943,5641,6674;SUPP_EXT=3;SUPP_VEC=1010000010000;SUPP=3;IDLIST=4943,5641,6674 GT:IS:OT:OS 1/1:.:DEL:. ./.:NA:NA:NA 1/1:.:DEL:. ./.:NA:NA:NA ./.:NA:NA:NA ./.:NA:NA:NA ./.:NA:NA:NA ./.:NA:NA:NA 1/1:.:DEL:. ./.:NA:NA:NA ./.:NA:NA:NA ./.:NA:NA:NA ./.:NA:NA:NA
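If you decide to treat missing genotypes as reference after merging, which is only safe when every sample had adequate coverage at the locus (an assumption the merged VCF alone cannot confirm), a post-processing step could look like this sketch. It relies on the VCF rule that GT is the first FORMAT sub-field and only rewrites unphased "./."; other FORMAT values are left untouched.

```java
// Sketch: rewrite a missing GT ("./.") to "0/0" in every sample column of a VCF data line.
// Assumes GT is the first FORMAT sub-field; phased missing (".|.") is not handled.
final class MissingToRef {
    static String missingToRef(String vcfLine) {
        if (vcfLine.startsWith("#")) {
            return vcfLine; // header lines pass through unchanged
        }
        String[] cols = vcfLine.split("\t", -1);
        // Columns 0-8 are the fixed fields plus FORMAT; sample columns start at index 9.
        for (int i = 9; i < cols.length; i++) {
            if (cols[i].equals("./.") || cols[i].startsWith("./.:")) {
                cols[i] = "0/0" + cols[i].substring(3);
            }
        }
        return String.join("\t", cols);
    }

    public static void main(String[] args) {
        String line = "Chr3\t5797105\t0_4943\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:IS:OT:OS\t1/1:.:DEL:.\t./.:NA:NA:NA";
        System.out.println(missingToRef(line));
    }
}
```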

Best Regards,
Mehmet

VCF (caller) concordance analysis using jasmine

Hello,

Would you recommend this tool for SV caller concordance analysis against a truth set? I have used it with the max_dist_linear=0.1 min_dist=50 parameters and used the SUPP_VEC field of the output VCF. Could this be a good approach alongside other tools like https://github.com/spiralgenetics/truvari ?
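A minimal sketch of tallying concordance from SUPP_VEC, assuming the truth set and one caller occupy known positions in the vector (file_list order). It treats every merged record as one event, so one-to-many matches and calls merged within a single input are not handled.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: count true positives, false positives and false negatives for one caller
// against a truth set, using one SUPP_VEC string per merged record.
final class SuppVecConcordance {
    // Returns {truePositives, falsePositives, falseNegatives}.
    static int[] counts(List<String> suppVecs, int truthIdx, int callerIdx) {
        int tp = 0, fp = 0, fn = 0;
        for (String vec : suppVecs) {
            boolean inTruth = vec.charAt(truthIdx) == '1';
            boolean inCaller = vec.charAt(callerIdx) == '1';
            if (inTruth && inCaller) {
                tp++;
            } else if (inCaller) {
                fp++;
            } else if (inTruth) {
                fn++;
            }
        }
        return new int[] {tp, fp, fn};
    }

    public static void main(String[] args) {
        List<String> vecs = Arrays.asList("11", "10", "01", "11");
        int[] c = counts(vecs, 0, 1);
        double precision = (double) c[0] / (c[0] + c[1]);
        double recall = (double) c[0] / (c[0] + c[2]);
        System.out.println("TP=" + c[0] + " FP=" + c[1] + " FN=" + c[2]
                + " precision=" + precision + " recall=" + recall);
    }
}
```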

Thanks!

Possible error in handling iris temp output paths

Jasmine v1.1.5, conda installation.

The Iris temp files resultsstore.txt and results.tsv are written to the current directory rather than to the directory given by jasmine ... out_dir=jasmine_out ... .

Defining iris_args="out_dir=jasmine_out" moves the temp files to the correct directory but results in an error:

Exception in thread "main" java.io.FileNotFoundException: <absolute path>/jasmine_out/<absolute path>/jasmine_out/resultsstore.txt (No such file or directory)
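For illustration only (the Iris source is not shown here), one plausible way a path like <absolute path>/jasmine_out/<absolute path>/jasmine_out/resultsstore.txt can arise in Java: on Unix, java.io.File(parent, child) appends an already-absolute child to the parent, whereas java.nio.file.Path.resolve returns an absolute child unchanged.

```java
import java.io.File;
import java.nio.file.Paths;

// Illustration only, not Iris code: compare two ways of joining out_dir with a file name
// that has already been made absolute.
final class OutDirJoin {
    // On Unix, File(parent, child) concatenates even when child is absolute.
    static String fileJoin(String outDir, String child) {
        return new File(outDir, child).getPath();
    }

    // Path.resolve returns child unchanged when child is absolute.
    static String pathJoin(String outDir, String child) {
        return Paths.get(outDir).resolve(child).toString();
    }

    public static void main(String[] args) {
        String outDir = "/data/jasmine_out";
        String child = "/data/jasmine_out/resultsstore.txt";
        System.out.println(fileJoin(outDir, child)); // /data/jasmine_out/data/jasmine_out/resultsstore.txt
        System.out.println(pathJoin(outDir, child)); // /data/jasmine_out/resultsstore.txt
    }
}
```

The paths are hypothetical; the point is only that a join which does not check for an absolute child reproduces the duplicated prefix in the error message.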

Thank you for a great tool.

br Soeren

the parameter of --dup_to_ins

Hi
I want to use Jasmine to merge SVs generated by cuteSV from multiple individuals. Could you please give me some parameter suggestions for SV calls from cuteSV? In addition, I noticed that the "--dup_to_ins" parameter is used in the demo dataset (HG002), so I would like to know when this parameter should be used.

Thanks in advance
Jiao

Parameter to add the variant caller name to the merged file?

Hi,

Is there a parameter to add the variant caller name to the merged file?

Thank you

Merge SVs with high percentage of overlap

Hi,

I'm trying your pipeline to merge my SVs, which were generated by whole genome comparisons among several de novo assemblies, into a single vcf file. I'm wondering:

Is it possible to merge SVs that overlap by a high percentage but fail to meet the "max_dist" requirement? Below are two examples that Jasmine did not merge (run with only the "--output_genotypes" parameter; everything else default). The two examples correspond to the SVs in the figure.

Example 1: Same end breakpoint, 93% overlap

C3 10180346 0_INV27953 N <INV> . PASS END=10361415;SVLEN=181069;SVTYPE=INV;AVG_LEN=181069.000000;AVG_START=10180346.000000;AVG_END=10361414.000000;SUPP_VEC_EXT=10;IDLIST_EXT=INV27953;SUPP_EXT=1;SUPP_VEC=10;SUPP=1;SVMETHOD=JASMINE;IDLIST=INV27953
C3 10192856 1_INV34939 N <INV> . PASS END=10361415;SVLEN=168559;SVTYPE=INV;AVG_LEN=168559.000000;AVG_START=10192856.000000;AVG_END=10361414.000000;SUPP_VEC_EXT=01;IDLIST_EXT=INV34939;SUPP_EXT=1;SUPP_VEC=01;SUPP=1;SVMETHOD=JASMINE;IDLIST=INV34939

Example 2: Same start breakpoint, 96% overlap

C3 29342378 0_INV27963 N <INV> . PASS END=29948423;SVLEN=606045;SVTYPE=INV;AVG_LEN=606045.000000;AVG_START=29342378.000000;AVG_END=29948422.000000;SUPP_VEC_EXT=10;IDLIST_EXT=INV27963;SUPP_EXT=1;SUPP_VEC=10;SUPP=1;SVMETHOD=JASMINE;IDLIST=INV27963
C3 29342378 1_INV34950 N <INV> . PASS END=29973346;SVLEN=630968;SVTYPE=INV;AVG_LEN=630968.000000;AVG_START=29342378.000000;AVG_END=29973345.000000;SUPP_VEC_EXT=01;IDLIST_EXT=INV34950;SUPP_EXT=1;SUPP_VEC=01;SUPP=1;SVMETHOD=JASMINE;IDLIST=INV34950
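To see why these pairs fall outside a distance threshold despite the 93% and 96% overlaps, here is a sketch of the breakpoint distance. It assumes that kd_tree_norm=2 (used in other commands in this thread) means Euclidean distance over (start, end); Jasmine's exact distance and any length-dependent threshold may differ.

```java
// Sketch, assuming kd_tree_norm=2 means Euclidean (L2) distance over (start, end).
// Jasmine's actual distance definition and thresholds may differ.
final class BreakpointDistance {
    static double l2(long start1, long end1, long start2, long end2) {
        double ds = start1 - start2;
        double de = end1 - end2;
        return Math.sqrt(ds * ds + de * de);
    }

    public static void main(String[] args) {
        // Example 1: same END, starts differ by 12510 bp
        System.out.println(l2(10180346, 10361415, 10192856, 10361415)); // 12510.0
        // Example 2: same POS, ends differ by 24923 bp
        System.out.println(l2(29342378, 29948423, 29342378, 29973346)); // 24923.0
    }
}
```

Both distances are in the tens of kilobases, so a fixed max_dist in the hundreds or low thousands of bp would not join them even though the intervals largely overlap.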

For insertions, how should the length of the variant be given in the SVLEN INFO field? Does SVLEN equal the length of the inserted sequence? Is the example below correct?
C1 498768 INS37 N <INS> . PASS END=498768;ChrB=C1;StartB=496550;EndB=496651;Parent=SYN44;VarType=ShV;DupType=.;SVLEN=102;SVTYPE=INS;STRANDS=+

Thank you very much in advance for your help.

Best regards,
Chengcheng

Running Iris reports an error (invalid read names field)

I want to understand what caused the problem. The error is as follows:

Skipping Bomo_Chr1:59811:INS:4 because of invalid read names field: (time = 00:00:00:04.165)
Skipping Bomo_Chr1:71777:INS:5 because of invalid read names field: (time = 00:00:00:04.173)

The command is:
jasmine --output_genotypes file_list=head5.vcf.list out_file=head5.vcf genome_file=genome_assembly.fa --dup_to_ins samtools_path=/export2/software/Bases/samtools/v1.4/bin/samtools ---run_iris bam_list=head5.bam.list out_dir=test02_iris > log 2>&1
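Assuming the "read names field" is the RNAMES INFO key (present in the Sniffles and cuteSV records elsewhere in this thread), a quick check of which input records lack it could look like this sketch. It is a diagnostic aid, not a description of how Iris validates its input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Sketch, assuming the "read names field" is the RNAMES INFO key:
// report whether a VCF data line carries a non-empty RNAMES value.
final class RnamesCheck {
    static boolean hasReadNames(String vcfLine) {
        String[] cols = vcfLine.split("\t");
        if (cols.length < 8) {
            return false;
        }
        for (String field : cols[7].split(";")) {
            if (field.startsWith("RNAMES=")) {
                return field.length() > "RNAMES=".length();
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Reads a VCF on stdin and prints the ID of every record without read names.
        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = reader.readLine()) != null) {
            if (!line.startsWith("#") && !hasReadNames(line)) {
                System.out.println(line.split("\t")[2]);
            }
        }
    }
}
```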

Corrupted VCF when first VCF is empty

Hi,

I reproducibly encountered corrupted VCF headers (example below) which baffled me for a while, but I think I found a clue: if the first VCF in the file_list argument has no variants (but an otherwise intact header) the merged file from jasmine ends up corrupted.

These files are from CuteSV SV calling and have been processed with iris. Files can be empty as this is just a small genomic locus I am calling in.

##FORMAT=<ID=GT,Number=1,Type=String,Description="The genotype of the variant">
##FORMAT=<ID=IS,Number=1,Type=String,Description="Whether or not the variant call was marked as specific due to high read support and length">
##FORMAT=<ID=OT,Number=1,Type=String,Description="The original type of the variant">
##FORMAT=<ID=DV,Number=1,Type=String,Description="The number of reads supporting the variant sequence">
##FORMAT=<ID=DR,Number=1,Type=String,Description="The number of reads supporting the reference sequence">
chr11   19197368        1_cuteSV.DEL.0  CCCTCCCTCCCTCCCTCCCTCCCTTCCTTCCTTCCTTCCTTCCTTC  C       162.5   PASS    PRECISE;SVTYPE=DEL;SVLEN=-45;END=19197413;CIPOS=-1,1;CILEN=-1,1;RE=24;RNAMES=03f165e6-ce6e-4878-8dad-ded5f3e01672,0d6393b1-1e0a-44ba-a870-6d2f1aa0faff,c8d946f1-41fa-4861-a416-38cee5dd0c6b,51639ac1-a885-4d04-b695-9090d48253f4,2135924b-f991-4fdf-80f7-e501d5c53a08,0e220f01-b9fb-41e6-8934-8fbdd263dbfc,813c72e8-b1cf-4a73-baff-044c15914da7,b08bc8ca-7dbd-4118-b188-672595fe18c8,b153c406-3511-44fd-822c-992e8679ddf4,d58f6f58-a198-4b90-9b94-8d7c3a14f243,e48aaa29-7fc2-4dbb-994d-6444d84d8ed9,486f270f-4169-4421-a8bc-7764f7829a0d,15e699b4-90c5-4b0a-869d-0fb88a8c0255,f3ead806-bcf1-405b-9067-0a8c6336c0a2,835bb96f-ff70-4f0a-bb4d-b0ca828cfc32,11504736-369a-493e-a239-54aaf176b140,165bbeab-666f-4fee-b57e-60338adb0892,19ba3300-9cf7-476d-ad8f-f8b8f3c119d6,0d806d4c-7696-495a-9098-d7858320850c,6dfeb9e0-8664-4d56-bc00-caabfd453e88,cc6ca9fe-e42a-4bc8-98b8-b76f76e53c00,27356926-be53-4540-b3e4-632db18112ab,6ffca776-54bb-42b2-9b91-bb8481eb6022,17f570c0-7393-43cb-a007-f99d6303ac34;STRAND=+-;IRIS_PROCESSED=1;IRIS_REFINED=0;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-45.000000;AVG_START=19197368.000000;AVG_END=19197413.000000;SUPP_VEC_EXT=010000000000;IDLIST_EXT=cuteSV.DEL.0;SUPP_EXT=1;SUPP_VEC=010000000000;SUPP=1;SVMETHOD=JASMINE;IDLIST=cuteSV.DEL.0  GT:IS:OT:DV:DR  ./.:NA:NA:NA:NA 1/1:.:DEL:24:7  ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA
chr11   19073317        2_cuteSV.DEL.0  AATATTGATTGTGCACTCCTTGCTACTTGATAGGCCTTGTTTTAGGTGCTGGAACTGGCACACTGGGTGAGGTCTGAAATTCAGCTCCTGACTCCTGGCCCAGTGCTCTTCCCAACAGATTCCCATCCCCACCTACCTCACCCAGAGCCCAGCCTCCTGGATATAGGTCAGGGAAGCAGTGCCCAGAGCACCACCTTCCCCTCAGGACCCTGATAGGATATTGCCTGCAGCATTTTCCAGCAGGCACAAAGGGCTGAGAGGTACATCTTTGCAGTTTCCTTCTTCCCCTAGGACATTCTGGGAACCACAGGGATATGCACTACCCTTTCCTTCCTTTTGCAGGGTGGCGACAGGGACCAGGGCCACACTCACTGGGCCTAGGGGCAACCCTGGTTCAGACACTCCAGGCTACATGACCTCAGATGACTCCTGTAACCCCCTGAAGCCTTGGTTTCCTCATCTCTAAAAAGGTGATAGTGATACAGGAACTAGAAAGAAATTATTTAGGCAGATAGTGAGGGTAAGAGAGTCCTCAGTAAGGTTTCCTTTTAATAAAAAGCAGCCCCTAACTTGTTTCTTTTCTAAGAAAAAGCAACCTGAAAAATCAAGCTGCAAGCATAGATAAGCAAGCTAAAAGCTCACATAGGTAAATACTGGCAGCTGTGGCAATAGAAAAGCGATATCTGGAAGCCAGGTATATTCAACACGGAGGTTCCCTCTTCCCTTTCCTTTGTCACCACATGTGCAGTAAAAAGCAGGCAACATGGCAC      A       32.8    PASS    PRECISE;SVTYPE=DEL;SVLEN=-769;END=19074086;CIPOS=-1,1;CILEN=-1,1;RE=12;RNAMES=fa78ce1b-eb7f-4e62-8d44-5b9754533442,b2f18d6a-be6b-4d31-b387-f35ec9d1b1ef,361eab6d-7051-4345-9d3b-016d9c7b8ce6,28acb24d-a2ea-43f2-bef9-430e35f2f51d,812548f4-be03-4154-827d-a7e1aa0da954,871a0736-f0a6-44c9-a1b4-30fbc4b91a29,b8e0fd9e-717b-472a-a2c3-103e619bf385,be441563-02c0-42bd-b357-f7b6c48b414f,d412fbe0-38c4-4f87-9b93-3238a245a209,e7a2525e-697d-4e3d-8de6-4765e758350e,dca637c4-dec6-4d93-8e1a-bd16b9fe6ada,275d9909-9f80-40d2-8433-9666dc0111b5;STRAND=+-;IRIS_PROCESSED=1;IRIS_REFINED=0;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-769.000000;AVG_START=19073317.000000;AVG_END=19074086.000000;SUPP_VEC_EXT=001000000000;IDLIST_EXT=cuteSV.DEL.0;SUPP_EXT=1;SUPP_VEC=001000000000;SUPP=1;SVMETHOD=JASMINE;IDLIST=cuteSV.DEL.0    GT:IS:OT:DV:DR  ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA 0/1:.:DEL:12:20 ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA ./.:NA:NA:NA:NA
chr11   19133944        1_cuteSV.INS.0  C       CATATATATATATATATATATATATATACAC 297.1   PASS    PRECISE;SVTYPE=INS;SVLEN=30;END=19133944;CIPOS=-1,1;CILEN=-0,0;RE=52;RNAMES=02221b02-eee5-4518-8adc-2e8565f7eb9e,11f677cd-0547-4542-97d3-b4757e5f1d8e,1a2bee2a-f5f5-4a86-a8a7-fa2483180653,1a54c7c4-b281-4c73-8390-24948a4cd3ab,25fb91e5-75c3-443f-91f9-5ce182ba574f,39889763-bb2f-45f4-bc67-7850a5ebcf9d,3a52eca4-1304-4d2b-a6a1-d9df904eba2e,3de5150f-f6bc-49f5-a91d-051dee11fdd7,40687879-70ee-4551-8919-ea8e69489405,45f8979d-d899-414b-a231-33c747cb275b,515f3861-581c-45bd-9ffe-d27dab099f92,54f72ab5-7dd0-4aa6-aa07-7563b7044a4f,5998cad6-0c12-4d3a-89eb-3bc0fad178b1,5a0c6cc2-acac-4f09-af2b-4fa25a06f2b4,5cc6bf1e-0ae4-419b-bcbb-e35d4239f8f7,5ce9bedd-f37a-4a25-9078-b3c3d560b787,6dcbbf51-96cf-491a-970c-b1e154e68e92,7121883b-9249-46df-9e19-76829fea8a71,87c091be-3b97-4158-b4a2-cff359848917,908fe6fa-a765-40bc-8ee5-9844dd143ad6,92f26169-3026-41d3-af72-5da0a7f88417,9c36ffbe-94f3-4a1a-999d-ed16d13ded35,c4ce6b77-ada9-469e-a2ca-7b62c57eae0e,c966d4be-6178-4658-95e2-2c85b5927aa6,cc04f9cd-184d-4936-a29b-e07662c2388a,cc6ca9fe-e42a-4bc8-98b8-b76f76e53c00,d40b94d0-c0ad-4718-bfc1-388163b717ad,d7edc355-ffdd-490b-bfb8-571ce32f33e6,da307122-9d3b-443c-a376-798165478e7a,db392384-f53b-44fc-b09c-9f6ccb21027a,eae91963-f8ad-492b-bc13-2e25b3890d76,276106ab-4795-4df8-b33c-bde16f088bd5,517e7258-ea50-46f2-a82f-8a17be1d8765,6e39f1ad-4d8c-44d9-bfd8-4e3e6672f38d,0fb60110-3956-4933-8fcd-82063cecd52b,946e9c4f-dbfa-4849-a65e-c05fef867a95,1e8dd61d-c7b8-431f-bd74-3986e9d7c568,418b6332-60d0-499c-abad-5e0da48795b8,5d5e2f00-f2e8-4371-bbb1-e48ac35dd242,6dec7bc8-269e-46d4-b94d-907eb8109b0e,8722574f-78aa-4605-9029-90767c859c75,afa118d5-a7bb-4f92-aa34-6eaf04d6bdc7,c99bdad8-dec2-4316-aa4e-3d587328f14b,af7f2a7b-2997-41ce-ac39-096ba0c7c170,26b82949-0260-4d91-951f-e191e7724706,4da6b0e0-6b85-4762-95b1-09466732780f,8b67bbad-8d4a-4689-a537-826c18330ea7,ccf9fae1-10f0-452f-8479-fd1b154a0d34,8856fa2b-a0b5-4fc3-8abd-3c9ae1be02f5,389a4e9c-55bc-4
78e-a190-e67d1fb0189e,a512ef9b-e691-4cc6-bd67-9ed839b7c595,16eccd79-5064-4155-829e-ab0779e017ac;IRIS_PROCESSED=1;IRIS_REFINED=1;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=31.900000;AVG_START=19133944.000000;AVG_END=19133944.000000;SUPP_VEC_EXT=010111111111;IDLIST_EXT=cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0;SUPP_EXT=10;SUPP_VEC=010111111111;SUPP=10;SVMETHOD=JASMINE;IDLIST=cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0,cuteSV.INS.0      GT:IS:OT:DV:DR  ./.:NA:NA:NA:NA 0/1:.:INS:52:26 ./.:NA:NA:NA:NA 0/0:.:INS:13:40 0/1:.:INS:18:31 0/1:.:INS:16:14 0/0:.:INS:10:31 0/1:.:INS:10:22 0/0:.:INS:10:31 0/1:.:INS:14:25 0/1:.:INS:18:36 0/1:.:INS:11:13
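Until this is fixed, one workaround worth trying is to detect inputs that have a header but no records, and either list a non-empty VCF first in file_list or leave the empty ones out. Note that either change alters the order of SUPP_VEC positions and sample columns. A minimal sketch of the check:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch: report which VCFs contain at least one data record
// (a non-empty line that does not start with '#').
final class EmptyVcfCheck {
    static boolean hasRecords(Stream<String> lines) {
        return lines.anyMatch(line -> !line.isEmpty() && !line.startsWith("#"));
    }

    public static void main(String[] args) throws IOException {
        // Each argument is one VCF path, e.g. the entries of file_list.
        for (String arg : args) {
            try (Stream<String> lines = Files.lines(Paths.get(arg))) {
                System.out.println(arg + "\t" + (hasRecords(lines) ? "has records" : "EMPTY"));
            }
        }
    }
}
```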

Cheers,
Wouter

Did the editDistanceSimilarity miscalculate?

With Jaccard distance, a smaller stringSimilarity means the two sequences are less similar, which is consistent with passesStringSimilarity.
But with edit distance, a smaller stringSimilarity (as defined in editDistanceSimilarity) means the two sequences are more similar, which conflicts with passesStringSimilarity.
Is this my misunderstanding or a bug in the code?
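The two conventions point in opposite directions: a Jaccard similarity grows as sequences become more alike, while a raw edit distance grows as they become less alike. A common way to put edit distance on the same scale is 1 - d / max(|a|, |b|). The sketch below uses standard Levenshtein distance and that normalization; it is not Jasmine's editDistanceSimilarity, whose exact definition is not reproduced here.

```java
// Sketch: Levenshtein distance converted to a similarity in [0, 1],
// so that "similarity >= threshold" means "similar enough" as with Jaccard.
final class EditSimilarity {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(substitution, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0; // two empty sequences are identical
        }
        return 1.0 - (double) levenshtein(a, b) / maxLen;
    }

    static boolean passes(String a, String b, double minSimilarity) {
        return similarity(a, b) >= minSimilarity;
    }

    public static void main(String[] args) {
        System.out.println(similarity("ACGT", "ACGA"));   // 0.75
        System.out.println(passes("ACGT", "ACGA", 0.9)); // false
    }
}
```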

--dup_to_ins

I have two questions about --dup_to_ins from merging SVs with Jasmine according to the suggested pipeline:

After running the command jasmine --dup_to_ins --preprocess_only vcf_filelist=<vcf> --comma_filelist, not all duplications were converted to insertions: for one sample, 107 duplications were converted to "INS" and 32 were kept as "DUP". What is the criterion for converting duplications to insertions?
After running the command jasmine --dup_to_ins --postprocess_only out_file=<mergedvcf>, a VCF file with the suffix _dupToIns.vcf was generated in the output directory. However, the _dupToIns.vcf file seems to be the VCF before conversion, while <mergedvcf> was updated to the VCF after conversion, which is very confusing.

Thanks.

Zhongqu

Genotype columns

Hello,

We are analyzing PacBio vcf files merged via Jasmine, and were wondering how the genotype fields (columns 9 & 10) are determined from the original files. Do you know if the genotype columns are taken from one of the original vcf files, or is there any special handling to average the values?

I am asking because PacBio PBSV vcfs have SVTYPE=cnv entries, and we are interested in the copy number information from the genotype columns. By spot-checking a couple of examples, it seems that these columns are taken from the first matching vcf. For example, the following calls from two vcfs

Call from first vcf:
chr9 95697579 pbsv.CNV.1043 G . PASS SVTYPE=cnv;END=95703612;SVLEN=6033 CN 7

call from second vcf:
chr9 95697579 pbsv.CNV.1128 G . PASS SVTYPE=cnv;END=95703634;SVLEN=6055 CN 4

are merged by Jasmine (along with 500 VCFs without matching calls), and the merged record has a copy number identical to the first VCF's (CN=7):

chr9 95697579 106_pbsv.CNV.1044 G . PASS SVTYPE=cnv;END=95703612;SVLEN=6033;STARTVARIANCE=0.000000;ENDVARIANCE=122.000000;AVG_LEN=6044.000000;AVG_START=95697579.000000;AVG_END=95703623.000000;SUPP_VEC_EXT=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000;IDLIST_EXT=pbsv.CNV.1044,pbsv.CNV.1128;SUPP_EXT=2;SUPP_VEC=0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000;SUPP=2;SVMETHOD=JASMINE;IDLIST=pbsv.CNV.1044,pbsv.CNV.1128 CN 7

Thanks in advance for your response!

Best,
Carl

merging quits at the last step

Hello,

I'm trying to combine the SVs called with NGMLR+Sniffles for 27 individuals. With Iris enabled, everything runs smoothly until it reaches the step that outputs results, where it abruptly quits (after writing results for one or two chromosomes). Any idea what's wrong?

My command:

java -Xmx200g -Djava.io.tmpdir=$TMPDIR -cp ${JASMINE}/src:${JASMINE}/Iris/src Main file_list=vcf.fofn out_file=jasmine.vcf bam_list=bam.fofn genome_file=reference.fasta threads=16  out_dir=jasmine-temp --run_iris --output_genotypes

The error:

...
Merging graph ID: scaf_37_INS_+-
Merging complete - outputting results
Exception in thread "main" java.lang.NullPointerException
        at VariantOutput$VariantGraph.updateOutputVariant(VariantOutput.java:293)
        at VariantOutput$VariantGraph.processVariant(VariantOutput.java:561)
        at VariantOutput.writeMergedVariants(VariantOutput.java:103)
        at Main.runJasmine(Main.java:76)
        at Main.main(Main.java:22)

(The files are sorted and have not been modified after running Sniffles, except to remove non-chr SVs.)

Thanks in advance!

Support for vcf.gz files

It seems Jasmine does not support vcf.gz or bcf files:

Warning: input.vcf.gz ends with .gz, but (b)gzipped VCFs are not accepted
Exception in thread "main" java.lang.Exception: input.vcf.gz is a gzipped file, but only unzipped VCFs are accepted

Since it is quite a standard format, would it be possible for Jasmine to support both vcf.gz and bcf files?
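Until native support exists, a workaround is to decompress before calling Jasmine. bgzip output is a series of standard gzip members, which a gzip reader can read in sequence. BCF is a different binary format and would need conversion with a tool such as bcftools view instead; this sketch covers .vcf.gz only.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

// Workaround sketch: decompress input.vcf.gz to a plain .vcf before passing it to Jasmine.
// bgzip output is a series of concatenated gzip members; GZIPInputStream reads them in sequence.
final class GunzipVcf {
    static void gunzip(InputStream compressed, OutputStream out) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // usage: java GunzipVcf input.vcf.gz input.vcf
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(args[1]))) {
            gunzip(in, out);
        }
    }
}
```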
thanks,

Expected output?

Hi,

I expected to get a VCF with a new genotype column for each input variant file ('wide' format?), but what I get is a VCF with just a single "sample" (identifier taken from the first VCF) and a long list of variants that seems to iterate through all the variants in my files: it starts with chr1, chr2, chr3, chr4, chr5, etc. for sample1, then restarts at chr1 for the second sample, and so on. Is SUPP_VEC the only way for me to connect variants in the merged file with the original sample?

Or did I do something wrong?

Thanks,
Wouter

snakemake pipeline rule merge_sorted

Hi,

Thanks for providing the Snakemake pipeline. I am unfamiliar with Snakemake syntax. I encountered a problem when running the pipeline at the intra-sample merging step, which I imagine is for samples sequenced on different platforms. I only have one FASTQ input file for each sample. Do you have recommendations on how to disable the intra-sample merging step entirely? It creates problems in the samtools merge step because that step expects more than one sample. I tried removing the rule directly, but that caused more problems.

Thanks,
Wei

java.lang.NullPointerException

Hi, thanks for this awesome tool! I'm currently integrating it into my pipeline, but I'm running into this error with minimal test data:

delly_PosCon3.vcf has 3 variants
manta_PosCon3.vcf has 1 variants
whamg_PosCon3.vcf has 1 variants
Number of threads: 2
Merging graph ID: null
Merging graph ID: chr14_DEL_
Exception in thread "main" java.lang.NullPointerException
        at java.base/java.util.TreeMap.getEntry(TreeMap.java:345)
        at java.base/java.util.TreeMap.get(TreeMap.java:277)
        at ParallelMerger$MyThread.run(ParallelMerger.java:86)
        at ParallelMerger.run(ParallelMerger.java:62)
        at Main.runJasmine(Main.java:71)
        at Main.main(Main.java:22)

The commands I used are:

$ ls *.vcf > vcfs.txt
$ jasmine \
    file_list=vcfs.txt \
    out_file=PosCon3.vcf \
    threads=2 \
    min_support=3

And the test data used is inside this folder (3 small VCF files):
jasminesv.tar.gz

I hope you are able to recreate the error with this information.

Cheers,
Nicolas

Jasmine "missing" a variant?

Thank you for writing this great tool! We have somatic variant calls from cancer cell lines that were called as unpaired tumor samples with Manta, and would like to filter them to remove variants found in a panel of normal germline samples run in Manta's diploid calling mode.

I found a variant that Jasmine reports as present only in the cell line and in none of the normals, whereas merging with SURVIVOR showed the same variant NOT present in the cell line but present in three of the normals. Looking at the VCFs by hand, the variant appears to be present in both the cell line and some of the normals. I'm not sure why Jasmine isn't seeing it in the normals. Could it be because it is an incompletely assembled insertion, for which Manta does not output an SVLEN in the tumor-only VCF, whereas it does output SVLEN in the diploid VCF? I also thought it might be because the ALT allele isn't written the same way in the cell line and the normals, but when I tested this by changing the normal file's ALT to "<INS>", as it is in the cell line, Jasmine still did not "see" the variant in the normal.

Original call in cell line:
chr1 1033410 MantaINS:22087:7:7:0:2:0 T <INS> . PASS END=1033410;SVTYPE=INS;CIPOS=0,19;CIEND=0,19;HOMLEN=19;HOMSEQ=CAATAGTCGGGTAGTTCTT;LEFT_SVINSSEQ=CAATAGTCGGGTAGTTCTTTTATTTTTTTTTTTATTTATTTATGATAGTCACACAGAGAGAGAGAGAGAGGCAGAGACATAGGCAGAGGGAGAAGCAGGCTCCATGTACCGGGAGCCCGACGTGGGATTCGATCCTGGG;RIGHT_SVINSSEQ=GGAGAAGCAGGCGCCATGTACCGGGAGCCCGACGTGGGATTCGATCCTGGGTCTCCAGGATCACGCCCTAGGCCAAAGGCAGGCGCTAAACCGCTGCGCCACCCAGGGATCC GT:PR:SR 0/1:8,2:14,34

Original call in one of the normal VCFs:
chr1 1033410 MantaINS:141331:5:5:0:0:0 T TCAATAGTCGGGTAGTTCTTTTATTTTTTTTTTTATTTATTTATGATAGTCACACAGAGAGAGAGAGAGAGGCAGAGACATAGGCAGAGGGAGAAGCAGGCTCCATGTACCGGGAGCCCGACGTGGGATTCGATCCTGGGTCTCCAGGATCACGCCCTAGGCCAAAGGCAGGCGCTAAACCGCTGCGCCACCCAGGGATCC 999 PASS END=1033410;SVTYPE=INS;SVLEN=200;CIGAR=1M200I;CIPOS=0,19;HOMLEN=19;HOMSEQ=CAATAGTCGGGTAGTTCTT GT:FT:GQ:PL:PR:SR 1/1:PASS:151:999,154,0:0,15:0,55

Jasmine merge output. Not showing genotypes here, but you can see by the SUPP_VEC that the variant is only noted in the first sample (the cell line):
chr1 1033410 MantaINS:22087:7:7:0:2:0 T <INS> . PASS END=1033410;SVTYPE=INS;CIPOS=0,19;CIEND=0,19;HOMLEN=19;HOMSEQ=CAATAGTCGGGTAGTTCTT;LEFT_SVINSSEQ=CAATAGTCGGGTAGTTCTTTTATTTTTTTTTTTATTTATTTATGATAGTCACACAGAGAGAGAGAGAGAGGCAGAGACATAGGCAGAGGGAGAAGCAGGCTCCATGTACCGGGAGCCCGACGTGGGATTCGATCCTGGG;RIGHT_SVINSSEQ=GGAGAAGCAGGCGCCATGTACCGGGAGCCCGACGTGGGATTCGATCCTGGGTCTCCAGGATCACGCCCTAGGCCAAAGGCAGGCGCTAAACCGCTGCGCCACCCAGGGATCC;IS_SPECIFIC=1;SVLEN=0;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=0.000000;AVG_START=1033410.000000;AVG_END=1033410.000000;SUPP_VEC_EXT=100000000000000000000000;IDLIST_EXT=MantaINS:22087:7:7:0:2:0;SUPP_EXT=1;SUPP_VEC=100000000000000000000000;SUPP=1;SVMETHOD=JASMINE;IDLIST=MantaINS:22087:7:7:0:2:0

The Jasmine command I used was:
jasmine file_list=${id}_list.txt --nonlinear_dist max-dist=1000 --mark_specific spec_len=0 spec_reads=0 --normalize_type --output_genotypes --keep_var_ids out_file=${id}_pon_merged.vcf
Thank you for your help with this!

Best,
Kate

Java OutOfMemoryError

Hi,
When I try to merge 33 VCFs from SyRI (about 2,000,000 variants per sample), I get this error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
The output before the error was:
syri.vcf has 2051564 variants
syri.vcf has 1968258 variants
syri.vcf has 3679710 variants
syri.vcf has 3842100 variants
syri.vcf has 3772814 variants
syri.vcf has 1718254 variants
syri.vcf has 3279550 variants
syri.vcf has 1932339 variants
syri.vcf has 3027315 variants
syri.vcf has 3343637 variants
syri.vcf has 3623126 variants
syri.vcf has 2107628 variants
syri.vcf has 3640813 variants
syri.vcf has 2628045 variants
What can I do? My command:
jasmine file_list=sv.file out_file=merged.vcf max_dist_linear=0.5 min_dist=100 kd_tree_norm=2 spec_len=50 min_overlap=0.9

Altered genotype?/handling translocations

Hi, thank you very much for developing a nice tool. I have two questions regarding Jasmine.
I used BAM -> Sniffles -> Jasmine to obtain a master VCF file with multiple individuals.

(1) Altered genotype?
In the Jasmine VCF, this locus looks heterozygous in all six individuals, but in IGV the BAM files look like del/del in one individual and ref/ref in another. How can we interpret this? Am I misinterpreting the VCF result?
I first asked the Sniffles developers, and they answered that Jasmine can alter genotypes. Do you have any idea how this happens?

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  LLsal   Barry   tanner  Bond    Klopp   Brian
ssa01   139041651   0_24650 N   <DEL>   .   PASS    PRECISE;SVMETHOD=JASMINE;CHR2=ssa01;END=139043334;
0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2
#ssa01  139041651   0_24650 N   <DEL>   .   PASS    PRECISE;SVMETHOD=JASMINE;CHR2=ssa01;END=139043334;STD_quant_start=3.33542;STD_quant_stop=8.2991;Kurtosis_quant_start=1.31966;Kurtosis_quant_stop=2.30177;SVTYPE=DEL;RNAMES=4250ef10-dff4-4c5c-9c75-8ad085dcf7a9,43cd9070-cfef-48e3-af31-9ab39d2b93e7,49e571b9-1c82-405c-907e-f019f88f37de,8ba13242-89a8-46b0-8bbe-9ded59f71356,9674455c-0317-4798-9419-cb9e3c6db356,969e1d1e-d696-4d44-8c07-892fbfa14bd7,f6d1babd-83a1-4876-859b-0bff91cad0ce,f7a73e18-1dd5-4032-ac7e-55f61225e5c8;SUPTYPE=SR;SVLEN=-1683;STRANDS=+-;RE=8;REF_strand=1,1;AF=0.8;CONFLICT=0;OLDTYPE=DEL;IS_SPECIFIC=0;STARTVARIANCE=-4.000000;ENDVARIANCE=0.000000;AVG_LEN=-1683.000000;AVG_START=139041651.000000;AVG_END=139043334.000000;SUPP_VEC_EXT=111111;IDLIST_EXT=24650,24650,24650,24650,24650,24650;SUPP_EXT=6;SUPP_VEC=111111;SUPP=6;IDLIST=24650,24650,24650,24650,24650,24650;REFINEDALT=. GT:IS:OT:OS:DV:DR   0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2 0|1:0:DEL:.:8:2

(2) Translocations

Translocations cannot be indexed by tabix because the reported end position (END) is smaller than the start position.
Currently, I remove translocations from the VCF file. If you have a good way to handle translocations with tabix, please let me know.

(base) [mariesai@cn-1 Jasmine]$ tabix -p vcf  jasmine_six_phased.head.vcf.gz
[E::hts_idx_push] Invalid record on sequence #1: end 1803791 < begin 28704346
ssa01   28704346    0_512026    .   <TRA>   .SVMETHOD=JASMINE;SVTYPE=TRA;CHR2=ssa29;END=1803791;

Thank you very much for your help!

SUPP category in the output

Hi! Thanks for developing Jasmine - it looks really useful! :)

I have two quick questions: 1) What is the difference between "SUPP" and "SUPP_EXT" in the output, as they appear to be identical? 2) At the moment "SUPP" is defined as a string; do you plan to change this to a float in the future for easier filtering?

Many thanks in advance!
Miyako

Problem with --output_genotypes parameter

Hi,

I've noticed that when I run Jasmine with the --output_genotypes parameter, I get the following error:

Exception in thread "main" java.lang.NullPointerException
    at AddGenotypes.addGenotypes(AddGenotypes.java:150)
    at PipelineManager.addGenotypes(PipelineManager.java:191)
    at Main.postprocess(Main.java:91)
    at Main.main(Main.java:27)

I only have these genotypes in my VCF files (generated by Sniffles):

0/0
0/1
1/1

Any idea what can be the problem?
Thanks,

Error: A JNI error has occurred,

Dear Doctor,
Sorry to bother you. I installed jasminesv with conda install jasminesv, but when I run jasmine -h, the error shown below occurs. I checked the versions of java and javac, and they are the same. I don't know how to deal with this issue and hope for your suggestions.

java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (Zulu 8.52.0.23-CA-linux64) (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (Zulu 8.52.0.23-CA-linux64) (build 25.282-b08, mixed mode)

javac -version
javac 1.8.0_282

This is the error message:

Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: Main has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:757)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:601)
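The error says Main.class has class file version 55.0, which is Java 11, while the installed runtime (1.8.0_282) accepts only up to 52.0, which is Java 8. Matching java and javac versions therefore does not help; the runtime that launches Jasmine must be Java 11 or newer. The bash sketch below reads the major version from a .class file header and maps it to a Java release. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Read the class file major version. Bytes 0-3 hold the magic number
# 0xCAFEBABE, bytes 4-5 the minor version, bytes 6-7 the big-endian major.
class_major_version() {
  local hi lo
  read -r hi lo < <(od -An -tu1 -j6 -N2 "$1")
  echo $(( hi * 256 + lo ))
}

# From Java 5 (major 49) onward, the Java release is major - 44,
# so 52 -> Java 8 and 55 -> Java 11.
java_release_for_class_version() {
  echo $(( $1 - 44 ))
}
```

For example, `java_release_for_class_version 55` prints 11, the minimum Java version needed to load this Main.class.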

Continue Jasmine from where it failed

Dear Melanie,

Hi, based on your advice, I was running Jasmine on my eleven samples. I have two questions regarding my job.

(1) The job stopped before finishing, and there is no obvious error in the output file. The output directory contains several (I guess) completed "XX(0-9)_minimap.SVs.phased_dupToIns_irisRefined.vcf" files and one unfinished "10_minimap.SVs.phased_dupToIns_irisRefined.vcf" file.

Slurm Job_id=13504607 Name=jasmine629 Ended, Run time 14-14:16:44, FAILED, ExitCode 1

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
13504607     jasmine629    hugemem       nmbu         60     FAILED      1:0 
13504607.ba+      batch                  nmbu         60     FAILED      1:0

Is it possible to resume the job where it stopped (at individual 10) rather than starting everything over? It has taken a very long time so far.

(2) So far it has taken more than 14 days to process 10 samples and produce the "minimap.SVs.phased_dupToIns_irisRefined.vcf" files. Is that expected? Is there a way to speed up the process? My script is below.

#!/bin/bash
#SBATCH --ntasks=60               # 60 tasks (CPU cores)
#SBATCH --nodes=1                # Use 1 node
#SBATCH --job-name=jasmine629
#SBATCH --mem=30G                 # Memory per node (default is 3GB per CPU)
#SBATCH --partition=hugemem,orion
#SBATCH --mail-user=<email>       # Email me when job is done.
#SBATCH --mail-type=END
#SBATCH --output=%x_%J_%a.out

module load Anaconda3
conda activate jasmine
module load SAMtools
module load HTSlib
module load VCFtools

vcf_list='./vcf.list'
bam_list='./bam.list'
genome_file='/CHR_selected.fa'

jasmine file_list="$vcf_list" out_file=./jasmine11.629.vcf genome_file="$genome_file" bam_list="$bam_list" threads=60 --output_genotypes --normalize_type --dup_to_ins --run_iris iris_args=--keep_long_variants --default_zero_genotype --ignore_strand

I thought threads=60 would apply throughout the job, but perhaps I have misused this option.

Thank you very much, again.
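To put the reported run time (14-14:16:44) in per-sample terms, the Slurm elapsed format [D-]HH:MM:SS can be converted to seconds. The bash sketch below handles only that format (not the shorter MM:SS form), and the function name `slurm_elapsed_to_seconds` is hypothetical.

```bash
#!/usr/bin/env bash
# Convert a Slurm elapsed time of the form [D-]HH:MM:SS to seconds.
slurm_elapsed_to_seconds() {
  local t="$1" days=0 h m s
  if [[ "$t" == *-* ]]; then
    days="${t%%-*}"          # part before the dash
    t="${t#*-}"              # part after the dash
  fi
  IFS=: read -r h m s <<< "$t"
  # 10# forces base 10 so fields such as "08" are not read as octal
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}
```

Here `slurm_elapsed_to_seconds 14-14:16:44` gives 1261004 seconds. Assuming the whole run went into the 10 completed refinement files, `awk -v s=1261004 'BEGIN { printf "%.1f\n", s / 10 / 3600 }'` prints 35.0, i.e. roughly 35 hours per sample.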

