Giter Club home page Giter Club logo

truvari's Introduction

PyPI version pylint FuncTests coverage develop Downloads

Logo
Toolkit for benchmarking, merging, and annotating Structural Variants

πŸ“š WIKI page has detailed documentation.
πŸ“ˆ See Updates on new versions.
πŸ“ Read our Papers (#1, #2) to learn more.

πŸ’» Installation

Truvari uses Python 3.6+ and can be installed with pip:

  python3 -m pip install Truvari 

For details and more installation options, see Installation on the wiki.

⏩ Quick Start

Each sub-command contains help documentation. Start with truvari -h to see available commands.

The current most common Truvari use case is for structural variation benchmarking:

  truvari bench -b base.vcf.gz -c comp.vcf.gz -o output_dir/

🧬 Truvari Commands

  • bench - Performance metrics from comparison of two VCFs
  • collapse - Collapse possibly redundant VCF entries
  • anno - Add SV annotations to a VCF
  • consistency - Consistency report between multiple VCFs
  • vcf2df - Turn a VCF into a pandas DataFrame
  • segment - Normalization of SVs into disjointed genomic regions
  • stratify - Count variants per-region in vcf
  • divide - Divide a VCF into independent shards
  • phab - Harmonize variant representations using MSA
  • refine - Automated bench result refinement with phab

πŸ”Ž More Information

All documentation about Truvari is on the WIKI. Additional information about using Truvari can be found in Discussions

truvari's People

Contributors

acenglish avatar bnoyvert avatar caspargross avatar chapmanb avatar ctsa avatar github-actions[bot] avatar hackerfriendly avatar mohammedkhalfan avatar pwwang avatar scalavision avatar turion avatar wwliao avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

truvari's Issues

Definition of "refdist"

Hi,

I'd like to ask for more detail of the definition of "refdist".
The first step in methodology is:
Fetch CompCalls overlapping within refdist.

I found it is not required that
base_call_start - redist < comp_call_start < base_call_start + refdist AND
base_call_end - redist < comp_call_end < base_call_end + refdist

Does this step fetch all CompCalls having at least 1pb overlapping within base_call_start - refdist to base_call_end + refdist?

Thank you very much!

Latest intervaltree API update breaks on current truvari code

Hi, Thanks for releasing truvari. I just installed truvari according to the readme and got the following error the first time I ran it:

AttributeError: 'IntervalTree' object has no attribute 'search'

This looks like it is caused by a breaking API change in the intervaltree 3 release. I changed my installation to use:

pip install intervaltree==2.1.0
pip install pyvcf python-Levenshtein progressbar2 pysam pyfaidx

...instead, and everything seems to be working as documented now. Could you update the README installation instructions to reflect this change?

TypeError: values expected to be 1-tuple, given len=2

Hello, I'm having some issues on truvari 1.3.4 and 1.3.2, but not on 1.2

truvari -b HG002_SVs_Tier1_v0.6.vcf.gz -c NIST-002_merged_minimap2_ont_s2_sorted.vcf.gz -o Test_ont -r 2000 --pctsim 0 --passonly --includebed HG002_SVs_Tier1_v0.6.bed
2019-12-26 18:46:36,382 [INFO] Running /home/tintest/miniconda3/bin/truvari -b HG002_SVs_Tier1_v0.6.vcf.gz -c NIST-002_merged_minimap2_ont_s2_sorted.vcf.gz -o Test_ont -r 2000 --pctsim 0 --passonly --includebed HG002_SVs_Tier1_v0
.6.bed
2019-12-26 18:46:36,383 [INFO] Params:
{
    "base": "HG002_SVs_Tier1_v0.6.vcf.gz",
    "comp": "NIST-002_merged_minimap2_ont_s2_sorted.vcf.gz",
    "output": "Test_ont",
    "reference": null,
    "giabreport": false,
    "debug": false,
    "prog": false,
    "refdist": 2000,
    "pctsim": 0.0,
    "pctsize": 0.7,
    "pctovl": 0.0,
    "typeignore": false,
    "gtcomp": false,
    "bSample": null,
    "cSample": null,
    "sizemin": 50,
    "sizefilt": 30,
    "sizemax": 50000,
    "passonly": true,
    "no_ref": false,
    "includebed": "HG002_SVs_Tier1_v0.6.bed",
    "multimatch": false
}
[W::hts_idx_load2] The index file is older than the data file: HG002_SVs_Tier1_v0.6.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: HG002_SVs_Tier1_v0.6.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: NIST-002_merged_minimap2_ont_s2_sorted.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: NIST-002_merged_minimap2_ont_s2_sorted.vcf.gz.tbi
2019-12-26 18:46:37,234 [INFO] Including 34830 bed regions
2019-12-26 18:46:37,234 [INFO] Creating call interval tree for overlap search
2019-12-26 18:58:31,367 [INFO] 14398 call variants in total
2019-12-26 18:58:31,367 [INFO] 12793 call variants within size range (30, 50000)
[W::hts_idx_load2] The index file is older than the data file: HG002_SVs_Tier1_v0.6.vcf.gz.tbi
[W::hts_idx_load2] The index file is older than the data file: HG002_SVs_Tier1_v0.6.vcf.gz.tbi
2019-12-26 18:58:31,391 [INFO] Matching base to calls
Traceback (most recent call last):
  File "/home/tintest/miniconda3/bin/truvari", line 1051, in <module>
    main(sys.argv[1:])
  File "/home/tintest/miniconda3/bin/truvari", line 937, in main
    match_entry = copy_entry(match_entry, n_comp_header)
  File "/home/tintest/miniconda3/bin/truvari", line 663, in copy_entry
    info=entry.info)
  File "pysam/libcbcf.pyx", line 2080, in pysam.libcbcf.VariantHeader.new_record
  File "pysam/libcbcf.pyx", line 2746, in pysam.libcbcf.VariantRecordInfo.update
  File "pysam/libcbcf.pyx", line 2569, in pysam.libcbcf.VariantRecordInfo.__setitem__
  File "pysam/libcbcf.pyx", line 683, in pysam.libcbcf.bcf_info_set_value
  File "pysam/libcbcf.pyx", line 579, in pysam.libcbcf.bcf_check_values
TypeError: values expected to be 1-tuple, given len=2

Do you know what this may be due to ?

Thank you, regards.

List index out of range

Hi,

When I run truvari like ./truvari.py -b base_calls.vcf -c compare_calls.vcf -o output_dir/
I got the following error message. But I didn't get this for my another compare_calls.vcf.
Could you give some suggestions on how to solve this?
Thank you very much.

Traceback (most recent call last):
File "truvari.py", line 926, in
run(sys.argv[1:])
File "truvari.py", line 660, in run
contig = list(vcfB.contigs.keys())[0]
IndexError: list index out of range

Haplotype matching for differently represented alleles.

Instead of directly comparing the VCF's alleles, need to create the haplotypes and compare regions for matching.
e.g.
REF = AB
Allele1 = ins@pos1 BA
Allele2 = ins@pos0 (or pos2) AB
BA != AB for current allele matching, but both alleles create "ABAB" in sample genome.

  • pyfaidx
  • nhansen method
  • docs update

SVs without SVLEN in vcf file

Hi,
I'd like to ask how truvari deal with SV entries without SVLEN, such as inter-chromosomal translocation, which does not have SVLEN in INFO in vcf file.

Thank you very much.

ValueError: 'pysam.libcbcf.VariantFile.fetch' requires an index

Hi @ACEnglish ,
Thanks for your great SV benchmark tool Truvari.
Recently, I update my local truvari to the latest version 1.3.4, however, a ValueError arose as below:

2019-11-09 06:16:52,814 [INFO] Running /home/tjiang/Tools/truvari/truvari/truvari -f /data/tjiang/SV_benchmark/ref/human_hs37d5.fasta -b /home/tjiang/HG002_SVs_Tier1_v0.6.vcf --includebed /data/tjiang/SV_benchmark/giab/HG002_SVs_Tier1_v0.6.bed -o bench-10x --passonly --giabreport -r 1000 -p 0 -c hg2_ont_10x_indel.vcf.gz
2019-11-09 06:16:52,814 [INFO] Params:
{
"base": "/home/tjiang/HG002_SVs_Tier1_v0.6.vcf",
"comp": "hg2_ont_10x_indel.vcf.gz",
"output": "bench-10x",
"reference": "/data/tjiang/SV_benchmark/ref/human_hs37d5.fasta",
"giabreport": true,
"debug": false,
"prog": false,
"refdist": 1000,
"pctsim": 0.0,
"pctsize": 0.7,
"pctovl": 0.0,
"typeignore": false,
"gtcomp": false,
"bSample": null,
"cSample": null,
"sizemin": 50,
"sizefilt": 30,
"sizemax": 50000,
"passonly": true,
"no_ref": false,
"includebed": "/data/tjiang/SV_benchmark/giab/HG002_SVs_Tier1_v0.6.bed",
"multimatch": false
}
2019-11-09 06:16:54,223 [INFO] Including 34830 bed regions
2019-11-09 06:16:54,223 [INFO] Creating call interval tree for overlap search
[W::vcf_parse] INFO 'BREAKPOINT_STD' is not defined in the header, assuming Type=String
[W::vcf_parse] INFO 'SVLEN_STD' is not defined in the header, assuming Type=String
2019-11-09 06:17:00,549 [INFO] 19009 call variants in total
2019-11-09 06:17:00,549 [INFO] 19008 call variants within size range (30, 50000)
2019-11-09 06:17:00,551 [INFO] Matching base to calls
Traceback (most recent call last):
File "/home/tjiang/Tools/truvari/truvari/truvari", line 1051, in
main(sys.argv[1:])
File "/home/tjiang/Tools/truvari/truvari/truvari", line 790, in main
for pbarcnt, base_entry in enumerate(regions.iterate(vcf_base)):
File "/home/tjiang/Tools/truvari/truvari/truvari", line 540, in iterate
for entry in vcf_file.fetch(chrom, intv.begin, intv.end):
File "pysam/libcbcf.pyx", line 4359, in pysam.libcbcf.VariantFile.fetch
ValueError: fetch requires an index

Do you have any ideas to help me fix this error?
Thank you very much!
tjiang

template not working properly

I get this output in my template sheet:

True Positives | 2,795
False Negatives | 1,404
False Positives | 77
Precision | None
Recall | /gold/HG002_DEL_Tier1_v0.6.vcf.gz
F1 | 97.32%

The wrong cells are Precision and Recall. I am using truvari version 2.0.2, and using the path to the template given at the wiki:

https://docs.google.com/spreadsheets/d/1T3EdpyLO1Kq-bJ8SDatqJ5nP_wwFKCrH0qhxorvTVd4/edit?usp=sharing

It seems Precision should use $RawData.B181 and Recall $RawData.B182

Error parsing SVLEN=.

I am running truvari installed via python3 pip. It has worked very well for a few SV files, but I hit a problem with the Mt Sinai NA12878 published SVs from GIAB(ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NA12878_PacBio_MtSinai).
Warning- i had to hack the headers a bit to even get this VCF to run & even so I had a filter parsing error that I avoided by only processing passing filter calls. But when just processingfilter PASS calls, I hit a parsing error that appears to be easily fixable.

In this VCF file, SVLEN can sometimes be '=.', which chokes truvari.py at line 264. I hacked in a fix that handles the SVLEN=. as follows:

if "SVLEN" in entry.info:
    if type(entry.info["SVLEN"]) in [list, tuple]:
        if entry.info['SVLEN'][0] is None:
            size = entry.stop - entry.start
        else:
            size = abs(entry.info["SVLEN"][0])
    else:
        size = abs(entry.info["SVLEN"])

What does 'sequence resolved calls' mean?

Dear Truvari developers,

In truvari document, the term 'sequence resolved calls' has been mentioned. I quite did not get the meaning. Would you please explain it.

update to include v0.6 in readme

It would be good to update the GIAB report part of the readme to reflect that it works with v0.6. You could change this:

This currently only works with GIAB SV v0.5. Work will need to be done to ensure Truvari can parse future GIAB SV releases.

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_UnionSVs_12122017/

to this:

While truvari can use other benchmark sets, this formatted report currently only works with GIAB SV v0.5 and v0.6. Work will need to be done to ensure Truvari can parse future GIAB SV releases.

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/

Question about SVTYPE=INS - part2

Dear Truvari developers,
as suggested answering a previous question of mine, since I want to compare DUP/INS 'calls' to INS from 'base' GIAB vcf file, I subset the base/comp VCFs to only entries of a certain type, then used --typeignore to avoid all DUP calls being considered as False Positives because of non-matching SV type.
My additional questions are:

  • As for the --sizemin and --sizefilt parameters, since INS tipically have starting and ending coordinate that are identical, should I modify the default values (--sizemin 50 --sizefilt 30) ?

  • Is Truvari able to parse AVGLEN (from Parliament2 output) for inferring the INS size, in order to perform a comparison of non-sequence-resolved calls (--pctsim 0)?

KeyError: 'Illcalls'

Hi,

I have ran some commands and the summary.txt and other files were generated, but I had trouble generating the giabreport. Tracing back to the error, I tried printing out the fn by callingprint(fn) and the array was printed. However, there was no *call keys found in this array. Could you let me know what I am missing?

2020-06-17 02:12:32,796 [WARNING] Excluding 59 contigs present in comparison calls header but not base calls.
2020-06-17 02:12:32,797 [INFO] Creating call interval tree for overlap search
2020-06-17 02:12:32,798 [INFO] 1 call variants in total
2020-06-17 02:12:32,798 [INFO] 1 call variants within size range (30, 50000)
2020-06-17 02:12:32,800 [INFO] Matching base to calls
2020-06-17 02:12:32,802 [WARNING] No TP or FP calls in base!
2020-06-17 02:12:32,802 [INFO] Parsing FPs from calls
2020-06-17 02:12:32,861 [INFO] Stats: {
    "TP-base": 0,
    "TP-call": 0,
    "FP": 1,
    "FN": 20,
    "precision": 0,
    "recall": 0,
    "f1": "NaN",
    "base cnt": 20,
    "call cnt": 1,
    "base size filtered": 2,
    "call size filtered": 0,
    "base gt filtered": 0,
    "call gt filtered": 0,
    "TP-call_TP-gt": 0,
    "TP-call_FP-gt": 0,
    "TP-base_TP-gt": 0,
    "TP-base_FP-gt": 0,
    "gt_precision": 0,
    "gt_recall": 0,
    "gt_f1": "NaN"
}
2020-06-17 02:12:32,861 [INFO] Creating GIAB report
[{'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'esv3637537.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 115679, 'ID': '91', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'esv3637540.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 162127, 'ID': '94', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'esv3637541.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 192277, 'ID': '125', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'esv3637543.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 196434, 'ID': '96', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha27.6.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 198215, 'ID': '215', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'THAI.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 199800, 'ID': '202', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'anti11.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 203016, 'ID': '3', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha11.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 203016, 'ID': '4', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha21.9.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 203509, 'ID': '134', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'anti20.9.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 215144, 'ID': '9', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': '16.sea.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 215400, 'ID': '8', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha4.2.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 219817, 'ID': '5', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'anti4.2.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 219817, 'ID': '6', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha30.8.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 221260, 'ID': '210', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha3.7.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 223300, 'ID': '1', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DUP', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'anti3.7.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 223300, 'ID': '2', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha2.7.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 225800, 'ID': '21', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha2.4.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 225996, 'ID': '15', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'alpha1.2.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 227121, 'ID': '16', 'FILTER': '', 'QUAL': None}, {'SVTYPE': 'DEL', 'SVMETHOD': 'CNVar', 'LEN': '34572', 'VARIANT': 'esv3637554.fa', 'NumNeighbors': 0, 'NumThresholdNeighbors': 0, 'SAMPLE_GT': '0/1', 'CHROM': '16', 'POS': 248243, 'ID': '99', 'FILTER': '', 'QUAL': None}]
Traceback (most recent call last):
  File "/root/miniconda3/bin/truvari", line 1051, in <module>
    main(sys.argv[1:])
  File "/root/miniconda3/bin/truvari", line 1045, in main
    make_giabreport(args, stats_box)
  File "/root/miniconda3/bin/truvari", line 422, in make_giabreport
    collapse_techs(fn)
  File "/root/miniconda3/bin/truvari", line 409, in collapse_techs
    if d[i] > 0:
KeyError: 'Illcalls'

TypeError: '>=' not supported between instances of 'int' and 'NoneType'

two vcf inputs are essentially the same and the headers and the first entry are below:

##fileformat=VCFv4.2
##fileDate='Thu Dec 12 18:26:36 CST 2019'
##source=SentieonMGC-callerStr1-dynamic
##reference=HG19
##suppliedSex=MAL
##predictedSex=MALE
##avgRPKMCorrelationWrtTrainingSamples=0.937643587011545
##medianAbsoluteDeviation(log2R diff)=0.104296609837663
##log2RWaviness=1.4025688822472
##contig=<ID=chr1,Description=chromosome chr1>
##contig=<ID=chr2,Description=chromosome chr2>
##contig=<ID=chr3,Description=chromosome chr3>
##contig=<ID=chr4,Description=chromosome chr4>
##contig=<ID=chr5,Description=chromosome chr5>
##contig=<ID=chr6,Description=chromosome chr6>
##contig=<ID=chr7,Description=chromosome chr7>
##contig=<ID=chr8,Description=chromosome chr8>
##contig=<ID=chr9,Description=chromosome chr9>
##contig=<ID=chr10,Description=chromosome chr10>
##contig=<ID=chr11,Description=chromosome chr11>
##contig=<ID=chr12,Description=chromosome chr12>
##contig=<ID=chr13,Description=chromosome chr13>
##contig=<ID=chr14,Description=chromosome chr14>
##contig=<ID=chr15,Description=chromosome chr15>
##contig=<ID=chr16,Description=chromosome chr16>
##contig=<ID=chr17,Description=chromosome chr17>
##contig=<ID=chr18,Description=chromosome chr18>
##contig=<ID=chr19,Description=chromosome chr19>
##contig=<ID=chr20,Description=chromosome chr20>
##contig=<ID=chr21,Description=chromosome chr21>
##contig=<ID=chr22,Description=chromosome chr22>
##contig=<ID=chrX,Description=chromosome chrX>
##contig=<ID=chrY,Description=chromosome chrY>
##contig=<ID=chrM,Description=chromosome chrM>
##contig=<ID=chrGL000207.1,Description=chromosome chrGL000207.1>
##contig=<ID=chrGL000226.1,Description=chromosome chrGL000226.1>
##contig=<ID=chrGL000229.1,Description=chromosome chrGL000229.1>
##contig=<ID=chrGL000231.1,Description=chromosome chrGL000231.1>
##contig=<ID=chrGL000210.1,Description=chromosome chrGL000210.1>
##contig=<ID=chrGL000239.1,Description=chromosome chrGL000239.1>
##contig=<ID=chrGL000235.1,Description=chromosome chrGL000235.1>
##contig=<ID=chrGL000201.1,Description=chromosome chrGL000201.1>
##contig=<ID=chrGL000247.1,Description=chromosome chrGL000247.1>
##contig=<ID=chrGL000245.1,Description=chromosome chrGL000245.1>
##contig=<ID=chrGL000197.1,Description=chromosome chrGL000197.1>
##contig=<ID=chrGL000203.1,Description=chromosome chrGL000203.1>
##contig=<ID=chrGL000246.1,Description=chromosome chrGL000246.1>
##contig=<ID=chrGL000249.1,Description=chromosome chrGL000249.1>
##contig=<ID=chrGL000196.1,Description=chromosome chrGL000196.1>
##contig=<ID=chrGL000248.1,Description=chromosome chrGL000248.1>
##contig=<ID=chrGL000244.1,Description=chromosome chrGL000244.1>
##contig=<ID=chrGL000238.1,Description=chromosome chrGL000238.1>
##contig=<ID=chrGL000202.1,Description=chromosome chrGL000202.1>
##contig=<ID=chrGL000234.1,Description=chromosome chrGL000234.1>
##contig=<ID=chrGL000232.1,Description=chromosome chrGL000232.1>
##contig=<ID=chrGL000206.1,Description=chromosome chrGL000206.1>
##contig=<ID=chrGL000240.1,Description=chromosome chrGL000240.1>
##contig=<ID=chrGL000236.1,Description=chromosome chrGL000236.1>
##contig=<ID=chrGL000241.1,Description=chromosome chrGL000241.1>
##contig=<ID=chrGL000243.1,Description=chromosome chrGL000243.1>
##contig=<ID=chrGL000242.1,Description=chromosome chrGL000242.1>
##contig=<ID=chrGL000230.1,Description=chromosome chrGL000230.1>
##contig=<ID=chrGL000237.1,Description=chromosome chrGL000237.1>
##contig=<ID=chrGL000233.1,Description=chromosome chrGL000233.1>
##contig=<ID=chrGL000204.1,Description=chromosome chrGL000204.1>
##contig=<ID=chrGL000198.1,Description=chromosome chrGL000198.1>
##contig=<ID=chrGL000208.1,Description=chromosome chrGL000208.1>
##contig=<ID=chrGL000191.1,Description=chromosome chrGL000191.1>
##contig=<ID=chrGL000227.1,Description=chromosome chrGL000227.1>
##contig=<ID=chrGL000228.1,Description=chromosome chrGL000228.1>
##contig=<ID=chrGL000214.1,Description=chromosome chrGL000214.1>
##contig=<ID=chrGL000221.1,Description=chromosome chrGL000221.1>
##contig=<ID=chrGL000209.1,Description=chromosome chrGL000209.1>
##contig=<ID=chrGL000218.1,Description=chromosome chrGL000218.1>
##contig=<ID=chrGL000220.1,Description=chromosome chrGL000220.1>
##contig=<ID=chrGL000213.1,Description=chromosome chrGL000213.1>
##contig=<ID=chrGL000211.1,Description=chromosome chrGL000211.1>
##contig=<ID=chrGL000199.1,Description=chromosome chrGL000199.1>
##contig=<ID=chrGL000217.1,Description=chromosome chrGL000217.1>
##contig=<ID=chrGL000216.1,Description=chromosome chrGL000216.1>
##contig=<ID=chrGL000215.1,Description=chromosome chrGL000215.1>
##contig=<ID=chrGL000205.1,Description=chromosome chrGL000205.1>
##contig=<ID=chrGL000219.1,Description=chromosome chrGL000219.1>
##contig=<ID=chrGL000224.1,Description=chromosome chrGL000224.1>
##contig=<ID=chrGL000223.1,Description=chromosome chrGL000223.1>
##contig=<ID=chrGL000195.1,Description=chromosome chrGL000195.1>
##contig=<ID=chrGL000212.1,Description=chromosome chrGL000212.1>
##contig=<ID=chrGL000222.1,Description=chromosome chrGL000222.1>
##contig=<ID=chrGL000200.1,Description=chromosome chrGL000200.1>
##contig=<ID=chrGL000193.1,Description=chromosome chrGL000193.1>
##contig=<ID=chrGL000194.1,Description=chromosome chrGL000194.1>
##contig=<ID=chrGL000225.1,Description=chromosome chrGL000225.1>
##contig=<ID=chrGL000192.1,Description=chromosome chrGL000192.1>
##contig=<ID=chrNC_007605,Description=chromosome chrNC_007605>
##contig=<ID=chrhs37d5,Description=chromosome chrhs37d5>
##contig=<ID=phix,Description=chromosome phix>
##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DEL,Description="Deletion">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this
record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT al
leles">
##INFO=<ID=SRC,Number=1,Type=String,Description="Source of this event">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene identifier",Source="reporting.bed">
##INFO=<ID=TRANSCRIPT,Number=1,Type=String,Description="Transcript identifier",Source="reportin
g.bed">
##INFO=<ID=LOC,Number=1,Type=String,Description="Feature identifier",Source="reporting.bed">
##INFO=<ID=PEERS,Number=1,Type=String,Description="Number of reference samples utilized for ana
lysis">
##INFO=<ID=IQR,Number=1,Type=Float,Description="Inter-quartile range of log2Ratio values among
training samples">
##INFO=<ID=FEATURE_CATEGORY,Number=.,Type=String,Description="Category to which feature belongs
to">
##INFO=<ID=SNR,Number=1,Type=Float,Description="Signal-to-noise ratio for SNR">
##INFO=<ID=CALL,Number=1,Type=String,Description="Category call">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=LG2R,Number=1,Type=Float,Description="CNV Log2 Ratio">
##FORMAT=<ID=ZSCORE,Number=1,Type=Float,Description="Z-Score">
##FORMAT=<ID=-LOG10PVALUE,Number=1,Type=Float,Description="-log10 transformed P-value">
##FORMAT=<ID=TRAINING,Number=1,Type=Integer,Description="Sample label (0:test, 1:training)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 NGS82-DV042 NGS82-DV008 NGS82-DV014 NGS82-DV043 NGS82-DV044 NGS82-DV051 HG003 HG004 NGS82-DV046 MGS82-DV047 NGS82-DV048 NGS82-DV049 NGS82-DV050 NGS82-DV001
chr1 1475628 . G . . SVTYPE=DEL;SNR=24.081;END=1475747;SVLEN
=120;GENE=476_458424_339453(TMEM240)_1.1_1;TRANSCRIPT=NA;LOC=0;SRC=SentieonMGC-callerStr1-dynam
ic;PEERS=14;IQR=0.135;FEATURE_CATEGORY=CAT1;CALL=DEL GT:LG2R:ZSCORE:-LOG10PVALUE:TRAINING 1:0.505:-4.919:6.424:0 .:0.05:.:.:1 .:0.999:.:.:1 .:0.063:.:.:1 .:0.004:.:.:1 .:0.032
:.:.:1 .:0.018:.:.:1 .:0.083:.:.:1 .:0.348:.:.:1 .:0.023:.:.:1 .:0.101:.:.:1 .:0.089
:.:.:1 .:0.221:.:.:1 .:0.12:.:.:1 .:0.027:.:.:1

Detailed error message:
Traceback (most recent call last):
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 1047, in
run(sys.argv[1:])
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 731, in run
regions = GenomeTree(vcf_base, vcf_comp, args.includebed)
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 537, in init
all_regions[name].addi(0, length)
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/intervaltree.py", line 343, in addi
return self.add(Interval(begin, end, data))
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/intervaltree.py", line 323, in add
if interval.is_null():
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/interval.py", line 116, in is_null
return self.begin >= self.end
TypeError: '>=' not supported between instances of 'int' and 'NoneType'

Does SVTYPE of a variant must be either DEL or INS for Truvari to be able to handle it?

Dear Truvari developers,

Is it so that Truvari needs to have 'DEL' or 'INS' as SVTYPE, meaning it would not be able to benchmark variants with SVTYPE=DUP for instance? For example, I have this variant

19 8998800 DUP_delly_421 N 1000 PASS SVTYPE=DUP;SVMETHOD=EMBL.DELLYv0.7.8;CHR2=19;END=9010766;SVLEN=11966;OCC=1;FRQ=0.000447427 GT 0/0

Truvari does not return this as true positive if SVTYPE is 'DUP'

However if I change 'DUP' to 'INS',

19 8998800 DUP_delly_421 N 1000 PASS SVTYPE=INS;SVMETHOD=EMBL.DELLYv0.7.8;CHR2=19;END=9010766;SVLEN=11966;OCC=1;FRQ=0.000447427 GT 0/0

it seems to return it as true positive.

Also, in many other samples, Truvari seems to catch DEL ones better than INS. It did not for once returned DUP ones as true positive.

Truvari 1.3.4 freezes during analysis of DELLY results

Hi,

I have attempted to use Truvari 1.3.4 for the analysis of DELLY output. Truvari freezes at the stage of creating call interval trees. Despite active --debug flag, there is no detailed information about possible problem in the log. Truvari 1.2 performs the analysis without any problems.
DELLY sample
DELLY.tst.vcf.gz
Log:
2019-11-14 15:27:51,456 [INFO] Running /home/usr/miniconda3/bin/truvari --reference /home/usr/Documents/references/b37_decoy/human_g1k_v37_decoy.fasta --includebed /home/usr/Documents/GIAB/NIST_SV_Integration_v0.6/HG002_SVs_Tier1_v0.6.bed --base /home/usr/Documents/GIAB/NIST_SV_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz --comp DDD/DELLY.tst.vcf.gz --output /home/usr/Documents/SV_benchmarking/truvari/DELLY.tst_TV1.3.4_2019-11-14_15-27-51 --giabreport --refdist 500 --pctsim 0 --pctsize 0.7 --pctovl 0 --sizemin 50 --sizemax 150000 --sizefilt 30 --passonly --no-ref a --prog --debug
2019-11-14 15:27:51,456 [INFO] Params:
{
"base": "/home/usr/Documents/GIAB/NIST_SV_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz",
"comp": "DDD/DELLY.tst.vcf.gz",
"output": "/home/usr/Documents/SV_benchmarking/truvari/DELLY.tst_TV1.3.4_2019-11-14_15-27-51",
"reference": "/home/usr/Documents/references/b37_decoy/human_g1k_v37_decoy.fasta",
"giabreport": true,
"debug": true,
"prog": true,
"refdist": 500,
"pctsim": 0.0,
"pctsize": 0.7,
"pctovl": 0.0,
"typeignore": false,
"gtcomp": false,
"bSample": null,
"cSample": null,
"sizemin": 50,
"sizefilt": 30,
"sizemax": 150000,
"passonly": true,
"no_ref": "a",
"includebed": "/home/usr/Documents/GIAB/NIST_SV_Integration_v0.6/HG002_SVs_Tier1_v0.6.bed",
"multimatch": false
}
2019-11-14 15:27:52,416 [INFO] Including 34830 bed regions
2019-11-14 15:27:52,416 [INFO] Creating call interval tree for overlap search

Two ValueError: invalid literal for int() with base 10: 'False', could not convert string to float: 'False'

Hi all,

I'm trying to run this comparison command:

truvari -b NA12878.sorted.vcf -c sorted.vcf.gz --pctsim=0 -o compare_dir/

But it gives me this error:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 390, in _parse_info
    val = self._map(int, vals)
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 360, in _map
    for x in iterable]
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 360, in <listcomp>
    for x in iterable]
ValueError: invalid literal for int() with base 10: 'False'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda3/bin/truvari", line 1047, in <module>
    run(sys.argv[1:])
  File "/anaconda3/bin/truvari", line 742, in run
    for entry in regions.iterate(vcf_base):
  File "/anaconda3/bin/truvari", line 545, in iterate
    for entry in vcf_file:
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 572, in __next__
    info = self._parse_info(row[7])
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 394, in _parse_info
    val = self._map(float, vals)
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 360, in _map
    for x in iterable]
  File "/anaconda3/lib/python3.7/site-packages/vcf/parser.py", line 360, in <listcomp>
    for x in iterable]
ValueError: could not convert string to float: 'False'

Any help/suggest to solve that please?

Locating truvari.py

Hi

I installed your program on our cluster but I don't know how to locate the executable .py

(/home/fi1d18/.conda/onco) [fi1d18@cyan01 shared]$ pip install pyvcf python-Levenshtein progressbar2 pysam pyfaidx intervaltree==2.1.0
Requirement already satisfied: pyvcf in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (0.6.8)
Collecting python-Levenshtein
  Downloading https://files.pythonhosted.org/packages/42/a9/d1785c85ebf9b7dfacd08938dd028209c34a0ea3b1bcdb895208bd40a67d/python-Levenshtein-0.12.0.tar.gz (48kB)
    100% |β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 51kB 4.9MB/s
Collecting progressbar2
  Downloading https://files.pythonhosted.org/packages/fb/89/d90f9ff03285d8eb56994e8cec1b73a4d0dc9bb529c1f8e8e10b1b663843/progressbar2-3.39.3-py2.py3-none-any.whl
Requirement already satisfied: pysam in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (0.15.2)
Collecting pyfaidx
  Downloading https://files.pythonhosted.org/packages/75/a5/7e2569527b3849ea28d79b4f70d7cf46a47d36459bc59e0efa4e10e8c8b2/pyfaidx-0.5.5.2.tar.gz
Requirement already satisfied: intervaltree==2.1.0 in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (2.1.0)
Requirement already satisfied: setuptools in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (from pyvcf) (40.9.0)
Collecting python-utils>=2.3.0 (from progressbar2)
  Downloading https://files.pythonhosted.org/packages/eb/a0/19119d8b7c05be49baf6c593f11c432d571b70d805f2fe94c0585e55e4c8/python_utils-2.3.0-py2.py3-none-any.whl
Requirement already satisfied: six in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (from progressbar2) (1.12.0)
Requirement already satisfied: sortedcontainers in /temp/hgig/fi1d18/.conda/onco/lib/python3.6/site-packages (from intervaltree==2.1.0) (2.1.0)
Building wheels for collected packages: python-Levenshtein, pyfaidx
  Building wheel for python-Levenshtein (setup.py) ... done
  Stored in directory: /home/fi1d18/.cache/pip/wheels/de/c2/93/660fd5f7559049268ad2dc6d81c4e39e9e36518766eaf7e342
  Building wheel for pyfaidx (setup.py) ... done
  Stored in directory: /home/fi1d18/.cache/pip/wheels/54/a2/b4/e242e58d23b2808e191b214067880faa46cd2341f363886e0b
Successfully built python-Levenshtein pyfaidx
Installing collected packages: python-Levenshtein, python-utils, progressbar2, pyfaidx
Successfully installed progressbar2-3.39.3 pyfaidx-0.5.5.2 python-Levenshtein-0.12.0 python-utils-2.3.0
(/home/fi1d18/.conda/onco) [fi1d18@cyan01 shared]$ truvari.py -b base_calls.vcf -c compare_calls.vcf -o output_dir
-bash: truvari.py: command not found

Could you please help me?

Thanks a lot in advance

getting a corrected VCF from truvari

Hi,
I wanted to know whether Truvari has the functionality to output a single 'corrected VCF' that shows for each position in the COMP, what is the true label for each position.

I've looked at all the output files it already produces, but I don't think it currently does this (nor with any of the flags). I think this feature would be somewhat similar to what hap.py does, which has the QUERY and TRUTH columns that show what the caller proposed versus what the truth shows.

It seems like Truvari would basically compute this internally at some point, but it just doesn't actually output it in any file (as far as I can tell).

Self-comparison not giving 100% F1 score value

Dear truvari developers,
in order to see if I had correctly understood how to use Truvari, I tried passing as 'base' and as 'call' the same VCF file. Obviously, I was expecting to get a 100% recall and 100% precision, but that was not the case.
In particular, I tried using a subset of the HG002_SVs_Tier1_v0.6.vcf file, including only PASS deletions (grep "SVTYPE=DEL" | grep $’\tPASS$’\t’) as base, and HG002_SVs_Tier1_v0.6.vcf file, including only deletions (grep "SVTYPE=DEL" and --passonly flag) as call.
Since I am interested in using Truvari to benchmark some callers on hg38 reference, I also did a lift-over of the HG002_SVs_Tier1_v0.6.vcf file and of the HG002_SVs_Tier1_v0.6.bed file, and repeated the analysis.

This was my command for analysis on hg19:
/mnt/cifs01/simone/software/truvari/truvari.py -b /mnt/cifs01/simone/NA24385/GIAB/hg19/HG002_SVs_Tier1_v0.6_DELs_PASS.vcf.gz -c /mnt/cifs01/simone/NA24385/GIAB/hg19/prova/HG002_SVs_Tier1_v0.6_DELs.vcf.gz -o /mnt/cifs01/simone/NA24385/GIAB/hg19/prova/TRUVARI_DELs_PASS_SIZEMIN50_SIZEMAX1000000000 --includebed /mnt/cifs01/simone/NA24385/GIAB/hg19/HG002_SVs_Tier1_v0.6.bed --passonly --pctsim 0 --pctovl 0.0 --refdist 2000 --sizemin 50 --sizemax 1000000000 --giabreport -f /mnt/cifs01/simone/Whole_genome/ucsc.hg19.fasta

And these were the results (as expected):

{
"FP": 0,
"f1": 1.0,
"base gt filtered": 0,
"precision": 1.0,
"TP-call": 4203,
"call cnt": 4203,
"base size filtered": 0,
"FN": 0,
"TP-base": 4203,
"base cnt": 4203,
"recall": 1.0,
"call gt filtered": 0,
"call size filtered": 0
}

While this was my command for analysis on hg38 after lift-over of VCF and bed files:
/mnt/cifs01/simone/software/truvari/truvari.py -b /mnt/cifs01/basaglia/NA24385/GIAB_reference/HG002_SVs_Tier1_v0.6_hg38_chr_PASS_DELs.vcf.gz -c /mnt/cifs01/basaglia/NA24385/GIAB_reference/prova/HG002_SVs_Tier1_v0.6_hg38_chr_DELs.vcf.gz -o /mnt/cifs01/basaglia/NA24385/GIAB_reference/prova/TRUVARI_DELs_PASS_SIZEMIN50_SIZEMAX1000000000 --includebed /mnt/cifs01/basaglia/NA24385/GIAB_reference/HG002_SVs_Tier1_v0.6_hg38_chr.bed --passonly --pctsim 0 --pctovl 0.0 --refdist 2000 --sizemin 50 --sizemax 1000000000 --giabreport -f /mnt/cifs01/basaglia/Homo_sapiens_assembly38.fasta

And these were the results:

{
"FP": 141,
"f1": 0.8166449934980494,
"base gt filtered": 0,
"precision": 0.8166449934980494,
"TP-call": 628,
"call cnt": 769,
"base size filtered": 0,
"FN": 141,
"TP-base": 628,
"base cnt": 769,
"recall": 0.8166449934980494,
"call gt filtered": 0,
"call size filtered": 0
}

What am I missing?
At a visual inspection of tn.vcf and fp.vcf, it seems that the 141 records are the same. Is it a problem due to the 'chr' prefix in chromosome names?

Thanks in advance.
Simone

TypeError: bad operand type for abs(): 'NoneType'

hi,
I use it to compare pacbio sv to nanopore sv, but I tried several times.
It bugged as below:
Traceback (most recent call last):
File "/usr/local/bin/truvari", line 1047, in
run(sys.argv[1:])
File "/usr/local/bin/truvari", line 788, in run
sizeA = get_vcf_entry_size(base_entry)
File "/usr/local/bin/truvari", line 271, in get_vcf_entry_size
size = abs(entry.INFO["SVLEN"])
TypeError: bad operand type for abs(): 'NoneType'

what's meaning? How I can change paramater?

my command line as below:
truvari -b ngm_Nanopore_human_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto_s5_noalts.header.vcf.gz -c na12878_pacbio_mts_ngmlr-0.2.3_mapped.bam.sniffles1kb_auto_noalts.header.vcf.gz -o nanopore-vs-pacbio --reference ../GRCh38_full_analysis_set_plus_decoy_hla.fa --giabreport -t --debug --multimatch --pctsim=0

I have read code, there is a note """
returns the size of the variant.

How size is determined:

  • Starts by trying to use INFO/SVLEN
  • If SVLEN is unavailable and ALT field is an SV (e.g. , , etc),
    use abs(vcf.model._Record.start - vcf.model._Record.end). The INFO/END
    tag needs to be available, especially for INS.
  • Otherwise, return the size difference of the sequence resolved call using
    abs(len(vcf.model._Record.REF) - len(str(vcf.model._Record.ALT[0])))
    """
    I'm currious about sv type, does it include "<DUP/INS> and "? there are six types of sv in my file as below.
    <DUP/INS>

Thanks for replying as soon as possible!

when SVs without SVLEN in vcf file

which version of pyvcf package do you use, about the line 226 and 228, i should fix it as the left, otherwise it will report error ( the left is i fixed, the right is the original code)
truvari_bug

Question about SVTYPE=INS comparison

Dear Truvari developers,
I would like to use your tool to compare a set of 'comp' calls to a set of 'base' calls for NA24385 sample (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/HG002_SVs_Tier1_v0.6.vcf.gz).
I read that HG002_SVs_Tier1_v0.6.vcf.gz contains only SVTYPE=DUP and SVTYPE=INS variants, however there is a field named REPTYPE which gives further information about the type of variant. I saw that SVTYPE=INS variants may have REPTYPE=DUP value. So, my questions are:

  • If I want to benchmark insertion calls from a SV-caller whose VCF file splits the INS structural variants over SVTYPE=INS and SVTYPE=DUP types (or gives only SVTYPE=DUP variants), is Truvari only going to consider SVTYPE=INS variants for insertions comparisons or is it able to manage different types of variants annotations, if these are compliant to the VCF format?

  • If a SV-caller I want to benchmark produces a VCF file with many other SVTYPEs (as INV, BND, ...) along with DEL and INS, are those going to decrease the overall precision or are they just going to be ignored? In other words, is it better to grep only the "interesting" lines from the 'comp' calls VCF or is it indifferent?

Thanks

0 base variants

Hi there,

Firstly, thanks for this cool tool. I have two data-sets (Illumina PE reads) corresponding to two related individuals (humans). I want to compare their SVs which I created using Lumpy (and GrCh38 as the reference). When I run truvari with one of them as ground/base_calls and the other as compare_calls, I get everything 0 in summary.txt.

The log file of the run is attached.
truvari_log.txt

No 2.0 Tag

In the updates section you mention Truvari 2.0 but there is no tag for the release.

[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used?

We are experiencing format issue of vcf entries.

Header and one entry:

##fileformat=VCFv4.2
##fileDate=data
##source=SentieonMGC-callerStr1-dynamic
##reference=HG19
##suppliedSex=MAL
##predictedSex=MALE
##avgRPKMCorrelationWrtTrainingSamples=0.956284181786183
##medianAbsoluteDeviation(log2R diff)=0.113303658101527
##log2RWaviness=1.23294155429722
##contig=<ID=chr1,Description="chromosome chr1",length=249250621>
##contig=<ID=chr2,Description="chromosome chr2",length=243199373>
##contig=<ID=chr3,Description="chromosome chr3",length=198022430>
##contig=<ID=chr4,Description="chromosome chr4",length=191154276>
##contig=<ID=chr5,Description="chromosome chr5",length=180915260>
##contig=<ID=chr6,Description="chromosome chr6",length=171115067>
##contig=<ID=chr7,Description="chromosome chr7",length=159138663>
##contig=<ID=chr8,Description="chromosome chr8",length=146364022>
##contig=<ID=chr9,Description="chromosome chr9",length=141213431>
##contig=<ID=chr10,Description="chromosome chr10",length=135534747>
##contig=<ID=chr11,Description="chromosome chr11",length=135006516>
##contig=<ID=chr12,Description="chromosome chr12",length=133851895>
##contig=<ID=chr13,Description="chromosome chr13",length=115169878>
##contig=<ID=chr14,Description="chromosome chr14",length=107349540>
##contig=<ID=chr15,Description="chromosome chr15",length=102531392>
##contig=<ID=chr16,Description="chromosome chr16",length=90354753>
##contig=<ID=chr17,Description="chromosome chr17",length=81195210>
##contig=<ID=chr18,Description="chromosome chr18",length=78077248>
##contig=<ID=chr19,Description="chromosome chr19",length=59128983>
##contig=<ID=chr20,Description="chromosome chr20",length=63025520>
##contig=<ID=chr21,Description="chromosome chr21",length=48129895>
##contig=<ID=chr22,Description="chromosome chr22",length=51304566>
##contig=<ID=chrX,Description="chromosome chrX",length=155270560>
##contig=<ID=chrY,Description="chromosome chrY",length=59373566>
##contig=<ID=chrM,Description="chromosome chrM",length=16569>
##contig=<ID=chrGL000207.1,Description="chromosome chrGL000207.1",length=4262>
##contig=<ID=chrGL000226.1,Description="chromosome chrGL000226.1",length=15008>
##contig=<ID=chrGL000229.1,Description="chromosome chrGL000229.1",length=19913>
##contig=<ID=chrGL000231.1,Description="chromosome chrGL000231.1",length=27386>
##contig=<ID=chrGL000210.1,Description="chromosome chrGL000210.1",length=27682>
##contig=<ID=chrGL000239.1,Description="chromosome chrGL000239.1",length=33824>
##contig=<ID=chrGL000235.1,Description="chromosome chrGL000235.1",length=34474>
##contig=<ID=chrGL000201.1,Description="chromosome chrGL000201.1",length=36148>
##contig=<ID=chrGL000247.1,Description="chromosome chrGL000247.1",length=36422>
##contig=<ID=chrGL000245.1,Description="chromosome chrGL000245.1",length=36651>
##contig=<ID=chrGL000197.1,Description="chromosome chrGL000197.1",length=37175>
##contig=<ID=chrGL000203.1,Description="chromosome chrGL000203.1",length=37498>
##contig=<ID=chrGL000246.1,Description="chromosome chrGL000246.1",length=38154>
##contig=<ID=chrGL000249.1,Description="chromosome chrGL000249.1",length=38502>
##contig=<ID=chrGL000196.1,Description="chromosome chrGL000196.1",length=38914>
##contig=<ID=chrGL000248.1,Description="chromosome chrGL000248.1",length=39786>
##contig=<ID=chrGL000244.1,Description="chromosome chrGL000244.1",length=39929>
##contig=<ID=chrGL000238.1,Description="chromosome chrGL000238.1",length=39939>
##contig=<ID=chrGL000202.1,Description="chromosome chrGL000202.1",length=40103>
##contig=<ID=chrGL000234.1,Description="chromosome chrGL000234.1",length=40531>
##contig=<ID=chrGL000232.1,Description="chromosome chrGL000232.1",length=40652>
##contig=<ID=chrGL000206.1,Description="chromosome chrGL000206.1",length=41001>
##contig=<ID=chrGL000240.1,Description="chromosome chrGL000240.1",length=41933>
##contig=<ID=chrGL000236.1,Description="chromosome chrGL000236.1",length=41934>
##contig=<ID=chrGL000241.1,Description="chromosome chrGL000241.1",length=42152>
##contig=<ID=chrGL000243.1,Description="chromosome chrGL000243.1",length=43341>
##contig=<ID=chrGL000242.1,Description="chromosome chrGL000242.1",length=43523>
##contig=<ID=chrGL000230.1,Description="chromosome chrGL000230.1",length=43691>
##contig=<ID=chrGL000237.1,Description="chromosome chrGL000237.1",length=45867>
##contig=<ID=chrGL000233.1,Description="chromosome chrGL000233.1",length=45941>
##contig=<ID=chrGL000204.1,Description="chromosome chrGL000204.1",length=81310>
##contig=<ID=chrGL000198.1,Description="chromosome chrGL000198.1",length=90085>
##contig=<ID=chrGL000208.1,Description="chromosome chrGL000208.1",length=92689>
##contig=<ID=chrGL000191.1,Description="chromosome chrGL000191.1",length=106433>
##contig=<ID=chrGL000227.1,Description="chromosome chrGL000227.1",length=128374>
##contig=<ID=chrGL000228.1,Description="chromosome chrGL000228.1",length=129120>
##contig=<ID=chrGL000214.1,Description="chromosome chrGL000214.1",length=137718>
##contig=<ID=chrGL000221.1,Description="chromosome chrGL000221.1",length=155397>
##contig=<ID=chrGL000209.1,Description="chromosome chrGL000209.1",length=159169>
##contig=<ID=chrGL000218.1,Description="chromosome chrGL000218.1",length=161147>
##contig=<ID=chrGL000220.1,Description="chromosome chrGL000220.1",length=161802>
##contig=<ID=chrGL000213.1,Description="chromosome chrGL000213.1",length=164239>
##contig=<ID=chrGL000211.1,Description="chromosome chrGL000211.1",length=166566>
##contig=<ID=chrGL000199.1,Description="chromosome chrGL000199.1",length=169874>
##contig=<ID=chrGL000217.1,Description="chromosome chrGL000217.1",length=172149>
##contig=<ID=chrGL000216.1,Description="chromosome chrGL000216.1",length=172294>
##contig=<ID=chrGL000215.1,Description="chromosome chrGL000215.1",length=172545>
##contig=<ID=chrGL000205.1,Description="chromosome chrGL000205.1",length=174588>
##contig=<ID=chrGL000219.1,Description="chromosome chrGL000219.1",length=179198>
##contig=<ID=chrGL000224.1,Description="chromosome chrGL000224.1",length=179693>
##contig=<ID=chrGL000223.1,Description="chromosome chrGL000223.1",length=180455>
##contig=<ID=chrGL000195.1,Description="chromosome chrGL000195.1",length=182896>
##contig=<ID=chrGL000212.1,Description="chromosome chrGL000212.1",length=186858>
##contig=<ID=chrGL000222.1,Description="chromosome chrGL000222.1",length=186861>
##contig=<ID=chrGL000200.1,Description="chromosome chrGL000200.1",length=187035>
##contig=<ID=chrGL000193.1,Description="chromosome chrGL000193.1",length=189789>
##contig=<ID=chrGL000194.1,Description="chromosome chrGL000194.1",length=191469>
##contig=<ID=chrGL000225.1,Description="chromosome chrGL000225.1",length=211173>
##contig=<ID=chrGL000192.1,Description="chromosome chrGL000192.1",length=547496>
##contig=<ID=chrNC_007605,Description="chromosome chrNC_007605",length=171823>
##contig=<ID=chrhs37d5,Description="chromosome chrhs37d5",length=35477943>
##contig=<ID=phix,Description="chromosome phix",length=5386>
##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SRC,Number=1,Type=String,Description="Source of this event">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene identifier",Source="reporting.bed">
##INFO=<ID=TRANSCRIPT,Number=1,Type=String,Description="Transcript identifier",Source="reporting.bed">
##INFO=<ID=LOC,Number=1,Type=String,Description="Feature identifier",Source="reporting.bed">
##INFO=<ID=PEERS,Number=1,Type=String,Description="Number of reference samples utilized for analysis">
##INFO=<ID=IQR,Number=1,Type=Float,Description="Inter-quartile range of log2Ratio values among training samples">
##INFO=<ID=FEATURE_CATEGORY,Number=.,Type=String,Description="Category to which feature belongs to">
##INFO=<ID=SNR,Number=1,Type=Float,Description="Signal-to-noise ratio for SNR">
##INFO=<ID=CALL,Number=1,Type=String,Description="Category call">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=LG2R,Number=1,Type=Float,Description="CNV Log2 Ratio">
##FORMAT=<ID=ZSCORE,Number=1,Type=Float,Description="Z-Score">
##FORMAT=<ID=MINUSLOG10PVALUE,Number=1,Type=Float,Description="-log10 transformed P-value">
##FORMAT=<ID=TRAINING,Number=1,Type=Integer,Description="Sample label (0:test, 1:training)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NGS82-DV001 HG002NGS82-DV008 VIP4-DAUGHTER NGS82-DV024 NGS82-DV041 NGS82-DV033 2019-NORM-CTRLNGS82-DV002 NGS82-DV036 NGS82-DV009 NGS82-DV012 DONOR5-WBH-P DONOR6-WBA-C DONOR7-WBA-P VIP3-FATHER
chr2 130951460 . T . . SVTYPE=DUP;SNR=24.857;END=130951579;SVLEN=120;GENE=TUBA3E;TRANSCRIPT=NM_207312.2;LOC=Ex4;SRC=SentieonMGC-callerStr1-dynamic;PEERS=15;IQR=0.213;FEATURE_CATEGORY=CAT1;CALL=DUP GT:LG2R:ZSCORE:MINUSLOG10PVALUE:TRAINING 1:0.579:6.888:11.247:0 .:0.005:.:.:1 .:0.006:.:.:1 .:0.006:.:.:1 .:0.002:.:.:1.:0.045:.:.:1 .:0.002:.:.:1 .:0.108:.:.:1 .:0.444:.:.:1 .:0.058:.:.:1 .:0.322:.:.:1.:0.656:.:.:1 .:0.07:.:.:1 .:0.003:.:.:1 .:0.409:.:.:1 .:0.092:.:.:1

Error message:

The offending line was: "chr9 2622171 . G . . SVTYPE=DEL;SNR=24.979;END=2622290;SVLEN=120;GENE=VLDLR;TRANSCRIPT=NM_003383.4;LOC=5UTR;SRC=SentieonMGC-callerStr1-dynamic;PEERS=15;IQR=0.134;FEATURE_CATEGORY=CAT1;CALL=DEL GT:LG2R:ZSCORE:MINUSLOG10PVALUE:TRAINING 1:0.767:-8.375:21.2:0 .:0.059:.:.:1 .:0.093:.:.:1 .:0.045:.:.:1 .:0.119:.:.:1 .:0.023:.:.:1 .:0.081:.:.:1 .:0.005:.:.:1 .:0.049:.:.:1 .:0.092:.:.:1 .:0.005:.:.:1 .:0.102:.:.:1 .:0.099:.:.:1 .:0.009:.:.:1 .:0.104:.:.:1 .:0.023:.:.:1"
[E::get_intv] Failed to parse TBX_GENERIC, was wrong -p [type] used?
The offending line was: "chrX 71822006 . T . . SVTYPE=DUP;SNR=22.545;END=71822125;SVLEN=120;GENE=PHKA1;TRANSCRIPT=NM_002637.3;LOC=In27/28;SRC=SentieonMGC-callerStr1-dynamic;PEERS=15;IQR=0.177;FEATURE_CATEGORY=CAT1;CALL=DUP GT:LG2R:ZSCORE:MINUSLOG10PVALUE:TRAINING 1:0.6:3.554:3.421:0 .:0.152:.:.:1 .:0.142:.:.:1 .:0.167:.:.:1 .:0.056:.:.:1 .:0.07:.:.:1 .:0.043:.:.:1 .:0.053:.:.:1 .:0.251:.:.:1 .:0.113:.:.:1 .:0.043:.:.:1 .:0.584:.:.:1 .:0.283:.:.:1 .:0.116:.:.:1 .:0.051:.:.:1 .:0.053:.:.:1"
[Elapsed Time: 0:00:00 129/129] |########################################| (Time: 0:00:00)
2019-12-17 09:48:04,795 [WARNING] No TP or FP calls in base!
2019-12-17 09:48:04,795 [INFO] Parsing FPs from calls
[Elapsed Time: 0:00:00 129/129] |########################################| (Time: 0:00:00)

Error : IndexError: list index out of range

Hello,

While trying to run truvari I got this error :

2019-07-24 16:37:42,485 [INFO] Running /home/tintest/miniconda3/bin/truvari -b vcf_control/NA12878_DGV-2016_LR-assembly.vcf -c vcf/sniffles/rel_6_minimap2_pbsv.vcf -o res/rel_6_minimap2_pbsv --reference hg38.fasta
2019-07-24 16:37:42,486 [INFO] Params:
{
    "sizemax": 50000,
    "reference": "hg38.fasta",
    "noprog": false,
    "multimatch": false,
    "pctsize": 0.7,
    "cSample": null,
    "includebed": null,
    "no_ref": false,
    "passonly": false,
    "pctsim": 0.7,
    "pctovl": 0.0,
    "comp": "vcf/sniffles/rel_6_minimap2_pbsv.vcf",
    "refdist": 500,
    "base": "vcf_control/NA12878_DGV-2016_LR-assembly.vcf",
    "giabreport": false,
    "sizefilt": 30,
    "typeignore": false,
    "gtcomp": false,
    "debug": false,
    "output": "res/rel_6_minimap2_pbsv",
    "bSample": null,
    "sizemin": 50
}
Traceback (most recent call last):
  File "/home/tintest/miniconda3/bin/truvari", line 1047, in <module>
    run(sys.argv[1:])
  File "/home/tintest/miniconda3/bin/truvari", line 702, in run
    sampleA = vcf_base.samples[0]
IndexError: list index out of range

I've installed truvari with pip.

Can you help me ?

Regards.

Truvari v1.2 and v1.3.2 give different results

Hi,

we upgraded Truvari from v1.2 to v1.3.2, and given the same input data and Truvari parameters, we get quite different results.

E.g. results for Manta DEL:
For variants larger that 1000bp, v1.2 reports approximately 450 true positives, while v1.3.2 reports 2
The two versions agree on variants smaller than 1000bp (only missing 5 of 1000 in 300to999 range)

Is this behaviour expected?

AttributeError: 'pysam.libcbcf.VariantFile' object has no attribute 'contigs'

I installed the latest version 1.3 into a separate venv with all requirements (pysam 0.15.2)

For all runs i get the following error message:

2019-09-30 18:38:40,241 [WARNING] Excluding 1 contigs present in comparison calls header but not base calls.
Traceback (most recent call last):
  File "/mnt/SRV018/projects/external/promethion/sv_comparison/venv/bin/truvari", line 1044, in <module>
    main(sys.argv[1:])
  File "/mnt/SRV018/projects/external/promethion/sv_comparison/venv/bin/truvari", line 725, in main
    regions = GenomeTree(vcf_base, vcf_comp, args.includebed)
  File "/mnt/SRV018/projects/external/promethion/sv_comparison/venv/bin/truvari", line 528, in __init__
    name = vcfA.contigs[contig].name
AttributeError: pysam.libcbcf.VariantFile object has no attribute contigs

For test purposes i used the GIAB reference set compared to itself. Other combinations of vcf files with and without reference produced the same error.

truvari -b ../../reference_genomes/HG002_SVs_Tier1_v0.6.vcf.gz -c ../../reference_genomes/HG002_SVs_Tier1_v0.6.vcf.gz -o truvari_test --pctsim=0

Do you have any idea what might cause this?

SVs that partially overlap bed file

It looks like truvari might currently include variants in the TP, FP, and FN counts if they only partially overlap the include bed file. I think these should probably be excluded, since now large deletions in the query set often appear to be FPs when in fact they might be mostly outside the high-confidence bed.

Truvari does not recognize call in base VCF

Command
/usr/local/biotools/python/3.6.3/bin/truvari -b ${GOLD_VCF}.gz -c ${COMP_VCF}.gz -o ${OUTDIR} --pctsim 0

Issue
When I try and run the following VCF, Truvari prints out the warning, 'No TP or FN calls in base!' and as a result shows 0 for both the precision and recall metrics. Why can't it recognize the variant? I tried looking for and deleting weird characters, but that does not seem to be the issue. Please help

Input: GOLD_VCF
##fileformat=VCFv4.2
##contig=<ID=chr1,Description="chromosome chr1",length=249250621>
##contig=<ID=chr2,Description="chromosome chr2",length=243199373>
##contig=<ID=chr3,Description="chromosome chr3",length=198022430>
##contig=<ID=chr4,Description="chromosome chr4",length=191154276>
##contig=<ID=chr5,Description="chromosome chr5",length=180915260>
##contig=<ID=chr6,Description="chromosome chr6",length=171115067>
##contig=<ID=chr7,Description="chromosome chr7",length=159138663>
##contig=<ID=chr8,Description="chromosome chr8",length=146364022>
##contig=<ID=chr9,Description="chromosome chr9",length=141213431>
##contig=<ID=chr10,Description="chromosome chr10",length=135534747>
##contig=<ID=chr11,Description="chromosome chr11",length=135006516>
##contig=<ID=chr12,Description="chromosome chr12",length=133851895>
##contig=<ID=chr13,Description="chromosome chr13",length=115169878>
##contig=<ID=chr14,Description="chromosome chr14",length=107349540>
##contig=<ID=chr15,Description="chromosome chr15",length=102531392>
##contig=<ID=chr16,Description="chromosome chr16",length=90354753>
##contig=<ID=chr17,Description="chromosome chr17",length=81195210>
##contig=<ID=chr18,Description="chromosome chr18",length=78077248>
##contig=<ID=chr19,Description="chromosome chr19",length=59128983>
##contig=<ID=chr20,Description="chromosome chr20",length=63025520>
##contig=<ID=chr21,Description="chromosome chr21",length=48129895>
##contig=<ID=chr22,Description="chromosome chr22",length=51304566>
##contig=<ID=chrX,Description="chromosome chrX",length=155270560>
##contig=<ID=chrY,Description="chromosome chrY",length=59373566>
##contig=<ID=chrM,Description="chromosome chrM",length=16569>
##contig=<ID=chrGL000207.1,Description="chromosome chrGL000207.1",length=4262>
##contig=<ID=chrGL000226.1,Description="chromosome chrGL000226.1",length=15008>
##contig=<ID=chrGL000229.1,Description="chromosome chrGL000229.1",length=19913>
##contig=<ID=chrGL000231.1,Description="chromosome chrGL000231.1",length=27386>
##contig=<ID=chrGL000210.1,Description="chromosome chrGL000210.1",length=27682>
##contig=<ID=chrGL000239.1,Description="chromosome chrGL000239.1",length=33824>
##contig=<ID=chrGL000235.1,Description="chromosome chrGL000235.1",length=34474>
##contig=<ID=chrGL000201.1,Description="chromosome chrGL000201.1",length=36148>
##contig=<ID=chrGL000247.1,Description="chromosome chrGL000247.1",length=36422>
##contig=<ID=chrGL000245.1,Description="chromosome chrGL000245.1",length=36651>
##contig=<ID=chrGL000197.1,Description="chromosome chrGL000197.1",length=37175>
##contig=<ID=chrGL000203.1,Description="chromosome chrGL000203.1",length=37498>
##contig=<ID=chrGL000246.1,Description="chromosome chrGL000246.1",length=38154>
##contig=<ID=chrGL000249.1,Description="chromosome chrGL000249.1",length=38502>
##contig=<ID=chrGL000196.1,Description="chromosome chrGL000196.1",length=38914>
##contig=<ID=chrGL000248.1,Description="chromosome chrGL000248.1",length=39786>
##contig=<ID=chrGL000244.1,Description="chromosome chrGL000244.1",length=39929>
##contig=<ID=chrGL000238.1,Description="chromosome chrGL000238.1",length=39939>
##contig=<ID=chrGL000202.1,Description="chromosome chrGL000202.1",length=40103>
##contig=<ID=chrGL000234.1,Description="chromosome chrGL000234.1",length=40531>
##contig=<ID=chrGL000232.1,Description="chromosome chrGL000232.1",length=40652>
##contig=<ID=chrGL000206.1,Description="chromosome chrGL000206.1",length=41001>
##contig=<ID=chrGL000240.1,Description="chromosome chrGL000240.1",length=41933>
##contig=<ID=chrGL000236.1,Description="chromosome chrGL000236.1",length=41934>
##contig=<ID=chrGL000241.1,Description="chromosome chrGL000241.1",length=42152>
##contig=<ID=chrGL000243.1,Description="chromosome chrGL000243.1",length=43341>
##contig=<ID=chrGL000242.1,Description="chromosome chrGL000242.1",length=43523>
##contig=<ID=chrGL000230.1,Description="chromosome chrGL000230.1",length=43691>
##contig=<ID=chrGL000237.1,Description="chromosome chrGL000237.1",length=45867>
##contig=<ID=chrGL000233.1,Description="chromosome chrGL000233.1",length=45941>
##contig=<ID=chrGL000204.1,Description="chromosome chrGL000204.1",length=81310>
##contig=<ID=chrGL000198.1,Description="chromosome chrGL000198.1",length=90085>
##contig=<ID=chrGL000208.1,Description="chromosome chrGL000208.1",length=92689>
##contig=<ID=chrGL000191.1,Description="chromosome chrGL000191.1",length=106433>
##contig=<ID=chrGL000227.1,Description="chromosome chrGL000227.1",length=128374>
##contig=<ID=chrGL000228.1,Description="chromosome chrGL000228.1",length=129120>
##contig=<ID=chrGL000214.1,Description="chromosome chrGL000214.1",length=137718>
##contig=<ID=chrGL000221.1,Description="chromosome chrGL000221.1",length=155397>
##contig=<ID=chrGL000209.1,Description="chromosome chrGL000209.1",length=159169>
##contig=<ID=chrGL000218.1,Description="chromosome chrGL000218.1",length=161147>
##contig=<ID=chrGL000220.1,Description="chromosome chrGL000220.1",length=161802>
##contig=<ID=chrGL000213.1,Description="chromosome chrGL000213.1",length=164239>
##contig=<ID=chrGL000211.1,Description="chromosome chrGL000211.1",length=166566>
##contig=<ID=chrGL000199.1,Description="chromosome chrGL000199.1",length=169874>
##contig=<ID=chrGL000217.1,Description="chromosome chrGL000217.1",length=172149>
##contig=<ID=chrGL000216.1,Description="chromosome chrGL000216.1",length=172294>
##contig=<ID=chrGL000215.1,Description="chromosome chrGL000215.1",length=172545>
##contig=<ID=chrGL000205.1,Description="chromosome chrGL000205.1",length=174588>
##contig=<ID=chrGL000219.1,Description="chromosome chrGL000219.1",length=179198>
##contig=<ID=chrGL000224.1,Description="chromosome chrGL000224.1",length=179693>
##contig=<ID=chrGL000223.1,Description="chromosome chrGL000223.1",length=180455>
##contig=<ID=chrGL000195.1,Description="chromosome chrGL000195.1",length=182896>
##contig=<ID=chrGL000212.1,Description="chromosome chrGL000212.1",length=186858>
##contig=<ID=chrGL000222.1,Description="chromosome chrGL000222.1",length=186861>
##contig=<ID=chrGL000200.1,Description="chromosome chrGL000200.1",length=187035>
##contig=<ID=chrGL000193.1,Description="chromosome chrGL000193.1",length=189789>
##contig=<ID=chrGL000194.1,Description="chromosome chrGL000194.1",length=191469>
##contig=<ID=chrGL000225.1,Description="chromosome chrGL000225.1",length=211173>
##contig=<ID=chrGL000192.1,Description="chromosome chrGL000192.1",length=547496>
##contig=<ID=chrNC_007605,Description="chromosome chrNC_007605",length=171823>
##contig=<ID=chrhs37d5,Description="chromosome chrhs37d5",length=35477943>
##contig=<ID=phix,Description="chromosome phix",length=5386>
##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SRC,Number=1,Type=String,Description="Source of this event">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene identifier",Source="reporting.bed">
##INFO=<ID=TRANSCRIPT,Number=1,Type=String,Description="Transcript identifier",Source="reporting.bed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=LG2R,Number=1,Type=Float,Description="CNV Log2 Ratio">
##FORMAT=<ID=ZSCORE,Number=1,Type=Float,Description="Z-Score">
##FORMAT=<ID=MINUSLOG10PVALUE,Number=1,Type=Float,Description="-log10 transformed P-value">
##FORMAT=<ID=RPKM,Number=1,Type=Float,Description="Reads Per Kilobase of transcript per Million mapped reads">
##FORMAT=<ID=AVGBPCOVG,Number=1,Type=Float,Description="Average base-pair coverage">
##FORMAT=<ID=TRAINING,Number=1,Type=Integer,Description="Sample label (0:test, 1:training)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2019-NORMCTRL-i
chr8 10692171 . A . . SVTYPE=.;END=10692290;SVLEN=120;GENE=ABCA1;TRANSCRIPT=ENST00000435915.1 GT 0/1

Cannot fetch on call_h_s.vcf. Is it sorted/indexed?

Hi,
I was trying to evaluate my vcf files with cmd like:
truvari.py -b baseline.vcf -c call_h_s.vcf -o truvari_out
I have sorted my vcf files, but I still got ERRORs like:
[ERROR] Cannot fetch on call_h_s.vcf. Is it sorted/indexed?

And I used bgzipped & indexed vcf , with another error:
Traceback (most recent call last): File "/home/maize/gst/sftw/SV-Call/truvari/truvari.py", line 926, in <module> run(sys.argv[1:]) File "/home/maize/gst/sftw/SV-Call/truvari/truvari.py", line 661, in run contig = vcfB.fetch(contig, 0, 1000) File "/home/maize/gst/sftw/anaconda2/lib/python2.7/site-packages/vcf/parser.py", line 599, in fetch self.reader = self._tabix.fetch(chrom, start, end) File "pysam/libctabix.pyx", line 447, in pysam.libctabix.TabixFile.fetch ValueError: start out of range (1000)
So I was wondering what's the right Input format for the vcf files? Should they be gzipped or not?
Or that error were caused by someting else? Thank you!

Please find attached the test vcfs I used.

Best wishes,

Songtao Gui

baseline.vcf.gz
call_h_s.vcf.gz

Multimatch and MatchId

Hello,

When inspecting the MatchId in tp-call.vcf and tp-base.vcf, I noticed that the set of tp-call match IDs is not equal to the set of tp-base match IDs. As a result, there are match IDs that are unique to tp-base and no easy to way to find what the call matches.

This is only observed when using --multimatch.

Thanks!

Truvari not adding variants to call count or interval tree for comparsion.

Dear Truvari developer,

I have this variant,
1 108734987 . T 20 PASS SVTYPE=DEL;SVLEN=-379;END=108735366 GT:GTcons1:PB_GT:PB_REF:PB_ALT:PBHP_GT:PB_REF_HP1:PB_ALT_HP1:PB_REF_HP2:PB_ALT_HP2:TenX_GT:TenX_REF_HP1:TenX_ALT_HP1:TenX_REF_HP2:TenX_ALT_HP2:ILL250bp_GT:ILL250bp_REF:ILL250bp_ALT:ILLMP_GT:ILLMP_REF:ILLMP_ALT:BNG_LEN_DEL:BNG_LEN_INS:nabsys_svm 1/1:1/1:1/1:2:110:./.:0:2:0:2:./.:1:12:0:10:1/1:0:30:1/1:3:285:5046:.:.

I am using --includebed parameter of truvari and this variant is covered by this region in my bed file
1 108734987 108735366

This variant is being discarded by truvari and is not added to a call count. However, when I extended the bed region by 1 (Note that bed region is exactly same as start and end of this variant) like this
1 108734986 108735367
truvari adds it to the call count and performs the comparison. Is it intended to behave like this? or could it be by any chance 'Off by 1' bug?

TypeError: '>=' not supported between instances of 'int' and 'NoneType'

Detailed error messages:

Traceback (most recent call last):
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 1047, in
run(sys.argv[1:])
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 731, in run
regions = GenomeTree(vcf_base, vcf_comp, args.includebed)
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/bin/truvari", line 537, in init
all_regions[name].addi(0, length)
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/intervaltree.py", line 343, in addi
return self.add(Interval(begin, end, data))
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/intervaltree.py", line 323, in add
if interval.is_null():
File "/dlmp/sandbox/cgslIS/DLMP_CGSL_OS/python3_env/lib/python3.6/site-packages/intervaltree/interval.py", line 116, in is_null
return self.begin >= self.end
TypeError: '>=' not supported between instances of 'int' and 'NoneType'

Header and a few lines of the input:

##fileformat=VCFv4.2
##fileDate='Thu Dec 12 18:26:36 CST 2019'
##source=SentieonMGC-callerStr1-dynamic
##reference=HG19
##suppliedSex=MAL
##predictedSex=MALE
##avgRPKMCorrelationWrtTrainingSamples=0.937643587011545
##medianAbsoluteDeviation(log2R diff)=0.104296609837663
##log2RWaviness=1.4025688822472
##contig=<ID=chr1,Description=chromosome chr1>
##contig=<ID=chr2,Description=chromosome chr2>
##contig=<ID=chr3,Description=chromosome chr3>
##contig=<ID=chr4,Description=chromosome chr4>
##contig=<ID=chr5,Description=chromosome chr5>
##contig=<ID=chr6,Description=chromosome chr6>
##contig=<ID=chr7,Description=chromosome chr7>
##contig=<ID=chr8,Description=chromosome chr8>
##contig=<ID=chr9,Description=chromosome chr9>
##contig=<ID=chr10,Description=chromosome chr10>
##contig=<ID=chr11,Description=chromosome chr11>
##contig=<ID=chr12,Description=chromosome chr12>
##contig=<ID=chr13,Description=chromosome chr13>
##contig=<ID=chr14,Description=chromosome chr14>
##contig=<ID=chr15,Description=chromosome chr15>
##contig=<ID=chr16,Description=chromosome chr16>
##contig=<ID=chr17,Description=chromosome chr17>
##contig=<ID=chr18,Description=chromosome chr18>
##contig=<ID=chr19,Description=chromosome chr19>
##contig=<ID=chr20,Description=chromosome chr20>
##contig=<ID=chr21,Description=chromosome chr21>
##contig=<ID=chr22,Description=chromosome chr22>
##contig=<ID=chrX,Description=chromosome chrX>
##contig=<ID=chrY,Description=chromosome chrY>
##contig=<ID=chrM,Description=chromosome chrM>
##contig=<ID=chrGL000207.1,Description=chromosome chrGL000207.1>
##contig=<ID=chrGL000226.1,Description=chromosome chrGL000226.1>
##contig=<ID=chrGL000229.1,Description=chromosome chrGL000229.1>
##contig=<ID=chrGL000231.1,Description=chromosome chrGL000231.1>
##contig=<ID=chrGL000210.1,Description=chromosome chrGL000210.1>
##contig=<ID=chrGL000239.1,Description=chromosome chrGL000239.1>
##contig=<ID=chrGL000235.1,Description=chromosome chrGL000235.1>
##contig=<ID=chrGL000201.1,Description=chromosome chrGL000201.1>
##contig=<ID=chrGL000247.1,Description=chromosome chrGL000247.1>
##contig=<ID=chrGL000245.1,Description=chromosome chrGL000245.1>
##contig=<ID=chrGL000197.1,Description=chromosome chrGL000197.1>
##contig=<ID=chrGL000203.1,Description=chromosome chrGL000203.1>
##contig=<ID=chrGL000246.1,Description=chromosome chrGL000246.1>
##contig=<ID=chrGL000249.1,Description=chromosome chrGL000249.1>
##contig=<ID=chrGL000196.1,Description=chromosome chrGL000196.1>
##contig=<ID=chrGL000248.1,Description=chromosome chrGL000248.1>
##contig=<ID=chrGL000244.1,Description=chromosome chrGL000244.1>
##contig=<ID=chrGL000238.1,Description=chromosome chrGL000238.1>
##contig=<ID=chrGL000202.1,Description=chromosome chrGL000202.1>
##contig=<ID=chrGL000234.1,Description=chromosome chrGL000234.1>
##contig=<ID=chrGL000232.1,Description=chromosome chrGL000232.1>
##contig=<ID=chrGL000206.1,Description=chromosome chrGL000206.1>
##contig=<ID=chrGL000240.1,Description=chromosome chrGL000240.1>
##contig=<ID=chrGL000236.1,Description=chromosome chrGL000236.1>
##contig=<ID=chrGL000241.1,Description=chromosome chrGL000241.1>
##contig=<ID=chrGL000243.1,Description=chromosome chrGL000243.1>
##contig=<ID=chrGL000242.1,Description=chromosome chrGL000242.1>
##contig=<ID=chrGL000230.1,Description=chromosome chrGL000230.1>
##contig=<ID=chrGL000237.1,Description=chromosome chrGL000237.1>
##contig=<ID=chrGL000233.1,Description=chromosome chrGL000233.1>
##contig=<ID=chrGL000204.1,Description=chromosome chrGL000204.1>
##contig=<ID=chrGL000198.1,Description=chromosome chrGL000198.1>
##contig=<ID=chrGL000208.1,Description=chromosome chrGL000208.1>
##contig=<ID=chrGL000191.1,Description=chromosome chrGL000191.1>
##contig=<ID=chrGL000227.1,Description=chromosome chrGL000227.1>
##contig=<ID=chrGL000228.1,Description=chromosome chrGL000228.1>
##contig=<ID=chrGL000214.1,Description=chromosome chrGL000214.1>
##contig=<ID=chrGL000221.1,Description=chromosome chrGL000221.1>
##contig=<ID=chrGL000209.1,Description=chromosome chrGL000209.1>
##contig=<ID=chrGL000218.1,Description=chromosome chrGL000218.1>
##contig=<ID=chrGL000220.1,Description=chromosome chrGL000220.1>
##contig=<ID=chrGL000213.1,Description=chromosome chrGL000213.1>
##contig=<ID=chrGL000211.1,Description=chromosome chrGL000211.1>
##contig=<ID=chrGL000199.1,Description=chromosome chrGL000199.1>
##contig=<ID=chrGL000217.1,Description=chromosome chrGL000217.1>
##contig=<ID=chrGL000216.1,Description=chromosome chrGL000216.1>
##contig=<ID=chrGL000215.1,Description=chromosome chrGL000215.1>
##contig=<ID=chrGL000205.1,Description=chromosome chrGL000205.1>
##contig=<ID=chrGL000219.1,Description=chromosome chrGL000219.1>
##contig=<ID=chrGL000224.1,Description=chromosome chrGL000224.1>
##contig=<ID=chrGL000223.1,Description=chromosome chrGL000223.1>
##contig=<ID=chrGL000195.1,Description=chromosome chrGL000195.1>
##contig=<ID=chrGL000212.1,Description=chromosome chrGL000212.1>
##contig=<ID=chrGL000222.1,Description=chromosome chrGL000222.1>
##contig=<ID=chrGL000200.1,Description=chromosome chrGL000200.1>
##contig=<ID=chrGL000193.1,Description=chromosome chrGL000193.1>
##contig=<ID=chrGL000194.1,Description=chromosome chrGL000194.1>
##contig=<ID=chrGL000225.1,Description=chromosome chrGL000225.1>
##contig=<ID=chrGL000192.1,Description=chromosome chrGL000192.1>
##contig=<ID=chrNC_007605,Description=chromosome chrNC_007605>
##contig=<ID=chrhs37d5,Description=chromosome chrhs37d5>
##contig=<ID=phix,Description=chromosome phix>
##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=DEL,Description="Deletion">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this
record">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Difference in length between REF and ALT al
leles">
##INFO=<ID=SRC,Number=1,Type=String,Description="Source of this event">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene identifier",Source="reporting.bed">
##INFO=<ID=TRANSCRIPT,Number=1,Type=String,Description="Transcript identifier",Source="reportin
g.bed">
##INFO=<ID=LOC,Number=1,Type=String,Description="Feature identifier",Source="reporting.bed">
##INFO=<ID=PEERS,Number=1,Type=String,Description="Number of reference samples utilized for ana
lysis">
##INFO=<ID=IQR,Number=1,Type=Float,Description="Inter-quartile range of log2Ratio values among
training samples">
##INFO=<ID=FEATURE_CATEGORY,Number=.,Type=String,Description="Category to which feature belongs
to">
##INFO=<ID=SNR,Number=1,Type=Float,Description="Signal-to-noise ratio for SNR">
##INFO=<ID=CALL,Number=1,Type=String,Description="Category call">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=LG2R,Number=1,Type=Float,Description="CNV Log2 Ratio">
##FORMAT=<ID=ZSCORE,Number=1,Type=Float,Description="Z-Score">
##FORMAT=<ID=-LOG10PVALUE,Number=1,Type=Float,Description="-log10 transformed P-value">
##FORMAT=<ID=TRAINING,Number=1,Type=Integer,Description="Sample label (0:test, 1:training)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG002 NGS82-DV042 NGS82-DV008 NGS82-DV014 NGS82-DV043 NGS82-DV044 NGS82-DV051 HG003 HG004 NGS82-DV046 MGS82-DV047 NGS82-DV048 NGS82-DV049 NGS82-DV050 NGS82-DV001
chr1 1475628 . G . . SVTYPE=DEL;SNR=24.081;END=1475747;SVLEN
=120;GENE=476_458424_339453(TMEM240)_1.1_1;TRANSCRIPT=NA;LOC=0;SRC=SentieonMGC-callerStr1-dynam
ic;PEERS=14;IQR=0.135;FEATURE_CATEGORY=CAT1;CALL=DEL GT:LG2R:ZSCORE:-LOG10PVALUE:TRAINING 1:0.505:-4.919:6.424:0 .:0.05:.:.:1 .:0.999:.:.:1 .:0.063:.:.:1 .:0.004:.:.:1 .:0.032
:.:.:1 .:0.018:.:.:1 .:0.083:.:.:1 .:0.348:.:.:1 .:0.023:.:.:1 .:0.101:.:.:1 .:0.089
:.:.:1 .:0.221:.:.:1 .:0.12:.:.:1 .:0.027:.:.:1

β€˜Mismatch when double-checking Truvari calculations

Dear Truvari development team,

We recently used Truvari for a project comparing SV detection performance but the performance metrics did not match with our scripts. Here's the command used to run Truvari (input files attached):

/usr/local/biotools/python/3.6.3/bin/truvari -b ref.vcf.gz -c test.vcf.gz -o ${OUTDIR} --includebed include.bed --pctsim 0

Output:
"TP-base": 6405,
"TP-call": 6405,
"FP": 160,
"FN": 367,
"precision": 0.9756283320639756,
"recall": 0.9458062610750148,
"base cnt": 6772,
"call cnt": 6565,
"base size filtered": 0,
"call size filtered": 0,
"base gt filtered": 0,
"call gt filtered": 0,
"TP-call_TP-gt": 6405,
"TP-call_FP-gt": 0,
"TP-base_TP-gt": 6405,
"TP-base_FP-gt": 0,
"gt_precision": 0.9756283320639756,
"gt_recall": 0.9458062610750148,

Output from our scripts (r.cmd.txt: code attached):

Note: In the R code, run.all.melt is essentially test.vcf in tabular form with different column names.

#dups.perf.df
sample dup.TP dup.FP dup.TN dup.FN dups.SENS dups.SPEC dups.PPV dups.NPV dups.FPR dups.FDR
2019-NORMCTRL-i 12350 240 397794 1775 0.8743363 0.999397 0.9809373 1 0.9955577 0.0006029636 0.01906275

#dels.perf.df
sample del.TP del.FP del.TN del.FN dels.SENS dels.SPEC dels.PPV dels.NPV dels.FPR dels.FDR
2019-NORMCTRL-i 9 156 411992 2 0.8181818 0.9996215 0.05454545 0.9999951 0.0003785048 0.9454545

All the input files and our code are attached. Please let me know if you require more info.

Best,

Shengbing Huang, Ph.D, MS
Assistant Professor of Medicine
IT Analyst/Programmer

Archive.zip

Question regarding matching parameters

I am attempting to perform a benchmarking of various SV callers and their combinations against the GIAB SV 0.6 dataset, and I have troubles making an educated choice of truvari matching parameters for this task. Could you please give guidelines on how to choose:
pctsim (min percent allele sequence similarity),
pctsize (min pct allele size similarity (minvarsize/maxvarsize)),
pctovl (min percent reciprocal overlap),
refdist (max reference location distance)?

understanding "call cnt"

Hello!

I am still new to using truvari, so I am still trying to fully comprehend what I see.

One thing I have encountered and is troubling me, is that I assumed that "call cnt" should be the number of records in the input VCF COMP file, however the two numbers would not match.

I found this in truvari.py, which explains why it would not match the number of records in the COMP vcf:

# call count is just of those used were used
stats_box["call cnt"] = stats_box["TP-base"] + stats_box["FP"]

Is this working as intended? Why would it not be "TP-call + FP" for instance?

Thanks!

Filter/report by variant type

Hello,

Is there an option to only compare variants of a certain type (i.e. "DEL") in the base/call set? Alternatively, is there an option that will generate metrics by type? (i.e. precision/recall for DEL, INS, etc.). I'm trying to see the metrics stratified by type.

Thanks

AttributeError: 'IntervalTree' object has no attribute 'search'

Hello, I am using truvari installed via bioconda. I have tried running the simple quick start script on VCF files produced by CNVnator as well as the reference CNV call set released for NA24385. Both tests were comparing the same file to itself just to make sure things worked, but all attempts to run truvari yield this stacktrace with the same error:

(wgspy3) jmajor@research-team-one:~/data/CNVitaeTest/NA24385/output$ truvari -b ~/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz -c ~/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz -f ~/wgs_resources/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta -o ./OUT
2019-11-23 11:14:48,643 [INFO] Running /locus/home/jmajor/conda/envs/wgspy3/bin/truvari -b /locus/home/jmajor/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz -c /locus/home/jmajor/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz -f /locus/home/jmajor/wgs_resources/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta -o ./OUT
2019-11-23 11:14:48,644 [INFO] Params:
{
"pctovl": 0.0,
"sizemax": 50000,
"includebed": null,
"sizefilt": 30,
"cSample": null,
"reference": "/locus/home/jmajor/wgs_resources/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta",
"no_ref": false,
"multimatch": false,
"comp": "/locus/home/jmajor/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz",
"typeignore": false,
"refdist": 500,
"pctsize": 0.7,
"passonly": false,
"base": "/locus/home/jmajor/wgs_resources/data/reference/controls/b37/controls/cnv/NA24385/giab/NA24385.cnv.vcf.gz",
"pctsim": 0.7,
"gtcomp": false,
"giabreport": false,
"debug": false,
"output": "./OUT",
"bSample": null,
"sizemin": 50
}
2019-11-23 11:14:48,654 [INFO] Creating call interval tree for overlap search
2019-11-23 11:15:03,429 [INFO] 74012 call variants
2019-11-23 11:15:03,429 [INFO] 45753 call variants within size range (30, 50000)
Traceback (most recent call last):
File "/locus/home/jmajor/conda/envs/wgspy3/bin/truvari", line 990, in
run(sys.argv[1:])
File "/locus/home/jmajor/conda/envs/wgspy3/bin/truvari", line 739, in run
for entry in regions.iterate(vcf_base):
File "/locus/home/jmajor/conda/envs/wgspy3/bin/truvari", line 555, in iterate
if self.include(entry):
File "/locus/home/jmajor/conda/envs/wgspy3/bin/truvari", line 567, in include
return overlaps and len(self.tree[entry.CHROM].search(astart, aend)) == 1
AttributeError: 'IntervalTree' object has no attribute 'search'

I'm eager to get truvari working as it looks like the exact tool for CNV comparison I've been dreading writing myself :-) . I'll poke around and see if I can get more info, but in the meantime, any info on this bug is most welcome.

truvari does not support variants in BND notation.

VCF allows any structural variant to be reported in BND notation. There is no difference between a breakpoint-based caller report a variant as a <DEL> and the caller reporting the same breakpoint in BND notation. Both calls are equivalent.

This means truvari does not support benchmarking of caller such as GRIDSS and Hydra.

Old tabix files fail silently

When the --comp vcf has an outdated tabix index, the program appears to fail silently. Need to check for the validity of the index

setup.py exact dependency versions

Hi,

I was just wondering, is there any reason for the setup.py to have

        "python-Levenshtein==0.12.0",
        "progressbar2==3.41.0",
        "pysam==0.15.2",
        "pyfaidx==0.5.5.2",
        "intervaltree==3.0.2",

and not

        "python-Levenshtein>=0.12.0",
        "progressbar2>=3.41.0",
        "pysam>=0.15.2",
        "pyfaidx>=0.5.5.2",
        "intervaltree>=3.0.2",

I just came across this package because it's broken in the nixpkgs repo, because we are using pyfaidx version 0.5.8.

Thanks for any clarification,
Robert

Running on same CNV VCF file yields no results?

I am running truvari on a CNVnator generated VCF and specifying the same file as -b and -c, expecting too see perfect concordance. But instead truvari is reporting no matches at all? The CNVnator VCF entries look ok to me, ie:

Y 59363501 NA24385_CNVnator_del_2337 T . PASS END=59374000;SVTYPE=DEL;SVLEN=-10500;IMPRECISE;natorRD=0.00185287;natorP1=1.51783e-11;natorP2=2.76568e-212;natorP3=1.87497e-11;natorP4=1.57886e-183;natorQ0=0.999335 GT:CN 1/1:0

This is what the output log looks like

2019-11-23 11:27:03,931 [INFO] Running /home/ubuntu/conda/envs/wgspy3/bin/truvari -b ./NA24385.vcf.gz -c ./NA24385.vcf.gz -o Xxx -f /efs/WGS/data/reference/human/hum
an_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta --sizemax 1000000
2019-11-23 11:27:03,931 [INFO] Params:
{
"pctovl": 0.0,
"sizemax": 1000000,
"includebed": null,
"sizefilt": 30,
"cSample": null,
"reference": "/efs/WGS/data/reference/human/human_g1k_v37_modified.fasta/human_g1k_v37_modified.fasta",
"no_ref": false,
"multimatch": false,
"comp": "./NA24385.vcf.gz",
"typeignore": false,
"refdist": 500,
"pctsize": 0.7,
"passonly": false,
"base": "./NA24385.vcf.gz",
"pctsim": 0.7,
"gtcomp": false,
"giabreport": false,
"debug": false,
"output": "Xxx",
"bSample": null,
"sizemin": 50
}
2019-11-23 11:27:03,944 [INFO] Creating call interval tree for overlap search
2019-11-23 11:27:04,122 [INFO] 2337 call variants
2019-11-23 11:27:04,122 [INFO] 2310 call variants within size range (30, 1000000)
2019-11-23 11:27:04,233 [INFO] 0 base variants
2019-11-23 11:27:04,291 [INFO] Matching base to calls
2019-11-23 11:27:04,405 [WARNING] No TP or FN calls in base!
2019-11-23 11:27:04,405 [INFO] Parsing FPs from calls
2019-11-23 11:27:04,616 [INFO] Stats: {
"FP": 0,
"f1": "NaN",
"base gt filtered": 0,
"precision": 0,
"TP-call": 0,
"call cnt": 0,
"base size filtered": 0,
"FN": 0,
"TP-base": 0,
"base cnt": 0,
"recall": 0,
"call gt filtered": 0,
"call size filtered": 0
}
2019-11-23 11:27:04,627 [INFO] Finished

Usable VCF

Put a note somewhere that vcfs need to pass usable_vcf to be confidently processed.
Some users have a hard time getting their data in.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.