Comments (4)
Hello,
Truvari shouldn't be placing a variant in both files. So it sounds like something is going on. To check it, I would need to see the base/comparison VCFs.
from truvari.
Here are the input VCF and the truvari results.
https://cloud.imgag.de/s/AiyFwXWKWqMqt3C
Thanks for looking into it!
Here's what I looked at:
truvari consistency truvari_output/fn.vcf.gz truvari_output/tp-base.vcf.gz
bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\n" truvari_output/fn.vcf.gz | sort > f.txt
bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\n" truvari_output/tp-base.vcf.gz | sort > b.txt
echo "Pulling shared lines"
comm -1 -2 b.txt f.txt
Consistency does essentially the same thing as the bcftools queries and comm command. It shows that there are 5 variants found in both fn and tp-base.
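For anyone who wants to reproduce the comparison without bcftools, here is a minimal Python sketch of the same idea: build a (CHROM, POS, REF, ALT) key per record and intersect the two sets. The function names and the toy records are made up for illustration, and this is not how truvari consistency is implemented internally; it only mirrors the bcftools query plus comm approach shown above.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def shared_variants(lines_a, lines_b):
    """Return the keys present in both inputs, sorted for stable output."""
    return sorted(variant_keys(lines_a) & variant_keys(lines_b))


# Toy records, not taken from the real VCFs in this thread.
fn_lines = [
    "##fileformat=VCFv4.2\n",
    "chr1\t100\t.\tGCT\tG\t30\t.\t.\tGT\t1|0\n",
    "chr2\t200\t.\tA\tAGG\t30\t.\t.\tGT\t0|1\n",
]
tp_base_lines = [
    "chr1\t100\t.\tGCT\tG\t30\t.\t.\tGT\t0|1\n",
    "chr3\t300\t.\tT\tTCC\t30\t.\t.\tGT\t1|1\n",
]

shared = shared_variants(fn_lines, tp_base_lines)
print(shared)
```

Note that the key deliberately ignores the genotype column, so two records that differ only in GT collapse to the same key.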
Consistency report
#
# Total 28583 calls across 2 VCFs
#
#File NumCalls
truvari_output/fn.vcf.gz 10151
truvari_output/tp-base.vcf.gz 18437
#
# Summary of consistency
#
#VCFs Calls Pct
2 5 0.02%
1 28578 99.98%
#
# Breakdown of VCFs' consistency
#
#Group Total TotalPct PctOfFileCalls
01 18432 64.49% 0% 99.97%
10 10146 35.50% 99.95% 0%
11 5 0.02% 0.05% 0.03%
Looking at the 5 variants in detail with the comm command:
"repeated" variants
chr1 211111841 GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA G
chr10 132852325 T TTAAGAATTCTCAGATCCTGATCCACAGACGTAGTATGTTCCTCCAGTTACTTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACTTAAGACTTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACGTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCAGTTACTTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACG
chr10 1473941 A AGGGGGCCGGGTAGTTTCTAGTGCAGGAGACACGCCTTTCCTTGCTGAGCCCCCGGATCATGGGGGAGGCCTCGACG
chr19 404269 TCCTGAGGGGTCTGAGGGGGGGACGGGGTCCTGCCCTGAGGGGGTGTGGGGGGGACGGGGTCCTGCCCTGAGGGGTGTGGGGGGGGGACGGGGTCCTGC T
chr2 2884197 ATAAATACTTTATATATGTGTATATGTAAATACTTTATATATGTGTATATGTAAATACTTTATATATGTGTACATG A
Pulling any of those POS values from the input VCF, e.g. zgrep 211111841 GRCh38_HG2-verrkoV1.1-V0.6_dipcall-z2k.vcf.gz, shows:
chr1 211111841 . GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA G 30 . REPTYPE=CONTRAC;BREAKSIMLENGTH=425;REFWIDENED=chr1:211111730-211112273 GT:AD 1|0:1,1
chr1 211111841 . GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA G 30 . REPTYPE=CONTRAC;BREAKSIMLENGTH=425;REFWIDENED=chr1:211111730-211112273 GT:AD 0|1:0,1
You can see that the two records differ only in the phase of the genotype (1|0 versus 0|1), which is why both truvari consistency and the bcftools/comm commands say they're the same variant. But they're actually two different records... or, put another way, the same variant on different haplotypes.
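A quick way to spot this pattern in a VCF is to group records by (CHROM, POS, REF, ALT) and flag any key that appears once as 1|0 and once as 0|1. The sketch below does that on pre-parsed tuples; the helper name is hypothetical and the REF allele is shortened for readability, so treat it as an illustration and not a Truvari feature.

```python
from collections import defaultdict


def split_haplotype_records(records):
    """Return keys whose records carry exactly the phased genotypes 1|0 and 0|1.

    `records` is an iterable of (chrom, pos, ref, alt, gt) tuples.
    """
    by_key = defaultdict(list)
    for chrom, pos, ref, alt, gt in records:
        by_key[(chrom, pos, ref, alt)].append(gt)
    return {
        key: sorted(gts)
        for key, gts in by_key.items()
        if set(gts) == {"1|0", "0|1"}
    }


# Toy input: REF is shortened here, the real allele is much longer.
records = [
    ("chr1", 211111841, "GCT", "G", "1|0"),
    ("chr1", 211111841, "GCT", "G", "0|1"),
    ("chr2", 2884197, "ATA", "A", "0|1"),
]

split = split_haplotype_records(records)
print(split)
```

Each flagged key is a candidate for being written as a single 1|1 record instead of two heterozygous ones.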
Presumably this is a reporting error, and the VCF should represent this as a single 1|1 variant. But as is, and given default parameters, I believe Truvari is working as expected. Having said that, I think you may be interested in the --pick parameter. By default, each variant representation gets to participate in a single match. In this case, if you were to allow a variant to match up to ac (allele count) times, the FN would become a TP, since the TP's counterpart at chr1:211111841 (found through MatchId=1288.1.0) is...
chr1 211111842 Sniffles2.DEL.87AES0 N <DEL> 60 PASS PRECISE;SVLEN=-119;SVTYPE=DEL;SUPPORT=73;END=211111961;STDEV_POS=12.349;STDEV_LEN=0.777;COVERAGE=77,76,74,73,70;STRAND=+-;RNAMES=<truncated>;AF=0.986;PctSeqSimilarity=0.9537;PctSizeSimilarity=1;PctRecOverlap=0.9917;SizeDiff=0;StartDistance=-1;EndDistance=-1;GTMatch=1;TruScore=98;MatchId=1288.1.0 GT:GQ:DR:DV 1/1:60:1:73
... a homozygous variant. Have fun benchmarking!
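To make the allele-count idea concrete, here is a toy sketch: the number of non-reference alleles in a call's GT is used as the number of matches that call may take part in. Both function names are hypothetical and the greedy split is a simplification; this is not Truvari's matching code, only an illustration of why a 1/1 comparison call can cover both heterozygous base records.

```python
import re


def allele_count(gt):
    """Count non-reference alleles in a GT string such as '0|1' or '1/1'."""
    alleles = re.split(r"[/|]", gt)
    return sum(1 for allele in alleles if allele not in ("0", "."))


def split_by_budget(base_records, budget):
    """Greedy toy model: the first `budget` base records match, the rest do not."""
    return base_records[:budget], base_records[budget:]


base = ["chr1:211111841 1|0", "chr1:211111841 0|1"]

# One match per comparison call: one base record is left over as an FN.
tp_single, fn_single = split_by_budget(base, 1)

# Budget taken from the homozygous comparison call: both base records match.
tp_ac, fn_ac = split_by_budget(base, allele_count("1/1"))

print(len(tp_single), len(fn_single))
print(len(tp_ac), len(fn_ac))
```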
p.s. You may be interested in truvari refine. Basic bench has a known limitation with 1-to-1 variant matching, which explains part of the difference between the truvari and hapeval reports. refine overcomes that limitation via variant harmonization. It's currently only built to be used with a BED file with fairly tight boundaries around regions that could benefit from being harmonized. If you're interested in exploring refine, let me know, because my development roadmap includes building a way to help users make their 'tight boundaries' BED file.
Just wanted to follow up and let you know that I've finished the feature that 'hooks' truvari bench with truvari refine for whole genome data and finds the closest estimate of benchmarking performance. It's currently available in the develop branch of the repository. I'll cut an official Truvari v4.1 release next week. Documentation on how to run it is available in the wiki.