Comments (4)

ACEnglish commented on June 25, 2024

Hello,

Truvari shouldn't be placing a variant in both files, so it sounds like something is going on. To check, I would need to see the base and comparison VCFs.

caspargross commented on June 25, 2024

Here are the input VCF and the truvari results.
https://cloud.imgag.de/s/AiyFwXWKWqMqt3C

Thanks for looking into it!

ACEnglish commented on June 25, 2024

Here's what I looked at:

truvari consistency truvari_output/fn.vcf.gz truvari_output/tp-base.vcf.gz
bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\n" truvari_output/fn.vcf.gz | sort > f.txt
bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\n" truvari_output/tp-base.vcf.gz | sort > b.txt
echo "Pulling shared lines"
comm -1 -2 b.txt f.txt

truvari consistency does essentially the same thing as the bcftools queries and the comm command. It shows that 5 variants are found in both fn and tp-base.
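The bcftools/comm comparison can be wrapped into a small function that counts how many keys fall into each group, using the same 10/01/11 labels as the consistency report (first digit = first file). This is a minimal sketch: consistency_summary is a hypothetical helper, not part of truvari or bcftools, and it assumes the inputs are key files sorted with the same `sort`, such as f.txt and b.txt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of truvari or bcftools): count keys that are
# only in the first file (10), only in the second (01), or in both (11).
# Both inputs must be sorted the same way, as `sort` does for f.txt and b.txt.
consistency_summary() {
    local first=$1 second=$2
    local only_first only_second both
    # comm -2 -3: lines unique to the first file
    only_first=$(( $(comm -2 -3 "$first" "$second" | wc -l) ))
    # comm -1 -3: lines unique to the second file
    only_second=$(( $(comm -1 -3 "$first" "$second" | wc -l) ))
    # comm -1 -2: lines present in both files
    both=$(( $(comm -1 -2 "$first" "$second" | wc -l) ))
    printf '#Group\tTotal\n10\t%d\n01\t%d\n11\t%d\n' "$only_first" "$only_second" "$both"
}
```

Running consistency_summary f.txt b.txt prints one count per group. Because comm works line by line, a key repeated within one file would be counted once per line, so the numbers only line up with truvari consistency when keys are unique within each file.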

Consistency report
#
# Total 28583 calls across 2 VCFs
#
#File	NumCalls
truvari_output/fn.vcf.gz	10151
truvari_output/tp-base.vcf.gz	18437
#
# Summary of consistency
#
#VCFs	Calls	Pct
2	5	0.02%
1	28578	99.98%
#
# Breakdown of VCFs' consistency
#
#Group	Total	TotalPct	PctOfFileCalls
01	18432	64.49%	0% 99.97%
10	10146	35.50%	99.95% 0%
11	5	0.02%	0.05% 0.03%
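As a check on how the report's columns relate to the per-file counts: the figures are consistent with TotalPct being each group's share of the distinct calls and PctOfFileCalls being the share of each file's own calls (fn first, tp-base second). This reading is inferred from the numbers above rather than taken from truvari's documentation. It also explains why the total is 28583 and not 10151 + 18437:

```latex
\begin{aligned}
N_{\text{distinct}} &= 10151 + 18437 - 5 = 28583 \\
\text{TotalPct}_{11} &= \tfrac{5}{28583} \approx 0.02\%, \qquad
\text{TotalPct}_{01} = \tfrac{18432}{28583} \approx 64.49\% \\
\text{PctOfFileCalls}_{11} &= \left(\tfrac{5}{10151},\ \tfrac{5}{18437}\right) \approx (0.05\%,\ 0.03\%) \\
\text{PctOfFileCalls}_{10} &= \tfrac{10146}{10151} \approx 99.95\%, \qquad
\text{PctOfFileCalls}_{01} = \tfrac{18432}{18437} \approx 99.97\%
\end{aligned}
```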

Looking at the 5 variants in detail with the comm command:

"repeated" variants
chr1	211111841	GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA	G
chr10	132852325	T	TTAAGAATTCTCAGATCCTGATCCACAGACGTAGTATGTTCCTCCAGTTACTTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACTTAAGACTTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACGTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCAGTTACTTAGGAATTCTCAGATCCTGATCCACAGACGTGGTATGTTTCTCCATTTACTTAGACATTCTCAGATCCTGATCCACAGACGTGGTATGTTCCTCCATTTACG
chr10	1473941	A	AGGGGGCCGGGTAGTTTCTAGTGCAGGAGACACGCCTTTCCTTGCTGAGCCCCCGGATCATGGGGGAGGCCTCGACG
chr19	404269	TCCTGAGGGGTCTGAGGGGGGGACGGGGTCCTGCCCTGAGGGGGTGTGGGGGGGACGGGGTCCTGCCCTGAGGGGTGTGGGGGGGGGACGGGGTCCTGC	T
chr2	2884197	ATAAATACTTTATATATGTGTATATGTAAATACTTTATATATGTGTATATGTAAATACTTTATATATGTGTACATG	A

Pulling any of those positions, e.g. with zgrep 211111841 GRCh38_HG2-verrkoV1.1-V0.6_dipcall-z2k.vcf.gz, shows:

chr1	211111841	.	GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA	G	30	.	REPTYPE=CONTRAC;BREAKSIMLENGTH=425;REFWIDENED=chr1:211111730-211112273	GT:AD	1|0:1,1
chr1	211111841	.	GCTGTACTGTCTGGGAAGTGAGGAGCACCTCTGCTTGGCTGCCCACCATCTGGGAAGTGAGGAGTGCCTCTGCCTGGCTACTGCACCGTCTAGGAAGTGAGGTGCCCCTCTGCCTGGCCA	G	30	.	REPTYPE=CONTRAC;BREAKSIMLENGTH=425;REFWIDENED=chr1:211111730-211112273	GT:AD	0|1:0,1

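You can see that the two records differ only in their FORMAT (GT:AD) values: the genotype phase is 1|0 in one and 0|1 in the other. That is why truvari consistency and the bcftools/comm commands, which compare only CHROM, POS, REF and ALT, report them as the same variant, even though they are actually two different records... or rather, the same variant on different haplotypes.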
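Since the bcftools/comm key is only CHROM, POS, REF and ALT, the two phased records collapse into one key. A minimal sketch of a key that also carries the genotype follows. keys_with_gt is a hypothetical helper; it assumes a single-sample VCF and relies on the VCF rule that GT, when present, is the first FORMAT field. With bcftools, adding [%GT] to the query format string should give the same extra column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print CHROM, POS, REF, ALT plus the first sample's GT,
# so records that differ only in phase (1|0 vs 0|1) no longer share one key.
# gzip -cdf decompresses bgzipped input and passes plain text through as-is.
keys_with_gt() {
    gzip -cdf "$1" | awk -F'\t' -v OFS='\t' '
        /^#/ { next }                  # skip header lines
        {
            split($10, vals, ":")      # GT is the first FORMAT field per the VCF spec
            print $1, $2, $4, $5, vals[1]
        }' | sort
}
```

For example, keys_with_gt GRCh38_HG2-verrkoV1.1-V0.6_dipcall-z2k.vcf.gz | grep 211111841 should list the two chr1:211111841 records separately, one ending in 0|1 and one in 1|0.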

Presumably the duplicate records are a reporting error, and the VCF should have a single 1|1 variant instead. But as is, and with default parameters, I believe Truvari is working as expected. Having said that, you may be interested in the --pick parameter. By default, each variant representation can participate in only a single match. In this case, if you were to allow variants to match up to their allele count (--pick ac), the FN would become a TP, since the TP's counterpart at chr1:211111841 (found through MatchId=1288.1.0) is...

chr1	211111842	Sniffles2.DEL.87AES0	N	<DEL>	60	PASS	PRECISE;SVLEN=-119;SVTYPE=DEL;SUPPORT=73;END=211111961;STDEV_POS=12.349;STDEV_LEN=0.777;COVERAGE=77,76,74,73,70;STRAND=+-;RNAMES=<truncated>;AF=0.986;PctSeqSimilarity=0.9537;PctSizeSimilarity=1;PctRecOverlap=0.9917;SizeDiff=0;StartDistance=-1;EndDistance=-1;GTMatch=1;TruScore=98;MatchId=1288.1.0	GT:GQ:DR:DV	1/1:60:1:73

... a homozygous variant. Have fun benchmarking!
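To make the allele-count idea concrete: under --pick ac a call can take part in up to as many matches as its allele count, so the 1/1 Sniffles2 deletion above could pair with both the 1|0 and the 0|1 base records. The helper below is a hypothetical illustration of counting alleles from a GT string, not truvari's code, and the commented bench command uses placeholder file names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of non-reference, non-missing alleles in a GT,
# i.e. how many matches a call could join under an allele-count rule.
allele_count() {
    # Split the GT on / or | and count alleles that are neither 0 nor missing (.)
    printf '%s\n' "$1" | awk -F'[/|]' '{
        n = 0
        for (i = 1; i <= NF; i++) if ($i != "0" && $i != ".") n++
        print n
    }'
}

# Placeholder file names; re-running bench with --pick ac would let a 1/1 call
# pair with both heterozygous base records:
# truvari bench -b base.vcf.gz -c comp.vcf.gz -o bench_pick_ac --pick ac
```

Here allele_count "1/1" gives 2 while allele_count "0|1" gives 1, which is why only the homozygous comparison call can absorb both phased base records.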

p.s. You may also be interested in truvari refine. Basic bench has a known limitation with 1-to-1 variant matching, which explains part of the difference between the truvari and hapeval reports. refine overcomes that limitation via variant harmonization. It is currently built only to be used with a bed file that has fairly tight boundaries around regions that could benefit from harmonization. If you're interested in exploring refine, let me know, because my development roadmap includes building a way to help users make their 'tight boundaries' bed file.
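For a rough idea of what a 'tight boundaries' bed file can look like, the sketch below pads each VCF record by a fixed number of bases and merges overlapping intervals. vcf_to_padded_bed and its 100 bp default padding are hypothetical choices for illustration, not how truvari refine or its planned helper builds regions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: padded, merged BED intervals around VCF records.
# BED is 0-based, half-open: a record at POS with REF length L spans [POS-1, POS-1+L).
vcf_to_padded_bed() {
    local vcf=$1 pad=${2:-100}
    gzip -cdf "$vcf" |
        awk -F'\t' -v OFS='\t' -v pad="$pad" '
            /^#/ { next }                          # skip header lines
            {
                start = $2 - 1 - pad
                if (start < 0) start = 0           # clamp at chromosome start
                print $1, start, $2 - 1 + length($4) + pad
            }' |
        sort -k1,1 -k2,2n |
        awk -F'\t' -v OFS='\t' '
            # Extend the open interval when the next one overlaps or is adjacent
            $1 == chrom && $2 + 0 <= end { if ($3 + 0 > end) end = $3 + 0; next }
            # Otherwise emit the open interval (if any) and start a new one
            NR > 1 { print chrom, start, end }
            { chrom = $1; start = $2 + 0; end = $3 + 0 }
            END { if (NR > 0) print chrom, start, end }'
}
```

Merging in awk keeps the sketch dependency-free; bedtools merge on the sorted intervals would be the usual alternative.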

ACEnglish commented on June 25, 2024

Just wanted to follow up and let you know that I've finished the feature that 'hooks' truvari bench into truvari refine for whole-genome data to find the closest estimate of benchmarking performance. It's currently available in the develop branch of the repository, and I'll cut an official Truvari v4.1 release next week. Documentation on how to run it is available in the wiki.
