Comments (5)
Hello,
Those warnings are just warnings, not errors. Truvari will throw errors if there is a problem with the VCF such as corrupted qual fields. Instead, the warnings are telling us that the `--base` VCF file has no SVs. Please check the file and make sure there are SVs. For Truvari to consider a VCF entry to be an SV, it has to pass some set of conditionals:
* The call must be between `--sizemin` and `--sizemax` in length. See [docs](https://truvari.readthedocs.io/en/latest/truvari.html#entry-size) for how size is determined
* The call must be non-reference-homozygous (0/0) when using `--no-ref`
* The call must have a PASS for FILTER if `--passonly`
* The call's start and end must land within a region from the `--includebed` if it was provided
It's possible I'm forgetting something, so please check these things and if you're still having issues I will need to see the VCF. Perhaps not the entire file, but just your `hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz` if that file is used as the `--base`, or the other non-mason VCF I believe you described you're comparing against. I wouldn't need to see the entire file, just a handful of entries you believe are SVs but Truvari is disagreeing.
Have a great day,
~/Adam English
from truvari.
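The checks listed in the comment above can be sketched as a single predicate. This is only an illustration, not Truvari's actual implementation: the `Call` class, the `passes_filters` function and the default thresholds are hypothetical placeholders, and region containment is written with inclusive bounds for simplicity. Whether a missing genotype (`./.`) counts as reference is not settled in the thread, so the sketch rejects only an explicit `0/0`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Call:
    """Hypothetical, already-parsed VCF entry (not a Truvari or pysam type)."""
    chrom: str
    start: int
    end: int
    size: int
    filters: Tuple[str, ...] = ("PASS",)
    genotype: Tuple[Optional[int], ...] = (None, None)  # ./. as (None, None)


def passes_filters(call, sizemin=50, sizemax=50000, no_ref=False,
                   passonly=False, regions=None):
    """Return True if the call survives every check from the list above.

    The threshold defaults are placeholder values chosen for this sketch.
    """
    # Size must fall between sizemin and sizemax.
    if not sizemin <= call.size <= sizemax:
        return False
    # With no_ref, drop calls genotyped as homozygous reference (0/0).
    # Assumption: a missing genotype (./.) is kept by this sketch.
    if no_ref and all(allele == 0 for allele in call.genotype):
        return False
    # With passonly, FILTER must be PASS.
    if passonly and call.filters != ("PASS",):
        return False
    # With regions, both start and end must land inside a single region.
    if regions is not None:
        return any(chrom == call.chrom and lo <= call.start and call.end <= hi
                   for chrom, lo, hi in regions)
    return True
```

Running each check separately on a handful of suspect entries is a quick way to see which condition is discarding them.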
Hello,
Thank you for your answer. I checked the conditionals you gave me, but I still have empty base calls.

> Hello,
> Those warnings are just warnings, not errors. Truvari will throw errors if there is a problem with the VCF such as corrupted qual fields. Instead, the warnings are telling us that the `--base` VCF file has no SVs.

Sorry, of course I meant warnings, not errors.

> Please check the file and make sure there are SVs. For Truvari to consider a VCF entry to be an SV, it has to pass some set of conditionals:
> * The call must be between `--sizemin` and `--sizemax` in length. See [docs](https://truvari.readthedocs.io/en/latest/truvari.html#entry-size) for how size is determined

Yes, there is an `SVLEN`. Plus we have start and end information.

> * The call must be non-reference-homozygous (0/0) when using `--no-ref`

I hope that the GT `./.` works, as it works in my other VCF files.

> * The call must have a PASS for FILTER if `--passonly`

Yes, mason only creates entries with `PASS`.

> * The call's start and end must land within a region from the `--includebed` if it was provided

The bed file is created by `convert2bed --input=vcf --output=bed < hg38_SV_simulation_noSNP_sorted_fixed.vcf > hg38_SV_simulation_noSNP.bed`, thus it should include all regions.

> It's possible I'm forgetting something, so please check these things and if you're still having issues I will need to see the VCF. Perhaps not the entire file, but just your `hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz` if that file is used as the `--base` or the other non-mason VCF I believe you described you're comparing against. I wouldn't need to see the entire file, just a handful of entries you believe are SVs but Truvari is disagreeing.

Here is the start of my `hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz`, and I think at least the `sim_sv_indel*` ones should be used by truvari.
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=TARGETPOS,Number=1,Type=String,Description="Target position for duplications.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr21,length=46709983>
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INS,Description="Insertion of novel sequence">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##reference=hg38_chr21.fa
##source=mason_variator
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT simulated
chr21 5030334 sim_trans_0 N A[chr21:5031186[ 1 PASS SVTYPE=BND GT ./.
chr21 5030335 sim_trans_0 N [chr21:5047468[A 1 PASS SVTYPE=BND GT ./.
chr21 5031185 sim_trans_0 N C[chr21:5047469[ 1 PASS SVTYPE=BND GT ./.
chr21 5031186 sim_trans_0 N [chr21:5030334[A 1 PASS SVTYPE=BND GT ./.
chr21 5047468 sim_trans_0 N T[chr21:5030335[ 1 PASS SVTYPE=BND GT ./.
chr21 5047469 sim_trans_0 N [chr21:5031185[C 1 PASS SVTYPE=BND GT ./.
chr21 5060293 sim_dup_0 N <DUP> 1 PASS END=5073031;SVLEN=12738;SVTYPE=DUP;TARGETPOS=chr21:5087033 GT ./.
chr21 5105328 sim_small_indel_0 N CTTTCAC 1 PASS . GT ./.
chr21 5126727 sim_small_indel_1 N C 1 PASS . GT ./.
chr21 5127415 sim_sv_indel_0 N <DEL> 1 PASS SVLEN=-18647;SVTYPE=DEL;END=5146062 GT ./.
chr21 5152875 sim_small_indel_2 N ACT 1 PASS . GT ./.
chr21 5217965 sim_sv_indel_1 N <INS> 1 PASS SVLEN=5919;SVTYPE=INS;END=5217965 GT ./.
chr21 5223979 sim_sv_indel_2 N <DEL> 1 PASS SVLEN=-9486;SVTYPE=DEL;END=5233465 GT ./.
chr21 5252802 sim_trans_1 N T[chr21:5271385[ 1 PASS SVTYPE=BND GT ./.
chr21 5252803 sim_trans_1 N [chr21:5288484[T 1 PASS SVTYPE=BND GT ./.
chr21 5256763 sim_small_indel_3 N C 1 PASS . GT ./.
chr21 5271384 sim_trans_1 N A[chr21:5288485[ 1 PASS SVTYPE=BND GT ./.
chr21 5271385 sim_trans_1 N [chr21:5252802[A 1 PASS SVTYPE=BND GT ./.
chr21 5288484 sim_trans_1 N T[chr21:5252803[ 1 PASS SVTYPE=BND GT ./.
chr21 5288485 sim_trans_1 N [chr21:5271384[T 1 PASS SVTYPE=BND GT ./.
chr21 5315180 sim_small_indel_4 N AGACAGAGAGGCTTGGA 1 PASS . GT ./.
chr21 5316839 sim_inv_0 N <INV> 1 PASS END=5335052;SVLEN=18213;SVTYPE=INV GT ./.
chr21 5337350 sim_dup_1 N <DUP> 1 PASS END=5350120;SVLEN=12770;SVTYPE=DUP;TARGETPOS=chr21:5366707 GT ./.
...
> Have a great day, ~/Adam English
Thank you for your quick and helpful answer!
Best Lydia Buntrock
from truvari.
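To see which of the excerpted records carry a usable size, the INFO column can be parsed directly. The sketch below is a simplification under stated assumptions: `parse_info` and `rough_size` are hypothetical helpers, and the size rule used here (take `SVLEN` when present, otherwise the REF/ALT length difference for sequence-resolved alleles) is a guess for illustration. The Truvari docs linked above remain the authority on how size is actually determined.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'SVLEN=-18647;SVTYPE=DEL' into a dict."""
    if info == ".":
        return {}
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def rough_size(line):
    """Return (ID, size) for one VCF data line; size is None if unknown."""
    # split() accepts the tabs of a real VCF as well as the spaces shown here.
    chrom, pos, ident, ref, alt, qual, filt, info = line.split()[:8]
    fields = parse_info(info)
    if "SVLEN" in fields:
        # SVLEN is negative for deletions, so use its magnitude.
        return ident, abs(int(fields["SVLEN"].split(",")[0]))
    if alt.startswith("<") or "[" in alt or "]" in alt:
        # Symbolic or breakend ALT without SVLEN: this simple rule gives no size.
        return ident, None
    return ident, abs(len(alt) - len(ref))


# A few records copied from the excerpt above.
records = [
    "chr21 5030334 sim_trans_0 N A[chr21:5031186[ 1 PASS SVTYPE=BND GT ./.",
    "chr21 5105328 sim_small_indel_0 N CTTTCAC 1 PASS . GT ./.",
    "chr21 5127415 sim_sv_indel_0 N <DEL> 1 PASS SVLEN=-18647;SVTYPE=DEL;END=5146062 GT ./.",
    "chr21 5217965 sim_sv_indel_1 N <INS> 1 PASS SVLEN=5919;SVTYPE=INS;END=5217965 GT ./.",
]
sizes = dict(rough_size(record) for record in records)
print(sizes)
```

Under this rule the `sim_sv_indel_*` entries come out in the kilobase range, while `sim_small_indel_0` is only a few bases long and the breakend record has no size at all.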
I believe the problem might be with the `--includebed`. Could you try running once without that file and see if it changes anything?
The reason I'm suspicious of the bed is that the example entries you provided above worked just fine (see below).
It could be as simple as an off-by-one error, since BED and VCF files use different indexing (0-based and 1-based, respectively). I'd also be curious to see what would happen if you inflated your bed regions (see `bedtools slop`).
git checkout tags/v3.0.0
python3 -m pip install .
truvari bench -b ticket137.vcf.gz -c ticket137.vcf.gz -o test3.0 -f reference/grch38/GRCh38_1kg_mainchrs.fa
cat test3.0/summary.txt
{
"TP-base": 6,
"TP-call": 6,
"FP": 0,
"FN": 0,
"precision": 1.0,
"recall": 1.0,
"f1": 1.0,
"base cnt": 6,
"call cnt": 6,
"TP-call_TP-gt": 6,
"TP-call_FP-gt": 0,
"TP-base_TP-gt": 6,
"TP-base_FP-gt": 0,
"gt_concordance": 1.0
}
from truvari.
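The indexing difference mentioned in the comment above can be made concrete with a small sketch. It assumes the usual conventions (VCF `POS` is 1-based; BED starts are 0-based and BED ends are exclusive). The helper names `vcf_pos_to_bed` and `slop` are hypothetical, and `slop` only approximates padding both sides of a region the way `bedtools slop` is typically used; it is not that tool's implementation.

```python
def vcf_pos_to_bed(pos):
    """Map a 1-based VCF POS to the 0-based, half-open BED interval covering it."""
    return pos - 1, pos


def slop(start, end, pad, chrom_length):
    """Widen a BED interval by `pad` bases per side, clamped to the chromosome."""
    return max(0, start - pad), min(chrom_length, end + pad)


CHR21_LENGTH = 46709983  # from the ##contig header line in the excerpt

bed = vcf_pos_to_bed(5127415)                       # POS of sim_sv_indel_0
wider = slop(*bed, pad=100, chrom_length=CHR21_LENGTH)
print(bed)    # (5127414, 5127415)
print(wider)  # (5127314, 5127515)
```

Note that an interval built from `POS` alone spans a single base, so a record whose `END` lies thousands of bases downstream needs either a region built from both coordinates or a much larger pad.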
It was indeed the bedfile. Thank you very much for your help. Now I have learned a lot more about truvari.
From my point of view, the issue can now be closed.
from truvari.
Thanks for reporting this.
As a note: I looked into it more and I believe you've actually found an error. Apparently pyintervaltree isn't behaving as I was expecting and therefore your bed files didn't contain the SVs. For example, say we have coordinates from BedOps `convert2bed` using `repo_utils/test_files/input1.vcf.gz`:
>>> from intervaltree import IntervalTree
>>> x = IntervalTree()
>>> x.addi(66234, 66235) # convert2bed coordinates of first entry
>>> x.overlaps(66234) # pysam.VariantRecord.start
True
>>> x.overlaps(66235) # pysam.VariantRecord.stop
False
Position 66235 not overlapping position 66235 is not the desired behavior. I'll close this ticket after I get the change checked in.
from truvari.
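The behaviour reported above follows from half-open intervals: `intervaltree` includes the lower bound of an interval but not the upper bound. The pure-Python sketch below mirrors that rule without importing the library, and shows two ways to make the end coordinate count as inside. The function names are hypothetical, and this is not necessarily the change that was checked in to Truvari.

```python
def overlaps_half_open(begin, end, point):
    """Point query on [begin, end): begin is included, end is excluded."""
    return begin <= point < end


def overlaps_closed(begin, end, point):
    """Point query on [begin, end]: the end coordinate counts as inside."""
    return begin <= point <= end


begin, end = 66234, 66235    # convert2bed coordinates from the example above
start, stop = 66234, 66235   # pysam.VariantRecord.start and .stop

print(overlaps_half_open(begin, end, start))     # True
print(overlaps_half_open(begin, end, stop))      # False
print(overlaps_closed(begin, end, stop))         # True
# Storing the interval with end + 1 has the same effect under half-open rules.
print(overlaps_half_open(begin, end + 1, stop))  # True
```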