Giter Club home page Giter Club logo

Comments (5)

ACEnglish avatar ACEnglish commented on June 16, 2024

Hello,

Those warnings are just warnings, not errors. Truvari will throw errors if there is a problem with the VCF such as corrupted qual fields. Instead, the warnings are telling us that the --base VCF file has no SVs. Please check the file and make sure there are SVs. For Truvari to consider a VCF entry to be an SV, it has to pass some set of conditionals:

  • The call must be between --sizemin and --sizemax in length. See docs for how size is determined
  • The call must be non-reference-homozygous (0/0) when using --no-ref
  • The call must have a PASS for FILTER if --passonly
  • The call's start and end must land within a region from the --includebed if it was provided

It's possible I'm forgetting something, so please check these things and if you're still having issues I will need to see the VCF. Perhaps not the entire file, but just your hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz if that file is used as the --base or the other non-mason VCF I believe you described you're comparing against. I wouldn't need to see the entire file, just a handful of entries you believe are SVs but Truvari is disagreeing.

Have a great day,
~/Adam English

from truvari.

Irallia avatar Irallia commented on June 16, 2024

Hello,
thank you for your answer. I checked the conditionals you gave me, but I still have empty base calls.

Hello,

Those warnings are just warnings, not errors. Truvari will throw errors if there is a problem with the VCF such as corrupted qual fields. Instead, the warnings are telling us that the --base VCF file has no SVs.

Sorry, of course I meant warnings, not errors.

Please check the file and make sure there are SVs. For Truvari to consider a VCF entry to be an SV, it has to pass some set of conditionals:

* The call must be between `--sizemin` and `--sizemax` in length. See [docs](https://truvari.readthedocs.io/en/latest/truvari.html#entry-size) for how size is determined

Yes, there is an SVLEN. Plus we have start and end information.

* The call must be non-reference-homozygous (0/0) when using `--no-ref`

I hope that the GT ./. works, as it works in my other vcf files.

* The call must have a PASS for FILTER if `--passonly`

Yes mason does only create entrys with PASS.

* The call's start and end must land within a region from the `--includebed` if it was provided

The bed file is created by convert2bed --input=vcf --output=bed < hg38_SV_simulation_noSNP_sorted_fixed.vcf > hg38_SV_simulation_noSNP.bed thus it should include all regions.

It's possible I'm forgetting something, so please check these things and if you're still having issues I will need to see the VCF. Perhaps not the entire file, but just your hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz if that file is used as the --base or the other non-mason VCF I believe you described you're comparing against. I wouldn't need to see the entire file, just a handful of entries you believe are SVs but Truvari is disagreeing.

Here is the start of my hg38_SV_simulation_noSNP_sorted_fixed.vcf.gz and I think at least the sim_sv_indel* ones should got used by truvari.

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=TARGETPOS,Number=1,Type=String,Description="Target position for duplications.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=chr21,length=46709983>
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=INS,Description="Insertion of novel sequence">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##reference=hg38_chr21.fa
##source=mason_variator
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	simulated
chr21	5030334	sim_trans_0	N	A[chr21:5031186[	1	PASS	SVTYPE=BND	GT	./.
chr21	5030335	sim_trans_0	N	[chr21:5047468[A	1	PASS	SVTYPE=BND	GT	./.
chr21	5031185	sim_trans_0	N	C[chr21:5047469[	1	PASS	SVTYPE=BND	GT	./.
chr21	5031186	sim_trans_0	N	[chr21:5030334[A	1	PASS	SVTYPE=BND	GT	./.
chr21	5047468	sim_trans_0	N	T[chr21:5030335[	1	PASS	SVTYPE=BND	GT	./.
chr21	5047469	sim_trans_0	N	[chr21:5031185[C	1	PASS	SVTYPE=BND	GT	./.
chr21	5060293	sim_dup_0	N	<DUP>	1	PASS	END=5073031;SVLEN=12738;SVTYPE=DUP;TARGETPOS=chr21:5087033	GT	./.
chr21	5105328	sim_small_indel_0	N	CTTTCAC	1	PASS	.	GT	./.
chr21	5126727	sim_small_indel_1	N	C	1	PASS	.	GT	./.
chr21	5127415	sim_sv_indel_0	N	<DEL>	1	PASS	SVLEN=-18647;SVTYPE=DEL;END=5146062	GT	./.
chr21	5152875	sim_small_indel_2	N	ACT	1	PASS	.	GT	./.
chr21	5217965	sim_sv_indel_1	N	<INS>	1	PASS	SVLEN=5919;SVTYPE=INS;END=5217965	GT	./.
chr21	5223979	sim_sv_indel_2	N	<DEL>	1	PASS	SVLEN=-9486;SVTYPE=DEL;END=5233465	GT	./.
chr21	5252802	sim_trans_1	N	T[chr21:5271385[	1	PASS	SVTYPE=BND	GT	./.
chr21	5252803	sim_trans_1	N	[chr21:5288484[T	1	PASS	SVTYPE=BND	GT	./.
chr21	5256763	sim_small_indel_3	N	C	1	PASS	.	GT	./.
chr21	5271384	sim_trans_1	N	A[chr21:5288485[	1	PASS	SVTYPE=BND	GT	./.
chr21	5271385	sim_trans_1	N	[chr21:5252802[A	1	PASS	SVTYPE=BND	GT	./.
chr21	5288484	sim_trans_1	N	T[chr21:5252803[	1	PASS	SVTYPE=BND	GT	./.
chr21	5288485	sim_trans_1	N	[chr21:5271384[T	1	PASS	SVTYPE=BND	GT	./.
chr21	5315180	sim_small_indel_4	N	AGACAGAGAGGCTTGGA	1	PASS	.	GT	./.
chr21	5316839	sim_inv_0	N	<INV>	1	PASS	END=5335052;SVLEN=18213;SVTYPE=INV	GT	./.
chr21	5337350	sim_dup_1	N	<DUP>	1	PASS	END=5350120;SVLEN=12770;SVTYPE=DUP;TARGETPOS=chr21:5366707	GT	./.
...

Have a great day, ~/Adam English

Thank you for your quick and helpful answer!
Best Lydia Buntrock

from truvari.

ACEnglish avatar ACEnglish commented on June 16, 2024

I believe the problem might be with the --includebed. Could you try to run once without that file and see if it changes anything?

The reason I'm suspicious of the bed is that the example entries you provided above worked just fine (see below).
It could be as simple as an off by one error since beds/vcfs use different indexing (0/1 based respectively). I'd also be curious to see what would happen if you inflated your bed regions (see bedtools slop here).

git checkout tags/v3.0.0
python3 -m pip install .
truvari bench -b ticket137.vcf.gz -c ticket137.vcf.gz -o test3.0 -f reference/grch38/GRCh38_1kg_mainchrs.fa
cat test3.0/summary.txt
{
    "TP-base": 6,
    "TP-call": 6,
    "FP": 0,
    "FN": 0,
    "precision": 1.0,
    "recall": 1.0,
    "f1": 1.0,
    "base cnt": 6,
    "call cnt": 6,
    "TP-call_TP-gt": 6,
    "TP-call_FP-gt": 0,
    "TP-base_TP-gt": 6,
    "TP-base_FP-gt": 0,
    "gt_concordance": 1.0
}

from truvari.

Irallia avatar Irallia commented on June 16, 2024

It was indeed the bedfile. Thank you very much for your help. Now I have learned a lot more about truvari.
From my point of view, the issue can now be closed.

from truvari.

ACEnglish avatar ACEnglish commented on June 16, 2024

Thanks for reporting this.

As a note: I looked into it more and I believe you've actually found an error. Apparently pyintervaltree isn't behaving as I was expecting and therefore your bed files didn't contain the SVs. For example, say we have coordinates from BedOps convert2bed using repo_utils/test_files/input1.vcf.gz :

>>> x = IntervalTree()
>>> x.addi(66234, 66235) # convert2bed coordinates of first entry
>>> x.overlaps(66234) # pysam.VariantRecord.start
True
>>> x.overlaps(66235) # pysam.VariantRecord.stop
False

Position 66235 not overlapping position 66235 is not desired behavior. I'll close this ticket after I get the change checked in

from truvari.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.