Comments (5)
Hello,
The traceback lines are made somewhere around truvari.region_vcf_iter:build_anno_tree, which is called in truvari.refine.resolve_regions, which is run between refine's logging.info("Params:... and logging.info("%d regions to be refined".
So the traceback is stuff that happens during the first half of the setup before the variants/regions are sent to be harmonized. You're hanging during truvari.phab.
If I had to guess, you're hanging somewhere around the call to pysam's bcftools consensus dispatch, which translates the variants back into the samples' haplotypes. This step has given users problems in the past because bcftools consensus has a problem with the VCF's format but is not very friendly about telling us why.
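For anyone trying to reproduce this outside truvari, here is a minimal sketch that runs bcftools consensus as a subprocess with a timeout, so that a hang turns into a readable message. The helper names (`consensus_command`, `run_consensus`) and the 60 second timeout are assumptions made for illustration and are not part of truvari or pysam. It assumes bcftools is on your PATH.

```python
import shlex
import subprocess


def consensus_command(vcf, fasta, sample=None, haplotype=1):
    """Build the argument list for one `bcftools consensus` haplotype run."""
    cmd = ["bcftools", "consensus", "-H", str(haplotype), "-f", fasta]
    if sample is not None:
        cmd += ["-s", sample]
    cmd.append(vcf)
    return cmd


def run_consensus(vcf, fasta, sample=None, haplotype=1, timeout=60):
    """Run bcftools consensus; return (returncode, stderr) or (None, reason).

    The timeout is an illustrative choice: it converts a silent hang into
    a message naming the exact command that stalled.
    """
    cmd = consensus_command(vcf, fasta, sample=sample, haplotype=haplotype)
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, f"timed out after {timeout}s: {shlex.join(cmd)}"
    except FileNotFoundError:
        return None, "bcftools was not found on PATH"
    return proc.returncode, proc.stderr
```

Calling `run_consensus("input.vcf.gz", "genome.fa")` and printing the second element of the result should show whatever bcftools wrote to stderr, which is usually more informative than the hang.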
Is it possible for you to share your input.vcf.gz? I could debug it to see if there's anything that jumps out. If you cannot share it, let me know and I'll need to spend a few hours writing documentation to teach you how to debug it.
Have a great day,
~/Adam
from truvari.
Making notes for posterity here regarding the follow up email
- INFO/END being ill defined breaks bcftools consensus
- pysam 0.19 doesn't seem to hang, updated truvari's minimum requirement
- Running bcftools norm without --do-not-normalize might be a gamble
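A rough sketch of how one might screen for suspicious INFO/END values before running refine. What counts as "ill defined" is an assumption here: the check flags END values that are not integers, that fall before POS, or that disagree with POS + len(REF) - 1 when the alleles are spelled out as sequence. The function name `check_end` is hypothetical and not part of truvari.

```python
def check_end(line):
    """Return a description of the INFO/END problem in a VCF record, or None.

    Assumed rules (illustrative, not taken from truvari or bcftools):
      * END must be an integer
      * END must not be smaller than POS
      * for sequence-resolved alleles, END should equal POS + len(REF) - 1
    """
    fields = line.rstrip("\n").split("\t")
    pos, ref, alt, info_field = int(fields[1]), fields[3], fields[4], fields[7]

    # Parse key=value INFO entries; flag-style entries are ignored
    info = dict(kv.split("=", 1) for kv in info_field.split(";") if "=" in kv)
    if "END" not in info:
        return None

    try:
        end = int(info["END"])
    except ValueError:
        return f"END={info['END']} is not an integer"

    if end < pos:
        return f"END={end} is before POS={pos}"

    # Symbolic alleles such as <DEL> carry no sequence, so skip the length check
    symbolic = any(a.startswith("<") for a in alt.split(","))
    expected = pos + len(ref) - 1
    if not symbolic and end != expected:
        return f"END={end} does not match POS + len(REF) - 1 = {expected}"
    return None
```

Running this over every non-header line of the VCF and printing the records that return a message gives a short list of candidates to inspect by hand.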
Leaving this here for future reference.
When dealing with VCFs in CNV:TR format, these commands translate the calls into something that works with bench+refine (I split them into intermediate steps for clarity).
# split multi allelic variants
bcftools norm -m-any -N -o 1_split_variants.vcf
# populate the REF and ALT fields with putative repeats and the INFO fields with minimal information. For example, if REF has 4 copies and ALT 5 copies of the GCC pattern:
REF ALT INFO
AGCCGCCGCCGCC AGCCGCCGCCGCCGCC SVTYPE=INS;SVLEN=3;RUS=GCC ...
# The A is the base before the variant on the reference (REF field in the CNV:TR format)
# sort
bcftools sort $@/2_fill_ref_alt.vcf -o $@/3_sorted.vcf
bgzip -c $@/3_sorted.vcf > $@/3_sorted.vcf.gz
tabix -p vcf $@/3_sorted.vcf.gz
# correct the REF field to be the same as the reference sequence (required by bcftools consensus to work)
bcftools +fill-from-fasta 3_sorted.vcf.gz -- -c REF -f $(GENOME_FASTA) > 4_fix_ref.vcf
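The "populate the REF and ALT fields" step above is described only by example, so here is a minimal sketch of that expansion. The function name `expand_tr_alleles` and its arguments are hypothetical. It assumes you already have the anchor base, the repeat unit, and the copy numbers for each allele, and that SVLEN is signed (negative for deletions), which is a common convention but not something stated in this thread.

```python
def expand_tr_alleles(anchor, unit, ref_copies, alt_copies):
    """Build sequence-resolved REF, ALT and a minimal INFO string.

    anchor     -- the reference base before the repeat (REF in CNV:TR format)
    unit       -- the repeat unit, e.g. "GCC"
    ref_copies -- number of unit copies on the reference
    alt_copies -- number of unit copies on the alternate allele
    """
    ref = anchor + unit * ref_copies
    alt = anchor + unit * alt_copies
    svlen = len(alt) - len(ref)
    if svlen == 0:
        raise ValueError("REF and ALT have the same copy number; nothing to emit")
    svtype = "INS" if svlen > 0 else "DEL"
    info = f"SVTYPE={svtype};SVLEN={svlen};RUS={unit}"
    return ref, alt, info


# The example from the comment above: 4 reference copies, 5 alternate copies of GCC
print(expand_tr_alleles("A", "GCC", 4, 5))
```

The REF built this way is only a guess at the reference sequence, which is why the final +fill-from-fasta step is still needed afterwards.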
A couple of things to keep in mind:
- the reason for the hang is bcftools consensus failing to retrieve the haplotype sequence. To make troubleshooting easier, one can just run the bcftools consensus -H1 -f $(GENOME_FASTA) command alone to get more meaningful errors
- I still get the hang in truvari with pysam 0.21
- one sneaky problem is also duplicate records that cause bcftools consensus to fail and truvari to hang
- Be wary of running bcftools norm without the -N option, it could mess up the records in unpredictable ways
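For the duplicate-record problem in the list above, a small sketch that lists repeated records in a VCF. It treats two records as duplicates when they share CHROM, POS, REF and ALT, which is an assumption about what trips up bcftools consensus. The helper names `open_vcf` and `find_duplicates` are made up for this example.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def find_duplicates(lines):
    """Return {(chrom, pos, ref, alt): count} for records seen more than once."""
    counts = Counter()
    for line in lines:
        # Skip header lines and blank lines
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        counts[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in counts.items() if n > 1}
```

Usage would look like `with open_vcf("4_fix_ref.vcf") as fh: print(find_duplicates(fh))`, where an empty dictionary means no duplicates were found under this definition.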
@ACEnglish Truvari refine hangs indefinitely even using the 4.1.1.dev version and pysam 0.22. It hangs mostly when I use the TR benchmark. When I use the CMRG SV benchmark it works ok. Any suggestions?
I prepared the VCF file the same way as noted above by @andy941, but it still hangs.
I don't have enough information to provide a useful answer.
- Did you attempt his note on "the reason for the hang is bcftools consensus failing to retrieve the haplotype sequence. to make troubleshooting easier, one can just run the bcftools consensus -H1 -f $(GENOME_FASTA) command alone to get more meaningful errors"?
- What about "one sneaky problem is also duplicate records that cause bcftools consensus to fail and truvari to hang"?
- Are you using the same input VCF for your runs against CMRG and the TR benchmark? If not, what is different about them?