
Comments (5)

ACEnglish commented on July 23, 2024

Hello,

The traceback lines come from somewhere around truvari.region_vcf_iter:build_anno_tree, which is called in truvari.refine.resolve_regions. That function runs between refine's logging.info("Params:... and logging.info("%d regions to be refined" calls.

So the traceback comes from the first half of the setup, before the variants/regions are sent to be harmonized. The hang itself happens during truvari.phab.

If I had to guess, you're hanging somewhere around the call to pysam's bcftools consensus dispatch, which translates the variants back into the samples' haplotypes. This step has given users problems in the past because bcftools consensus has a problem with the VCF's format but isn't very friendly about telling us why.

Is it possible for you to share your input.vcf.gz? I could debug it to see if anything jumps out. If you cannot share it, let me know and I'll need to spend a few hours writing documentation to teach you how to debug it.

Have a great day,
~/Adam

from truvari.

ACEnglish commented on July 23, 2024

Making notes for posterity here regarding the follow-up email:

  1. INFO/END being ill-defined breaks bcftools consensus
  2. pysam 0.19 doesn't seem to hang, updated truvari's minimum requirement
  3. Running bcftools norm without --do-not-normalize might be a gamble
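A minimal sketch of a check for the first note. The thread does not say in what way INFO/END was ill-defined, so this assumes one common form: an END value smaller than POS. The function name find_bad_end is hypothetical, and the check reads an uncompressed VCF on stdin.

```shell
# Print CHROM, POS, and the END tag for records whose INFO/END is smaller
# than POS. This is only one assumed form of an "ill-defined" END.
# Reads an uncompressed VCF on stdin; for a .vcf.gz, pipe it through
# `bgzip -dc` first.
find_bad_end() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            n = split($8, info, ";")         # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^END=/) {
                    end = substr(info[i], 5) + 0
                    if (end < $2 + 0)        # column 2 is POS
                        print $1 "\t" $2 "\t" info[i]
                }
            }
        }'
}
```

Usage would look like `bgzip -dc input.vcf.gz | find_bad_end`; any line printed is a record worth inspecting before running refine.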

andy941 commented on July 23, 2024

Leaving this here for future reference.

When dealing with VCFs in CNV:TR format, these commands translate the calls into something that works with bench+refine (I split them into intermediate steps for clarity).

# split multi-allelic variants (input.vcf.gz is a placeholder for your own file)
bcftools norm -m-any -N -o 1_split_variants.vcf input.vcf.gz

# populate the REF and ALT fields with the putative repeats and the INFO field with minimal information (custom step; its output is 2_fill_ref_alt.vcf). For example, if REF has 4 copies and ALT has 5 copies of the GCC pattern:
REF                 ALT                INFO
AGCCGCCGCCGCC       AGCCGCCGCCGCCGCC   SVTYPE=INS;SVLEN=3;RUS=GCC ...
# The A is the base before the variant on the reference (REF field in the CNV:TR format)

# sort, compress, and index
bcftools sort 2_fill_ref_alt.vcf -o 3_sorted.vcf
bgzip -c 3_sorted.vcf > 3_sorted.vcf.gz
tabix -p vcf 3_sorted.vcf.gz

# correct the REF field to be the same as the reference sequence (required by bcftools consensus to work)
bcftools +fill-from-fasta 3_sorted.vcf.gz -- -c REF -f $(GENOME_FASTA) > 4_fix_ref.vcf
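The REF/ALT filling step above has no command attached, so here is a minimal sketch of how the example alleles could be built from the anchor base, the repeat unit, and the copy counts. The helper name expand_repeat is hypothetical, and the sketch only covers the expansion case from the example (ALT longer than REF).

```shell
# Build an allele string: one anchor base followed by N copies of a motif.
expand_repeat() {
    anchor=$1 motif=$2 copies=$3
    allele=$anchor
    i=0
    while [ "$i" -lt "$copies" ]; do
        allele="${allele}${motif}"
        i=$((i + 1))
    done
    printf '%s\n' "$allele"
}

ref=$(expand_repeat A GCC 4)      # 4 copies on the reference
alt=$(expand_repeat A GCC 5)      # 5 copies in the call
svlen=$(( ${#alt} - ${#ref} ))    # length difference between ALT and REF
printf '%s\t%s\tSVTYPE=INS;SVLEN=%s;RUS=GCC\n' "$ref" "$alt" "$svlen"
```

With the values from the example this reproduces the REF, ALT, and SVLEN=3 shown above. The anchor base still has to match the reference, which is what the fill-from-fasta step takes care of.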

A couple of things to keep in mind:

  • The reason for the hang is bcftools consensus failing to retrieve the haplotype sequence. To make troubleshooting easier, one can just run the bcftools consensus -H1 -f $(GENOME_FASTA) command alone to get more meaningful errors.
  • I still get the hang in truvari with pysam 0.21.
  • One sneaky problem is also duplicate records that cause bcftools consensus to fail and truvari to hang.
  • Be wary of running bcftools norm without the -N option; it could mess up the records in unpredictable ways.
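A rough sketch of the two checks from the list above: looking for duplicate records and running bcftools consensus on its own. The function names and arguments are hypothetical. Duplicates are defined here as identical CHROM/POS/REF/ALT, which is an assumption about what trips up bcftools consensus. The second function needs bcftools on the PATH and a bgzipped, indexed VCF.

```shell
# List CHROM/POS/REF/ALT combinations that occur more than once.
# Reads an uncompressed VCF on stdin.
find_duplicate_records() {
    awk -F'\t' '!/^#/ { print $1 "\t" $2 "\t" $4 "\t" $5 }' | sort | uniq -d
}

# Run bcftools consensus by itself for both haplotypes of one sample so
# that its own error messages are visible instead of a silent hang.
# Arguments: bgzipped+indexed VCF, reference FASTA, sample name.
run_consensus() {
    vcf=$1 fasta=$2 sample=$3
    for hap in 1 2; do
        bcftools consensus -H "$hap" -s "$sample" -f "$fasta" "$vcf" > /dev/null \
            || echo "bcftools consensus failed on haplotype $hap for $sample" >&2
    done
}
```

For example, `bgzip -dc 3_sorted.vcf.gz | find_duplicate_records` prints nothing when there are no duplicates, and `run_consensus` leaves the errors from bcftools on stderr.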

kokyriakidis commented on July 23, 2024

@ACEnglish Truvari refine hangs indefinitely even with the 4.1.1.dev version and pysam 0.22. It hangs mostly when I use the TR benchmark. When I use the CMRG SV benchmark, it works OK. Any suggestions?

I prepared the VCF file the same way as noted above by @andy941, but it still hangs.

ACEnglish commented on July 23, 2024

I don't have enough information to provide a useful answer.

  • Did you try his suggestion that "the reason for the hang is bcftools consensus failing to retrieve the haplotype sequence. to make troubleshooting easier, one can just run the bcftools consensus -H1 -f $(GENOME_FASTA) command alone to get more meaningful errors"?
  • What about "one sneaky problem is also duplicate records that cause bcftools consensus to fail and truvari to hang"?
  • Are you using the same input VCF for your runs against CMRG and the TR benchmark? If not, what is different about them?
