Comments (7)

AndreaGuarracino commented on August 11, 2024

Hi @jdamas13, can you confirm you are using vg 1.40.0?

from pggb.

ekg commented on August 11, 2024

For clarity, this is not the most recent vg version. There has been a regression in vg deconstruct in recent versions, and only a specific range of versions, ending at 1.40.0, will work.

jdamas13 commented on August 11, 2024

Hi, I was using the nf-core/pangenome dev docker container, which has vg: variation graph tool, version v1.40.0 "Suardi".

vatanparast commented on August 11, 2024

I got the same error using the latest Singularity version.

vg deconstruct -P Cantata -H # -e -a -t 4 community.9/pg2-pg5_prefixed-50kb.community.9.fa.gz.bf3285f.11fba48.867196c.smooth.final.gfa
457.70s user 15.60s system 222% cpu 213.00s total 2875776Kb max memory
[vg::deconstruct] decompose VCF
vcfwave 1.0.7 processing...
error: more sample names in header than sample fields
samples: PG5
line: Cantata#1#Scf9YQZ_25_HRSCAF_39    19      >1823810>1823812        CC      C       60.0    .       AC=0;AF=0;AN=0;AT=<1823812<1823811<1823810,<1823812<1823810;NS=0;LV=0   GT
Command exited with non-zero status 1

vg:
  version: v1.40.0
  deconstruct: Cantata:1000
reporting:
  version: v1.21
  multiqc: true

glennhickey commented on August 11, 2024

This looks to be the same error as ComparativeGenomicsToolkit/cactus#1416 and possibly ComparativeGenomicsToolkit/cactus#1402

I'm looking into it now, and it seems to be caused by:

  • vg deconstruct writes genotype as .
  • then vcfbub replaces the . genotype with a completely empty column, producing an invalid VCF.

It seems strange that this error is only now coming up, as vg deconstruct and vcfbub haven't changed much at all lately (though deconstruct will be heavily refactored in the next vg release). Update: I just noticed the original issue here is a year old, which makes more sense!

I am going to double-check the deconstruct end today (I think the . genotype is coming from its conflict resolution and is probably by design). But it seems like there is a bug in vcfbub (by way of the API it's using to write VCF) that, by stripping . genotype columns, produces invalid VCF. @ekg @AndreaGuarracino let me know if you want some data to reproduce.
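
The failure mode described above can be checked without bcftools. The following is a minimal sketch, assuming tab-separated VCF text on stdin or as file arguments; `find_empty_samples` is a hypothetical helper name, not part of vcfbub, vg, or bcftools. It prints CHROM and POS for every record whose sample column is missing or empty relative to the #CHROM header.

```shell
# Hypothetical helper (an assumption for illustration, not an existing tool):
# list records whose sample fields are absent or empty.
find_empty_samples() {
  awk -F'\t' '
    /^##/     { next }                   # skip meta-information lines
    /^#CHROM/ { expected = NF; next }    # header fixes the expected column count
    {
      # Sample columns start at field 10; a dropped column and an empty
      # trailing field both read as "" here.
      for (i = 10; i <= expected; i++)
        if ($i == "") { print $1 "\t" $2; break }
    }
  ' "$@"
}

# Example usage (file name taken from the thread):
#   find_empty_samples region.bub.vcf | head
```

Checking for an empty field rather than only counting columns covers both possibilities, since the thread does not say whether the column is dropped outright or left as an empty trailing field.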

ekg commented on August 11, 2024

Would love the data, please send it!

glennhickey commented on August 11, 2024

Here is a VCF file that vcfbub invalidates by virtue of erasing the sample column for records where the GT is .

wget -q http://public.gi.ucsc.edu/~hickey/debug/region.vcf.gz
zcat region.vcf.gz |awk '{print $1 "\t" $2 "\t" $9 "\t" $10}' | tail -5
NC_054371.1     30059010        GT      1
NC_054371.1     30059019        GT      .
NC_054371.1     30059027        GT      1
NC_054371.1     30059035        GT      1
NC_054371.1     30059046        GT      .

vcfbub --input region.vcf.gz --max-ref-length 100000 --max-level 0 > region.bub.vcf
tail -5 region.bub.vcf
cat region.bub.vcf  | awk '{print $1 "\t" $2 "\t" $9 "\t" $10}' | tail -5
NC_054371.1     30059010        GT      1
NC_054371.1     30059019        GT
NC_054371.1     30059027        GT      1
NC_054371.1     30059035        GT      1
NC_054371.1     30059046        GT
bcftools view region.vcf.gz > /dev/null
# fine

bcftools view region.bub.vcf > /dev/null
[E::vcf_parse_format_empty1] FORMAT column with no sample columns starting at NC_054371.1:30057051
[E::vcf_parse_format_empty1] FORMAT column with no sample columns starting at NC_054371.1:30057221
[E::bcf_write] Broken VCF record, the number of columns at NC_054371.1:30057051 does not match the number of samples (0 vs 1)
[main_vcfview] Error: cannot write to (null)
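
As a possible stopgap until vcfbub is fixed, the erased sample fields could be put back as `.`. This is only a sketch under assumptions: `fill_empty_samples` is a hypothetical helper, and it assumes that restoring `.` is acceptable for every emptied field (true for the GT-only FORMAT in the example above, not checked for richer FORMAT strings).

```shell
# Hypothetical workaround (an assumption, not an official fix): rewrite
# missing or empty sample fields as "." so the record has one value per sample.
fill_empty_samples() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#CHROM/ { expected = NF }          # remember the header column count
    /^#/      { print; next }            # pass all header lines through unchanged
    {
      for (i = 10; i <= expected; i++)
        if ($i == "") $i = "."           # assigning past NF also restores a dropped column
      print
    }
  ' "$@"
}

# Example usage (file names follow the thread; the output name is arbitrary):
#   fill_empty_samples region.bub.vcf > region.bub.fixed.vcf
#   bcftools view region.bub.fixed.vcf > /dev/null
```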
