Comments (7)
Hi @jdamas13, can you confirm you are using vg 1.40.0?
from pggb.
For clarity, this is not the most recent vg version. There has been a regression in vg deconstruct in recent versions, and only a specific range of versions, ending at 1.40.0, will work.
Hi, I was using the nf-core/pangenome dev docker container, which has vg: variation graph tool, version v1.40.0 "Suardi".
I got the same error using the latest Singularity version.
vg deconstruct -P Cantata -H # -e -a -t 4 community.9/pg2-pg5_prefixed-50kb.community.9.fa.gz.bf3285f.11fba48.867196c.smooth.final.gfa
457.70s user 15.60s system 222% cpu 213.00s total 2875776Kb max memory
[vg::deconstruct] decompose VCF
vcfwave 1.0.7 processing...
error: more sample names in header than sample fields
samples: PG5
line: Cantata#1#Scf9YQZ_25_HRSCAF_39 19 >1823810>1823812 CC C 60.0 . AC=0;AF=0;AN=0;AT=<1823812<1823811<1823810,<1823812<1823810;NS=0;LV=0 GT
Command exited with non-zero status 1
vg:
version: v1.40.0
deconstruct: Cantata:1000
reporting:
version: v1.21
multiqc: true
This looks to be the same error as ComparativeGenomicsToolkit/cactus#1416 and possibly ComparativeGenomicsToolkit/cactus#1402
I'm looking into it now, and it seems to be caused by the following: vg deconstruct writes the genotype as `.`, then vcfbub replaces the `.` genotype with a completely empty column, producing an invalid VCF.
It seems strange that this error is only now coming up, as vg deconstruct and vcfbub haven't changed much at all lately (though deconstruct will be heavily refactored in the next vg release). Update: I just noticed the original issue here is a year old, which makes more sense!
I am going to double-check the deconstruct end today (I think the `.` genotype is coming from its conflict resolution and is probably by design). But it seems like there is a bug in vcfbub (by way of the API it's using to write VCF) that, by stripping `.` genotype columns, produces invalid VCF. @ekg @AndreaGuarracino let me know if you want some data to reproduce.
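For anyone who wants to check whether their own output is affected, here is a minimal sketch of a column-count check. The helper name `check_vcf_columns` is hypothetical (it is not part of vg, vcfbub, or pggb), and it assumes a tab-delimited VCF with a `#CHROM` header line. It reports every record that has fewer filled columns than the header declares, which is the symptom described above.

```shell
# Hypothetical helper, not part of vg/vcfbub/pggb.
# Reports records whose number of non-empty columns differs from the
# number of columns declared on the #CHROM header line.
check_vcf_columns() {
  awk '
    BEGIN { FS = "\t" }
    /^##/ { next }                      # skip meta-information lines
    /^#CHROM/ { expected = NF; next }   # header sets the expected column count
    NF == 0 { next }                    # ignore blank lines
    {
      filled = 0
      for (i = 1; i <= NF; i++) if ($i != "") filled++
      if (filled != expected) {
        printf "%s:%s has %d of %d expected columns filled\n", $1, $2, filled, expected
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }          # non-zero exit if any record was flagged
  ' "$@"
}
```

It reads the files given as arguments, or standard input if none are given, so a compressed file can be piped in with `zcat`. Counting non-empty fields rather than raw fields is a deliberate choice: it flags a record whether the sample column was dropped entirely or left as an empty string after a trailing tab.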
Would love data, please send!
Here is a VCF file that vcfbub invalidates by virtue of erasing the sample column for records where the GT is `.`:
wget -q http://public.gi.ucsc.edu/~hickey/debug/region.vcf.gz
zcat region.vcf.gz |awk '{print $1 "\t" $2 "\t" $9 "\t" $10}' | tail -5
NC_054371.1 30059010 GT 1
NC_054371.1 30059019 GT .
NC_054371.1 30059027 GT 1
NC_054371.1 30059035 GT 1
NC_054371.1 30059046 GT .
vcfbub --input region.vcf.gz --max-ref-length 100000 --max-level 0 > region.bub.vcf
tail -5 region.bub.vcf
cat region.bub.vcf | awk '{print $1 "\t" $2 "\t" $9 "\t" $10}' | tail -5
NC_054371.1 30059010 GT 1
NC_054371.1 30059019 GT
NC_054371.1 30059027 GT 1
NC_054371.1 30059035 GT 1
NC_054371.1 30059046 GT
bcftools view region.vcf.gz > /dev/null
# fine
bcftools view region.bub.vcf > /dev/null
[E::vcf_parse_format_empty1] FORMAT column with no sample columns starting at NC_054371.1:30057051
[E::vcf_parse_format_empty1] FORMAT column with no sample columns starting at NC_054371.1:30057221
[E::bcf_write] Broken VCF record, the number of columns at NC_054371.1:30057051 does not match the number of samples (0 vs 1)
[main_vcfview] Error: cannot write to (null)
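Until the writer is fixed, a stopgap sketch along these lines could restore the erased sample columns. It is not the fix discussed in this thread, and the helper name `pad_missing_genotypes` is hypothetical. It assumes a tab-delimited VCF and that every erased sample field was originally `.`, which matches the records shown above but should be verified against the input before relying on it.

```shell
# Hypothetical stopgap, not part of vcfbub or vg.
# Assumes every missing or empty sample field was originally ".".
pad_missing_genotypes() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^##/ { print; next }                      # pass meta-information lines through
    /^#CHROM/ { expected = NF; print; next }   # header sets the expected column count
    {
      # Sample columns start at field 10; fill any that are absent or empty.
      for (i = 10; i <= expected; i++) if ($i == "") $i = "."
      print
    }
  ' "$@"
}
```

For example, `pad_missing_genotypes region.bub.vcf > region.bub.fixed.vcf` (the output filename is only an illustration), followed by the same `bcftools view` check used above. Records that already have all their sample columns pass through unchanged.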