brentp / vcfanno

annotate a VCF with other VCFs/BEDs/tabixed files

Home Page: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0973-5

License: MIT License

Go 63.37% Python 13.95% Shell 11.19% Lua 11.01% Makefile 0.48%

bioinformatics vcf annotation genomics

vcfanno's Introduction

vcfanno

If you use vcfanno, please cite the paper

Overview

vcfanno allows you to quickly annotate your VCF with any number of INFO fields from any number of VCFs or BED files. It uses a simple conf file to allow the user to specify the source annotation files and fields and how they will be added to the info of the query VCF.

For VCF, values are pulled by name from the INFO field with special-cases of ID and FILTER to pull from those VCF columns.
For BED, values are pulled from (1-based) column number.
For BAM, depth (count), "mapq" and "seq" are currently supported.

vcfanno is written in Go and supports custom user scripts written in Lua. It can annotate more than 8,000 variants per second with 34 annotations from 9 files on a modest laptop and over 30K variants per second using 12 processes on a server.

We are actively developing vcfanno and appreciate feedback and bug reports.

Usage

After downloading the binary for your system (see section below) usage looks like:

  ./vcfanno -lua example/custom.lua example/conf.toml example/query.vcf.gz

Where conf.toml looks like:

[[annotation]]
file="ExAC.vcf"
# ID and FILTER are special fields that pull the ID and FILTER columns from the VCF
fields = ["AC_AFR", "AC_AMR", "AC_EAS", "ID", "FILTER"]
ops=["self", "self", "min", "self", "self"]
names=["exac_ac_afr", "exac_ac_amr", "exac_ac_eas", "exac_id", "exac_filter"]

[[annotation]]
file="fitcons.bed"
columns = [4, 4]
names=["fitcons_mean", "lua_sum"]
# note the 2nd op here is lua that has access to `vals`
ops=["mean", "lua:function sum(t) local sum = 0; for i=1,#t do sum = sum + t[i] end return sum / #t end"]

[[annotation]]
file="example/ex.bam"
names=["ex_bam_depth", "ex_bam_mapq", "ex_bam_seq"]
fields=["depth", "mapq", "seq"]
ops=["count", "mean", "concat"]

So from ExAC.vcf we will pull the fields from the info field and apply the corresponding operation from the ops array. Users can add as many [[annotation]] blocks to the conf file as desired. Files can be local as above, or available via http/https.
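Because each entry in fields (or columns) is paired with the entry at the same position in ops and names, a mismatch in array lengths is an easy mistake to make. The following is a minimal Python sketch of a pre-flight check, assuming the conf has already been parsed into plain dicts; check_annotation_block is a hypothetical helper and not part of vcfanno, and the one-to-one pairing is an assumption drawn from the description above.

```python
def check_annotation_block(block):
    """Return a list of problems found in one [[annotation]] block (a dict).

    Hypothetical helper: assumes fields/columns, ops and names line up
    one-to-one, as the conf description suggests.
    """
    problems = []
    # A source is addressed by INFO names ("fields") or column numbers ("columns").
    selectors = block.get("fields", []) + block.get("columns", [])
    ops = block.get("ops", [])
    if len(ops) != len(selectors):
        problems.append(f"{len(selectors)} fields/columns but {len(ops)} ops")
    names = block.get("names")
    # Only compare names when the block actually supplies them.
    if names is not None and len(names) != len(selectors):
        problems.append(f"{len(selectors)} fields/columns but {len(names)} names")
    return problems
```

An empty list means the arrays line up; anything else describes which array is short or long.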

See the additional usage section at the bottom for more.

Example

The example directory contains the data and conf for a full example. To run, download the appropriate binary for your system.

Then, you can annotate with:

./vcfanno -p 4 -lua example/custom.lua example/conf.toml example/query.vcf.gz > annotated.vcf

An example INFO field row before annotation (pos 98683):

AB=0.282443;ABP=56.8661;AC=11;AF=0.34375;AN=32;AO=45;CIGAR=1X;TYPE=snp

and after:

AB=0.2824;ABP=56.8661;AC=11;AF=0.3438;AN=32;AO=45;CIGAR=1X;TYPE=snp;AC_AFR=0;AC_AMR=0;AC_EAS=0;fitcons_mean=0.061;lua_sum=0.061

Typecasting values

By default, using ops of mean, max, sum, div2 or min will result in type=Float; using self will take the type from the annotation VCF; other ops will give type=String. It's possible to add field type info to the field name. To change the field type, add _int or _float to the field name. This suffix will be parsed and removed, and your field will be of the desired type.
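The suffix handling can be pictured with a short Python sketch. This only illustrates the behaviour described above and is not vcfanno's own code; split_type_suffix is a hypothetical name, and the mapping of _int to Integer and _float to Float is an assumption based on the VCF type names.

```python
def split_type_suffix(name):
    """Split a trailing _int or _float off a field name.

    Returns (clean_name, vcf_type), where vcf_type is None when the
    name carries no type suffix. Hypothetical helper for illustration.
    """
    for suffix, vcf_type in (("_int", "Integer"), ("_float", "Float")):
        if name.endswith(suffix):
            # Drop the suffix so only the clean name ends up in the INFO.
            return name[: -len(suffix)], vcf_type
    return name, None
```

So a name such as exac_ac_afr_int would be written as exac_ac_afr with an integer type, while exac_id is left untouched.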

Operations

In most cases, we will have a single annotation entry for each entry (variant) in the query VCF, in which case the self op is the best choice. However, it is possible that there will be multiple annotations from a single annotation file--in this case, the op determines how the many values are reduced. Valid operations are:

lua:$lua // see section below for more details
self // pull directly from the annotation and handle multi-allelics
concat // comma delimited list of output
count // count the number of overlaps
div2 // given two values a and b, return a / b
first // take only the first value
flag // presence/absence via VCF flag
max // numbers only
mean // numbers only
min // numbers only
sum // numbers only
uniq // comma-delimited list of uniq values
by_alt // comma-delimited by alt (Number=A), pipe-delimited (|) for multiple annos for the same alt.

There are some operations that are only for postannotation:

delete // remove fields from the query VCF's INFO
setid // set the ID field of the query VCF with values from its INFO

In nearly all cases, if you are annotating with a VCF, use self.

Note that when the file is a BAM, the operation is determined by the field name ('seq', 'mapq', 'DP2', 'coverage' are supported).
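To make the reductions concrete, here is a rough Python sketch of what several of the ops do to the list of values gathered from overlapping annotations. It is an illustration only: REDUCERS and reduce_values are hypothetical names, uniq is assumed to keep first-seen order, and self, flag and by_alt are left out because they depend on VCF header and allele handling.

```python
from statistics import mean

# Each op maps the list of overlapping annotation values to one result.
REDUCERS = {
    "first": lambda vals: vals[0],
    "count": len,
    "min": min,
    "max": max,
    "mean": mean,
    "sum": sum,
    "div2": lambda vals: vals[0] / vals[1],
    "concat": lambda vals: ",".join(str(v) for v in vals),
    # dict.fromkeys drops duplicates while keeping first-seen order (an assumption).
    "uniq": lambda vals: ",".join(dict.fromkeys(str(v) for v in vals)),
}


def reduce_values(op, vals):
    """Apply the named reduction to vals (hypothetical helper)."""
    return REDUCERS[op](vals)
```

With a single overlapping annotation every numeric reduction returns the same number, which is why mean and a Lua sum can agree in the example output above.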

PostAnnotation

One of the most powerful features of vcfanno is the embedded scripting language, lua, combined with postannotation. [[postannotation]] blocks occur after all the annotations have been applied. They are similar, but in the fields column, they request a number of columns from the query file (including the new columns added in annotation). For example if we have AC and AN columns indicating the alternate count and the number of chromosomes, respectively, we could create a new allele frequency column, AF, with this block:

[[postannotation]]
fields=["AC", "AN"]
op="lua:AC / AN"
name="AF"
type="Float"

where type is one of the types accepted in VCF format, name is the name of the field that is created, fields indicates the fields (from the INFO) that will be available to the op, and op indicates the action to perform. This can be quite powerful. For an extensive example that demonstrates the utility of this type of approach, see docs/examples/clinvar_exac.md.
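For readers who want to check the arithmetic outside vcfanno, the same calculation looks like this in Python. This is a sketch, not what vcfanno runs; allele_frequency is a hypothetical name and the guard against a zero or missing AN is an addition that the Lua one-liner above does not have.

```python
def allele_frequency(ac, an):
    """Return AC / AN, or None when AN is zero or missing.

    Hypothetical helper; the None guard is an assumption added here.
    """
    if not an:
        return None
    return ac / an
```

Using the example record shown earlier (AC=11, AN=32), this gives 0.34375, matching the AF already present in that INFO field.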

A user can set the ID field of the VCF in a [[postannotation]] block by using name=ID. For example:

[[postannotation]]
name="ID"
fields=["other_field", "ID"]
op="lua:other_field .. ';' .. ID"
type="String"

will take the value in other_field, concatenate it with the existing ID, and set the ID to that value.

See the setid function in example/custom.lua for a more robust method of doing this.

Additional Usage

-ends

For annotating large variants, such as CNVs or structural variants (SVs), it can be useful to annotate the ends of the variant in addition to the region itself. To do this, specify the -ends flag to vcfanno. e.g.:

vcfanno -ends example/conf.toml example/query.vcf.gz

In this case, the names field in the conf file contains "fitcons_mean". The output will contain fitcons_mean as before along with left_fitcons_mean and right_fitcons_mean for any variants that are longer than 1 base. The left end will be for the single base at the lowest-numbered base of the variant and the right end will be for the single base at the highest-numbered base of the variant.
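The naming and the two single-base intervals can be sketched in Python as follows. This is an illustration under stated assumptions: end_annotations is a hypothetical helper, and start and end are taken to be the 1-based, inclusive first and last bases of the variant.

```python
def end_annotations(name, start, end):
    """Map left_/right_ annotation names to single-base intervals.

    start and end are assumed to be 1-based, inclusive coordinates.
    Variants covering a single base get no extra annotations.
    """
    if end <= start:
        return {}
    return {
        f"left_{name}": (start, start),   # lowest-numbered base
        f"right_{name}": (end, end),      # highest-numbered base
    }
```

For a deletion spanning bases 100 to 250, this yields left_fitcons_mean over base 100 and right_fitcons_mean over base 250, in addition to the usual fitcons_mean over the whole region.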

-permissive-overlap

By default, when annotating with a variant, in addition to the overlap requirement, the variants must share the same position, the same reference allele and at least one alternate allele (this is only used for variants, not for BED/BAM annotations). If this flag is specified, only overlap testing is used and shared REF/ALT are not required.
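The difference between the default and permissive behaviour can be expressed as a small predicate. The Python below is a sketch of the rule as described here, not vcfanno's implementation; variants_match and the dict layout (start, end, ref, alts, with 1-based inclusive coordinates) are assumptions made for illustration.

```python
def variants_match(query, anno, permissive=False):
    """Decide whether an annotation variant applies to a query variant.

    Both arguments are dicts with start, end, ref and alts keys
    (hypothetical layout, 1-based inclusive coordinates).
    """
    overlaps = query["start"] <= anno["end"] and anno["start"] <= query["end"]
    if not overlaps:
        return False
    if permissive:
        # -permissive-overlap: overlap alone is enough.
        return True
    # Default: same position, same REF, and at least one shared ALT.
    return (
        query["start"] == anno["start"]
        and query["ref"] == anno["ref"]
        and bool(set(query["alts"]) & set(anno["alts"]))
    )
```

An annotation at the same position with a different ALT is therefore skipped by default but used when the flag is given.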

-p

Set to the number of processes that vcfanno can use during annotation. vcfanno parallelizes well up to 15 or so cores.

-lua

Custom ops (lua). For use when the built-in ops don't supply the needed reduction.

We embed the lua engine go-lua so that it's possible to create a custom op if the one you need is not provided. For example, if the user wants to sum the values, the op could be:

"lua:function sum(t) local sum = 0; for i=1,#t do sum = sum + t[i] end return sum end"

where the last value (in this case sum) is returned as the annotation value. It is encouraged to instead define lua functions in a separate .lua file and point to it when calling vcfanno using the -lua flag. So, in an external file, "some.lua", instead put:

function sum(t)
    local sum = 0
    for i=1,#t do
        sum = sum + t[i]
    end
    return sum
end

And then the above custom op would be: "lua:sum(vals)". (Note that vcfanno already provides a sum op, which will be faster.)

The variables vals, chrom, start, stop, ref, alt from the current variant will all be available in the lua code. alt will be a table with length equal to the number of alternate alleles. Example usage could be:

op="lua:ref .. '/' .. alt[1]"

See example/conf.toml and example/custom.lua for more examples.

Installation

Please download a static binary (executable) from here and copy it into a directory on your $PATH. There are no dependencies.

If you use bioconda, you can install with: conda install -c bioconda vcfanno

Multi-Allelics

A multi-allelic variant is simply a site where multiple non-reference alleles are seen in the population. These will appear as e.g. REF="A", ALT="G,C". As of version 0.2, vcfanno will handle these fully with op="self" when the Number from the VCF header is A (Number=A).

For example, this table lists the ALT columns of the query and the annotation (assuming the REFs and positions match) along with the values from the annotation, and shows how the query INFO will be filled:

query ALTS	anno ALTS	anno vals from INFO	result
C,G	C,G	22,23	22,23
C,G	C,T	22,23	22,.
C,G	T,G	22,23	.,23
G,C	C,G	22,23	23,22
C,G	C	YYY	YYY,.
G,C,T	C	YYY	.,YYY,.
C,T	G	YYY	.,.
T,C	C,T	AA,BB	BB,AA

Note the flipped values in the result column, and that values that are not present in the annotation are filled with '.' as a place-holder.
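The rule behind the table can be written in a few lines of Python: each query ALT looks up its own value among the annotation ALTs and falls back to '.' when it is absent. This is a sketch that reproduces the rows above rather than vcfanno's actual code, and fill_by_alt is a hypothetical name.

```python
def fill_by_alt(query_alts, anno_alts, anno_vals):
    """Return the Number=A result string for the query's ALT order.

    anno_vals[i] is the annotation value for anno_alts[i]; query ALTs
    missing from the annotation get '.' as a place-holder.
    """
    by_alt = dict(zip(anno_alts, anno_vals))
    return ",".join(by_alt.get(alt, ".") for alt in query_alts)
```

For instance, a query with ALTs G,C against an annotation with ALTs C,G and values 22,23 yields 23,22, which is the flipped row in the table.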

vcfanno's Issues

lua function on samples

Do you have any examples on how to call sample GT or PL format tags using a custom lua function? I'm looking for a way to annotate variants that are consistent with affection from a ped file. A binary flag or a super basic LOD score equivalent based on the genotype likelihoods is the plan.

Float type with value "-0" in annotation VCF file becomes "-" in annotated VCF file

In the ExAC database, variants with InbreedingCoeff=-0; become InbreedingCoeff=-; in the annotated VCF file. This causes later conversion to float to give an error message like "can't convert - to float"

use Number from source file when op == "self"

currently, we always use Number=1 except for Flag which has Number=0 by definition. When the user specifies op="self", we should propagate the Number from the source header definition to the annotated VCF.

vcf - annotate by column possible?

Hi,

thanks for the nice program and the nice easy conda installation.

I have a public VCF where I am interested in annotating my VCF using the public ID field (rsID).

public VCF
#CHROM POS ID REF ALT QUAL FILTER INFO
1 4441 rs199073068 C T . . dbSNP_138;TSA=SNV;VE=intergenic_variant
1 4785 rs199337820 A G . . dbSNP_138;TSA=SNV;VE=intergenic_variant

my toml (are columns 0 or 1 based? )

testconf.toml
[[annotation]]
file="/home/colin/seqres/RN/var/Rattus_norvegicus_incl_consequences.vcf.gz"
columns = [2]
names = ["ID"]
ops=["first"]

Command:
vcfanno testconf.toml ../combined.vcf > combined_annot.vcf

So I just want to lift over the rsids (to start with, maybe INFO fields later) into my VCF. Is this possible ?

Thanks,
Colin

Index out of range

I am trying to use vcfanno with kaviar database and I get 'panic: runtime error: index out of range'. Is this due to size of the database?

Here is the conf:

[[annotation]]
file="/PublicData/kaviar/Kaviar-150810-Public-hg19.vcf"
fields = ["AF"]
ops=["first"]

Here is the log:

vcfanno version 0.0.7 [built with go1.5beta3]

see: https://github.com/brentp/vcfanno

2015/11/02 10:41:20 found 1 sources from 1 files
panic: runtime error: index out of range

goroutine 5 [running]:
github.com/brentp/vcfgo.(*Variant).End(0xc8210c0960, 0x0)
/usr/local/src/gocode/src/github.com/brentp/vcfgo/variant.go:104 +0xc7b
github.com/brentp/irelate.CheckOverlapPrefix(0x2aaaab4f7000, 0xc8210c0960, 0x2aaaab4f7000, 0xc8210c0a50, 0x0)
/usr/local/src/gocode/src/github.com/brentp/irelate/irelate.go:96 +0x5b
github.com/brentp/irelate.IRelate.func1(0xc8200109c0, 0xbe5d38, 0xbe5d08, 0x0, 0xc820010a20)
/usr/local/src/gocode/src/github.com/brentp/irelate/irelate.go:184 +0x4c9
created by github.com/brentp/irelate.IRelate
/usr/local/src/gocode/src/github.com/brentp/irelate/irelate.go:217 +0xbf

goroutine 1 [chan receive]:
main.main()
/usr/local/src/gocode/src/github.com/brentp/vcfanno/vcfanno.go:131 +0x1385

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1696 +0x1

goroutine 21 [chan send]:
github.com/brentp/irelate.StreamVCF.func1(0xc8203099c0, 0xc8203687e0)
/usr/local/src/gocode/src/github.com/brentp/irelate/vcf.go:53 +0x10b
created by github.com/brentp/irelate.StreamVCF
/usr/local/src/gocode/src/github.com/brentp/irelate/vcf.go:61 +0x5d

goroutine 22 [runnable]:
github.com/brentp/vcfgo.(*Reader).Read(0xc820309c80, 0xc820368b40)
/usr/local/src/gocode/src/github.com/brentp/vcfgo/reader.go:187 +0x89b
github.com/brentp/irelate.StreamVCF.func1(0xc820309c80, 0xc820368b40)
/usr/local/src/gocode/src/github.com/brentp/irelate/vcf.go:49 +0x2e
created by github.com/brentp/irelate.StreamVCF
/usr/local/src/gocode/src/github.com/brentp/irelate/vcf.go:61 +0x5d

goroutine 3 [chan receive]:
github.com/brentp/vcfanno/api.(*Annotator).Annotate.func1(0xc8201f5ab0, 0x2, 0x2, 0xc820010960, 0xc820178e40, 0x0, 0x0)
/usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:421 +0x9f
created by github.com/brentp/vcfanno/api.(*Annotator).Annotate
/usr/local/src/gocode/src/github.com/brentp/vcfanno/api/api.go:426 +0xb2

goroutine 4 [runnable]:
github.com/brentp/irelate.Merge.func1(0xc82000e480, 0xc8200109c0, 0xc82000a1b0, 0x0, 0xc8201f5ab0, 0x2, 0x2, 0x0)
/usr/local/src/gocode/src/github.com/brentp/irelate/irelate.go:247 +0x1ba
created by github.com/brentp/irelate.Merge
/usr/local/src/gocode/src/github.com/brentp/irelate/irelate.go:282 +0x3b2

CSQ tag

Hi,

I am trying to run vcfanno (v 0.0.11) against multiple VCF files using a query VCF coming from VEP annotation (Variant Effect Predictor). In the VEP output VCF file the CSQ tag is the only INFO tag, which is a String of potentially multiple values, in which each element contains fields separated by the pipe '|'. e.g. "CSQ= T|stop_gained|HIGH|ISG15|ENSG00000187608||||||,T|upstream_gene_variant|MODIFIER||||"

When I am running vcfanno using the VEP VCF file as query, the output VCF has the pipe separators of the CSQ tag converted into commas, which is quite inconvenient

e.g. after vcfanno: "CSQ= T,stop_gained,HIGH,ISG15,ENSG00000187608,,,,,,T,upstream_gene_variant,MODIFIER,,,,,"

I have made a work-around to get it working, but thought I'd check if you have any ideas about the reason for why it alters the query in this manner.

best,
Sigve

Does VCF anno strip FORMAT and GT fields after running?

I ran vcfanno on NA12878 for a small region. The FORMAT and GT fields are stripped in the result file. Is this the desired behavior of the software? Here is before decoration:

chr22   42128241        .       CT      C       0       PASS    metal=platinum;isaac2=HD:0,LOOHD:0;bwa_freebayes=HD:0,LOOHD:0;bwa_platypus=HD:0,LOOHD:0;bwa_gatk3=HD:0,LOOHD:0;dist2closest=452 GT      0|1

Here is after:

chr22   42128241        .       CT      C       0.0     PASS    metal=platinum;isaac2=HD:0,LOOHD:0;bwa_freebayes=HD:0,LOOHD:0;bwa_platypus=HD:0,LOOHD:0;bwa_gatk3=HD:0,LOOHD:0;dist2closest=452;RSID=rs35742686;CYPALLELE=*3;HGVSc=NM_000106.5:c.2549delA

New release

Hi Brent,

Sorry to be that guy but any shot at a new release? We have some code using the new REF/ALT stuff but right now are grabbing the develop version by hand.

annotate with multisample vcf

Hi Brent,

I'm looking to annotate an unannotated vcf file with allele frequencies from an existing multisample vcf.
Is this something that's possible with vcfanno (I suppose so).

Would you be able to help with the configuration? I can't quite wrap my head around how to calculate the allele frequency from the multisample vcf and add it to the new vcf.

EDIT: actually, I'd just like to add a field that contains in how many samples of the multisample vcf a certain variant is found.
Thanks
M

Problem with VCF file converted from BED

I have a bed file and converted this into a fake VCF file like this

fileformat=VCFv4.0

CHROM POS REF ALT INFO

1 762320 T C AC=10
1 783071 T C AC=10
1 879400 T C AC=10

and now I am trying to use VCFAnno on this and getting error as follows.

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT

panic: runtime error: slice bounds out of range

goroutine 53 [running]:
panic(0x80c7e0, 0xc420014140)
/usr/local/src/go-git/src/runtime/panic.go:500 +0x1a1
github.com/brentp/vcfgo.makeFields(0xc4218c3600, 0x12, 0x13, 0x13, 0x13, 0x0)
/usr/local/src/gocode/src/github.com/brentp/vcfgo/reader.go:156 +0x229
github.com/brentp/vcfgo.(*Reader).Read(0xc420502bc0, 0x0)
/usr/local/src/gocode/src/github.com/brentp/vcfgo/reader.go:184 +0x107
github.com/brentp/irelate/parsers.vWrapper.Next(0xc420502bc0, 0x0, 0xfa0, 0xc424376000, 0x0)
/usr/local/src/gocode/src/github.com/brentp/irelate/parsers/vcf.go:71 +0x2f
github.com/brentp/irelate.PIRelate.func4(0x1f40, 0xa59ca0, 0xc420502bc0, 0x8b0e00, 0x4e20, 0xc421ca8240, 0xc420502c80, 0x4, 0x4, 0xc420067900, ...)
/usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:283 +0x14a
created by github.com/brentp/irelate.PIRelate
/usr/local/src/gocode/src/github.com/brentp/irelate/parallel.go:346 +0x252

I am not sure why this is throwing this error. Can you please help on this.

fault tolerance

I'm debating a single run per annotation vs multiple. Is vcfanno fault tolerant to failed steps i.e. for some reason an annotation was updated and it changed a tag name.

Error in cadd2vcf.py: UnboundLocalError: local variable 'header' referenced before assignment

I tried running cadd2vcf.py on whole_genome_SNVs.tsv.compressed.gz that comes with the gemini installation, but I am getting an error:

python cadd2vcf.py whole_genome_SNVs.tsv.compressed.gz | bgzip -c > cadd_v1.3.vcf.gz

Traceback (most recent call last):
  File "cadd2vcf.py", line 36, in <module>
    main(a.precision, a.path)
  File "cadd2vcf.py", line 20, in main
    d = dict(zip(header, line.rstrip().split("\t")))
UnboundLocalError: local variable 'header' referenced before assignment

P.S: I had to fix print to print( as well.

REF/ALTs

Hi Brent,

Is there a way I can access the REF and ALT fields from the variant that is going to be updated in the lua scripts? I was hoping to annotate with a BED file by position but then make some decisions about whether or not to go ahead with the annotation based on REF/ALT information, namely is it a SNP and is it a specific mutation.

Write into the ID column

Hi Brent,

Thank you for vcfanno, it's really great to be able to delegate nearly all VCF post-processing job to one fast tool.

I wonder if there is a way to put an annotation value into the ID column in VCF. I used to add the rsIDs and COSMIC ids into the ID column, and I don't seem to find a way to configure vcfanno to do that instead of appending to INFO.

As a quick solution, I could just utilise bcftools to remap the INFO field over to ID. But it would be much better to keep all annotation within one vcfanno call with a proper config file.

What do you think about this?

vcfanno fails to annotate variant(s) in my VCF present in ExAC VCF

I am getting some strange behavior with vcfanno (the vcfanno_0.0.3_linux_386 executable). It's not annotating variants in my VCF that are present in the VCF in my conf.toml file.

I have several variants I pulled from Clinvar via NCBI eUtils API and put in VCF format. I wanted to see if they were segregating in ExAC. I normalized the ExAC VCF with vt and my VCF which I want to annotate with information from ExAC VCF.

My toml.conf file looks like:

file="/home/aindap/data/ExAC/ExAC.r0.3.sites.vep.vcf.normalized.vcf.gz"
fields = ["AF"]
ops=["first"]

Then I ran vcfanno:

$HOME/software/vcfanno_0.0.3_linux_386/vcfanno conf.toml my.variants.20150527.normalized.vcf > my.normalized.exac_maf.vcf

There are several variants that I expected to be annotated with AF info from ExAC, but the resulting vcf has several records that are in ExAC, but not decorated with AF info. For example I had this record in my vcf:

#fileformat=VCFv4.0
##fileDate=20150505
##reference=GRCh37
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=HSCHR6_MHC_SSTO,length=4789211>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=dbSNP,Number=1,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=OMIM,Number=1,Type=String,Description="OMIM id">
##INFO=<ID=CLINSIG,Number=.,Type=String,Description="Variant Clinical Significance,ftp://ftp.ncbi.nlm.nih.gov/pub/GTR/standard_terms/Clinical_significance.txt">
##INFO=<ID=dbSNP,Number=1,Type=String,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=TRAIT,Number=1,Type=String,Description="disease trait">
##INFO=<ID=VT,Number=1,Type=String,Description="Variation Class">
##INFO=<ID=GENE,Number=1,Type=String,Description="Gene symbol(s) comma-delimited">
4       5577974 c.3025C>T       G       A       .       .       OMIM=607261.0007;dbSNP=rs137852927;GENE=EVC2;CLINSIG=Pathogenic;TRAIT="Chondroectodermaldysplasia";VT=singlenucleotidevariant

But it's clearly in the ExAC vcf listed in my conf.toml file:

tabix /home/aindap/data/ExAC/ExAC.r0.3.sites.vep.vcf.normalized.vcf.gz 4:5577974-5577974

4   5577974 rs137852927 G   A   17524.2 PASS    AC=10;AC_AFR=0;AC_AMR=0;AC_Adj=10;AC_EAS=0;AC_FIN=0;AC_Het=10;AC_Hom=0;AC_NFE=10;AC_OTH=0;AC_SAS=0;AF=8.236e-05;AN=121412;AN_AFR=10404;AN_AMR=11578;AN_Adj=121400;AN_EAS=8654;AN_FIN=6614;AN_NFE=66730;AN_OTH=908;AN_SAS=16512;BaseQRankSum=-4.686;ClippingRankSum=-0.016;DB;DP=2164728;FS=0.591;GQ_MEAN=73.01;GQ_STDDEV=27.38;Het_AFR=0;Het_AMR=0;Het_EAS=0;Het_FIN=0;Het_NFE=10;Het_OTH=0;Het_SAS=0;Hom_AFR=0;Hom_AMR=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;InbreedingCoeff=-0.0001;MQ=59.68;MQ0=0;MQRankSum=0.596;NCC=0;QD=11.36;ReadPosRankSum=0.625;VQSLOD=2.42;culprit=MQRankSum;DP_HIST=0|6|6|2|7536|25144|13164|2809|1304|1202|1280|1383|1370|1208|1062|872|659|456|326|917,0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|9;GQ_HIST=0|0|0|0|7|2|3|1|1|2|1|1|34346|9449|2160|1768|751|272|350|11592,0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|10;CSQ=A|ENSG00000173040|ENST00000475313|Transcript|stop_gained&NMD_transcript_variant|3757|3025|1009|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|nonsense_mediated_decay|||ENSP00000431981|LBN_HUMAN|Q4W5A4_HUMAN|UPI000000D836|||18/23|||ENST00000475313.1:c.3025C>T|ENSP00000431981.1:p.Gln1009Ter||||||A:0|A:0.000233|pathogenic||||||||||,A|ENSG00000173040|ENST00000344938|Transcript|stop_gained|3319|3265|1089|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|protein_coding|||ENSP00000339954|LBN_HUMAN|Q4W5A4_HUMAN|UPI000021C4F1|||18/22|||ENST00000344938.1:c.3265C>T|ENSP00000339954.1:p.Gln1089Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.872761293771719&ANN_ORF:539.654&MAX_ORF:539.654|||HC,A|ENSG00000173040|ENST00000509670|Transcript|3_prime_UTR_variant&NMD_transcript_variant|3358|||||rs137852927&CM024156|1||-1|EVC2|HGNC|19747|nonsense_mediated_decay|||ENSP00000423876||Q4W5A4_HUMAN&E9PFT2_HUMAN|UPI0001D3B0F7|||19/23|||ENST00000509670.1:c.*1658C>T|||||||A:0|A:0.000233|pathogenic||||||||||,A|ENSG00000173040|ENST00000310917|Transcript|stop_gained|3757|3025|1009|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|E
VC2|HGNC|19747|protein_coding||CCDS54718.1|ENSP00000311683|LBN_HUMAN|Q4W5B1_HUMAN&Q4W5A4_HUMAN|UPI000006DE35|||18/22|||ENST00000310917.2:c.3025C>T|ENSP00000311683.2:p.Gln1009Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.820450230539734&ANN_ORF:539.654&MAX_ORF:539.654|||HC,A|ENSG00000173040|ENST00000344408|Transcript|stop_gained|3319|3265|1089|Q/*|Cag/Tag|rs137852927&CM024156|1||-1|EVC2|HGNC|19747|protein_coding|YES|CCDS3382.2|ENSP00000342144|LBN_HUMAN|Q4W5B1_HUMAN&Q4W5A4_HUMAN|UPI00001910B5|||18/22|||ENST00000344408.5:c.3265C>T|ENSP00000342144.5:p.Gln1089Ter||||||A:0|A:0.000233|pathogenic|||||||POSITION:0.831423478482302&ANN_ORF:539.654&MAX_ORF:539.654|||HC```

The program isn't performing as I expected. Do you have any debugging suggestions? I would really like to incorporate vcfanno in my pipeline. Thanks!

Type mismatch for 'shared.Annotation.Columns': Expected integer but found 'string'.

I am using vcfanno v0.0.11 and the cosmic annotation that comes with gemini installation v0.18.3:

I have the snpEff annotated .gvcf file here:
https://drive.google.com/open?id=0B-8gQV1WZcYdR0J1TWVkYS1QMVk

command:

vcfanno -p 4 -base-path /path/to/gemini_data -lua vcfanno.lua vcfanno.conf sample_new.snpeff.gvcf | bgzip -c > sample_new.snpeff.gvcf.anno.gz

vcfanno.conf:

[[annotation]]
file = "cosmic-v68-GRCh37.tidy.vcf.gz"
columns = ["ID", "AA", "CDS", "GENE", "STRAND"]
names = ["cosmic_id", "cosmic_aa", "cosmic_cds", "cosmic_gene", "cosmic_strand"]
ops = ["self", "self", "self", "self", "self"]

cosmic:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       69345   COSM911918      C       A       .       .       AA=p.I85I;CDS=c.255C>A;CNT=1;GENE=OR4F5;STRAND=+
1       69523   COSM426644      G       T       .       .       AA=p.G145C;CDS=c.433G>T;CNT=1;GENE=OR4F5;STRAND=+
1       69538   COSM75742       G       A       .       .       AA=p.V150M;CDS=c.448G>A;CNT=1;GENE=OR4F5;STRAND=+
1       69539   COSM1343690     T       C       .       .       AA=p.V150A;CDS=c.449T>C;CNT=1;GENE=OR4F5;STRAND=+

vcfanno.lua (although I don't really need the lua):

function divfunc(a, b)
    if(a == 0) then 
        return 0.0
    else 
        return string.format("%.9f", a / b)
    end
end

Error:

=============================================
vcfanno version 0.0.11-beta [built with devel +5ec87ba Thu Apr 28 15:36:34 2016 +0000]

see: https://github.com/brentp/vcfanno
=============================================
panic: Type mismatch for 'shared.Config.Annotation': Type mismatch for 'shared.Annotation.Columns': Expected integer but found 'string'.

goroutine 1 [running]:
panic(0x7c1340, 0xc8201621f0)
    /usr/local/src/go-git/src/runtime/panic.go:500 +0x18c
main.main()
    /usr/local/src/gocode/src/github.com/brentp/vcfanno/vcfanno.go:81 +0x17f5

If you have encountered an error, please include:

minimal conf and lua files that you are using.
urls or actual files for annotations in conf file.
minimal query file.
the command you used to invoke vcfanno
the full error message

profile blocking

There are a lot of moving parts in the parallelism. Would be good to see where things are blocking to know where the bottlenecks are.
https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs

Append to existing INFO tag

Hi,

Is there an operation that allows you to append an annotation to an already existing tag in your query VCF. Let's say you have an ID tag in your query VCF, and there exists an ID tag in the annotation VCF also. Currently, it appears as if the query VCF tag is overwritten if it appears in the annotation VCF. Is there functionality to append/aggregate tag values if an existing tag is already present in your query?

best,
Sigve

"go build" with freshly cloned repo: irelate go package error

Running go 1.5.3 and 1.6 I'm getting the following errors when trying to build/install what's in master for vcfanno:

mba-2:vcfanno romanvg$ go build -o vcfanno
# github.com/brentp/irelate/parsers
../../go/src/github.com/brentp/irelate/parsers/bam.go:196: assignment count mismatch: 2 = 1
../../go/src/github.com/brentp/irelate/parsers/bam.go:198: undefined: index.ErrInvalid

mba-2:vcfanno romanvg$ go get github.com/brentp/irelate
# github.com/brentp/irelate/parsers
../../go/src/github.com/brentp/irelate/parsers/bam.go:196: assignment count mismatch: 2 = 1
../../go/src/github.com/brentp/irelate/parsers/bam.go:198: undefined: index.ErrInvalid

INFO fields with multiple values

Hi,

I just played with your tool, great work:) Looking at the result from a test I did, annotating ~ 100,000 variants against 6-7 other VCF files, there were a few things that caught my attention:

If my annotation file had an INFO field with multiple values (i.e. "Number=.", in which multiple values are being comma-separated for each variant), I could not figure out which operation was best to retrieve the complete set of values. I tried 'uniq' and 'concat', but either way it seems vcfanno concatenates the values with the pipe operator ('|'). Would it be possible to get the identical comma-separated as is present in the annotation VCF file?
Would you consider adding the meta-information lines concerning the INFO fields of interest that you specify in the configuration file in the result VCF?

api.go:655: tabix: name count mismatch:

I have been struggling to annotate my .vcf file with a .bed file, which has a lot of HLA and other "variable" chromosomes, like 'chr1_KI270713v1_random'. I was not able to sort this .bed file with the sortBed program and index it afterwards, so I used Unix sort commands, and after that I was able to run tabix to get a .tbi. Still, vcfanno throws an error.

Command:

vcfanno test.toml ../vcf/1234567_1_l_r_m_c_lib_g-vardict.vcf > ../../temp/test_annotation.vcf

Configuration test.toml file

[[annotation]]
file="./sv_manual_sort.bed.gz"
columns=[4]
names=["problematic_region"]
ops=["uniq"]

Error:

=============================================
vcfanno version 0.2.3 [built with go1.8]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:114: found 1 sources from 1 files
api.go:655: tabix: name count mismatch: 48153 != 3365

I have attached .bed and .tbi files
attachment.zip

Collecting GRCh38 annotations

Hi,

As far as I understand, if I were to use GRCh38 reference for my VCF, the corresponding hg38 annotations would have to be downloaded and probably saved in the /path/to/gemini/data/gemini_data/ which is path to all default hg19 annotations. Then I would use it with vcfanno, like this:

# download hg38 clinvar annotation
export GEMINI_ANNO=/path/to/gemini_data/hg38_annotations
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar_20160531.vcf.gz -O $GEMINI_ANNO/clinvar_20160531.vcf.gz

# run vcfanno
export VCF=/path/to/snpeff-annotated-sample.vcf
export lua=/path/to/test.lua
export conf=/path/to/conf
vcfanno -p 4 -base-path $GEMINI_ANNO -lua $lua $conf $VCF | bgzip -c > anno.vcf.gz

Is this correct or is there any other (faster & better) way the annotations can be collected and used in vcfanno? When I google, I keep getting references to Annovar but I don't know how to go about it. To get the hg38 clinvar annotation, I tried the command below but I am kind of stuck beyond this:

gemini_conda install -c ggd-alpha hg38-clinvar=20150330.2

This installed a package in /path/to/gemini/data/anaconda. This location is different than the location of default hg19 annotations i.e. /path/to/gemini/data/gemini_data.

# this is the directory structure of the package:
hg38-clinvar-20150330.2-2
    ├── bin
    └── info
        ├── files
        ├── index.json
        ├── recipe
        └── recipe.json

Somewhat similar to this discussion, I noticed that the default hg19 annotations which are obtained with gemini installation have names such as clinvar_20160203.tidy.vcf.gz, cosmic-v68-GRCh37.tidy.vcf.gz and ExAC.r0.3.sites.vep.tidy.vcf.gz. Does tidy mean they have been processed in some way and that the user has to do the same when they download annotations other than hg19?

Thank you for your patience in responding to my questions!

Annotations from Genotype fields

Hey Brent,

I've been testing this out recently and I really love how fast it adds the annotations. I have a particular use case, and it may be something you don't want to support, but I thought I would float it because it could be more generally useful. I've moved my day to day work from Mendelian disease to cancer, and one thing I'm doing a lot of is using multiple variant callers to call variants in our tumour-only samples. I then merge calls and add annotations, but when evaluating variants I want to evaluate things like depth and other QC metrics on a caller-by-caller basis. Right now I get the required info from the original normalized caller-produced VCF using cyvcf2. Most callers stick everything I need in the INFO string fields, but a few only put DP, AD, etc in the genotype field. It's a really specific use case, because you need to generate vcfanno conf files for each sample (although this is easy to script in an automated workflow) but I thought I would suggest it. It seems like vcfanno, if it could parse out those fields, would do my entire annotation workflow much quicker and in fewer overall steps than my current workflow.
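
For reference, the per-sample values described above sit in the FORMAT and sample columns of each VCF record. Below is a minimal Python sketch of pulling DP and AD out of one record without any library. The function name `format_values` and the example record are hypothetical, and this is not how vcfanno or cyvcf2 do it internally.

```python
def format_values(vcf_line, keys=("DP", "AD")):
    """Return {sample_index: {key: value}} for the requested FORMAT keys."""
    cols = vcf_line.rstrip("\n").split("\t")
    fmt = cols[8].split(":")              # FORMAT column, e.g. GT:AD:DP
    out = {}
    for i, sample in enumerate(cols[9:]):
        vals = dict(zip(fmt, sample.split(":")))
        # keys missing from FORMAT come back as None
        out[i] = {k: vals.get(k) for k in keys}
    return out

# made-up two-sample record, for illustration only
record = "chr1\t17385\t.\tG\tA\t183.4\tPASS\tDP=23\tGT:AD:DP\t1|0:9,0:9\t1|1:0,3:3"
print(format_values(record))
```

Values are returned as strings; converting AD into a list of integers is left to the caller.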

provide some method to look at CIPOS and CIEND for SVS

currently, annotations are by the fixed start and end for structural variants. There may be some value in doing something with the confidence intervals as more folks (hopefully) start to use those fields.

Slow annotation with CADD

[[annotation]]
file="/home/mcgaugheyd/CADD/whole_genome_SNVs.tsv.gz"
columns=[5,6]
names=["cadd_raw","cadd_phred"]
ops=["mean","mean"]

This was added to your https://github.com/brentp/vcfanno/blob/master/example/gem.conf. I'm also using your https://github.com/brentp/vcfanno/blob/master/example/custom.lua lua files.

Ended up with 64 annotations across 20 files.

vcfanno_0.1.1_dev -p 10 -lua custom.lua gem.conf  CCGO.b37.bwa-mem.hardFilterSNP-INDEL.VEP.GRCh37.VEP.GRCh37.old.vcf.gz| bgzip > more_anno4.vcf.gz

Commenting out the CADD section increases annotation speed from ~80 to ~1500 variants/second.

(These are exomes)

match ALT in bed/tsv files

Can ALTs be matched when using some form of non-VCF annotation? I'm typically annotating VCFs with BED/TSV files at the variant level.

bam: add op to count forward and reverse reads

It would be nice to have DP4 (forward and reverse REF and ALT counts), but checking whether a read is REF or ALT at a site might be outside the wheelhouse of vcfanno.

Adding ref/alt counts is straight-forward and will go into the next release via:

field = DP2

add DP4

it would be nice to allow op="dp4" for bam annotation files to check strand bias. We can do this with bigly
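
To make the requested counts concrete, here is a rough Python sketch of the DP4 tally. It assumes the four values are ordered ref-forward, ref-reverse, alt-forward, alt-reverse (the samtools/bcftools convention), and that the per-read base and strand at the site have already been extracted from the BAM. The function name `dp4` and the input shape are hypothetical; this is not vcfanno or bigly code.

```python
def dp4(observations, ref, alt):
    """Count [ref-forward, ref-reverse, alt-forward, alt-reverse].

    `observations` is an iterable of (base, is_reverse) pairs, one per read
    covering the site; bases matching neither allele are ignored.
    """
    counts = [0, 0, 0, 0]
    for base, is_reverse in observations:
        if base == ref:
            counts[0 + int(is_reverse)] += 1
        elif base == alt:
            counts[2 + int(is_reverse)] += 1
    return counts

# made-up pileup at a single SNV site
reads = [("G", False), ("G", True), ("A", False), ("A", False), ("T", True)]
print(dp4(reads, ref="G", alt="A"))
```

This only covers single-base substitutions; indels would need the read alignment itself rather than a single base.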

function in lua file not working

Hi,

I created the following .lua and config files from your examples:

test.lua

function divfunc(a, b)
    if(a == 0) then 
        return 0.0
    else 
        return string.format("%.9f", a / b)
    end
end

test.conf

[[annotation]]
file="ExAC.r0.3.sites.vep.tidy.vcf.gz"
fields = ["AC_Adj", "AN_Adj"]
names = ["exac_ac_all", "exac_an_all"]
ops=["self", "self"]

[[postannotation]]
fields=["exac_ac_all", "exac_an_all"]
name="exac_af_all"
op="lua:divfunc(exac_ac_all/exac_an_all)"
type="Float"

This is my command:

$ vcfanno -p 4 -base-path $GEMINI_ANNO -lua test.lua test.conf $VCF | bgzip -c > anno.vcf.gz

I am getting the following errors and in the output anno.vcf.gz there is no field corresponding to exac_af_all:

=============================================
vcfanno version 0.0.11-beta [built with devel +5ec87ba Thu Apr 28 15:36:34 2016 +0000]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:111: found 2 sources from 1 files
api.go:442: lua error in postannotation exac_af_all <string>:1: cannot perform div operation between table and number
stack traceback:
    <string>:1: in main chunk
    [G]: ?
vcfanno.go:156: Info Error: exac_an_all not found in row >> this error may occur many times. reporting once here...
api.go:442: lua error in postannotation exac_af_all <string>:1: cannot perform div operation between table and number
stack traceback:
    <string>:1: in main chunk
    [G]: ?

This is repeated several times until the output is finally created.

Output:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  733434  733459  733482
chr1    17385   .       G       A       183.4   PASS    AC=3;AF=0.5;AN=6;BaseQRankSum=1.45;ClippingRankSum=-1.036;DP=23;FS=0;MLEAC=4;MLEAF=0.667;MQ=48.28;MQRankSum=-1.036;QD=14.11;ReadPosRankSum=0.633;SOR=0.836;EFF=INTRON(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|5|A),EXON(MODIFIER|||||MIR6859-1|miRNA|NON_CODING|ENST00000619216|1|A);exac_ac_all=1327;exac_an_all=5408    GT:AD:DP:GQ:PL:TP       1|0:9,0:9:0:0,0,230:6   1|1:0,3:3:9:106,9,0:6   1|0:6,4:10:99:107,0,165:6

However, it works alright when I change the test.conf to:

[[annotation]]
file="ExAC.r0.3.sites.vep.tidy.vcf.gz"
fields = ["AC_Adj", "AN_Adj"]
names = ["exac_ac_all", "exac_an_all"]
ops=["self", "self"]

[[postannotation]]
fields=["exac_ac_all", "exac_an_all"]
name="exac_af_all"
op="div2"
type="Float"

Output:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  733434  733459  733482
chr1    17385   .       G       A       183.4   PASS    AC=3;AF=0.5;AN=6;BaseQRankSum=1.45;ClippingRankSum=-1.036;DP=23;FS=0;MLEAC=4;MLEAF=0.667;MQ=48.28;MQRankSum=-1.036;QD=14.11;ReadPosRankSum=0.633;SOR=0.836;EFF=INTRON(MODIFIER|||||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147|5|A),EXON(MODIFIER|||||MIR6859-1|miRNA|NON_CODING|ENST00000619216|1|A);exac_ac_all=1327;exac_an_all=5408;exac_af_all=0.2454 GT:AD:DP:GQ:PL:TP       1|0:9,0:9:0:0,0,230:6   1|1:0,3:3:9:106,9,0:6   1|0:6,4:10:99:107,0,165:6

I am sure I am missing something but I just don't see why it shouldn't work with the function in test.lua?
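
The traceback reports a division between a table and a number. That suggests the expression `exac_ac_all/exac_an_all` is evaluated before `divfunc` is ever called, and that `exac_ac_all` arrives as a multi-valued field. Below is a hedged sketch of the logic the helper would need, written in Python rather than Lua purely for illustration: it takes the two values as separate arguments and unwraps a list before dividing. Using the first element is an assumption, as is the extra guard against a zero denominator.

```python
def divfunc(a, b):
    """Divide a by b, unwrapping list-valued inputs first."""
    if isinstance(a, (list, tuple)):
        a = a[0]                  # assumption: use the first allele's count
    if isinstance(b, (list, tuple)):
        b = b[0]
    if a == 0 or b == 0:          # also avoids dividing by zero
        return 0.0
    return round(a / b, 9)

# values taken from the output record above
print(divfunc([1327], 5408))
```

In the conf, the corresponding call would presumably pass two comma-separated arguments, e.g. `divfunc(exac_ac_all, exac_an_all)`, instead of a single pre-divided expression.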

Closest

Hello,

I wanted to ask if there is a way with vcfanno to annotate a variant with its proximity to a certain feature. Specifically, I would like to know if a variant is close to an intron/exon boundary. The input would be a BED file with the ranges of all exons, and the output of the annotation would be the distance from the nearest boundary for each variant.

There is something similar in bedtools: http://bedtools.readthedocs.org/en/latest/content/tools/closest.html

Thanks
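
As an illustration of the calculation being asked for (not a vcfanno feature), here is a small Python sketch that finds the distance from a position to the nearest exon start or end on one chromosome. The function name `boundary_distance` and the coordinates are hypothetical, and the sketch ignores the difference between 0-based BED and 1-based VCF coordinates.

```python
import bisect

def boundary_distance(pos, exons):
    """Distance from pos to the nearest exon start or end.

    `exons` is a non-empty list of (start, end) tuples on one chromosome.
    """
    boundaries = sorted({b for exon in exons for b in exon})
    i = bisect.bisect_left(boundaries, pos)
    # the nearest boundary is either just before or at/after pos
    candidates = boundaries[max(i - 1, 0):i + 1]
    return min(abs(pos - b) for b in candidates)

exons = [(100, 200), (500, 650)]      # made-up exon ranges
print(boundary_distance(210, exons))  # 10 bases past the first exon's end
print(boundary_distance(480, exons))  # 20 bases before the second exon's start
```

Sorting the boundaries once and using a binary search keeps each lookup logarithmic in the number of exons.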

'Index out of range' does not point to exact annotation

vcfanno version 0.2.3 [built with go1.8]
When an [[annotation]] block has a wrong column index, for example:

[[annotation]]
file="data/radar/radar_hg38_v2_sorted.bed.gz"
columns=[5]
names=["radar"]
ops=["uniq"]

where 5 is more than the number of columns in my .bed file, the error message is:

panic: runtime error: index out of range

goroutine 75 [running]:
github.com/brentp/vcfanno/api.collect(0x7f2faa27e468, 0xc4201ddb90, 0xc422339160, 0x1, 0x1, 0xc420018a00, 0x1, 0x0, 0x1, 0x115a3c24b66, ...)
	/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:309 +0x171c
github.com/brentp/vcfanno/api.(*Annotator).AnnotateOne(0xc42001acc0, 0x92d4e0, 0xc4201ddb90, 0x3a2c01, 0x0, 0x0, 0x0, 0x0, 0x0)
	/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:399 +0x1ed
github.com/brentp/vcfanno/api.(*Annotator).AnnotateEnds(0xc42001acc0, 0x92d4e0, 0xc4201ddb90, 0x0, 0x0, 0x0, 0x0)
	/home/brentp/go/src/github.com/brentp/vcfanno/api/api.go:734 +0xdda
main.main.func1(0x92d4e0, 0xc4201ddb90)
	/home/brentp/go/src/github.com/brentp/vcfanno/vcfanno.go:154 +0x71
github.com/brentp/irelate.PIRelate.func1.1(0xc42020ef60, 0xc4228c4c80, 0x55, 0x190, 0xc422cb9680)
	/home/brentp/go/src/github.com/brentp/irelate/parallel.go:202 +0x5f
created by github.com/brentp/irelate.PIRelate.func1
	/home/brentp/go/src/github.com/brentp/irelate/parallel.go:207 +0x89

The message does not point to the annotation block that needs to be corrected or removed. Naming the block would be really helpful when there are many annotation blocks to review.
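
A sketch of the kind of up-front check that would produce such a message is shown below, in Python for illustration. It assumes the conf has already been parsed into a list of dicts and that the column count of each file is known. `check_columns`, the dict layout, and the column count of 4 are all hypothetical, not part of vcfanno.

```python
def check_columns(blocks, field_counts):
    """Return one message per annotation column index that is out of range.

    `blocks` is a list of dicts shaped like [[annotation]] entries;
    `field_counts` maps each file to the number of columns it has.
    """
    problems = []
    for n, block in enumerate(blocks, start=1):
        ncols = field_counts[block["file"]]
        for col in block.get("columns", []):
            if col < 1 or col > ncols:    # BED columns are 1-based
                problems.append(
                    f"annotation #{n} ({block['file']}): column {col} "
                    f"out of range, file has {ncols} columns"
                )
    return problems

radar = "data/radar/radar_hg38_v2_sorted.bed.gz"
blocks = [{"file": radar, "columns": [5], "names": ["radar"], "ops": ["uniq"]}]
# assumed column count for the example file
print(check_columns(blocks, {radar: 4}))
```

Reporting the block number and file name together makes the offending entry easy to find in a long conf.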

Multiallelic sites

Hi,

I have multiallelic input that I'd like annotated from data that will have multiple alleles per position. Obviously, I'd like the output to provide the correct values for only my input alleles, returned in the correct order for analysis. I see you mention that the ability for vcfanno to handle this type of scenario is forthcoming, but suggest decomposing the inputs for now. However, my downstream application will require those sites to be recomposed before continuing. First, I'd like to make a formal request for this functionality to be completed. In the meantime, do you have any suggestions for decomposing, running vcfanno, then recomposing to get the same result as annotating multiallelic sites?

Thanks,
Scott
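
One way to think about the decompose / annotate / recompose round trip is to carry the original ALT index through the pipeline so that per-allele values can be put back in order. The Python sketch below shows only that bookkeeping. The tuple layout, function names and annotation values are hypothetical, and real decomposition tools also trim alleles and split genotype fields, which this ignores.

```python
from collections import defaultdict

def decompose(record):
    """Split one multiallelic record into biallelic ones, keeping the ALT index."""
    chrom, pos, ref, alts = record
    return [(chrom, pos, ref, alt, i) for i, alt in enumerate(alts)]

def recompose(annotated):
    """Rejoin biallelic records; per-allele values come back in ALT order."""
    grouped = defaultdict(list)
    for chrom, pos, ref, alt, i, value in annotated:
        grouped[(chrom, pos, ref)].append((i, alt, value))
    out = []
    for (chrom, pos, ref), items in grouped.items():
        items.sort()                  # restore the original ALT order
        alts = [alt for _, alt, _ in items]
        values = [value for _, _, value in items]
        out.append((chrom, pos, ref, alts, values))
    return out

parts = decompose(("chr1", 100, "G", ["A", "T"]))
# stand-in for the annotation step: attach a made-up value to each allele
annotated = [p + (v,) for p, v in zip(parts, [0.01, 0.20])]
print(recompose(annotated[::-1]))     # input order is deliberately reversed
```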

Not handling identical start/end coordinates

If you have encountered an error, please include:

minimal conf and lua files that you are using.
urls or actual files for annotations in conf file.
minimal query file.
the command you used to invoke vcfanno
the full error message

Hi Brent,

vcfanno doesn't overlap coordinates if the start/end coordinates are identical, but bedtools does.
I put up an example here: https://dl.dropboxusercontent.com/u/2822886/start-end-example.tar

Is bedtools or vcfanno doing it right?

-ends only subsets to intervals that were overlapping the original interval

Given an interval of e.g. 1234, 5678, and CIPOS=-1000,10, the current version doesn't grab from 234 to 5778.

This is a relatively major bug. It will probably require using the new tabix machinery to address it.
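
The intended windows can be written out explicitly. This is a minimal Python sketch of the arithmetic only, not of the `-ends` implementation. CIPOS comes from the example above, while the CIEND value of (-10, 100) is an assumption chosen so that the right-hand window ends at 5778.

```python
def end_windows(start, end, cipos, ciend):
    """Return the (left, right) search windows around an SV's two breakpoints."""
    left = (start + cipos[0], start + cipos[1])
    right = (end + ciend[0], end + ciend[1])
    return left, right

# CIPOS is from the example above; CIEND=(-10, 100) is an assumed value
left, right = end_windows(1234, 5678, cipos=(-1000, 10), ciend=(-10, 100))
print(left, right)
```

Both windows extend beyond the original 1234 to 5678 interval, so features that overlap only the confidence intervals are missed if the search is limited to what overlaps the original interval.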

-permissive-overlap per annotation block

Can this flag be applied per annotation or is it global? I'm debating running all annotations versus single annotation for simplified debugging.

Database not getting annotated with adjusted allele count from Exac

This is the conf:

[[annotation]]
file = "ExAC.r0.3.sites.vep.tidy.vcf.gz"
fields = ["AC_Adj", "AN_Adj"]
names = ["exac_ac_all", "exac_an_all"]
ops = ["min", "min"]

[[annotation]]
file = "grin/dbsnp137.coding.variants.sift.prediction.bed.gz"
columns = [14,15,18,19]
names = ["Provean_score","Provean_prediction","SIFT_score","SIFT_prediction"]
ops     = ["self", "self", "self", "self"]

[[annotation]]
file = "ESP6500SI.all.snps_indels.tidy.v2.vcf.gz"
fields = ["FG","AAC", "PH"]
names = ["functionGVS", "aminoAcidChange", "polyPhen"]
ops = ["self", "self", "self"]

[[annotation]]
file = "grin/literaturegenes_GRCh37.bed.gz"
columns = [4]
names = ["literature_genes"]
ops = ["self"]

[[annotation]]
file = "grin/recurrentgenes_GRCh37.bed.gz"
columns = [4]
names = ["recurrent_genes"]
ops = ["self"]

[[postannotation]]
fields = ["exac_ac_all", "exac_an_all"]
name = "exac_af_all"
op = "lua:divfunc(exac_ac_all[1],exac_an_all)"
type = "Float"

This is the lua:

function divfunc(a, b)
    if(a == 0) then
        return 0.0
    else
        return string.format("%.9f", a / b)
    end
end

When I create a geminidb, the database has exac_an_all and exac_af_all but not exac_ac_all. Why is that so?

too many open files

Hi,

I just ran a test annotation with 7 annotation VCF against a query VCF file. I received the following error (at chromosome 16):

open cosmic/cosmic.vcf.gz: too many open files

I am curious as to why the program ends with this message.

add built-in command setID available as a postannotation

the function in example/custom.lua works (after fixing bugs) but it's slow and now that it's available, it'd be nice to avoid the (small) overhead of making a call to lua and to make this more integrated as it's a useful feature.

Example error

I was trying to run the vcfanno (v 0.0.011) example file, but it returned the following error message:

vcfanno.go:156: Info Error: lua_start not found in row >> this error may occur many times. reporting once here...

Could you provide any idea to solve this problem?

go build --> undefined: interfaces.RandomGetter

Hi, Brent. This project is exactly what I have been looking for.

I'm just getting going with go, so this is probably my ignorance, but here is what the master branch shows on building....

$ go build
# github.com/brentp/vcfanno/api
../../brentp/vcfanno/api/api.go:164: o.Id undefined (type interfaces.IVariant has no field or method Id)
../../brentp/vcfanno/api/api.go:435: undefined: interfaces.RandomGetter
../../brentp/vcfanno/api/api.go:439: undefined: interfaces.RandomGetter
../../brentp/vcfanno/api/api.go:456: assignment count mismatch: 2 = 1
../../brentp/vcfanno/api/api.go:469: undefined: interfaces.RandomGetter

Any thoughts?

tsv output

Can TSVs be output instead of VCFs?

Faster find?

https://github.com/AndreasBriese/bmatch
Benchmark use in vcfgo

Truncated annotations

I'm running into a bizarre issue that at least so far seems to be tracing back to one particular variant found in Clinvar (chr7:140481403 C>A). When annotating a VCF that contains this variant I end up getting a truncated annotation info line on the output. (I see clinvar_significance= with no value followed by a tab and then the genotype fields).

Using clinvar_20160203.tidy.vcf and vcfanno 0.10

I have attached my vcfanno.conf and vcfanno.lua files, which are only slight modifications of those that were posted for basically the full GEMINI style annotation sources.

test_files.zip

-ends handling is messy

the -ends AnnotateEnds function is a major source of pain and bugs. See #5.

there's not a great way to handle this. It could be that only annotations that are tabixed can be used for -ends annotations. So, it would annotate the main interval, then if -ends is requested, it does a tabix call (now in go, but still does some file-handle stuffs) and appends the relateds for each annotation file.

That is probably the only option because they won't be sorted by the CIPOS start so the sweep won't work anyway.

error when trying to use a non-existent field in a lua op

currently, e.g.: lua:func(xxxx) will not give an error that 'xxxx' does not exist.

error using AF calc example

CONF

[[postannotation]]
fields=[AC, AN]
op=lua:AC / AN
name=AF
type=Float

QUERY

tmp.vcf

COMMAND

$ vcfanno add_af.toml tmp.vcf > tmp_af.vcf

ERROR

=============================================
vcfanno version 0.0.11-beta [built with devel +5ec87ba Thu Apr 28 15:36:34 2016 +0000]

see: https://github.com/brentp/vcfanno
=============================================
panic: Near line 2 (last key parsed 'postannotation.fields'): Expected value but found 'A' instead.

goroutine 1 [running]:
panic(0x7bee60, 0xc820015d00)
        /usr/local/src/go-git/src/runtime/panic.go:500 +0x18c
main.main()
        /usr/local/src/gocode/src/github.com/brentp/vcfanno/vcfanno.go:81 +0x17f5
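
The panic says the parser expected a value but found 'A', which is consistent with the unquoted strings in the conf: TOML requires string values, including those inside arrays, to be quoted. As a hedged illustration, the Python sketch below renders the same block with every string quoted. The helper name `postannotation_block` is hypothetical, and `json.dumps` is used only because its quoting matches TOML basic strings for plain ASCII values like these.

```python
import json

def postannotation_block(fields, op, name, type_):
    """Render a [[postannotation]] block with every string value quoted."""
    return "\n".join([
        "[[postannotation]]",
        f"fields={json.dumps(fields)}",
        f"op={json.dumps(op)}",
        f"name={json.dumps(name)}",
        f"type={json.dumps(type_)}",
    ])

print(postannotation_block(["AC", "AN"], "lua:AC / AN", "AF", "Float"))
```

This addresses only the parse error; whether the `lua:AC / AN` expression then evaluates as intended is a separate question.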

Keep annotation source field Description

When OP is "self" or "first", is it possible to keep the annotation source's field Description instead of a message like "transfered from matched variants in..."?

Max operation

[[postannotation]]
fields=["af_exac_all", "af_exac_afr", "af_exac_amr", "af_exac_eas", "af_exac_nfe", "af_exac_oth", "af_exac_sas"]
op="max"
name="max_aaf_all"
type="Float"

I am trying to do a similar operation with 1000 Genomes frequencies, like:
file="ALL.wgs.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf.gz"
[[postannotation]]
fields=["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"]
name= "1G_MA"
op="max"
type="Float"

But somehow I am not getting any results:
vcfanno.go:156: Info Error: SAS_AF not found in header >> this error may occur many times. reporting once here...

It seems that SAS_AF is not available for the records, and hence no data is reported.
Could you please help with this?
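
The error says SAS_AF is not in the header, which suggests the fields listed under [[postannotation]] have to be present in the query VCF's INFO already, for example after being added by an [[annotation]] block that reads from the 1000 Genomes file. That reading is an assumption based on the message. The calculation being asked for, a maximum over whichever population frequencies are present, is sketched below in Python. The function name `max_present` and the frequency values are made up.

```python
def max_present(info, fields):
    """Return the max of the listed INFO fields that are present, else None."""
    values = [float(info[f]) for f in fields if f in info]
    return max(values) if values else None

# made-up INFO values; SAS_AF is deliberately missing
info = {"EAS_AF": "0.01", "AMR_AF": "0.05", "AFR_AF": "0.002", "EUR_AF": "0.03"}
fields = ["EAS_AF", "AMR_AF", "AFR_AF", "EUR_AF", "SAS_AF"]
print(max_present(info, fields))
```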

error early when lua op is used without -lua arg

Currently, the only error message goes to stdout, which corrupts the VCF output.

too many open files

Hi @brentp ,

Having downloaded the latest binaries, the too many open files problem has returned. For me, vcfanno works with the Linux binary you created for me previously (http://home.chpc.utah.edu/~u6000771/vcfanno_09b), but not with the latest release.

best,
Sigve
