cyvcf's Introduction

CyVCF

A Cython port of the PyVCF library maintained by @jamescasbon.

The goal of this project is to provide a very fast Python library for parsing and manipulating large VCF files. Cython has been used to optimize speed. This version is approximately 4 times faster than PyVCF, and the parsing speed is essentially identical to that of C/C++ libraries provided by PLINKSEQ and VCFLIB.

The functionality and interface are currently the same as documented here: http://pyvcf.rtfd.org/

Installation

python setup.py build
python setup.py install

Testing

python setup.py test

Basic usage

>>> import cyvcf
>>> vcf_reader = cyvcf.Reader(open('test/example-4.0.vcf', 'rb'))
>>> for record in vcf_reader:
...     print record
20  14370   G       A       29.0    .       H2=True;NS=3;DB=True;DP=14;AF=0.5       GT:GQ:DP:HQ     0|0:48:1:51,51  1|0:48:8:51,51  1/1:43:5:.,.
20  17330   T       A       3.0     q10     NS=3;DP=11;AF=0.017     GT:GQ:DP:HQ     0|0:49:3:58,50  0|1:3:5:65,3    0/0:41:3:.
20  1110696 A       G,T     67.0    .       AA=T;NS=2;DB=True;DP=10;AF=0.333,0.667  GT:GQ:DP:HQ     1|2:21:6:23,27  2|1:2:0:18,2    2/2:35:4:.
20  1230237 T       .       47.0    .       AA=T;NS=3;DP=13 GT:GQ:DP:HQ     0|0:54:7:56,60  0|0:48:4:51,51  0/0:61:2:.
20  1234567 GTCT    G,GTACT 50.0    .       AA=G;NS=3;DP=9  GT:GQ:DP        ./.     0/2:17:2        1/1:40:3

cyvcf's People

Contributors

arq5x, brentp, chapmanb

cyvcf's Issues

pickle support

Hello, cyvcf seems to have the same problem as jamescasbon/PyVCF#108:

>>> import cyvcf
>>> import cPickle
>>> f = open(path)
>>> r = cyvcf.Reader(f)
>>> cPickle.dumps(next(r))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.virtualenvs/bio-env/lib64/python2.6/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle _Record objects
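One possible workaround, sketched here under assumptions, is to copy the site-level fields of a record into a plain dict before pickling. The field names follow the PyVCF record interface that the README says cyvcf mirrors; `record_to_dict` and `FakeRecord` are hypothetical names used only for illustration, and the stand-in class replaces a real cyvcf record so the example is self-contained.

```python
import pickle

# Attribute names assumed from the PyVCF record interface.
RECORD_FIELDS = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def record_to_dict(record):
    """Copy the site-level fields of a record into a plain, picklable dict."""
    return {name: getattr(record, name) for name in RECORD_FIELDS}


class FakeRecord(object):
    """Hypothetical stand-in for a cyvcf record; it refuses to be pickled."""

    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

    def __reduce_ex__(self, protocol):
        raise TypeError("can't pickle FakeRecord objects")


record = FakeRecord(CHROM="20", POS=14370, ID=None, REF="G", ALT=["A"],
                    QUAL=29.0, FILTER=None, INFO={"NS": 3, "DP": 14})

# Pickle the plain dict instead of the record object itself.
payload = pickle.dumps(record_to_dict(record))
restored = pickle.loads(payload)
```

This only helps if every copied value is itself picklable, and it drops the per-sample data, which would need a similar conversion.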

Try to install cyvcf but get an error message for missing cyvcf/parser.c

I:cyvcf xing$ python setup.py build
running build
running build_py
running build_ext
building 'cyvcf.parser' extension
cc -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c cyvcf/parser.c -o build/temp.macosx-10.11-intel-2.7/cyvcf/parser.o
clang: error: no such file or directory: 'cyvcf/parser.c'
clang: error: no input files

error: command 'cc' failed with exit status 1

Missing parser.c file?

Hi,

I would like to use your cyvcf parser, but had problems with setup. It appears that the parser.c file is missing from this repo. Could you please upload it?

Thank you,
Emily

Handle VCF with no genotypes

Some of the VCF files from dbSNP exclude any sample information and use VCF as a container for a "generic variant". That appears to be the source of this error:

/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/cyvcf/parser.so in cyvcf.parser.Reader.__next__ (cyvcf/parser.c:13509)()
/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/cyvcf/parser.so in cyvcf.parser._Record.__cinit__ (cyvcf/parser.c:4232)()
TypeError: expected string or Unicode object, NoneType found

Is there a possibility of supporting these (arguably malformed) VCF files?
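A minimal sketch of how a line splitter could tolerate such sites-only files, assuming the failure comes from the absent FORMAT and sample columns (the traceback alone does not confirm this). `split_vcf_line` is a hypothetical helper, not part of cyvcf.

```python
def split_vcf_line(line):
    """Split a VCF data line into fixed columns, FORMAT, and sample columns.

    The eight fixed columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
    FORMAT and the sample columns are treated as optional.
    """
    cols = line.rstrip("\n").split("\t")
    fixed = cols[:8]
    fmt = cols[8] if len(cols) > 8 else None  # None when no genotypes exist
    samples = cols[9:]                        # empty list when no genotypes exist
    return fixed, fmt, samples


sites_only = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14\n"
with_samples = "20\t14370\t.\tG\tA\t29\t.\tNS=3\tGT:GQ\t0|0:48\t1|0:48\n"

fixed_a, fmt_a, samples_a = split_vcf_line(sites_only)
fixed_b, fmt_b, samples_b = split_vcf_line(with_samples)
```

Downstream code would then need to skip genotype parsing whenever the FORMAT value is None.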

Thanks,
Sean

utils.walk_together() is incorrect

I actually found this issue in the
https://github.com/sein-tao/cyvcf
fork, but that repository doesn't allow issues, and this one no longer seems active.

The issue occurs when two files are being walked and both have a record at the same position. The records are returned in two different yields, i.e.
yield (rec1, None)
yield (None,rec2)

instead of being a single yield of
yield (rec1, rec2)
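The expected behaviour can be sketched as follows. This is not the code from utils.py; `walk_together_sketch`, `Rec`, and `_key` are hypothetical names, and the sketch assumes every input is already sorted by the same (CHROM, POS) ordering.

```python
from collections import namedtuple

# Hypothetical stand-in for a VCF record; only CHROM and POS matter here.
Rec = namedtuple("Rec", ["CHROM", "POS"])


def _key(rec):
    # Assumes all inputs are sorted consistently by this key.
    return (rec.CHROM, rec.POS)


def walk_together_sketch(*readers):
    """Yield one tuple per position, with a record or None for each reader."""
    iterators = [iter(reader) for reader in readers]
    current = [next(it, None) for it in iterators]
    while any(rec is not None for rec in current):
        # Smallest position among the readers that still have records.
        lowest = min(_key(rec) for rec in current if rec is not None)
        row = []
        for i, rec in enumerate(current):
            if rec is not None and _key(rec) == lowest:
                row.append(rec)  # this reader has a record at the position
                current[i] = next(iterators[i], None)
            else:
                row.append(None)  # this reader has nothing at the position
        yield tuple(row)


first = [Rec("20", 100), Rec("20", 200)]
second = [Rec("20", 100), Rec("20", 300)]
rows = list(walk_together_sketch(first, second))
```

Here the shared position 100 comes back as a single tuple holding both records, while positions 200 and 300 each carry a None for the reader that lacks them.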

potentially inconsistent values in 'gt_types'

When there are missing genotypes (./.) in the VCF, they get assigned the value 2 in gt_types. I think this calls for a convention to represent missing genotypes -- would -1 be appropriate?

Example:

>>> import cyvcf
>>>
>>> vcf_reader = cyvcf.Reader(open('test.vcf', 'rb'))
>>> for record in vcf_reader:
...     print record
... 
1       10014   .       A       C       .       .       .     GT:AD:DP:DP4:FREQ:RD    0/0:9,0:8:8,0,0,0:0.0:8 ./.     0/1:13,3:15:12,0,3,0:0.2:12
1       10043   .       T       TAA     .       .       .    GT:AD:DP:DP4:FREQ:RD    0/0:0:13:12,1,0,0:0.0:13        0/1:2:10:8,0,2,0:0.2:8  ./.
>>> 
>>> vcf_reader = cyvcf.Reader(open('test.vcf', 'rb'))
>>> 
>>> for record in vcf_reader:
...     print record.alleles
... 
['A', 'C']
['T', 'TAA']
>>>
>>> vcf_reader = cyvcf.Reader(open('test.vcf', 'rb'))
>>> for record in vcf_reader:
...     print record.gt_types
... 
[0, 2, 1]
[0, 1, 2]
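A minimal sketch of the proposed convention, assuming the codes 0, 1, and 2 stand for homozygous reference, heterozygous, and homozygous alternate as in PyVCF. The value -1 for missing calls is the suggestion from this issue, not an established cyvcf convention, and `gt_type` is a hypothetical helper.

```python
import re

HOM_REF, HET, HOM_ALT = 0, 1, 2
MISSING = -1  # value proposed in this issue; an assumption, not cyvcf behaviour


def gt_type(gt):
    """Classify a GT string such as '0/1', '1|1', or './.'."""
    alleles = re.split(r"[/|]", gt)
    if any(allele == "." for allele in alleles):
        return MISSING  # partially missing calls are also treated as missing
    if len(set(alleles)) > 1:
        return HET
    return HOM_REF if alleles[0] == "0" else HOM_ALT


# Genotypes from the two example records above.
first_record = [gt_type(gt) for gt in ("0/0", "./.", "0/1")]
second_record = [gt_type(gt) for gt in ("0/0", "0/1", "./.")]
```

With this mapping the missing calls no longer share a code with homozygous-alternate genotypes.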

Parsing variant FILTER column

Hi,

I've used cyvcf for a long time and really enjoy the speed compared to PyVCF. Lately, however, I've discovered a discrepancy that makes analysis somewhat cumbersome. It concerns parsing of the FILTER column. cyvcf seems to treat an undefined filter (i.e. '.', often provided with raw, unfiltered calls from GATK) and variants deemed reliable after filtering (i.e. 'PASS') in the same way: both are parsed into None objects. PyVCF treats them differently (None and an empty list, respectively). I discovered this with cyvcf 0.1.11 and PyVCF 0.6.7.
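The PyVCF behaviour described here can be sketched as a small function. `parse_filter` is a hypothetical helper written for illustration, not code taken from either library.

```python
def parse_filter(field):
    """Parse the FILTER column of a VCF data line.

    '.'    -> None  (no filtering has been applied)
    'PASS' -> []    (filters were applied and none failed)
    other  -> list of the failed filter names
    """
    if field == ".":
        return None
    if field == "PASS":
        return []
    return field.split(";")


unfiltered = parse_filter(".")
passed = parse_filter("PASS")
failed = parse_filter("q10;s50")
```

Keeping None and the empty list distinct lets callers tell raw, unfiltered calls apart from calls that passed every filter.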

Installation of cyvcf

Hi,

I am installing cyvcf on my new Mac, but ran into trouble I have not experienced before. Cython v0.20.1 is used here. Do you have any idea why I get this error?

best,
Sigve

python setup.py build
running build
running build_py
creating build
creating build/lib.macosx-10.9-intel-2.7
creating build/lib.macosx-10.9-intel-2.7/cyvcf
copying cyvcf/__init__.py -> build/lib.macosx-10.9-intel-2.7/cyvcf
copying cyvcf/filters.py -> build/lib.macosx-10.9-intel-2.7/cyvcf
copying cyvcf/utils.py -> build/lib.macosx-10.9-intel-2.7/cyvcf
copying cyvcf/version.py -> build/lib.macosx-10.9-intel-2.7/cyvcf
running build_ext
cythoning cyvcf/parser.pyx to cyvcf/parser.c
warning: cyvcf/parser.pyx:352:37: cdef variable 'gt_bases' declared after it is used

Error compiling Cython file:

...
raise StopIteration

    #CHROM
    cdef bytes chrom = row[0]
    if self._prepend_chr:
        chrom = 'chr' + chrom
                     ^

cyvcf/parser.pyx:1172:26: Cannot convert 'str' to 'bytes' implicitly. This is not portable.

Error compiling Cython file:

...
cdef list row = line.split('\t')

    #CHROM
    cdef bytes chrom = row[0]
    if other._prepend_chr:
        chrom = 'chr' + chrom
                     ^

cyvcf/parser.pyx:1232:26: Cannot convert 'str' to 'bytes' implicitly. This is not portable.
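Both errors point at a str literal being concatenated with a variable declared as bytes. One possible fix, untested against this codebase, is to make the literal a bytes literal (b'chr') in parser.pyx. The plain-Python sketch below illustrates the same type rule; `prepend_chr` is a hypothetical helper, not a function from cyvcf.

```python
def prepend_chr(chrom):
    """Prefix a chromosome name held as bytes with b'chr'."""
    # Both operands are bytes, so no implicit str/bytes conversion is needed.
    return b"chr" + chrom


prefixed = prepend_chr(b"20")
```

Whether the rest of the parser expects bytes or text for CHROM would need checking before applying such a change.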
