pysam-developers / pysam

Pysam is a Python package for reading, manipulating, and writing genomics data such as SAM/BAM/CRAM and VCF/BCF files. It's a lightweight wrapper of the HTSlib API, the same one that powers samtools, bcftools, and tabix.

Home Page: https://pysam.readthedocs.io/en/latest/

License: MIT License

Python 35.75% Makefile 0.31% C 7.54% Perl 0.15% Shell 1.10% Cython 55.13% Dockerfile 0.02%

pysam's Introduction

Pysam

Pysam is a Python module for reading and manipulating files in the SAM/BAM format. The SAM/BAM format is a way to efficiently store large numbers of alignments (Li 2009), such as those routinely created by next-generation sequencing methods.

Pysam is a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix.

If you are using the conda package manager (e.g. miniconda or anaconda), you can install pysam from the bioconda channel:

conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install pysam

Installation through bioconda is the recommended way to install pysam as it resolves non-python dependencies and uses pre-configured compilation options. Especially for OS X this will potentially save a lot of trouble.

The current version of pysam wraps 3rd-party code from htslib-1.18, samtools-1.18, and bcftools-1.18.

Pysam is available through pypi. To install, type:

pip install pysam

Pysam documentation is available at https://pysam.readthedocs.io/en/latest/

Questions and comments are very welcome and should be sent to the pysam user group

pysam's People

Contributors

0xaf1f, abjonnes, amblina, andreasheger, andreashegergenomics, benjschiller, bioinformed, dannon, dpryan79, explodingcabbage, jeromekelleher, jmarshall, juliangehring, jvkersch, kevinjacobs-progenity, kpalin, kyleabeauchamp, marcelm, mckinsel, mdpearson, misha-at-genestack, mvdbeek, nh13, nsoranzo, petehaitch, ramyala, terrycojones, tfwillems, tyberiusprime, wckdouglas

pysam's Issues

pileupcolumn.pileups is broken in pysam 0.8.0 and 0.8.1

Here is a quick script to test it:

#!/usr/bin/python3

import pysam


def BAMtest(bamfile_name):

    # Set bamfile 'settings'
    bamfile = pysam.Samfile(bamfile_name, 'rb')

    for refs in bamfile.references:
        print(refs)
        for pileupcolumn in bamfile.pileup(refs):
            for pileupread in pileupcolumn.pileups:
                print(pileupread)
                print(str(pileupread))


def RunModule(bamfile_name):
    """Run the module."""
    BAMtest(bamfile_name)

if __name__ == "__main__":
    from sys import argv
    RunModule(argv[1])

This script errors out with:

python3 ~/aa.py Taes.bam
Taes_c1
Traceback (most recent call last):
  File "/home/francisco/aa.py", line 25, in <module>
    RunModule(argv[1])
  File "/home/francisco/aa.py", line 21, in RunModule
    BAMtest(bamfile_name)
  File "/home/francisco/aa.py", line 15, in BAMtest
    print(pileupread)
  File "calignmentfile.pyx", line 3495, in pysam.calignmentfile.PileupRead.__str__ (pysam/calignmentfile.c:37217)
  File "calignmentfile.pyx", line 3527, in pysam.calignmentfile.PileupRead.is_refskip.__get__ (pysam/calignmentfile.c:37807)
AttributeError: 'pysam.calignmentfile.PileupRead' object has no attribute '_is_refskip'

It throws no errors with pysam 0.7.8.
Tested on python 2.7, 3.2 and 3.4

pysam_set_flag should accept flags greater than 255

In [1]: import pysam

In [2]: pysam.AlignedRead
pysam.AlignedRead

In [2]: pysam.AlignedRead?

In [3]: bam = pysam.Samfile("mybam.bam")

In [4]: for i in bam:
   ...:     break
   ...: 

In [6]: i.flag
Out[6]: 2048

In [7]: i.flag=257
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-8-d40e59acfd40> in <module>()
----> 1 i.flag=257

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pysam/csamfile.so in pysam.csamfile.AlignedRead.flag.__set__ (pysam/csamfile.c:27922)()

OverflowError: value too large to convert to uint8_t
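
For context, the SAM specification defines FLAG as a bitwise field whose highest defined bit is 0x800 (2048), so an 8-bit setter cannot hold every valid value. The sketch below decodes a flag into its set bits; the helper name and the short labels are illustrative and are not part of pysam.

```python
# Bit values follow the SAM specification; the labels are informal shorthand.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qcfail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in sorted(SAM_FLAG_BITS.items()) if flag & bit]


# 257 needs bit 0x100, which does not fit into a uint8_t (maximum 255).
print(decode_flag(257))
print(decode_flag(2048))
```

Four of the twelve defined bits (0x100 to 0x800) lie above 255, which is why the setter needs at least a 16-bit type.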

Import Error after clean pip install

Hello,

I'm getting the following error when trying to import pysam:

>>> import pysam
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/apps/local/python/lib/python2.7/site-packages/pysam/__init__.py", line 13, in <module>
    import pysam.csamtools as csamtools
ImportError: /share/apps/local/python/lib/python2.7/site-packages/pysam/csamtools.so: undefined symbol: regcompA

PySam was installed with pip install on my machine, no errors so far. Any ideas? I've been googling without luck.

GTFProxy.__getattr__ fails for unquoted values

If the value assigned to a GTF attribute is not quoted (for example "exon_number 1;" in at least some GENCODE v19 GTF files), then GTFProxy.__getattr__ returns the complete substring starting from the value of the requested attribute, including any remaining attributes and field separators. For comparison, GTFProxy.asDict correctly parses such values:

print(gtf_record.asDict())
{'gene_status': 'NOVEL', 'exon_number': 1, 'level': 2, 'transcript_type': 'lincRNA', 'tag': 'not_best_in_genome_evidence', 'gene_id': 'ENSG00000243485.2', 'exon_id': 'ENSE00001947070.1', 'transcript_id': 'ENST00000473358.1', 'havana_transcript': 'OTTHUMT00000002840.1', 'havana_gene': 'OTTHUMG00000000959.2', 'transcript_name': 'MIR1302-11-001', 'gene_type': 'lincRNA', 'transcript_status': 'KNOWN', 'gene_name': 'MIR1302-11'}

print(gtf_record.exon_number)
1; exon_id "ENSE00001947070.1"; level 2; tag "not_best_in_genome_evidence"; havana_gene "OTTHUMG00000000959.2"; havana_transcript "OTTHUMT00000002840.1";
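
As an illustration of the expected behaviour, here is a minimal pure-Python sketch that splits a GTF attribute field and accepts both quoted and unquoted values. The function name is hypothetical and this is not how pysam implements GTFProxy. It assumes no semicolons occur inside quoted values, it keeps every value as a string (unlike asDict, which returned an integer for exon_number above), and a repeated key such as "tag" simply overwrites the earlier entry.

```python
def parse_gtf_attributes(field):
    """Parse a GTF attribute column into a dict of strings.

    Assumes values contain no ';' characters; later duplicates of a key
    overwrite earlier ones.
    """
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = item.partition(" ")
        value = value.strip()
        # Remove surrounding double quotes if present; unquoted values pass through.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        attributes[key] = value
    return attributes


field = 'exon_number 1; exon_id "ENSE00001947070.1"; level 2; tag "not_best_in_genome_evidence";'
print(parse_gtf_attributes(field)["exon_number"])
```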

Extracting reference positions of all read-positions (including soft-clips)

I would like to create a list of mapped read-positions, with None for read-positions that aren't in the reference, e.g. insertions and soft-clips, so that the resulting list is of the same length as the seq property. Neither the positions property nor the aligned_pairs property quite gives me what I'd like, although aligned_pairs gets close.

For example, consider the following read:

import pysam
read = pysam.AlignedRead()
read.qname = "DJTPB5M1:306:D278CACXX:3:2308:20000:70860"
read.flag = 99
read.rname = 0
read.pos = 3232562
read.seq = "AGACATTAGGAAATAATCAAACTCATTGCTGAAATCAACCATGTAGAAACATGAACTATTCAAAGAATCAACCAAAGGAGGAGTTGGTTCTTTGAGAAAA"
read.qual = "AB@ABAA@CCBA@@?@?A@@@?@A@??BAAA@@@?A@@@A@?A?>BA@A?@?B@@?@???A@@@B@@?A@@?A?@?BB@BBAB?AA>@@CDAADBDA@BB"
read.mapq = 40
read.cigar = [(4, 30), (0, 20), (1, 3), (0, 47)]
read.rnext = 0
read.pnext = 3232634
read.tlen = 172

Then I get the following (pysam v0.7.5):

# Doesn't contain insertions or soft-clips
read.positions 
[3232562L, 3232563L, 3232564L, 3232565L, 3232566L, 3232567L, 3232568L, 3232569L, 3232570L, 3232571L, 3232572L, 3232573L, 3232574L, 3232575L, 3232576L, 3232577L, 3232578L, 3232579L, 3232580L, 3232581L, 3232582L, 3232583L, 3232584L, 3232585L, 3232586L, 3232587L, 3232588L, 3232589L, 3232590L, 3232591L, 3232592L, 3232593L, 3232594L, 3232595L, 3232596L, 3232597L, 3232598L, 3232599L, 3232600L, 3232601L, 3232602L, 3232603L, 3232604L, 3232605L, 3232606L, 3232607L, 3232608L, 3232609L, 3232610L, 3232611L, 3232612L, 3232613L, 3232614L, 3232615L, 3232616L, 3232617L, 3232618L, 3232619L, 3232620L, 3232621L, 3232622L, 3232623L, 3232624L, 3232625L, 3232626L, 3232627L, 3232628L]

len(read.positions)
67
len(read.positions) == len(read.seq)
False

# Contains insertions but doesn't contain soft-clips
[y[1] for y in read.aligned_pairs if not y[0] is None] # "if not y[0] is None" in the list-comprehension is to handle deletions.
[3232562L, 3232563L, 3232564L, 3232565L, 3232566L, 3232567L, 3232568L, 3232569L, 3232570L, 3232571L, 3232572L, 3232573L, 3232574L, 3232575L, 3232576L, 3232577L, 3232578L, 3232579L, 3232580L, 3232581L, None, None, None, 3232582L, 3232583L, 3232584L, 3232585L, 3232586L, 3232587L, 3232588L, 3232589L, 3232590L, 3232591L, 3232592L, 3232593L, 3232594L, 3232595L, 3232596L, 3232597L, 3232598L, 3232599L, 3232600L, 3232601L, 3232602L, 3232603L, 3232604L, 3232605L, 3232606L, 3232607L, 3232608L, 3232609L, 3232610L, 3232611L, 3232612L, 3232613L, 3232614L, 3232615L, 3232616L, 3232617L, 3232618L, 3232619L, 3232620L, 3232621L, 3232622L, 3232623L, 3232624L, 3232625L, 3232626L, 3232627L, 3232628L]

len([y[1] for y in read.aligned_pairs if not y[0] is None])
70
len([y[1] for y in read.aligned_pairs if not y[0] is None]) == len(read.seq)
False

I can get what I want using this function:

def get_read_positions(read):
  """Get read positions while allowing for inserted and soft-clipped bases.

  Args:
      read: A pysam.AlignedRead instance.

  Returns:
      A list of read positions equal in length to read.seq. The result is identical to read.positions if the read does not contain any insertions or soft-clips. Read-positions that are insertions or soft-clips have None as the corresponding element in the returned list.
  """
  # Check read actually has CIGAR
  if read.cigar is None:
    # No CIGAR string so positions must be [] because there is no alignment.
    read_positions = []
  else:
    # From the SAM spec (http://samtools.github.io/hts-specs/SAMv1.pdf), "S may only have H operations between them and the ends of the CIGAR string".
    n = len(read.cigar)
    # If first CIGAR operation is H (5), check whether second is S (4).
    if read.cigar[0][0] == 5:
      # Also covers a CIGAR consisting of a single H operation (n == 1).
      if n > 1 and read.cigar[1][0] == 4:
        read_positions = [None] * read.cigar[1][1]
      else:
        read_positions = []
    # Check if first CIGAR operation is S (4).
    elif read.cigar[0][0] == 4:
      read_positions = [None] * read.cigar[0][1]
    # Otherwise there can't be any leftmost soft-clipping.
    else:
      read_positions = []
    # Add "internal" read-positions, which do not contain S/H operations and so can be extracted from the aligned_pairs property
    read_positions = read_positions + [y[1] for y in read.aligned_pairs if not y[0] is None]
    # If last CIGAR operation is H (5), check whether second-last is S (4).
    if read.cigar[n - 1][0] == 5:
      if n > 1:
        # If second-last positions is S (4), then need to pad but otherwise nothing to do (and also no need for "closing" else).
        if read.cigar[n - 2][0] == 4:
          read_positions = read_positions + [None] * read.cigar[n - 2][1]
    # Check if last CIGAR operation is S (4).
    elif read.cigar[n - 1][0] == 4:
      read_positions = read_positions + [None] * read.cigar[n - 1][1]
  return read_positions

# Includes both insertions and soft-clipped positions.
get_read_positions(read)
[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, 3232562L, 3232563L, 3232564L, 3232565L, 3232566L, 3232567L, 3232568L, 3232569L, 3232570L, 3232571L, 3232572L, 3232573L, 3232574L, 3232575L, 3232576L, 3232577L, 3232578L, 3232579L, 3232580L, 3232581L, None, None, None, 3232582L, 3232583L, 3232584L, 3232585L, 3232586L, 3232587L, 3232588L, 3232589L, 3232590L, 3232591L, 3232592L, 3232593L, 3232594L, 3232595L, 3232596L, 3232597L, 3232598L, 3232599L, 3232600L, 3232601L, 3232602L, 3232603L, 3232604L, 3232605L, 3232606L, 3232607L, 3232608L, 3232609L, 3232610L, 3232611L, 3232612L, 3232613L, 3232614L, 3232615L, 3232616L, 3232617L, 3232618L, 3232619L, 3232620L, 3232621L, 3232622L, 3232623L, 3232624L, 3232625L, 3232626L, 3232627L, 3232628L]

len(get_read_positions(read))
100
len(get_read_positions(read)) == len(read.seq)
True

But this makes me wonder whether this might be better done at a lower level because I call get_read_positions on most reads in the BAM files I am parsing.

For example, I feel like I could get what I want by modifying the aligned_pairs property, which is implemented in Cython (correct?)

So, finally, to my questions:

  1. Have I completely overlooked a simpler/faster solution?
  2. Is it possible to move my get_read_positions to the Cython level as a property of an AlignedRead object and am I likely to get much of a performance improvement?
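
One possible simpler approach, sketched here in pure Python, is to walk the CIGAR once and emit one entry per base of seq. It takes the same (operation, length) tuples that read.cigar returns and the 0-based read.pos; the function name is hypothetical and nothing below is part of pysam. The operation codes follow the SAM specification (0=M, 1=I, 2=D, 3=N, 4=S, 5=H, 6=P, 7==, 8=X).

```python
def reference_positions_per_base(pos, cigar):
    """Return one reference position (or None) for every base stored in SEQ.

    pos:   0-based leftmost reference position of the alignment.
    cigar: list of (operation, length) tuples.
    """
    positions = []
    ref = pos
    for op, length in cigar:
        if op in (0, 7, 8):      # M, =, X: consume query and reference
            positions.extend(range(ref, ref + length))
            ref += length
        elif op in (1, 4):       # I, S: consume query only
            positions.extend([None] * length)
        elif op in (2, 3):       # D, N: consume reference only
            ref += length
        # H (5) and P (6) consume neither query nor reference
    return positions


result = reference_positions_per_base(3232562, [(4, 30), (0, 20), (1, 3), (0, 47)])
print(len(result))
```

For the example read this yields 100 entries, with None for the 30 soft-clipped and 3 inserted bases. Whether moving the same loop into Cython gives a worthwhile speed-up would need to be measured.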

CIGAR property cannot be cleared

The current implementation of the "AlignedRead.cigar" property does not allow one to clear this value, as both None and empty sequences are silently ignored during assignment.

As far as I can see, this means that if you want to (for example) change a mapped read to an unmapped read, you have to create a new AlignedRead instance and copy over all relevant properties, rather than just clearing the relevant fields (mapq, CIGAR, etc.). Clearing the CIGAR is necessary as Picard ValidateSamFile.jar considers unmapped reads with CIGAR strings as erroneous.

Possible query alignment length bug?

It's very possible that I've missed something strange about this alignment, but at first glance this appears to be a bug: according to the documentation, query_alignment_length is supposed to be qend - qstart, and it's not:

>>> import pysam
>>> bf = pysam.AlignmentFile('example.sam')
>>> read = bf.next()
>>> read.query_length
51
>>> read.query_alignment_length
51
>>> read.query_alignment_start
0
>>> read.query_alignment_end
35
>>> len(read.query_alignment_sequence)
35

example.sam can be retrieved for reproducibility from: https://gist.githubusercontent.com/vsbuffalo/94be7da4654d2af37c06/raw/c28fd61b34cd1ab811a8b3c1580c70887e685d1a/example.sam (it's a single read from 1KG data).

>>> pysam.__version__
'0.8.1'

Installed via pip.

pysam.view inconsistently throws SamtoolsError in python3

After upgrading to Python 3, I run into an error that occurs only intermittently. If I wait a few seconds it seems to go away. It never occurs the first time pysam.view is called, but it does occur if pysam.view is called again shortly thereafter.

b1 = pysam.view('b', 'my_bam_file', *list_of_accession_numbers)
b2 = pysam.view('b', 'my_bam_file', *list_of_accession_numbers)
pysam.SamtoolsError: 'csamtools returned with error 1: '
b3 = pysam.view('b', 'my_bam_file', *list_of_accession_numbers)

I temporarily solved it by putting pysam.view in a loop and calling time.sleep(1) after each failed attempt.
If I print the exception message I get:
view: invalid option -- ' '
Where the value between the single quotes is a random character or byte.
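
The retry workaround described above could look roughly like the following sketch. The helper name and its parameters are hypothetical; in real use one would presumably pass pysam.view as func and restrict exceptions to pysam.SamtoolsError. It only masks the symptom and does not address the underlying bug.

```python
import time


def call_with_retry(func, *args, retries=3, delay=1.0, exceptions=(Exception,)):
    """Call func(*args), retrying after `delay` seconds when it raises.

    Re-raises the last error once `retries` attempts have failed.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return func(*args)
        except exceptions as error:
            last_error = error
            if attempt < retries - 1:
                time.sleep(delay)  # wait before the next attempt
    raise last_error


# Hypothetical usage (requires pysam and a BAM file):
# b2 = call_with_retry(pysam.view, 'b', 'my_bam_file', *list_of_accession_numbers,
#                      exceptions=(pysam.SamtoolsError,))
```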

Parser asVCF does not work

Thanks for the nice work. This module could potentially be very useful to me.

I'd like to use your module to access a remote bgzip file that has been indexed with tabix. I'm not sure if I'm not using your code correctly or if you have a bug. Can you please advise?

I'd like to use your code to parse the VCF data, but it fails:

import pysam
file1 = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz"
vcf = pysam.VCF()
vcf.connect(file1)
it = vcf.fetch(region="1:10000000-20000000")
it.next()

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-31-54f0920595b2> in <module>()
----> 1 it.next()

/home/unix/slowikow/.local/lib/python2.7/site-packages/pysam-0.7.7-py2.7-linux-x86_64.egg/pysam/ctabix.so in pysam.ctabix.TabixIteratorParsed.__next__ (pysam/ctabix.c:5878)()

/home/unix/slowikow/.local/lib/python2.7/site-packages/pysam-0.7.7-py2.7-linux-x86_64.egg/pysam/ctabix.so in pysam.ctabix.Parser.parse (pysam/ctabix.c:5133)()

NotImplementedError: 

I can get the raw text, but it is not parsed by your module:

import pysam
file1 = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz"
vcf = pysam.VCF()
vcf.connect(file1)
it = vcf.tabixfile.fetch(region="1:10000000-20000000")
it.next()

'1\t10000400\trs1237370\tT\tA\t.\tPASS\tDP=1502;AF=0.628;CB=UM,BI,NCBI;EUR_R2=0.854;AFR_R2=0.826 ...

If I try to pass a parser, it fails:

import pysam
file1 = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz"
vcf = pysam.VCF()
vcf.connect(file1)
it = vcf.tabixfile.fetch(region="1:10000000-20000000", parser=pysam.asVCF)
it.next()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-0adad7e36da5> in <module>()
      3 vcf = pysam.VCF()
      4 vcf.connect(file1)
----> 5 it = vcf.tabixfile.fetch(region="1:10000000-20000000", parser=pysam.asVCF)
      6 it.next()

/home/unix/slowikow/.local/lib/python2.7/site-packages/pysam-0.7.7-py2.7-linux-x86_64.egg/pysam/ctabix.so in pysam.ctabix.Tabixfile.fetch (pysam/ctabix.c:3888)()

TypeError: Argument 'parser' has incorrect type (expected pysam.ctabix.Parser, got type)

BAM file created by BWA with -R argument causes 'ValueError: unknown field code 'SM' in record 'PG''

The BAM file was created with BWA 0.7.9a-r786.
It inserted the following @PG header line:

@PG     ID:bwa  PN:bwa  VN:0.7.9a-r786  CL:bwa mem -p -t 8 -M -R @RG    ID:None SM:None /mnt/data/hg19.fa /mnt/analysis/default-0.fastq

The -R argument to bwa is used to supply the read group header, and BWA help instructs the user to supply tab-delimited fields to that argument:

       -R STR     read group header line such as '@RG\tID:foo\tSM:bar' [null]

The problem is that BWA puts the whole command-line, including the tab-delimited -R argument in the @PG CL: field. But the SAM header format apparently doesn't have a way to escape tabs in the values, so pysam is interpreting the SM:None field as a field in the @PG header, rather than as part of the CL: string. Seems that the real problem is in the SAM spec. Should pysam be made more forgiving of this case?

In [1]: import pysam

In [2]: sf =pysam.Samfile("/mnt/data/output.bam")

In [3]: sf.header
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-40307eadca76> in <module>()
----> 1 sf.header

/mnt/home/pat/code/pl2/devpipes/anaconda/1.9.2/lib/python2.7/site-packages/pysam/csamfile.so in pysam.csamfile.Samfile.header.__get__ (pysam/csamfile.c:12899)()

ValueError: unknown field code 'SM' in record 'PG'

In [4]: pysam.version.__version__
Out[4]: '0.8.0'
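
One way a more forgiving parser might behave is sketched below in pure Python: once the CL tag is seen, everything that follows on the line is treated as part of the command line. This is a heuristic and not pysam's implementation; the function name is hypothetical, and the approach assumes CL is the last genuine tag on the @PG line, so a legitimate tag placed after CL would be swallowed into the CL value.

```python
def parse_pg_line_leniently(line):
    """Parse a tab-separated @PG header line into a dict.

    Heuristic: all fields from 'CL:' onwards are joined back together with
    tabs and stored as the CL value.
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@PG":
        raise ValueError("not a @PG header line")
    record = {}
    for index, field in enumerate(fields[1:], start=1):
        key, _, value = field.partition(":")
        if key == "CL":
            # Rejoin the rest of the line and drop the leading 'CL:' prefix.
            record["CL"] = "\t".join(fields[index:])[len("CL:"):]
            break
        record[key] = value
    return record


line = ("@PG\tID:bwa\tPN:bwa\tVN:0.7.9a-r786\t"
        "CL:bwa mem -p -t 8 -M -R @RG\tID:None\tSM:None "
        "/mnt/data/hg19.fa /mnt/analysis/default-0.fastq")
print(sorted(parse_pg_line_leniently(line)))
```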

fancy_str method broken?

Hello good people,

I try to invoke the fancy_str method as follows (note, I am on python 3.2):

with pysam.Samfile( in_fp, "rb" ) as samfile:
    for a in samfile.fetch( ref, start, end ):
        print( a.fancy_str() )

However, an error is raised:

  File "csamtools.pyx", line 3244, in pysam.csamtools.AlignedRead.fancy_str (pysam/csamtools.c:32762)
AttributeError: 'pysam.csamtools.AlignedRead' object has no attribute '__dict__'

-Lee

Writing whole alignment to new sam/bam

Hello

I'm writing a script to filter reads by their alignment score tag. I've been reading the docs, but I don't know how to write a whole read from the SAM/BAM input file to my filtered SAM/BAM output.
I would like to preserve the original SAM header from the input file and keep only the reads passing my filter, so I'm not manipulating alignment attributes.
Do I have to reconstruct the read manually from the AlignedRead object?

Thanks !!
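
A minimal sketch of one way to do this, assuming a newer pysam release that provides AlignmentFile, has_tag and get_tag (the issue itself predates that API): open the output with template= so the input header is copied, then write each passing read object unchanged. The function names, the "AS" tag default and the file paths are illustrative assumptions.

```python
def passes_filter(read, min_score, tag="AS"):
    """Return True if the read carries `tag` with a value >= min_score."""
    return read.has_tag(tag) and read.get_tag(tag) >= min_score


def filter_bam(in_path, out_path, min_score, tag="AS"):
    """Copy reads that pass the filter from in_path to out_path (both BAM)."""
    # Imported here so passes_filter can be used without pysam installed.
    import pysam

    with pysam.AlignmentFile(in_path, "rb") as infile, \
         pysam.AlignmentFile(out_path, "wb", template=infile) as outfile:
        for read in infile:
            if passes_filter(read, min_score, tag):
                # The read is written as-is; no manual reconstruction needed.
                outfile.write(read)


# Hypothetical usage:
# filter_bam("input.bam", "filtered.bam", min_score=40)
```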

pip upgrade bug

I am trying to use pip to upgrade pysam, but I am running into this error:

cc1: error: unrecognized command line option "-Wno-error=declaration-after-statement"

Is there anything you can do about that, or should we download the source and use setup.py instead?

Thanks

Under Python 3, .seq and .qual are no longer strings

I would expect .seq and .qual to remain as strings under Python 3 (i.e. unicode strings, not bytes). This is important for writing Python code which runs under both Python 2 and 3 without modification.

$ python2.6
Python 2.6.8 (unknown, Mar  9 2014, 22:16:00) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
>>> pysam.__version__
'0.8.0'
>>> read = next(pysam.Samfile("ex8.sam", "r"))
>>> read
<pysam.csamfile.AlignedRead object at 0x100e52170>
>>> read.seq
'CATGAAGAACCGCTGGGTATGGAGCACACCTCACCTGATGGACAGTTGATTATGCTCACCTTAACGCTAATTGAGAGCAGCACAAGAGGACTGGAAACTAGAATTTACTCCTCATCTCCGAAGATGTGAATATTCTAAATTCAGCTTGCCTCTTGCTTC'
>>> type(read.seq)
<type 'str'>
>>> read.qual
'IID7757111/=;?///:D>777;EEGAAAEEIHHIIIIIIIIIIIIIIBBBIIIIH==<<<DDGEEE;<<<A><<<DEDDA>>>D?1112544556::03---//25.22=;DD?;;;>BDDDEEEGGGA<888<BAA888<GGGGGEB?9::DD551'
>>> type(read.qual)
<type 'str'>
>>> quit()

Compared to:

$ python3.3
Python 3.3.3rc1 (default, Nov  4 2013, 14:57:57) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pysam
>>> pysam.__version__
'0.8.0'
>>> read = next(pysam.Samfile("ex8.sam", "r"))
>>> read
<pysam.csamfile.AlignedRead object at 0x1039754b0>
>>> read.seq
b'CATGAAGAACCGCTGGGTATGGAGCACACCTCACCTGATGGACAGTTGATTATGCTCACCTTAACGCTAATTGAGAGCAGCACAAGAGGACTGGAAACTAGAATTTACTCCTCATCTCCGAAGATGTGAATATTCTAAATTCAGCTTGCCTCTTGCTTC'
>>> type(read.seq)
<class 'bytes'>
>>> read.qual
b'IID7757111/=;?///:D>777;EEGAAAEEIHHIIIIIIIIIIIIIIBBBIIIIH==<<<DDGEEE;<<<A><<<DEDDA>>>D?1112544556::03---//25.22=;DD?;;;>BDDDEEEGGGA<888<BAA888<GGGGGEB?9::DD551'
>>> type(read.qual)
<class 'bytes'>
>>> quit()

Note that the SAM vs BAM behaviour is consistent :)
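
Until the behaviour changes, calling code on Python 3 could normalise the values itself. The helper below is a hypothetical stopgap and not part of pysam; it assumes sequence and quality strings are plain ASCII.

```python
def to_text(value, encoding="ascii"):
    """Return `value` as a text string, decoding it if it is a bytes object."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# On Python 3 both calls give the same text string.
print(to_text(b"CATGAAGAACC"))
print(to_text("CATGAAGAACC"))
```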

pysam.Tabixfile.fetch with start and/or end broken on Python 3

With pysam 0.8.0 it seems all string inputs are required to be bytestrings. (I think this is not a good decision, but let's leave that discussion to #29.)

However, even if I provide a bytestring chromosome name to Tabixfile.fetch, it is formatted to a unicode region string together with start and/or end position(s).

The following session on Python 3.4 illustrates this. Note that it works without position arguments:

>>> import pysam
>>> tabix = pysam.Tabixfile('vcf/test/tb.vcf.gz')
>>> tabix.fetch('20')
TypeError: expected bytes, str found
>>> tabix.fetch(b'20')
<pysam.ctabix.TabixIterator at 0x7f21fa607fc0>
>>> tabix.fetch(b'20', 500, 5000)
TypeError: expected bytes, str found

The culprit is here: https://github.com/pysam-developers/pysam/blob/master/pysam/ctabix.pyx#L326

Changing the region formatting to yield a bytestring would be an easy fix. But (briefly coming back to this discussion), I'd much rather see the move to bytestrings reverted as soon as possible and a strategy as suggested by @gotgenes implemented. The unicode/bytestring issues are certainly not handled properly in PyVCF, but I think that would be the only hope for fixing things in a sane way.
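
A sketch of the "easy fix" mentioned above: build the region as text and encode it once at the end, accepting either a bytestring or a text reference name. The function is hypothetical and is not the code in ctabix.pyx. It assumes that fetch takes 0-based, half-open start/end coordinates while tabix region strings are 1-based and inclusive, hence the + 1 on the start.

```python
def format_region(reference, start=None, end=None):
    """Build a tabix-style region string and return it as bytes.

    `reference` may be bytes or str; start/end are 0-based, half-open.
    """
    if isinstance(reference, bytes):
        reference = reference.decode("ascii")
    if start is None and end is None:
        region = reference
    elif end is None:
        region = "%s:%i" % (reference, start + 1)
    else:
        region = "%s:%i-%i" % (reference, (start or 0) + 1, end)
    return region.encode("ascii")


print(format_region(b"20", 500, 5000))
print(format_region("20"))
```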

Install Error

Hello,

today I tried to install pysam into my new Python3.4 environment with

pip install --user pysam

It failed with the following error log:

Downloading/unpacking pysam
  Downloading pysam-0.7.7.tar.gz (1.5MB): 1.5MB downloaded
  Running setup.py (path:/tmp/pip_build_dominik/pysam/setup.py) egg_info for package pysam

    warning: no files found matching 'distribute_setup.py'
    warning: no files found matching 'pysam/csamtools.c'
    warning: no files found matching 'pysam/ctabix.c'
    warning: no files found matching 'pysam/TabProxies.c'
    warning: no files found matching 'pysam/cvcf.c'
Requirement already satisfied (use --upgrade to upgrade): cython>=0.17 in /usr/lib/python3.4/site-packages (from pysam)
Installing collected packages: pysam
  Running setup.py install for pysam
    Fixing build/lib.linux-x86_64-3.4/pysam/__init__.py build/lib.linux-x86_64-3.4/pysam/namedtuple.py build/lib.linux-x86_64-3.4/pysam/Pileup.py build/lib.linux-x86_64-3.4/pysam/version.py build/lib.linux-x86_64-3.4/pysam/include/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/win32/__init__.py build/lib.linux-x86_64-3.4/pysam/include/tabix/__init__.py
    Skipping implicit fixer: buffer
    Skipping implicit fixer: idioms
    Skipping implicit fixer: set_literal
    Skipping implicit fixer: ws_comma
    Fixing build/lib.linux-x86_64-3.4/pysam/__init__.py build/lib.linux-x86_64-3.4/pysam/namedtuple.py build/lib.linux-x86_64-3.4/pysam/Pileup.py build/lib.linux-x86_64-3.4/pysam/version.py build/lib.linux-x86_64-3.4/pysam/include/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/win32/__init__.py build/lib.linux-x86_64-3.4/pysam/include/tabix/__init__.py
    Skipping implicit fixer: buffer
    Skipping implicit fixer: idioms
    Skipping implicit fixer: set_literal
    Skipping implicit fixer: ws_comma
    cythoning pysam/csamtools.pyx to pysam/csamtools.c
    building 'pysam.csamtools' extension
    gcc -pthread -Wno-unused-result -Werror=declaration-after-statement -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/usr/include/python3.4m -c pysam/csamtools.c -o build/temp.linux-x86_64-3.4/pysam/csamtools.o
    In file included from pysam/csamtools.c:351:0:
    samtools/bam.h:383:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      void *bam_strmap_init();
      ^
    samtools/bam.h:398:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      bam_header_t *bam_header_init();
      ^
    In file included from pysam/csamtools.c:354:0:
    pysam/pysam_util.h:22:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]
     void pysam_unset_stderr();
     ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_20fetch':
    pysam/csamtools.c:13539:7: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]
           __pyx_t_5 = __Pyx_PyInt_From_int(bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, ((void *)__pyx_v_callback), __pyx_f_5pysam_9csamtools_fetch_callback)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1071; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
           ^
    In file included from pysam/csamtools.c:351:0:
    samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'
      int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);
          ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_22mate':
    pysam/csamtools.c:14008:3: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]
       bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_read->_delegate->core.mtid, __pyx_v_read->_delegate->core.mpos, (__pyx_v_read->_delegate->core.mpos + 1), ((void *)(&__pyx_v_mate_data)), __pyx_f_5pysam_9csamtools_mate_callback);
       ^
    In file included from pysam/csamtools.c:351:0:
    samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'
      int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);
          ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_24count':
    pysam/csamtools.c:14525:5: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]
         bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, ((void *)(&__pyx_v_counter)), __pyx_f_5pysam_9csamtools_count_callback);
         ^
    In file included from pysam/csamtools.c:351:0:
    samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'
      int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);
          ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_26pileup':
    pysam/csamtools.c:14979:7: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]
           bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, __pyx_v_buf, __pyx_f_5pysam_9csamtools_pileup_fetch_callback);
           ^
    In file included from pysam/csamtools.c:351:0:
    samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'
      int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);
          ^
    pysam/csamtools.c: In function '__pyx_f_5pysam_9csamtools___advance_snpcalls':
    pysam/csamtools.c:22036:7: warning: implicit declaration of function 'bam_prob_realn' [-Wimplicit-function-declaration]
           bam_prob_realn(__pyx_v_b, __pyx_v_d->seq);
           ^
    pysam/csamtools.c:22064:7: warning: implicit declaration of function 'bam_cap_mapQ' [-Wimplicit-function-declaration]
           __pyx_v_q = bam_cap_mapQ(__pyx_v_b, __pyx_v_d->seq, __pyx_v_capQ_thres);
           ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_20IteratorColumnRegion_2__next__':
    pysam/csamtools.c:23733:5: warning: passing argument 1 of '__pyx_f_5pysam_9csamtools_makePileupProxy' from incompatible pointer type [enabled by default]
         __pyx_t_2 = __pyx_f_5pysam_9csamtools_makePileupProxy((&__pyx_v_self->__pyx_base.plp), __pyx_v_self->__pyx_base.tid, __pyx_v_self->__pyx_base.pos, __pyx_v_self->__pyx_base.n_plp); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2118; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
         ^
    pysam/csamtools.c:4177:18: note: expected 'struct bam_pileup1_t **' but argument is of type 'const struct bam_pileup1_t **'
     static PyObject *__pyx_f_5pysam_9csamtools_makePileupProxy(bam_pileup1_t **__pyx_v_plp, int __pyx_v_tid, int __pyx_v_pos, int __pyx_v_n) {
                      ^
    pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_21IteratorColumnAllRefs_2__next__':
    pysam/csamtools.c:24006:7: warning: passing argument 1 of '__pyx_f_5pysam_9csamtools_makePileupProxy' from incompatible pointer type [enabled by default]
           __pyx_t_2 = __pyx_f_5pysam_9csamtools_makePileupProxy((&__pyx_v_self->__pyx_base.plp), __pyx_v_self->__pyx_base.tid, __pyx_v_self->__pyx_base.pos, __pyx_v_self->__pyx_base.n_plp); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2149; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
           ^
    pysam/csamtools.c:4177:18: note: expected 'struct bam_pileup1_t **' but argument is of type 'const struct bam_pileup1_t **'
     static PyObject *__pyx_f_5pysam_9csamtools_makePileupProxy(bam_pileup1_t **__pyx_v_plp, int __pyx_v_tid, int __pyx_v_pos, int __pyx_v_n) {
                      ^
    gcc -pthread -Wno-unused-result -Werror=declaration-after-statement -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/usr/include/python3.4m -c pysam/pysam_util.c -o build/temp.linux-x86_64-3.4/pysam/pysam_util.o
    In file included from pysam/pysam_util.c:3:0:
    samtools/bam.h:383:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      void *bam_strmap_init();
      ^
    samtools/bam.h:398:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      bam_header_t *bam_header_init();
      ^
    In file included from pysam/pysam_util.c:6:0:
    samtools/bam_endian.h:6:19: warning: function declaration isn't a prototype [-Wstrict-prototypes]
     static inline int bam_is_big_endian()
                       ^
    In file included from pysam/pysam_util.c:8:0:
    pysam/pysam_util.h:22:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]
     void pysam_unset_stderr();
     ^
    pysam/pysam_util.c:30:6: warning: function declaration isn't a prototype [-Wstrict-prototypes]
     void pysam_unset_stderr()
          ^
    In file included from pysam/pysam_util.c:4:0:
    samtools/khash.h:168:23: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      SCOPE kh_##name##_t *kh_init_##name() {        \
                           ^
    samtools/khash.h:307:2: note: in expansion of macro 'KHASH_INIT2'
      KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
      ^
    samtools/khash.h:495:2: note: in expansion of macro 'KHASH_INIT'
      KHASH_INIT(name, khint32_t, khval_t, 1, kh_int_hash_func, kh_int_hash_equal)
      ^
    pysam/pysam_util.c:66:1: note: in expansion of macro 'KHASH_MAP_INIT_INT'
     KHASH_MAP_INIT_INT(i, bam_binlist_t);
     ^
    samtools/khash.h:168:23: warning: function declaration isn't a prototype [-Wstrict-prototypes]
      SCOPE kh_##name##_t *kh_init_##name() {        \
                           ^
    samtools/khash.h:307:2: note: in expansion of macro 'KHASH_INIT2'
      KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)
      ^
    samtools/khash.h:526:2: note: in expansion of macro 'KHASH_INIT'
      KHASH_INIT(name, kh_cstr_t, khval_t, 1, kh_str_hash_func, kh_str_hash_equal)
      ^
    pysam/pysam_util.c:67:1: note: in expansion of macro 'KHASH_MAP_INIT_STR'
     KHASH_MAP_INIT_STR(s, int)
     ^
    pysam/pysam_util.c:99:19: warning: function declaration isn't a prototype [-Wstrict-prototypes]
     static mempool_t *mp_init()
                       ^
    pysam/pysam_util.c: In function 'pysam_pileup_next':
    pysam/pysam_util.c:196:8: warning: assignment discards 'const' qualifier from pointer target type [enabled by default]
       *plp = bam_plp_next(buf->iter, tid, pos, n_plp);
            ^
    pysam/pysam_util.c: In function 'pysam_dispatch':
    pysam/pysam_util.c:340:3: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
       int retval = 0;
       ^
    In file included from pysam/pysam_util.h:4:0,
                     from pysam/pysam_util.c:8:
    pysam/pysam_util.c: At top level:
    samtools/kseq.h:152:16: warning: 'kseq_init' defined but not used [-Wunused-function]
      SCOPE kseq_t *kseq_init(type_t fd)         \
                    ^
    samtools/kseq.h:223:2: note: in expansion of macro '__KSEQ_BASIC'
      __KSEQ_BASIC(SCOPE, type_t)     \
      ^
    samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'
     #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)
                                       ^
    pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'
     KSEQ_INIT(gzFile, gzread)
     ^
    samtools/kseq.h:158:13: warning: 'kseq_destroy' defined but not used [-Wunused-function]
      SCOPE void kseq_destroy(kseq_t *ks)         \
                 ^
    samtools/kseq.h:223:2: note: in expansion of macro '__KSEQ_BASIC'
      __KSEQ_BASIC(SCOPE, type_t)     \
      ^
    samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'
     #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)
                                       ^
    pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'
     KSEQ_INIT(gzFile, gzread)
     ^
    samtools/kseq.h:172:12: warning: 'kseq_read' defined but not used [-Wunused-function]
      SCOPE int kseq_read(kseq_t *seq) \
                ^
    samtools/kseq.h:224:2: note: in expansion of macro '__KSEQ_READ'
      __KSEQ_READ(SCOPE)
      ^
    samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'
     #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)
                                       ^
    pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'
     KSEQ_INIT(gzFile, gzread)
     ^
    pysam/pysam_util.c:99:19: warning: 'mp_init' defined but not used [-Wunused-function]
     static mempool_t *mp_init()
                       ^
    pysam/pysam_util.c:105:13: warning: 'mp_destroy' defined but not used [-Wunused-function]
     static void mp_destroy(mempool_t *mp)
                 ^
    cc1: some warnings being treated as errors
    warning: pysam/csamtools.pyx:316:55: Unreachable code
    warning: pysam/csamtools.pyx:322:32: Unreachable code
    warning: pysam/csamtools.pyx:331:32: Unreachable code
    warning: pysam/csamtools.pyx:2165:18: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
    warning: pysam/csamtools.pyx:2188:18: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
    warning: pysam/csamtools.pyx:2327:20: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
    warning: pysam/csamtools.pyx:2327:24: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
    error: command 'gcc' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_dominik/pysam/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-ktckw0ld-record/install-record.txt --single-version-externally-managed --compile --user:
    running install

running build

running build_py

creating build

creating build/lib.linux-x86_64-3.4

creating build/lib.linux-x86_64-3.4/pysam

copying pysam/__init__.py -> build/lib.linux-x86_64-3.4/pysam

copying pysam/namedtuple.py -> build/lib.linux-x86_64-3.4/pysam

copying pysam/Pileup.py -> build/lib.linux-x86_64-3.4/pysam

copying pysam/version.py -> build/lib.linux-x86_64-3.4/pysam

creating build/lib.linux-x86_64-3.4/pysam/include

copying pysam/include/__init__.py -> build/lib.linux-x86_64-3.4/pysam/include

creating build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/__init__.py -> build/lib.linux-x86_64-3.4/pysam/include/samtools

creating build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools

copying samtools/bcftools/__init__.py -> build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools

creating build/lib.linux-x86_64-3.4/pysam/include/samtools/win32

copying samtools/win32/__init__.py -> build/lib.linux-x86_64-3.4/pysam/include/samtools/win32

creating build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/__init__.py -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying pysam/ctabix.pxd -> build/lib.linux-x86_64-3.4/pysam

copying pysam/csamtools.pxd -> build/lib.linux-x86_64-3.4/pysam

copying pysam/cvcf.pxd -> build/lib.linux-x86_64-3.4/pysam

copying pysam/TabProxies.pxd -> build/lib.linux-x86_64-3.4/pysam

copying pysam/tabix_util.h -> build/lib.linux-x86_64-3.4/pysam

copying pysam/pysam_util.h -> build/lib.linux-x86_64-3.4/pysam

copying samtools/kstring.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/ksort.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/kaln.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bam_endian.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/faidx.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/knetfile.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/sam.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bam_tview.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bgzf.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/pysam.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/sample.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/kseq.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/khash.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/errmod.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bam2bcf.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/kprobaln.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/sam_header.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/razf.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/klist.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bam.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools

copying samtools/bcftools/bcf.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools

copying samtools/bcftools/kmin.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools

copying samtools/bcftools/prob1.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools

copying samtools/win32/xcurses.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/win32

copying samtools/win32/zlib.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/win32

copying samtools/win32/zconf.h -> build/lib.linux-x86_64-3.4/pysam/include/samtools/win32

copying tabix/kstring.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/ksort.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/bam_endian.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/knetfile.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/bgzf.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/pysam.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/kseq.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/tabix.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

copying tabix/khash.h -> build/lib.linux-x86_64-3.4/pysam/include/tabix

Fixing build/lib.linux-x86_64-3.4/pysam/__init__.py build/lib.linux-x86_64-3.4/pysam/namedtuple.py build/lib.linux-x86_64-3.4/pysam/Pileup.py build/lib.linux-x86_64-3.4/pysam/version.py build/lib.linux-x86_64-3.4/pysam/include/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/win32/__init__.py build/lib.linux-x86_64-3.4/pysam/include/tabix/__init__.py

Skipping implicit fixer: buffer

Skipping implicit fixer: idioms

Skipping implicit fixer: set_literal

Skipping implicit fixer: ws_comma

Fixing build/lib.linux-x86_64-3.4/pysam/__init__.py build/lib.linux-x86_64-3.4/pysam/namedtuple.py build/lib.linux-x86_64-3.4/pysam/Pileup.py build/lib.linux-x86_64-3.4/pysam/version.py build/lib.linux-x86_64-3.4/pysam/include/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/bcftools/__init__.py build/lib.linux-x86_64-3.4/pysam/include/samtools/win32/__init__.py build/lib.linux-x86_64-3.4/pysam/include/tabix/__init__.py

Skipping implicit fixer: buffer

Skipping implicit fixer: idioms

Skipping implicit fixer: set_literal

Skipping implicit fixer: ws_comma

running build_ext

cythoning pysam/csamtools.pyx to pysam/csamtools.c

building 'pysam.csamtools' extension

creating build/temp.linux-x86_64-3.4

creating build/temp.linux-x86_64-3.4/pysam

creating build/temp.linux-x86_64-3.4/samtools

creating build/temp.linux-x86_64-3.4/samtools/bcftools

creating build/temp.linux-x86_64-3.4/samtools/misc

gcc -pthread -Wno-unused-result -Werror=declaration-after-statement -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/usr/include/python3.4m -c pysam/csamtools.c -o build/temp.linux-x86_64-3.4/pysam/csamtools.o

In file included from pysam/csamtools.c:351:0:

samtools/bam.h:383:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  void *bam_strmap_init();

  ^

samtools/bam.h:398:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  bam_header_t *bam_header_init();

  ^

In file included from pysam/csamtools.c:354:0:

pysam/pysam_util.h:22:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]

 void pysam_unset_stderr();

 ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_20fetch':

pysam/csamtools.c:13539:7: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]

       __pyx_t_5 = __Pyx_PyInt_From_int(bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, ((void *)__pyx_v_callback), __pyx_f_5pysam_9csamtools_fetch_callback)); if (unlikely(!__pyx_t_5)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1071; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

       ^

In file included from pysam/csamtools.c:351:0:

samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'

  int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);

      ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_22mate':

pysam/csamtools.c:14008:3: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]

   bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_read->_delegate->core.mtid, __pyx_v_read->_delegate->core.mpos, (__pyx_v_read->_delegate->core.mpos + 1), ((void *)(&__pyx_v_mate_data)), __pyx_f_5pysam_9csamtools_mate_callback);

   ^

In file included from pysam/csamtools.c:351:0:

samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'

  int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);

      ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_24count':

pysam/csamtools.c:14525:5: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]

     bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, ((void *)(&__pyx_v_counter)), __pyx_f_5pysam_9csamtools_count_callback);

     ^

In file included from pysam/csamtools.c:351:0:

samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'

  int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);

      ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_7Samfile_26pileup':

pysam/csamtools.c:14979:7: warning: passing argument 7 of 'bam_fetch' from incompatible pointer type [enabled by default]

       bam_fetch(__pyx_v_self->samfile->x.bam, __pyx_v_self->index, __pyx_v_rtid, __pyx_v_rstart, __pyx_v_rend, __pyx_v_buf, __pyx_f_5pysam_9csamtools_pileup_fetch_callback);

       ^

In file included from pysam/csamtools.c:351:0:

samtools/bam.h:644:6: note: expected 'bam_fetch_f' but argument is of type 'int (*)(struct bam1_t *, void *)'

  int bam_fetch(bamFile fp, const bam_index_t *idx, int tid, int beg, int end, void *data, bam_fetch_f func);

      ^

pysam/csamtools.c: In function '__pyx_f_5pysam_9csamtools___advance_snpcalls':

pysam/csamtools.c:22036:7: warning: implicit declaration of function 'bam_prob_realn' [-Wimplicit-function-declaration]

       bam_prob_realn(__pyx_v_b, __pyx_v_d->seq);

       ^

pysam/csamtools.c:22064:7: warning: implicit declaration of function 'bam_cap_mapQ' [-Wimplicit-function-declaration]

       __pyx_v_q = bam_cap_mapQ(__pyx_v_b, __pyx_v_d->seq, __pyx_v_capQ_thres);

       ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_20IteratorColumnRegion_2__next__':

pysam/csamtools.c:23733:5: warning: passing argument 1 of '__pyx_f_5pysam_9csamtools_makePileupProxy' from incompatible pointer type [enabled by default]

     __pyx_t_2 = __pyx_f_5pysam_9csamtools_makePileupProxy((&__pyx_v_self->__pyx_base.plp), __pyx_v_self->__pyx_base.tid, __pyx_v_self->__pyx_base.pos, __pyx_v_self->__pyx_base.n_plp); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2118; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

     ^

pysam/csamtools.c:4177:18: note: expected 'struct bam_pileup1_t **' but argument is of type 'const struct bam_pileup1_t **'

 static PyObject *__pyx_f_5pysam_9csamtools_makePileupProxy(bam_pileup1_t **__pyx_v_plp, int __pyx_v_tid, int __pyx_v_pos, int __pyx_v_n) {

                  ^

pysam/csamtools.c: In function '__pyx_pf_5pysam_9csamtools_21IteratorColumnAllRefs_2__next__':

pysam/csamtools.c:24006:7: warning: passing argument 1 of '__pyx_f_5pysam_9csamtools_makePileupProxy' from incompatible pointer type [enabled by default]

       __pyx_t_2 = __pyx_f_5pysam_9csamtools_makePileupProxy((&__pyx_v_self->__pyx_base.plp), __pyx_v_self->__pyx_base.tid, __pyx_v_self->__pyx_base.pos, __pyx_v_self->__pyx_base.n_plp); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2149; __pyx_clineno = __LINE__; goto __pyx_L1_error;}

       ^

pysam/csamtools.c:4177:18: note: expected 'struct bam_pileup1_t **' but argument is of type 'const struct bam_pileup1_t **'

 static PyObject *__pyx_f_5pysam_9csamtools_makePileupProxy(bam_pileup1_t **__pyx_v_plp, int __pyx_v_tid, int __pyx_v_pos, int __pyx_v_n) {

                  ^

gcc -pthread -Wno-unused-result -Werror=declaration-after-statement -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/usr/include/python3.4m -c pysam/pysam_util.c -o build/temp.linux-x86_64-3.4/pysam/pysam_util.o

In file included from pysam/pysam_util.c:3:0:

samtools/bam.h:383:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  void *bam_strmap_init();

  ^

samtools/bam.h:398:2: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  bam_header_t *bam_header_init();

  ^

In file included from pysam/pysam_util.c:6:0:

samtools/bam_endian.h:6:19: warning: function declaration isn't a prototype [-Wstrict-prototypes]

 static inline int bam_is_big_endian()

                   ^

In file included from pysam/pysam_util.c:8:0:

pysam/pysam_util.h:22:1: warning: function declaration isn't a prototype [-Wstrict-prototypes]

 void pysam_unset_stderr();

 ^

pysam/pysam_util.c:30:6: warning: function declaration isn't a prototype [-Wstrict-prototypes]

 void pysam_unset_stderr()

      ^

In file included from pysam/pysam_util.c:4:0:

samtools/khash.h:168:23: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  SCOPE kh_##name##_t *kh_init_##name() {        \

                       ^

samtools/khash.h:307:2: note: in expansion of macro 'KHASH_INIT2'

  KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)

  ^

samtools/khash.h:495:2: note: in expansion of macro 'KHASH_INIT'

  KHASH_INIT(name, khint32_t, khval_t, 1, kh_int_hash_func, kh_int_hash_equal)

  ^

pysam/pysam_util.c:66:1: note: in expansion of macro 'KHASH_MAP_INIT_INT'

 KHASH_MAP_INIT_INT(i, bam_binlist_t);

 ^

samtools/khash.h:168:23: warning: function declaration isn't a prototype [-Wstrict-prototypes]

  SCOPE kh_##name##_t *kh_init_##name() {        \

                       ^

samtools/khash.h:307:2: note: in expansion of macro 'KHASH_INIT2'

  KHASH_INIT2(name, static inline, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)

  ^

samtools/khash.h:526:2: note: in expansion of macro 'KHASH_INIT'

  KHASH_INIT(name, kh_cstr_t, khval_t, 1, kh_str_hash_func, kh_str_hash_equal)

  ^

pysam/pysam_util.c:67:1: note: in expansion of macro 'KHASH_MAP_INIT_STR'

 KHASH_MAP_INIT_STR(s, int)

 ^

pysam/pysam_util.c:99:19: warning: function declaration isn't a prototype [-Wstrict-prototypes]

 static mempool_t *mp_init()

                   ^

pysam/pysam_util.c: In function 'pysam_pileup_next':

pysam/pysam_util.c:196:8: warning: assignment discards 'const' qualifier from pointer target type [enabled by default]

   *plp = bam_plp_next(buf->iter, tid, pos, n_plp);

        ^

pysam/pysam_util.c: In function 'pysam_dispatch':

pysam/pysam_util.c:340:3: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

   int retval = 0;

   ^

In file included from pysam/pysam_util.h:4:0,

                 from pysam/pysam_util.c:8:

pysam/pysam_util.c: At top level:

samtools/kseq.h:152:16: warning: 'kseq_init' defined but not used [-Wunused-function]

  SCOPE kseq_t *kseq_init(type_t fd)         \

                ^

samtools/kseq.h:223:2: note: in expansion of macro '__KSEQ_BASIC'

  __KSEQ_BASIC(SCOPE, type_t)     \

  ^

samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'

 #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)

                                   ^

pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'

 KSEQ_INIT(gzFile, gzread)

 ^

samtools/kseq.h:158:13: warning: 'kseq_destroy' defined but not used [-Wunused-function]

  SCOPE void kseq_destroy(kseq_t *ks)         \

             ^

samtools/kseq.h:223:2: note: in expansion of macro '__KSEQ_BASIC'

  __KSEQ_BASIC(SCOPE, type_t)     \

  ^

samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'

 #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)

                                   ^

pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'

 KSEQ_INIT(gzFile, gzread)

 ^

samtools/kseq.h:172:12: warning: 'kseq_read' defined but not used [-Wunused-function]

  SCOPE int kseq_read(kseq_t *seq) \

            ^

samtools/kseq.h:224:2: note: in expansion of macro '__KSEQ_READ'

  __KSEQ_READ(SCOPE)

  ^

samtools/kseq.h:226:35: note: in expansion of macro 'KSEQ_INIT2'

 #define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)

                                   ^

pysam/pysam_util.h:8:1: note: in expansion of macro 'KSEQ_INIT'

 KSEQ_INIT(gzFile, gzread)

 ^

pysam/pysam_util.c:99:19: warning: 'mp_init' defined but not used [-Wunused-function]

 static mempool_t *mp_init()

                   ^

pysam/pysam_util.c:105:13: warning: 'mp_destroy' defined but not used [-Wunused-function]

 static void mp_destroy(mempool_t *mp)

             ^

cc1: some warnings being treated as errors

warning: pysam/csamtools.pyx:316:55: Unreachable code

warning: pysam/csamtools.pyx:322:32: Unreachable code

warning: pysam/csamtools.pyx:331:32: Unreachable code

warning: pysam/csamtools.pyx:2165:18: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

warning: pysam/csamtools.pyx:2188:18: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

warning: pysam/csamtools.pyx:2327:20: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

warning: pysam/csamtools.pyx:2327:24: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

error: command 'gcc' failed with exit status 1

----------------------------------------
Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_dominik/pysam/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-ktckw0ld-record/install-record.txt --single-version-externally-managed --compile --user failed with error code 1 in /tmp/pip_build_dominik/pysam
Storing debug log for failure in /home/dominik/.pip/pip.log

I think that line 141 of the output

pysam/pysam_util.c:340:3: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

and line 591

pysam/pysam_util.c:340:3: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]

might give an answer!

Cheers,
Dominik

Edit: I am sorry, I forgot to write that I use Arch Linux, Python 3.4 and gcc 4.8.2!

PileupRead() indel property and deletions

I'm observing instances where .indel = 0 and .is_del = 1, while the base upstream of the deletion is flagged with .indel = -1. Is this an error? Based on the documentation, I would expect deletion positions to have .indel = -1 and .is_del = 1.
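For readers comparing the two conventions, here is a small pure-Python model of the behaviour described above. It does not call pysam. It assumes (not confirmed in this report) that the fields mirror the samtools pileup structure, where `indel` is set on the aligned base just before an insertion or deletion and `is_del` marks the deleted positions themselves. The function name `pileup_flags` and the `(op, length)` CIGAR representation are hypothetical.

```python
def pileup_flags(cigar, ref_start=0):
    """Map each reference position covered by a read to (is_del, indel).

    `cigar` is a list of (op, length) pairs using only 'M', 'D' and 'I'.
    Assumed convention: `indel` is recorded on the base preceding the event
    (negative for deletions, positive for insertions); `is_del` is 1 on the
    deleted reference positions.
    """
    flags = {}
    pos = ref_start
    prev = None  # last reference position emitted so far
    for op, length in cigar:
        if op == "M":
            for _ in range(length):
                flags[pos] = (0, 0)
                prev = pos
                pos += 1
        elif op == "D":
            if prev is not None:
                # The deletion length goes on the preceding position.
                flags[prev] = (flags[prev][0], -length)
            for _ in range(length):
                flags[pos] = (1, 0)
                prev = pos
                pos += 1
        elif op == "I":
            if prev is not None:
                # Insertions consume no reference positions.
                flags[prev] = (flags[prev][0], length)
        else:
            raise ValueError("unsupported CIGAR operation: %r" % (op,))
    return flags


# A read with a 1-base deletion after three matched bases.
flags = pileup_flags([("M", 3), ("D", 1), ("M", 2)], ref_start=100)
```

Under that assumed convention, position 102 carries `indel = -1` with `is_del = 0`, and position 103 carries `is_del = 1` with `indel = 0`, which matches the values reported here.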

Rename pysam.Fastqfile to pysam.Fastxfile (or something else)

kseq.h provides iteration over both FASTA and FASTQ files, or even files mixing the two, so users should know that they can use this class to iterate over any fastx file.

The test sequence file (test.seq) delivered with kseq.h (http://lh3lh3.users.sourceforge.net/kseq.shtml):

>1
acgtacgtacgtagc
>2 test
acgatcgatc
@3 test2
cgctagcatagc
cgatatgactta
+
78wo82usd980
d88fau

238ud8

Output:

In [1]: import pysam

In [2]: pysam.__version__
Out[2]: '0.8.1'

In [3]: for record in pysam.FastqFile("test.seq"):
    print record.name, record.sequence, record.comment, record.quality
   ...:     
1 acgtacgtacgtagc None None
2 acgatcgatc test None
3 cgctagcatagccgatatgactta test2 78wo82usd980d88fau238ud8

errors about the installation of pysam

I downloaded pysam-master.zip and ran the following commands: unzip pysam-master.zip; cd pysam-master; python setup.py build, but the build failed. The terminal printed the following:
Traceback (most recent call last):
  File "setup.py", line 208, in <module>
    from ez_setup import use_setuptools
ImportError: No module named ez_setup
I thought this was caused by the missing ez_setup module, so I downloaded ez_setup-0.9 and ran "python ez_setup.py", which printed no apparent errors. I then tried to install pysam by running 'python setup.py build' again, but the same error occurred (ImportError: No module named ez_setup). What should I do to install pysam successfully?
Thanks!

python3 VCF connect to tabix

Hi,

I'm getting an error when trying to open a tabix-indexed VCF in python3. I tabixed an example VCF:

cd pysam/tests/vcf-examples
bgzip 10.vcf 
tabix -p vcf 10.vcf.gz

I tried to connect to the tabix file in python3:

python3
import pysam
tabix_filename = "10.vcf.gz"
vcf = pysam.VCF()
vcf.connect(tabix_filename)

I get an error message:

  File "cvcf.pyx", line 1022, in pysam.cvcf.VCF.connect (pysam/cvcf.c:23388)
  File "cvcf.pyx", line 893, in pysam.cvcf.VCF._parse_header (pysam/cvcf.c:20458)
TypeError: startswith first arg must be bytes or a tuple of bytes, not str

I'm using pysam 0.7.7 and python 3.3. Any help would be appreciated.

Thanks,
Jeremy
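The TypeError in the report above is the general Python 3 rule that `bytes.startswith` does not accept a `str` argument. The following sketch reproduces it without pysam; the header line and the helper name `is_meta_line` are made up for illustration, and it is only an assumption that the header parser is comparing bytes read from the file against a `str` prefix.

```python
def is_meta_line(line, encoding="ascii"):
    """Return True if a VCF line starts with '##', for bytes or str input."""
    if isinstance(line, bytes):
        # Decode first so the comparison is str against str.
        line = line.decode(encoding)
    return line.startswith("##")


raw_line = b"##fileformat=VCFv4.1"  # illustrative header line, read as bytes

try:
    raw_line.startswith("##")  # bytes compared against a str prefix
    error_message = None
except TypeError as exc:
    error_message = str(exc)

decoded_ok = is_meta_line(raw_line)
```

Decoding the bytes (or comparing against `b"##"`) avoids the error on Python 3 and still works for plain `str` input.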

Example in the pysam/doc/faq.rst fails with AttributeError

Hello good people.
I cannot thank you enough for spending your time to develop this amazing tool.

I tested the following code snippet that I found in your FAQ doc (currently lines 101-108 in pysam/doc/faq.rst). I am using python3.2:

i = pysam.Samfile(bam_fp, "rb" ).pileup( 'chr1', 1000, 1010)
p = i.next()
for pp in p.pileups:
    print( pp )
## -- End pasted text --
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/newb/sandbox/<ipython-input-4-1fa3345011d0> in <module>()
      1 i = pysam.Samfile(bam_fp, "rb" ).pileup( 'chr1', 1000, 1010)
----> 2 p = i.next()
      3 for pp in p.pileups:
      4     print( pp )
AttributeError: 'pysam.csamtools.IteratorColumnRegion' object has no attribute 'next'

I was disappointed to get this error, because I would like to call the next() method of an IteratorRowAll object, and I thought this FAQ snippet might help me figure out how to explicitly control the iteration over all the reads in my BAM file.

cheers,
Lee
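The AttributeError above is consistent with the Python 3 iterator protocol, where the method is named `__next__` and the `.next()` method of Python 2 no longer exists. The built-in `next()` works on Python 2.6+ and Python 3 alike. The sketch below uses a plain list iterator as a stand-in for the object returned by `Samfile.pileup(...)`; the helper name `take_first` is hypothetical.

```python
def take_first(iterator):
    """Advance an iterator by one step using the built-in next()."""
    return next(iterator)


# Stand-in for the pileup iterator; any Python 3 iterator behaves the same.
columns = iter(["column-1000", "column-1001"])

has_next_method = hasattr(columns, "next")        # Python 2 spelling
has_dunder_next = hasattr(columns, "__next__")    # Python 3 spelling
first = take_first(columns)
```

With the FAQ snippet, this would mean writing `p = next(i)` instead of `p = i.next()`.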

iterator bug in Pysam

Hi,

I ran into what looks like a bug related to iterators in Pysam. Calling fetch appears to throw off the original Samfile iterator. Example code:

import pysam

# test 1: correct result                                                                                                           
bam = pysam.Samfile("./pysam.bam", "rb")
other_reads = bam.fetch(reference="chrRibo", start=None, end=None)
n = 0
for r in bam:
    n += 1
print "First: Got %d reads" %(n)

# test 2: incorrect result                                                                                                         
bam = pysam.Samfile("./accepted_hits.bam", "rb")
other_reads = bam.fetch(reference="chrRibo", start=None, end=None)
# Now we iterate through fetched reads                                                                                             
for x in other_reads:
    pass
n = 0
for r in bam:
    n += 1
print "Second time: Got %d reads" %(n)

Example file is:

BAM: http://genes.mit.edu/burgelab/yarden/pysam.bam
BAM index: http://genes.mit.edu/burgelab/yarden/pysam.bam.bai

This gives:

$ python pysam.py
First: Got 2112633 reads
Second time: Got 62089 reads

It seems like iterating through the results of a separate fetch call shouldn't affect the original iterator. Even if it does, I am not sure how to interpret the number of reads left in the file.

Is this a bug? If so, how can it be fixed?

I am using pysam '0.8.1'.

Update: I believe setting multiple_iterators=True fixes this, but I found it quite counterintuitive. Maybe it would be easier to have multiple_iterators=True by default?

Thanks!
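The update above suggests that, without multiple_iterators=True, the fetch iterator and the plain file iterator share one underlying file position. Here is a rough analogy with `io.StringIO` instead of pysam; the function names and the four-line data are invented for illustration, and it is an assumption that this is what happens inside Samfile.

```python
import io

DATA = "read1\nread2\nread3\nread4\n"


def count_after_shared_fetch(data, n_fetched):
    """Count records seen by a full iteration after a 'fetch' on the same handle."""
    handle = io.StringIO(data)
    for _ in range(n_fetched):
        handle.readline()  # the 'fetch' advances the shared position
    return sum(1 for _ in handle)  # only the remainder is left


def count_after_independent_fetch(data, n_fetched):
    """Same, but the 'fetch' gets its own handle and its own position."""
    main_handle = io.StringIO(data)
    fetch_handle = io.StringIO(data)
    for _ in range(n_fetched):
        fetch_handle.readline()
    return sum(1 for _ in main_handle)


shared = count_after_shared_fetch(DATA, 3)
independent = count_after_independent_fetch(DATA, 3)
```

In the shared case the full iteration sees only what the earlier consumer left behind, which would explain a smaller read count in the second test.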

samtools cannot read files written with pysam

Hi there,

When I write a BAM file with pysam and try to read it with samtools afterwards, I get an error like this:

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "test.bam".

As an example I used this code: http://pysam.readthedocs.org/en/latest/usage.html#creating-sam-bam-files-from-scratch

I am not sure whether this is a pysam or a samtools problem, but it happens with samtools versions 0.1.16, 0.1.18 and 0.1.19. Pysam is at version 0.8.0.

Thanks

Skip test requiring network access when it is not available.

Dear Andreas,

For the Debian package of Pysam, we need to skip tests that require network access, since it is unavailable by design on some of our build farms. Would you like to provide such an offline facility in Pysam?

Here is how I would do it for one file (inspired by bits and pieces I saw on StackOverflow).

--- a/tests/pysam_test.py
+++ b/tests/pysam_test.py
@@ -7,6 +7,7 @@ and data files located there.

 import pysam
 import unittest
+import urllib2
 import os
 import shutil
 import sys
@@ -21,6 +22,12 @@ SAMTOOLS = "samtools"
 WORKDIR = "pysam_test_work"
 DATADIR = "pysam_data"

+def internet_on():
+    try:
+        response=urllib2.urlopen('http://ftp.1000genomes.ebi.ac.uk',timeout=1)
+        return True
+    except urllib2.URLError as err: pass
+    return False

 class BasicTestBAMFetch(unittest.TestCase):

@@ -973,6 +980,8 @@ class TestHeaderFromRefs(unittest.TestCase):

 class TestHeader1000Genomes(unittest.TestCase):
     '''see issue 110'''
+    if internet_on() == False:
+        self.skipTest('Internet access required for this test')
     # bamfile = "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase2b_alignment/data/NA07048/exome_alignment/NA07048.unmapped.ILLUMINA.bwa.CEU.exome.20120522_p2b.bam"
     bamfile = "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase3_EX_or_LC_only_alignment/data/HG00104/alignment/HG00104.chrom11.ILLUMINA.bwa.GBR.low_coverage.20130415.bam"

However, multiple files have tests requiring the network. Is there a better solution than cut-and-paste?
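One way to avoid cut-and-paste is a shared base class that performs the check once per test. This is a sketch, not the project's method: `NetworkTestCase` and `network_available` are hypothetical names, and a plain socket connection replaces the `urllib2` call from the patch. Note that `self.skipTest` has to be called from a method such as `setUp`, because `self` is not defined in a class body.

```python
import socket
import unittest


def network_available(host="ftp.1000genomes.ebi.ac.uk", port=80, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class NetworkTestCase(unittest.TestCase):
    """Base class: every test in a subclass is skipped when offline."""

    # Stored as a staticmethod so that subclasses can swap in another check.
    network_check = staticmethod(network_available)

    def setUp(self):
        if not self.network_check():
            self.skipTest("Internet access required for this test")
```

A test class that needs remote files would then derive from `NetworkTestCase` instead of `unittest.TestCase`, and any module in the test suite can import the same base class.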

Have a nice day,

Charles

pysam Samfile gives error reading bamfile

pysam Samfile gives the following error when trying to read a BAM file:

Traceback (most recent call last):
  File "sam_flanker.py", line 10, in <module>
    bam_file = pysam.Samfile(input_file, "rb")
  File "csamfile.pyx", line 314, in pysam.csamfile.Samfile.__cinit__ (pysam/csamfile.c:4820)
  File "csamfile.pyx", line 491, in pysam.csamfile.Samfile._open (pysam/csamfile.c:6730)
ValueError: file header is empty (mode='rb') - is it SAM/BAM format?

Dynamic linking to htslib ?

Dear Pysam developers,

now that HTSlib has been released by its authors as version 1.0, I wonder if Pysam could be linked dynamically to it instead of being built on an embedded code copy.

For Linux distributions like Debian, which place a strong emphasis on reducing code duplication in order to better propagate fixes for bugs and security issues, the possibility of linking dynamically to the HTSlib that we distribute would be very useful.

Have a nice day,

Charles Plessy
Debian Med packaging team
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

pysam.is_unmapped doesn't work

Hello,

When I try to filter a 5.3 GB coordinate sorted BAM file for unmapped reads, I end up with only 27 reads in the resulting file.

bam = pysam.Samfile("mybam.bam", "rb")

for i in bam.fetch():
    if i.is_unmapped:
        print(i)

or

for i in bam.fetch():
    if i.flag == 4:
        print(i)

doesn't work as expected.
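One thing worth ruling out in the second variant: FLAG is a bit field, so `flag == 4` only matches reads whose sole set bit is 0x4. Unmapped reads from paired-end data usually carry additional bits. A small sketch with made-up flag values (`flag_is_unmapped` is a hypothetical helper):

```python
BAM_FUNMAP = 0x4  # "segment unmapped" bit of the SAM FLAG field


def flag_is_unmapped(flag):
    """Test the unmapped bit instead of comparing the whole flag."""
    return bool(flag & BAM_FUNMAP)


# 77 and 141 are typical flags for an unmapped pair; 99 is a mapped mate.
flags = [4, 77, 141, 99, 0]
print([f for f in flags if f == 4])               # [4]
print([f for f in flags if flag_is_unmapped(f)])  # [4, 77, 141]
```

Separately, an index-based fetch() may not return reads that have no coordinates at all; the pysam documentation describes an until_eof option for fetch() that is intended for that case.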

I have a small (238 KB) test file that I could provide.

Sincerely,
Kemal

property AlignedRead.pos updates bin using old pos value

The set function for property 'pos' of AlignedRead recalculates the bin number for the record when the pos value is updated. However, the bin-number is calculated using the OLD value, resulting in a wrong bin number unless you set pos to the same value twice in a row:

    def __set__(self, pos):
        ## setting the cigar string also updates the "bin" attribute
        cdef bam1_t * src
        src = self._delegate
        if src.core.n_cigar:
            src.core.bin = bam_reg2bin( src.core.pos, bam_calend( &src.core, bam1_cigar(src)) )
        else:
            src.core.bin = bam_reg2bin( src.core.pos, src.core.pos + 1)
        self._delegate.core.pos = pos
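To show why the order matters, here is a pure-Python sketch of the binning function as given in the SAM specification (a port of the C `reg2bin`, which is assumed here to match what `bam_reg2bin` computes). The positions and read length are made-up values.

```python
def reg2bin(beg, end):
    """Bin number for the zero-based, half-open interval [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0


old_pos, new_pos, ref_len = 0, 20000, 10

# Bin computed before pos is assigned (what the setter above does).
stale_bin = reg2bin(old_pos, old_pos + ref_len)
# Bin computed from the new position (what the record should carry).
fresh_bin = reg2bin(new_pos, new_pos + ref_len)
print(stale_bin, fresh_bin)  # 4681 4682
```

In the setter, assigning `pos` before recomputing the bin would make the calculation use the new value.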

Cheers

pysam consumes too much memory

Hello,
I have been working with pysam and recently hit a very odd case. I have a set of 96 sorted BAM files with their BAI indexes. At one point in my code I try to get the number of uniquely mapped reads using
stats = pysam.flagstat(filename)
mapped_reads = int(stats[2].split()[0])

but it takes an incredibly long time and starts using the whole system memory (64 GB).

Is that normal, or is there some kind of problem with my data or program?

Thanks

pysam.index segfaults

Using pysam.index on a sorted bamfile will segfault:
Example code:

import pysam
pysam.index("file.bam", "file.bam.bai")

This fails with pysam 0.8.1 under Python 2.7, 3.2 and 3.4 (at least; I don't have more versions readily available to test), but succeeds under pysam 0.8.0, so this looks like a regression to me.
Sorry, I can't be of further assistance. I just can't seem to figure out where the error is coming from....

Example output from running the above code:

[1]    19235 abort (core dumped)  python3 pysam_index.py

Lack of stable name sort or incorrect logic

When I sort by query name using pysam and there are redundant read pair names in a file the sort mixes the order of the pairs, not reporting the correct order. Samtools sort does not have this problem. I am not sure if the sort is not stable or the sort logic is different than that of samtools.

AlignedRead.inferred_length is incorrect

Hello good people,

The documentation defines the inferred_length attribute of the AlignedRead class as:
"inferred read length from CIGAR string."
http://pysam.readthedocs.org/en/latest/api.html#pysam.AlignedRead.inferred_length

However, I found that this is not the case. In the example that I show below, the sum of the CIGAR operation lengths does not equal the number reported by inferred_length.

example_d = {'INFERRED_LENGTH': 24, 
             'QLEN': 24, 
             'ALEN': 25,
             'SEQ': b'GTCCATCTAAACTCCCCAATGCCT', 
             'OPTDICT': [('XT', 'U'), ('NM', 1), ('X0', 1), ('X1', 0), ('XM', 0), ('XO', 1), ('XG', 1), ('MD', '8^A16')], 
             'CIGARSTRING': '8M1D16M', 
             'CIGAR': [(0, 8), (2, 1), (0, 16)]}

#here, I will calculate the correct answer
correct_inferred_length = 0
for operation, length in example_d['CIGAR']:
    correct_inferred_length += length
print(correct_inferred_length)
#this prints 25

As you can see, the length inferred from the CIGAR field is 25. I guess I may be misinterpreting the purpose of the inferred_length field. Is it supposed to be the same as ALEN?

If the inferred_length field is supposed to be the length of the aligned region then I think the following code could be used. Note that insertion operations should be excluded from the calculation.

length_sum = 0
for operation, length in example_d['CIGAR']:
    if operation != 1:
        length_sum += length
    else:
        pass
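For comparison, the SAM specification distinguishes operations that consume the query (the read) from those that consume the reference. The sketch below computes both lengths for the example CIGAR; whether inferred_length is meant to be the first of these is an assumption, although it agrees with the reported values. The function names are hypothetical.

```python
# BAM CIGAR operation codes, per the SAM specification.
QUERY_OPS = {0, 1, 4, 7, 8}      # M, I, S, =, X consume read bases
REFERENCE_OPS = {0, 2, 3, 7, 8}  # M, D, N, =, X consume reference bases


def query_length(cigar):
    """Read length implied by the CIGAR (deletions are not counted)."""
    return sum(length for op, length in cigar if op in QUERY_OPS)


def reference_length(cigar):
    """Reference span implied by the CIGAR (insertions and clips are not counted)."""
    return sum(length for op, length in cigar if op in REFERENCE_OPS)


cigar = [(0, 8), (2, 1), (0, 16)]  # 8M1D16M
print(query_length(cigar), reference_length(cigar))  # 24 25
```

Under this reading, 24 matches INFERRED_LENGTH and QLEN, while 25 matches ALEN, because the 1D operation spans one reference base that is absent from the read.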

Thank you for your time.
Sincerely,
Lee

Please tag commits with releases

Subject says what's requested. If the pysam maintainers aren't familiar with tagging, the Git book offers a nice primer.

Always make it easy to identify which commit was used to generate which release. Your users will appreciate it, and in time, you, the maintainers, will appreciate it, too.

Cannot add a new tag

Hello,

When I try to add a new tag to an aligned read, it raises an error.
To add the tag I used:
Read.tags = Read.tags + [("RG",0)]
exactly as suggested in the manual. I'm getting this error message:

Read.tags = Read.tags + [("RG",0)]
File "csamtools.pyx", line 2840, in pysam.csamtools.AlignedRead.tags.set (pysam/csamtools.c:28104)
struct.error: byte format requires -128 <= number <= 127

What am I doing wrong?
Thanks.

Alex

Setting sequence to empty when tag is defined overwrites tag

From Jeff Hussmann:

Creating an empty AlignedRead, setting a tag value, setting the seq to '', then setting the qual to '' causes 0xff to overwrite the first character of the tag. If this isn't too niche of a bug, I am happy to implement a fix.

pysam flag type incorrect

Hello. I was running into issues using the flag of AlignedRead. htslib (in pysam/htslib_util.h) declares the flag as uint16_t while pysam (in pysam/calignmentfile.pxd) declares it as uint8_t. I ran into issues when the field wasn't large enough to hold the flag I was trying to assign to a read. Unless I'm missing something, the flag should be uint16_t.

Tweaking line 40 in calignmentfile.pxd is a partial fix (it's simple, but I can provide a patch if desired). Upon compiling, pysam/calignment.c:25755 also needs to be tweaked and I'm not sure where that comes from in the source code.

This occurred after pulling the most recent version from GitHub, with Python 2.7.5, on a Fedora machine.

Broken __str__ method under Python 3 if seq/qual is None

Sample script:

import sys
import pysam

print("Using Pysam version %s" % pysam.__version__)

for filename in sys.argv[1:]:
    if filename.endswith(".sam"):
        print("Reading SAM file %s" % filename)
        iterator = pysam.Samfile(filename, "r")
    elif filename.endswith(".bam"):
        print("Reading BAM file %s" % filename)
        iterator = pysam.Samfile(filename, "rb")
    else:
        print("Ignoring %s" % filename)
        continue
    count = 0
    for read in iterator:
        s = str(read)
        count += 1
    print("%i reads" % count)
print("Done")

Sample data was generated as per the script referenced in peterjc/biopython@7da8484 (test data from some experimental code); fetch it with curl (or wget under Linux):

$ curl -O https://raw.githubusercontent.com/peterjc/biopython/7da84849b9438034c72d645691fae1fca2e2ff9b/Tests/SamBam/bins.sam
$ curl -O https://raw.githubusercontent.com/peterjc/biopython/7da84849b9438034c72d645691fae1fca2e2ff9b/Tests/SamBam/bins.bam

Sample usage - works under Python 2.6 and 2.7 using pysam 0.8 and the current code on GitHub with this SAM file:

$ python2.6 pysam_bug.py bins.sam
Using Pysam version 0.8.0
Reading SAM file bins.sam
213075 reads
Done

The BAM version works too:

$ python2.6 pysam_bug.py bins.bam
Using Pysam version 0.8.0
Reading BAM file bins.bam
213075 reads
Done

However, under Python 3.3 we get this instead:

$ python3.3 pysam_bug.py bins.sam
Using Pysam version 0.8.0
Reading SAM file bins.sam
Traceback (most recent call last):
  File "pysam_bug.py", line 18, in <module>
    s = str(read)
  File "csamfile.pyx", line 2104, in pysam.csamfile.AlignedRead.__str__ (pysam/csamfile.c:22282)
AttributeError: 'NoneType' object has no attribute 'decode'

and:

$ python3.3 pysam_bug.py bins.bam
Using Pysam version 0.8.0
Reading BAM file bins.bam
Traceback (most recent call last):
  File "pysam_bug.py", line 18, in <module>
    s = str(read)
  File "csamfile.pyx", line 2104, in pysam.csamfile.AlignedRead.__str__ (pysam/csamfile.c:22282)
AttributeError: 'NoneType' object has no attribute 'decode'

From some debugging, pysam breaks if either SEQ or QUAL is None, patch to follow.

possible typo in calignmentfile.pyx

Hi,

Shouldn't this line refer to nsegments instead of nsegmentEs?:

self.nsegmentes))) +\

After fixing that line I still get the following error when printing any PileupColumn object, not sure if this is a general problem:

'pysam.calignmentfile.PileupRead' object has no attribute '_is_refskip'

pysam.Tabixfile.fetch() iterator does not function independently

Assume we have a pysam Tabixfile object and two iterators initialized as follows:

file = pysam.Tabixfile("file.bed")
iterator1 = file.fetch(parser=pysam.asTuple())
iterator2 = file.fetch(parser=pysam.asTuple())

If iterator1.next() is called, then iterator2 also advances to the next element, and vice versa.

Checking if a bam file is empty

Hello

Is there a way to use pysam to check whether a BAM file is empty? If so, what would be the best way of doing it?
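One generic approach is to try to pull a single record and stop. The sketch below works on any iterable; it assumes that an opened pysam file object can be iterated record by record, as the scripts elsewhere on this page do. `has_records` is a hypothetical helper, not a pysam function.

```python
def has_records(records):
    """Return True if the iterable yields at least one item.

    Only the first item is pulled, so a large file is not read in full.
    Note that this consumes one item from a one-shot iterator.
    """
    sentinel = object()
    return next(iter(records), sentinel) is not sentinel


print(has_records([]))          # False
print(has_records(["read_1"]))  # True
```

Whether a file that has a header but no reads should count as empty is a separate decision that this check does not make.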

Thanks

Rad

Uncompressed BAM output from pysam v0.8 is not readable by SAMTools v0.1.19

When using the following script:

#!/usr/bin/env python
import sys
import pysam
with pysam.Samfile(sys.argv[1]) as input_handle:
    with pysam.Samfile(sys.argv[2], "wbu", template=input_handle) as output_handle:
        for record in input_handle:
            output_handle.write(record)

and Pysam v0.7.8, both SAMTools 0.1.19 and 1.x are able to read the files. However, when using Pysam 0.8.0, only SAMTools v1.x is able to read the file, while SAMTools v0.1.19 produces the following output:

$ python example.py input.bam output.bam
$ samtools0.1.19 view output.bam
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
[main_samview] fail to read the header from "out.bam".

This is in contrast with uncompressed output from SAMTools v1.x, which is readable by SAMTools v0.1.19.

$ samtools1.1 view -bu input.bam > output.bam
$ samtools0.1.19 view output.bam
[OK]

Similarly, the uncompressed output from Pysam 0.8.0 is not readable by Pysam 0.7.x either.

Unable to install (gcc error)

Using Python 3.4.1 official full release
Using OSX 10.6.8 with pending software updates
Using current pysam source MASTER

end-lazar-010:pysam-master kaestnerlab$ python3 setup.py install
running install
running bdist_egg
running egg_info
writing dependency_links to pysam.egg-info/dependency_links.txt
writing top-level names to pysam.egg-info/top_level.txt
writing pysam.egg-info/PKG-INFO
writing requirements to pysam.egg-info/requires.txt
reading manifest file 'pysam.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'distribute_setup.py'
warning: no files found matching 'pysam/csamtools.c'
warning: no files found matching 'pysam/ctabix.c'
warning: no files found matching 'pysam/TabProxies.c'
warning: no files found matching 'pysam/cvcf.c'
warning: no files found matching 'tests/Makefile'
warning: no files found matching 'tests/ex1.fa'
warning: no files found matching 'tests/ex1.sam.gz'
warning: no files found matching 'tests/ex3.sam'
warning: no files found matching 'tests/ex4.sam'
warning: no files found matching 'tests/ex5.sam'
warning: no files found matching 'tests/ex6.sam'
warning: no files found matching 'tests/ex7.sam'
warning: no files found matching 'tests/ex8.sam'
warning: no files found matching 'tests/ex9_fail.bam'
warning: no files found matching 'tests/ex9_nofail.bam'
warning: no files found matching 'tests/ex10.sam'
warning: no files found matching 'tests/example.py'
warning: no files found matching 'tests/segfault_tests.py'
warning: no files found matching 'tests/example__.sam'
warning: no files found matching 'tests/example_btag.bam'
warning: no files found matching 'tests/tag_bug.bam'
warning: no files found matching 'tests/example.vcf40'
warning: no files found matching 'tests/example_empty_header.bam'
warning: no files found matching 'tests/test_unaligned.bam'
warning: no files found matching 'tests/issue100.bam'
warning: no files found matching 'tests/example.gtf.gz'
warning: no files found matching 'tests/example.gtf.gz.tbi'
warning: no files found matching 'tests/example.bed.gz'
warning: no files found matching 'tests/example.bed.gz.tbi'
warning: no files found matching 'tests/vcf-examples/_.vcf'
writing manifest file 'pysam.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.6-intel/egg
running install_lib
running build_py
running build_ext
building 'pysam.csamtools' extension
gcc-4.2 -fno-strict-aliasing -Werror=declaration-after-statement -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -arch i386 -arch x86_64 -isysroot /Developer/SDKs/MacOSX10.6.sdk -g -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE= -Isamtools -Ipysam -I/Library/Frameworks/Python.framework/Versions/3.4/include/python3.4m -c pysam/csamtools.c -o build/temp.macosx-10.6-intel-3.4/pysam/csamtools.o -Wno-error=declaration-after-statement
i686-apple-darwin10-gcc-4.2.1: pysam/csamtools.c: No such file or directory
i686-apple-darwin10-gcc-4.2.1: no input files
i686-apple-darwin10-gcc-4.2.1: pysam/csamtools.c: No such file or directory
i686-apple-darwin10-gcc-4.2.1: no input files
lipo: can't figure out the architecture type of: /var/folders/ZK/ZKFULcHKGyCo3EbzubUDak+++TY/-Tmp-//ccgxhFTh.out
error: command 'gcc-4.2' failed with exit status 1

Using pip3: (showing only the last line, displayed on terminal window in RED)
Command /Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4 -c "import setuptools, tokenize;file='/private/var/folders/ZK/ZKFULcHKGyCo3EbzubUDak+++TY/-Tmp-/pip_build_kaestnerlab/pysam/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /var/folders/ZK/ZKFULcHKGyCo3EbzubUDak+++TY/-Tmp-/pip-f5omggeu-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /private/var/folders/ZK/ZKFULcHKGyCo3EbzubUDak+++TY/-Tmp-/pip_build_kaestnerlab/pysam
Storing debug log for failure in /Users/kaestnerlab/.pip/pip.log

https support

pysam supports http and ftp but there is no https support. Is there an easy way to load files over https? Is something like this even on the roadmap?

pileupread.qpos seems to be broken in latest git.

As the title says.
Here is a test script:

#!/usr/bin/python3

import pysam


def BAMtest(bamfile_name):

    # Set bamfile 'settings'
    bamfile = pysam.Samfile(bamfile_name, 'rb')

    for refs in bamfile.references:
        print(refs)
        for pileupcolumn in bamfile.pileup(refs):
            for pileupread in pileupcolumn.pileups:
                print(pileupread.qpos)
                print(str(pileupread.qpos))


def RunModule(bamfile_name):
    """Run the module."""
    BAMtest(bamfile_name)

if __name__ == "__main__":
    from sys import argv
    RunModule(argv[1])

I'll try to find out what caused it and fix it if I can.
