flo-compbio / pyaffy

pyAffy: Processing raw data from Affymetrix expression microarrays in Python.

License: GNU General Public License v3.0

Python 9.94% HTML 90.06%

pyaffy's Issues

Estimation of `mu` parameter sometimes fails

For arrays with "extreme" intensity ranges (much smaller or much larger than those typical for the samples in the MAQC study), the current estimation procedure for the mu parameter (with a hard-coded histogram range of 0-500 and a bin size of 4.0) can fail.

According to my tests, this is simple to fix: calculate a histogram over the range between the minimum intensity value and the 75th percentile, using 100 equal-sized bins.
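
A minimal sketch of that idea, in plain Python. It assumes that mu is taken as the center of the most populated histogram bin; the function name `estimate_mode` and its parameters are hypothetical and are not part of pyAffy's API.

```python
def estimate_mode(intensities, n_bins=100, upper_percentile=75):
    """Estimate the mode of an intensity distribution from a histogram.

    The histogram spans [minimum, upper percentile] and uses n_bins
    equal-width bins, so its range adapts to each array instead of
    being fixed at 0-500.
    """
    values = sorted(intensities)
    if not values:
        raise ValueError("need at least one intensity value")
    lo = values[0]
    # Nearest-rank approximation of the upper percentile.
    hi = values[int(round(upper_percentile / 100.0 * (len(values) - 1)))]
    if hi <= lo:
        return float(lo)  # degenerate case: no spread below the percentile
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if v > hi:
            break  # values are sorted, so the rest lie outside the range
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak = counts.index(max(counts))
    # Return the center of the most populated bin.
    return lo + (peak + 0.5) * width
```

Because both the range and the bin width are derived from the data, the same call works for an array whose intensities are, say, 100 times larger than usual.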

Make alpha background parameter value configurable

It seems like there are situations where users would benefit from the ability to tweak the value of the alpha parameter used in RMA background correction (default = 0.03).
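
As a rough illustration of what a configurable alpha could look like, here is a sketch of the commonly used simplified closed-form RMA background adjustment with alpha exposed as a keyword argument. The function name `rma_bg_adjust` and its signature are hypothetical; pyAffy's actual implementation may be organized differently, and mu and sigma are assumed to have been estimated already.

```python
import math


def _norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rma_bg_adjust(intensities, mu, sigma, alpha=0.03):
    """Background-adjust intensities under the normal + exponential model.

    mu and sigma describe the (normal) background; alpha is the rate of
    the (exponential) signal. alpha defaults to 0.03 but can be overridden.
    """
    adjusted = []
    for o in intensities:
        a = o - mu - alpha * sigma ** 2
        # Note: for intensities far below mu the CDF underflows to zero;
        # a production version would need to guard against that.
        adjusted.append(a + sigma * _norm_pdf(a / sigma) / _norm_cdf(a / sigma))
    return adjusted
```

With this shape, a user could call `rma_bg_adjust(values, mu, sigma, alpha=0.05)` without touching the defaults used by everyone else.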

Include link to toy example in README

Tests?

Do you have any tests, or examples with data?
I have data that can be made public if that is a problem.
(I am the guy that asked the 6-year-old StackOverflow question)

File handle leak

There's an error in parse_celfile_v4() that causes it to leak one file handle for every file that gets read. Eventually this causes it to crash. To fix it, add this line to the finally block.

fclose(fp)
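
The parser itself uses C file handles through Cython, but the pattern is the same in plain Python: whatever is opened before the try block gets closed in the finally block, so the handle is released even when parsing raises. The sketch below is only an analogy; `read_header_bytes` is a hypothetical helper, not a pyAffy function.

```python
def read_header_bytes(path, n_bytes=4):
    """Read the first n_bytes of a file, always releasing the handle."""
    fp = open(path, "rb")
    try:
        data = fp.read(n_bytes)
        if len(data) < n_bytes:
            raise ValueError("file too short")
        return data
    finally:
        # Runs on both the normal and the error path, so no handle leaks.
        fp.close()
```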

Implement a Cython-based parser for the Command Console CEL format

This feature is currently missing.

See: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html#calvin

python3 compatible? requirements?

Is this a python 3 compatible package?

Got an error: missing module urllib2.

Not sure if it's a one-off thing that can be fixed by adding the lib to requirements.txt, or if this package was developed from the ground up in Python 2.
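
For context: urllib2 exists only in Python 2, so it cannot be installed via requirements.txt; in Python 3 its functionality lives in urllib.request and urllib.error. A common compatibility shim looks like the sketch below. It assumes the code only needs `urlopen`, which may not match how pyAffy actually uses urllib2.

```python
try:
    # Python 3: urllib2 was split into urllib.request and urllib.error.
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback.
    from urllib2 import urlopen
```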

Issue with installation of pyaffy

@flo-compbio I am trying to install pyaffy on Python 3.7, but I am running into an issue with the compatible version of SciPy. Could you please let me know which versions of Python and SciPy are required to install pyaffy?

File "pyaffy/cdfparser.pyx", line 193, in pyaffy.cdfparser.parse_cdf AssertionError

While trying to run this code:
from pyaffy import rma
from collections import OrderedDict

from os import listdir
from os.path import isfile, join
my_cel_files = [f for f in listdir('GSE14245_RAW') if isfile(join('GSE14245_RAW', f))]
sample_cel_files = OrderedDict([
('Sample %d' % (i+1), 'GSE14245_RAW/'+path) for i, path in enumerate(my_cel_files)
])
cdf_file = 'HG-U133_Plus_2.cdf'
genes, samples, X = rma(cdf_file, sample_cel_files)

I got the following error message:
Traceback (most recent call last):
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/gihad/PycharmProjects/cancerproject/data.py", line 13, in <module>
genes, samples, X = rma(cdf_file, sample_cel_files)
File "/home/gihad/PycharmProjects/cancerproject/venv/lib/python3.6/site-packages/pyaffy/process.py", line 118, in rma
parse_cdf(cdf_file, probe_type=probe_type)
File "pyaffy/cdfparser.pyx", line 193, in pyaffy.cdfparser.parse_cdf
AssertionError

Source of CDF files (Brainarray website)
The pyAffy preprint (see https://peerj.com/preprints/1790/)

Dependency not declared in setup.py (configparser)

The configparser package (a backport of Python 3's configparser module to Python 2), used in celparser.pyx, is not declared as a dependency in setup.py.
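
One way to declare it, sketched as a setup.py fragment. The backport is only needed on Python 2, so this version adds it conditionally; the list name `install_requires` follows the usual setuptools convention, and the existing dependencies are deliberately left out here rather than guessed.

```python
import sys

install_requires = [
    # ... pyAffy's existing dependencies would be listed here ...
]

# The configparser backport is only needed on Python 2;
# Python 3 ships configparser in the standard library.
if sys.version_info[0] < 3:
    install_requires.append("configparser")

# The list would then be passed on as setup(..., install_requires=install_requires).
```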

Seems like CEL files must be gzipped beforehand? Why is that?

[2017-08-17 15:28:53] INFO: Parsing CDF file.
[2017-08-17 15:28:56] INFO: CDF file parsing time: 3.70 s
[2017-08-17 15:28:56] INFO: CDF array design name: b'HTA-2_0.r1.gene'
[2017-08-17 15:28:56] INFO: CDF rows / columns: 2572 x 2680
[2017-08-17 15:28:56] INFO: Parsing CEL files...
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-17-afc2e0487185> in <module>()
      3 cdf_file = "/Users/alex/Desktop/KVH/data/CELfiles/HTA-2_0.r1.gene.cdf"
      4 
----> 5 genes, samples, X = rma(cdf_file, sample_cel_files)

~/Desktop/KVH/venv/lib/python3.6/site-packages/pyaffy/process.py in rma(cdf_file, sample_cel_files, pm_probes_only, bg_correct, quantile_normalize, medianpolish)
    139         logger.debug('Parsing CEL file for sample "%s": %s', sample, cel_file)
    140         samples.append(sample)
--> 141         y = parse_cel(cel_file)
    142         Y[:,j] = y[pm_sel]
    143     sub_logger.setLevel(logging.NOTSET)

pyaffy/celparser.pyx in pyaffy.celparser.parse_cel()

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
    274             import errno
    275             raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276         return self._buffer.read(size)
    277 
    278     def read1(self, size=-1):

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py in readinto(self, b)
     66     def readinto(self, b):
     67         with memoryview(b) as view, view.cast("B") as byte_view:
---> 68             data = self.read(len(byte_view))
     69             byte_view[:len(data)] = data
     70         return len(data)

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
    461                 # jump to the next member, if there is one.
    462                 self._init_read()
--> 463                 if not self._read_gzip_header():
    464                     self._size = self._pos
    465                     return b""

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in _read_gzip_header(self)
    409 
    410         if magic != b'\037\213':
--> 411             raise OSError('Not a gzipped file (%r)' % magic)
    412 
    413         (method, flag,

OSError: Not a gzipped file (b'\x00\x00')
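
The traceback suggests the parser opens every CEL file through gzip, which fails on an uncompressed file because its first two bytes are not the gzip magic number (b'\037\213', i.e. 0x1f 0x8b). A possible approach, sketched below, is to check those two bytes and pick the opener accordingly. `open_maybe_gzipped` is a hypothetical helper, not part of pyAffy.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzipped(path):
    """Open path for binary reading, transparently handling gzip."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Until something like this is in place, gzipping the CEL files before passing them to rma() appears to be the workaround.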

Error caused by inconsistent bytes/str

When I try to use pyAffy on Python 3.6, it fails with this error:

  File "/home/peastman/pyaffy/pyaffy/process.py", line 141, in rma
    y = parse_cel(cel_file)
  File "pyaffy/celparser.pyx", line 696, in pyaffy.celparser.parse_cel
  File "pyaffy/celparser.pyx", line 614, in pyaffy.celparser.parse_celfile_cc
  File "pyaffy/celparser.pyx", line 523, in pyaffy.celparser.parse_celfile_cc.read_data_header
  File "pyaffy/celparser.pyx", line 490, in pyaffy.celparser.parse_celfile_cc.read_header_param
TypeError: a bytes-like object is required, not 'str'

The error is caused by this line in read_header_param():

 v2 = decode_unicode(v2.rstrip('\x00'))

bytes.rstrip() requires its argument to be a bytes, not a str. The line should be changed to

 v2 = decode_unicode(v2.rstrip(b'\x00'))

and likewise in line 492.
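
A small standalone demonstration of the difference on Python 3. The value of `raw` is made up for illustration and is not taken from a real CEL header.

```python
raw = b"HG-U133\x00\x00"  # hypothetical NUL-padded header value

# Passing a str to bytes.rstrip() raises TypeError on Python 3.
try:
    raw.rstrip("\x00")
    str_arg_failed = False
except TypeError:
    str_arg_failed = True

# Passing a bytes argument strips the trailing NUL padding as intended.
cleaned = raw.rstrip(b"\x00")
```

On Python 2, str and bytes are the same type, which is why the original line worked there.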
