flo-compbio / pyaffy
pyAffy: Processing raw data from Affymetrix expression microarrays in Python.
License: GNU General Public License v3.0
For arrays with "extreme" intensity ranges (much smaller or much larger than those typical for the samples in the MAQC study), the current estimation procedure for the mu parameter (with a hard-coded histogram range of 0-500 and a bin size of 4.0) can fail.
According to my tests, this is simple to fix by calculating a histogram over the range between the minimum intensity value and the 75th percentile, using 100 equal-sized bins.
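A minimal sketch of the proposed fix, assuming mu is taken as the mode of a histogram of the raw intensities. The function name `estimate_mu` and its parameters are hypothetical and are not part of pyAffy; the percentile uses a simple nearest-rank approximation.

```python
def estimate_mu(intensities, n_bins=100, upper_quantile=0.75):
    """Estimate the mode from a histogram spanning min..upper quantile.

    Hypothetical helper for illustration, not pyAffy's implementation.
    """
    data = sorted(intensities)
    lo = data[0]
    # Nearest-rank approximation of the upper quantile (75th percentile).
    hi = data[int(upper_quantile * (len(data) - 1))]
    if hi <= lo:
        return float(lo)  # degenerate case: no spread below the quantile
    width = float(hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        if v > hi:
            break  # data is sorted, so the remaining values are out of range
        # Clamp so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak = max(range(n_bins), key=counts.__getitem__)
    return lo + (peak + 0.5) * width  # center of the fullest bin


# Intensities far above 500: a fixed 0-500 histogram would contain no data.
values = list(range(1000, 1100)) + [1050] * 50 + [5000] * 10
mu = estimate_mu(values)
```

Because the histogram range follows the data, the estimate does not depend on the absolute intensity scale of the array.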
It seems like there are situations where users would benefit from the ability to tweak the value of the alpha parameter used in RMA background correction (default = 0.03).
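To illustrate where alpha enters, here is a sketch of the commonly used simplified RMA background adjustment (signal modeled as exponential with rate alpha, background as normal with mean mu and standard deviation sigma). The function name `rma_bg_adjust` and its signature are assumptions for illustration, not pyAffy's API.

```python
import math


def rma_bg_adjust(x, mu, sigma, alpha=0.03):
    """Background-adjust one intensity x (hypothetical helper)."""
    a = x - mu - alpha * sigma ** 2
    z = a / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    # Note: for extremely negative z the CDF underflows to 0; real code
    # would need to guard against that division.
    return a + sigma * pdf / cdf


default_alpha = rma_bg_adjust(1000.0, mu=100.0, sigma=20.0)
larger_alpha = rma_bg_adjust(1000.0, mu=100.0, sigma=20.0, alpha=0.05)
```

Exposing alpha as a keyword argument with the current default would keep existing behavior unchanged while letting users tune it.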
Do you have any tests, or examples with data?
I have data that can be made public if that is a problem.
(I am the guy that asked the 6-year-old StackOverflow question)
There's an error in parse_celfile_v4() that causes it to leak one file handle for every file that gets read. Eventually this causes it to crash. To fix it, add this line to the finally block:
fclose(fp)
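The fix follows the usual acquire/release pattern. The sketch below shows the same idea in plain Python rather than Cython; `read_magic` is a hypothetical helper and not a pyAffy function.

```python
import os
import tempfile


def read_magic(path, n_bytes=4):
    """Read the first bytes of a file and always release the handle."""
    fp = open(path, "rb")
    try:
        return fp.read(n_bytes)
    finally:
        fp.close()  # runs on success and on error, so no handle leaks


# Usage with a throwaway file.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"\x40\x00\x00\x00payload")
os.close(fd)
magic = read_magic(demo_path)
os.remove(demo_path)
```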
This feature is currently missing.
See: http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html#calvin
Is this a Python 3 compatible package?
Got an error: missing module urllib2.
Not sure if it's a one-off thing that can be fixed by adding the lib to requirements.txt, or if this package was developed from the ground up in Python 2.
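For context, urllib2 is a Python 2 standard-library module, so it cannot be installed through requirements.txt; in Python 3 its functionality lives in urllib.request. A common compatibility shim looks like this (a sketch only; which names pyAffy actually imports from urllib2 is not shown here):

```python
try:
    # Python 3: urllib2 was merged into urllib.request / urllib.error
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback
    from urllib2 import urlopen
```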
@flo-compbio I am trying to install pyaffy on python 3.7 and there is an issue with the compatible version of scipy. Could you please let me know which version of python and scipy is required for installing pyaffy?
While trying to run this code:
from pyaffy import rma
from collections import OrderedDict
from os import listdir
from os.path import isfile, join
my_cel_files = [f for f in listdir('GSE14245_RAW') if isfile(join('GSE14245_RAW', f))]
sample_cel_files = OrderedDict([
    ('Sample %d' % (i+1), 'GSE14245_RAW/'+path) for i, path in enumerate(my_cel_files)
])
cdf_file = 'HG-U133_Plus_2.cdf'
genes, samples, X = rma(cdf_file, sample_cel_files)
I got the following error message:
Traceback (most recent call last):
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/gihad/PycharmProjects/cancerproject/data.py", line 13, in <module>
genes, samples, X = rma(cdf_file, sample_cel_files)
File "/home/gihad/PycharmProjects/cancerproject/venv/lib/python3.6/site-packages/pyaffy/process.py", line 118, in rma
parse_cdf(cdf_file, probe_type=probe_type)
File "pyaffy/cdfparser.pyx", line 193, in pyaffy.cdfparser.parse_cdf
AssertionError
There is no pypackage website named cdfparser.
Where should I go?
See CEL "Version 4" format specification here:
http://media.affymetrix.com/support/developer/powertools/changelog/gcos-agcc/cel.html#V4
Apparently, CEL Version 4 fields of type "char" can contain single-byte non-ASCII characters, suggesting that the ISO-8859-1 encoding is used. In one particular case, I encountered a non-breaking space (byte value 160).
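A short illustration of the difference. The field value below is made up for the example; only the byte 0xA0 (160) comes from the report above.

```python
raw = b"HG-U133\xa0Plus"  # hypothetical "char" field containing byte 160

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # ASCII cannot represent byte values above 127

# ISO-8859-1 maps every byte 0-255 to a code point, so decoding never fails;
# byte 160 becomes U+00A0 (NO-BREAK SPACE).
text = raw.decode("iso-8859-1")
```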
Currently, README fails to mention:
The configparser package (a backport of Python 3's configparser module to Python 2), used in celparser.pyx, is not declared as a dependency in setup.py.
[2017-08-17 15:28:53] INFO: Parsing CDF file.
[2017-08-17 15:28:56] INFO: CDF file parsing time: 3.70 s
[2017-08-17 15:28:56] INFO: CDF array design name: b'HTA-2_0.r1.gene'
[2017-08-17 15:28:56] INFO: CDF rows / columns: 2572 x 2680
[2017-08-17 15:28:56] INFO: Parsing CEL files...
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-17-afc2e0487185> in <module>()
3 cdf_file = "/Users/alex/Desktop/KVH/data/CELfiles/HTA-2_0.r1.gene.cdf"
4
----> 5 genes, samples, X = rma(cdf_file, sample_cel_files)
~/Desktop/KVH/venv/lib/python3.6/site-packages/pyaffy/process.py in rma(cdf_file, sample_cel_files, pm_probes_only, bg_correct, quantile_normalize, medianpolish)
139 logger.debug('Parsing CEL file for sample "%s": %s', sample, cel_file)
140 samples.append(sample)
--> 141 y = parse_cel(cel_file)
142 Y[:,j] = y[pm_sel]
143 sub_logger.setLevel(logging.NOTSET)
pyaffy/celparser.pyx in pyaffy.celparser.parse_cel()
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
274 import errno
275 raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276 return self._buffer.read(size)
277
278 def read1(self, size=-1):
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py in readinto(self, b)
66 def readinto(self, b):
67 with memoryview(b) as view, view.cast("B") as byte_view:
---> 68 data = self.read(len(byte_view))
69 byte_view[:len(data)] = data
70 return len(data)
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
461 # jump to the next member, if there is one.
462 self._init_read()
--> 463 if not self._read_gzip_header():
464 self._size = self._pos
465 return b""
/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in _read_gzip_header(self)
409
410 if magic != b'\037\213':
--> 411 raise OSError('Not a gzipped file (%r)' % magic)
412
413 (method, flag,
OSError: Not a gzipped file (b'\x00\x00')
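One possible cause is that the CEL file is not gzip-compressed. As a workaround sketch, the first two bytes can be checked against the gzip magic number (0x1f 0x8b) before choosing how to open the file. `open_maybe_gzip` is a hypothetical helper, not part of pyAffy, and the file names are made up.

```python
import gzip
import os
import shutil
import tempfile

GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open path for binary reading, transparently handling gzip files."""
    with open(path, "rb") as fp:
        is_gzip = fp.read(2) == GZIP_MAGIC
    return gzip.open(path, "rb") if is_gzip else open(path, "rb")


# Usage: the same payload stored plain and gzip-compressed.
tmp_dir = tempfile.mkdtemp()
plain_path = os.path.join(tmp_dir, "plain.CEL")
gz_path = os.path.join(tmp_dir, "packed.CEL.gz")
with open(plain_path, "wb") as fp:
    fp.write(b"\x00\x00data")
with gzip.open(gz_path, "wb") as fp:
    fp.write(b"\x00\x00data")

with open_maybe_gzip(plain_path) as fp:
    plain = fp.read()
with open_maybe_gzip(gz_path) as fp:
    unpacked = fp.read()
shutil.rmtree(tmp_dir)
```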
When I try to use pyAffy on Python 3.6, it fails with this error:
File "/home/peastman/pyaffy/pyaffy/process.py", line 141, in rma
y = parse_cel(cel_file)
File "pyaffy/celparser.pyx", line 696, in pyaffy.celparser.parse_cel
File "pyaffy/celparser.pyx", line 614, in pyaffy.celparser.parse_celfile_cc
File "pyaffy/celparser.pyx", line 523, in pyaffy.celparser.parse_celfile_cc.read_data_header
File "pyaffy/celparser.pyx", line 490, in pyaffy.celparser.parse_celfile_cc.read_header_param
TypeError: a bytes-like object is required, not 'str'
The error is caused by this line in read_header_param():
v2 = decode_unicode(v2.rstrip('\x00'))
bytes.rstrip() requires its argument to be a bytes, not a str. The line should be changed to
v2 = decode_unicode(v2.rstrip(b'\x00'))
and likewise in line 492.
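The behavior can be reproduced in isolation on Python 3 (the sample value is made up):

```python
raw = b"value\x00\x00"

# Passing bytes works and strips the trailing NUL padding.
stripped = raw.rstrip(b"\x00")

# Passing a str raises TypeError on Python 3.
try:
    raw.rstrip("\x00")
    raised = False
except TypeError:
    raised = True
```

On Python 2, str and bytes are the same type, which is why the original line worked there.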