pyaffy's Issues

Estimation of `mu` parameter sometimes fails

For arrays with "extreme" intensity ranges (much smaller or much larger than those typical for the samples in the MAQC study), the current estimation procedure for the mu parameter (with a hard-coded histogram range of 0-500 and a bin size of 4.0) can fail.

According to my tests, this is simple to fix: calculate the histogram over the range between the minimum intensity value and the 75th percentile, using 100 equal-sized bins.
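
A minimal sketch of the proposed histogram, for illustration only. It is not the pyaffy implementation: the names percentile() and estimate_mu() are hypothetical, and taking the centre of the fullest bin as the estimate of mu is an assumption about how the histogram is used. It uses only the standard library; with NumPy, numpy.percentile and numpy.histogram would do the same job.

```python
import random

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def estimate_mu(intensities, num_bins=100):
    """Estimate mu as the centre of the fullest histogram bin.

    The histogram spans [min, 75th percentile] and has num_bins
    equal-sized bins, so its range adapts to the data instead of
    being fixed at 0-500.
    """
    vals = sorted(float(v) for v in intensities)
    lo, hi = vals[0], percentile(vals, 75)
    if hi <= lo:
        # Degenerate case: (almost) all values are identical.
        return lo
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in vals:
        if v > hi:
            break  # values above the 75th percentile are ignored
        # The value equal to hi falls into the last bin.
        k = min(int((v - lo) / width), num_bins - 1)
        counts[k] += 1
    k = counts.index(max(counts))
    return lo + (k + 0.5) * width

# Synthetic intensities far outside the 0-500 range (made-up data).
random.seed(0)
y = [random.gauss(10000.0, 1000.0) for _ in range(10000)]
print(estimate_mu(y))
```

Because the range and the bin width are derived from the data, the same call works for arrays whose intensities are much smaller or much larger than 0-500.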

Tests?

Do you have any tests, or examples with data?
I have data that can be made public if that is a problem.
(I am the guy that asked the 6-year-old StackOverflow question)

File handle leak

There's an error in parse_celfile_v4() that causes it to leak one file handle for every file that gets read. Eventually this causes it to crash. To fix it, add this line to the finally block.

fclose(fp)
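
For comparison, here is the same close-in-finally pattern in plain Python. This is only a sketch of the idea, not the pyaffy code: read_header_bytes() and its parameters are hypothetical, and the real parser uses C file handles.

```python
def read_header_bytes(path, n=4):
    """Read the first n bytes of a file and always release the handle."""
    fp = open(path, 'rb')
    try:
        return fp.read(n)
    finally:
        # Runs on success and on error, so no handle is left open.
        fp.close()

def read_header_bytes_with(path, n=4):
    """Equivalent version using a context manager."""
    with open(path, 'rb') as fp:
        return fp.read(n)
```

Either form keeps the number of open handles constant no matter how many files are read.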

python3 compatible? requirements?

Is this a python 3 compatible package?

Got an error: missing module urllib2.

Not sure if it's a one-off thing that can be fixed by adding the lib to requirements.txt, or if this package was developed from the ground up in Python 2.
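
For context: urllib2 is a Python 2 standard-library module, so it cannot be installed through requirements.txt. In Python 3 its functionality lives in urllib.request and urllib.error. A common compatibility shim looks like the sketch below; which urllib2 functions pyaffy actually uses is not stated here, so urlopen is an assumption chosen for illustration.

```python
try:
    # Python 3
    from urllib.request import urlopen
except ImportError:
    # Python 2
    from urllib2 import urlopen

# The rest of the code can now call urlopen() on either Python version.
```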

Issue with installation of pyaffy

@flo-compbio I am trying to install pyaffy on Python 3.7, and there is an issue with the compatible version of scipy. Could you please let me know which versions of Python and scipy are required for installing pyaffy?

File "pyaffy/cdfparser.pyx", line 193, in pyaffy.cdfparser.parse_cdf AssertionError

While trying to run this code:
from pyaffy import rma
from collections import OrderedDict

from os import listdir
from os.path import isfile, join
my_cel_files = [f for f in listdir('GSE14245_RAW') if isfile(join('GSE14245_RAW', f))]
sample_cel_files = OrderedDict([
    ('Sample %d' % (i+1), 'GSE14245_RAW/'+path) for i, path in enumerate(my_cel_files)
])
cdf_file = 'HG-U133_Plus_2.cdf'
genes, samples, X = rma(cdf_file, sample_cel_files)

I got the following error message:
Traceback (most recent call last):
  File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/gihad/Downloads/pycharm-professional-191.6014.12/pycharm-191.6014.12/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/gihad/PycharmProjects/cancerproject/data.py", line 13, in <module>
    genes, samples, X = rma(cdf_file, sample_cel_files)
  File "/home/gihad/PycharmProjects/cancerproject/venv/lib/python3.6/site-packages/pyaffy/process.py", line 118, in rma
    parse_cdf(cdf_file, probe_type=probe_type)
  File "pyaffy/cdfparser.pyx", line 193, in pyaffy.cdfparser.parse_cdf
AssertionError

Seems like CEL files must be gzipped beforehand? Why is that?

[2017-08-17 15:28:53] INFO: Parsing CDF file.
[2017-08-17 15:28:56] INFO: CDF file parsing time: 3.70 s
[2017-08-17 15:28:56] INFO: CDF array design name: b'HTA-2_0.r1.gene'
[2017-08-17 15:28:56] INFO: CDF rows / columns: 2572 x 2680
[2017-08-17 15:28:56] INFO: Parsing CEL files...
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-17-afc2e0487185> in <module>()
      3 cdf_file = "/Users/alex/Desktop/KVH/data/CELfiles/HTA-2_0.r1.gene.cdf"
      4 
----> 5 genes, samples, X = rma(cdf_file, sample_cel_files)

~/Desktop/KVH/venv/lib/python3.6/site-packages/pyaffy/process.py in rma(cdf_file, sample_cel_files, pm_probes_only, bg_correct, quantile_normalize, medianpolish)
    139         logger.debug('Parsing CEL file for sample "%s": %s', sample, cel_file)
    140         samples.append(sample)
--> 141         y = parse_cel(cel_file)
    142         Y[:,j] = y[pm_sel]
    143     sub_logger.setLevel(logging.NOTSET)

pyaffy/celparser.pyx in pyaffy.celparser.parse_cel()

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
    274             import errno
    275             raise OSError(errno.EBADF, "read() on write-only GzipFile object")
--> 276         return self._buffer.read(size)
    277 
    278     def read1(self, size=-1):

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py in readinto(self, b)
     66     def readinto(self, b):
     67         with memoryview(b) as view, view.cast("B") as byte_view:
---> 68             data = self.read(len(byte_view))
     69             byte_view[:len(data)] = data
     70         return len(data)

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in read(self, size)
    461                 # jump to the next member, if there is one.
    462                 self._init_read()
--> 463                 if not self._read_gzip_header():
    464                     self._size = self._pos
    465                     return b""

/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/gzip.py in _read_gzip_header(self)
    409 
    410         if magic != b'\037\213':
--> 411             raise OSError('Not a gzipped file (%r)' % magic)
    412 
    413         (method, flag,

OSError: Not a gzipped file (b'\x00\x00')
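
The traceback shows the gzip module rejecting a file whose first two bytes are not the gzip magic number b'\037\213' (the same bytes as b'\x1f\x8b'). One possible workaround on the caller's side is to check those two bytes before choosing how to open the file. This is a sketch under that assumption, not part of pyaffy; open_maybe_gzipped() is a hypothetical helper.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream

def open_maybe_gzipped(path):
    """Open path for binary reading, decompressing only if it is gzipped."""
    with open(path, 'rb') as fp:
        magic = fp.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, 'rb')
    return open(path, 'rb')
```

With such a check, plain and gzipped CEL files could be read through the same code path.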

Error caused by inconsistent bytes/str

When I try to use pyAffy on Python 3.6, it fails with this error:

  File "/home/peastman/pyaffy/pyaffy/process.py", line 141, in rma
    y = parse_cel(cel_file)
  File "pyaffy/celparser.pyx", line 696, in pyaffy.celparser.parse_cel
  File "pyaffy/celparser.pyx", line 614, in pyaffy.celparser.parse_celfile_cc
  File "pyaffy/celparser.pyx", line 523, in pyaffy.celparser.parse_celfile_cc.read_data_header
  File "pyaffy/celparser.pyx", line 490, in pyaffy.celparser.parse_celfile_cc.read_header_param
TypeError: a bytes-like object is required, not 'str'

The error is caused by this line in read_header_param():

 v2 = decode_unicode(v2.rstrip('\x00'))

In Python 3, bytes.rstrip() requires its argument to be a bytes object, not a str. The line should be changed to

 v2 = decode_unicode(v2.rstrip(b'\x00'))

and likewise in line 492.
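
The behaviour can be reproduced outside pyaffy with a few lines. The header value below is made up for illustration.

```python
raw = b'HG-U133_Plus_2\x00\x00'  # made-up, NUL-padded header value

try:
    raw.rstrip('\x00')  # str argument: raises TypeError on Python 3
    str_arg_failed = False
except TypeError:
    str_arg_failed = True

cleaned = raw.rstrip(b'\x00')  # bytes argument: strips the padding
text = cleaned.decode('ascii')
print(str_arg_failed, cleaned, text)
```

On Python 2, where str and bytes are the same type, the original line works, which is presumably why the problem only shows up on Python 3.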
