
pymc's Introduction

PyMC logo

Build Status Coverage NumFOCUS_badge Binder Dockerhub DOIzenodo

PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

Check out the PyMC overview, or one of the many examples! For questions on PyMC, head on over to our PyMC Discourse forum.

Features

  • Intuitive model specification syntax, for example, x ~ N(0,1) translates to x = Normal('x',0,1) (see the minimal sketch after this list)
  • Powerful sampling algorithms, such as the No-U-Turn Sampler, allow complex models with thousands of parameters to be fit with little specialized knowledge of fitting algorithms.
  • Variational inference: ADVI for fast approximate posterior estimation as well as mini-batch ADVI for large data sets.
  • Relies on PyTensor which provides:
    • Computation optimization and dynamic C or JAX compilation
    • NumPy broadcasting and advanced indexing
    • Linear algebra operators
    • Simple extensibility
  • Transparent support for missing value imputation
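A minimal sketch of this syntax, assuming the current PyMC API (the model and variable names are purely illustrative):

import pymc as pm

# a minimal sketch (illustrative, not from the README): one standard-normal variable, sampled with NUTS
with pm.Model():
    x = pm.Normal("x", mu=0, sigma=1)   # x ~ N(0, 1)
    idata = pm.sample(1000)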

Getting started

If you already know about Bayesian statistics:

Learn Bayesian statistics with a book together with PyMC

Audio & Video

  • Here is a YouTube playlist gathering several talks on PyMC.
  • You can also find all the talks given at PyMCon 2020 here.
  • The "Learning Bayesian Statistics" podcast helps you discover and stay up-to-date with the vast Bayesian community. Bonus: it's hosted by Alex Andorra, one of the PyMC core devs!

Installation

To install PyMC on your system, follow the instructions on the installation guide.

Citing PyMC

Please choose from the following:

  • DOIpaper PyMC: A Modern and Comprehensive Probabilistic Programming Framework in Python, Abril-Pla O, Andreani V, Carroll C, Dong L, Fonnesbeck CJ, Kochurov M, Kumar R, Lao J, Luhmann CC, Martin OA, Osthege M, Vieira R, Wiecki T, Zinkov R. (2023)
  • DOIzenodo A DOI for all versions.
  • DOIs for specific versions are shown on Zenodo and under Releases

Contact

We are using discourse.pymc.io as our main communication channel.

To ask a question regarding modeling or usage of PyMC, we encourage posting to our Discourse forum under the “Questions” Category. You can also suggest features in the “Development” Category.

You can also follow us on these social media platforms for updates and other announcements:

To report an issue with PyMC please use the issue tracker.

Finally, if you need to get in touch for non-technical information about the project, send us an e-mail.

License

Apache License, Version 2.0

Software using PyMC

General purpose

  • Bambi: BAyesian Model-Building Interface (BAMBI) in Python.
  • calibr8: A toolbox for constructing detailed observation models to be used as likelihoods in PyMC.
  • gumbi: A high-level interface for building GP models.
  • sunode: Fast ODE solver, much faster than the one that comes with PyMC.
  • pymc-learn: Custom PyMC models built on top of pymc3_models/scikit-learn API

Domain specific

  • Exoplanet: a toolkit for modeling of transit and/or radial velocity observations of exoplanets and other astronomical time series.
  • beat: Bayesian Earthquake Analysis Tool.
  • CausalPy: A package focussing on causal inference in quasi-experimental settings.

Please contact us if your software is not listed here.

Papers citing PyMC

See Google Scholar here and here for a continuously updated list.

Contributors

See the GitHub contributor page. Also read our Code of Conduct guidelines for a better contributing experience.

Support

PyMC is a non-profit project under the NumFOCUS umbrella. If you want to support PyMC financially, you can donate here.

Professional Consulting Support

You can get professional consulting support from PyMC Labs.

Sponsors

NumFOCUS

PyMCLabs

Mistplay

ODSC

Thanks to our contributors

contributors

pymc's Issues

Disable figure pop-ups while running tests?

When running pymc.test() there are a lot of annoying figure pop-ups (they seem to be empty). Can these be disabled?

Also I got an error and 4 failures, although I think there are other open tickets for these already:

In [2]: pymc.__version__
Out[2]: '2.2grad'

In [3]: pymc.test()
Running unit tests for pymc.tests
NumPy version 1.6.1
NumPy is installed in /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy
Python version 2.7.2 (default, Jul 23 2011, 13:25:29) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
nose version 1.0.0
..................F.........................S....../opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/database/hdf5.py:448: UserWarning: 
Error tallying K, will not try to tally it again this chain.
Did you make all the samevariables and step methods tallyable
as were tallyable last time you used the database file?

Error:

Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/database/hdf5.py", line 438, in tally
    self._traces[name].tally(chain)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/database/hdf5.py", line 57, in tally
    self._vlarrays[chain].append(self._getfunc())
IndexError: list index out of range

  %s"""%(name, ''.join(traceback.format_exception(cls, inst, tb))))
FFF.............................................................................................E...SS.....................
======================================================================
ERROR: test_gradients (pymc.tests.test_gradients.test_gradients)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/tests/test_gradients.py", line 340, in test_gradients
    negative_binomial = NegativeBinomial('negative_binomial', mu = c, alpha = d )
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/distributions.py", line 269, in __init__
    Stochastic.__init__(self, logp=logp, random=random, logp_partial_gradients = logp_partial_gradients, dtype=dtype, **arg_dict_out)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 691, in __init__
    verbose=verbose)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/Node.py", line 193, in __init__
    Node.__init__(self, doc, name, parents, cache_depth, verbose=verbose)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/Node.py", line 111, in __init__
    self.parents = parents
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/Node.py", line 132, in _set_parents
    self.gen_lazy_function()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/PyMCObjects.py", line 731, in gen_lazy_function
    self._logp.force_compute()
  File "LazyFunction.pyx", line 256, in pymc.LazyFunction.LazyFunction.force_compute
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/distributions.py", line 2918, in wrapper
    return f(value, **kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/distributions.py", line 2076, in negative_binomial_like
    if alpha > 1e10:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

======================================================================
FAIL: test_simple (pymc.tests.test_convergence.test_raftery_lewis)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/tests/test_convergence.py", line 72, in test_simple
    assert(0.8 < (float(nprec)/kmind) / nmin < 1.2)
AssertionError

======================================================================
FAIL: test_simple_sample (pymc.tests.test_database.testHDF5Objects)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/tests/test_database.py", line 287, in test_simple_sample
    assert_array_equal(self.S.trace('K')[:].shape, (10,))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 707, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

(mismatch 100.0%)
 x: array([0])
 y: array([10])

======================================================================
FAIL: test_xload (pymc.tests.test_database.testHDF5Objects)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/tests/test_database.py", line 300, in test_xload
    assert_array_equal(db.K().shape, (10,))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 707, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

(mismatch 100.0%)
 x: array([0])
 y: array([10])

======================================================================
FAIL: test_yconnect_and_sample (pymc.tests.test_database.testHDF5Objects)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymc/tests/test_database.py", line 311, in test_yconnect_and_sample
    assert_array_equal(db.K(chain=1).shape, (10,))
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 707, in assert_array_equal
    verbose=verbose, header='Arrays are not equal')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not equal

(mismatch 100.0%)
 x: array([0])
 y: array([10])

----------------------------------------------------------------------
Ran 174 tests in 33.907s

FAILED (SKIP=3, errors=1, failures=4)
Out[3]: <nose.result.TextTestResult run=174 errors=1 failures=4>

2.1alpha tarball

The tarball still seems to be missing files in pymc/gp, 'gp_submodel.py' and 'step_methods.py'. Downloading them and placing them in the folder seems to fix the issue.

Proposal: argument check when creating a Node

I spent an hour today trying to understand why my simple linear model does not converge to the true values.

# create data for a simple linear model
import numpy as np
import pymc as pm
from numpy import dot

k = 4
n = 100
sigma = 0.3
# create beta
true_beta = np.array([1, 0.5, -2, 1.7])
# create x matrix
x = np.random.normal(0, 1, size=(n, k))
# data = x*beta + noise
data = dot(x, true_beta) + np.random.randn(n)*sigma

# create linear model
beta = pm.Normal('beta', 0, 100, size=k)
xbeta = pm.Lambda('xbeta', lambda x=x, beta=beta: dot(x, beta))
tau = pm.Uniform('tau', 0, 1000, value=0.2)
y = pm.Normal(xbeta, tau, observed=True, value=data)  ##!!!!!!!## my bug is here

# MCMC
mc = pm.MCMC([xbeta, tau, y, beta])
mc.sample(10000, 5000)

The model runs, and even converges, but not to the true values. The reason is that the first argument passed to pm.Normal when the y node was created is xbeta rather than the node's name.

I think it would be extremely helpful to add argument checking when a node is initialized, to prevent users from creating an incorrect model.
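For reference, the call the reporter intended, with the node's name as the first argument (a sketch of the fix to the user model above, not a change to PyMC itself):

y = pm.Normal('y', xbeta, tau, observed=True, value=data)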

Allow for arrays of multivariate stochastics

Currently, the MvNormal object (as well as other multivariates) does not allow for vectors of multivariate normals; individual objects need to be created for each MVN variable.
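A sketch of that per-variable workaround (names and shapes are illustrative, using the MvNormalCov form that appears elsewhere in this tracker):

import numpy as np
import pymc as pm

mu = np.zeros(3)
C = np.eye(3)
# one multivariate-normal stochastic per variable, since a single vectorized object is not supported
xs = [pm.MvNormalCov('x_%d' % i, mu, C) for i in range(10)]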

Improve example model set

It would be nice to incorporate some of the better examples posted on the website into the examples in the source tree. There isn't a great deal of variety in the current example set, and several are very similar. I invite proposals for models to remove from and add to the current set.

Proposal: Add multiprocessor/MPI support

Hi,

Clearly it would be nice if multiple chains could be run in parallel (using multiprocessing and/or mpi4py). I actually already hacked something together which does this, but it wasn't very clean. I would volunteer to implement this feature if there is enough interest. First, however, I thought I'd ask for some pointers on what the best implementation would be.

One thing I think is unavoidable (although kind of ugly): writing the traces of individual runs to separate db files (best backend for this?) and then, in a second step, reloading them and combining them into one trace (with different individual chains).

Also, I think this would have to be a subclass of MCMC (ParallelMCMC?) that behind the scenes creates multiple MCMC instances.
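A rough sketch of that idea (my assumptions: a make_model() factory returning the model's nodes, the 'pickle' backend, and one db file per chain to be merged afterwards):

import multiprocessing as mp
import pymc as pm

def run_chain(i):
    # each worker builds its own model and writes its trace to a separate db file
    mc = pm.MCMC(make_model(), db='pickle', dbname='chain_%d.pickle' % i)
    mc.sample(10000, burn=5000)
    mc.db.close()

if __name__ == '__main__':
    pool = mp.Pool(4)              # one process per chain
    pool.map(run_chain, range(4))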

Feedback?

-Thomas

docstring incorrect for Exponential / exponential_like

Hello

Firstly -- fantastic package. I just started using it. Keep it up.

I got stuck earlier trying to figure out why my priors weren't behaving as expected. Then I realised that Exponential (and exponential_like) are misdocumented. Half of this problem has been corrected between the 2.1beta and the current commit, but the docstrings are still incorrect.

So part of Exponential's docstring reads:

E = Exponential(name, beta, value=None, observed=False, size=1, trace=True, rseed=True, doc=None, verbose=None, debug=False)

Stochastic variable with Exponential distribution.
Parents are: beta.

.. math:: f(x \mid \beta) = \beta e^{-\beta x}

:Parameters:
- x : x > 0
- beta : Scale parameter (beta > 0).

.. note::
- :math:E(X) = \beta
- :math:Var(X) = \beta^2

In commit 5dce4d9, the math form was corrected from the parameterisation
f(x \mid \beta) = \frac{1}{\beta}e^{-x/\beta}
so that the current form reflects what is actually being calculated. However, the expectations need to be changed as well -- in the current form, E(X) = 1/\beta and Var(X) = 1/\beta^2.

Even still, if this is the preferred form (giving 1/E(X) as the argument, rather than E(X)), then to be consistent with the standard literature, the parameter should be called \lambda and not \beta:
http://itl.nist.gov/div898/handbook/eda/section3/eda3667.htm
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.exponential.html
http://mathworld.wolfram.com/ExponentialDistribution.html

It seems to be the consensus that one parameterises the exponential distribution either as
f(x \mid \beta) = \frac{1}{\beta}e^{-x/\beta}  or as
f(x \mid \lambda) = \lambda e^{-\lambda x}

This was my stumbling block...
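A quick numerical check of the point above (a sketch, not from the issue; numpy.random.exponential is parameterised by the scale, i.e. 1/beta under the rate form):

import numpy as np

beta = 4.0
samples = np.random.exponential(scale=1.0/beta, size=100000)
print(samples.mean())   # ~0.25, i.e. E(X) = 1/beta rather than beta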

Disable progress bar in isample

The progress bar is not designed to work well with other feedback from the model. Since isample() is interactive, it does not play well with the progress bar, so the latter should be disabled.

Port docs to Sphinx

I am just about finished with this; see the docrev branch for the revised docs. I will merge this with the master branch when it has been completed and proofed (proofreading volunteers welcome!).

MAP's fmin is unconstrained and breaks nodes by walking outside the support.

Hi,

I found it very useful to just use the MAP for complex models during model exploration. However, I'm having the problem that fmin sometimes explores parameter values outside the support of my distributions (Uniform). In this case, rather than returning -Inf, they raise an exception. I'm not sure what the best way would be to fix it, but in this case I think this behavior makes no sense, so either MAP should ignore out-of-support errors or the nodes should have a flag not to raise that error.

Thanks,
Thomas

linalg_utils missing dpotrf_wrap

On Fedora 13, with the latest development version from github, I get this import error when importing pymc:

In [1]: import pymc

ImportError                               Traceback (most recent call last)

/usr/lib/python2.6/site-packages/pymc/__init__.pyc in <module>()
     45 from CircularStochastic import *
     46 import distributions
---> 47 import gp
     48
     49 # Optional modules

/usr/lib/python2.6/site-packages/pymc/gp/__init__.py in <module>()
     20 from GPutils import *
     21 from Mean import *
---> 22 from Covariance import *
     23 from BasisCovariance import *
     24 from FullRankCovariance import *

/usr/lib/python2.6/site-packages/pymc/gp/Covariance.py in <module>()
      8 from numpy.linalg import cholesky, LinAlgError
      9 from GPutils import regularize_array, trisolve, square_and_sum
---> 10 from linalg_utils import diag_call, dpotrf_wrap
     11 from incomplete_chol import ichol, ichol_continue
     12

ImportError: cannot import name dpotrf_wrap

I can verify using nm that linalg_utils.so does not contain dpotrf_wrap.

IndexError in test_gradients

Two identical failures occur during calls to check_gradients in the test_gradients unit tests:

Traceback (most recent call last):
  File "/Users/fonnescj/Code/pymc/pymc/tests/test_gradients.py", line 286, in test_gradients
    check_gradients(norm2)
  File "/Users/fonnescj/Code/pymc/pymc/tests/test_gradients.py", line 107, in check_gradients
    numeric_gradient = get_numeric_gradient(stochastics, s)
  File "/Users/fonnescj/Code/pymc/pymc/tests/test_gradients.py", line 134, in get_numeric_gradient
    delta[unravel_index(i,i_shape or (1,))] += e
IndexError: 0-d arrays can only use a single () or a list of newaxes (and a single ...) as an index

Also occurs on line 360 of test_gradients.

pymc.Matplot.plot sometimes ignores path argument

If you initialize a model using vars() and then plot it, the "compound" objects (e.g. MVN) don't use the specified path when saving.

import pymc

a = pymc.MvNormalCov('a', [10,10], [[1,0],[0,1]])
b = pymc.Normal('b', 10, 1)

m = pymc.MCMC(vars())
m.sample(10)

pymc.Matplot.plot(m.a, path='C:/tmp', format='pdf')
pymc.Matplot.plot(m.b, path='C:/tmp', format='pdf')

In this case b.pdf will be saved in C:/tmp, but a.pdf will be placed in the working directory. Tested in both 2.1alpha and latest dev version.

gelman_rubin returns wrong values

import pymc as pm

x = pm.Normal('x', 0, 1, size=5)

@pm.deterministic
def y(x=x):
    return x[2]

m = pm.MCMC([y, x])
m.sample(100, 0)
m.sample(100, 0)
m.sample(100, 0)
pm.diagnostics.gelman_rubin(m)


I ran this code and got different values for y and for x[2]. For instance:
{'x': [0.96274904628327884,
1.0617412110342699,
0.94988095858687061,
0.94650618423398925,
1.0581700972182297],
'y': 1.0180891431696331}

HDF5 failures on testHDF5Objects

The unit tests consistently fail testHDF5Objects in test_database. It appears to choke on updating the K node when sampling the second chain, as it refuses to tally:

UserWarning:             ] Iterations: 0
Error tallying K, will not try to tally it again this chain.
Did you make all the samevariables and step methods tallyable
as were tallyable last time you used the database file?

This subsequently causes the chain length assertion tests to fail.

Running on OS X 10.7 with HDF5 1.8.5.

GP Examples not in installed package code

The GP examples in the source do not appear in the examples file in the installed code for the pymc package. (All the other examples do seem to be there, though.)

pymc.gp.observe doesn't handle undefined OMP_NUM_THREADS

On my system, the environment variable OMP_NUM_THREADS is not defined, so I get a KeyError when executing pymc.gp.observe. A suggested patch is below, but I'm not sure whether this is consistent with the rest of the code in threadpool.py, for example lines 307-310 which appear to be there to handle this case.

John

diff --git a/pymc/threadpool.py b/pymc/threadpool.py
index 7f06f8c..83e2907 100644
--- a/pymc/threadpool.py
+++ b/pymc/threadpool.py
@@ -372,7 +372,7 @@ def get_threadpool_size():
 
 def thread_partition_array(x):
     "Partition work arrays for multithreaded addition and multiplication"
-    n_threads = int(os.environ['OMP_NUM_THREADS'])
+    n_threads = int(os.environ.get('OMP_NUM_THREADS','1'))
     if len(x.shape)>1:
         maxind = x.shape[1]
     else:

eliminating special casing in __getitem__

Currently, when __getitem__ gets a scalar value, it returns an Index deterministic; otherwise it returns a regular deterministic that indexes into the array. My understanding is that Index exists so that linear models can be special-cased. I would like to find a way to eliminate the special casing since it's a bit inelegant. Could Index be made to handle non-scalars?

Poor performance of DiscreteMetropolis with categorical stochastics

At present, the DiscreteMetropolis step method works poorly with categorical stochastics. In particular, it proposes values that are outside the bounds of the discrete set of indices that comprises the support of such variables. Perhaps a subclass of DiscreteMetropolis is required that only proposes valid integer values (i.e. {0, 1, ..., len(p)}). See the koala example for this behavior.
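A sketch of the kind of constrained proposal the issue asks for (a hypothetical helper, not PyMC API):

import numpy as np

def propose_category(p):
    # propose only valid indices into the support {0, ..., len(p)-1}
    return np.random.randint(0, len(p))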

Large data vectors break hdf5

This line: https://github.com/pymc-devs/pymc/blob/master/pymc/database/hdf5.py#L416 causes the following error if an observed stochastic has a length of around 10k.

HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: H5Adeprec.c line 165 in H5Acreate1(): unable to create attribute
    major: Attribute
    minor: Unable to initialize object
  #001: H5A.c line 492 in H5A_create(): unable to create attribute in object header
    major: Attribute
    minor: Unable to insert object
  #002: H5Oattribute.c line 346 in H5O_attr_create(): unable to create new attribute in header
    major: Attribute
    minor: Unable to insert object
  #003: H5Omessage.c line 224 in H5O_msg_append_real(): unable to create new message
    major: Object header
    minor: No space available for allocation
  #004: H5Omessage.c line 1925 in H5O_msg_alloc(): unable to allocate space for message
    major: Object header
    minor: Unable to initialize object
  #005: H5Oalloc.c line 1135 in H5O_alloc(): object header message is too large
    major: Object header
    minor: Unable to initialize object
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: H5A.c line 916 in H5Awrite(): not an attribute
    major: Invalid arguments to routine
    minor: Inappropriate type

cannot import pymc while Sampler is running in another session

(This issue was moved from the Google Code tracker - #356)

What steps will reproduce the problem?

I've run the following:

In [1]: import pymc

In [3]: from pymc.examples import DisasterModel

In [4]: t = pymc.MCMC(DisasterModel)

In [5]: t.sample(50000, thin=100)


while the sampler was running I opened a new python session in another terminal and got an error while trying to import pymc.


$ ipython -q4thread
Enthought Python Distribution -- http://www.enthought.com

Python 2.7.1 |EPD 7.0-2 (64-bit)| (r271:86832, Dec  3 2010, 15:56:20) 
Type "copyright", "credits" or "license" for more information.

IPython 0.10.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: import pymc
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)

/Users/imrisofer/<ipython console> in <module>()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/__init__.py in <module>()
     16 
     17 # Core modules

---> 18 from threadpool import *
     19 try:
     20     import Container_values

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/threadpool.py in <module>()
    310 except:
    311     import multiprocessing
--> 312     __PyMCThreadPool__ = ThreadPool(multiprocessing.cpu_count())
    313 
    314 class CountDownLatch(object):

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/multiprocessing/__init__.pyc in cpu_count()
    118         try:
    119             with os.popen('sysctl -n hw.ncpu') as p:
--> 120                 num = int(p.read())
    121         except ValueError:
    122             num = 0

IOError: [Errno 4] Interrupted system call

What version of the product are you using? On what operating system?

Mac OS X 10.6.7
pymc 2.2grad

Please provide any additional information below.

This error occurs only while the sampler is running in the first session, and only when the second session is opened with the q4thread flag (I did not check other thread options, but without any flags there is no problem importing pymc).

step_method_dict raises an error

(the markdown of the following code is missing a few lines)

I'm running the following:

from pymc import *
from numpy import *
import numpy as np
from numpy.random import rand


def bla():

    t = 50
    data = rand(t)
    e = 2
    k = 5;
    R = eye(k);
    D = rand(t,3)
    Xdm = rand(t,k,e)

    h = MvNormal('h', zeros(k), eye(k))

    alpha = Uniform('alpha', np.ones(e) * 0, np.ones(e) * 100)

    gamma = MvNormal('gamma', zeros(3), eye(3))

    @deterministic
    def drift(D = D, gamma=gamma):
        return dot(D, gamma)

    sigma = InverseGamma('sigma', alpha=4, beta=4, value=10)

    @deterministic
    def all_alphaXh(Xdm=Xdm, h=h, alpha=alpha, t=t):
        signal = np.zeros(t)
        for i in range(len(alpha)):
            signal += np.dot(Xdm[:,:,i], h) * alpha[i]

        return  signal;

    y = Normal('y', all_alphaXh + drift, 1/sigma, value=data, observed=True)
    return { 'h':h, 'alpha':alpha, 'gamma':gamma, 'sigma':sigma,
             'y':y, 'drift':drift, 'all_alphaXh':all_alphaXh}

m = MCMC(bla())



and when I try to access m.step_method_dict I get the following error:

In [3]: m.step_method_dict
Out[3]: ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/Users/imrisofer/<ipython console> in <module>()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/IPython/Prompts.pyc in __call__(self, arg)
    550 
    551             # and now call a possibly user-defined print mechanism

--> 552             manipulated_val = self.display(arg)
    553 
    554             # user display hooks can change the variable to be stored in


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/IPython/Prompts.pyc in _display(self, arg)
    576             return IPython.generics.result_display(arg)
    577         except TryNext:
--> 578             return self.shell.hooks.result_display(arg)
    579 
    580     # Assign the default display method:


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/IPython/hooks.pyc in __call__(self, *args, **kw)
    139             #print "prio",prio,"cmd",cmd #dbg

    140             try:
--> 141                 ret = cmd(*args, **kw)
    142                 return ret
    143             except ipapi.TryNext, exc:

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/IPython/hooks.pyc in result_display(self, arg)
    169 
    170     if self.rc.pprint:
--> 171         out = pformat(arg)
    172         if '\n' in out:
    173             # So that multi-line strings line up with the left column of


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in pformat(self, object)
    117     def pformat(self, object):
    118         sio = _StringIO()
--> 119         self._format(object, sio, 0, 0, {}, 0)
    120         return sio.getvalue()
    121 

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in _format(self, object, stream, indent, allowance, context, level)
    135             self._readable = False
    136             return
--> 137         rep = self._repr(object, context, level - 1)
    138         typ = _type(object)
    139         sepLines = _len(rep) > (self._width - 1 - indent - allowance)

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in _repr(self, object, context, level)
    228     def _repr(self, object, context, level):
    229         repr, readable, recursive = self.format(object, context.copy(),
--> 230                                                 self._depth, level)
    231         if not readable:
    232             self._readable = False

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in format(self, object, context, maxlevels, level)
    240         and whether the object represents a recursive construct.
    241         """
--> 242         return _safe_repr(object, context, maxlevels, level)
    243 
    244 

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in _safe_repr(object, context, maxlevels, level)
    282         level += 1
    283         saferepr = _safe_repr
--> 284         for k, v in _sorted(object.items()):
    285             krepr, kreadable, krecur = saferepr(k, context, maxlevels, level)
    286             vrepr, vreadable, vrecur = saferepr(v, context, maxlevels, level)

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/pprint.pyc in _sorted(iterable)
     77             warnings.filterwarnings("ignore", "comparing unequal types "
     78                                     "not supported", DeprecationWarning)
---> 79         return sorted(iterable)
     80 
     81 class PrettyPrinter:

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/CommonDeterministics.pyc in new_method(self, other)
    631                                 {'self':self, 'other':other, 'op':'__'+op_name+'__'},
    632                                 trace=False,
--> 633                                 plot=False)
    634     # Convert the function into a method for klass.

    635     new_method.__name__ = '__'+op_name+'__'

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in __init__(self, eval, doc, name, parents, dtype, trace, cache_depth, plot, verbose, jacobians, jacobian_formats)
    396                         trace=trace,
    397                         plot=plot,
--> 398                         verbose=verbose)
    399 
    400         # self._value.force_compute()


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/Node.pyc in __init__(self, doc, name, parents, cache_depth, trace, dtype, plot, verbose)
    218         self.extended_children = set()
    219 
--> 220         Node.__init__(self, doc, name, parents, cache_depth, verbose=verbose)
    221 
    222         if self.dtype is None:

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/Node.pyc in __init__(self, doc, name, parents, cache_depth, verbose)
    136 
    137         # Initialize

--> 138         self.parents = parents
    139 
    140     def _get_parents(self):

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/Node.pyc in _set_parents(self, new_parents)
    157 
    158         # Get new lazy function

--> 159         self.gen_lazy_function()
    160 
    161     parents = property(_get_parents, _set_parents, doc="Self's parents: the variables referred to in self's declaration.")

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in gen_lazy_function(self)
    410                                     cache_depth = self._cache_depth)
    411 
--> 412         self._value.force_compute()
    413 
    414         self._jacobians = {}

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/LazyFunction.so in pymc.LazyFunction.LazyFunction.force_compute()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/CommonDeterministics.pyc in eval_fun(self, other, op)
    625         # This code creates a Deterministic object.

    626         def eval_fun(self, other, op):
--> 627             return getattr(self, op)(other)
    628         return pm.Deterministic(eval_fun,
    629                                 'A Deterministic returning the value of %s(%s,%s)'%(op_name,self.__name__, str(other)),

ValueError: shape mismatch: objects cannot be broadcast to a single shape

In order to reproduce this bug one needs to type m.step_method_dict in the interpreter; executing the same command from a file does not raise an error.

Using Mac OS X 10.6.7, pymc 2.2grad, Python 2.7.1.

easy_install fails on Ubuntu 10.04, python2.6-dev required but not checked

Hello,

when trying to install pymc on Ubuntu 10.04 today I ran into some trouble. I will attach the error message in a moment (it's quite long).

The issue was that I needed the python2.6-dev package; after installing it, everything ran fine. Maybe there is a way to include this in the requirements for pymc. If not, then I think it should be mentioned in the installation notes for Ubuntu (I see that it says that it needs Python 2.6, but I needed 30 minutes to figure out that it meant I need to install python2.6-dev).

Thanks for all the great work!
Paul

Structured Recursives

Currently the estimation of (e.g.) time-dependent states is super slow, requiring a costly loop to get from state t-1 to state t (such as the surplus production example on the pymc page). If these could somehow be passed into something faster using a keyword or decorator, it would be a big deal for many kinds of ecological (and I suspect econometric) models. Could this be done? Something like:

@recursive(init=P0,size=nyears)
def P(k=k,r=r,value=Yi):
    Pmean = log(P[i-1]+r*P[i-1]*(1-P[i-1])-k*catch)
    return lognormal_like(value,mu=Pmean,tau=0.01)
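For reference, a plain-Python sketch of the costly recursion described above (P0, r, k and catch are assumed given; this is the kind of loop the proposed @recursive decorator would replace):

import numpy as np

def surplus_production(P0, r, k, catch):
    P = np.empty(len(catch))
    P[0] = P0
    for t in range(1, len(catch)):
        # logistic growth minus catch, as in the Pmean expression above
        P[t] = P[t-1] + r * P[t-1] * (1 - P[t-1]) - k * catch[t]
    return P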

cannot compute stats on a model that was loaded from a db

I cannot compute stats on a model that was loaded from a db although the model was loaded successfully.

Here is the code I used:

# creating MCMC model
import pymc as pm
x = pm.Normal('x',0,1)
mc = pm.MCMC([x],db='pickle', dbname='testing.db')

# sample
mc.sample(1000)

# print stats (no problem here)
print mc.stats()

# close the db
mc.db.close()

# now, let's load the db
db = pm.database.pickle.load('testing.db')

# create a new model
x = pm.Normal('x',0,1)
mc2 = pm.MCMC([x], db=db)

# check that the trace was loaded
print len(mc2.trace('x')[:])

# try to compute stats (which fails)
print mc2.stats()

here is the output that I got:

In [31]: import pymc as pm

In [32]: x = pm.Normal('x',0,1)

In [33]: mc = pm.MCMC([x],db='pickle', dbname='testing.db')

In [34]: mc.sample(1000)
Sampling: 100% [00000000000000000000000000000000000000000000000000000000000000000000000000000000000000] Iterations: 1000

In [35]: print mc.stats()
{'x': {'95% HPD interval': array([-1.88467957, 1.94281242]), 'n': 1000, 'quantiles': {2.5: -1.8828947855723701, 25: -0.68774516402745267, 50: -0.049887820880991689, 75: 0.665653263476219, 97.5: 1.9941128356195819}, 'standard deviation': 0.99104350597342195, 'mc error': 0.032035926857117064, 'mean': -0.030453765602493965}}

In [36]: mc.db.close()

In [37]: db = pm.database.pickle.load('testing.db')

In [38]: x = pm.Normal('x',0,1)

In [39]: mc2 = pm.MCMC([x], db=db)

In [40]: print len(mc2.trace('x')[:])
1000

In [41]: print mc2.stats()
Could not generate output statistics for x
{'x': None}

adaptive metropolis raises an error (cov_from_trace maybe?)

import pymc as pm

x = pm.Normal('x', 0, 1)
y = pm.Normal('y', 0, 1)
z = pm.MCMC([x, y])
z.sample(1000)
z.use_step_method(pm.AdaptiveMetropolis, [x, y])

LinAlgError                               Traceback (most recent call last)

/Users/imrisofer/<ipython console> in <module>()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/MCMC.pyc in use_step_method(self, step_method_class, *args, **kwds)
     88         """
     89 
---> 90         new_method = step_method_class(*args, **kwds)
     91         if self.verbose > 1:
     92             print 'Using step method %s. Stochastics: ' % step_method_class.__name__

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/StepMethods.pyc in __init__(self, stochastic, cov, delay, interval, greedy, shrink_if_necessary, scales, verbose, tally)
    979                 self.C = self.cov_from_value(100.)
    980 
--> 981         self.updateproposal_sd()
    982 
    983         # Keep track of the internal trace length


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/StepMethods.pyc in updateproposal_sd(self)
   1179     def updateproposal_sd(self):
   1180         """Compute the Cholesky decomposition of self.C."""
-> 1181         self.proposal_sd = np.linalg.cholesky(self.C)
   1182 
   1183     def recursive_cov(self, cov, length, mean, chain, scaling=1, epsilon=0):

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in cholesky(a)
    515     """
    516     a, wrap = _makearray(a)
--> 517     _assertRank2(a)
    518     _assertSquareness(a)
    519     t, result_t = _commonType(a)

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/numpy/linalg/linalg.pyc in _assertRank2(*arrays)
    153         if len(a.shape) != 2:
    154             raise LinAlgError, '%d-dimensional array given. Array must be \    
--> 155             two-dimensional' % len(a.shape)
    156 
    157 def _assertSquareness(*arrays):

LinAlgError: 0-dimensional array given. Array must be             two-dimensional

I think the problem is in cov_from_trace.
Mac OS X 10.6.7, pymc 2.2grad, Python 2.7.1

verbose setting is not changed by sample() in multiple chains

When running an MCMC model 2+ times, the verbose setting for sampling is not properly changed after the initial sampling run. So, for example, using verbose=2 causes tuning feedback to be displayed even when subsequent chains are run with a lower verbosity.

Test for importing from source tree masks compiler issues on Windows

A couple of Windows users have reported receiving the "You seem to be importing PyMC from inside its source tree" error despite being outside the source tree. Rather, it appears to be a problem with their gcc compiler not building Container_values. I wonder if there is a better test for the import-from-source-tree issue that does not mask other, unrelated problems.

try:
    import Container_values
    del Container_values
except ImportError:
    raise ImportError, 'You seem to be importing PyMC from inside its source tree. Please change to another directory and try again.'
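One possible refinement, sketched here as an assumption rather than the project's actual fix: keep the underlying ImportError so that compiler problems are not masked by the source-tree message.

import os
try:
    import Container_values
    del Container_values
except ImportError as e:
    raise ImportError("Could not import Container_values (%s). If you are inside the PyMC "
                      "source tree (%s), change directory and try again; otherwise the "
                      "extension may not have been built correctly." % (e, os.getcwd()))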

graph.dag() broken

I am no longer able to generate DAGs for any of 3 models attempted. Fails in the unit test as well. Here is the error reported:

/Users/fonnescj/Code/pymc/pymc/PyMCObjects.pyc in gen_lazy_function(self)
    413
    414         self._jacobians = {}
--> 415         for parameter, function in self._jacobian_functions.iteritems():
    416             lazy_jacobian = LazyFunction(fun = function,
    417                                          arguments = self.parents,

AttributeError: 'NoneType' object has no attribute 'iteritems'

Suggestion: move stats() to Trace or Database

Hi,

Currently, there is no way to print the stats of a loaded database object even though this would be trivial. As it is now, stats() is a method of Node which operates on its trace instance. If the stats method were implemented in the Trace class itself, and Node simply deferred the call, users could have both: stats() on nodes and stats() from loaded database objects.

I would implement that but wanted to make sure that this wouldn't break anything. I think I'd move stats() into base.Trace. Any input?

-Thomas

GPutils.py,l288 in observe: "raise ValueError, "These data seem extremely improbable.."

Hello guys, I really need your help on this one; pymc is critical for my current work. I understand you are busy with other issues, but please give this one a quick read (it might not be so difficult to solve).
I have recently upgraded to Fedora 14 (Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)). Before that, I was using Fedora 12 (Python 2.6.2). After the upgrade I am getting the following error all the time when trying to run the same code as before:

self._gp = _GP.observe( self._mean, self._cov, obs_mesh = self._fiber, obs_vals=numpy.ones( len(self._fiber)), obs_V = numpy.zeros(len(self._fiber) )+self._epsilon )
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/gp/GPutils.py", line 288, in observe
raise ValueError, "These data seem extremely improbable given your GP prior. \n Suggestions: decrease observation precision, or adjust the covariance to \n allow the function to be less smooth."

I am currently using 'pymc-2.1beta' (the same version I was using before the upgrade) and I have also tried 'pymc-devs-pymc-jss-gp-15-gca77b7a', but the same ValueError is raised. I am pretty sure the code is OK, since it is the same code I was running before with no problems (and I can still run it on other computers with Python 2.6)... Could it be that Python 2.7 is not compatible with pymc?

Please, any help would be greatly appreciated.

* The code is a bit long; I can provide it on request.
* Additional information: output of pymc.test():

Python 2.7 (r27:82500, Sep 16 2010, 18:02:00)
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import pymc
>>> pymc.test()
Running unit tests for pymc.tests
NumPy version 1.6.0rc2
NumPy is installed in /user/alvsanch/home/bin/lib64/python2.7/site-packages/numpy
Python version 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)]
nose version 0.11.3

.....................S.............S......S.............................................................................................E.E.SS.....................

ERROR: test_gradients (pymc.tests.test_gradients.test_gradients)

Traceback (most recent call last):
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 281, in test_gradients
check_gradients(norm2)
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 107, in check_gradients
numeric_gradient = get_numeric_gradient(stochastics, s)
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 129, in get_numeric_gradient
delta[unravel_index(i,i_shape)] += e
ValueError: dims must have at least one value

ERROR: test_model (pymc.tests.test_gradients.test_gradients)

Traceback (most recent call last):
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 355, in test_model
check_gradients(model[0])
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 107, in check_gradients
numeric_gradient = get_numeric_gradient(stochastics, s)
File "/user/alvsanch/home/bin/lib64/python2.7/site-packages/pymc/tests/test_gradients.py", line 129, in get_numeric_gradient
delta[unravel_index(i,i_shape)] += e
ValueError: dims must have at least one value


Ran 161 tests in 19.607s

FAILED (SKIP=5, errors=2)
<nose.result.TextTestResult run=161 errors=2 failures=0>

Nominate issues for 2.2 release

I would like to prepare a launch of version 2.2 over the next few weeks. This is a call to nominate issues that users and devs feel should be taken care of prior to this release. If your pet issue is not on the list already, feel free to add it. Keep in mind that this release is primarily intended to be for maintenance, as we can no longer recommend version 2.1 to users.

gradient calculations in fortran sometimes permute results with ndim > 2

I think this is due to a fortran/c array ordering issue.

Example:

In [5]: x = array([[1, 2, 3],
       [4, 5, 6]])

In [7]: reshape(flib.normal_grad_mu(x, zeros(x.shape), 1.0),x.shape)
Out[7]: 
array([[ 1.,  4.,  2.],
       [ 5.,  3.,  6.]])

This to me is another reason why moving to numexpr or similar is a good idea.
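A sketch reproducing the permutation purely with array ordering (this is my reading of the report, not a confirmed diagnosis): flattening in Fortran (column-major) order and then reshaping in C order yields exactly the output shown above.

import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.reshape(x.flatten(order='F'), x.shape))
# [[1 4 2]
#  [5 3 6]]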

hdf5 fails to create table when running on a large model

I'm not sure this is a pymc error, it could be an hdf5 error.
I'm getting it when I'm using MCMC with hdf5 on large models.
(Is there any backend which is recommended for work on large models?)

import pymc as pm
import numpy as np
n = 1500
a = np.empty(n, dtype=object)
for i in range(n):
    a[i] = pm.Normal('a%d' % i,0,1)

m = pm.MCMC(a, db='hdf5', dbname = 'test.db')
m.sample(100)

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/table.py:720: PerformanceWarning: table ``/chain3/PyMCsamples`` is exceeding the recommended maximum number of columns (512); be ready to see PyTables asking for *lots* of memory and possibly slow I/O
  PerformanceWarning )
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
  #000: H5Ddeprec.c line 169 in H5Dcreate1(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: H5Dint.c line 431 in H5D_create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: H5L.c line 1640 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: H5L.c line 1863 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: H5Gtraverse.c line 964 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: H5Gtraverse.c line 759 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: H5L.c line 1686 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: H5O.c line 2976 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: H5Doh.c line 295 in H5O_dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: H5Dint.c line 1039 in H5D_create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: H5Dint.c line 806 in H5D_update_oh_info(): unable to update datatype header message
    major: Dataset
    minor: Unable to initialize object
  #011: H5Omessage.c line 184 in H5O_msg_append_oh(): unable to create new message in header
    major: Attribute
    minor: Unable to insert object
  #012: H5Omessage.c line 224 in H5O_msg_append_real(): unable to create new message
    major: Object header
    minor: No space available for allocation
  #013: H5Omessage.c line 1925 in H5O_msg_alloc(): unable to allocate space for message
    major: Object header
    minor: Unable to initialize object
  #014: H5Oalloc.c line 1135 in H5O_alloc(): object header message is too large
    major: Object header
    minor: Unable to initialize object
---------------------------------------------------------------------------
HDF5ExtError                              Traceback (most recent call last)

/Users/isofer/Projects/Schizo/PySchizo/ in ()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/MCMC.pyc in sample(self, iter, burn, thin, tune_interval, tune_throughout, save_interval, verbose, progress_bar)
    215 
    216         # Run sampler

--> 217         Sampler.sample(self, iter, length, verbose)
    218 
    219     def _loop(self):

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/Model.pyc in sample(self, iter, length, verbose)
    226         if length is None:
    227             length = iter
--> 228         self.db._initialize(self._funs_to_tally, length)
    229 
    230         # Put traces on objects


/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/pymc/database/hdf5.pyc in _initialize(self, funs_to_tally, length)
    406             title='PyMC samples', \
    407             filters=self.filter,
--> 408             expectedrows=length)
    409 
    410         self._tables.append(table)

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/file.py in createTable(self, where, name, description, title, filters, expectedrows, chunkshape, byteorder, createparents)
    772                      description=description, title=title,
    773                      filters=filters, expectedrows=expectedrows,
--> 774                      chunkshape=chunkshape, byteorder=byteorder)
    775 
    776 

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/table.py in __init__(self, parentNode, name, description, title, filters, expectedrows, chunkshape, byteorder, _log)
    590 
    591         super(Table, self).__init__(parentNode, name, new, filters,
--> 592                                     byteorder, _log)
    593 
    594 

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/leaf.py in __init__(self, parentNode, name, new, filters, byteorder, _log)
    289         # is a lazy property that automatically handles their loading.

    290 
--> 291         super(Leaf, self).__init__(parentNode, name, _log)
    292 
    293 

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/node.py in __init__(self, parentNode, name, _log)
    294             #   Create or open the node and get its object ID.

    295             if new:
--> 296                 self._v_objectID = self._g_create()
    297             else:
    298                 self._v_objectID = self._g_open()

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/table.py in _g_create(self)
    745         # set because it is needed for setting attributes afterwards.

    746         self._v_objectID = self._createTable(
--> 747             self._v_new_title, self.filters.complib or '', obversion )
    748         self._v_recarray = None  # not useful anymore
    749         self._rabyteorder = None # not useful anymore

/Library/Frameworks/EPD64.framework/Versions/7.0/lib/python2.7/site-packages/tables/tableExtension.so in tables.tableExtension.Table._createTable (tables/tableExtension.c:2074)()

HDF5ExtError: Problems creating the table

Add multi-processor support

Implement multi-processor support: initially for running multiple chains simultaneously, but it could
eventually share work on the same chain.

See Issue 8 on Google Code for discussion to date.

error when loading flib.pyd

I am working on Windows XP with Python 2.7.
The pymc package was downloaded directly from GitHub:
https://github.com/downloads/pymc-devs/pymc/pymc-2.2alpha.win32-py2.7.exe
but I have a problem when running it.
It says:
" import flib
ImportError: DLL load failed: The specified module could not be found. "

After a check, I find that there are two DLL files missing:
LIBGCC_S_SJLJ-1.DLL and LIBGFORTRAN-3.DLL

Could you please also add these two files to the package? Thank you very much.

Float comparison in categorical

categorical_like checks whether the vector of probabilities sums to one (line 849 in distributions.py) and prints out the vector if the test fails. If the sum is slightly off, this results in massive text output and a slowdown. Should the exact comparison be replaced with an approximate one, e.g. abs(np.sum(p, 1) - 1) < 0.00001?
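A sketch of the suggested tolerance check (np.allclose is one way to express the same thing):

import numpy as np

p = np.array([[0.2, 0.3, 0.5000001]])              # sums to 1 only approximately
print(abs(np.sum(p, 1) - 1) < 0.00001)             # the comparison suggested above -> [ True]
print(np.allclose(p.sum(axis=1), 1.0, atol=1e-5))  # equivalent check -> True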

Separate trace from trace flag

Currently, Variables have a trace attribute that is a flag before sampling, then becomes a reference to an actual trace from the db backend. These should be different attributes.

likelihood function refactoring (numexpr_dist)

For those interested, these are the current timings for the likelihoods I've rewritten for numexpr.

The gist seems to be that at small sizes, numexpr is much slower. At larger sizes it is comparable and occasionally a bit faster. The problem with beta_like is (I think) that it requires numerous calls to gammaln which cannot be directly included in numexpr and must instead be precomputed.

I looked into modifying numexpr to allow user functions, but it looks very nontrivial.

I am additionally experimenting with automatically generating Cython code for the distributions given strings describing the calculation, and that seems somewhat promising. numpy.newiter appears to be coming out soon, and that comes with some relevant iteration improvements.

Given as numexpr_dist time/normal time:

number of elements : [1, 10, 100, 1000, 10000, 100000]
beta_like [ 106.779 55.677 63.131 100.663 421.671 3013.556]
betabin_like [ 52.397 20.327 3.46 0.913 0.824 0.721]
cauchy_like [ 39.158 13.16 14.841 4.276 1.919 1.241]
chi2_like [ 28.572 9.255 5.228 1.697 0.931 0.759]
exponential_like [ 42.131 11.882 7.805 1.144 0.677 0.324]
gamma_like [ 16.13 12.087 7.695 1.901 0.979 0.728]
geometric_like [ 28.92 18.659 16.294 5.197 2.161 1.235]
half_cauchy_like [ 3.568 2.686 1.298 0.991 0.993 0.976]
half_normal_like [ 70.651 33.408 28.134 9.692 2.772 1.295]
inverse_gamma_like [ 35.528 14.217 6.688 1.722 1.027 0.759]
laplace_like [ 5.171 5.481 3.755 1.37 0.658 0.425]
lognormal_like [ 112.921 56.943 27.621 12.079 3.219 2.084]
normal_like [ 60.752 27.092 24.34 9.203 2.694 1.37 ]
t_like [ 28.571 9.71 4.602 1.392 0.952 0.832]
noncentral_t_like [ 22.946 7.275 4.347 1.423 0.976 0.769]
uniform_like [ 31.675 16.325 16.355 20.127 57.899 276.854]
weibull_like [ 71.362 39.418 18.42 4.069 1.639 0.876]

To run these yourselves, run pymc/tests/speed_test.py in numexpr_dist.

naming of numpy-based deterministic functions

Many of the deterministic functions have the same names as numpy functions and builtins. I imagine this can cause problems sometimes; it has already caused one internal problem (with any() conflicting with the builtin any()). The best way to deal with this seems to me to move them into a subpackage, but I am not sure what name it should have. Suggestions?

flib.normal (and normal_like) returns nan instead of -inf

flib.normal (and normal_like) returns nan instead of -inf for very low values, resulting in the sampler crashing. For example: normal_like(-1e99, 1e99, 1e999)

Could be fixed by modifying normal_like:
res = flib.normal(x, mu, tau)
if np.isnan(res): res = -np.Inf
return res
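The same guard wrapped into a helper, as a sketch (the name safe_normal_like is hypothetical; flib is PyMC 2.x's internal Fortran extension module):

import numpy as np
from pymc import flib

def safe_normal_like(x, mu, tau):
    res = flib.normal(x, mu, tau)
    if np.isnan(res):
        res = -np.inf     # guard against nan from extreme arguments
    return res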
