
Comments (4)

jluttine commented on June 12, 2024

Thanks for the message and for the very clearly written code example! :) Just a minor change to your code should fix the problem.

To understand the problem, you need to know that BayesPy considers Gaussian variables to have both a shape and plates. shape defines the "shape" of the Gaussian variable itself, while plates defines "repetitions" of that variable: plate axes are independent in the posterior approximation, whereas elements along shape axes remain correlated. In your case, you need to model the correlations on the axes that are summed over, so you want to define those as shape axes and the non-summed axes as plate axes. This may sound a bit complicated, especially if you're not familiar with variational Bayesian methods.

Anyway, this should fix your code:

b = bp.nodes.GaussianARD(0, 1e-3, shape=(n, c))  # use shape, not plates

However, I suppose you don't want to sum over the s axis, so you need to modify the code a bit more to avoid that summing. There is a slight difference from einsum here: the key definition you used would sum over the s axis in SumMultiply. So, if you don't want to sum over that axis, change the code as follows:

x = np.random.normal(0, 1, (s, n, c))   # move the s axis to be the first
b = bp.nodes.GaussianARD(0, 1e-3, shape=(n, c))
f = bp.nodes.SumMultiply('ij,ij', b, x)  # omit the s axis from the keys
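
As a self-contained reference, the fixed model could be wired into a full VB regression roughly like this (the observations, the noise precision tau, and the sizes s, n, c below are hypothetical, following the pattern of BayesPy's linear regression example):

import numpy as np
import bayespy as bp

s, n, c = 50, 4, 3                      # hypothetical sizes
x = np.random.normal(0, 1, (s, n, c))   # s axis first

b = bp.nodes.GaussianARD(0, 1e-3, shape=(n, c))  # regression weights
f = bp.nodes.SumMultiply('ij,ij', b, x)          # f gets plates (s,)

tau = bp.nodes.Gamma(1e-3, 1e-3)                 # noise precision
y = bp.nodes.GaussianARD(f, tau)
y.observe(np.random.normal(0, 1, (s,)))          # hypothetical data

Q = bp.inference.VB(y, b, tau)
Q.update(repeat=100)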

The total shape of a GaussianARD variable is plates + shape, thus the shape axes are the trailing axes. Notice that I just moved the s axis to be the first axis, and the summed axes n and c are the last axes. In addition, SumMultiply can sum only shape axes, not plate axes. I can try to explain the plates/shape distinction in more detail if you want; it can be a bit confusing.
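
To illustrate the difference, here is a small sketch using BayesPy's node introspection (the sizes are arbitrary; the printed values show how plates and shape split the total array shape):

import bayespy as bp

n, c = 4, 3

# shape axes: the (n, c) elements form one jointly Gaussian variable,
# so the posterior keeps their correlations and SumMultiply can sum them.
b_shape = bp.nodes.GaussianARD(0, 1e-3, shape=(n, c))

# plate axes: n*c independent scalar Gaussians; the posterior
# factorizes over them and SumMultiply cannot sum over these axes.
b_plates = bp.nodes.GaussianARD(0, 1e-3, plates=(n, c))

print(b_shape.plates, b_shape.dims[0])    # () (4, 3)
print(b_plates.plates, b_plates.dims[0])  # (4, 3) ()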

Then a general comment about your approach, if you're interested. :) Using a high-order polynomial to gain flexibility isn't the best way to go: you limit the possible functions to only those that can be represented as such polynomials. I would suggest you consider Gaussian processes for the task, if you find yourself comfortable with them. They are an amazing tool for flexible non-linear regression problems using the Bayesian approach. GPs aren't yet implemented in BayesPy, but they are available for instance in GPy (https://github.com/SheffieldML/GPy), pyGPs (https://github.com/marionmari/pyGPs), and a few other toolboxes; a basic example is sketched below. Just a thought. But if you want to do polynomial regression, then your approach with BayesPy is good. :)
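
For reference, a basic GP regression in GPy looks roughly like this (the toy data and kernel choice are just for illustration):

import numpy as np
import GPy

# Toy data (hypothetical): noisy observations of sin(x)
X = np.random.uniform(-3, 3, (20, 1))
y = np.sin(X) + 0.1 * np.random.randn(20, 1)

kernel = GPy.kern.RBF(input_dim=1)
model = GPy.models.GPRegression(X, y, kernel)
model.optimize()  # fit hyperparameters by maximizing the marginal likelihood

mean, var = model.predict(np.linspace(-3, 3, 100)[:, None])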

The regression code may not scale to very large datasets because of how BayesPy handles messages internally, but I hope you won't be affected by that problem. Just letting you know in case you hit it with bigger datasets that you'd expect to be well within computational limits.

Did this help? Please don't hesitate to ask further questions.


azane commented on June 12, 2024

Thanks so much for the prompt and most helpful response! Those additions/changes did the trick, and the high-order polynomial method successfully modeled this round of test data.
I think I understand the plates/shape distinction, at least well enough to refresh my memory next time. Looking back at the linear regression example, I see that shape is very clearly used, and not plates. Oops. : /
I still haven't quite wrapped my mind around Einstein summation in general, so I will have to re-watch that Einstein summation playlist on YouTube. ; )

Your comment on the general approach is most welcome. Per your suggestion, I looked into Gaussian processes, and they look like just the ticket! But I have a question: part of the reason I wanted to use high-order polynomials is that I can easily take their derivatives programmatically. Zooming out even more, I am working on a reinforcement learning model where the learner attempts to climb the gradient to maximize the dependent value, so I'll need the derivative (possibly the second, third, etc.) of the inferred model. So... how difficult is it to compute the gradient of a Gaussian process model?

Thanks again for the help!


jluttine commented on June 12, 2024

About GP derivatives, see section 9.4 in Gaussian Processes for Machine Learning (Rasmussen & Williams), available here: http://www.gaussianprocess.org/gpml/chapters/RW9.pdf. (Great book, I recommend you take a look at it.) In short, "since differentiation is a linear operator, the derivative of a Gaussian process is another Gaussian process". You basically just use the derivative of your covariance function as the covariance function of your derivative. Simple. So you not only get a posterior distribution over functions but also a posterior distribution over the derivative (and even higher-order derivatives). I'm not sure how easily the packages I mentioned support this kind of operation, but in any case it should be relatively easy and straightforward to implement derivatives of covariance functions yourself. Cheers! :)
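
To make that concrete, here is a minimal numpy sketch (independent of the packages above; the squared-exponential kernel, length scale, and toy data are assumptions for illustration). The posterior mean of the derivative is obtained by replacing the cross-covariance k(x*, X) with its derivative with respect to x*:

import numpy as np

def rbf(x1, x2, ell=1.0):
    # Squared-exponential kernel k(x1, x2), shape (len(x1), len(x2))
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / ell ** 2)

def rbf_dx1(x1, x2, ell=1.0):
    # Derivative of the kernel with respect to its first argument
    return -(x1[:, None] - x2[None, :]) / ell ** 2 * rbf(x1, x2, ell)

# Toy data (hypothetical): noisy observations of sin(x)
X = np.linspace(0, 2 * np.pi, 20)
y = np.sin(X) + 0.1 * np.random.randn(20)

K = rbf(X, X) + 0.1 ** 2 * np.eye(len(X))  # add noise variance on the diagonal
alpha = np.linalg.solve(K, y)              # K^{-1} y, shared by both predictions

Xs = np.linspace(0, 2 * np.pi, 100)
mean = rbf(Xs, X) @ alpha        # posterior mean of f(x*)
dmean = rbf_dx1(Xs, X) @ alpha   # posterior mean of f'(x*), roughly cos(x*)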


azane commented on June 12, 2024

Wow! Thanks for the above-and-beyond assistance! That is a great resource saying exactly what I wanted to hear; looks like I have my work cut out for me.
Thanks for pointing me in the right direction. : )

