Comments (11)

trappmartin commented on June 1, 2024

Sounds like a very good idea to me.

Indeed, storing the likelihood values is necessary for most kinds of model selection, whether it is via WAIC or the model evidence. I'm not sure about the MLE use case.

But I feel that being able to store the log likelihood values for each observation and each sample would make sense, and it is in line with other PPLs (e.g. Stan has this feature, and I believe PyMC does too). However, I would suggest that we don't store this information by default, as it is only needed in some cases.
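
Concretely, what would be stored is, roughly, a per-sample, per-observation table like the one below (a minimal sketch in Julia with hypothetical names and sizes; this is not an existing DynamicPPL API):

# Hypothetical stored quantity for S MCMC samples and N observations:
# loglik[s, i] = log p(y_i | θ_s), filled in during sampling rather than recomputed later.
S, N = 1_000, 20
loglik = Matrix{Float64}(undef, S, N)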

willtebbutt commented on June 1, 2024

Fair enough, I'm happy now :)

mohamed82008 commented on June 1, 2024

Hmm, storing the likelihood of each observation and each sample will require another refactor of the @preprocess macro in TuringLang/Turing.jl#965, and a few other things. I will do it in another PR, since that one is already quite big.

willtebbutt commented on June 1, 2024

Sounds reasonable, but what do you mean by log likelihood, as opposed to log joint?

mohamed82008 commented on June 1, 2024

log joint = log prior + log likelihood

mohamed82008 commented on June 1, 2024

After TuringLang/Turing.jl#965 we will be able to compute only the likelihood by using the LikelihoodContext, which is useful for MLE, for example. But for MCMC we already compute the likelihood while sampling; we just don't store it. So storing it by default would allow the user to efficiently query the likelihood of the data given each parameter sample without having to recompute it.

willtebbutt commented on June 1, 2024

log joint = log prior + log likelihood

Is the distinction between these things clear in an arbitrary programme?

mohamed82008 commented on June 1, 2024

Hmm, I would define the log likelihood of the data given a set of parameters in an arbitrary Turing probabilistic programme as the logp that is produced by ignoring all the assume statements. That is, every "random variable" (parameter) is fixed to some value and doesn't contribute to the logp; only the observe statements contribute to the logp. From an implementation point of view, this is quite clear. Do you think this is theoretically inaccurate?
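
A minimal Turing-style sketch of that distinction (a hypothetical model, not one from this issue): x is handled by an assume statement, while y, being passed in as data, is handled by an observe statement.

using Turing

@model function demo(y)
    x ~ Normal(0, 1)   # assume: x is a parameter, contributes only to the log prior
    y ~ Normal(x, 1)   # observe: y is data, contributes only to the log likelihood
end

# log prior      = logpdf(Normal(0, 1), x)
# log likelihood = logpdf(Normal(x, 1), y)
# log joint      = log prior + log likelihood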

willtebbutt commented on June 1, 2024

Okay, I see. So as I see it, there are two ways we could enable people to perform MLE (I'm assuming that this is the only real benefit of exposing the likelihood).

  1. The kind of thing that you're suggesting.
  2. Via MAP with (usually) improper priors, i.e. the user specifies that MLE should be performed by picking a model for which MLE === MAP.

The main upside of 1 is that a user can just specify that they want MLE to happen, and something MLE-like will happen without requiring them to modify their model. The benefit of 2 is that you avoid the main technical pitfall of 1: accidentally trying to "optimise" some RVs which have no effect on the likelihood, e.g.

x ~ A
y ~ B(x)
z ~ C(y) # only make observations of this guy

In this model the likelihood is only a function of y, not x, so the values you get for x will be meaningless. It also seems likely that it will make whatever optimiser you are using behave weirdly, but I'm not really sure about that.
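
Made concrete in Turing syntax (a hedged sketch, with A, B and C replaced by Normal distributions purely for illustration):

using Turing

@model function xyz_model(z)
    x ~ Normal(0, 1)    # x never appears in an observe statement
    y ~ Normal(x, 1)
    z ~ Normal(y, 1)    # only z is observed, so only this term enters the likelihood
end

# The likelihood logpdf(Normal(y, 1), z) depends on y alone, so an "MLE" over (x, y)
# leaves x completely unconstrained, which is the pitfall described above.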

I don't think either approach is strictly superior to the other, but I would like to think about this before committing to a design decision.

mohamed82008 commented on June 1, 2024

Besides MLE, computing the likelihood was also required for MLJ integration and for calculating things like LOO and WAIC (cc: @trappmartin and @sethaxen). Computing the log likelihood for a new observation or a new set of parameters is one thing, and it is already possible with TuringLang/Turing.jl#965, but this is not what I am proposing here. What I am proposing is to store the already computed log likelihoods for the MCMC samples, since we already have them. I think @trappmartin or @sethaxen might be able to comment better on whether this is useful or not.
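
For reference, stored per-observation log likelihoods would be consumed roughly like this: a minimal WAIC sketch over a hypothetical S × N matrix loglik with loglik[s, i] = log p(y_i | θ_s). This function is illustrative only, not an existing DynamicPPL API.

using Statistics

function waic(loglik::AbstractMatrix)
    lppd   = sum(log.(mean(exp.(loglik); dims = 1)))  # log pointwise predictive density
    p_waic = sum(var(loglik; dims = 1))               # effective number of parameters
    return -2 * (lppd - p_waic)
end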

yebai commented on June 1, 2024

We now have APIs: logjoint, logprior and loglikelihood.
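
A minimal usage sketch, assuming the exported (model, chain) methods of those three functions:

using Turing

@model function mean_model(y)
    μ ~ Normal(0, 1)
    y ~ Normal(μ, 1)
end

model = mean_model(1.5)
chain = sample(model, NUTS(), 500)

lj = logjoint(model, chain)        # log joint at each sample
lp = logprior(model, chain)        # log prior at each sample
ll = loglikelihood(model, chain)   # log likelihood of the data at each sample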
