Comments (11)

trappmartin commented on June 1, 2024

Sounds like a very good idea to me.

Indeed, storing the likelihood values is necessary for most kinds of model selection, whether it is via WAIC or the model evidence. I'm not sure about the MLE use case.

But I feel that being able to store the log likelihood values for each observation and each sample would make sense, and it is in line with other PPLs (e.g. Stan has this feature, and I believe PyMC does too). However, I would suggest that we don't store this information by default, as it is only needed in some cases.
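
Concretely, what would be stored is, roughly, a per-sample, per-observation table like the one below (a minimal sketch in Julia with hypothetical names and sizes; this is not an existing DynamicPPL API):

# Hypothetical stored quantity for S MCMC samples and N observations:
# loglik[s, i] = log p(y_i | θ_s), filled in during sampling rather than recomputed later.
S, N = 1_000, 20
loglik = Matrix{Float64}(undef, S, N)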

willtebbutt commented on June 1, 2024

Fair enough, I'm happy now :)

mohamed82008 commented on June 1, 2024

Hmm, storing the likelihood of each observation and each sample will require another refactor of the @preprocess macro in TuringLang/Turing.jl#965, and a few other things. I will do it in another PR, since that one is already quite big.

willtebbutt commented on June 1, 2024

Sounds reasonable, but what do you mean by log likelihood, as opposed to log joint?

mohamed82008 commented on June 1, 2024

log joint = log prior + log likelihood

mohamed82008 commented on June 1, 2024

After TuringLang/Turing.jl#965 we will be able to compute only the likelihood by using the LikelihoodContext, which is useful for MLE, for example. But for MCMC we already compute the likelihood while sampling; we just don't store it. So storing it by default would allow the user to efficiently query the likelihood of the data given each parameter sample without having to recompute it.

willtebbutt commented on June 1, 2024

log joint = log prior + log likelihood

Is the distinction between these things clear in an arbitrary programme?

mohamed82008 commented on June 1, 2024

Hmm, I would define the log likelihood of the data given a set of parameters in an arbitrary Turing probabilistic programme as the logp that is produced by ignoring all the assume statements. That is, every "random variable" (parameter) is fixed to some value and doesn't contribute to the logp; only the observe statements contribute to the logp. From an implementation point of view, this is quite clear. Do you think this is theoretically inaccurate?
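
A minimal Turing-style sketch of that distinction (a hypothetical model, not one from this issue): x is handled by an assume statement, while y, being passed in as data, is handled by an observe statement.

using Turing

@model function demo(y)
    x ~ Normal(0, 1)   # assume: x is a parameter, contributes only to the log prior
    y ~ Normal(x, 1)   # observe: y is data, contributes only to the log likelihood
end

# log prior      = logpdf(Normal(0, 1), x)
# log likelihood = logpdf(Normal(x, 1), y)
# log joint      = log prior + log likelihood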

willtebbutt commented on June 1, 2024

Okay, I see. So as I see it, there are two ways we could enable people to perform MLE (I'm assuming that this is the only real benefit of exposing the likelihood).

  1. The kind of thing that you're suggesting.
  2. Via MAP with (usually) improper priors, i.e. the user specifies that MLE should be performed by picking a model for which MLE === MAP.

The main upside of 1 is that a user can just specify that they want MLE to happen, and something MLE-like will happen without requiring them to modify their model. The benefit of 2 is that you avoid the main technical pitfall of 1: accidentally trying to "optimise" some RVs which have no effect on the likelihood, e.g.

x ~ A
y ~ B(x)
z ~ C(y) # only make observations of this guy

In this model the likelihood is only a function of y, not x, so the values you get for x will be meaningless. It also seems likely that it will make whatever optimiser you are using behave weirdly, but I'm not really sure about that.
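
Made concrete in Turing syntax (a hedged sketch, with A, B and C replaced by Normal distributions purely for illustration):

using Turing

@model function xyz_model(z)
    x ~ Normal(0, 1)    # x never appears in an observe statement
    y ~ Normal(x, 1)
    z ~ Normal(y, 1)    # only z is observed, so only this term enters the likelihood
end

# The likelihood logpdf(Normal(y, 1), z) depends on y alone, so an "MLE" over (x, y)
# leaves x completely unconstrained, which is the pitfall described above.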

I don't think either approach is strictly superior to the other, but I would like to think about this before committing to a design decision.

mohamed82008 commented on June 1, 2024

Besides MLE, computing the likelihood was also required for MLJ integration and for calculating things like LOO and WAIC (cc: @trappmartin and @sethaxen). Computing the log likelihood for a new observation or a new set of parameters is one thing, and it is already possible with TuringLang/Turing.jl#965, but this is not what I am proposing here. What I am proposing is to store the already computed log likelihoods for the MCMC samples, since we already have them. I think @trappmartin or @sethaxen might be able to comment better on whether this is useful or not.
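
For reference, stored per-observation log likelihoods would be consumed roughly like this: a minimal WAIC sketch over a hypothetical S × N matrix loglik with loglik[s, i] = log p(y_i | θ_s). This function is illustrative only, not an existing DynamicPPL API.

using Statistics

function waic(loglik::AbstractMatrix)
    lppd   = sum(log.(mean(exp.(loglik); dims = 1)))  # log pointwise predictive density
    p_waic = sum(var(loglik; dims = 1))               # effective number of parameters
    return -2 * (lppd - p_waic)
end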

yebai commented on June 1, 2024

We now have APIs: logjoint, logprior and loglikelihood.
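
A minimal usage sketch, assuming the exported (model, chain) methods of those three functions:

using Turing

@model function mean_model(y)
    μ ~ Normal(0, 1)
    y ~ Normal(μ, 1)
end

model = mean_model(1.5)
chain = sample(model, NUTS(), 500)

lj = logjoint(model, chain)        # log joint at each sample
lp = logprior(model, chain)        # log prior at each sample
ll = loglikelihood(model, chain)   # log likelihood of the data at each sample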
