Comments (11)
Sounds like a very good idea to me.
Indeed, storing the likelihood values is necessary for most kinds of model selection, whether via WAIC or the model evidence. I'm not sure about the MLE use case.
I feel that being able to store the log likelihood values for each observation and each sample would make sense, and it is in line with other PPLs (e.g. Stan has this feature, and I believe PyMC does too). However, I would suggest that we don't store this information by default, as it is only necessary in some cases.
from dynamicppl.jl.
Fair enough, I'm happy now :)
Hmm, storing the likelihood of each observation and each sample will require another refactor of the `@preprocess` macro in TuringLang/Turing.jl#965 and a few other things. I will do it in another PR since that PR is already quite big.
Sounds reasonable, but what do you mean by log likelihood, as opposed to log joint?
log joint = log prior + log likelihood
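The decomposition above can be checked numerically on a toy conjugate model. This is an illustrative Python sketch, not DynamicPPL code: the model (`mu ~ Normal(0, 1)` as prior, `y ~ Normal(mu, 1)` as likelihood) and the helper `log_normal_pdf` are assumptions made up for this example.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution evaluated at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

# Toy model: mu ~ Normal(0, 1) (prior), y ~ Normal(mu, 1) (likelihood).
mu, y = 0.3, 1.0
log_prior = log_normal_pdf(mu, 0.0, 1.0)       # contribution of the assume statement
log_likelihood = log_normal_pdf(y, mu, 1.0)    # contribution of the observe statement
log_joint = log_prior + log_likelihood         # log joint = log prior + log likelihood
```

Storing `log_likelihood` alongside each sample of `mu` is exactly the per-sample quantity being discussed in this thread.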
After TuringLang/Turing.jl#965 we would be able to compute only the likelihood, using the `LikelihoodContext`; this is useful for MLE, for example. But for MCMC we already compute the likelihood while sampling, we just don't store it. So storing it by default will allow the user to efficiently query the likelihood of the data given each parameter sample without having to recompute it.
> log joint = log prior + log likelihood
Is the distinction between these things clear in an arbitrary programme?
Hmm, I would define the log likelihood of the data given a set of parameters in an arbitrary Turing probabilistic programme as the `logp` that is produced by ignoring all the `assume` statements. That is, every "random variable" or parameter is fixed to some value and doesn't contribute to the `logp`; only the `observe` functions contribute to the `logp`. From an implementation point of view, this is quite clear. Do you think this is theoretically inaccurate?
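The "ignore the `assume` statements" semantics can be sketched as a tiny tracing interpreter. This is a hypothetical Python mock-up of the idea, not DynamicPPL's actual implementation; the `Accumulator` class and the hard-coded Normal model are assumptions made for illustration.

```python
import math

def log_normal_pdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

class Accumulator:
    """Accumulates logp; `include_assume` mimics switching between a
    joint context and a likelihood-only context."""
    def __init__(self, include_assume):
        self.include_assume = include_assume
        self.logp = 0.0

    def assume(self, value, mu, sigma):
        # An assume statement: contributes to logp only under the joint context.
        if self.include_assume:
            self.logp += log_normal_pdf(value, mu, sigma)
        return value

    def observe(self, value, mu, sigma):
        # An observe statement: always contributes to logp.
        self.logp += log_normal_pdf(value, mu, sigma)

def model(acc, mu_value, data):
    mu = acc.assume(mu_value, 0.0, 1.0)   # mu ~ Normal(0, 1)
    for y in data:
        acc.observe(y, mu, 1.0)           # y ~ Normal(mu, 1)

joint = Accumulator(include_assume=True)    # log joint
lik = Accumulator(include_assume=False)     # log likelihood only
model(joint, 0.3, [1.0, -0.5])
model(lik, 0.3, [1.0, -0.5])
# The difference joint.logp - lik.logp is exactly the log prior of mu.
```

Running the same programme under both accumulators makes the decomposition explicit: the two traces differ only by the prior terms of the fixed parameters.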
Okay, I see. As I see it, there are two ways we could enable people to perform MLE (I'm assuming that this is the only real benefit of exposing the likelihood):
1. The kind of thing that you're suggesting.
2. Via MAP with (usually) improper priors, i.e. the user specifies that MLE should be performed by picking a model for which MLE === MAP.
The main upside of 1 is that a user can just specify that they want MLE to happen, and something MLE-like will happen without requiring them to modify their model. The benefit of 2 is that you avoid the main technical pitfall of 1: accidentally trying to "optimise" some RVs which have no effect on the likelihood. e.g.
```julia
x ~ A
y ~ B(x)
z ~ C(y) # only make observations of this guy
```
In this model the likelihood is only a function of `y`, not `x`, so the values you get for `x` will be meaningless. It also seems likely that it will make whatever optimiser you are using behave weirdly, but I'm not really sure about that.
I don't think either approach is strictly superior to the other, but I would like to think about this before committing to a design decision.
Besides MLE, computing the likelihood is also required for MLJ integration and for calculating things like LOO and WAIC (cc: @trappmartin and @sethaxen). Computing the log likelihood for a new observation or a new set of parameters is one thing, and is already possible with TuringLang/Turing.jl#965, but this is not what I am proposing here. What I am proposing is to store the already-computed log likelihoods for the MCMC samples, since we already have them. I think @trappmartin or @sethaxen might be able to comment better on whether this is useful or not.
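To illustrate why storing the pointwise log likelihoods is enough for WAIC: given a samples-by-observations matrix of stored values, WAIC is computed entirely from that matrix, with no re-evaluation of the model. This is a stdlib-only Python sketch with made-up numbers, not DynamicPPL code.

```python
import math

# Hypothetical stored pointwise log likelihoods: rows are MCMC samples,
# columns are observations (the matrix this thread proposes to keep).
log_lik = [
    [-1.10, -0.95, -1.30],
    [-1.05, -1.00, -1.25],
    [-1.20, -0.90, -1.40],
]
S = len(log_lik)      # number of samples
N = len(log_lik[0])   # number of observations

def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# WAIC: lppd_i = log(mean_s exp(ll[s, i])); p_waic = sum_i var_s(ll[s, i]);
# waic = -2 * (lppd - p_waic).
lppd = sum(log_mean_exp([log_lik[s][i] for s in range(S)]) for i in range(N))
p_waic = sum(variance([log_lik[s][i] for s in range(S)]) for i in range(N))
waic = -2.0 * (lppd - p_waic)
```

The same stored matrix also feeds PSIS-LOO, which is why exposing it is useful beyond any single criterion.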
We now have APIs: `logjoint`, `logprior` and `loglikelihood`.