Comments (35)
Here is a summary of some proposals in this issue (biased towards personal aesthetic):

- `logp(m::Model, v::VarInfo, s::Selector=undef)` computes the log density/mass for model `m` at value `v::VarInfo`.
- `Sampler(m, alg)` creates a sampler for model `m` using `alg` as the sampling method.
- `rand(s::Sampler(m, alg), N)` produces a `Vector{<:VarInfo}` of length `N`.
- `rand(s::Sampler(m, SampleFromPrior()), N)` produces a `Vector{<:VarInfo}` of length `N`, by sampling from the prior distribution while ignoring all observations (including something like `5 ~ Normal(0, 1)`).
- `sample(m, alg; N=N, args...)` provides an alias to `rand(Sampler(model, alg), N, args...)`.
- `v = rand(m::Model, N=1)` is an alias to `v = rand(Sampler(m, SampleFromPrior()), N)`.
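A minimal usage sketch of how these proposed entry points might compose; the `demo(2.0)` model call and the `HMC(0.1, 5)` settings are placeholders for illustration, not part of the proposal:

```julia
# Hypothetical usage of the proposed interface (names as proposed above).
m   = demo(2.0)                        # a compiled Model
spl = Sampler(m, HMC(0.1, 5))          # sampler for model `m` using HMC
vs  = rand(spl, 1_000)                 # Vector{<:VarInfo} of length 1000
lp  = logp(m, vs[end])                 # log joint density/mass at the last draw

# Sampling from the prior ignores all observe statements:
prior_draws = rand(Sampler(m, SampleFromPrior()), 1_000)
```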
This would provide a relatively well-defined interface between the compiler and inference methods (e.g. variational methods and sampling methods). For variational methods, the most important API is the `logp` function and its gradients; for sampling methods, both the `logp` and `rand` functions are necessary. This would allow us to clean up the code in many samplers' implementations, such as the examples below:
- `runmodel!(m::Model, v::VarInfo, spl::AbstractSampler)` ==> `logp(m::Model, v::VarInfo)`
  https://github.com/TuringLang/Turing.jl/blob/6d48b96b03617f4410fd90becd7fb3a6c0d7e7e8/src/core/VarReplay.jl#L106
- `v = model(v::VarInfo, s::AbstractSampler)` ==> `v = rand(Sampler(m, alg), 1)`
  https://github.com/TuringLang/Turing.jl/blob/bedbb5a10d3da794957d64677f0fcfa9542ac830/src/inference/hmc.jl#L133
- Introduce an intermediate algorithm type `AbstractRunner`:
  - `type SampleFromPrior <: AbstractRunner end`
  - similarly, `ComputeLogDensity`, `SampleFromUniform`, `ParticleFiltering`
- Define `assume` and `observe` functions for `Runner` types; this is helpful for sharing code among (high-level) inference algorithms, e.g. HMC, Gibbs, VI.
  https://github.com/TuringLang/Turing.jl/blob/6d48b96b03617f4410fd90becd7fb3a6c0d7e7e8/src/Turing.jl#L65
A long-term goal is to build up a clear separation between the compiler and inference methods (see TuringLang/Turing.jl#456), such that inference methods can be implemented in separate packages with a dependency on Turing.
Anything missing?

---

> then we should take these anonymous random variables more seriously and treat them more like named random-variables.

I don't foresee a problem handling these. I can try a proof-of-concept after the `VarInfo` refactor. Perhaps @trappmartin can also help with this after he reads the new compiler code.

---

> However, I'm not sure how inference with VI would fit into those syntaxes to stay consistent without having too many entry points for the user.
My thinking here is that VI works fine with the `rand(sampler(model, alg), N)` design, where the `alg` parameter would be a representation of the approximate posterior distribution. Semantically this reads as: draw `N` samples from the posterior distribution over the random variables in `model` (each of which should be a `VarInfo`?) using `alg`. This works for both VI and MCMC, so it provides a single point of entry for both kinds of samplers.
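To make that concrete, here is a hypothetical sketch; the `ADVI` constructor and its arguments are assumptions for illustration, not an existing entry point:

```julia
# One entry point for both families: `alg` is either an MCMC algorithm or a
# representation of the approximate posterior for VI.
mcmc_draws = rand(sampler(model, HMC(0.1, 5)), 100)     # MCMC
vi_draws   = rand(sampler(model, ADVI(10, 1000)), 100)  # VI (hypothetical ADVI settings)
```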
> Though, I believe it would be good if we keep the sample function. It's semantically more clear to me what this does and that I can expect it to generate samples using a MCMC sampler.
Again, my feeling here is that we wouldn't need the `sample` function, as the creation of the sampler, e.g. `rand(sampler(model, HMC(...)), N)`, makes it quite clear that we're drawing `N` samples from `model` using the specified `HMC` algorithm. Although I could live with the `sample` function being an alias for `rand`, I suspect that `rand` is more consistent with existing stuff.
> I'm pretty much in favour of the vi = rand(Sampler(model)) and the logpdf(model, vi) concept.
As am I :)
A minor stylistic thing, @yebai: what would your thoughts be on making the "correct" thing to do be `sampler(model, alg)` rather than `Sampler(model, alg)`? (At least to me) the latter says "make me an object of type `Sampler`", whereas the former says "make me a thing that can generate samples from `model` using `alg`", which may or may not be an object of type `Sampler`. The distinction is important because the `Sampler` option commits us to always producing an object of type `Sampler` if we want to produce samples, whereas the former allows us to spit out whatever object we like, provided that it can be used to generate samples via `rand`.
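To illustrate the distinction, a lowercase `sampler` function could default to constructing a `Sampler` while remaining free to return something else; the variational types below are made up purely for illustration:

```julia
# Default: just construct a Sampler.
sampler(m::Model, alg) = Sampler(m, alg)

# Hypothetical: a VI algorithm might prefer to return its own state object,
# as long as that object supports `rand`.
sampler(m::Model, alg::MyVariationalAlg) = MyVariationalState(m, alg)
```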

---

How about `AbstractRunner` -> `AbstractTilde`?

---

I don't know that we actually need a supertype here. I mean, is there going to be a fallback that can be applied to `Abstract<Thing>`? If not, we can just create concrete types as required and avoid the hassle of having to choose a name for an abstract type, and forcing things to inherit from it.

---

Let's stick with the `AbstractRunner` type for now and change it to something else when we have better ideas.

---

@mohamed82008 I think this is a good idea. Thanks! I'll take a closer look into this asap.

---

Is there anything here that has not already been implemented in the meantime, or that couldn't be moved to the discussions at AbstractPPL?

---

Nothing that I know of. Please feel free to close, and/or move some useful bits to `AbstractPPL`.

---

@willtebbutt @xukai92 how about we always drop all observe statements in `gdemo()()`, i.e. we always sample all variables from the prior, even for the following case raised by @willtebbutt (slightly modified):

```julia
@model gdemo(y) = begin
    5 ~ Normal(0, 1) # rv1
    y ~ Normal(0, 1) # rv2
end

gdemo()()  # sample both `rv1` and `rv2` from the prior, ignoring `rv1 = 5`
gdemo(2)() # sample from the posterior
```

This leads to consistent behaviour and should be intuitive too.

---

I see. But what I mean by this issue is to actually have some concrete functions generated by evaluating `gdemo()` (I think @willtebbutt also proposed this some time ago).

Current behaviour:

```julia
@model gdemo(y) = begin
    5 ~ Normal(0, 1) # rv1
    y ~ Normal(0, 1) # rv2
end

mf = gdemo()
mf() # => sampling from the prior
```

But instead we could have:

```julia
@model gdemo(y) = begin
    5 ~ Normal(0, 1) # rv1
    y ~ Normal(0, 1) # rv2
end

mf = gdemo(???) # this returns mf::UniqueRuntimeModelType
# and creates some function definitions:
logpdf(::UniqueRuntimeModelType, vi) = ... # (1) return logpdf by evaluating using vi
rand(::UniqueRuntimeModelType)   = ...     # (2) gdemo()() in your example
sample(::UniqueRuntimeModelType) = ...     # (3) gdemo(2)() in your example
```

I guess it could also potentially give more space for performance, because during `sample` we are explicitly using the specific version of `logpdf` and `rand`.

---

@yebai your proposal has the issue that the behaviour of the following programme becomes ambiguous:

```julia
@model demo() = begin
    5 ~ Normal(0, 1)
end

demo() # what does this do?
```

This makes me nervous. On the one hand, this programme is weird; on the other, it's a valid Turing programme, so we should ensure that the behaviour is well-defined. My proposal below will mostly resolve this.

@xukai92 I completely agree regarding the format of `logpdf`. Regarding `rand` and `sample`: certainly we could define one to sample from the prior and the other the posterior, but this wouldn't be particularly intuitive (there's nothing in the names to suggest the proposed behaviour). Instead, I propose the following (a short usage sketch follows the list):

- Don't define `sample`, just use `rand` and `logpdf` (or maybe `logp`, since `logpdf` isn't defined for all models supported by Turing?).
- The current proposal regarding `gdemo(missing)` to indicate that the prior should be used for `y` seems sensible to me.
- We implement `gdemo(Prior())` to indicate that all random variables should be sampled from the prior. In this case, both `rv1` and `rv2` would be sampled from the prior.
- `rand(mf)` produces a `VarInfo`.
- `rand(mf, N)` produces a `Vector{<:VarInfo}` of length `N`.
- `logpdf(mf, vi)` works as proposed by @xukai92.
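Concretely, from the user's side this might look like the following; the `Prior()` marker is the proposal above, not an existing API, and `gdemo` is the example model from earlier in the thread:

```julia
mf_post  = gdemo(2.0)      # condition on y = 2.0
mf_y     = gdemo(missing)  # y is drawn from its prior; `5 ~ Normal(0, 1)` stays an observation
mf_prior = gdemo(Prior())  # every random variable, including rv1, is drawn from the prior

vi  = rand(mf_prior)       # a single VarInfo
vis = rand(mf_post, 100)   # Vector{<:VarInfo} of length 100
lp  = logpdf(mf_post, vi)  # log density of the values stored in vi
```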

---

> On the one hand, this programme is weird, on the other it's a valid Turing programme, so we should ensure that the behaviour is well-defined.

We can give an error if the model has no variables.

> UniqueRuntimeModelType

This can be just `const PriorModel{pvars, F} = Model{pvars, Tuple{}, F, NamedTuple{(), Tuple{}}}`, which means no data, all parameters. For this case, no special sampling alg is required. More generally, however, we can make use of the existing `Sampler` type, to which we can add `model` (and `vi`). Then we can simply do `rand(Sampler(model, alg))`, or for models without parameters, `rand(Sampler(model))`.

Btw, after the new refactor, if you call `gdemo()`, it will give you a `PriorModel{pvars, F}`, with `pvars` as `Tuple{:y}`.

---

> We can give an error if the model has no variables.

Agreed, but this programme definitely has a random variable. It's anonymous, but it's definitely there, and if we define sample-from-prior functionality, then presumably something should happen.

> Btw, after the new refactor, if you call gdemo(), it will give you a PriorModel{pvars, F}, with pvars as Tuple{:y}.

This is probably fine for now, but I'm not sure it's appropriate long-term, for the same reasons as above.

> This can be just const PriorModel{pvars, F} = Model{pvars, Tuple{}, F, NamedTuple{(), Tuple{}}} which means no data, all parameters. For this case, no special sampling alg is required. More generally however, we can make use of the existing Sampler type, which we can add model (and vi) to. Then we can simply do: rand(Sampler(model, alg)) or for models without parameters, it can be rand(Sampler(model)).

I like this. We would have to think carefully about how `rand(Sampler(model, alg))` would work for MCMC though. We should also probably implement a few convenience functions as well, i.e. `rand(model) = rand(Sampler(model))` and similar; a sketch is below.
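Something along these lines, where everything ultimately funnels through the `Sampler`-based interface (a sketch, not a committed signature):

```julia
# Possible convenience layer over the Sampler-based interface.
Base.rand(m::Model)                  = rand(Sampler(m))
Base.rand(m::Model, N::Integer)      = rand(Sampler(m), N)
Base.rand(m::Model, alg, N::Integer) = rand(Sampler(m, alg), N)
```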

---

> It's anonymous, but it's definitely there, and if we define sample-from-prior functionality, then presumably something should happen.

Is there a practical application of sampling this anonymous random variable, or is it just for theoretical appeal? Note that, implementation-wise, there is nothing random about a model with no explicit random variables; every time you run the model, you will get exactly the same thing, hence my difficulty digesting this anonymous random variable concept. I can see how `5` plays the role of data. So perhaps, more generally, this falls under likelihood models, where the only sensible thing to do with it, that I can think of, is to calculate the likelihood.

> We should also probably implement a few convenience functions as well i.e. rand(model) = rand(Sampler(model)) and similar.

Sounds good.

---

Note that currently we just ignore any numbers or arrays on the LHS of a `~`. If you want, we can assign an anonymous `dvar` to it instead and assign the value on the LHS of the `~` to this anonymous `dvar`. This would probably be more theoretically accurate. And it allows us to view the model with `5 ~ Normal(0, 1)` only as a `LikelihoodModel` with 1 `dvar`.

Edit: by ignore I mean that we don't treat it as an explicit variable, but it still affects the `logp` calculation.

---

> Is there a practical application of sampling this anonymous random variable, or is it just for theoretical appeal?

I'm just looking for consistency. I don't have a particular application in mind :)

> Note that implementation-wise, there is nothing random about a model with no explicit random variables; every time you run the model, you will get exactly the same thing.

Not if you run the model in sample-from-prior mode.

> If you want, we can assign an anonymous dvar to it instead and assign the value on the LHS of the ~ to this anonymous dvar. This would probably be more theoretically accurate. And it allows us to view the model with 5 ~ Normal(0, 1) only as a LikelihoodModel with 1 dvar.

This general kind of thing would be a good change. If the model is constructed in the regular way, it should be a `dvar` as you suggest, but if the model is run in sample-from-prior mode, then it should presumably magically become a `pvar`.

---

> Not if you run the model in sample-from-prior mode.

```julia
@model demo() = begin
    5 ~ Normal(0, 1)
end

model = demo()
logp = model(Turing.VarInfo(), Turing.SampleFromPrior())
all(isequal(logp), (model(Turing.VarInfo(), Turing.SampleFromPrior()) for i in 1:100))
```

`vi` is empty in every case, with a `logp` only.

---

> then it should presumably magically become a pvar.

This should be totally possible when constructing the `Sampler`.

---

Sorry, I should have been clearer. I was referring to what I think should be going on rather than what actually happens at the minute. You're totally correct regarding the current behaviour. It's my feeling that if we implement a sample-from-prior mode, then we should take these anonymous random variables more seriously and treat them more like named random variables.

---

Thanks for cc'ing me. I'm pretty much in favour of the `vi = rand(Sampler(model))` and the `logpdf(model, vi)` concept. Though, I believe it would be good if we keep the `sample` function. It's semantically clearer to me what this does and that I can expect it to generate samples using an MCMC sampler. I would not expect it to draw from the prior as `rand` does.

However, I'm not sure how inference with VI would fit into those syntaxes to stay consistent without having too many entry points for the user.

---

Reading through your explanation of `sampler` vs. `Sampler`, I feel `sampler` is better, although in most cases (at this moment) `sampler` will return a `Sampler` to us. Also, I think using a function rather than calling the type constructor directly is more standard Julia style.

---

> v = rand(m::Model, N=1) is an alias to v = rand(Sampler(m, SampleFromPrior()), N)

How about enabling this alias with support for a sampling algorithm as well:

```julia
rand(m::Model, N=1; alg=SampleFromPrior()) = rand(Sampler(m, alg), N)
```

---

I suggest that during model compilation we generate 2 functions inside `Model`:

- One for the log likelihood, which replaces all the `a ~ dist` lines with `a ~ Fixed()`, where the `assume` of `Fixed` returns the variable value in `vi` and 0 for the logpdf. This is useful for maximum likelihood estimation (a rough sketch of such a `Fixed` marker follows below).
- The second is the logpdf of the joint probability distribution that we have now. This is to be used for MAP and inference.
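A rough sketch of what such a `Fixed` marker could look like; the `assume` signature below is simplified and is not Turing's internal one:

```julia
# Marker type meaning "don't resample this variable, don't add it to the density".
struct Fixed end

# `assume` returns the value already stored in `vi` for this variable name and
# contributes 0 to the log density, so only the observe statements remain,
# i.e. the log likelihood.
function assume(::Fixed, dist, vn, vi)
    return vi[vn], 0.0
end
```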

---

@yebai I had some time to read through your proposal. It sounds quite reasonable to me. However, I have a few questions / remarks.

- What is the purpose of the `Selector` in the `logp` function? Do we need that?

In favour of what Mohamed suggested and the latest discussions on model criticism, it might be good to have:

- `logjoint(m::Model, v::VarInfo, s::Selector=undef)` == `logp`
- `logpdf(m::Model, v::VarInfo, s::Selector=undef)`
- `logppd(m::Model, v::Vector{VarInfo}, s::Selector=undef)`

As proposed by Mohamed, we could do this by generating the respective functions in the model macro.

---

Could someone please link the relevant discussions here?

@trappmartin could you please explain the above proposal in a bit more depth? What are the differences between each of the functions? What are `logjoin` and `logppd` short for? Thanks!

@mohamed82008 could you please expand on your proposal? It's not obvious to me why we need a separate model construction to perform maximum likelihood, or when we would actually be interested in performing maximum likelihood in our context: if the user has gone to the trouble of specifying priors over parameters, why would they wish to ignore them?

---

`logjoint` = log p(\theta, x), i.e. the log of the joint probability function of the model.

`logppd` = \sum_{i=1}^N \log (1/T) \sum_{t=1}^T p(x_i | \theta_t), i.e. the log pointwise predictive density function.
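As a concrete reading of that formula, here is a small sketch in Julia; the `loglik(x, θ)` helper, returning log p(x | θ) for a single observation, is a hypothetical stand-in:

```julia
# Log pointwise predictive density over observations `xs`, given posterior
# draws `thetas` (θ_1, ..., θ_T) and a per-observation log likelihood function.
function logppd(xs, thetas, loglik)
    T = length(thetas)
    return sum(log(sum(exp(loglik(x, θ)) for θ in thetas) / T) for x in xs)
end
```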

---

I don't think we want maximum likelihood. But I agree it would be good to provide an interface to compute the log likelihood.

---

Sorry, I'm not familiar with the `logppd`; could you explain why it's useful?

> logjoin = log p(\theta, x), i.e. the log of the joint probability function of the model.

If we're going to rename `logp` to something, could we please write `logjoint`? For the cost of an extra character it's quite a bit less ambiguous haha

---

Uh, sorry that was a typo. I meant `logjoint`. :D

The log pointwise predictive density is useful for model criticism and model selection.

---

> Uh, sorry that was a typo. I meant logjoint. :D

haha fair enough. Glad we're on the same page with this.

> The log pointwise predictive density is useful for model criticism and model selection.

Is there a good reference for this, or is this something that should be obvious?

---

> @mohamed82008 could you please expand on your proposal? It's not obvious to me why we need a separate model construction to perform maximum likelihood / when we would actually be interested in performing maximum likelihood in our context: if the user has gone to the trouble to specify priors over parameters, why would they wish to ignore them?

I just thought it would be cool to compare ML, MAP, VI and MCMC without changing the model definition! But @trappmartin has a lot more interesting use cases :)

---

> What is the purpose of the Selector in the logp function? Do we need that?

You're right: `Selector` here is not necessary. This might be taken from some interface discussed in https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-44965726, i.e.

```julia
# s::Sampler ==> s::Selector
logp = model(Turing.VarInfo(), s::Sampler)
```

Since `logp` will always compute the log joint probability in the new design, we don't really need `s::Selector` to select subsets of random variables.

> I suggest during the model compilation we generate 2 functions inside Model:
> One for the log likelihood which replaces all the a ~ dist lines with a ~ Fixed() where the assume of Fixed returns the variable value in vi and 0 for the logpdf. This is useful for maximum likelihood estimation.
> The second is the logpdf of the joint probability distribution that we have now. This is to be used for MAP and inference.

> I just thought it would be cool to compare ML, MAP, VI and MCMC without changing the model definition! But @trappmartin has a lot more interesting use cases :)

@mohamed82008 I'm not sure I understand the point here. For MAP and VI based learning, all we need is `logp` and its gradient. We don't need to modify the compiler.

> logjoint = log p(\theta, x), i.e. the log of the joint probability function of the model.
> logppd = \sum_i^N log 1/T \sum_t^T p(x_i | \theta_t), i.e. the log pointwise predictive density function.
> See the paper by A. Gelman, J. Hwang, and A. Vehtari on Understanding predictive information criteria for Bayesian models for example.

It seems `logjoint` is the same as `logp`?

I agree it's helpful to have some support for model evaluation and criticism, but I prefer to keep the standard API between compiler and inference minimal. These additional features can be supported in a separate module (e.g. `MCMCChains`). In fact, it appears that `logppd` can be implemented in the form of a new `Runner` or even an `InferenceAlgorithm`.

---

> It seems logjoint is the same as logp?

Yes, it seems more telling to me than `logp`.

> I agree it's helpful to have some support for model evaluation and criticism, but I prefer to keep the standard API between compiler and inference minimal. These additional features can be supported in a separate module. In fact, it appears that logppd can be implemented in the form of a new Runner or even InferenceAlgorithm.

Good point. I'll try to do so, as it seems relevant for several people.