turinglang / dynamicppl.jl Goto Github PK
View Code? Open in Web Editor NEWImplementation of domain-specific language (DSL) for dynamic probabilistic programming
Home Page: https://turinglang.org/DynamicPPL.jl/
License: MIT License
Implementation of domain-specific language (DSL) for dynamic probabilistic programming
Home Page: https://turinglang.org/DynamicPPL.jl/
License: MIT License
@JuliaRegistrator register()
I would like to propose the following change in VarInfo
. I would like the logp
field to be called logjoint
and to have another field loglikelihood
that caches the log likelihood by default and is then returned by the Chain
. What do you think?
Currently we have 2 arguments for changing the behavior of assume
and observe
, context
and sampler
. I think:
Sampler
belongs in a field in DefaultContext
, perhaps renaming it to SamplingContext
,SampleFromPrior
should always overwrite the values in VarInfo
much like SampleFromUniform
in #78, andComputeLogpContext
can be another context which encompasses LikelihoodContext
, PriorContext
and the current behavior of DefaultContext
with SampleFromPrior
.I always found it confusing the SampleFromPrior
doesn't actually sample anything if VarInfo
already has values.
MEW:
using Random: AbstractRNG
using Distributions: DiscreteMultivariateDistribution
import Distributions: rand, logpdf
struct DSL <: DiscreteMultivariateDistribution
end
function logpdf(dsl::DSL, dsl_numeric::AbstractVector{Int})
return sum([log(0.5) * 0.5^i for i in 1:length(dsl_numeric)])
end
function rand(rng::AbstractRNG, dsl::DSL)
fst = rand(rng, [0, 1])
dsl_numeric = [fst]
while rand() < 0.5
push!(dsl_numeric, rand(rng, [0, 1]))
end
return dsl_numeric
end
using Turing
@model function mwe()
dsl ~ DSL()
end
chain = sample(mwe(), PG(10), 500)
gives
Stacktrace in the failed task:
tried to assign 2 elements to 7 destinations
throw_setindex_mismatch(::Array{Int64,1}, ::Tuple{Int64}) at indices.jl:191
setindex_shape_check at indices.jl:242 [inlined]
setindex! at array.jl:850 [inlined]
setval!(::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::Array{Int64,1}, ::DynamicPPL.VarName{:dsl}) at varinfo.jl:284
setindex!(::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::Array{Int64,1}, ::DynamicPPL.VarName{:dsl}) at varinfo.jl:939
assume(::DynamicPPL.Sampler{PG{()},Turing.Inference.PGState{DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},Float64}}, ::DSL, ::DynamicPPL.VarName{:dsl}, ::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}) at AdvancedSMC.jl:272
_tilde at context_implementations.jl:39 [inlined]
tilde at context_implementations.jl:18 [inlined]
macro expansion at compiler.jl:348 [inlined]
macro expansion at In[4]:9 [inlined]
##inner_function#340#6 at compiler.jl:493 [inlined]
#_#5 at model.jl:24 [inlined]
Model at model.jl:24 [inlined]
Model at model.jl:23 [inlined]
#3 at container.jl:37 [inlined]
(::Libtask.var"#1#2"{Turing.Core.var"#3#4"{DynamicPPL.Model{var"##inner_function#340#6",NamedTuple{(),Tuple{}},DynamicPPL.ModelGen{(),var"###mwe#348",NamedTuple{(),Tuple{}}},Val{()}},DynamicPPL.Sampler{PG{()},Turing.Inference.PGState{DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},Float64}},DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}})() at taskcopy.jl:38
Soss has a pretty string representation of a model that I think would be nice to have in DPPL. We can store a string in the Model
struct with the string representation of the model.
Discussed here.
tilde_assume
and tilde_observe
currently fall back to different methods of tilde
/dot_tilde
, which are distinguished by their arguments. And these methods in turn call the assume
and observe
methods from the inference algorithm. Some inference algorithms also (or instead) overload methods of tilde
directly.
Until now the tilde_{observe,assume}
methods are only an internal wrapper and work backwards compatibly.
I propose
assume
, observe
, and dot_{assume,observe}
to be overloaded and used by inference algorithms, andtilde_{assume,observe}
and dot_tilde_{assume,observe}
for internal usage (unpacking stuff, and unifying the return values).Is the following behaviour really intended?
julia> vi
/=======================================================================
| VarInfo
|-----------------------------------------------------------------------
| Varnames : ["p"]
| Range : UnitRange{Int64}[1:1]
| Vals : Real[0.9574412442107834]
| GIDs : Set{DynamicPPL.Selector}[Set([])]
| Orders : [0]
| Logp : -0.043490923558696964
| #produce : 1
| flags : Dict("del" => [0],"trans" => [0])
\=======================================================================
julia> vi[VarName{:p}("")]
0.9574412442107834
julia> vi[VarName{:p}("")] = 0.5 # why not allow this?
ERROR: ArgumentError: indexed assignment with a single value to many locations is not supported; perhaps use broadcasting `.=` instead?
Stacktrace:
[1] setindex_shape_check(::Float64, ::Int64) at ./indices.jl:234
[2] macro expansion at ./multidimensional.jl:722 [inlined]
[3] _unsafe_setindex!(::IndexLinear, ::Array{Real,1}, ::Float64, ::UnitRange{Int64}) at ./multidimensional.jl:717
[4] _setindex! at ./multidimensional.jl:712 [inlined]
[5] setindex! at ./abstractarray.jl:1074 [inlined]
[6] setval!(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::Float64, ::VarName{:p}) at /home/philipp/.julia/packages/DynamicPPL/PppLw/src/varinfo.jl:311
[7] setindex!(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::Float64, ::VarName{:p}) at /home/philipp/.julia/packages/DynamicPPL/PppLw/src/varinfo.jl:966
[8] top-level scope at REPL[36]:1
julia> vi[VarName{:p}("")] = [0.5]
1-element Array{Float64,1}:
0.5
using Turing
@model naive_bayes(image, label, D, N, C, ::Type{T}=Float64) where {T<:Real} = begin
m = Matrix{T}(undef, D, C)
for c = 1:C
m[:,c] ~ MvNormal(fill(0, D), 10)
end
# m .~ Normal(0, 10) # this gives incorrect answer
for n = 1:N
image[:,n] ~ MvNormal(m[:,label[n]], 1)
end
end
Checking answer via
using MLDatasets
image_raw = reshape(MNIST.traintensor(Float64), 28*28, :)
label = MNIST.trainlabels() .+ 1
# Pre-processing
using MultivariateStats
D_pca = 40
pca = fit(PCA, image_raw; maxoutdim=D_pca)
image = transform(pca, image_raw)
@info "Peformed PCA to reduce the dimension to $D_pca"
# Data function
get_data(n=100) = Dict(
"C" => 10,
"D" => D_pca,
"N" => n,
"image" => image[:,1:n],
"label" => label[1:n]
)
data = get_data()
model = naive_bayes(data["image"], data["label"], data["D"], data["N"], data["C"])
alg = HMC(0.1, 4)
n_samples = 2_000
chain = sample(model, alg, n_samples)
m_data = Array(group(chain, :m))
m_bayes = mean(
map(
i -> reconstruct(pca, Matrix{Float64}(reshape(m_data[i,:], D_pca, 10))),
1_000:100:2_000
)
)
using PyPlot
function make_imggrid(imgs, n_rows, n_cols; gap::Int=1)
n = length(imgs)
d_row, d_col = size(first(imgs))
imggrid = 0.5 * ones(n_rows * (d_row + gap) + gap, n_cols * (d_col + gap) + gap)
i = 1
for row = 1:n_rows, col = 1:n_cols
if i <= n
i_row = (row - 1) * (d_row + gap) + 1
i_col = (col - 1) * (d_col + gap) + 1
imggrid[i_row+1:i_row+d_row,i_col+1:i_col+d_col] .= imgs[i]
else
break
end
i += 1
end
return imggrid
end
imgs = [reshape(m_bayes[:,i], 28, 28)' for i in 1:size(m_bayes, 2)]
plt.figure(figsize=(5, 2))
plt.imshow(make_imggrid(imgs, 2, 5), cmap="gray")
It should give something like below:
Edit (05/2021): Updated example
One optimization that Turing lacks for Gibbs samplers is the ability to re-use computations that only depend on parameters that are not updated in the current Gibbs inner iteration. This can be an expensive operation that only needs to be performed once every outer iteration.
To implement the above, the following steps can be taken:
vi.metadata.param.vals
for every parameter param
.var
s "inflicted" by this number type. Keep only the isbits
of them.isbits
, non-local variables var
s and the parameters param
s.cache::NamedTuple
of all the recorded variables var
s mapping from their symbol to a Ref
wrapper of their value.Cached
that closes over that named tuple. The function Cached
maps from sub_spl::Sampler
to a tuple of all the variables var
s that do not depend on any param ∈ getspace(sub_spl)
. This closure acts like a compile-time dictionary.Cached
closure and cache
NamedTuple
get stored in the StaticModel
type which replaces the generic Model
type.staticmodel(vi, sub_spl)
:cache = staticmode.cache
at the beginning followed by var = cache.var[]
for all the var
s in cache
, these are all isbits
.var = ...
for every var
in Cached(sub_spl)
var = ....
for every var
in cache
but not in Cached(sub_spl)
with var = ....; cache.var[] = value(var)
, where value(var)
calls ForwardDiff.value(var)
for Dual
s for example and similarly for Tracker
.To ensure the validity of the above transformation, one would need to prohibit functions that modify their arguments to be used in this form var = f(args...)
. f(args...)
can still be used in its own line without assigning to var
.
Another proposal is to make the user opt-in for some variables to be cached using @cache var = f(args...)
. Using this, the user guarantees that:
var
is not modified by f
or any other function. var
being isbits
is a sufficient but not necessary condition for that.f(args...)
without any side-effects from not modifying one of args
when we should have.var
is not a local variable in a loop which needs to change in each iteration so caching here is just wrong.I like this proposal better as it is inherently safer to use since the user opts-in knowing there could be bad consequences if @cache
is misused. This proposal will require modifying step 1, 3 and 8 above to only include variable symbols and var = ...
lines that the user has annotated with @cache
. If we allow var
to be non-isbits
, we need to also filter the lines var[i] = ...
and the likes in step 8b above.
This is a tentative plan to get some comments.
There are some big changes in the current master branch of DynamicPPL
, and some more planned; these changes likely will take some time to finish and stabilise. In this process, we might want to make minor releases to fix bugs or bump support for new dependent libraries (e.g. AbstractMCMC).
Based on these requirements, I suggest that we create a dev
branch and keep breaking changes in the dev
branch while keeping the master
branch always release ready.
@devmotion @phipsgabler @mohamed82008 Any thoughts?
This test (https://github.com/TuringLang/DynamicPPL.jl/blob/master/test/compiler.jl#L333) fails with Turing master. I am not sure why.
We should later support something like this.
@model a(...) = begin
# code
end
@model b(...) = begin
x ~ a(...)
# other code
end
The following model doesn't work on master but works on the latest release:
julia> @model demo(z) = begin
m ~ Normal()
z[1:end] ~ MvNormal(fill(m, length(z)), 1.0)
end
DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}(##generator#510, NamedTuple())
julia> sample(demo(rand(3)), HMC(0.1, 1), 100)
ERROR: UndefVarError: end not defined
Stacktrace:
[1] macro expansion at .\REPL[2]:3 [inlined]
[2] ##evaluator#509(::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::DynamicPPL.SampleFromPrior, ::DynamicPPL.DefaultContext) at C:\Users\user\.julia\dev\DynamicPPL\src\compiler.jl:447
[3] Model at C:\Users\user\.julia\dev\DynamicPPL\src\model.jl:118 [inlined]
[4] runmodel! at C:\Users\user\.julia\dev\DynamicPPL\src\model.jl:138 [inlined]
[5] VarInfo at C:\Users\user\.julia\dev\DynamicPPL\src\varinfo.jl:110 [inlined]
[6] VarInfo at C:\Users\user\.julia\dev\DynamicPPL\src\varinfo.jl:109 [inlined]
[7] DynamicPPL.Sampler(::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::DynamicPPL.Selector) at C:\Users\user\.julia\dev\Turing\src\inference\hmc.jl:304
[8] Sampler at C:\Users\user\.julia\dev\Turing\src\inference\hmc.jl:302 [inlined]
[9] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::Int64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:151
[10] sample at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:151 [inlined]
[11] #sample#1 at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:141 [inlined]
[12] sample(::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::Int64) at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:141
[13] top-level scope at REPL[3]:1
I find the following syntax for modifying log density a bit confusing since it reads like an operation on the macro @logpdf
@logpdf() += logsumexp(gamma)
Maybe change it to something easier to read, e.g.
@inc_logpdf(logsumexp(gamma))
In models with a small vector parameter with a fixed length, using a StaticArray
for the vector parameter can lead to a significant speedup. The following is a proof of concept:
using Turing, StaticArrays, Setfield, MacroTools, LinearAlgebra
macro immutable(expr)
esc(MacroTools.postwalk(expr) do x
if @capture(x, L_[i_] = R_)
:($L = Setfield.@set $L[$i] = $R)
else
x
end
end)
end
d = 10
alg = HMC(0.1, 1)
alg = IS()
# x isa `Vector` of `SVector`s and `y` is a `Vector`
@model demo1(X, y, ::Type{T} = Float64) where {T} = begin
a = similar(zero(eltype(X)), T)
for i in 1:length(a)
@immutable(a[i] ~ Normal())
end
sigma ~ Beta()
for i in 1:length(y)
y[i] ~ Normal(dot(a, X[i]), sigma)
end
end
X1 = [@SVector rand(d) for i in 1:1000]
y1 = rand(1000)
model1 = demo1(X1, y1)
sample(model1, alg, 50000);
# X isa `Matrix` and `y` is a `Vector`
@model demo2(X, y, ::Type{T} = Float64) where {T} = begin
a = similar(X, T, size(X, 2))
for i in 1:length(a)
a[i] ~ Normal()
end
means = X * a
sigma ~ Beta()
for i in 1:length(y)
y[i] ~ Normal(means[i], sigma)
end
end
X2 = rand(1000, d)
y2 = rand(1000)
model2 = demo2(X2, y2)
sample(model2, alg, 50000);
I noticed a speedup of 20-50% from using a static vector depending on d
and the algorithm used. Would be nice to formalize this some more and officially add it to DynamicPPL.
This issue contains my thoughts and comments from working with VarInfo
a lot more during the course of TuringLang/Turing.jl#793. My experience is that VarInfo
is somewhat easy to use once you get over the very steep learning curve, but that learning curve can be a powerful deterrent to development from outside folks.
I make some strong statements here to encourage discussion. I'm not trying to bash anyone's superb contributions (particularly @mohamed82008's great work on VarInfo
), I just want to see if I can provoke some high-level thinking about VarInfo
without thinking too much about what it is right now. Try to keep the context of this discussion about what VarInfo
could be and not what it is now.
VarInfo
anywhereI think we see VarInfo
too much. If I'm a non-Turing person and I'm building some kind of inference tool, I don't want to learn about our arcane system for managing variables. I just want to manipulate parameters, draw from priors, etc. Many of our functions should probably never have a do_something(vi, spl)
signature -- we should find ways to handle everything on the back end without anyone worrying about how to use VarInfo
. A better way would be to have the VarInfo
stored somewhere in a shared state or tied to a specific model.
I can imagine a case where VarInfo
is stored in some environment variable or state variable or something, and the sampler or model might just have a location to go look at where the VarInfo
is. Then you could just call logp(model)
and by default it would calculate the log probability using whatever the current state is. If you really wanted to, you could pass in a VarInfo
and work with a specific one if you're doing a lot of numerical work and such, but I think for almost all cases VarInfo
could sit far away and never be thought about.
An alternative fix would be to just have a very small handful of variables that are dead-simple to use an understand. See TuringLang/Turing.jl#886 for a better discussion.
update!(vi, new_vals)
should update the parameters.parameters(vi)
should get the current parameterization in a NamedTuple
or Dict
format.logp(vi, model)
should give you a log probability, no questions asked and no hassle.priors(vi)
should give you a NamedTuple
or Dict
of prior distributions to draw from.priors!(vi, new_priors)
should set my priors to whatever the new distributions are.VarInfo
is ultimately my biggest issue with Turing's internals. I understand why we need it and it is a masterful work of engineering, but from a usability side it is a disaster, particularly if our goal is to have a high degree of ease-of-use for inference designers.
If you asked me a question on how to do something with a VarInfo
right now, chances are very good it would take me more than an hour to think about what it is that VarInfo
is, what it does, and where in the source code I might find an answer. Add another half hour because whatever it was I though VarInfo
was is not true.
VarInfo
live?I'm not sure where the VarInfo
should go. I don't think it should be a free-floating entity like it has been in Turing's past, and I'm also not convinced that it's attachment to the sampler state as in TuringLang/Turing.jl#793 is correct either.
Is VarInfo
more a function of the model, or of the sampler? If it's more specific to the model, shouldn't we store it there? I don't really know. If it's in the model, then it's quite nice to use for non-MCMC methods, since nobody would have to add VarInfo
to their method -- they can just call the model's version. Ultimately the VarInfo
is constructed from the model, and the samplers just reference it. Right now I'm learning towards moving VarInfo
over to the model, but I'm open to discussion on that.
A downside to putting it on the model side is that it becomes harder to build new modeling tools on top of Turing, but easier to build inference methods. I think it's a trade-off that's worth considering.
Sampler.info
fieldBuild a VarInfoFlags
struct that handles all the various switches and gizmos and whatever that VarInfo
uses. Currently, all the Sampler
s have a dictionary called info
in them which will no longer be used on the inference side after TuringLang/Turing.jl#793. It'd be nice if we could remove the field entirely and separate the VarInfo
flags from the Sampler
either by storing the flags in the VarInfo
itself, or at least removing the dictionary by just storing the flags with the sampler.
This is really more mechanical than goal-oriented, and it's just something I or someone else might need to apply some elbow grease to.
It would be nice to create a new struct VariableSpace
and use that instead of Set
for Sampler space
. We can then overload Base.in
to check if vn
is inside space
including the case when space
is empty. We can overload in
now with space::Set
but that can cause confusion with the empty space
case.
When running one of my models on Julia 1.4 I ran into the following issue:
ERROR: LoadError: ArgumentError: Union{} does not have elements
Stacktrace:
[1] eltype(::Type{Union{}}) at ./array.jl:124
[2] eltype(::DynamicPPL.VarInfo{NamedTuple{..., ::DynamicPPL.SampleFromUniform) at /homes/tef30/.julia/packages/DynamicPPL/3jy49/src/varinfo.jl:912
I don't have time to come up with an MWE right now, but will try do so when I have time.
At the moment, all random variables in Turing models are declared through the tilde notation (∼
). This is also true for variables that may dynamically appear and/or disappear during runtime (i.e. varying dimensional models). This design causes many headaches for creating a consistent, intuitive scheme to re-name variables in VarInfo
.
Perhaps we should reconsider this issue and its solution. For example, if we introduce an additional assume operator ⩪
(dotsim), and let
∼
declares variables which always exist during runtime (and only once)⩪
declares variables which may dynamically exist during runtime (and/or more than once)Then, for variables declared by (1), we always use the variable name (maybe plus some additional statically generated identifier) as their internal index in VarInfo
. While, for variables declared by (2), we explicitly require the user to pass in a unique identifier through some API.
This way, we maintain 99% per cent of the convenience of automatically generating variable names, but still, allow the user to write models with dynamic dimensionality and/or stochastic control flow.
Related:
We now have changed a couple of internal things (a possibly incomplete list):
VarName
using tuples, and the interface @varname
, @vinds
, @vsym
@model
macroin
for spaces is inspace
(althogh there's a plan to replace space tuples by their own data structure)tilde
differentiated out into tilde_observe
and tilde_assume
(resp. for dotted variants); inference algorithms can still use the old tilde
form as fallbacks, but should be aware of this (and now also observe has access to variable name)Type{Model}
and Model
have somewhat changed (includes removing runmodel!
and how a VarInfo
is filled)=
, i.e., their result is the observed/assumed value.@varinfo
, @logpdf
, and @sampler
#71These should all be updated in some place of the docs. Some in the Model Internals section, and the rest on the compiler design page.
And the "Advanced Usages" page on the Turing wiki is somehow gone? Does anyone know more about that?
Directly copied from discussion in TuringLang/Turing.jl#965 to avoid side-tracking the PR:
I'm trying to work through this compiler change and have some thoughts on VarInfo that I would appreciate some feedback on. I've also left some naming-related comments that I believe would be helpful to address for readability's sake.
Regarding the internals of VarInfo / MetaData, there are a lot of things where you need to pull elements out of a field of metadata based on metadata.idcs[vn] e.g.
md.ranges[md.idcs[vn]]]
md.orders[md.idcs[vn]]
md.gids[md.idcs[vn]]
md.dists[md.idcs[vn]]
It's also the case that everything is linearised when running the model. While it's clearly necessary to be able to linearise the state of the model at some point for the current implementation of HMC, it's not clear to me that it need to be the default mode of operation, and appears to add considerable complexity to the internals.
As such I would like to propose the following design change for discussion (perhaps a separate issue is required - again, advice appreciated):
introduce a new type Variable with fields val, gid (possibly not this? Presumably it really belongs in the gibbs sampler object), name, dist, and any other information that needs to be associated with a particular random variable on a particular run. The key point here is that information specific to a particular RV is stored with a particular RV, so there would be no need to linearise the data into a vector or construct an index map from location(s) in vector to random variable.
refactor VarInfo to contain a Vector (or other container, obviously one that is amenable to Zygote-based AD and type-stability) of Variables, in addition to logp and num_produce.
This would:
enhance readability and reduce mental overhead (imho)
enable better mixing of types. Currently if the values of every variable have the same type (usually Float64) the vals vector is very simple. The proposed design avoids the need to store all of the things in one monolithic vector, thus avoiding the need for a widely typed Vector of values.
separate out linearisation of model state from running of the model. We would implement a separate function, call it to_vec, which would efficiently convert a VarInfo instance into a vector of numbers (potentially of a number of different types, or perhaps to a single mutually acceptable type), in addition to a function that converts a vector of the same length as the one produced into a VarInfo object that is identical to the current VarInfo object in every way other than the vals of the Variables. Separating things out like this has the distinct advantage that if linearisation is not required for a particular algorithm (e.g. MH, SMC, Gibbs sampling, particle MCMC etc) then linearisation never needs to happen. Conversely, if linearisation is required then it can still be achieved efficiently. Moreover, I anticipate linearisation becoming a less important feature over time as the AD systems develop and begin to rely on ChainRules / ChainRulesCore, which play nicely with interestingly-structured data types. It's also worth noting that indexing, especially scalar, is a particularly inefficient operation for most reverse-mode AD systems (inc. Zygote, Tracker) and this is unlikely to fundamentally change any time soon. As such, avoiding the conversion between linear Vector and structured data may well be desirable from a performance perspective as it is indexing-heavy.
create opportunities for mixing static and dynamic structure. It seems like this aught to make mixing of static and dynamic model components more straightforward, but this is more of a hunch than anything else as I'm not sure what that mixing really looks like at the minute.
Choice over the type of the Variable "vector". It could actually be a Vector, but this doesn't seem likely to lead to type stable code. Alternatively the variables could be stored in a NamedTuple if we're able to enumerate all possible variable names that could crop up in a programme from the programme code itself (this seems feasible to me, but I might well be missing something). If there exists an example where this doesn't hold (i.e. the number of possible variable names is dynamic -- it's fine for some of them not to exist dynamically) that would be very much appreciated.
Would appreciate any feedback. If I'm missing something glaringly obvious here, then possibly I'm missing some general piece of understanding regarding the internals.
I think tilde
and dot_tilde
belong here, with all their methods. Barebone assume
, dot_assume
, observe
and dot_observe
can also be defined in DynamicPPL and overloaded in Turing.Inference
. Also all uses of VarInfo
and VarName
in Turing.Inference
should be limited to a one-line function call with an intuitive name. This should limit the interaction of Turing.Inference
and DynamicPPL
.
Hi
Firstly thanks for your work on this project!
I'd like to suggest some changes to the compiler, specifically the additional use of MLStyle.jl for AST manipulation. Currently lots of the AST manipulation is done using MacroTools and indexing with if else. MacroTools by itself can be a bit low level creating tricky to understand code. MLStyle provides the @match
macro which simplifies manipulating ASTs and makes the transformations very explicit.
Here's a short example
getargs_dottilde(x) = nothing
function getargs_dottilde(expr::Expr)
# Check if the expression is of the form `L .~ R`.
if Meta.isexpr(expr, :call, 3) && expr.args[1] === :.~
return expr.args[2], expr.args[3]
end
# Check if the expression is of the form `(~).(L, R)`.
if Meta.isexpr(expr, :., 2) && expr.args[1] === :~ &&
Meta.isexpr(expr.args[2], :tuple, 2)
return expr.args[2].args[1], expr.args[2].args[2]
end
return
end
To
function getargs_dottilde(expr)
@match expr begin
Expr(:call, :.~, lexpr, rexpr) => (lexpr, rexpr)
Expr(:., :~, Expr(:tuple, lexpr, rexpr)) => (lexpr, rexpr)
Expr(:macrocall, _ , linenumber, innerexpr) => nothing
_ => nothing
end
end
What's particularly nice is that the matching expression is almost identical to the form printed by show_sexpr
, this takes a lot of the mental work out of manipulating ASTs.
:((~).(x, Normal(μ, σ))) |> show_sexpr
# (:., :~, (:tuple, :x, (:call, :Normal, :μ, :σ)))
Is matched by
Expr(:., :~, Expr(:tuple, lexpr, rexpr)) => (lexpr, rexpr)
A current attempt can be found in this fork all the test are passing, look at build_model_info
for a longer example. My hope would be using this style could open up other possibilities like more sophisticated syntax or better error messages etc. If you'd be interested in a such a a change I'd be happy to help.
The problem with all the tests depending on Turing is that if a breaking change happens in DynamicPPL, there will be no version of Turing that supports this new change. So we will have to:
This essentially renders the whole CI testing process useless, because I will be developing a version of DynamicPPL and another of Turing side by side on my machine and testing them together locally. Then the Travis tests will be ignored and I will only be trusting my local tests. The solution here is to have good DynamicPPL tests independent of Turing. This may mean implementing a minimal version of Turing inside the testing suite of DynamicPPL for use in testing.
Currently, the logp accumulation in an @model
block is not thread-safe. It would be nice to have a different logp accumulator in the VarInfo
for each thread, separated by a cache line in memory to avoid false sharing. The final result can then be calculated by a single thread when getlogp
is called. This would allow the user to write multi-threaded code involving ~
in the model for iid variables for instance.
@trappmartin raised an issue in TuringLang/Turing.jl#997 about what happens when a variable, e.g. x
below, is defined in chain
but is also passed in explicitly. We can either throw an error (my favorite) or over-write the value from chain
with the value passed in explicitly.
using Turing
@model demo(y, ::Type{T} = Float64) where {T} = begin
x = Vector{T}(undef, 10)
x .~ Normal()
y .~ Normal.(x, 1.0)
end
chain = sample(demo(rand(10)), HMC(0.64, 1), 1000)
# log likelihood
logprob"y = rand(10) | model = demo, chain = chain, x = rand(10)"
Could we add the mentioned convenience functions to DynamicPPL?
So after type stability issues are taken care of, one more optimization that can be worked on is how to best access fields in the VarInfo
struct? The main set of optimizations I can think of make the assumption that:
This unlocks a whole set of possible optimizations for most if not all samplers, definitely HMC:
StaticArrays.MArray
for the fields of TypedVarInfo
since it won't need to grow in size.idcs
dictionary which looks up the variable by its VarName
. Instead, we can sort the ranges
and vals
vectors by the access order of the parameters and only keep a counter for each symbol that gives us the index in ranges
which gives us the index range in vals
. This counter can be incremented inline in the model when expanding the @model
macro. This has 2 effects. Firstly, we avoid the overhead of looking up the index in idcs
and secondly we make sure the random variables are accessed in a contiguous manner in memory. This will be faster since TypedVarInfo
is typically heap-allocated.With the 2 optimizations in-place and a whole bunch of inlining, it may be possible for the Julia compiler to avoid allocating any memory on the heap for TypedVarInfo
! The Julia compiler is good at eliding allocations when mutable objects only live within a single function body. This means if we make sure all the functions which use vi::TypedVarInfo
are properly inlined, the compiler may be able to elide these allocations entirely. Unfortunately, this is not true for Base.Array
since these are implemented in C
so they have special semantics.
This optimization promises significant speedups for small and medium-sized, static, determinisitic models.
It would be nice to allow arbitrary types in UntypedVarInfo
instead of just Real
s. This should allow Turing to be used with distributions that sample letters, words, trees, images, etc. without having to represent them as an array of reals. The conversion to TypedVarInfo
should work fine whatever type is in UntypedVarInfo
so this may be as simple as changing Real
to Any
in a bunch of places.
This is an umbrella issue for models with dynamic dimensionality (or stochastic control flows).
Stochastic control flow related:
This is a summary issue of something floating around for quite some time.
We would like to be able to do this:
@model test(x) = begin
m1 ~ Normal()
@track m2 = f(m1) # some differentiable transformation
x ~ Normal(m2, 1.0)
end
The resulting chain should contain states for m1, m2
respectively.
Comments by @mohamed82008 how to approach this:
VarInfo
currently implements and exports a large number of APIs, which sometimes can be redundant, confusing and hard to maintain. It is helpful to have a clear boundary what is internal to VarInfo
and what can be reliably called by external modules such as sampling algorithms and the compiler itself.
See https://github.com/TuringLang/Turing.jl/issues/634#issuecomment-471339521 for a summary.
I am really really in favor of splitting the gigantic varinfo.jl file into multiple files. It is currently doing a whole lot of things revolving around VarInfo
but I think it can be managed better by splitting it into multiple files, most of which are relatively self-contained and do specific tasks.
For instance, the getindex
, setindex!
and push!
code is almost 400 lines of code that all attempt to define these 3 functions only. Everything else is helper code. If we put these 400 lines in indexing.jl
, we know that if we suspect a bug has something to do with indexing, that it's probably somewhere in this 400 line file which is a relatively self-contained rather than in a 1100 loc file. That makes development and debugging VarInfo
-related code easier at least for me. Even if a bug has something to do with an interaction of 2 files. It is easier for me to inspect the 2 files simultaneously and understand what's happening by switching windows or tabs, than it would be if I had to scroll up then scroll down again repeatedly.
What do you all think?
The NamedDist
approach is awkward for naming variables and doesn't fully support .~
. It would be nice to support custom names using the more straightforward syntax @rename(x ~ dist, name)
or @rename(x .~ dist, name)
.
In the MWE below, Inf
is treated as a variable rather than the literal Inf
.
using Turing
@model mwe() = begin
m ~ Normal(0, 1)
Inf ~ Normal(m, 1)
end
sample(mwe(), HMC(0.1, 2), 100)
I am opening this issue to propose a way to treat Turing models as distributions. After TuringLang/Turing.jl#997, we will have a nice syntax to query the logpdf
of a NamedTuple
of symbols given another NamedTuple
of symbols, e.g. logprob"x = [1, 2] | y = [1, 1], model = mymodel"
. This takes care of the logpdf
function in the Distributions
API. So if we can also have a way to sample a NamedTuple
from the probability distribution P(x | y= [1, 1], model = mymodel)
and the likes, we can define a simple syntax that generates such distribution structs from Turing models, something like prob"x | y = [1, 1], model = mymodel"
. Defining the posterior predictive distribution from there is also somewhat easy. The question is how to sample NamedTuple
s from Turing models.
Let the inner function called when calling an instance of the Turing.Model
be:
function inner_function(vi, sampler, ctx, model)
#function body
end
Let's assume we have random variables s
and m
in the model. The idea I initially had was to replace the above expression with:
function inner_function(vi, sampler, ctx, model)
if ctx isa DistributionContext
s = missing
m = missing
end
@inline function f()
#function body
end
model_output = f()
if ctx isa DistributionContext
return (s = s, m = m)
else
return model_output
end
end
Calling the model
in the DistributionContext
will therefore return us the NamedTuple
of random variables sampled. The DistributionContext
can wrap another context and forward the tilde
and dot_tilde
functions to the context wrapped. This will give us a way to return the sampled random variables from the model. If a random variable wasn't sampled, missing
will be returned. If a vector or matrix random variable can be partially sampled, i.e. the number of random variables is itself random, we can tell the users to initialize such variables using:
x = Vector{Union{Missing, T}}(undef, 10)`
for example inside the model body. The unsampled random variables will therefore be missing
. These semantics for "sampling from Turing models" seem reasonable to me even in the case when the number of random variables is itself random.
Given that we inline f
, I thought the Julia compiler should be able to optimize out s = missing
and m = missing
if s
and m
were defined inside #function body
, because the compiler does optimize these away when I don't use a closure. Unfortunately, the following minimal example doesn't infer properly:
function f(a)
# initialize
if a isa Int
b = missing
end
@inline function g()
# model body starts
b = [1, 1]
# Can have more stuff here and multiple return statements in branches
return 2*b
# model body ends
end
out = g() # overwrite `b` if `a isa Int`
# Return a namedtuple or the output of the model depending on the type of the input `a` (or ctx in the real example)
if a isa Int
return (b = b,)
else
return out
end
end
@code_warntype f(1) # f(1.0)
and according to Keno Fischer on Slack, this is "a limitation that needs to be addressed properly at some point". The problem is basically the closure.
So my alternative solution is to do:
function inner_function(vi, sampler, ctx, model)
if ctx isa DistributionContext
s = missing
m = missing
end
#function body
model_output = nothing
@label end_of_func
if ctx isa DistributionContext
return (s = s, m = m)
else
return model_output
end
end
and inside #function body
, we replace every return ...
statement with:
model_output = ...
@goto end_of_func
and every return
statement with:
@goto end_of_func
This eliminates the use of a closure and therefore should infer properly. At least, the following minimal example does:
function f(a)
# initialize
if a isa Int
b = missing
end
# model body starts
# can have arbitrary code here with multiple return points, some or all the random variables can be defined
b = [1, 1]
if sum(b) > 0.5
out = 2*b
@goto end_of_func
else
out = 3*b
@goto end_of_func
end
# model body ends
out = nothing # the function output is nothing if we made it here
@label end_of_func
# Return a namedtuple or the output of the model depending on the type of the input `a` (or ctx in the real example)
if a isa Int
return (b = b,)
else
return out
end
end
@code_warntype f(1)
It would be nice to allow users to use $
around some expression (e.g. a macro call) to avoid any fiddling with it on the DPPL side. This for example allows users to make use of their own DSL inside of a @model
block. This can include their own ~
and .~
and what not that we use in DPPL but they can have different semantics inside their block of code.
Currently, a model
@model mymodel(y) = begin
# model code
end
expands to about this (things ending in _1
are gensym
ed):
function mymodel_1(y)
function inner_function(vi, spl, ctx, model)
local y
# transformed model code
end
return Model(inner_function, (y = y,), ModelGen{(:y,)}(mymodel_1, NamedTuple()))
end
mymodel = ModelGen{(:y,)}(mymodel_1, NamedTuple())
Would it be feasible to make the inner function a custom closure being a subtype of a common abstract base, like the following:
struct Mymodel_Function_1 <: AbstractModelFunction end
const inner_function = Mymodel_Function_1()
function (::Mymodel_Function_1)(vi, spl, ctx, model)
local y
# transformed model code
end
function mymodel_1(y)
return Model(inner_function, (y = y,), ModelGen{(:y,)}(mymodel_1, NamedTuple()))
end
mymodel = ModelGen{(:y,)}(mymodel_1, NamedTuple())
where AbstractModelFunction <: Function
.
The reason is that I would like to dispatch on the call of the model function with IRTracker, and inner_function
being an anonymous closure makes this a bit difficult. I don't see anything that would speak against it -- if you agree, I'll make a PR.
Since this makes things even more layered, some further speculation: what complications are there that make something like the following version difficult?
struct Mymodel_1 <: Model{(:y,)}
y
missings
end
function (model::Mymodel_1)(vi, spl, ctx)
local y
# transformed model code
end
function mymodel(y)
return Mymodel_1(y, getmissings(y))
end
using Turing
@model mwe(::Type{T}=Float64) where {T<:Real} = begin
m = Matrix{T}(undef, 2, 3)
m .~ MvNormal(zeros(2), 1)
end
sample(mwe(), HMC(0.2, 4), 1_000)
gives
MethodError: no method matching dim(::MvNormal{Float64,PDMats.ScalMat{Float64},Array{Float64,1}})
Closest candidates are:
dim(!Matched::PDMats.PDMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdmat.jl:30
dim(!Matched::PDMats.PDSparseMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdsparsemat.jl:30
dim(!Matched::PDMats.PDiagMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdiagmat.jl:26
...
High dimensional Gaussian in Turing is slow (> 1 hour):
using Turing
@model high_dim_gauss(D) = begin
m ~ MvNormal(fill(0, D), 1)
end
chain = sample(high_dim_gauss(10_000), HMC(0.1, 4), 2_000, progress_style=:plain)
Corresponding AHMC + Tracker impl. is fast (~ 5.5 seconds):
using Distributions, Tracker, AdvancedHMC
D = 10_000
ℓπ(x) = logpdf(MvNormal(fill(0, D), 1), x)
function ∂ℓπ∂θ(x)
y, back = Tracker.forward(ℓπ, x)
return (Tracker.data(y), Tracker.data(back(1)[1]))
end
metric = UnitEuclideanMetric(D)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)
samples, stats = sample(h, StaticTrajectory(Leapfrog(0.1), 4), rand(D), 2_000; verbose=false, progress=false)
Currently, there are essentially two methods of tilde
(resp. dot_tilde
) -- one for assume
, one for observe
. These are only distinguished by the assume
variant being passed the VarName
(and indices).
Can we just have different functions for them, like tilde_observe
and tilde_assume
, with both taking the VarName
? The motivation is that I want to use IRTracker for dependency analysis, for which having the name would really be a nice (i.e., necessary) thing.
The name of an observed variable would just be ignored in all existing implementations, but who knows, maybe it can become useful. I think both methods could actually fall back to tilde
for transition. Do any other packages besides Turing itself use them at all?
BTW: what exactly is the reason that assume
/observe
are not directly produced from the model macro?
Model using :
below is poorly supported to fetch variables from Chain
.
@model ldamodel(K, V, M, N, w, doc, alpha, β) = begin
theta = Matrix{Real}(M, K)
for m = 1:M
theta[m,:] ~ Dirichlet(alpha)
end
phi = Matrix{Real}(K, V)
for k = 1:K
phi[k,:] ~ Dirichlet(β)
end
# z = tzeros(Int, N)
# for n = 1:N
# z[n] ~ Categorical(theta[doc[n]])
# end
for n = 1:N
w[n] ~ Categorical(phi' * theta[doc[n],:])
end
end
Have to write variables in type of nest arrays at this moment.
Currently, when a user defines a model
@model function test(x, y)
x ~ Normal(0, 1)
y ~ Normal(x, 1)
end
then we define both test(x, y) = ModelGen(....)
and test(; x, y) = test(x, y)
. Since we do not define an arbitrary macro syntax but parse Julia functions, we should maybe stick more to the Julia syntax for functions and respect the user-provided function signatures, i.e., only define the function test(x, y)
if the user only provides keyword arguments. If the user wants a function with keyword arguments, she would have to write
@model function test(; x, y)
x ~ Normal(0, 1)
y ~ Normal(x, 1)
end
instead. Users could still provide default arguments for both positional and keyword arguments but the behaviour would be the same as for a regular Julia function - keyword arguments you can specify in any order but positional arguments require a user to specify arguments in a specific order. Internally we would still keep a named tuple of all arguments (positional + keyword arguments), so the change would only be user-facing by not providing more definitions of test
than what the user wrote down explicitly. IMO that would be less surprising for users since @model
would not change anymore how the function signature is interpreted.
@mohamed82008 mentioned on Slack that this might cause problems with the string macros (logpdf"...."
etc.). Can you elaborate what you had in mind? I assumed it would not affect any functionality but just require you to specify test(missing, missing)
explicitly if x
and y
are not observed.
@trappmartin brought up an issue in TuringLang/Turing.jl#997 about the syntax for pre-allocating varinfo
. After the PR is merged, we can do:
using Turing
@model demo(y, ::Type{T} = Float64) where {T} = begin
x = Vector{T}(undef, 10)
x .~ Normal()
y .~ Normal.(x, 1.0)
end
xval = rand(10)
varinfo = Turing.VarInfo(demo(xval))
logprob"y = rand(10) | model = demo, x = xval, varinfo = varinfo"
Pre-allocating is useful if we need to change xval
or y
in a loop and re-evaluate. Pre-allocating varinfo
can be useful in these cases. However, as @trappmartin indicated, the syntax seems to imply we are conditioning on varinfo
which we are not. One solution proposed in the PR was to use ;
instead of ,
before varinfo
. Another one that I prefer is to not expose this in the macro at all, and only allow it in the functional API under the hood.
This is a unsolved point mentioned in TuringLang/Turing.jl#96
Forbid variables with same name, with the following exceptions
Is there a particular reason strings are used, as opposed to some structured type?
What kinds of indexing are admissible, anyway? Would an equivalent of Tuple{Vararg{Union{Int, Colon, UnitRange{Int}}}}
suffice? Like, what @vinds
is already doing?
While working on #115, I came to realize that we may be too harsh in our criticism of VarInfo's complexity. This issue tries to explain some of the design choices we have in varinfo.jl which should spiral into a discussion on how to improve things. This issue came to being after a discussion with Hong on slack regarding #118 which I am still very much in favor of.
Function redundancy is a big concern you may have if you have played around with varinfo.jl for a while. The main one I can think of is getval
vs getindex
and setval!
vs setindex!
. The way I think about it is that getval
and setval!
get and set the vectorized version of the value. So getval
returns a vector and setval!
expects a vector in even if the value was originally a scalar or matrix. getindex
and setindex!
on the other hand should reconstruct the shape of the original variable, and link
/invlink
the value based on whether the VarInfo
is working in terms of the transformed or original values. Currently, setindex!
doesn't do exactly that and but it was "fixed" in #115 to do exactly the above. So r = vi[vn]
(or vi[vn, dist]
in the PR) will always return a value in the domain of vn
's distribution with the right shape and vi[vn] = r
expects r
to be in the domain of the distribution and in its natural shape.
Another redundancy that we have comes from the need for r = vi[vn]
and vi[vn] = r
but also r = vi[spl]
and vi[spl] = r
. The first 2 are used in a model call, while the other 2 are used in the step!
function. These need to be defined for UntypedVarInfo
as well as TypedVarInfo
. But imo not much can be done about the need for those functions. VarInfo
needs to define those functions one way or another because they are the main API of VarInfo
.
Other than the above redundancy, we have many getter and setter functions in varinfo.jl. I think each one of those functions is used at least once. For example, we need to be able to update the gids
of a variable to assign a sampler to it. This is because new variables can pop up in dynamic models and they need to be assigned to HMC for example in the next HMC call. Getters and setters for variable ranges, distributions, logp and flags are all necessary. Other functions like empty!
, haskey
, syms
, tonamedtuple
, etc are all important utility functions to have as well.
Finally regarding unit tests, while we are not exactly perfect in that department, our coverage is at 75%. The coverage of varinfo.jl is at 83%. We could definitely do better in terms of splitting Turing and DPPL tests but completely separating the 2 has proven somewhat difficult thus far. This is mostly because we need a Gibbs-HMC sampler to test the Gibbs-HMC specific components of DPPL. We need a particle sampler to test the particle sampling specific components of DPPL, etc. Will it be possible one day? I hope but this isn't the first heavily interlinked package duo that exists in the Julia ecosystem. Sometimes the need arises for 2 separate development cycles in a heavily interlinked package resulting in the splitting of the package into 2 heavily interlinked packages with 2 development cycles. It's not ideal but it's arguably better than combining the 2 packages again mostly for development needs. This is exactly the situation we are in the Turing-DPPL situation.
Now, back to why I am opening this issue. This isn't just me defending the status quo. I understand there are a few similar issues discussing problems or proposed improvements to VarInfo, e.g. #5, #7, #16, #18 and #68. I am open to criticism if you disagree with anything I said above or think there is a better way to do things but please be specific. For example, here are some questions for you to think about:
@devmotion has brought up more than once his preference to design the VarInfo
data structure around unvectorized values. I understand the appeal of that but I don't think it will necessarily simplify the code by a lot. We still need to keep track of distributions, sampler gids and varnames. We still need to get a vectorized form and set a vectorized form of the values for HMC samplers. We still need to cater for variables disappearing and popping at any time. We still need specialize the type of VarInfo to cater for mixed variable types and automatic differentiation in a type stable way. All of this would still need to be done. Will it be simpler? Maybe, but I doubt it will be much simpler.
Anyways, this issue grew to be longer than I would have liked, but please let me know what you think, whether you agree or disagree with anything I said above. Thank you.
I want to use Turing for the GP-SSM model, which is based on the PGAS algorithm. The implementation is actually straight forward thanks to Turing. However, I have noticed that the Gibbs sampler runs the whole model for every subsampler. For large models this implies that there is a huge computational overhead. One way to solve this issue would be by implementing some "hacky" macros, like:
@condition_on (:x, ;y) = begin
# Do some code here which is only executed when :x or :y appears in the sampling space
end
which would be changed to something like :
if :x in spl.alg.space || :y in spl.alg.space
# Run the code above
end
This makes the model rather difficult to design because it might not be obvious what belongs inside the brackets and what not. @torfjelde had the idea to do something like the following instead :
var = @condition_on (:x, ;y) = begin
# Compute var
end
which would be changed to something like :
if :x in spl.alg.space || :y in spl.alg.space
# Compute var
cache["var"] = var
else
var = cache["var"]
end
I am anyway planning to implement this macro for my own use and I was thinking that this might be interesting for Turing in general. What do you think?
Currently, we support a variable number of random variables but only when the random variables are elements of an array that's always at least partially sampled. That is each "symbol" in the model is assumed to be visited every time you run the model. This is enough most of the time and indeed one can group together all optional parameters in a vector and use this method, however it fails to address models of the form:
@model demo(x) = begin
a ~ Uniform()
if a > 0.5
b ~ Normal(a)
x ~ Normal(b)
else
c ~ Uniform()
x ~ Normal(c)
end
end
Note that b
and c
will not always be "visited". The current VarInfo
setup will therefore fail in those cases. One way to address this is to have an UntypedVarInfo
stored inside the TypedVarInfo
for previously unseen variable symbols. Then during the sampling, we can check at every iteration if there are new symbols sampled. If so, we create a new spl::Sampler
whose spl.state.vi
is an instance of the expanded TypedVarInfo
. If we do this right, I think we will not be paying any penalty in the case where all the symbols are always visited, and we may need to pay only a small dynamic dispatch price per new symbol seen, keeping the remaining of the code type stable.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.