Giter Club home page Giter Club logo

dynamicppl.jl's People

Contributors

alyst avatar bors[bot] avatar cpfiffer avatar devmotion avatar github-actions[bot] avatar harisorgn avatar jaimerzp avatar jonasmac16 avatar juliatagbot avatar knuesel avatar mohamed82008 avatar paradacarleton avatar pavanchaggar avatar phipsgabler avatar rikhuijzer avatar sunxd3 avatar torfjelde avatar vaibhavdixit02 avatar willtebbutt avatar wsmoses avatar xukai92 avatar yebai avatar yongchaohuang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dynamicppl.jl's Issues

Rename logp to logjoint and cache loglikelihood by default

I would like to propose the following change in VarInfo. I would like the logp field to be called logjoint and to have another field loglikelihood that caches the log likelihood by default and is then returned by the Chain. What do you think?

Make Sampler a field of DefaultContext (renamed to SamplingContext)

Currently we have 2 arguments for changing the behavior of assume and observe, context and sampler. I think:

  1. Sampler belongs in a field in DefaultContext, perhaps renaming it to SamplingContext,
  2. SampleFromPrior should always overwrite the values in VarInfo much like SampleFromUniform in #78, and
  3. ComputeLogpContext can be another context which encompasses LikelihoodContext, PriorContext and the current behavior of DefaultContext with SampleFromPrior.

I always found it confusing the SampleFromPrior doesn't actually sample anything if VarInfo already has values.

Sampling directly from varied-sized prior is buggy

MEW:

using Random: AbstractRNG
using Distributions: DiscreteMultivariateDistribution
import Distributions: rand, logpdf

struct DSL <: DiscreteMultivariateDistribution
end

function logpdf(dsl::DSL, dsl_numeric::AbstractVector{Int})
    return sum([log(0.5) * 0.5^i for i in 1:length(dsl_numeric)])
end

function rand(rng::AbstractRNG, dsl::DSL)
    fst = rand(rng, [0, 1])
    dsl_numeric = [fst]
    while rand() < 0.5
        push!(dsl_numeric, rand(rng, [0, 1]))
    end
    return dsl_numeric
end

using Turing

@model function mwe()
    dsl ~ DSL()
end

chain = sample(mwe(), PG(10), 500)

gives

Stacktrace in the failed task:
tried to assign 2 elements to 7 destinations
throw_setindex_mismatch(::Array{Int64,1}, ::Tuple{Int64}) at indices.jl:191
setindex_shape_check at indices.jl:242 [inlined]
setindex! at array.jl:850 [inlined]
setval!(::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::Array{Int64,1}, ::DynamicPPL.VarName{:dsl}) at varinfo.jl:284
setindex!(::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::Array{Int64,1}, ::DynamicPPL.VarName{:dsl}) at varinfo.jl:939
assume(::DynamicPPL.Sampler{PG{()},Turing.Inference.PGState{DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},Float64}}, ::DSL, ::DynamicPPL.VarName{:dsl}, ::DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}) at AdvancedSMC.jl:272
_tilde at context_implementations.jl:39 [inlined]
tilde at context_implementations.jl:18 [inlined]
macro expansion at compiler.jl:348 [inlined]
macro expansion at In[4]:9 [inlined]
##inner_function#340#6 at compiler.jl:493 [inlined]
#_#5 at model.jl:24 [inlined]
Model at model.jl:24 [inlined]
Model at model.jl:23 [inlined]
#3 at container.jl:37 [inlined]
(::Libtask.var"#1#2"{Turing.Core.var"#3#4"{DynamicPPL.Model{var"##inner_function#340#6",NamedTuple{(),Tuple{}},DynamicPPL.ModelGen{(),var"###mwe#348",NamedTuple{(),Tuple{}}},Val{()}},DynamicPPL.Sampler{PG{()},Turing.Inference.PGState{DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},Float64}},DynamicPPL.VarInfo{NamedTuple{(:dsl,),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:dsl},Int64},Array{DSL,1},Array{DynamicPPL.VarName{:dsl},1},Array{Int64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}}})() at taskcopy.jl:38

Pretty printing of models

Soss has a pretty string representation of a model that I think would be nice to have in DPPL. We can store a string in the Model struct with the string representation of the model.

Refactor tilde/assume/observe interface

Discussed here.

tilde_assume and tilde_observe currently fall back to different methods of tilde/dot_tilde, which are distinguished by their arguments. And these methods in turn call the assume and observe methods from the inference algorithm. Some inference algorithms also (or instead) overload methods of tilde directly.

Until now the tilde_{observe,assume} methods are only an internal wrapper and work backwards compatibly.

I propose

  • assume, observe, and dot_{assume,observe} to be overloaded and used by inference algorithms, and
  • tilde_{assume,observe} and dot_tilde_{assume,observe} for internal usage (unpacking stuff, and unifying the return values).

Confusing error for setindex! on VarInfo

Is the following behaviour really intended?

julia> vi
/=======================================================================
| VarInfo
|-----------------------------------------------------------------------
| Varnames  :   ["p"]
| Range     :   UnitRange{Int64}[1:1]
| Vals      :   Real[0.9574412442107834]
| GIDs      :   Set{DynamicPPL.Selector}[Set([])]
| Orders    :   [0]
| Logp      :   -0.043490923558696964
| #produce  :   1
| flags     :   Dict("del" => [0],"trans" => [0])
\=======================================================================


julia> vi[VarName{:p}("")]
0.9574412442107834

julia> vi[VarName{:p}("")] = 0.5   # why not allow this?
ERROR: ArgumentError: indexed assignment with a single value to many locations is not supported; perhaps use broadcasting `.=` instead?
Stacktrace:
 [1] setindex_shape_check(::Float64, ::Int64) at ./indices.jl:234
 [2] macro expansion at ./multidimensional.jl:722 [inlined]
 [3] _unsafe_setindex!(::IndexLinear, ::Array{Real,1}, ::Float64, ::UnitRange{Int64}) at ./multidimensional.jl:717
 [4] _setindex! at ./multidimensional.jl:712 [inlined]
 [5] setindex! at ./abstractarray.jl:1074 [inlined]
 [6] setval!(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::Float64, ::VarName{:p}) at /home/philipp/.julia/packages/DynamicPPL/PppLw/src/varinfo.jl:311
 [7] setindex!(::VarInfo{DynamicPPL.Metadata{Dict{VarName,Int64},Array{Distribution,1},Array{VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::Float64, ::VarName{:p}) at /home/philipp/.julia/packages/DynamicPPL/PppLw/src/varinfo.jl:966
 [8] top-level scope at REPL[36]:1

julia> vi[VarName{:p}("")] = [0.5]
1-element Array{Float64,1}:
 0.5

`.~` seems to give incorrect answers

using Turing

@model naive_bayes(image, label, D, N, C, ::Type{T}=Float64) where {T<:Real} = begin
    m = Matrix{T}(undef, D, C)
    for c = 1:C
        m[:,c] ~ MvNormal(fill(0, D), 10)
    end
    # m .~ Normal(0, 10) # this gives incorrect answer

    for n = 1:N
        image[:,n] ~ MvNormal(m[:,label[n]], 1)
    end
end

Checking answer via

using MLDatasets

image_raw = reshape(MNIST.traintensor(Float64), 28*28, :)
label = MNIST.trainlabels() .+ 1

# Pre-processing

using MultivariateStats

D_pca = 40
pca = fit(PCA, image_raw; maxoutdim=D_pca)

image = transform(pca, image_raw)

@info "Peformed PCA to reduce the dimension to $D_pca"

# Data function

get_data(n=100) = Dict(
    "C" => 10, 
    "D" => D_pca, 
    "N" => n, 
    "image" => image[:,1:n], 
    "label" => label[1:n]
)
data = get_data()
model = naive_bayes(data["image"], data["label"], data["D"], data["N"], data["C"])

alg = HMC(0.1, 4)
n_samples = 2_000

chain = sample(model, alg, n_samples)

m_data = Array(group(chain, :m))

m_bayes = mean(
    map(
        i -> reconstruct(pca, Matrix{Float64}(reshape(m_data[i,:], D_pca, 10))), 
        1_000:100:2_000
    )
)

using PyPlot

function make_imggrid(imgs, n_rows, n_cols; gap::Int=1)
    n = length(imgs)
    d_row, d_col = size(first(imgs))
    imggrid = 0.5 * ones(n_rows * (d_row + gap) + gap, n_cols * (d_col + gap) + gap)
    i = 1
    for row = 1:n_rows, col = 1:n_cols
        if i <= n
            i_row = (row - 1) * (d_row + gap) + 1
            i_col = (col - 1) * (d_col + gap) + 1
            imggrid[i_row+1:i_row+d_row,i_col+1:i_col+d_col] .= imgs[i]
        else
            break
        end
        i += 1
    end
    return imggrid
end
    
imgs = [reshape(m_bayes[:,i], 28, 28)' for i in 1:size(m_bayes, 2)]

plt.figure(figsize=(5, 2))
plt.imshow(make_imggrid(imgs, 2, 5), cmap="gray")

It should give something like below:
vis

Edit (05/2021): Updated example

Gibbs sampler caching optimization for statically structured models

One optimization that Turing lacks for Gibbs samplers is the ability to re-use computations that only depend on parameters that are not updated in the current Gibbs inner iteration. This can be an expensive operation that only needs to be performed once every outer iteration.

To implement the above, the following steps can be taken:

  1. Get all the variable symbols from the model that not local variables in loops.
  2. Run the model with a custom number type in vi.metadata.param.vals for every parameter param.
  3. Record all the variable symbols vars "inflicted" by this number type. Keep only the isbits of them.
  4. Make the dependency matching between all the isbits, non-local variables vars and the parameters params.
  5. Make a cache::NamedTuple of all the recorded variables vars mapping from their symbol to a Ref wrapper of their value.
  6. Make a function Cached that closes over that named tuple. The function Cached maps from sub_spl::Sampler to a tuple of all the variables vars that do not depend on any param ∈ getspace(sub_spl). This closure acts like a compile-time dictionary.
  7. The Cached closure and cache NamedTuple get stored in the StaticModel type which replaces the generic Model type.
  8. Make staticmodel(vi, sub_spl):
    a. Call cache = staticmode.cache at the beginning followed by var = cache.var[] for all the vars in cache, these are all isbits.
    b. Remove the lines var = ... for every var in Cached(sub_spl)
    c. Replace the lines var = .... for every var in cache but not in Cached(sub_spl) with var = ....; cache.var[] = value(var), where value(var) calls ForwardDiff.value(var) for Duals for example and similarly for Tracker.

To ensure the validity of the above transformation, one would need to prohibit functions that modify their arguments to be used in this form var = f(args...). f(args...) can still be used in its own line without assigning to var.

Another proposal is to make the user opt-in for some variables to be cached using @cache var = f(args...). Using this, the user guarantees that:

  1. var is not modified by f or any other function. var being isbits is a sufficient but not necessary condition for that.
  2. The compiler is free to remove f(args...) without any side-effects from not modifying one of args when we should have.
  3. var is not a local variable in a loop which needs to change in each iteration so caching here is just wrong.

I like this proposal better as it is inherently safer to use since the user opts-in knowing there could be bad consequences if @cache is misused. This proposal will require modifying step 1, 3 and 8 above to only include variable symbols and var = ... lines that the user has annotated with @cache. If we allow var to be non-isbits, we need to also filter the lines var[i] = ... and the likes in step 8b above.

This is a tentative plan to get some comments.

Crease a `dev` branch for breaking changes

There are some big changes in the current master branch of DynamicPPL, and some more planned; these changes likely will take some time to finish and stabilise. In this process, we might want to make minor releases to fix bugs or bump support for new dependent libraries (e.g. AbstractMCMC).

Based on these requirements, I suggest that we create a dev branch and keep breaking changes in the dev branch while keeping the master branch always release ready.

@devmotion @phipsgabler @mohamed82008 Any thoughts?

Indexing with end breaks master

The following model doesn't work on master but works on the latest release:

julia> @model demo(z) = begin
           m ~ Normal()
           z[1:end] ~ MvNormal(fill(m, length(z)), 1.0)
       end
DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}(##generator#510, NamedTuple())

julia> sample(demo(rand(3)), HMC(0.1, 1), 100)
ERROR: UndefVarError: end not defined
Stacktrace:
 [1] macro expansion at .\REPL[2]:3 [inlined]
 [2] ##evaluator#509(::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::DynamicPPL.VarInfo{DynamicPPL.Metadata{Dict{DynamicPPL.VarName,Int64},Array{Distribution,1},Array{DynamicPPL.VarName,1},Array{Real,1},Array{Set{DynamicPPL.Selector},1}},Real}, ::DynamicPPL.SampleFromPrior, ::DynamicPPL.DefaultContext) at C:\Users\user\.julia\dev\DynamicPPL\src\compiler.jl:447
 [3] Model at C:\Users\user\.julia\dev\DynamicPPL\src\model.jl:118 [inlined]
 [4] runmodel! at C:\Users\user\.julia\dev\DynamicPPL\src\model.jl:138 [inlined]
 [5] VarInfo at C:\Users\user\.julia\dev\DynamicPPL\src\varinfo.jl:110 [inlined]
 [6] VarInfo at C:\Users\user\.julia\dev\DynamicPPL\src\varinfo.jl:109 [inlined]
 [7] DynamicPPL.Sampler(::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::DynamicPPL.Selector) at C:\Users\user\.julia\dev\Turing\src\inference\hmc.jl:304
 [8] Sampler at C:\Users\user\.julia\dev\Turing\src\inference\hmc.jl:302 [inlined]
 [9] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::Int64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:151
 [10] sample at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:151 [inlined]
 [11] #sample#1 at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:141 [inlined]
 [12] sample(::DynamicPPL.Model{var"###evaluator#509",(:z,),Tuple{Array{Float64,1}},(),DynamicPPL.ModelGen{var"###generator#510",(:z,),(),Tuple{}}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.Adaptation.UnitEuclideanMetric}, ::Int64) at C:\Users\user\.julia\dev\Turing\src\inference\Inference.jl:141
 [13] top-level scope at REPL[3]:1

Syntax for modifying target log density

I find the following syntax for modifying log density a bit confusing since it reads like an operation on the macro @logpdf

@logpdf() += logsumexp(gamma)

Maybe change it to something easier to read, e.g.

@inc_logpdf(logsumexp(gamma))

Immutable parameters support

In models with a small vector parameter with a fixed length, using a StaticArray for the vector parameter can lead to a significant speedup. The following is a proof of concept:

using Turing, StaticArrays, Setfield, MacroTools, LinearAlgebra

macro immutable(expr)
	esc(MacroTools.postwalk(expr) do x
		if @capture(x, L_[i_] = R_)
			:($L = Setfield.@set $L[$i] = $R)
		else
			x
		end
	end)
end

d = 10
alg = HMC(0.1, 1)
alg = IS()

# x isa `Vector` of `SVector`s and `y` is a `Vector`
@model demo1(X, y, ::Type{T} = Float64) where {T} = begin
	a = similar(zero(eltype(X)), T)
	for i in 1:length(a)
		@immutable(a[i] ~ Normal())
	end
	sigma ~ Beta()
	for i in 1:length(y)
		y[i] ~ Normal(dot(a, X[i]), sigma)
	end
end

X1 = [@SVector rand(d) for i in 1:1000]
y1 = rand(1000)
model1 = demo1(X1, y1)
sample(model1, alg, 50000);


# X isa `Matrix` and `y` is a `Vector`
@model demo2(X, y, ::Type{T} = Float64) where {T} = begin
	a = similar(X, T, size(X, 2))
	for i in 1:length(a)
		a[i] ~ Normal()
	end
	means = X * a
	sigma ~ Beta()
	for i in 1:length(y)
		y[i] ~ Normal(means[i], sigma)
	end
end

X2 = rand(1000, d)
y2 = rand(1000)
model2 = demo2(X2, y2)
sample(model2, alg, 50000);

I noticed a speedup of 20-50% from using a static vector depending on d and the algorithm used. Would be nice to formalize this some more and officially add it to DynamicPPL.

VarInfo Goals

This issue contains my thoughts and comments from working with VarInfo a lot more during the course of TuringLang/Turing.jl#793. My experience is that VarInfo is somewhat easy to use once you get over the very steep learning curve, but that learning curve can be a powerful deterrent to development from outside folks.

I make some strong statements here to encourage discussion. I'm not trying to bash anyone's superb contributions (particularly @mohamed82008's great work on VarInfo), I just want to see if I can provoke some high-level thinking about VarInfo without thinking too much about what it is right now. Try to keep the context of this discussion about what VarInfo could be and not what it is now.

I don't want to see VarInfo anywhere

I think we see VarInfo too much. If I'm a non-Turing person and I'm building some kind of inference tool, I don't want to learn about our arcane system for managing variables. I just want to manipulate parameters, draw from priors, etc. Many of our functions should probably never have a do_something(vi, spl) signature -- we should find ways to handle everything on the back end without anyone worrying about how to use VarInfo. A better way would be to have the VarInfo stored somewhere in a shared state or tied to a specific model.

I can imagine a case where VarInfo is stored in some environment variable or state variable or something, and the sampler or model might just have a location to go look at where the VarInfo is. Then you could just call logp(model) and by default it would calculate the log probability using whatever the current state is. If you really wanted to, you could pass in a VarInfo and work with a specific one if you're doing a lot of numerical work and such, but I think for almost all cases VarInfo could sit far away and never be thought about.

An alternative fix would be to just have a very small handful of variables that are dead-simple to use an understand. See TuringLang/Turing.jl#886 for a better discussion.

  • update!(vi, new_vals) should update the parameters.
  • parameters(vi) should get the current parameterization in a NamedTuple or Dict format.
  • logp(vi, model) should give you a log probability, no questions asked and no hassle.
  • priors(vi) should give you a NamedTuple or Dict of prior distributions to draw from.
  • If I want to change my priors or something, we should have a way to do that too. priors!(vi, new_priors) should set my priors to whatever the new distributions are.

VarInfo is ultimately my biggest issue with Turing's internals. I understand why we need it and it is a masterful work of engineering, but from a usability side it is a disaster, particularly if our goal is to have a high degree of ease-of-use for inference designers.

If you asked me a question on how to do something with a VarInfo right now, chances are very good it would take me more than an hour to think about what it is that VarInfo is, what it does, and where in the source code I might find an answer. Add another half hour because whatever it was I though VarInfo was is not true.

Where should VarInfo live?

I'm not sure where the VarInfo should go. I don't think it should be a free-floating entity like it has been in Turing's past, and I'm also not convinced that it's attachment to the sampler state as in TuringLang/Turing.jl#793 is correct either.

Is VarInfo more a function of the model, or of the sampler? If it's more specific to the model, shouldn't we store it there? I don't really know. If it's in the model, then it's quite nice to use for non-MCMC methods, since nobody would have to add VarInfo to their method -- they can just call the model's version. Ultimately the VarInfo is constructed from the model, and the samplers just reference it. Right now I'm learning towards moving VarInfo over to the model, but I'm open to discussion on that.

A downside to putting it on the model side is that it becomes harder to build new modeling tools on top of Turing, but easier to build inference methods. I think it's a trade-off that's worth considering.

Removing the Sampler.info field

Build a VarInfoFlags struct that handles all the various switches and gizmos and whatever that VarInfo uses. Currently, all the Samplers have a dictionary called info in them which will no longer be used on the inference side after TuringLang/Turing.jl#793. It'd be nice if we could remove the field entirely and separate the VarInfo flags from the Sampler either by storing the flags in the VarInfo itself, or at least removing the dictionary by just storing the flags with the sampler.

This is really more mechanical than goal-oriented, and it's just something I or someone else might need to apply some elbow grease to.

Create a struct for VariableSpace

It would be nice to create a new struct VariableSpace and use that instead of Set for Sampler space. We can then overload Base.in to check if vn is inside space including the case when space is empty. We can overload in now with space::Set but that can cause confusion with the empty space case.

Julia 1.4 support

When running one of my models on Julia 1.4 I ran into the following issue:

ERROR: LoadError: ArgumentError: Union{} does not have elements
Stacktrace:
 [1] eltype(::Type{Union{}}) at ./array.jl:124
 [2] eltype(::DynamicPPL.VarInfo{NamedTuple{..., ::DynamicPPL.SampleFromUniform) at /homes/tef30/.julia/packages/DynamicPPL/3jy49/src/varinfo.jl:912

I don't have time to come up with an MWE right now, but will try do so when I have time.

[RFC] Introduce an additional assume operator `⩪` (dotsim) for varying dimensional models

At the moment, all random variables in Turing models are declared through the tilde notation (). This is also true for variables that may dynamically appear and/or disappear during runtime (i.e. varying dimensional models). This design causes many headaches for creating a consistent, intuitive scheme to re-name variables in VarInfo.

Perhaps we should reconsider this issue and its solution. For example, if we introduce an additional assume operator (dotsim), and let

  1. declares variables which always exist during runtime (and only once)
  2. declares variables which may dynamically exist during runtime (and/or more than once)

Then, for variables declared by (1), we always use the variable name (maybe plus some additional statically generated identifier) as their internal index in VarInfo. While, for variables declared by (2), we explicitly require the user to pass in a unique identifier through some API.

This way, we maintain 99% per cent of the convenience of automatically generating variable names, but still, allow the user to write models with dynamic dimensionality and/or stochastic control flow.

Related:

Update documentation for internals

We now have changed a couple of internal things (a possibly incomplete list):

  • VarName using tuples, and the interface @varname, @vinds, @vsym
  • Generated code of the @model macro
  • Some of the types used by model function
  • in for spaces is inspace (althogh there's a plan to replace space tuples by their own data structure)
  • tilde differentiated out into tilde_observe and tilde_assume (resp. for dotted variants); inference algorithms can still use the old tilde form as fallbacks, but should be aware of this (and now also observe has access to variable name)
  • the methods of Type{Model} and Model have somewhat changed (includes removing runmodel! and how a VarInfo is filled)
  • In progress, but the semantics of tildes in the model macro will change to behave like =, i.e., their result is the observed/assumed value.
  • Remove @varinfo, @logpdf, and @sampler #71

These should all be updated in some place of the docs. Some in the Model Internals section, and the rest on the compiler design page.

And the "Advanced Usages" page on the Turing wiki is somehow gone? Does anyone know more about that?

VarInfo refactor

Directly copied from discussion in TuringLang/Turing.jl#965 to avoid side-tracking the PR:

I'm trying to work through this compiler change and have some thoughts on VarInfo that I would appreciate some feedback on. I've also left some naming-related comments that I believe would be helpful to address for readability's sake.

Regarding the internals of VarInfo / MetaData, there are a lot of things where you need to pull elements out of a field of metadata based on metadata.idcs[vn] e.g.

md.ranges[md.idcs[vn]]]
md.orders[md.idcs[vn]]
md.gids[md.idcs[vn]]
md.dists[md.idcs[vn]]
It's also the case that everything is linearised when running the model. While it's clearly necessary to be able to linearise the state of the model at some point for the current implementation of HMC, it's not clear to me that it need to be the default mode of operation, and appears to add considerable complexity to the internals.

As such I would like to propose the following design change for discussion (perhaps a separate issue is required - again, advice appreciated):

introduce a new type Variable with fields val, gid (possibly not this? Presumably it really belongs in the gibbs sampler object), name, dist, and any other information that needs to be associated with a particular random variable on a particular run. The key point here is that information specific to a particular RV is stored with a particular RV, so there would be no need to linearise the data into a vector or construct an index map from location(s) in vector to random variable.
refactor VarInfo to contain a Vector (or other container, obviously one that is amenable to Zygote-based AD and type-stability) of Variables, in addition to logp and num_produce.
This would:

enhance readability and reduce mental overhead (imho)
enable better mixing of types. Currently if the values of every variable have the same type (usually Float64) the vals vector is very simple. The proposed design avoids the need to store all of the things in one monolithic vector, thus avoiding the need for a widely typed Vector of values.
separate out linearisation of model state from running of the model. We would implement a separate function, call it to_vec, which would efficiently convert a VarInfo instance into a vector of numbers (potentially of a number of different types, or perhaps to a single mutually acceptable type), in addition to a function that converts a vector of the same length as the one produced into a VarInfo object that is identical to the current VarInfo object in every way other than the vals of the Variables. Separating things out like this has the distinct advantage that if linearisation is not required for a particular algorithm (e.g. MH, SMC, Gibbs sampling, particle MCMC etc) then linearisation never needs to happen. Conversely, if linearisation is required then it can still be achieved efficiently. Moreover, I anticipate linearisation becoming a less important feature over time as the AD systems develop and begin to rely on ChainRules / ChainRulesCore, which play nicely with interestingly-structured data types. It's also worth noting that indexing, especially scalar, is a particularly inefficient operation for most reverse-mode AD systems (inc. Zygote, Tracker) and this is unlikely to fundamentally change any time soon. As such, avoiding the conversion between linear Vector and structured data may well be desirable from a performance perspective as it is indexing-heavy.
create opportunities for mixing static and dynamic structure. It seems like this aught to make mixing of static and dynamic model components more straightforward, but this is more of a hunch than anything else as I'm not sure what that mixing really looks like at the minute.
Choice over the type of the Variable "vector". It could actually be a Vector, but this doesn't seem likely to lead to type stable code. Alternatively the variables could be stored in a NamedTuple if we're able to enumerate all possible variable names that could crop up in a programme from the programme code itself (this seems feasible to me, but I might well be missing something). If there exists an example where this doesn't hold (i.e. the number of possible variable names is dynamic -- it's fine for some of them not to exist dynamically) that would be very much appreciated.
Would appreciate any feedback. If I'm missing something glaringly obvious here, then possibly I'm missing some general piece of understanding regarding the internals.

Move tilde here and avoid VarInfo wrangling in Turing.Inference

I think tilde and dot_tilde belong here, with all their methods. Barebone assume, dot_assume, observe and dot_observe can also be defined in DynamicPPL and overloaded in Turing.Inference. Also all uses of VarInfo and VarName in Turing.Inference should be limited to a one-line function call with an intuitive name. This should limit the interaction of Turing.Inference and DynamicPPL.

AST Manipulation

Hi

Firstly thanks for your work on this project!

I'd like to suggest some changes to the compiler, specifically the additional use of MLStyle.jl for AST manipulation. Currently lots of the AST manipulation is done using MacroTools and indexing with if else. MacroTools by itself can be a bit low level creating tricky to understand code. MLStyle provides the @match macro which simplifies manipulating ASTs and makes the transformations very explicit.

Here's a short example

getargs_dottilde(x) = nothing
function getargs_dottilde(expr::Expr)
    # Check if the expression is of the form `L .~ R`.
    if Meta.isexpr(expr, :call, 3) && expr.args[1] === :.~
        return expr.args[2], expr.args[3]
    end

    # Check if the expression is of the form `(~).(L, R)`.
    if Meta.isexpr(expr, :., 2) && expr.args[1] === :~ &&
        Meta.isexpr(expr.args[2], :tuple, 2)
        return expr.args[2].args[1], expr.args[2].args[2]
    end

    return
end

To

function getargs_dottilde(expr)
    @match expr  begin
        Expr(:call, :.~, lexpr, rexpr) => (lexpr, rexpr)
        Expr(:., :~, Expr(:tuple, lexpr, rexpr)) =>  (lexpr, rexpr)
        Expr(:macrocall, _ , linenumber, innerexpr) => nothing
        _ => nothing
    end
end

What's particularly nice is that the matching expression is almost identical to the form printed by show_sexpr, this takes a lot of the mental work out of manipulating ASTs.

:((~).(x, Normal(μ, σ))) |> show_sexpr
# (:., :~, (:tuple, :x, (:call, :Normal, :μ, :σ)))

Is matched by

Expr(:., :~, Expr(:tuple, lexpr, rexpr)) =>  (lexpr, rexpr)

A current attempt can be found in this fork all the test are passing, look at build_model_info for a longer example. My hope would be using this style could open up other possibilities like more sophisticated syntax or better error messages etc. If you'd be interested in a such a a change I'd be happy to help.

Make tests independent of Turing

The problem with all the tests depending on Turing is that if a breaking change happens in DynamicPPL, there will be no version of Turing that supports this new change. So we will have to:

  1. Merge the DynamicPPL PR and release with the CI tests failing,
  2. Make a Turing PR and release that supports the new DynamicPPL release, and
  3. Hope that there were no bugs in the last DynamicPPL release.

This essentially renders the whole CI testing process useless, because I will be developing a version of DynamicPPL and another of Turing side by side on my machine and testing them together locally. Then the Travis tests will be ignored and I will only be trusting my local tests. The solution here is to have good DynamicPPL tests independent of Turing. This may mean implementing a minimal version of Turing inside the testing suite of DynamicPPL for use in testing.

Make VarInfo thread-safe

Currently, the logp accumulation in an @model block is not thread-safe. It would be nice to have a different logp accumulator in the VarInfo for each thread, separated by a cache line in memory to avoid false sharing. The final result can then be calculated by a single thread when getlogp is called. This would allow the user to write multi-threaded code involving ~ in the model for iid variables for instance.

prob macro - conflicting chain and variable on rhs

@trappmartin raised an issue in TuringLang/Turing.jl#997 about what happens when a variable, e.g. x below, is defined in chain but is also passed in explicitly. We can either throw an error (my favorite) or over-write the value from chain with the value passed in explicitly.

using Turing

@model demo(y, ::Type{T} = Float64) where {T} = begin
    x = Vector{T}(undef, 10)
    x .~ Normal()
    y .~ Normal.(x, 1.0)
end
chain = sample(demo(rand(10)), HMC(0.64, 1), 1000)

# log likelihood
logprob"y = rand(10) | model = demo, chain = chain, x = rand(10)" 

Optimizing access pattern for static and deterministic models

So after type stability issues are taken care of, one more optimization that can be worked on is how to best access fields in the VarInfo struct? The main set of optimizations I can think of make the assumption that:

  1. The model is static, i.e. has a fixed number of parameters, and
  2. The model has deterministic access pattern for the random variables, so any 2 variables are always evaluated in the same order.

This unlocks a whole set of possible optimizations for most if not all samplers, definitely HMC:

  1. We can use StaticArrays.MArray for the fields of TypedVarInfo since it won't need to grow in size.
  2. We can avoid the need for the idcs dictionary which looks up the variable by its VarName. Instead, we can sort the ranges and vals vectors by the access order of the parameters and only keep a counter for each symbol that gives us the index in ranges which gives us the index range in vals. This counter can be incremented inline in the model when expanding the @model macro. This has 2 effects. Firstly, we avoid the overhead of looking up the index in idcs and secondly we make sure the random variables are accessed in a contiguous manner in memory. This will be faster since TypedVarInfo is typically heap-allocated.

With the 2 optimizations in-place and a whole bunch of inlining, it may be possible for the Julia compiler to avoid allocating any memory on the heap for TypedVarInfo! The Julia compiler is good at eliding allocations when mutable objects only live within a single function body. This means if we make sure all the functions which use vi::TypedVarInfo are properly inlined, the compiler may be able to elide these allocations entirely. Unfortunately, this is not true for Base.Array since these are implemented in C so they have special semantics.

This optimization promises significant speedups for small and medium-sized, static, determinisitic models.

Allow non-Real values in UntypedVarInfo

It would be nice to allow arbitrary types in UntypedVarInfo instead of just Reals. This should allow Turing to be used with distributions that sample letters, words, trees, images, etc. without having to represent them as an array of reals. The conversion to TypedVarInfo should work fine whatever type is in UntypedVarInfo so this may be as simple as changing Real to Any in a bunch of places.

Models with dynamic dimensionality

This is an umbrella issue for models with dynamic dimensionality (or stochastic control flows).

Stochastic control flow related:

Support transformed RVs & track expressions

This is a summary issue of something floating around for quite some time.

We would like to be able to do this:

@model test(x) = begin
    m1 ~ Normal()
    @track m2 = f(m1) # some differentiable transformation
    x ~ Normal(m2, 1.0)
end

The resulting chain should contain states for m1, m2 respectively.

Comments by @mohamed82008 how to approach this:

  • Define a Deterministic distribution.
  • Make sure getspace doesn't return any variable symbol whose dists field isa Vector{<:Deterministic}. This is the trickiest part.
  • Define logpdf for Deterministic to return 0, rand to return the wrapped value and bijector to return Identity.
  • Define assume for Deterministic such that it writes down the value wrapped in the RHS to vi[vn] calling ForwardDiff.value, ReverseDiff.value or Tracker.data on the wrapped value before writing it to vi[vn]. We should probably define DistributionsAD.value to unify these.

Formalise `VarInfo`/`AbstractVarInfo` API

VarInfo currently implements and exports a large number of APIs, which sometimes can be redundant, confusing and hard to maintain. It is helpful to have a clear boundary what is internal to VarInfo and what can be reliably called by external modules such as sampling algorithms and the compiler itself.

Splitting varinfo.jl

I am really really in favor of splitting the gigantic varinfo.jl file into multiple files. It is currently doing a whole lot of things revolving around VarInfo but I think it can be managed better by splitting it into multiple files, most of which are relatively self-contained and do specific tasks.

For instance, the getindex, setindex! and push! code is almost 400 lines of code that all attempt to define these 3 functions only. Everything else is helper code. If we put these 400 lines in indexing.jl, we know that if we suspect a bug has something to do with indexing, that it's probably somewhere in this 400 line file which is a relatively self-contained rather than in a 1100 loc file. That makes development and debugging VarInfo-related code easier at least for me. Even if a bug has something to do with an interaction of 2 files. It is easier for me to inspect the 2 files simultaneously and understand what's happening by switching windows or tabs, than it would be if I had to scroll up then scroll down again repeatedly.

What do you all think?

Allow custom names without NamedDist

The NamedDist approach is awkward for naming variables and doesn't fully support .~. It would be nice to support custom names using the more straightforward syntax @rename(x ~ dist, name) or @rename(x .~ dist, name).

`Inf` is treated as variable rather than data

In the MWE below, Inf is treated as a variable rather than the literal Inf.

using Turing

@model mwe() = begin
    m ~ Normal(0, 1)
    Inf ~ Normal(m, 1)
end

sample(mwe(), HMC(0.1, 2), 100)

Turing models as distributions

I am opening this issue to propose a way to treat Turing models as distributions. After TuringLang/Turing.jl#997, we will have a nice syntax to query the logpdf of a NamedTuple of symbols given another NamedTuple of symbols, e.g. logprob"x = [1, 2] | y = [1, 1], model = mymodel". This takes care of the logpdf function in the Distributions API. So if we can also have a way to sample a NamedTuple from the probability distribution P(x | y= [1, 1], model = mymodel) and the likes, we can define a simple syntax that generates such distribution structs from Turing models, something like prob"x | y = [1, 1], model = mymodel". Defining the posterior predictive distribution from there is also somewhat easy. The question is how to sample NamedTuples from Turing models.

Initial proposal

Let the inner function called when calling an instance of the Turing.Model be:

function inner_function(vi, sampler, ctx, model)
     #function body
end

Let's assume we have random variables s and m in the model. The idea I initially had was to replace the above expression with:

function inner_function(vi, sampler, ctx, model)
    if ctx isa DistributionContext
        s = missing
        m = missing
    end
    @inline function f()
        #function body
    end
    model_output = f()
    if ctx isa DistributionContext
        return (s = s, m = m)
    else
        return model_output
    end
end

Semantics

Calling the model in the DistributionContext will therefore return us the NamedTuple of random variables sampled. The DistributionContext can wrap another context and forward the tilde and dot_tilde functions to the context wrapped. This will give us a way to return the sampled random variables from the model. If a random variable wasn't sampled, missing will be returned. If a vector or matrix random variable can be partially sampled, i.e. the number of random variables is itself random, we can tell the users to initialize such variables using:

x = Vector{Union{Missing, T}}(undef, 10)`

for example inside the model body. The unsampled random variables will therefore be missing. These semantics for "sampling from Turing models" seem reasonable to me even in the case when the number of random variables is itself random.

Performance and alternative implementation

Given that we inline f, I thought the Julia compiler should be able to optimize out s = missing and m = missing if s and m were defined inside #function body, because the compiler does optimize these away when I don't use a closure. Unfortunately, the following minimal example doesn't infer properly:

function f(a)
    # initialize
    if a isa Int
        b = missing
    end

    @inline function g()
        # model body starts
        b = [1, 1]
        # Can have more stuff here and multiple return statements in branches
        return 2*b
        # model body ends
    end
    out = g() # overwrite `b` if `a isa Int`

    # Return a namedtuple or the output of the model depending on the type of the input `a` (or ctx in the real example)
    if a isa Int
        return (b = b,)
    else
        return out
    end
end
@code_warntype f(1) # f(1.0)

and according to Keno Fischer on Slack, this is "a limitation that needs to be addressed properly at some point". The problem is basically the closure.

So my alternative solution is to do:

function inner_function(vi, sampler, ctx, model)
    if ctx isa DistributionContext
        s = missing
        m = missing
    end
    #function body
    model_output = nothing
    @label end_of_func
    if ctx isa DistributionContext
        return (s = s, m = m)
    else
        return model_output
    end
end

and inside #function body, we replace every return ... statement with:

model_output = ...
@goto end_of_func

and every return statement with:

@goto end_of_func

This eliminates the use of a closure and therefore should infer properly. At least, the following minimal example does:

function f(a)
    # initialize
    if a isa Int
        b = missing
    end

    # model body starts
    # can have arbitrary code here with multiple return points, some or all the random variables can be defined
    b = [1, 1]
    if sum(b) > 0.5
        out = 2*b
        @goto end_of_func
    else
        out = 3*b
        @goto end_of_func
    end
    # model body ends
    out = nothing # the function output is nothing if we made it here

    @label end_of_func

    # Return a namedtuple or the output of the model depending on the type of the input `a` (or ctx in the real example)    
    if a isa Int
        return (b = b,)
    else
        return out
    end
end
@code_warntype f(1)

Allow users to use their own macros and `~` inside a DPPL model with $

It would be nice to allow users to use $ around some expression (e.g. a macro call) to avoid any fiddling with it on the DPPL side. This for example allows users to make use of their own DSL inside of a @model block. This can include their own ~ and .~ and what not that we use in DPPL but they can have different semantics inside their block of code.

[RFC] Can the model function be made a custom closure type?

Currently, a model

@model mymodel(y) = begin
  # model code
end

expands to about this (things ending in _1 are gensymed):

function mymodel_1(y)
  function inner_function(vi, spl, ctx, model)
     local y
     # transformed model code
  end
  
  return Model(inner_function, (y = y,), ModelGen{(:y,)}(mymodel_1, NamedTuple()))
end

mymodel = ModelGen{(:y,)}(mymodel_1, NamedTuple())

Would it be feasible to make the inner function a custom closure being a subtype of a common abstract base, like the following:

struct Mymodel_Function_1 <: AbstractModelFunction end
const inner_function = Mymodel_Function_1()

function (::Mymodel_Function_1)(vi, spl, ctx, model)
   local y
   # transformed model code
end

function mymodel_1(y)
  return Model(inner_function, (y = y,), ModelGen{(:y,)}(mymodel_1, NamedTuple()))
end

mymodel = ModelGen{(:y,)}(mymodel_1, NamedTuple())

where AbstractModelFunction <: Function.

The reason is that I would like to dispatch on the call of the model function with IRTracker, and inner_function being an anonymous closure makes this a bit difficult. I don't see anything that would speak against it -- if you agree, I'll make a PR.


Since this makes things even more layered, some further speculation: what complications are there that make something like the following version difficult?

struct Mymodel_1 <: Model{(:y,)}
  y
  missings
end

function (model::Mymodel_1)(vi, spl, ctx)
   local y
   # transformed model code
end

function mymodel(y)
	return Mymodel_1(y, getmissings(y))
end

`.~` is broken for `MvNormal`

using Turing

@model mwe(::Type{T}=Float64) where {T<:Real} = begin
    m = Matrix{T}(undef, 2, 3)
    m .~ MvNormal(zeros(2), 1)
end

sample(mwe(), HMC(0.2, 4), 1_000)

gives

MethodError: no method matching dim(::MvNormal{Float64,PDMats.ScalMat{Float64},Array{Float64,1}})
Closest candidates are:
  dim(!Matched::PDMats.PDMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdmat.jl:30
  dim(!Matched::PDMats.PDSparseMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdsparsemat.jl:30
  dim(!Matched::PDMats.PDiagMat) at /afs/inf.ed.ac.uk/user/s16/s1672897/.julia/packages/PDMats/nKT0N/src/pdiagmat.jl:26
  ...

High dimensional Gaussian is slow

High dimensional Gaussian in Turing is slow (> 1 hour):

using Turing

@model high_dim_gauss(D) = begin
    m ~ MvNormal(fill(0, D), 1)
end

chain = sample(high_dim_gauss(10_000), HMC(0.1, 4), 2_000, progress_style=:plain)

Corresponding AHMC + Tracker impl. is fast (~ 5.5 seconds):

using Distributions, Tracker, AdvancedHMC

D = 10_000

ℓπ(x) = logpdf(MvNormal(fill(0, D), 1), x)

function ∂ℓπ∂θ(x)
    y, back = Tracker.forward(ℓπ, x)
    return (Tracker.data(y), Tracker.data(back(1)[1]))
end

metric = UnitEuclideanMetric(D)
h = Hamiltonian(metric, ℓπ, ∂ℓπ∂θ)

samples, stats = sample(h,  StaticTrajectory(Leapfrog(0.1), 4), rand(D), 2_000; verbose=false, progress=false)

[RFC] Distinguishing tilde methods for passing variable names to both variants

Currently, there are essentially two methods of tilde (resp. dot_tilde) -- one for assume, one for observe. These are only distinguished by the assume variant being passed the VarName (and indices).

Can we just have different functions for them, like tilde_observe and tilde_assume, with both taking the VarName? The motivation is that I want to use IRTracker for dependency analysis, for which having the name would really be a nice (i.e., necessary) thing.

The name of an observed variable would just be ignored in all existing implementations, but who knows, maybe it can become useful. I think both methods could actually fall back to tilde for transition. Do any other packages besides Turing itself use them at all?

BTW: what exactly is the reason that assume/observe are not directly produced from the model macro?

Poor support of using column in IO

Model using : below is poorly supported to fetch variables from Chain.

@model ldamodel(K, V, M, N, w, doc, alpha, β) = begin
  theta = Matrix{Real}(M, K)
  for m = 1:M
    theta[m,:] ~ Dirichlet(alpha)
  end

  phi = Matrix{Real}(K, V)
  for k = 1:K
    phi[k,:] ~ Dirichlet(β)
  end

  # z = tzeros(Int, N)
  # for n = 1:N
  #   z[n] ~ Categorical(theta[doc[n]])
  # end

  for n = 1:N
    w[n] ~ Categorical(phi' * theta[doc[n],:])
  end

end

Have to write variables in type of nest arrays at this moment.

Respect user-provided function signatures

Currently, when a user defines a model

@model function test(x, y)
	x ~ Normal(0, 1)
	y ~ Normal(x, 1)
end

then we define both test(x, y) = ModelGen(....) and test(; x, y) = test(x, y). Since we do not define an arbitrary macro syntax but parse Julia functions, we should maybe stick more to the Julia syntax for functions and respect the user-provided function signatures, i.e., only define the function test(x, y) if the user only provides keyword arguments. If the user wants a function with keyword arguments, she would have to write

@model function test(; x, y)
	x ~ Normal(0, 1)
	y ~ Normal(x, 1)
end

instead. Users could still provide default arguments for both positional and keyword arguments but the behaviour would be the same as for a regular Julia function - keyword arguments you can specify in any order but positional arguments require a user to specify arguments in a specific order. Internally we would still keep a named tuple of all arguments (positional + keyword arguments), so the change would only be user-facing by not providing more definitions of test than what the user wrote down explicitly. IMO that would be less surprising for users since @model would not change anymore how the function signature is interpreted.

@mohamed82008 mentioned on Slack that this might cause problems with the string macros (logpdf"...." etc.). Can you elaborate what you had in mind? I assumed it would not affect any functionality but just require you to specify test(missing, missing) explicitly if x and y are not observed.

prob macro - pre-allocation syntax

@trappmartin brought up an issue in TuringLang/Turing.jl#997 about the syntax for pre-allocating varinfo. After the PR is merged, we can do:

using Turing

@model demo(y, ::Type{T} = Float64) where {T} = begin
    x = Vector{T}(undef, 10)
    x .~ Normal()
    y .~ Normal.(x, 1.0)
end
xval = rand(10)
varinfo = Turing.VarInfo(demo(xval))
logprob"y = rand(10) | model = demo, x = xval, varinfo = varinfo" 

Pre-allocating is useful if we need to change xval or y in a loop and re-evaluate. Pre-allocating varinfo can be useful in these cases. However, as @trappmartin indicated, the syntax seems to imply we are conditioning on varinfo which we are not. One solution proposed in the PR was to use ; instead of , before varinfo. Another one that I prefer is to not expose this in the macro at all, and only allow it in the functional API under the hood.

Forbid variables with same name

This is a unsolved point mentioned in TuringLang/Turing.jl#96

Forbid variables with same name, with the following exceptions

  • Loops, e.g. for i=1:N, x ~ Normal(0,1); end
  • Recursions, e.g. f(x) = if x > 1, f(x-1); else x ~ Normal(0,1); end
  • If statements, e.g. if t, x ~ Normal(0,1); else x ~ Normal(0,4); end
  • N.B.: the following case is forbidden: f() = begin x ~ Normal(0,1); x ~ Bernoulli(0.5) end, because variable x is defined twice at different places.

[RFC] VarName indices: something better than strings?

Is there a particular reason strings are used, as opposed to some structured type?

What kinds of indexing are admissible, anyway? Would an equivalent of Tuple{Vararg{Union{Int, Colon, UnitRange{Int}}}} suffice? Like, what @vinds is already doing?

Gripes and anti-gripes about VarInfo

While working on #115, I came to realize that we may be too harsh in our criticism of VarInfo's complexity. This issue tries to explain some of the design choices we have in varinfo.jl which should spiral into a discussion on how to improve things. This issue came to being after a discussion with Hong on slack regarding #118 which I am still very much in favor of.

Function redundancy is a big concern you may have if you have played around with varinfo.jl for a while. The main one I can think of is getval vs getindex and setval! vs setindex!. The way I think about it is that getval and setval! get and set the vectorized version of the value. So getval returns a vector and setval! expects a vector in even if the value was originally a scalar or matrix. getindex and setindex! on the other hand should reconstruct the shape of the original variable, and link/invlink the value based on whether the VarInfo is working in terms of the transformed or original values. Currently, setindex! doesn't do exactly that and but it was "fixed" in #115 to do exactly the above. So r = vi[vn] (or vi[vn, dist] in the PR) will always return a value in the domain of vn's distribution with the right shape and vi[vn] = r expects r to be in the domain of the distribution and in its natural shape.

Another redundancy that we have comes from the need for r = vi[vn] and vi[vn] = r but also r = vi[spl] and vi[spl] = r. The first 2 are used in a model call, while the other 2 are used in the step! function. These need to be defined for UntypedVarInfo as well as TypedVarInfo. But imo not much can be done about the need for those functions. VarInfo needs to define those functions one way or another because they are the main API of VarInfo.

Other than the above redundancy, we have many getter and setter functions in varinfo.jl. I think each one of those functions is used at least once. For example, we need to be able to update the gids of a variable to assign a sampler to it. This is because new variables can pop up in dynamic models and they need to be assigned to HMC for example in the next HMC call. Getters and setters for variable ranges, distributions, logp and flags are all necessary. Other functions like empty!, haskey, syms, tonamedtuple, etc are all important utility functions to have as well.

Finally regarding unit tests, while we are not exactly perfect in that department, our coverage is at 75%. The coverage of varinfo.jl is at 83%. We could definitely do better in terms of splitting Turing and DPPL tests but completely separating the 2 has proven somewhat difficult thus far. This is mostly because we need a Gibbs-HMC sampler to test the Gibbs-HMC specific components of DPPL. We need a particle sampler to test the particle sampling specific components of DPPL, etc. Will it be possible one day? I hope but this isn't the first heavily interlinked package duo that exists in the Julia ecosystem. Sometimes the need arises for 2 separate development cycles in a heavily interlinked package resulting in the splitting of the package into 2 heavily interlinked packages with 2 development cycles. It's not ideal but it's arguably better than combining the 2 packages again mostly for development needs. This is exactly the situation we are in the Turing-DPPL situation.

Now, back to why I am opening this issue. This isn't just me defending the status quo. I understand there are a few similar issues discussing problems or proposed improvements to VarInfo, e.g. #5, #7, #16, #18 and #68. I am open to criticism if you disagree with anything I said above or think there is a better way to do things but please be specific. For example, here are some questions for you to think about:

  1. Which function exactly could use a docstring but doesn't?
  2. Which internal function/method exactly can be made redundant?
  3. Which exported/API function can be made redundant?
  4. Which additional API function would make your life easier when developing Turing?
  5. How exactly do you propose to further split Turing and DynamicPPL tests?

@devmotion has brought up more than once his preference to design the VarInfo data structure around unvectorized values. I understand the appeal of that but I don't think it will necessarily simplify the code by a lot. We still need to keep track of distributions, sampler gids and varnames. We still need to get a vectorized form and set a vectorized form of the values for HMC samplers. We still need to cater for variables disappearing and popping at any time. We still need specialize the type of VarInfo to cater for mixed variable types and automatic differentiation in a type stable way. All of this would still need to be done. Will it be simpler? Maybe, but I doubt it will be much simpler.

Anyways, this issue grew to be longer than I would have liked, but please let me know what you think, whether you agree or disagree with anything I said above. Thank you.

Gibbs sampling - Use blocks to reduce computation

I want to use Turing for the GP-SSM model, which is based on the PGAS algorithm. The implementation is actually straight forward thanks to Turing. However, I have noticed that the Gibbs sampler runs the whole model for every subsampler. For large models this implies that there is a huge computational overhead. One way to solve this issue would be by implementing some "hacky" macros, like:

@condition_on (:x, ;y) = begin
# Do some code here which is only executed when :x or :y appears in the sampling space
end

which would be changed to something like :

if :x in spl.alg.space || :y in spl.alg.space
# Run the code above
end

This makes the model rather difficult to design because it might not be obvious what belongs inside the brackets and what not. @torfjelde had the idea to do something like the following instead :

var = @condition_on (:x, ;y) = begin
# Compute var
end

which would be changed to something like :

if :x in spl.alg.space || :y in spl.alg.space
# Compute var
cache["var"] = var
else
var = cache["var"]
end

I am anyway planning to implement this macro for my own use and I was thinking that this might be interesting for Turing in general. What do you think?

Taking stochastic control flow a bit more seriously

Currently, we support a variable number of random variables but only when the random variables are elements of an array that's always at least partially sampled. That is each "symbol" in the model is assumed to be visited every time you run the model. This is enough most of the time and indeed one can group together all optional parameters in a vector and use this method, however it fails to address models of the form:

@model demo(x) = begin
    a ~ Uniform()
    if a > 0.5
        b ~ Normal(a)
        x ~ Normal(b)
    else
        c ~ Uniform()
        x ~ Normal(c)
    end
end

Note that b and c will not always be "visited". The current VarInfo setup will therefore fail in those cases. One way to address this is to have an UntypedVarInfo stored inside the TypedVarInfo for previously unseen variable symbols. Then during the sampling, we can check at every iteration if there are new symbols sampled. If so, we create a new spl::Sampler whose spl.state.vi is an instance of the expanded TypedVarInfo. If we do this right, I think we will not be paying any penalty in the case where all the symbols are always visited, and we may need to pay only a small dynamic dispatch price per new symbol seen, keeping the remaining of the code type stable.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.