
SequentialMonteCarlo.jl

SequentialMonteCarlo.jl provides a set of particle filters and sequential Monte Carlo algorithms for both latent state and model parameter inference.

State Space Models

To begin using this module, we must first establish how state space models (SSMs) are constructed in this framework. A typical model requires a small set of constructors for: (1) an initial distribution, (2) a transition density, and (3) an observation density.

To construct a univariate linear Gaussian SSM, we can define the following constructor, where θ::Vector{Float64} represents a vector of parameters.

lg_mod(θ) = StateSpaceModel(
    LinearGaussian(θ[1],1.0,θ[2],θ[3],0.0),
    (1,1)
)

Calling lg_mod(⋅) constructs a StateSpaceModel object, which can be used to simulate data or run a particle filter, as long as the model in question has the required methods defined. For demonstration, define lg_example and simulate 100 periods.

lg_example = lg_mod([0.5,0.9,0.8])
_,y = simulate(lg_example,100)

This construction leans heavily on multiple dispatch, which is a nice feature of Julia but also requires more work from the user. As such, a better method is in the works.

Particle Filters

Particle filters are primarily used for online inference of latent states, which is ideally implemented as a mutating operation on a set of particles x and weights w.

Suppose we observe a vector y simulated from model lg_example. To construct a bootstrap filter with 1024 particles and capture summary statistics at each time period we do the following:

using Printf  # for @printf below

# preallocate quantile vector and log likelihood
xq = fill(zeros(Float64,3),length(y))
logZ = 0.0

# initialize bootstrap filter at time t=1
x,w,logμ = bootstrap_filter(1024,y[1],lg_example)

xq[1] = quantile(x,[0.25,0.5,0.75])
logZ += logμ

# subsequent iterations of the bootstrap filter
for t in 2:length(y)
    # run filter and print ess
    logμ,w,ess = bootstrap_filter!(x,w,y[t],lg_example)
    @printf("t = %4d\tess = %4.3f\n",t,ess)

    # update summary statistics
    xq[t] = quantile(x,[0.25,0.5,0.75])
    logZ += logμ
end

It should be noted that x and w are both operated on in-place and thus update with each call of bootstrap_filter!(). Furthermore, to avoid such long-winded code, one can instead call log_likelihood() to run a particle filter over all periods of a model and return the log likelihood (logZ above).

# for a bootstrap filter leave the proposal argument empty
logZ = log_likelihood(1024,y,lg_example)

The construction of these particle filters is not perfect on its own, since they are designed specifically to support joint estimation. However, they are fully functional and flexible enough to be called in high volumes, something that LowLevelParticleFilters.jl fails to do (more on this in issue ...).

Joint Inference

Suppose a given state space model has an uncertain construction such that the model parameters are unobserved. Now the problem becomes twofold: what can we infer about the latent states as well as the parameters?

To solve this problem, I introduce two main algorithms: SMC² and density tempered SMC.

Running these algorithms requires a prior over the parameter space as well as a model constructor like lg_mod() defined above. Below we define lg_prior such that length(lg_prior) == length(θ), where θ is the input to lg_mod.

# model constructor
lg_mod(θ) = StateSpaceModel(
    LinearGaussian(θ[1],1.0,θ[2],θ[3],0.0),
    (1,1)
)

# prior definition
lg_prior = product_distribution([
    TruncatedNormal(0,1,-1,1),
    LogNormal(),
    LogNormal()
])

With the prior and model constructor declared, we define a generic SMC sampler with 512 parameter particles, 1024 state particles, an ESS threshold of 0.5, and 3 MCMC steps. To demonstrate, we can run density tempered SMC like so...

lg_dt_smc = SMC(512,1024,lg_mod,lg_prior,3,0.5)
density_tempered(lg_dt_smc,y)

For an online algorithm like SMC², we treat it similarly to the particle filter by using mutating functions to change properties of lg_smc² as time progresses.

lg_smc² = SMC(512,1024,lg_mod,lg_prior,3,0.5)
smc²(lg_smc²,y)

for t in 2:length(y)
    smc²!(lg_smc²,y,t)
end

Issues

Duan and Fulop Setup

Overview

Duan and Fulop consider a vector of parameters θ = [A,B,ln(Q),ln(R)], where A and B are scalar multipliers bounded on an open unit interval. We define Q and R as the model variances, but encase them in a log function to preserve positivity. They consider a multivariate normal distribution with identity variance Σ = I(4) and mean zero μ = zeros(4).

To run their algorithm, they generate a sample of T = 1000 and initialize the algorithm with M = 1024 parameter particles and N = 512 state particles. For rejuvenation, the random walk move is applied 10 times (we can adjust this accordingly) and is triggered whenever ESS ≤ M/2.
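
For reference, a constructor for this model could follow the lg_mod pattern above. This is only a sketch: df_mod is a hypothetical name, the argument layout of LinearGaussian is assumed to match the README example, and exp(⋅) maps ln(Q) and ln(R) back to variances.

# hypothetical constructor following the lg_mod pattern above
df_mod(θ) = StateSpaceModel(
    LinearGaussian(θ[1],θ[2],exp(θ[3]),exp(θ[4]),0.0),
    (1,1)
)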

Implementation

Let θ = [A,B,ln(Q),ln(R)] as in the paper. The prior is thus constructed as follows.

θ0 = [0.25,0.475,log(1.5),log(0.475)]
prior = TruncatedMvNormal(θ0,I(4),[-1.0,-1.0,-Inf,-Inf])

The problem is that the last two elements of θ do not need truncation, so constructing the full truncated normal is inefficient.
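
One possibility is to build the prior component-wise with product_distribution, as in the README example above, truncating only the first two coordinates; the unit variances mirror Σ = I(4). The sampler call reuses the hypothetical df_mod constructor sketched above together with the settings M = 1024, N = 512, 10 move steps, and an ESS threshold of 0.5. This is a sketch, not the package's current behavior.

# component-wise prior: truncate only where needed
θ0 = [0.25,0.475,log(1.5),log(0.475)]
prior = product_distribution([
    truncated(Normal(θ0[1],1.0),-1.0,1.0),
    truncated(Normal(θ0[2],1.0),-1.0,1.0),
    Normal(θ0[3],1.0),
    Normal(θ0[4],1.0)
])

# M = 1024 parameter particles, N = 512 state particles, 10 moves, ESS threshold 0.5
df_smc² = SMC(1024,512,df_mod,prior,10,0.5)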

Model Structure and Redundancies

As it currently stands StateSpaceModel lacks important features that are useful in parameter estimation.

struct StateSpaceModel <: AbstractSSM
    # define general structure using functions
    transition::Function
    observation::Function
    
    dim_x::Int64
    dim_y::Int64
end

One solution to this problem is to consider a parametric object StateSpaceModel{Model<:ModelParameters} which is convenient for storage and calling methods inherent to the object type, but introduces redundancies when working over a large set of model parameters of the same type.

struct StateSpaceModel{Model<:ModelParameters} <: AbstractSSM
    # store parameter type
    parameters::Model

    # define general structure using functions
    transition::Function
    observation::Function

    dim_x::Int64
    dim_y::Int64
end

To construct a StateSpaceModel{T} we now have to write a new default constructor. This process is relatively easy and efficient, but requires storing the parametric type in the struct itself; intuitively this should not be necessary, since creating the object already requires an argument subtyped from ModelParameters.

function StateSpaceModel{ModelType}(params::AbstractVector) where ModelType
    # assuming this constructor exists on a given model
    return StateSpaceModel(ModelType(params...))
end

Furthermore, storing the dimensions of x and y is a redundancy, and one that compounds when working over a larger set of models.
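
One alternative is to derive the dimensions on demand from the stored parameters instead of caching them in the struct. A minimal sketch, assuming the parameter type exposes matrices A (dim_x × dim_x) and B (dim_y × dim_x) as in the reworked objects below:

# recover dimensions from the parameters rather than storing them
dims(m::StateSpaceModel) = (size(m.parameters.A,1),size(m.parameters.B,1))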

Reworked SSM Objects

using Random,LinearAlgebra,Distributions
using Statistics,StatsBase
using BenchmarkTools

The idea here is to present a more flexible version of the StateSpaceModel objects which allows for initial values other than x0 = 0 and Σ0 = 1.

const ScalarOrArray{T} = Union{T,Array{T}}

abstract type AbstractStateSpaceModel{T<:Real} end

struct LinearGaussian{T<:Real} <: AbstractStateSpaceModel{T}
    A::Matrix{T}
    B::Matrix{T}
    Q::Matrix{T}
    R::Matrix{T}

    μ0::Array{T}
    Σ0::Matrix{T}
end

function LinearGaussian(
        A::ScalarOrArray{T},B::ScalarOrArray{T},
        Q::ScalarOrArray{T},R::ScalarOrArray{T},
        μ0::ScalarOrArray{T},Σ0::ScalarOrArray{T}
    ) where {T<:Real}

    # dimensions of recorded sequence
    dim_x = size(A,1)
    dim_y = size(B,1)

    # dimensions of random vectors
    dim_δ = size(Q,2)
    dim_ε = size(R,2)

    A = reshape(vcat(A),dim_x,dim_x)
    B = reshape(vcat(B),dim_y,dim_x)

    Q = reshape(vcat(Q),dim_x,dim_δ)
    R = reshape(vcat(R),dim_y,dim_ε)

    μ0 = reshape([μ0;],dim_x)
    Σ0 = reshape(vcat(Σ0),dim_x,dim_x)

    return LinearGaussian{T}(A,B,Q,R,μ0,Σ0)
end

# this method may not work, so I should play around with it
function LinearGaussian(
        A::ScalarOrArray{<:Real},B::ScalarOrArray{<:Real},
        Q::ScalarOrArray{<:Real},R::ScalarOrArray{<:Real},
        μ0::ScalarOrArray{<:Real},Σ0::ScalarOrArray{<:Real}
    )

    T = Base.promote_eltype(A,B,Q,R,μ0,Σ0)

    return LinearGaussian(
        convert(ScalarOrArray{T},A),convert(ScalarOrArray{T},B),
        convert(ScalarOrArray{T},Q),convert(ScalarOrArray{T},R),
        convert(ScalarOrArray{T},μ0),convert(ScalarOrArray{T},Σ0)
    )
end

dims(mod::LinearGaussian) = (size(mod.A,1),size(mod.B,1))


function transition(mod::LinearGaussian,xt::AbstractVector{<:Real})
    return MvNormal(mod.A*xt,mod.Q)
end

function observation(mod::LinearGaussian,xt::AbstractVector{<:Real})
    return MvNormal(mod.B*xt,mod.R)
end

function initialize(mod::LinearGaussian)
    return MvNormal(mod.μ0,mod.Σ0)
end


function simulate(rng::AbstractRNG,mod::AbstractStateSpaceModel,T::Int64)
    dx,dy = dims(mod)

    x = Matrix{Float64}(undef,dx,T)
    y = Matrix{Float64}(undef,dy,T)
    
    x[:,1] = rand(rng,initialize(mod))

    # try experimenting with views and inbounds
    for t in 1:T-1
        x[:,t+1] = rand(rng,transition(mod,x[:,t]))
    end

    for t in 1:T
        y[:,t] = rand(rng,observation(mod,x[:,t]))
    end

    return x,y
end

To test this method in practice, let's apply it to a bootstrap filter.

function bootstrapFilter(rng::AbstractRNG,y::Matrix,mod::AbstractStateSpaceModel,N::Int64)
    T  = size(y,2)
    xs = rand(rng,initialize(mod),N)
    qs = Matrix{Float64}(undef,T,3)

    for t in 1:T
        xt = similar(xs)
        lw = Vector{Float64}(undef,N)

        # propagate each particle and compute its log weight
        for n in 1:N
            rand!(rng,transition(mod,view(xs,:,n)),view(xt,:,n))
            lw[n] = logpdf(observation(mod,view(xt,:,n)),view(y,:,t))
        end

        # exponentiate and normalize the log weights (stabilized by the maximum)
        w = exp.(lw .- maximum(lw))
        w ./= sum(w)

        # multinomial resampling of particle columns
        idx = wsample(rng,1:N,w,N)
        xs .= xt[:,idx]

        # quantiles of the pooled particle values at time t
        qs[t,:] .= quantile(vec(xs),[0.25,0.5,0.75])
    end
    return qs
end

# test simulations
test_mod = LinearGaussian(1.0,1.0,1.0,1.0,0.0,1.0)
test_rng = Random.seed!(1234)

x,y = simulate(test_rng,test_mod,10)

test_mod2 = LinearGaussian(Matrix(1.0I(2)),Matrix(1.0I(2)),Matrix(1.0I(2)),Matrix(1.0I(2)),zeros(2),Matrix(1.0I(2)))

x2,y2 = simulate(test_rng,test_mod2,10)

Complete Particle Degeneration in SMC²

In the inflation example, I run a few different models over the FRED data. When running SMC² for the UC model, complete particle degeneration occurs at two points in the process. Let's look at the last one.

SMC² Log

Looking at the log, we observe the process getting stuck at t = 196 and fixing itself at t = 212, when only one particle is considered effective. From there, the sampler finally starts accepting move steps again and recovers:

...
t =  193        ess = 363.557
t =  194        ess = 234.327   [rejuvenating]  acc_rate: 0.28320
t =  195        ess = 459.419
t =  196        ess = 212.097   [rejuvenating]  acc_rate: 0.00000
t =  197        ess = 62.057    [rejuvenating]  acc_rate: 0.00000
t =  198        ess = 232.472   [rejuvenating]  acc_rate: 0.00000
t =  199        ess = 224.963   [rejuvenating]  acc_rate: 0.00000
t =  200        ess = 39.595    [rejuvenating]  acc_rate: 0.00000
t =  201        ess = 48.292    [rejuvenating]  acc_rate: 0.00000
t =  202        ess = 54.444    [rejuvenating]  acc_rate: 0.00000
t =  203        ess = 48.497    [rejuvenating]  acc_rate: 0.00000
t =  204        ess = 91.183    [rejuvenating]  acc_rate: 0.00000
t =  205        ess = 136.884   [rejuvenating]  acc_rate: 0.00000
t =  206        ess = 51.889    [rejuvenating]  acc_rate: 0.00000
t =  207        ess = 60.103    [rejuvenating]  acc_rate: 0.00000
t =  208        ess = 20.934    [rejuvenating]  acc_rate: 0.00000
t =  209        ess = 102.997   [rejuvenating]  acc_rate: 0.00000
t =  210        ess = 41.018    [rejuvenating]  acc_rate: 0.00000
t =  211        ess = 7.715     [rejuvenating]  acc_rate: 0.00000	
t =  212        ess = 1.000     [rejuvenating]  acc_rate: 1.00000
t =  213        ess = 503.637
t =  214        ess = 478.646
t =  215        ess = 421.993
...

Since we only care about degeneration, let's filter this down to observations with sufficiently low acceptance rates. To track the degeneration process, we track the size of the moves via the covariance of θ:

ess = 62.057    norm(cov(θ)) = 0.4448203    norm(wcov(θ)) = 0.5533801
ess = 232.472   norm(cov(θ)) = 0.5406501    norm(wcov(θ)) = 0.5991085
ess = 224.963   norm(cov(θ)) = 0.5856705    norm(wcov(θ)) = 0.6205049
ess = 39.595    norm(cov(θ)) = 0.6291859    norm(wcov(θ)) = 0.6299769
ess = 48.292    norm(cov(θ)) = 0.6461557    norm(wcov(θ)) = 0.0635495
ess = 54.444    norm(cov(θ)) = 0.0658842    norm(wcov(θ)) = 0.0047175
ess = 48.497    norm(cov(θ)) = 0.0050397    norm(wcov(θ)) = 0.0025555
ess = 91.183    norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 136.884   norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 51.889    norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 60.103    norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 20.934    norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 102.997   norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 41.018    norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 7.715     norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000
ess = 1.000     norm(cov(θ)) = 0.0000000    norm(wcov(θ)) = 0.0000000

Thus somewhere around t = 203 the covariance drops to almost nothing. The norms are not exactly 0, however, until the point where ess = 1.000, which finally allows new move steps to be accepted. Below are the last weighted covariance matrices, followed by a sketch of how these diagnostics can be computed.

wcov(θ) = 3×3 Array{Float64,2}:
  7.88861e-31   1.2326e-32   -9.86076e-32
  1.2326e-32    1.92593e-34  -1.54074e-33
 -9.86076e-32  -1.54074e-33   1.2326e-32

wcov(θ) = 3×3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
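
A sketch of how the cov and wcov diagnostics above could be computed, assuming the parameter particles θ are stored as an n_params × M matrix with normalized weights w (the actual internal layout may differ):

using Statistics, StatsBase, LinearAlgebra

# norm(cov(θ)): unweighted covariance across particle columns
unweighted = norm(cov(θ; dims = 2))

# norm(wcov(θ)): weighted covariance using the particle weights
weighted = norm(cov(θ, aweights(w), 2; corrected = false))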

But Why?

I'm struggling to understand why degeneration even occurs, since we specifically push the sampler through a rejuvenation step. Furthermore, increasing the number of MCMC steps seems to do very little in terms of improvement; in fact, it drags the problem on longer, since there are more chances of accepting a move step.

This may be an underflow problem caused by taking the log of the deepcopy of ω, but I have my doubts. It seems like a more systematic problem, especially since it occurs consistently at the same observation.
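
If underflow in the weight normalization is the culprit, one standard safeguard is the log-sum-exp trick. A minimal sketch, where logω is a hypothetical vector of unnormalized log parameter weights:

# normalize log weights without underflow by subtracting the maximum first
function stable_weights(logω::AbstractVector{<:Real})
    m = maximum(logω)
    w = exp.(logω .- m)
    return w ./ sum(w)
end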

Lastly, I think this could be a problem with the model itself. It seems to be less prominent in the UCSV report, but again there is no way to know for certain unless I want to solve this analytically. Moreover, maybe my specification of the priors is malformed, although I remember testing this case and coming up with similar results.

Needs Further Investigation

  • Look into exactly why the program rejuvenates only when ess = 1.0. This is not well understood
  • Look at both the log parameter weights and the log likelihoods to investigate underflows
  • Run this without multithreading to see if there's a race condition I missed [highly unlikely]

SMC² State Particle Time Indexing

In the newly implemented SMC² algorithm, the indexing of state particles is rather odd. It propagates normally for the first few steps, but after a while, with sufficiently many parameter particles, the filter will have gone through far more iterations than the parameter particle time index suggests.

Essentially, problems arise because smc².filters.state.t[] is far greater than the supposedly equivalent smc².parameters.t[]. There could be a multitude of reasons for this, but my guess is that the indexing of Reference objects gets confused when changing an immutable object.
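
If that guess is right, the pitfall is easy to reproduce in isolation. A minimal sketch (ToyFilter and its field are hypothetical, not the package's types): an immutable struct holding a Ref shares that Ref across copies, so every copy sees the same, already-advanced time index.

struct ToyFilter
    t::Base.RefValue{Int}
end

f1 = ToyFilter(Ref(0))
f2 = ToyFilter(f1.t)    # a "copy" that reuses the same Ref

f1.t[] += 1             # step the first filter
f2.t[]                  # == 1, even though f2 was never stepped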

This is a problem I have had to deal with before, but have never found an apt solution which does not involve a complete redesign of the particle filter object.

Particle filters are too slow

Consider a basic bootstrap filter; in the current state of this repository, it is fully functional and supports a wide variety of user-defined models over many different distributions.

However, the current implementation relies on distributions being multivariate. Thus each particle set is a vector of static vectors, and because of the nature of Distributions.jl it is impossible to generalize rand(x::Sampleable) to fill an array given a univariate distribution. The obvious remedy for supporting univariate distributions is to branch on a boolean check at each step of the filter, which becomes problematic.

A better solution would be to avoid the use of static arrays in the case of univariate distributions. This would reduce the computational strain added by multivariate distributions as well as reduce the overall number of allocations generated at each step.
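
A minimal sketch of what such dimension-aware storage could look like, dispatching on the distribution type; allocate_particles is a hypothetical name, not part of the package:

using Distributions, StaticArrays

# plain Float64 particles for univariate models, static vectors otherwise
allocate_particles(d::UnivariateDistribution, N::Int) = rand(d, N)
allocate_particles(d::MultivariateDistribution, N::Int) =
    [SVector{length(d)}(rand(d)) for _ in 1:N]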
