
idbrms: Infectious Disease Modelling using brms

(Badges: Lifecycle: experimental; R-CMD-check; Codecov test coverage)

Provides population-level infectious disease models as an extension of brms.

Installation

You can install the unstable development version from GitHub with:

# install.packages("devtools")
devtools::install_github("epiforecasts/idbrms")

Example

  • Load idbrms, brms, data.table (for data manipulation), and ggplot2 (for visualisation).
library(idbrms)
library(brms)
library(data.table)
library(ggplot2)
  • Define simulated data containing cases of some infectious disease over time in a single region (with initial exponential growth at a rate of 0.15 per day followed by a decline at a rate of 0.1 per day). We then assume that deaths are a convolution of cases using a log-normal delay distribution with a log mean of 1.6 (standard deviation 0.1) and a log standard deviation of 0.6 (standard deviation 0.025), and are then scaled by a fraction (here 0.25, standard deviation 0.025).
# apply a discretised log-normal convolution to a vector of observations,
# returning a single Poisson-distributed count
weight_cmf <- function(x, ...) {
  # seed on the first observation so draws are reproducible per window
  set.seed(x[1])
  meanlog <- rnorm(1, 1.6, 0.1)
  sdlog <- rnorm(1, 0.6, 0.025)
  # discretised log-normal pmf over the window
  cmf <- cumsum(dlnorm(1:length(x), meanlog, sdlog)) -
    cumsum(dlnorm(0:(length(x) - 1), meanlog, sdlog))
  conv <- sum(x * rev(cmf), na.rm = TRUE)
  conv <- rpois(1, round(conv, 0))
  return(conv)
}
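
As a quick sanity check (an illustrative call, with values depending on the seeded random draws), the function can be applied to, say, a 15 day window of constant incidence:

# convolve 15 days of a constant 100 cases per day into a single count
weight_cmf(rep(100, 15))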

obs <- data.table(
  region = "Glastonbury",
  cases = as.integer(c(10 * exp(0.15 * 1:50),
                       10 * exp(0.15 * 50) * exp(-0.1 * 1:50))),
  date = seq(as.Date("2020-10-01"), by = "days", length.out = 100)
)
# roll over observed cases with a 15 day window to produce a convolution
obs <- obs[, deaths := frollapply(cases, 15, weight_cmf, align = "right")]
obs <- obs[!is.na(deaths)]
# scale deaths down to a fraction (mean 0.25) of the convolved cases
obs <- obs[, deaths := round(deaths * rnorm(.N, 0.25, 0.025), 0)]
# truncate any negative counts at zero
obs <- obs[deaths < 0, deaths := 0]
  • Visualise the simulated data (bars are cases and points are deaths).
ggplot(obs) +
  aes(x = date, y = cases) +
  geom_col(fill = "lightgrey") +
  geom_point(aes(y = deaths)) +
  theme_minimal()

  • Prepare the data to be fit using the convolution model.
prep_obs <- prepare(obs, model = "convolution", location = "region",
                    primary = "cases", secondary = "deaths", max_convolution = 15)
head(prep_obs, 10)
#>        location       date time index init_obs cstart cmax primary secondary
#>  1: Glastonbury 2020-10-15    0     1        1      1    1      94        15
#>  2: Glastonbury 2020-10-16    1     2        1      1    2     110        21
#>  3: Glastonbury 2020-10-17    2     3        1      1    3     128        18
#>  4: Glastonbury 2020-10-18    3     4        1      1    4     148        18
#>  5: Glastonbury 2020-10-19    4     5        1      1    5     172        28
#>  6: Glastonbury 2020-10-20    5     6        1      1    6     200        25
#>  7: Glastonbury 2020-10-21    6     7        1      1    7     233        28
#>  8: Glastonbury 2020-10-22    7     8        1      1    8     271        32
#>  9: Glastonbury 2020-10-23    8     9        1      1    9     315        45
#> 10: Glastonbury 2020-10-24    9    10        1      1   10     365        44
  • Fit the model assuming a Poisson observation model (it is important to use an identity link here as idbrm supplies its own link function by default).
fit <- idbrm(data = prep_obs, family = poisson(link = "identity"))
  • Summarise fit.
summary(fit)
#>  Family: poisson 
#>   Links: mu = identity 
#> Formula: secondary ~ idbrms_convolve(primary, scale, cmean, lcsd, cmax, index, cstart, init_obs) 
#>          scale ~ 1
#>          cmean ~ 1
#>          lcsd ~ 1
#>    Data: data (Number of observations: 86) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> scale_Intercept    -1.16      0.00    -1.17    -1.15 1.00     2338     2282
#> cmean_Intercept     1.52      0.03     1.46     1.58 1.00     1338     1615
#> lcsd_Intercept      0.11      0.05     0.02     0.21 1.00     1343     1622
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
  • Explore estimated effect sizes (we should approximately recover the values used for simulation).
exp(posterior_summary(fit, "scale_Intercept")) / 
  (1 + exp(posterior_summary(fit, "scale_Intercept")))
#> Warning: Argument 'pars' is deprecated. Please use 'variable' instead.

#> Warning: Argument 'pars' is deprecated. Please use 'variable' instead.
#>                    Estimate Est.Error      Q2.5    Q97.5
#> b_scale_Intercept 0.2380622 0.5012091 0.2363883 0.239789
posterior_summary(fit, "cmean_Intercept")
#> Warning: Argument 'pars' is deprecated. Please use 'variable' instead.
#>                   Estimate  Est.Error     Q2.5    Q97.5
#> b_cmean_Intercept 1.517401 0.03095378 1.463574 1.581733
exp(posterior_summary(fit, "lcsd_Intercept"))
#> Warning: Argument 'pars' is deprecated. Please use 'variable' instead.
#>                  Estimate Est.Error     Q2.5    Q97.5
#> b_lcsd_Intercept 1.120378  1.048099 1.023345 1.228248
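As an aside, base R's plogis() computes the same inverse logit as the exp(x) / (1 + exp(x)) expression used for the scale parameter above; with the variable argument (rather than the deprecated pars) the full parameter name, including the b_ prefix, is likely needed:

# equivalent inverse-logit transform of the scale intercept
plogis(posterior_summary(fit, variable = "b_scale_Intercept"))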
  • Expose the model's Stan functions.
expose_functions(fit)
  • Check that central estimates return something sensible compared to the observed data.
n_obs <- length(prep_obs$primary)
fixed <- summary(fit)$fixed
pt_ests <- fixed[, 1]
names(pt_ests) <- rownames(fixed)
p_primary <- with(prep_obs, idbrms_convolve(primary, rep(pt_ests["scale_Intercept"], n_obs), 
                                            rep(pt_ests["cmean_Intercept"], n_obs),
                                            rep(pt_ests["lcsd_Intercept"], n_obs), 
                                            cmax, index, cstart, init_obs))
ggplot(prep_obs) + 
  aes(x = date, y = secondary) +
  geom_col(fill = "lightgrey") +
  geom_point(aes(y = p_primary)) +
  theme_minimal()

  • Posterior predictive check. This runs without error but, given the check above, the output looks unlikely to be correct.
pp_check(fit)
#> Using 10 posterior draws for ppc type 'dens_overlay' by default.
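
To quantify the apparent mis-scaling flagged in the issues below, one option (a sketch using brms' posterior_predict()) is to compare mean posterior predictions with the observed deaths:

# ratio of observed deaths to mean posterior predictions; values far
# from 1 suggest predictions on the wrong scale
yrep <- posterior_predict(fit)
summary(prep_obs$secondary / colMeans(yrep))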

  • Plot conditional effects (this fails due to mismatched vector sizes in the Stan dot product; also, since there are no predictor variables in this model, it should always fail).
plot(conditional_effects(fit), ask = FALSE)

Contributors

bisaloo, sbfnk, seabbs


idbrms's Issues

pp_check output is wrongly scaled

Using pp_check and posterior_predict returns output that is roughly a factor of 15 to 20 smaller than the observed data. When recreating the model using the internal Stan functions and the central estimates from the fit, the observed data is fairly well approximated.

This indicates either a possible brms bug or a flaw in the implementation of the non-linear model here. It would be useful to dig into this as it is blocking much of the package's functionality.

See the readme for a walkthrough of this issue using toy data.

R CMD check fails on CRAN

Spotted thanks to R-universe: https://github.com/r-universe/epiforecasts/runs/2765209598?check_suite_focus=true

You can reproduce locally by running:

devtools::check(env_vars = NULL)

Error: could not find function "calc_unique_pmfs"

This is expected since

if (!testthat:::on_cran()) {
  files <- c("discretised_lognormal_pmf.stan",
             "calc_pmf.stan",
             "calc_unique_pmfs.stan")
  suppressMessages(
    expose_idbrms_stan_fns(
      files, dir = system.file("stan/functions", package = "idbrms")
    )
  )
}
never runs on CRAN.

I'm not sure how to fix it because I don't know why it is necessary to wrap these lines in if (!testthat:::on_cran()) { ... }.
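
One common pattern that might resolve this (an untested sketch, with a hypothetical test name) is to also skip any test that relies on the exposed Stan functions via testthat::skip_on_cran(), so that calc_unique_pmfs is never called on CRAN:

# untested sketch: skip tests that need the exposed Stan functions on CRAN
test_that("calc_unique_pmfs matches reference values", {
  skip_on_cran()
  # ... assertions calling calc_unique_pmfs() ...
})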

Optimise convolution

To reduce the compute time of the convolution model, one option is to calculate the probability mass function only for unique combinations of log means and log standard deviations.

In the current framework this would need to be done by detecting unique entries in a vector in Stan, computing the pmf for these, and then sharing it across the original vectors in such a way that the bookkeeping costs less than recomputing the pmf (a rough sketch of the idea follows below).

Another option is to reduce the complexity of the pmf, perhaps by using a different delay distribution, or to provide a second version of the model where the pmf is estimated once, or per some supplied indicator, rather than for every time point as it is currently.
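
To illustrate the first option, here is a rough R sketch of computing the pmf once per unique (meanlog, sdlog) pair and indexing into the results; the real implementation would live in Stan, and all names and values here are illustrative:

# illustrative parameter draws: three time points, two unique pairs
pars <- data.frame(meanlog = c(1.6, 1.6, 1.5), sdlog = c(0.6, 0.6, 0.5))
key <- paste(pars$meanlog, pars$sdlog)
ukey <- unique(key)
# map each time point to its unique parameter pair
idx <- match(key, ukey)
# compute each discretised log-normal pmf exactly once
pmfs <- lapply(ukey, function(k) {
  i <- match(k, key)
  diff(plnorm(0:15, pars$meanlog[i], pars$sdlog[i]))
})
# pmfs[[idx[t]]] is the delay pmf for time point t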
