Giter Club home page Giter Club logo

estrodis's Introduction

R package estRodis

estRodis is an R package for statistical analyses in the field of infectious disease dynamics. More precisely, for estimating the effective reproduction number, the dispersion parameter and the testing probability from the sequence cluster size distribution. The theoretical background and a use case of the package's functionalities are presented in the paper "Estimating $R_e$ and overdispersion in secondary cases from the size of identical sequence clusters of SARS-CoV-2" by Emma Hodcroft et al. A preprint is available on medRxiv.

(A) Overview of content of R package estRodis

At the heart of the R-package estRodis are functions to simulate the size distribution of identical sequence clusters and to estimate parameters related to transmission dynamics from the sequence cluster size distribution. The simulation is based on a mathematical model of the size distribution of identical sequence clusters that takes into account the viral transmission process, the mutation of the virus and incomplete case-detection. We described the probability that a case was detected as the product of the probability that a case was confirmed by a test and the probability that the viral genome of a confirmed case is sequenced. There are two Bayesian inference models to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability to choose from. The first model uses a weakly informative prior distribution for the probability that a case is confirmed by a test, whereas the second model takes a fixed value for this probability as input. Both models use weakly informative prior distributions for the effective reproduction number, the dispersion parameter and the yearly mutation rate and require a fixed value for the probability that the viral genome of a confirmed case is sequenced as input.

(B) General remarks

B.1 Use case

The code used for the statistical analysis of the paper "Estimating $R_e$ and overdispersion in secondary cases from the size of identical sequence clusters of SARS-CoV-2" by Emma Hodcroft et al. (see medRxiv), which is a use case of the estRodis package, can be found here: GitHub Martin Wohlfender

B.2 Name convention

Whenever "model one" is mentioned in comments in the code, this refers to the standard model developed in the paper (prior distribution for testing probability) and "model two" refers to the alternative model described in the section "Sensitivity analysis" of the supplementary material (fixed value for testing probability).

(C) How to install the R package estRodis

R code to install the latest stable release of the estRodis package:
devtools::install_github("mwohlfender/estRodis", ref = "main", force = TRUE)

R code to install the newest development version of the estRodis package:
devtools::install_github("mwohlfender/[email protected]", ref = "main", force = TRUE)

(D) How to use the R package estRodis

In the following we present code examples how to use the main functionalities of the R package estRodis.

(D1) Simulate data

We simulate 1000 identical sequence clusters.

simulated_clusters <- estRodis_simulate_cluster_sizes(n_clusters = 1000,
                                                      max_cluster_size = 2500,
                                                      R = 0.8,
                                                      k = 0.3,
                                                      yearly_mutation_rate = 14,
                                                      mean_generation_interval = 5.2,
                                                      testing_proba = 0.6,
                                                      sequencing_proba = 0.4)

Result:

size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 19 20 22 24 26
frequency 725 111 59 31 16 13 10 3 6 2 3 4 4 2 4 1 2 1 1 1 1
percentage 0.725 0.111 0.059 0.031 0.016 0.013 0.01 0.003 0.006 0.002 0.003 0.004 0.004 0.002 0.004 0.001 0.002 0.001 0.001 0.001 0.001

(D2) Apply model one

We apply model one to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability from the clusters we simulated in section (D1).

options(mc.cores = parallelly::availableCores())

estRodis_estimate_parameters_one(clusters_size = simulated_clusters$size,
                                 clusters_freq = simulated_clusters$frequency,
                                 sequencing_proba = 0.44)

Result:

Inference for Stan model: estRodis_stan_model_estimate_parameters_one.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                            mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
R                           0.81    0.00 0.06     0.71     0.76     0.80     0.85     0.95  1171    1
k                           0.34    0.00 0.08     0.19     0.28     0.33     0.39     0.53  1813    1
number_yearly_mutations    13.99    0.01 0.49    13.00    13.66    13.99    14.31    14.95  2254    1
testing_proba               0.60    0.01 0.24     0.15     0.41     0.62     0.81     0.99  1061    1
mutation_proba              0.18    0.00 0.01     0.17     0.18     0.18     0.18     0.19  2252    1
detection_proba             0.27    0.00 0.11     0.07     0.18     0.27     0.36     0.43  1061    1
lp__                    -1156.67    0.04 1.54 -1160.57 -1157.41 -1156.29 -1155.53 -1154.84  1200    1

Samples were drawn using NUTS(diag_e) at Tue Mar 26 11:01:36 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

(D3) Apply model two

We apply model two to estimate the effective reproduction number, the dispersion parameter, the yearly mutation rate and the testing probability from the clusters we simulated in section (D1).

options(mc.cores = parallelly::availableCores())

estRodis_estimate_parameters_two(clusters_size = simulated_clusters$size,
                                 clusters_freq = simulated_clusters$frequency,
                                 testing_proba = 0.55,
                                 sequencing_proba = 0.44)

Result:

Inference for Stan model: estRodis_stan_model_estimate_parameters_two.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                            mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
R                           0.81    0.00 0.02     0.76     0.79     0.81     0.83     0.86  3181    1
k                           0.34    0.00 0.07     0.22     0.29     0.33     0.38     0.51  2630    1
number_yearly_mutations    14.01    0.01 0.51    13.04    13.66    14.00    14.34    15.02  2763    1
mutation_proba              0.18    0.00 0.01     0.17     0.18     0.18     0.18     0.19  2761    1
lp__                    -1154.64    0.03 1.22 -1157.85 -1155.19 -1154.32 -1153.76 -1153.24  2102    1

Samples were drawn using NUTS(diag_e) at Tue Mar 26 11:09:22 2024.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

(E) Short description of the functionality of the R package estRodis

The R package estRodis contains nine functions. Short introductions of all functions are provided below, more detailed information on input and output values can be found in the documentation of the respective function within the R package.

E.1 simulate cluster sizes

Simulate the size of identical sequence clusters.

E.2 simulate transmission chain

Simulate a transmission chain.

E.3 plot transmission chain

Create a plot of a transmission chain.

E.4 plot transmission chain with mutations

Create a plot of a transmission chain including mutations of the viral genome.

E.5 estimate parameters model one

Apply model one to estimate parameters from the sequence cluster size distribution.

E.6 estimate parameters model two

Apply model two to estimate parameters from the sequence cluster size distribution.

E.7 initialize parameters model one

Determine initial values for model one.

E.8 initialize parameters model two

Determine initial values for model two.

E.9 scaled beta distribution

Probability density function of a scaled version of the beta distribution.

$$ f \left( x \right) = \frac{ \left( x - 0.05 \right)^{ \alpha - 1 } \left( 1 - x \right)^{ \beta - 1 }}{ \left( 1 - 0.05 \right)^{ \alpha + \beta - 1 }} \frac{\Gamma \left( \alpha + \beta \right) }{ \Gamma \left( \alpha \right) \Gamma \left( \beta \right)} \quad \forall x \in \left[ 0.05, 1 \right] $$

This distribution is used as prior distribution of the testing probability in model one.

estrodis's People

Contributors

mwohlfender avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.