reconverse / i2extras



Additional functionality for working with incidence2

Home Page: https://www.reconverse.org/i2extras/

License: Other

R 100.00%

i2extras's People


jamesabaker

i2extras's Issues

log-linear

We should have an option in fit_curve for a log-linear function. This will require some changes to the upstream trending package due to how data is stored with a fitted model.
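A log-linear model regresses log(counts) on time, so the slope is an estimate of the growth rate r. The sketch below is a minimal base-R illustration of that idea, assuming zero counts are simply dropped (log(0) is undefined). `fit_log_linear()` is a hypothetical helper for discussion, not part of i2extras or trending, and the data are made up.

```r
# Hypothetical helper (not part of i2extras or trending): fits
# log(count) = a + r * time by ordinary least squares.
fit_log_linear <- function(dates, counts) {
  keep <- counts > 0                                  # log(0) is undefined, so zeros are dropped
  time <- as.numeric(dates[keep] - min(dates[keep]))  # days since the first kept date
  log_count <- log(counts[keep])
  fit <- stats::lm(log_count ~ time)
  ci <- stats::confint(fit, "time")                   # 95% confidence interval for the slope r
  list(model = fit,
       r = unname(stats::coef(fit)["time"]),
       r_lower = ci[1],
       r_upper = ci[2])
}

# Illustrative data: counts growing at roughly 10% per day
dates <- as.Date("2020-01-01") + 0:9
counts <- round(5 * exp(0.1 * 0:9))
res <- fit_log_linear(dates, counts)
res$r
```

Storing the model on the log scale is what makes prediction awkward: fitted values and intervals have to be back-transformed with `exp()`, which is presumably the upstream change in trending the issue refers to.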

Alternatives to moving averages

Is your feature request related to a problem? Please describe.

Moving averages, which are commonly used to process observational data prior to visualisation, have several issues, the most notable being information loss and lag. The main drivers of their use are day-of-the-week effects and reporting noise.

Some of this functionality looks like it is supported in fit_curve and would only need some minimal extension.

Describe the solution you'd like

In general, day-of-the-week effects are much easier to account for than reporting effects. Time-series decomposition would be one possible alternative; other alternatives would require some non-parametric driver of reports.

Reporting noise is harder to adjust for and requires some more thought about what kind of model would have both the required simplicity and ability to make the adjustment in a rigorous way.

I would in principle be happy to support the implementation of some of these features and to discuss them in more detail.
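As an illustration of the time-series decomposition route, a weekly seasonal component can be estimated with base R's `stl()` and subtracted, instead of smoothing with a 7-day moving average. This is a minimal sketch assuming an additive, strictly periodic day-of-the-week effect; the simulated data and weekday multipliers are illustrative only, and nothing here is an existing i2extras function.

```r
# Sketch: remove a day-of-the-week effect via STL decomposition.
set.seed(42)
n_days <- 8 * 7
trend <- 20 + 0.5 * seq_len(n_days)
weekday_effect <- rep(c(1.2, 1.1, 1, 1, 0.9, 0.5, 0.3), length.out = n_days)
counts <- rpois(n_days, lambda = trend * weekday_effect)

# frequency = 7 tells stl() that the seasonal period is one week
counts_ts <- ts(counts, frequency = 7)
decomp <- stl(counts_ts, s.window = "periodic")

# With s.window = "periodic" the seasonal component repeats exactly every 7 days
seasonal <- decomp$time.series[, "seasonal"]

# Weekday-adjusted counts: only the seasonal part is removed, the trend is not smoothed
adjusted <- counts_ts - seasonal
```

Because only the periodic component is subtracted, the adjusted series is not averaged over past days, which avoids the lag of a trailing moving average. An additive decomposition on raw counts can produce negative adjusted values; decomposing `log(counts + 1)` would be one alternative.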

Add function to detect low counts

It would be useful to have a function to detect abnormally low counts (zero or close to zero) indicative of under-reporting, and to set these to NA. One criterion could be count < thres * median(counts). The function should also handle a counts argument for objects with multiple count variables.
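A minimal sketch of the proposed criterion on a plain data frame, looping over several count columns. `flag_low_counts()`, its `thres` default and the example data are hypothetical; a real version would operate on incidence2 objects.

```r
# Hypothetical helper (not part of i2extras): replace counts below
# thres * median(counts) with NA, separately for each count column.
flag_low_counts <- function(x, counts, thres = 0.1) {
  for (col in counts) {
    med <- stats::median(x[[col]], na.rm = TRUE)
    low <- !is.na(x[[col]]) & x[[col]] < thres * med
    x[[col]][low] <- NA
  }
  x
}

dat <- data.frame(
  date = as.Date("2020-03-01") + 0:6,
  cases = c(12, 15, 0, 14, 1, 16, 13),
  deaths = c(2, 3, 2, 0, 3, 2, 4)
)
cleaned <- flag_low_counts(dat, counts = c("cases", "deaths"), thres = 0.2)
cleaned
```

Here the median of `cases` is 13, so values below 2.6 (the 0 and the 1) become NA; for `deaths` the median is 2, so only the 0 is flagged. Using the median rather than the mean keeps the threshold from being dragged down by the very low values being detected.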

Possible interface for simple fitting

library(incidence2)
library(incidence2plus)
library(tidyr)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist

inci <- incidence(
  dat,
  date_index = date_of_onset,
  interval = "week",
  last_date = "2014-10-05",
  groups = gender
)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

inci %>% 
  fit(model = "poisson")
#> # A tibble: 2 x 6
#>   gender model  fitted                 r `r-lower` `r-upper`
#>   <fct>  <list> <list>             <dbl>     <dbl>     <dbl>
#> 1 f      <glm>  <tibble [26 × 6]> 0.0249    0.0233    0.0265
#> 2 m      <glm>  <tibble [26 × 6]> 0.0250    0.0234    0.0267

inci %>% 
  fit(model = "poisson") %>% 
  add_doubling()
#> # A tibble: 2 x 9
#>   gender model fitted      r `r-lower` `r-upper` doubling `doubling-lower`
#>   <fct>  <lis> <list>  <dbl>     <dbl>     <dbl>    <dbl>            <dbl>
#> 1 f      <glm> <tibb… 0.0249    0.0233    0.0265     27.9             29.8
#> 2 m      <glm> <tibb… 0.0250    0.0234    0.0267     27.7             29.7
#> # … with 1 more variable: `doubling-upper` <dbl>

inci %>% 
  fit(model = "poisson") %>% 
  plot(color = "white", angle = 45, n_breaks = 4)

Created on 2020-09-03 by the reprex package (v0.3.0)
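For reference, the doubling time under exponential growth at rate r is log(2) / r, in the same time unit as r. The sketch below recomputes it from the rounded r values printed above (so the results differ slightly from the `add_doubling()` output, which uses unrounded estimates). `doubling_time()` is a hypothetical helper, not the i2extras implementation.

```r
# Sketch: doubling time for exponential growth at rate r.
# Hypothetical helper, not the i2extras implementation.
doubling_time <- function(r) log(2) / r

# Rounded estimates for the "f" group from the reprex above
r <- c(estimate = 0.0249, lower = 0.0233, upper = 0.0265)
d <- doubling_time(r)
round(d, 1)
```

Since log(2) / r decreases as r increases, the lower bound of r maps to the longest doubling time, which is why `doubling-lower` is larger than `doubling` in the output above.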

Plotting the output of fit_curve() when it includes warnings

Here is a reprex illustrating the problem. When fit_curve() issues warnings, some downstream functions such as growth_rate() have an option to ignore them, but plot() does not. In terms of design, I am wondering whether it would be useful to have a user-facing function to ignore warnings, e.g.:

x %>% 
  fit_curve() %>% 
  ignore_warnings() %>% 
  plot()

Or would it be better to keep this internal and add an option to plot()? It would be useful to discuss this before making changes (I am also happy to make them).

@TimTaylor tagging you for awareness and future discussions :)

library(tidyverse)
library(incidence2)
library(i2extras)

## set the random seed so we all get the same result
set.seed(1)
days <- 0:30
cases <- rpois(n = length(days), lambda = 3)


## create one date of infection per case
date_infection <- rep(days, cases)
data <- tibble(date_infection)
data
#> # A tibble: 96 × 1
#>    date_infection
#>             <int>
#>  1              0
#>  2              0
#>  3              1
#>  4              1
#>  5              2
#>  6              2
#>  7              2
#>  8              3
#>  9              3
#> 10              3
#> # … with 86 more rows

## build epicurve and fitting
res <- data %>%
  incidence(date_infection) %>%
  fit_curve(model = "negbin",
            control = glm.control(maxit = 1e3))
res
#> # A tibble: 1 × 8
#>   count_variable           data model  estimates   fitting_warning fitting_error
#>   <chr>          <list<tibble[> <list> <list>      <list>          <list>       
#> 1 count                [30 × 2] <negb… <df [30 × … <chr [2]>       <NULL>       
#> # … with 2 more variables: prediction_warning <list>, prediction_error <list>

## Note: there seems to be a 'safe' warning, which I would like to be able to
## ignore in further analyses
res %>% 
  pull(fitting_warning)
#> [[1]]
#> [1] "NaNs produced" "NaNs produced"

## get growth rates: this behaves as expected
res %>%
  growth_rate() # empty result because of the warnings, as expected
#> # A tibble: 0 × 9
#> # … with 9 variables: count_variable <chr>, model <list>, r <dbl>,
#> #   r_lower <dbl>, r_upper <dbl>, growth_or_decay <lgl>, time <lgl>,
#> #   time_lower <lgl>, time_upper <lgl>
res %>%
  growth_rate(include_warnings = TRUE) # results as expected - fine
#> # A tibble: 1 × 9
#>   count_variable model        r r_lower r_upper growth_or_decay  time time_lower
#>   <chr>          <lis>    <dbl>   <dbl>   <dbl> <chr>           <dbl>      <dbl>
#> 1 count          <neg… -0.00276 -0.0256  0.0200 halving          251.       27.1
#> # … with 1 more variable: time_upper <dbl>

## but plotting won't go through
res %>%
  plot()
#> Error: Can't subset columns that don't exist.
#> x Column `count` doesn't exist.

Created on 2021-10-21 by the reprex package (v2.0.1)
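For the discussion, here is one base-R way to evaluate an expression while collecting (and muffling) its warnings, so that a result and its warnings can be stored side by side, much like the `fitting_warning` column above. This is a sketch of the general pattern only; it is not necessarily how i2extras captures warnings, and `capture_warnings()` is a hypothetical name.

```r
# Sketch: run `expr`, collect warning messages instead of emitting them,
# and return both the value and the messages.
capture_warnings <- function(expr) {
  msgs <- character()
  result <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))  # record the message
      invokeRestart("muffleWarning")         # and suppress the warning itself
    }
  )
  list(result = result, warnings = msgs)
}

# sqrt() of a negative number produces the same "NaNs produced" warning as above
out <- capture_warnings(sqrt(c(4, -1)))
out$warnings
```

With warnings kept as data like this, an "ignore warnings" step could simply treat the stored messages as empty before plotting, rather than each downstream function needing its own option.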

quasipoisson

Add the quasipoisson model from trending as an option in fit.
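As background, a quasipoisson GLM keeps the Poisson log-linear mean (so the same point estimate of r) but estimates a dispersion parameter, which widens the intervals when counts are overdispersed. A minimal base-R sketch with simulated negative-binomial counts (illustrative data only, fitted with `stats::glm` rather than through trending):

```r
# Sketch: Poisson vs quasipoisson fits of an exponential trend.
set.seed(1)
day <- 0:29
# Overdispersed illustrative counts around an exponential trend
counts <- rnbinom(length(day), mu = 10 * exp(0.05 * day), size = 5)

fit_p  <- glm(counts ~ day, family = poisson(link = "log"))
fit_qp <- glm(counts ~ day, family = quasipoisson(link = "log"))

disp <- summary(fit_qp)$dispersion  # 1 would mean no overdispersion

# Wald intervals for r: same centre, quasipoisson scaled by sqrt(disp)
ci_p  <- confint.default(fit_p, "day")
ci_qp <- confint.default(fit_qp, "day")
```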

Add function for data imputation

As a follow-up to #7, it would be useful to have a procedure for replacing NAs in an incidence2 object. Several methods could be considered, e.g.:

using a rolling average
interpolating from neighbouring values

In the case of multiple NAs next to each other, we may need to do this replacement recursively.

I will come up with a proposed interface later on.
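To make the two methods concrete, here is a minimal sketch on a plain numeric vector. `impute_linear()` and `impute_rolling()` are hypothetical helpers, not an existing i2extras interface; linear interpolation uses `stats::approx()`, and the rolling version averages the non-missing values within `k` positions of each NA.

```r
# Hypothetical helpers (not part of i2extras) for replacing NAs in a count vector.

# Linear interpolation between the nearest non-missing neighbours;
# rule = 2 carries the nearest observed value out to the ends.
impute_linear <- function(counts) {
  idx <- seq_along(counts)
  ok <- !is.na(counts)
  counts[!ok] <- stats::approx(idx[ok], counts[ok], xout = idx[!ok], rule = 2)$y
  counts
}

# Mean of the observed values within k positions either side of each NA
# (gives NaN if the whole window is missing).
impute_rolling <- function(counts, k = 1) {
  out <- counts
  n <- length(counts)
  for (i in which(is.na(counts))) {
    window <- counts[max(1, i - k):min(n, i + k)]
    out[i] <- mean(window, na.rm = TRUE)
  }
  out
}

x <- c(10, 12, NA, NA, 18, 20)
impute_linear(x)          # returns c(10, 12, 14, 16, 18, 20)
impute_rolling(x, k = 1)  # returns c(10, 12, 12, 18, 18, 20)
```

Interpolation handles a run of adjacent NAs in one pass, whereas a rolling average breaks down when a run is longer than the window (every value in the window is missing), which is where a recursive or widening-window replacement would be needed.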

Unvendor {trending} bits

Now {trending} v0.1.0 is on CRAN we can unvendor the stuff in https://github.com/reconverse/i2extras/blob/master/R/compat_trending.R.

Can we deal with plotting rolling averages better for grouped incidence objects?

Currently grouped incidence objects need regrouping if an overall rolling average is required. Is this the best way to deal with it?

library(outbreaks)
library(incidence2)

data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist


# without groups ----------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05")
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).

#> Warning: Removed 2 rows containing missing values (position_stack).

# grouped by gender -------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week", 
                  last_date = "2014-10-05",
                  groups = gender)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.

# facet_plot
inci %>%  
  rolling_average(before = 2) %>% 
  facet_plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).
#> Warning: Removed 4 rows containing missing values (position_stack).

# an individual (non-faceted) plot needs regrouping if groups are present
inci %>%  
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).

#> Warning: Removed 4 rows containing missing values (position_stack).

inci %>%  
  regroup() %>% 
  rolling_average(before = 2) %>% 
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Warning: Removed 2 rows containing missing values (position_stack).

Created on 2020-07-30 by the reprex package (v0.3.0)
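To compare the two options discussed above, here is a base-R sketch of per-group versus overall trailing averages on toy weekly data. `trailing_mean()` and the data are hypothetical; `before` mirrors the argument used in the reprex, and the first `before` values of each series are NA, matching the removed rows in the warnings.

```r
# Toy weekly counts for two groups (illustrative data only)
dat <- data.frame(
  week = rep(1:6, times = 2),
  gender = rep(c("f", "m"), each = 6),
  count = c(3, 5, 4, 6, 8, 7, 2, 4, 6, 5, 9, 10)
)

# Hypothetical helper: mean of the current and `before` previous values
# (NA until enough history is available)
trailing_mean <- function(x, before = 2) {
  as.numeric(stats::filter(x, rep(1 / (before + 1), before + 1), sides = 1))
}

# Per-group rolling average (what facet_plot() shows)
dat$rolling <- ave(dat$count, dat$gender, FUN = trailing_mean)

# Overall rolling average: sum over groups first ("regroup"), then smooth
overall <- aggregate(count ~ week, data = dat, FUN = sum)
overall$rolling <- trailing_mean(overall$count)
```

Because an unweighted moving average is linear, summing the per-group rolling averages week by week gives the same values as the overall rolling average wherever both are defined, so an overall curve could in principle be derived without regrouping first.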

growth_rate - highlight confidence intervals containing zero

Currently, growth_rate marks a fitted curve as either growth or decay depending on the sign of the coefficient. We should think of a way to highlight when the confidence interval around r contains 0.
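One possible approach is a third label for fits whose interval spans zero. The sketch below uses a hypothetical `classify_growth()` helper and the "uncertain" label as assumptions, not existing i2extras behaviour; the first set of values is the negbin estimate from the fit_curve() warnings reprex, the second is the "f" group from the simple-fitting reprex.

```r
# Hypothetical helper (not part of i2extras): label a fit as "uncertain"
# when the confidence interval for r contains 0, otherwise by the sign of r.
classify_growth <- function(r, r_lower, r_upper) {
  ifelse(r_lower <= 0 & r_upper >= 0, "uncertain",
         ifelse(r > 0, "growth", "decay"))
}

labels <- classify_growth(
  r       = c(-0.00276, 0.0249),
  r_lower = c(-0.0256,  0.0233),
  r_upper = c( 0.0200,  0.0265)
)
labels  # returns c("uncertain", "growth")
```

Under this rule, the negbin fit shown earlier as "halving" would instead be flagged, since its interval (-0.0256, 0.0200) includes zero.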

highlight spanning of zero in growth_rate function

It would be nice
