reconverse / i2extras
Additional functionality for working with incidence2
Home Page: https://www.reconverse.org/i2extras/
License: Other
We should have an option in fit_curve for a log-linear function. This will require some changes to the upstream trending package due to how data is stored with a fitted model.
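As a point of reference for the discussion, here is a minimal base R sketch of a log-linear fit. It does not use fit_curve or trending; the helper name fit_log_linear and the toy data are assumptions made for illustration only.

```r
# Hypothetical helper (not part of i2extras or trending): fit
# log(count) ~ time by ordinary least squares.
# Missing and zero counts are dropped because log(0) is undefined.
fit_log_linear <- function(time, count) {
  keep <- !is.na(count) & count > 0
  dat <- data.frame(time = time[keep], log_count = log(count[keep]))
  stats::lm(log_count ~ time, data = dat)
}

# Toy data with exact exponential growth at rate 0.1 per time unit.
time <- 0:20
count <- exp(1 + 0.1 * time)

model <- fit_log_linear(time, count)

# The slope on the log scale is the estimated growth rate.
r <- unname(stats::coef(model)["time"])
r
```

Dropping zero counts is one of several ways to handle them and can bias the fit when zeros are frequent, which is part of why count models such as Poisson are often preferred.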
Is your feature request related to a problem? Please describe.
Moving averages, which are commonly used to process observational data prior to visualisation, have several issues, the most notable being information loss and lag. The main drivers of their use are day-of-the-week effects and reporting noise.
Some of this functionality looks like it is supported in fit_curve and would only need some minimal extension.
Describe the solution you'd like
In general, day-of-the-week effects are much easier to account for than reporting effects. Time series decomposition is one possible alternative; other alternatives would require some non-parametric driver of reports.
Reporting noise is harder to adjust for and requires some more thought about what kind of model would have both the required simplicity and ability to make the adjustment in a rigorous way.
I would in principle be happy to support the implementation of some of these features and to discuss them in more detail.
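To make the time series decomposition idea concrete, here is a rough sketch using stats::decompose from base R on a noise-free toy series. The data, the size of the weekend dip and the additive form are all assumptions for illustration; this is not an existing i2extras feature.

```r
# Toy daily counts over 8 weeks: a linear trend plus a weekend dip.
days <- 1:56
weekday_effect <- rep(c(0, 0, 0, 0, 0, -8, -10), times = 8)
counts <- 30 + 0.2 * days + weekday_effect

# Treat the series as having a period of 7 days and split it into
# trend, seasonal (day-of-the-week) and remainder components.
x <- stats::ts(counts, frequency = 7)
dec <- stats::decompose(x, type = "additive")

# Estimated day-of-the-week effects, centred to sum to zero.
dec$figure

# Counts with the day-of-the-week component removed; unlike a moving
# average, no observations are lost at the start of the series.
adjusted <- as.numeric(x - dec$seasonal)
```

An additive decomposition is the simplest choice here; for count data a multiplicative form (type = "multiplicative") or a model with a weekday term may be more appropriate.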
It would be useful to have a function that detects abnormally low counts (zero or close to zero) indicative of under-reporting and sets them to NA. The criterion could be count < thres x median(counts). It should accept a counts argument to handle the case of multiple count variables.
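A possible shape for such a function, sketched on a plain data frame rather than an incidence2 object. The name flag_low_counts, its arguments and the default threshold are hypothetical.

```r
# Hypothetical helper: set counts below `thres * median(counts)` to NA.
# `x` is a plain data frame; `counts` names one or more count columns,
# each of which is compared against its own median.
flag_low_counts <- function(x, counts = "count", thres = 0.1) {
  for (nm in counts) {
    cutoff <- thres * stats::median(x[[nm]], na.rm = TRUE)
    low <- !is.na(x[[nm]]) & x[[nm]] < cutoff
    x[[nm]][low] <- NA
  }
  x
}

dat <- data.frame(
  day    = 1:7,
  cases  = c(20, 22, 0, 25, 1, 24, 23),
  deaths = c(5, 6, 4, 0, 5, 7, 6)
)

# cases: median 22, cutoff 2.2, so days 3 and 5 are set to NA.
# deaths: median 5, cutoff 0.5, so day 4 is set to NA.
flagged <- flag_low_counts(dat, counts = c("cases", "deaths"), thres = 0.1)
flagged
```

Using the overall median is the simplest reading of the criterion; for long series with strong trends a local (rolling) median might be a better reference.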
library(incidence2)
library(incidence2plus)
library(tidyr)
data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist
inci <- incidence(
  dat,
  date_index = date_of_onset,
  interval = "week",
  last_date = "2014-10-05",
  groups = gender
)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.
inci %>%
  fit(model = "poisson")
#> # A tibble: 2 x 6
#> gender model fitted r `r-lower` `r-upper`
#> <fct> <list> <list> <dbl> <dbl> <dbl>
#> 1 f <glm> <tibble [26 × 6]> 0.0249 0.0233 0.0265
#> 2 m <glm> <tibble [26 × 6]> 0.0250 0.0234 0.0267
inci %>%
  fit(model = "poisson") %>%
  add_doubling()
#> # A tibble: 2 x 9
#> gender model fitted r `r-lower` `r-upper` doubling `doubling-lower`
#> <fct> <lis> <list> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 f <glm> <tibb… 0.0249 0.0233 0.0265 27.9 29.8
#> 2 m <glm> <tibb… 0.0250 0.0234 0.0267 27.7 29.7
#> # … with 1 more variable: `doubling-upper` <dbl>
inci %>%
  fit(model = "poisson") %>%
  plot(color = "white", angle = 45, n_breaks = 4)
Created on 2020-09-03 by the reprex package (v0.3.0)
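For readers checking the doubling columns above by hand: under exponential growth at rate r, the doubling time is log(2) / r. The sketch below assumes add_doubling() uses this standard relationship, which is not confirmed here; doubling_time is a hypothetical helper.

```r
# Doubling time for exponential growth at rate r (per unit of time).
doubling_time <- function(r) log(2) / r

# Growth rate and interval bounds for group "f", as printed above
# (rounded, so results differ slightly from the printed table).
r <- c(estimate = 0.0249, lower = 0.0233, upper = 0.0265)
doubling_time(r)
```

Because log(2) / r decreases as r increases, the lower bound of r corresponds to the upper bound of the doubling time, and vice versa.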
Here is a reprex illustrating the problem. When fit_curve() issues warnings, some downstream functions such as growth_rate() have an option to ignore them, but plot() does not. In terms of design, I am wondering if it would be useful to have a user-facing function to ignore warnings, e.g.
x %>%
  fit_curve() %>%
  ignore_warnings() %>%
  plot()
Or is it okay to have this as internal and add an option to plot()? Would be useful to discuss before making changes (also, happy to do them).
@TimTaylor tagging you for awareness and future discussions :)
library(tidyverse)
library(incidence2)
library(i2extras)
## set the random seed so we all get the same result
set.seed(1)
days <- 0:30
cases <- rpois(n = length(days), lambda = 3)
## step 3: create dates of infection for these cases
date_infection <- rep(days, cases)
data <- tibble(date_infection)
data
#> # A tibble: 96 × 1
#> date_infection
#> <int>
#> 1 0
#> 2 0
#> 3 1
#> 4 1
#> 5 2
#> 6 2
#> 7 2
#> 8 3
#> 9 3
#> 10 3
#> # … with 86 more rows
## build epicurve and fitting
res <- data %>%
  incidence(date_infection) %>%
  fit_curve(model = "negbin",
            control = glm.control(maxit = 1e3))
res
#> # A tibble: 1 × 8
#> count_variable data model estimates fitting_warning fitting_error
#> <chr> <list<tibble[> <list> <list> <list> <list>
#> 1 count [30 × 2] <negb… <df [30 × … <chr [2]> <NULL>
#> # … with 2 more variables: prediction_warning <list>, prediction_error <list>
## Note: there seems to be a 'safe' warning, which I would like to be able to
## ignore in further analyses
res %>%
  pull(fitting_warning)
#> [[1]]
#> [1] "NaNs produced" "NaNs produced"
## get growth rates: this behaves as expected
res %>%
  growth_rate() # empty result because of warnings - fine
#> # A tibble: 0 × 9
#> # … with 9 variables: count_variable <chr>, model <list>, r <dbl>,
#> # r_lower <dbl>, r_upper <dbl>, growth_or_decay <lgl>, time <lgl>,
#> # time_lower <lgl>, time_upper <lgl>
res %>%
  growth_rate(include_warnings = TRUE) # results as expected - fine
#> # A tibble: 1 × 9
#> count_variable model r r_lower r_upper growth_or_decay time time_lower
#> <chr> <lis> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
#> 1 count <neg… -0.00276 -0.0256 0.0200 halving 251. 27.1
#> # … with 1 more variable: time_upper <dbl>
## but plotting won't go through
res %>%
  plot()
#> Error: Can't subset columns that don't exist.
#> x Column `count` doesn't exist.
Created on 2021-10-21 by the reprex package (v2.0.1)
Add, from trending, quasipoisson as a function in fit
As a follow-up to #7 it would be useful to have a procedure for replacing NAs in an incidence2 object. Different methods could be foreseen, e.g.:
In the case of multiple NAs next to each other, we may need to do this replacement recursively.
Will come up with a proposed interface later on.
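The list of candidate methods is not reproduced in this thread, so the following is only one possible approach, assumed for illustration: linear interpolation between the nearest observed neighbours via stats::approx. The helper name fill_na_linear is hypothetical, and it works on a plain numeric vector rather than an incidence2 object.

```r
# Hypothetical helper: replace NAs by linear interpolation between the
# nearest non-missing neighbours. Leading and trailing NAs have no
# neighbour on one side and are left as NA.
fill_na_linear <- function(counts) {
  idx <- seq_along(counts)
  ok <- !is.na(counts)
  # approx() needs at least two observed points to interpolate.
  if (sum(ok) < 2) return(counts)
  stats::approx(idx[ok], counts[ok], xout = idx)$y
}

counts <- c(10, NA, NA, 16, 18, NA, 22, NA)
fill_na_linear(counts)
```

With interpolation, a run of adjacent NAs is filled in a single pass. Methods that rely only on immediate neighbours (for example, the mean of the previous and next value) would need the recursive treatment mentioned above.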
Now that {trending} v0.1.0 is on CRAN, we can unvendor the code in https://github.com/reconverse/i2extras/blob/master/R/compat_trending.R.
Currently grouped incidence objects need regrouping if an overall rolling average is required. Is this the best way to deal with it?
library(outbreaks)
library(incidence2)
data(ebola_sim_clean, package = "outbreaks")
dat <- ebola_sim_clean$linelist
# without groups ----------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week",
                  last_date = "2014-10-05")
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.
inci %>%
  rolling_average(before = 2) %>%
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Warning: Removed 2 rows containing missing values (position_stack).
# grouped by gender -------------------------------------------------------
inci <- incidence(dat,
                  date_index = date_of_onset,
                  interval = "week",
                  last_date = "2014-10-05",
                  groups = gender)
#> 3522 observations outside of [2014-04-07, 2014-10-05] were removed.
# facet_plot
inci %>%
  rolling_average(before = 2) %>%
  facet_plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).
#> Warning: Removed 4 rows containing missing values (position_stack).
# individual plot would need regrouping if groups are present
inci %>%
  rolling_average(before = 2) %>%
  plot(color = "white")
#> Warning: Removed 4 rows containing missing values (position_stack).
#> Warning: Removed 4 rows containing missing values (position_stack).
inci %>%
  regroup() %>%
  rolling_average(before = 2) %>%
  plot(color = "white")
#> Warning: Removed 2 rows containing missing values (position_stack).
#> Warning: Removed 2 rows containing missing values (position_stack).
Created on 2020-07-30 by the reprex package (v0.3.0)
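To make the regrouping question concrete, here is a base R sketch of a trailing mean on toy counts. It assumes rolling_average(before = 2) averages the current value with the two preceding ones, which is consistent with the two missing rows per series in the warnings above but is not confirmed here; trailing_mean is a hypothetical helper.

```r
# Trailing mean over the current value and the `before` preceding ones.
# The first `before` positions have no complete window and are NA.
trailing_mean <- function(x, before = 2) {
  width <- before + 1
  as.numeric(stats::filter(x, rep(1 / width, width), sides = 1))
}

# Toy weekly counts for two groups on the same dates.
f <- c(1, 2, 3, 4, 5)
m <- c(2, 2, 2, 2, 2)

# Regroup first (sum over groups), then average.
overall <- trailing_mean(f + m)

# Average within each group, then sum over groups.
per_group_sum <- trailing_mean(f) + trailing_mean(m)

overall
per_group_sum
```

Because the trailing mean is a linear operation, both orders give the same overall series when the groups share the same dates and have no missing values. With gaps or unequal date ranges the two can differ, which may be a reason to keep regroup() as an explicit step.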
Currently growth_rate marks a fitted curve as either growth or decay depending on the sign of the coefficient. We should think of a way to highlight when the confidence interval around r contains 0.
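One possible way to express this, sketched as a standalone function on interval bounds. The name classify_growth and the "uncertain" label are hypothetical; this is not the current growth_rate behaviour.

```r
# Hypothetical helper: classify a growth rate using its confidence
# interval rather than only the sign of the point estimate.
classify_growth <- function(r_lower, r_upper) {
  ifelse(r_lower > 0, "growth",
         ifelse(r_upper < 0, "decay", "uncertain"))
}

# Three intervals: entirely positive, containing 0, entirely negative.
classify_growth(
  r_lower = c(0.0233, -0.0256, -0.05),
  r_upper = c(0.0265,  0.0200, -0.01)
)
```

The second interval matches the negbin example earlier in this thread (r_lower = -0.0256, r_upper = 0.0200), which is currently reported as halving even though the interval includes 0.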
It would be nice