Giter Club home page Giter Club logo

covid_adjusted_cfr's People

Contributors

jriou avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

covid_adjusted_cfr's Issues

Problems to run the experiment

Hi @jriou,

Congratulations on your work.

I try to run the experiment, but I have several problems to do it.

Some questions:

  • Do I need data_S_model13.R, or I can generate it?
  • Do you have any step by step guide to reproduce the results?

Please, note I don't have any experience with the R language.

Best,
Dario

Wrong way to calculate age distribution

Hi @jriou,

I think there is an error calculating age distribution in your R experiment:

> age_dist = read_excel("data/age_structure.xlsx") %>%
+   filter(country=="Italy") %>%
+   gather("age","n",3:23) %>%
+   mutate(n=as.numeric(n),age2=c(0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80)) %>%
+   group_by(age2) %>%
+   summarise(n=sum(n)) %>%
+   pull(n)
                                                                                                                               
> age_dist = age_dist/sum(age_dist)
> pop_t = sum(c(10.04e6,4.453e6,4.905e6,1.522e6,4.356e6))
> age_dist
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017
> print(age_dist)
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017

The problem is related to the format on data/age_structure.xlsx file. If you take a look to the xlsx file:

Italy 60627.291 2442.494 2768.81 2869.745 2863.711 2973.472 3197.654 3417.097 3803.402 4469.335 4937.591 4879.297 4431.191 3779.643 3563.958 3213.162 2639.428 2220.209 1327.127 631.952 181.855 16.158

I think you need to remove the dot of the values on the dataset to get the real age distribution results, because in R, as.numeric doesn't transform these numbers to integers:

> as.numeric(2442.494)
[1] 2442.494

I can share my Python code to do the same:

First version, with the line commented to get the same results:

>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> # df = df.apply(lambda x : x.astype(str).str.replace(r'\.', ''))  # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.08594949 0.09456093 0.10178663 0.1191085  0.15517099 0.15358728
 0.12112114 0.09654057 0.07217447]

Second version, using the line to delete the dot and get the real age distribution:

>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> df = df.apply(lambda x : x.astype(str).str.replace(r'\.', ''))  # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.04677661 0.09862252 0.10615099 0.1242015  0.16181074 0.16015189
 0.126319   0.10067177 0.07529498]

You can find the error here:

age_dist = read_excel("data/age_structure.xlsx") %>%

age_dist = read_excel("data/age_structure.xlsx") %>%

Let me know if you agree.

Best,
Dario

problems to run the R codes

Hi,
I read through the manuscript which is truly fascinating. However, while running the code, I faced with some difficulties. Could anyone please guide me how to start and reproduce the results here.
I tried running the code of "output_model13_italy.R", but I do not have the "model13IT_2020-03-18.Rdata" file in my directory.
Any kind of help will be much appreciated.
Thanks.
Moumita.

(B) and (C) are reversed in README figure

Hi there,

Just noticed that caption of the figure in README.md, has (B) and (C) be reversed compared to the paper.

From README.md:

Figure. (A) Reported confirmed cases of COVID-19 in the Hubei province by date of disease onset (blue) and reported deaths (red) from 8 December 2019 to 11 February 2020. (B) Age distribution of the Chinese population compared to that of confirmed cases of and deaths due to COVID-19. (C) Proportion of individuals infected by COVID-19 showing symptoms among passengers of the Diamond Princess ship (with 95% credible interval).

versus the table from the paper:

image

Question on the implementation of the BDF ode solver

Hi @jriou,

Congrats on the very nice work. This is not an issue, but rather a question on some choices made in your analysis.

Problem statement

My analysis involves an age-structured SEIR transmission model implemented in Rstan, with a time-varying effective contact rate and daily observations. The population of the country is at the order of \approx 1.4e+07. I have previously attempted to implement the BDF solver assuming that the number of individuals in each compartment is at the original scale. The solver appears to struggle and this creates a computational bottleneck.

At Section 2.1 of the Supplementary appendix of this paper, it is stated that the number of individuals in each compartment is scaled by the total population of the area, so that the sum of all compartments is 1. For reference, I have successfully replicated the analysis for Spain (period of data collection = 46 days) under the same MCMC setting, so I know what to expect in terms of computational time in my machine, e.g. by simply looking at the rstan message "Gradient evaluation took ... seconds"). The same implementation with RK45 in my machine appears to increase the computational cost per itaration by \approx 3 times.

When I am working with the option of scaled compartments, the computational time for gradient evaluation is at the same order of magnitude as the reference analysis, so I get rid of the computational bottleneck.

I see that Version "10_bales" of the model is the first time you switch from the RK45 to the BDF ode solver, but still working with a compartmental model that assumes homogeneous mixing of the population. Version 11 is the first time where the age-structured transmission model is used.

Questions

  1. What is the reason for working with the scaled compartments, then reverting the ODE solutions to the original scale? Let's focus on the dummy compartment for the sake of the discussion.
  2. What is the reason for accounting for the initial values of (S, E) compartments in the calculation of the derivatives, e.g dydt[k] = - f_inf[k] * (y[k]+init[k]); (file model16.stan, line 76)?
  3. What is the reason for choosing the particular BDF ODE solver control parameters? Was their choice informed by the order of magnitude of the compartments? Did you ever face very small changes between consecutive time points and was this addressed by experienting with different control parameters?

All the best,
Lampros

Error in check_pars(allpars, pars) : no parameter delta

I am running this on RStudio on MacOSX 10.14.

I ran the model in RStudio and got the following error (full job log output shown below):

"Error in check_pars(allpars, pars) : no parameter delta"

Full job log:

── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0 ✓ purrr 0.3.3
✓ tibble 2.1.3 ✓ dplyr 0.8.5
✓ tidyr 1.0.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

date

Loading required package: StanHeaders
rstan (Version 2.19.3, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)

Attaching package: ‘rstan’

The following object is masked from ‘package:tidyr’:

extract

Note: As of version 1.0.0, cowplot does not change the
default ggplot2 theme anymore. To recover the previous
behavior, execute:
theme_set(theme_cowplot())


Attaching package: ‘cowplot’

The following object is masked from ‘package:lubridate’:

stamp

SAMPLING FOR MODEL 'model10' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.292054 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2920.54 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: No variance estimation is
Chain 1: performed for num_warmup < 20
Chain 1:
Chain 1: Iteration: 1 / 5 [ 20%] (Warmup)
Chain 1: Iteration: 2 / 5 [ 40%] (Warmup)
Chain 1: Iteration: 3 / 5 [ 60%] (Sampling)
Chain 1: Iteration: 4 / 5 [ 80%] (Sampling)
Chain 1: Iteration: 5 / 5 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 1.22936 seconds (Warm-up)
Chain 1: 7.12388 seconds (Sampling)
Chain 1: 8.35324 seconds (Total)
Chain 1:
Error in check_pars(allpars, pars) : no parameter delta
Calls: sourceWithProgress ... summary -> .local -> check_pars_second -> check_pars
In addition: Warning messages:
1: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
Execution halted

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.