jriou / covid_adjusted_cfr Goto Github PK

View Code? Open in Web Editor NEW

88.0 88.0 29.0 1.47 GB

License: MIT License

R 38.83% TeX 33.72% Stan 27.44%

covid_adjusted_cfr's People

Contributors

Stargazers

Watchers

covid_adjusted_cfr's Issues

Problems to run the experiment

Hi @jriou,

Congratulations on your work.

I try to run the experiment, but I have several problems to do it.

Some questions:

Do I need data_S_model13.R, or I can generate it?
Do you have any step by step guide to reproduce the results?

Please, note I don't have any experience with the R language.

Best,
Dario

Wrong way to calculate age distribution

Hi @jriou,

I think there is an error calculating age distribution in your R experiment:

> age_dist = read_excel("data/age_structure.xlsx") %>%
+   filter(country=="Italy") %>%
+   gather("age","n",3:23) %>%
+   mutate(n=as.numeric(n),age2=c(0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80)) %>%
+   group_by(age2) %>%
+   summarise(n=sum(n)) %>%
+   pull(n)
                                                                                                                               
> age_dist = age_dist/sum(age_dist)
> pop_t = sum(c(10.04e6,4.453e6,4.905e6,1.522e6,4.356e6))
> age_dist
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017
> print(age_dist)
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017

The problem is related to the format on data/age_structure.xlsx file. If you take a look to the xlsx file:

Italy	60627.291	2442.494	2768.81	2869.745	2863.711	2973.472	3197.654	3417.097	3803.402	4469.335	4937.591	4879.297	4431.191	3779.643	3563.958	3213.162	2639.428	2220.209	1327.127	631.952	181.855	16.158

I think you need to remove the dot of the values on the dataset to get the real age distribution results, because in R, as.numeric doesn't transform these numbers to integers:

> as.numeric(2442.494)
[1] 2442.494

I can share my Python code to do the same:

First version, with the line commented to get the same results:

>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> # df = df.apply(lambda x : x.astype(str).str.replace(r'\.', ''))  # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.08594949 0.09456093 0.10178663 0.1191085  0.15517099 0.15358728
 0.12112114 0.09654057 0.07217447]

Second version, using the line to delete the dot and get the real age distribution:

>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> df = df.apply(lambda x : x.astype(str).str.replace(r'\.', ''))  # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.04677661 0.09862252 0.10615099 0.1242015  0.16181074 0.16015189
 0.126319   0.10067177 0.07529498]

You can find the error here:

covid_adjusted_cfr/data/italy/data_management_italy.R

Line 13 in 9e041b8

age_dist = read_excel("data/age_structure.xlsx") %>%

covid_adjusted_cfr/data/south-korea/data_management_south_korea.R

Line 19 in 9e041b8

age_dist = read_excel("data/age_structure.xlsx") %>%

Let me know if you agree.

Best,
Dario

problems to run the R codes

Hi,
I read through the manuscript which is truly fascinating. However, while running the code, I faced with some difficulties. Could anyone please guide me how to start and reproduce the results here.
I tried running the code of "output_model13_italy.R", but I do not have the "model13IT_2020-03-18.Rdata" file in my directory.
Any kind of help will be much appreciated.
Thanks.
Moumita.

(B) and (C) are reversed in README figure

Hi there,

Just noticed that caption of the figure in README.md, has (B) and (C) be reversed compared to the paper.

From README.md:

Figure. (A) Reported confirmed cases of COVID-19 in the Hubei province by date of disease onset (blue) and reported deaths (red) from 8 December 2019 to 11 February 2020. (B) Age distribution of the Chinese population compared to that of confirmed cases of and deaths due to COVID-19. (C) Proportion of individuals infected by COVID-19 showing symptoms among passengers of the Diamond Princess ship (with 95% credible interval).

versus the table from the paper:

Question on the implementation of the BDF ode solver

Hi @jriou,

Congrats on the very nice work. This is not an issue, but rather a question on some choices made in your analysis.

Problem statement

My analysis involves an age-structured SEIR transmission model implemented in Rstan, with a time-varying effective contact rate and daily observations. The population of the country is at the order of \approx 1.4e+07. I have previously attempted to implement the BDF solver assuming that the number of individuals in each compartment is at the original scale. The solver appears to struggle and this creates a computational bottleneck.

At Section 2.1 of the Supplementary appendix of this paper, it is stated that the number of individuals in each compartment is scaled by the total population of the area, so that the sum of all compartments is 1. For reference, I have successfully replicated the analysis for Spain (period of data collection = 46 days) under the same MCMC setting, so I know what to expect in terms of computational time in my machine, e.g. by simply looking at the rstan message "Gradient evaluation took ... seconds"). The same implementation with RK45 in my machine appears to increase the computational cost per itaration by \approx 3 times.

When I am working with the option of scaled compartments, the computational time for gradient evaluation is at the same order of magnitude as the reference analysis, so I get rid of the computational bottleneck.

I see that Version "10_bales" of the model is the first time you switch from the RK45 to the BDF ode solver, but still working with a compartmental model that assumes homogeneous mixing of the population. Version 11 is the first time where the age-structured transmission model is used.

Questions

What is the reason for working with the scaled compartments, then reverting the ODE solutions to the original scale? Let's focus on the dummy compartment for the sake of the discussion.
What is the reason for accounting for the initial values of (S, E) compartments in the calculation of the derivatives, e.g dydt[k] = - f_inf[k] * (y[k]+init[k]); (file model16.stan, line 76)?
What is the reason for choosing the particular BDF ODE solver control parameters? Was their choice informed by the order of magnitude of the compartments? Did you ever face very small changes between consecutive time points and was this addressed by experienting with different control parameters?

All the best,
Lampros

Error in check_pars(allpars, pars) : no parameter delta

I am running this on RStudio on MacOSX 10.14.

I ran the model in RStudio and got the following error (full job log output shown below):

"Error in check_pars(allpars, pars) : no parameter delta"

Full job log:

── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0 ✓ purrr 0.3.3
✓ tibble 2.1.3 ✓ dplyr 0.8.5
✓ tidyr 1.0.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()

Attaching package: ‘lubridate’

The following object is masked from ‘package:base’:

date

Loading required package: StanHeaders
rstan (Version 2.19.3, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)

Attaching package: ‘rstan’

The following object is masked from ‘package:tidyr’:

extract

Note: As of version 1.0.0, cowplot does not change the
default ggplot2 theme anymore. To recover the previous
behavior, execute:
theme_set(theme_cowplot())

Attaching package: ‘cowplot’

The following object is masked from ‘package:lubridate’:

stamp

SAMPLING FOR MODEL 'model10' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.292054 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2920.54 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: No variance estimation is
Chain 1: performed for num_warmup < 20
Chain 1:
Chain 1: Iteration: 1 / 5 [ 20%] (Warmup)
Chain 1: Iteration: 2 / 5 [ 40%] (Warmup)
Chain 1: Iteration: 3 / 5 [ 60%] (Sampling)
Chain 1: Iteration: 4 / 5 [ 80%] (Sampling)
Chain 1: Iteration: 5 / 5 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 1.22936 seconds (Warm-up)
Chain 1: 7.12388 seconds (Sampling)
Chain 1: 8.35324 seconds (Total)
Chain 1:
Error in check_pars(allpars, pars) : no parameter delta
Calls: sourceWithProgress ... summary -> .local -> check_pars_second -> check_pars
In addition: Warning messages:
1: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
Execution halted

jriou / covid_adjusted_cfr Goto Github PK

covid_adjusted_cfr's People

Contributors

Stargazers

Watchers

Forkers

covid_adjusted_cfr's Issues

Problems to run the experiment

Wrong way to calculate age distribution

problems to run the R codes

(B) and (C) are reversed in README figure

Question on the implementation of the BDF ode solver

Error in check_pars(allpars, pars) : no parameter delta

Full job log:

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent