jriou / covid_adjusted_cfr Goto Github PK
View Code? Open in Web Editor NEWLicense: MIT License
License: MIT License
Hi @jriou,
Congratulations on your work.
I try to run the experiment, but I have several problems to do it.
Some questions:
data_S_model13.R
, or I can generate it?Please, note I don't have any experience with the R language.
Best,
Dario
Hi @jriou,
I think there is an error calculating age distribution in your R experiment:
> age_dist = read_excel("data/age_structure.xlsx") %>%
+ filter(country=="Italy") %>%
+ gather("age","n",3:23) %>%
+ mutate(n=as.numeric(n),age2=c(0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80)) %>%
+ group_by(age2) %>%
+ summarise(n=sum(n)) %>%
+ pull(n)
> age_dist = age_dist/sum(age_dist)
> pop_t = sum(c(10.04e6,4.453e6,4.905e6,1.522e6,4.356e6))
> age_dist
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017
> print(age_dist)
[1] 0.08595641 0.09456890 0.10178792 0.11909651 0.15515993 0.15356926 0.12112699 0.09653392 0.07220017
The problem is related to the format on data/age_structure.xlsx
file. If you take a look to the xlsx file:
Italy | 60627.291 | 2442.494 | 2768.81 | 2869.745 | 2863.711 | 2973.472 | 3197.654 | 3417.097 | 3803.402 | 4469.335 | 4937.591 | 4879.297 | 4431.191 | 3779.643 | 3563.958 | 3213.162 | 2639.428 | 2220.209 | 1327.127 | 631.952 | 181.855 | 16.158 |
---|
I think you need to remove the dot of the values on the dataset to get the real age distribution results, because in R, as.numeric
doesn't transform these numbers to integers:
> as.numeric(2442.494)
[1] 2442.494
I can share my Python code to do the same:
First version, with the line commented to get the same results:
>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> # df = df.apply(lambda x : x.astype(str).str.replace(r'\.', '')) # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.08594949 0.09456093 0.10178663 0.1191085 0.15517099 0.15358728
0.12112114 0.09654057 0.07217447]
Second version, using the line to delete the dot and get the real age distribution:
>>> import os
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.read_excel(os.path.join("data", "age_structure.xlsx"))
>>> df['country'] = df['country'].str.strip()
>>> df = df[df['country'] == 'Italy']
>>> df = df.apply(lambda x : x.astype(str).str.replace(r'\.', '')) # Posible bug in R experiment
>>> df.drop(['country', 'all_classes'], axis=1, inplace=True)
>>> df = df.T
>>> df.rename(columns={df.columns[0]: "values"}, inplace=True)
>>> df['values'] = df['values'].astype(int)
>>> df['age'] = np.array([0,0,10,10,20,20,30,30,40,40,50,50,60,60,70,70,80,80,80,80,80])
>>> age_dist = df.groupby("age").sum()['values'].values
>>> age_dist = age_dist/sum(age_dist)
>>> print(age_dist)
[0.04677661 0.09862252 0.10615099 0.1242015 0.16181074 0.16015189
0.126319 0.10067177 0.07529498]
You can find the error here:
Let me know if you agree.
Best,
Dario
Hi,
I read through the manuscript which is truly fascinating. However, while running the code, I faced with some difficulties. Could anyone please guide me how to start and reproduce the results here.
I tried running the code of "output_model13_italy.R", but I do not have the "model13IT_2020-03-18.Rdata" file in my directory.
Any kind of help will be much appreciated.
Thanks.
Moumita.
Hi there,
Just noticed that caption of the figure in README.md, has (B) and (C) be reversed compared to the paper.
From README.md:
Figure. (A) Reported confirmed cases of COVID-19 in the Hubei province by date of disease onset (blue) and reported deaths (red) from 8 December 2019 to 11 February 2020. (B) Age distribution of the Chinese population compared to that of confirmed cases of and deaths due to COVID-19. (C) Proportion of individuals infected by COVID-19 showing symptoms among passengers of the Diamond Princess ship (with 95% credible interval).
versus the table from the paper:
Hi @jriou,
Congrats on the very nice work. This is not an issue, but rather a question on some choices made in your analysis.
Problem statement
My analysis involves an age-structured SEIR transmission model implemented in Rstan, with a time-varying effective contact rate and daily observations. The population of the country is at the order of \approx 1.4e+07. I have previously attempted to implement the BDF solver assuming that the number of individuals in each compartment is at the original scale. The solver appears to struggle and this creates a computational bottleneck.
At Section 2.1 of the Supplementary appendix of this paper, it is stated that the number of individuals in each compartment is scaled by the total population of the area, so that the sum of all compartments is 1. For reference, I have successfully replicated the analysis for Spain (period of data collection = 46 days) under the same MCMC setting, so I know what to expect in terms of computational time in my machine, e.g. by simply looking at the rstan message "Gradient evaluation took ... seconds"). The same implementation with RK45 in my machine appears to increase the computational cost per itaration by \approx 3 times.
When I am working with the option of scaled compartments, the computational time for gradient evaluation is at the same order of magnitude as the reference analysis, so I get rid of the computational bottleneck.
I see that Version "10_bales" of the model is the first time you switch from the RK45 to the BDF ode solver, but still working with a compartmental model that assumes homogeneous mixing of the population. Version 11 is the first time where the age-structured transmission model is used.
Questions
dydt[k] = - f_inf[k] * (y[k]+init[k]);
(file model16.stan, line 76)?All the best,
Lampros
I am running this on RStudio on MacOSX 10.14.
I ran the model in RStudio and got the following error (full job log output shown below):
"Error in check_pars(allpars, pars) : no parameter delta"
── Attaching packages ────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
✓ ggplot2 3.3.0 ✓ purrr 0.3.3
✓ tibble 2.1.3 ✓ dplyr 0.8.5
✓ tidyr 1.0.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
── Conflicts ───────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
Attaching package: ‘lubridate’
The following object is masked from ‘package:base’:
date
Loading required package: StanHeaders
rstan (Version 2.19.3, GitRev: 2e1f913d3ca3)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
Attaching package: ‘rstan’
The following object is masked from ‘package:tidyr’:
extract
Note: As of version 1.0.0, cowplot does not change the
default ggplot2 theme anymore. To recover the previous
behavior, execute:
theme_set(theme_cowplot())
Attaching package: ‘cowplot’
The following object is masked from ‘package:lubridate’:
stamp
SAMPLING FOR MODEL 'model10' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.292054 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2920.54 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: WARNING: No variance estimation is
Chain 1: performed for num_warmup < 20
Chain 1:
Chain 1: Iteration: 1 / 5 [ 20%] (Warmup)
Chain 1: Iteration: 2 / 5 [ 40%] (Warmup)
Chain 1: Iteration: 3 / 5 [ 60%] (Sampling)
Chain 1: Iteration: 4 / 5 [ 80%] (Sampling)
Chain 1: Iteration: 5 / 5 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 1.22936 seconds (Warm-up)
Chain 1: 7.12388 seconds (Sampling)
Chain 1: 8.35324 seconds (Total)
Chain 1:
Error in check_pars(allpars, pars) : no parameter delta
Calls: sourceWithProgress ... summary -> .local -> check_pars_second -> check_pars
In addition: Warning messages:
1: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems
3: The largest R-hat is NA, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess
Execution halted
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.