epiforecasts / covid

Temporal variation in transmission during the COVID-19 outbreak

Home Page: https://epiforecasts.io/covid/

License: MIT License

TeX 93.33% Shell 0.43% Dockerfile 0.21% R 5.36% CSS 0.67%

covid

covid's Issues

Projections for New Zealand seem to be erroneous

I note that for a day or two the projections for New Zealand have appeared to be very high. The predicted range for the reproduction number extends up to 75, and the recorded cases are dwarfed by the predicted range.

I've attached an image to illustrate.

(Niger and Palestine may be similar.)

The doubling time plot also looks odd.

I have not personally checked the number of data points and how that might impact the computations.

Date of report

Put the date on which the report was generated somewhere at the top of each page, as well as "using data up to 2020-xx-xx" - the Rt estimates "as of..." can give the impression that it's outdated.

website update interval

The website was updated every ~2 days for a while, until about a week ago; there has been nothing since. I suggest adding a clear indication on the website of the update frequency/schedule.

Data issues

You have mixed up total cases with daily case load. The R0 plots for some states are clearly wrong. Hawaii, Montana, Alaska.

Website feedback

Methods:

spacing issues after citations (4f6be5e)
need to space out the equations more (4f6be5e)

Contributors

Decide upon final organisation of contributors

Italy

Fix NAs on Italy map. Partially fixed by changing region_codes.rds but other regions don't seem to return any result, need to check why. (2af1778)

Maps

Pick better colours to correspond to Increasing, Likely increasing etc. and sync them with all other plots (begun this in a branch)
Better explanation of why certain locations are NA, state that not enough data/cases etc

Layout

Put Figure 1 above the summary table on each region page

-> Release

To do:

rate of growth, R^2

Ref: https://epiforecasts.io/covid/posts/national/united-states/ Figure 3

First, the rate of growth should be properly defined. If the serial interval is assumed to be constant, then the rate of growth and the effective reproduction number are equal. So the rate of growth needs a proper time frame; IIRC it is the daily growth rate - in the beginning we had ~25% more infections each day.

I have not fully understood how the results of Fig. 3C on the coefficient of determination were calculated. Does it show an evaluation on A or B? The caption says "with values closer to 1 indicating a better fit", so I would doubt the whole calculation if R squared goes negative.
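As a reference point for the two questions above, here is a minimal Python sketch, not the site's actual calculation. It assumes the textbook definition of the coefficient of determination (1 - SS_res / SS_tot) and plain exponential growth; all numbers are made up for illustration. Under that definition R squared does go negative whenever a fit is worse than simply predicting the mean, so a negative value need not mean the calculation itself is broken.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def doubling_time_from_daily_growth(daily_growth):
    """Days to double if cases grow by `daily_growth` (0.25 = 25%) each day."""
    return math.log(2) / math.log(1.0 + daily_growth)


# Made-up observations and two hypothetical fits.
observed = [10.0, 12.0, 15.0, 19.0]
good_fit = [10.5, 12.0, 14.5, 19.0]
poor_fit = [19.0, 15.0, 12.0, 10.0]  # trend in the wrong direction

r2_good = r_squared(observed, good_fit)  # close to 1
r2_poor = r_squared(observed, poor_fit)  # below 0: worse than the mean

# The "~25% more infections each day" figure expressed as a doubling time.
days_to_double = doubling_time_from_daily_growth(0.25)
```

Stating the time unit alongside the growth rate, as the second function does, is one way to address the "proper time frame" point.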

some plot facets are too small to be legible

e.g., https://epiforecasts.io/covid/posts/national/united-states/

Website feedback

Overall I think this is great. Comments below are meant as helpful suggestions, not eviscerating the work that's been done.

Agree with @kathsherratt in #7 on colour. Greys are too similar, and I'm not sure that blue adequately represents that there are increasing cases. I don't think red-yellow-green is the right palette to use but colorbrewer's 3-class RdYlBu scheme is broadly interpretable as "bad", "not so bad", "good". The five-class scheme can be used for increasing, likely increasing, unsure, likely decreasing, decreasing, with grey reserved for "No data".
Equirectangular map projection is better than Mercator but you can probably ditch Antarctica and consider a separate plot for each of the World Bank's continent definitions, optionally splitting the Americas into South and "North and Central". There's too big a difference in variation in country area (both true and distorted) to just show all nations on the one map.
Figures 2 and 3 are missing, or relabelling has broken.
I assume that "likely increasing" has a median R0 > 1 but that 1 is outside the 50% and inside the 90% interval. This isn't confirmed on the page, though, and makes it a little difficult to interpret figures 1 and 4.
Using a different colour for the R0 estimates and the number of cases by date of infection would help make it clear to the reader that figures 5 (and 7) and 6 (and 8) show different things. The consistency of style is great, but the colouring can help identify that they're different. Please don't reuse the colours from the likely increasing/decreasing when doing this, though.
Caption for table 2 should go above, and might be worth putting in the note that when the doubling time estimate is "cases decreasing" that this corresponds to cases no longer doubling and hence the doubling time is effectively infinite. One way we got around this with the outbreak delay paper was to say "at least 4.5 days" rather than "4.5 - outbreak delayed".
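The assumed meaning of "likely increasing" in the comments above can be written down as a small rule. This Python sketch is only the commenter's guess extended symmetrically to the other categories; the function name and the thresholds are hypothetical and are not confirmed as the rule the site uses.

```python
def classify_trend(lower_90, lower_50, upper_50, upper_90):
    """Hypothetical classification of a reproduction number estimate.

    Arguments are the bounds of the 90% and 50% intervals.
    """
    if lower_90 > 1:
        return "increasing"          # 1 lies below the 90% interval
    if lower_50 > 1:
        return "likely increasing"   # 1 is inside the 90% but below the 50% interval
    if upper_90 < 1:
        return "decreasing"          # 1 lies above the 90% interval
    if upper_50 < 1:
        return "likely decreasing"   # 1 is inside the 90% but above the 50% interval
    return "unsure"                  # 1 is inside the 50% interval


# Made-up interval bounds for illustration.
example = classify_trend(0.9, 1.05, 1.4, 1.8)
```

Stating a rule of this kind in the figure caption would make figures 1 and 4 easier to interpret.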

Really great work.

Looks like you are missing Hawaii from the US page

Hawaii is missing. You might want dedicated pages for each state so that people can see bigger charts.

Data for interactive maps and data visualisations

I have been working on interactive maps and data visualisations consulting with @seabbs. The vis are interactive svgs written with d3.js, and packaged as html widgets for inclusion in the .Rmd document.

There is a sample vis here.

Can I request 2 files to make this more straightforward and reliable?

rt.csv is working well to generate the r0 plot for each country. Could we have a similar file for the nowcasts and a summary file for country classifications?

For nowcasts:

Proposed file columns:

country : country name - using the same country names as rt.csv
date : date
median : nowcast median
lower_90 : nowcast lower 90% CI - currently named 'bottom'
upper_90 : nowcast upper 90% CI - currently named 'top'
lower_50 : nowcast lower 50% CI - currently named 'lower'
upper_50 : nowcast upper 50% CI - currently named 'upper'
cases : number of cases on each date - all NA values set to 0

(assuming that the original columns are 50% and 90% CI's)

This would make the format of the new nowcast csv file the same as rt.csv.
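To make the proposed nowcast format concrete, here is a minimal Python sketch of the renaming step. It assumes the current file already has the columns `bottom`, `top`, `lower`, `upper` and `cases` in the order shown; the helper name and the sample rows are made up for illustration.

```python
import csv
import io

# Current column name -> proposed column name.
RENAME = {
    "bottom": "lower_90",
    "top": "upper_90",
    "lower": "lower_50",
    "upper": "upper_50",
}


def standardise_nowcast(rows):
    """Rename the interval columns and set missing case counts to 0."""
    out = []
    for row in rows:
        new_row = {RENAME.get(key, key): value for key, value in row.items()}
        if new_row.get("cases") in ("", "NA", None):
            new_row["cases"] = "0"
        out.append(new_row)
    return out


# Made-up sample in the assumed current layout.
raw = io.StringIO(
    "country,date,median,bottom,top,lower,upper,cases\n"
    "Austria,2020-04-01,120,80,170,100,140,115\n"
    "Austria,2020-04-02,110,70,160,95,130,NA\n"
)
rows = standardise_nowcast(csv.DictReader(raw))
```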

For individual country visualisations, these datasets could just be subset for each country and we could output static svgs with the same styles.

For summary map:

Could we output a summary csv file of the classification of each country.

Proposed file columns:

country : country name - using the same country names as rt.csv and the above file.
trajectory : with 6 coded values - decreasing, likely_decreasing, unsure, increasing, likely_increasing, no_data

This would allow for a quick join of the map data to each day's classification and then some styles.
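The "quick join" described above could look roughly like the following Python sketch. The function name is hypothetical, and treating a country that is absent from the summary file as `no_data` is an assumption rather than part of the proposal.

```python
TRAJECTORIES = {
    "decreasing",
    "likely_decreasing",
    "unsure",
    "increasing",
    "likely_increasing",
    "no_data",
}


def join_trajectories(map_countries, summary_rows):
    """Attach a trajectory code to every country drawn on the map."""
    lookup = {row["country"]: row["trajectory"] for row in summary_rows}
    unknown = set(lookup.values()) - TRAJECTORIES
    if unknown:
        raise ValueError(f"unexpected trajectory codes: {sorted(unknown)}")
    # Assumption: countries missing from the summary are shown as no_data.
    return {country: lookup.get(country, "no_data") for country in map_countries}


# Made-up summary rows for illustration.
summary = [
    {"country": "Austria", "trajectory": "unsure"},
    {"country": "Belgium", "trajectory": "likely_decreasing"},
]
styled = join_trajectories(["Austria", "Belgium", "Chad"], summary)
```

Because the join key is the country name, it only works if the names match those in rt.csv exactly, which is why the proposal asks for the same names in all files.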

Thanks, let me know what you think.

Highlight testing limitation

Make it very clear on all pages that estimates are impacted by changes in testing and reporting. Running out of tests is a particular issue until a new reporting equilibrium is arrived at.

trend categories: NA vs unsure vs ...

For color / labelling scheme:

is there a likely decreasing / decreasing category? if so, should show even if there aren't any examples currently
there's no apparent gradient currently from increasing -> likely increasing -> unsure, but there is conceptually; worth having a gradient in the color scheme?
the NAs need to be more distinct from other colors
the NAs could use a more useful label. Seems like there's two flavors? No reported cases, and not-yet-at-100 cases threshold?

Minor comments

Some minor and non-critical comments:

Global summary page:

General: use of word “regions” > personal preference is the word “countries” instead as regions sounds more like a group of countries (but only a preference, either works)
Figure 1: “Likely Increasing” colour grey > looks a bit too close to NA to me - could change to e.g. light blue - or to a sequential scale so that all three values are in colour order (Increasing > Likely increasing > Unsure)
Figure 4: it might be easier to read and compare if ordered by geographic region, but not a problem as is
Figure 8: to me the title reads a bit awkwardly and doesn’t intuitively match the caption (which explains really well). Going by the caption, I’d change from “Cases with date of onset on the day of report generation in all regions” to “Cases by date of report, and estimated cases by date of infection”

Methods page:

Typo in header caption: “publically” > “publicly” available data

Venezuela R=

Do you have information about the COVID-19 R0 for Venezuela?

National halving time for UK appears incompatible with regional ones?

https://epiforecasts.io/covid/posts/national/united-kingdom/ shows regional halving times all -29 or higher, but national halving time is shown as -72 (-460 – -39)
I haven't checked the calculations in the code, but that appears to suggest the two are calculated in different ways?

Brazil Data Latest Update

The page containing data for Brazil doesn't make it clear which estimates use data up to 2020-04-14 and which use data up to 2020-04-24.

https://epiforecasts.io/covid/posts/national/brazil/

Colour Palette

Multiple reviewers have flagged colour palette as an issue for the map and summary plot. In this meta-thread please battle out your colour palette choices. (I will then choose the survivor)

Choose palette
Implement palette

Selected comments

From @samclifford: Agree with @kathsherratt in #7 on colour. Greys are too similar, and I'm not sure that blue adequately represents that there are increasing cases. I don't think red-yellow-green is the right palette to use but colorbrewer's 3-class RdYlBu scheme is broadly interpretable as "bad", "not so bad", "good". The five-class scheme can be used for increasing, likely increasing, unsure, likely decreasing, decreasing, with grey reserved for "No data".

From @kathsherratt: Figure 1: “Likely Increasing” colour grey > looks a bit too close to NA to me - could change to e.g. light blue - or to a sequential scale so that all three values are in colour order (Increasing > Likely increasing > Unsure)

From @jhellewell14 : Pick better colours to correspond to Increasing, Likely increasing etc. and sync them with all other plots (begun this in a branch)

From @pearsonca: there's no apparent gradient currently from increasing -> likely increasing -> unsure, but there is conceptually; worth having a gradient in the color scheme?
the NAs need to be more distinct from other colors

A Reminder of the stakes

Mismatch in USA data sources

It looks like the ECDC and Johns Hopkins data have different case counts, leading to different numbers at the national and state scales in the USA.

Add links to epiforecasts and CMMID

Seb comments

Highlight the impact of changes in testing, testing saturation and general step changes in testing on all pages.
“new infections” - is possibly a misleading label. It’s the number of new infections that ultimately get confirmed (which means something different in every country). Perhaps call it “New cases by infection date” and in the figure labels like Fig. 6 on “Global” slightly rephrase to “Cases by date of report and their estimated date of infection”.
big reproduction number plot (global) - would this look bad if they were all on the same time scale (probably, because of China)?
Latest estimates table: I’d remove “new infections” for the reasons given above, unless we can come up with a better term

sortable tables

It would be nice if the tables had sortable headings, e.g. using https://rstudio.github.io/DT/

y on summary plot axis

plot_summary from EpiNow is showing a small y on the axis when it should not. Remove for next update.

IT region map

Two of Italy's regions are missing both in the map and the table:

Basilicata (Provinces Potenza and Matera)
Molise (Provinces Campobasso and Isernia)

Reported doubling times in table have strange intervals

At the time of writing, Table 1 on https://epiforecasts.io/covid/posts/global/ has some bizarre entries, such as Australia's doubling time of -71 (7 – -6), or Austria's of 440 (15 – -16). Belgium has -14 (-22 – -10), which indicates the issue might be the way doubling/halving times are reported back to the table when the interval contains 0.

Cameroon is 110 (15 – -21) and the estimate is not contained within the interval at all. Same with Cote d'Ivoire, 51 (9.7 – -16). Croatia is -10 (15 – -3.8). Bahrain is 200 (17 – -20). These are reflected in the national summaries, e.g. https://epiforecasts.io/covid/posts/national/cameroon/ and it's clear that there's something wrong when converting from growth rates to doubling/halving times.
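The pattern in these entries is what one would expect if each bound of a growth rate interval is converted separately with doubling time = ln(2) / rate while the interval contains 0. The Python sketch below reproduces the effect and shows one possible way to report it instead. It is an illustration under that assumption, not the `EpiNow` code; the function names and growth rates are made up.

```python
import math


def doubling_time(rate):
    """Doubling time in days for a daily exponential growth rate.

    A negative result is a halving time; a rate of 0 never doubles.
    """
    if rate == 0:
        return math.inf
    return math.log(2) / rate


def summarise(rate_lower, rate_upper):
    """Describe a growth rate interval without mixing signs."""
    if rate_lower > 0:
        # Faster growth means a shorter doubling time, so the bounds swap.
        return ("doubling", doubling_time(rate_upper), doubling_time(rate_lower))
    if rate_upper < 0:
        return ("halving", -doubling_time(rate_lower), -doubling_time(rate_upper))
    # Interval contains 0: report the fastest plausible halving and doubling.
    halving_at_least = -doubling_time(rate_lower) if rate_lower < 0 else math.inf
    doubling_at_least = doubling_time(rate_upper) if rate_upper > 0 else math.inf
    return ("uncertain", halving_at_least, doubling_at_least)


# Made-up growth rate estimate whose interval spans 0.
r_lower, r_median, r_upper = -0.05, 0.002, 0.07

# Naive element-wise conversion: point estimate lies outside the "interval".
naive = (doubling_time(r_median), doubling_time(r_upper), doubling_time(r_lower))

kind, halving_at_least, doubling_at_least = summarise(r_lower, r_upper)
```

With these numbers the naive conversion gives roughly 347 (9.9 – -14), the same shape as the Cameroon entry, whereas the second form can be read as "halving no faster than every 14 days, doubling no faster than every 10 days".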

Netherlands was dropped from the most recent analysis run ?

It is missing with the 'estimates as of the 2020-03-29' / 'Using data available up to the: 2020-04-08' run.

But was available with the 'up to 2020-04-03(?)' run.

Question: why is Hawaii excluded from all your research on COVID?

Speed up using sequential regions

Some regions/countries take much longer to be simulated than others. It might make sense to run all regions sequentially with parallelisation within each region. For example, the USA regional breakdown has one state that takes 3 times longer to run than any other; during this time all cores except one are idle.
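A rough way to see the potential gain is to compare the wall-clock time of the two schedules. This Python sketch assumes each region can be parallelised perfectly across cores, which real runs will not achieve, and uses made-up run times; the function names are hypothetical.

```python
import heapq


def makespan_one_region_per_core(costs, cores):
    """Each region runs on a single core; the next region goes to the
    core that becomes free first."""
    finish_times = [0.0] * cores
    heapq.heapify(finish_times)
    for cost in costs:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)


def makespan_sequential_regions(costs, cores, efficiency=1.0):
    """Regions run one after another, each split across all cores.

    `efficiency` (assumed, 0 to 1) scales the ideal speed-up.
    """
    return sum(cost / (cores * efficiency) for cost in costs)


# Made-up run times: seven ordinary regions and one that takes 3 times longer.
costs = [1.0] * 7 + [3.0]

across_regions = makespan_one_region_per_core(costs, cores=4)
within_regions = makespan_sequential_regions(costs, cores=4)
```

In this toy case the slow region starts last, so three of the four cores sit idle while it finishes; the sequential schedule avoids that only to the extent that the within-region work really does parallelise.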

Short term changes

Clarify what confidence is in figure captions (perhaps we can drop entirely as added lag?): in the figures showing Rt over time, "Confidence in the estimated values is indicated by shading with reduced shading" may be clearer if it talks about transparency, as "shading" may lead people to think about the confidence interval.

Infinite doubling times not being properly caught in `EpiNow`

Changes upstream mean that infinite doubling times are not being properly caught. Review and fix.

Sporadic page build failures

It looks like page builds may sporadically be failing (see #31 ) - track down, isolate and fix.

DE map for reproduction number

The map in Figure 1 https://epiforecasts.io/covid/posts/national/germany/ does not fit to the classifications given in figure 4 / table 2.

Duplicate "Figure 1" in global summary

In Covid-19: Global summary, figures are misnumbered. There are two "Figure 1" figures.

Alaska and Hawaii are not in your list of states in the detail section

On:

https://epiforecasts.io/covid/posts/national/united-states/

Alaska and Hawaii do not appear in your list of states at the bottom of the page.

Switch to right truncated from right censored

Memory leak

Hosted in the EpiNow repo: epiforecasts/EpiNow#91

Africa Countries: Kosovo classified to be In Africa

Bringing to your attention that Kosovo is not in Africa, but rather in Europe. Kindly rectify accordingly.

Nick comments

This looks great.

Comments: The ribbons showing cases by date of infection are beautiful but to me are a little tricky to interpret. I hate to say this, but would this maybe be clearer as a geom_pointrange (with a thicker line for 50% CrI?)
The title "Summary of latest reproduction number and case count estimates by date of infection" to me was confusing as the plot is not by date.
Can there be more detail on the difference between wide CrI and the measure of uncertainty that gets reflected as translucency?
Can you show distributions that go into estimating e.g. time between infection and reporting?
Can I ask about the doubling times which often have an upper bound of infinity, and for which the point estimate sometimes is not within the uncertainty range (e.g. -100 (14 – Inf) for Italy)?

Maybe extend the doubling time graph to 23 days to match avg. ICU stay?

Since the average ICU stay appears to be 23 days, it would be nice if we could see if the 50%/90% confidence interval was above this 'safe' threshold. Meaning people have left the ICU before new ones arrive.

Data for Germany

Great work, thank you for providing this!

Could you please clarify which exact data source you are using for Germany? The linked source https://github.com/jgehrcke/covid-19-germany-gae provides official data from RKI as well as data "curated" by two large German newspapers. The newspaper source is always more recent but the quality of the curation is debatable. The data plotted in your tool today looks unreliable (it seems to have a gap for the last few days and looks generally inconsistent with the official data). Maybe it would be worth considering switching to the official RKI data? It might even be an option to take the RKI data and augment it with another source for the last three days, where RKI is lagging behind. Please see screenshots attached.

Thank you

"Your Data" (gap, weird jumps):

"Official Data" (dense, looking very consistent in general except for the weekend effect):

Add reporting rate limitation to global page.

The global page is missing the reporting rate limitation statement ("These results are impacted by changes in testing effort, increases and decreases in testing effort will increase and decrease reproduction number estimates respectively (see Methods for further explanation)."). Add for the next update.

From: https://twitter.com/DogOfPoasts/status/1246869025855963136

Add citation info

The website now has a DOI () - need to add this to the website along with citation info

Czechia

Looks like old data is causing duplicates to show. Remove old version.

Report: https://twitter.com/ATabarrok/status/1249677197058678785

Medium term changes

Add a second global map that is split by continent (@samclifford).
Make NAs be informative (can be short term if someone has a speedy fix @jhellewell14 @pearsonca). Flagged this in caption, limitations, and methods for now (3f63772)
Figure interactivity
Consider fixed x-axis in comparative plots
Add two dimensions to plots with hatching and colours for certainty and scale
Split nowcast results from website to make more modular. Keep nowcast results that are out of date off GitHub in deep store to keep the git repo relatively lightweight. Do this by moving nowcasts into a new repo covid-nowcasts and using a submodule. Regular git history purges and archives to Dropbox for historic results.
Show report delay as a plot in the methods

The recent NJ new case data has test reporting artifacts, probably relating to Easter. The last four days should be averaged or windowed. Other data may also have this problem.
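One simple form of the windowing suggested above is a trailing mean. The Python sketch below uses a window of four days to match the suggestion; the function name and the case counts are made up, and this is not how the site currently processes its data.

```python
def trailing_mean(values, window=4):
    """Average each value with up to `window - 1` values before it."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up daily counts with a holiday dip followed by a catch-up spike.
daily_cases = [100, 110, 120, 40, 30, 200, 210]
smoothed_cases = trailing_mean(daily_cases, window=4)
```

A trailing window keeps the series aligned with the most recent date, at the cost of lagging behind genuine changes in the trend.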

Restrict scale on state comparison

It'd be nice if the comparative plot was limited to a maximum y axis value of 3 (similar to the other R plots). Guam has been throwing things off for the last several updates:

UK references

References on the UK sub-national page need updating to reflect new data sources, as below:

‘Coronavirus (COVID-19) UK Historical Data’. [link] White, T., 2020
‘Coronavirus (COVID-19) Cases in the UK’. [link] Public Health England, 2020

Brazilian data adm level 1

Hi,

I have been working in a Brazilian task force for COVID-19.

Would you guys be interested in adding Brazilian data at subnational level (adm 1)? I could help point out where the data is and help with translation if needed.

Cheers,
Leo

Remove file LICENSE as there is also LICENSE.md

I was slightly confused when trying to find the licence of this repo: the top-right link "View License" on https://github.com/epiforecasts/covid points to file LICENSE, which appears to say that it's normal proprietary software:

YEAR: 2020
COPYRIGHT HOLDER: Epiforecasts

However then I found LICENSE.md, which states it's open source software under the MIT license. Which is great 🙂 The redundant and incomplete / confusing file LICENSE can probably be simply deleted. Github will then automatically pick up LICENSE.md for the licensing information.

German region map

It looks like German regions are now not mapping correctly. This may be a change in region name processing or maybe a problem higher up the tool chain (NCoVUtils).
