Install the stable version of the package using {drat}:
install.packages("drat")
drat:::add("epiforecasts")
install.packages("covidregionaldata")
Install the development version of the package with:
remotes::install_github("epiforecasts/covidregionaldata")
The function which returns sub-national level data by country is covidregionaldata::get_regional_data().
This function takes 3 arguments:
- country - the English name of the country of interest. Not case sensitive.
- totals (optional, default is FALSE) - a Boolean (TRUE/FALSE), denoting whether the data returned should be a table of total counts (one row per region) or time series data (one row per region/date combination).
- include_level_2_regions (optional, default is FALSE) - a Boolean (TRUE/FALSE), denoting whether the data returned should be stratified by admin level 1 region (usually the largest subregion available) or admin level 2 region (usually the second largest).
For example:
covidregionaldata::get_regional_data("Belgium")
This returns a dataset with the following structure
date | region | iso_code | cases_new | cases_total | deaths_new | deaths_total | recovered_new | recovered_total | hosp_new | hosp_total | tested_new | tested_total |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-05-24 | Wallonia | BE-WAL | 24 | 18196 | 16 | 3251 | NA | NA | 8 | 5126 | NA | NA |
2020-05-25 | Brussels | BE-BRU | 26 | 5838 | 2 | 1421 | NA | NA | 6 | 2533 | NA | NA |
2020-05-25 | Flanders | BE-VLG | 183 | 32381 | 14 | 4681 | NA | NA | 29 | 9334 | NA | NA |
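As a rough sketch of working with the time series output, the snippet below rebuilds the three example rows above by hand, so it runs without the package or network access, then filters one region and sums new cases across regions per date. With real data the `belgium` object would come from get_regional_data("Belgium") instead; the object names are illustrative, and only a subset of the columns is reproduced.

```r
# Hand-built stand-in for get_regional_data("Belgium"), copied from the
# example rows above (subset of columns only).
belgium <- data.frame(
  date = as.Date(c("2020-05-24", "2020-05-25", "2020-05-25")),
  region = c("Wallonia", "Brussels", "Flanders"),
  iso_code = c("BE-WAL", "BE-BRU", "BE-VLG"),
  cases_new = c(24, 26, 183),
  cases_total = c(18196, 5838, 32381),
  stringsAsFactors = FALSE
)

# Keep a single region and order its rows by date
flanders <- belgium[belgium$region == "Flanders", ]
flanders <- flanders[order(flanders$date), ]

# Sum new cases across all regions for each date
national_new <- aggregate(cases_new ~ date, data = belgium, FUN = sum)
print(national_new)
```

Because only three rows are reproduced here, the per-date sums are partial and are not national figures.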
For totals data, use the totals argument.
covidregionaldata::get_regional_data("Belgium", totals = TRUE)
This returns a dataset with the following structure
region | iso_code | cases_total | deaths_total | recovered_total | hosp_total | tested_total |
---|---|---|---|---|---|---|
Flanders | BE-VLG | 34195 | 4878 | 0 | 9694 | 0 |
Wallonia | BE-WAL | 19093 | 3362 | 0 | 5321 | 0 |
Brussels | BE-BRU | 6229 | 1482 | 0 | 2657 | 0 |
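In the totals table above, recovered_total and tested_total are 0 for every region, which is how totals data represents fields missing from the source. One possible way to handle this, sketched below on a hand-built copy of the example rows, is to mark columns that are zero for every region as NA before ranking regions. Treating an all-zero column as "not reported" is an assumption of this sketch and not behaviour of the package.

```r
# Hand-built stand-in for get_regional_data("Belgium", totals = TRUE),
# copied from the example rows above.
totals <- data.frame(
  region = c("Flanders", "Wallonia", "Brussels"),
  iso_code = c("BE-VLG", "BE-WAL", "BE-BRU"),
  cases_total = c(34195, 19093, 6229),
  deaths_total = c(4878, 3362, 1482),
  recovered_total = c(0, 0, 0),
  hosp_total = c(9694, 5321, 2657),
  tested_total = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Assumption: a *_total column that is zero for every region was not reported
count_cols <- grep("_total$", names(totals), value = TRUE)
all_zero <- vapply(totals[count_cols], function(x) all(x == 0), logical(1))
totals[count_cols[all_zero]] <- NA

# Rank regions by total cases, largest first
ranked <- totals[order(-totals$cases_total), ]
print(ranked)
```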
All countries have data for regions at the admin-1 level, usually the largest regions available (e.g. state in the USA). Some countries also have data for regions at the admin-2 level (e.g. county in the USA). To request data stratified by level 2 regions instead of level 1, set the include_level_2_regions argument to TRUE, as described above. The returned dataset also includes the corresponding level 1 region and its code.
For an example of requesting Level 2 regions:
covidregionaldata::get_regional_data("Belgium", include_level_2_regions = TRUE)
This returns a dataset with the following structure
date | province | level_2_region_code | region | iso_code | cases_new | cases_total | deaths_new | deaths_total | recovered_new | recovered_total | hosp_new | hosp_total | tested_new | tested_total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-05-24 | Brussels | BE-BRU | Brussels | BE-BRU | 7 | 5812 | NA | NA | NA | NA | 4 | 2527 | NA | NA |
2020-05-24 | Antwerpen | BE-VAN | Flanders | BE-VLG | 16 | 7905 | NA | NA | NA | NA | 5 | 2510 | NA | NA |
2020-05-24 | Limburg | BE-VLI | Flanders | BE-VLG | 14 | 6126 | NA | NA | NA | NA | 2 | 1848 | NA | NA |
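Because each level 2 row carries its level 1 region and code, level 2 counts can be rolled up to level 1. The sketch below does this with base R on a hand-built copy of the three example rows above; the object names are illustrative, and since only three provinces are reproduced, the Flanders sum is partial.

```r
# Hand-built stand-in for
# get_regional_data("Belgium", include_level_2_regions = TRUE),
# copied from the example rows above (subset of columns only).
level2 <- data.frame(
  date = as.Date(rep("2020-05-24", 3)),
  province = c("Brussels", "Antwerpen", "Limburg"),
  level_2_region_code = c("BE-BRU", "BE-VAN", "BE-VLI"),
  region = c("Brussels", "Flanders", "Flanders"),
  iso_code = c("BE-BRU", "BE-VLG", "BE-VLG"),
  cases_new = c(7, 16, 14),
  hosp_new = c(4, 5, 2),
  stringsAsFactors = FALSE
)

# Sum the new counts over provinces within each date and level 1 region
by_region <- aggregate(
  cbind(cases_new, hosp_new) ~ date + region + iso_code,
  data = level2,
  FUN = sum
)
print(by_region)
```

Note that the formula interface of aggregate() drops rows containing NA in the summed columns by default, so columns with missing values may need separate handling.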
The possible data columns that will be returned by get_regional_data() are listed below.
Note that date is not included if totals is TRUE, and level 2 region/level 2 region code are not included if include_level_2_regions is FALSE.
The columns returned for each country will always be the same for standardisation reasons, though if the corresponding data was missing from the original source then that data field will be all NA values (or 0 if accessing totals data). Some rows may also be NA in all *_new data cells if the data for that date was missing from the source.
- date: the date that the counts were reported (YYYY-MM-DD).
- level 1 region: the level 1 region. This column will be named differently for different countries (e.g. state, province).
- level 1 region code: a standard code for the level 1 region. The column will be named differently for different countries (e.g. iso_3166_2, ons).
- level 2 region: the level 2 region. This column will be named differently for different countries (e.g. city, county).
- level 2 region code: a standard code for the level 2 region. The column will be named differently for different countries (e.g. iso_3166_2, fips).
- cases_new: new reported cases for that day
- cases_total: total reported cases up to and including that day
- deaths_new: new reported deaths for that day
- deaths_total: total reported deaths up to and including that day
- recovered_new: new reported recoveries for that day
- recovered_total: total reported recoveries up to and including that day
- hosp_new: new reported hospitalisations for that day
- hosp_total: total reported hospitalisations up to and including that day (note this is the cumulative total of new reported hospitalisations, not the total currently in hospital)
- tested_new: tests for that day
- tested_total: total tests completed up to and including that day
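The relationship between the *_new and *_total columns can be checked on the example rows shown earlier. Brussels appears in both example tables: the level 2 table gives its values for 2020-05-24 and the level 1 table gives its values for 2020-05-25. The sketch below, which assumes those two rows describe the same area, compares the day-on-day change in each cumulative column with the new count reported for the later date.

```r
# Brussels rows copied from the two example tables above
brussels <- data.frame(
  date = as.Date(c("2020-05-24", "2020-05-25")),
  cases_new = c(7, 26),
  cases_total = c(5812, 5838),
  hosp_new = c(4, 6),
  hosp_total = c(2527, 2533)
)

# Day-on-day change in the cumulative columns
cases_diff <- diff(brussels$cases_total)
hosp_diff <- diff(brussels$hosp_total)

# Compare with the *_new values reported for the later date
cases_match <- cases_diff == brussels$cases_new[-1]
hosp_match <- hosp_diff == brussels$hosp_new[-1]
print(c(cases = cases_match, hosp = hosp_match))
```

On real data the same comparison would return NA wherever a value is missing from the source, so it is a sanity check and not a guarantee.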
Currently we include functions for sub-national data in the following countries (* indicates data for level 2 regions as well):
Europe
- Belgium (*)
- Germany (*)
- Italy
- Russia
- UK (*)
Americas
- Brazil (*)
- Canada
- Colombia
- USA (*)
Asia
- Afghanistan
- India
Worldwide data is also included in the package to aid analysis. There are three sources of worldwide, country-level data on cases and deaths.
- Extract total global cases and deaths by country, and specify source, using:
covidregionaldata::get_total_cases(source = c("WHO", "ECDC"))
- Extract daily international case and death counts compiled by the WHO using:
covidregionaldata::get_who_cases(country = NULL, daily = TRUE)
- Extract daily international case and death counts compiled by ECDC using:
covidregionaldata::get_ecdc_cases()
A further function for worldwide data extracts non-pharmaceutical interventions by country:
covidregionaldata::get_interventions_data()
And anonymised international patient linelist data can be imported and cleaned with:
covidregionaldata::get_linelist()
Developers who wish to contribute should read the System Maintenance Guide (SMG.md).