COVID-19 UK Historical Data

Data on testing and case numbers for coronavirus (COVID-19) in the UK is published by the government, but it is fragmented and not always provided in consistent or machine-friendly formats. Also, in many cases only the latest numbers are available so it's not possible to look at changes over time.

This site collates the historical data and provides it in an easily consumable format (CSV), in both wide and tidy data forms.

Ideally the data publishers will start doing this so this site becomes redundant.

Data files

The following CSV files are available:

  • data/covid-19-cases-uk.csv: daily counts of confirmed cases for (upper tier) local authorities in England, and health boards in Scotland and Wales. No data for Northern Ireland is currently available.
    • Note that prior to 18 March 2020 Wales data was broken down by local authority, not health board.
  • data/covid-19-totals-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK
  • data/covid-19-totals-england.csv: daily counts of tests, confirmed cases, deaths for England
  • data/covid-19-totals-northern-ireland.csv: daily counts of tests, confirmed cases, deaths for Northern Ireland
  • data/covid-19-totals-scotland.csv: daily counts of tests, confirmed cases, deaths for Scotland
  • data/covid-19-totals-wales.csv: daily counts of tests, confirmed cases, deaths for Wales
  • data/covid-19-indicators-uk.csv: daily counts of tests, confirmed cases, deaths for the whole of the UK and individual countries in the UK (England, Scotland, Wales, Northern Ireland). This is a tidy-data version of covid-19-totals-*.csv combined into one file.
  • data/daily/*.csv: daily counts, with a separate file for each date and country.

You can use these files without reading the rest of this document.
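As a rough illustration of the difference between the tidy and wide forms, the sketch below pivots tidy rows into one record per date and country. The column names (Date, Country, Indicator, Value) are an assumption based on the table keys used later in this document, and the sample numbers are made up for the example.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only: the column names are assumed and the numbers are
# invented, not taken from the published files.
SAMPLE_TIDY = """\
Date,Country,Indicator,Value
2020-03-26,Wales,ConfirmedCases,10
2020-03-26,Wales,Deaths,1
2020-03-27,Wales,ConfirmedCases,15
2020-03-27,Wales,Deaths,2
"""


def tidy_to_wide(tidy_file):
    """Pivot tidy rows into one dict of indicators per (Date, Country)."""
    wide = defaultdict(dict)
    for row in csv.DictReader(tidy_file):
        wide[(row["Date"], row["Country"])][row["Indicator"]] = int(row["Value"])
    return dict(wide)


wide = tidy_to_wide(io.StringIO(SAMPLE_TIDY))
print(wide[("2020-03-27", "Wales")])
```

To try this on the real file, pass `open("data/covid-19-indicators-uk.csv", newline="")` instead of the in-memory sample, after checking the actual header row.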

There is an experimental Datasette instance hosting the data. This is useful for running simple SQL on the data, or exporting in JSON format. Note that there may be a lag in publishing the data to Datasette.
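The same kind of query can be run locally with Python's built-in sqlite3 module. This is a minimal sketch, assuming an `indicators` table with Date, Country, Indicator and Value columns (the Value column name is a guess); an in-memory database with made-up rows stands in for data/covid-19-uk.db.

```python
import json
import sqlite3


def query_as_json(conn, sql, params=()):
    """Run a SQL query and return the rows as a JSON array of objects."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(sql, params).fetchall()
    return json.dumps([dict(row) for row in rows])


# Stand-in for data/covid-19-uk.db; schema and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators (Date TEXT, Country TEXT, Indicator TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        ("2020-03-26", "Scotland", "Deaths", 1),
        ("2020-03-27", "Scotland", "Deaths", 3),
    ],
)

result = query_as_json(
    conn,
    "SELECT Date, Value FROM indicators "
    "WHERE Country = ? AND Indicator = ? ORDER BY Date",
    ("Scotland", "Deaths"),
)
print(result)
```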

News

  • 27 March 2020. UK daily indicators now include number of deaths for UK, England, Scotland, Wales, and Northern Ireland.
  • 25 March 2020. The reporting period for number of deaths changed. Previously it was for the 24 hour period starting and ending at 9am. The new period starts and ends at 5pm, and is reported the following afternoon at 2pm. (So the number of deaths reported on 25 March (cumulative total 463) represents the period 9am to 5pm on 24 March.) The testing and case numbers continue to be the 9am period.
  • 24 March 2020. Northern Ireland's Public Health Agency (PHA) started producing a Daily COVID-19 Surveillance Bulletin in PDF form. It contains test numbers (also broken down by Health and Social Care Trust), and case numbers but only on a choropleth map (and broken down by age and gender).
  • 21 March 2020. PHW is back to health board (not LA) breakdowns again, this time it looks permanent.
  • 20 March 2020. PHW is providing LA area breakdowns again, after not doing so for two days.
  • 18 March 2020. PHW is no longer providing LA area breakdowns. "Novel Coronavirus (COVID-19) is now circulating in every part of Wales. For this reason, we will not be reporting cases by local authority area from today. From tomorrow, we will update daily at 12 noon the case numbers by health board of residence."

Wishlist

Here are my suggestions for how to improve the data being published by public bodies.

The short version: publish everything in CSV format, and include historical data!

Department of Health and Social Care, and Public Health England

  1. Publish historical data, not just the current day's data.
  2. Add a column for number of recovered patients to the daily indicators. (It is published on the dashboard, but nowhere else.)
  3. Publish deaths by hospital every day.

Public Health Wales

  1. Publish the number of tests being performed every day.
  2. Publish daily totals (tests, confirmed cases, deaths) in machine readable form (CSV). Or failing that, at least in a consistent format on a web page.
  3. Publish confirmed cases by local authority/health board in machine readable form (CSV).
  4. Publish historical data, not just the current day's data.
  5. Publish deaths by hospital every day.

Public Health Scotland

  1. Publish daily totals (tests, confirmed cases, deaths) in machine readable form (CSV).
  2. Publish confirmed cases by local authority/health board in machine readable form (CSV).
  3. Publish historical data, not just the current day's data.
  4. Publish deaths by hospital every day.

Public Health Northern Ireland

  1. Publish daily totals (tests, confirmed cases, deaths) in machine readable form (CSV).
  2. Publish confirmed cases by local authority/health board in machine readable form (CSV). These are not currently being published, so it would be good to be able to get these figures, even if just on a web page.
  3. Publish historical data, not just the current day's data.
  4. Publish deaths by hospital every day.

Data sources and the collation process

Much of the collation process is manual, but there are a few command line tools to help process the data into its final form. The data sources change from day to day, which means the process is constantly changing too.

Raw data is archived under data/raw. It should never be edited.
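One way to enforce the "never edited" rule in code is to open archive files in exclusive-create mode, so that a second write to the same date-stamped name fails instead of overwriting. This is only a sketch: the `archive_raw` helper is hypothetical and not part of the tools in this repository.

```python
import tempfile
from datetime import date
from pathlib import Path


def archive_raw(raw_dir, stem, day, suffix, content):
    """Write content to <raw_dir>/<stem>-<YYYY-MM-DD><suffix>, refusing to overwrite."""
    path = Path(raw_dir) / f"{stem}-{day.isoformat()}{suffix}"
    with open(path, "xb") as f:  # "x" raises FileExistsError if the file exists
        f.write(content)
    return path


# A temporary directory stands in for data/raw.
with tempfile.TemporaryDirectory() as raw_dir:
    first = archive_raw(raw_dir, "DailyIndicators", date(2020, 3, 27), ".xlsx", b"original")
    try:
        archive_raw(raw_dir, "DailyIndicators", date(2020, 3, 27), ".xlsx", b"edited")
        overwritten = True
    except FileExistsError:
        overwritten = False
    archived_name = first.name
    archived_bytes = first.read_bytes()

print(archived_name, overwritten)
```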

UK

England

  • The number of tests is not published
  • The number of confirmed cases is published in the daily indicators at 6pm in XLSX format
  • The number of deaths is not published
  • The number of confirmed cases by local authority is published in the UTLA cases table at 6pm in CSV format
    • Note that prior to 11 March 2020 case numbers were published in HTML format.

Scotland

Wales

  • The number of tests is not published
  • The numbers of confirmed cases and deaths, and of confirmed cases by local authority, are published at https://covid19-phwstatement.nhs.wales/ at midday in HTML format
  • The number of confirmed cases by local authority is published in the UTLA cases table
    • Note that prior to 11 March 2020 case numbers were published in HTML format.
  • Twitter updates: @PublicHealthW

Northern Ireland

Note that the daily indicators include confirmed cases for all countries.

By URL

| URL | What | When | Format | Archived? |
|---|---|---|---|---|
| https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public | UK tests, UK confirmed cases | 2pm | HTML | Yes |
| https://www.arcgis.com/sharing/rest/content/items/bc8ee90225644ef7a6f4dd1b13ea1d67/data | UK tests, England/Scotland/Wales/NI confirmed cases, UK deaths ("daily indicators") | 6pm | XLSX | No |
| https://www.arcgis.com/sharing/rest/content/items/b684319181f94875a6879bbc833ca3a6/data | England confirmed cases by local authority ("UTLA cases table") | 6pm | CSV | No |
| https://www.arcgis.com/sharing/rest/content/items/ca796627a2294c51926865748c4a56e8/data | England confirmed cases by NHS region ("NHS regional cases table") | 6pm | CSV | No |
| https://www.gov.scot/coronavirus-covid-19/ | Scotland tests, confirmed cases, deaths, confirmed cases by local authority | 2pm | HTML | Yes |
| https://covid19-phwstatement.nhs.wales/ | Wales confirmed cases, deaths, confirmed cases by local authority | midday | HTML | Yes |
| https://www.health-ni.gov.uk/news/ | Northern Ireland tests, confirmed cases, deaths | 2pm | HTML | No |

Note that the arcgis.com links are direct links to the data.

Local Authority and Health Board metadata

Related projects

Tools

There are command line tools for downloading, parsing, and processing the data. They rely on Python 3.

To install the tools, create a virtual environment, activate it, then install the required packages:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Daily workflow

A sqlite DB is now used to store and aggregate intermediate data. The CSV files remain the point of record.

The crawl tool checks whether the resource (web page or data file) has already been downloaded. If it hasn't, the tool downloads it if it is available for the specified date (today), and exits if it is not. Once the resource is available, the tool extracts the relevant information from it and updates the sqlite database. This means that you can just keep running crawl until it finds new updates.
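The control flow described above can be sketched roughly as follows. This is not the actual crawl tool: the `crawl_once` function, the file naming, the table layout, and the injected `fetch` and `extract` callables are all assumptions made for illustration. The sketch also assumes that an already-downloaded resource is simply re-extracted, which is safe here because the insert replaces rows with the same key.

```python
import sqlite3
import tempfile
from pathlib import Path


def crawl_once(day, raw_dir, conn, fetch, extract):
    """Download the resource for `day` unless archived, then store its values.

    fetch(day) returns bytes, or None if nothing has been published yet.
    extract(raw_bytes) returns (indicator, value) pairs.
    Returns True if the database was updated, False if nothing was available.
    """
    raw_path = Path(raw_dir) / f"resource-{day}.html"  # hypothetical file name
    if not raw_path.exists():
        content = fetch(day)
        if content is None:
            return False  # not published yet; run again later
        raw_path.write_bytes(content)
    for indicator, value in extract(raw_path.read_bytes()):
        conn.execute(
            "INSERT OR REPLACE INTO indicators VALUES (?, ?, ?)",
            (day, indicator, value),
        )
    conn.commit()
    return True


# Toy stand-ins for a real download and a real HTML parser.
def fetch_nothing(day):
    return None


def fetch_page(day):
    return b"Tests: 5"


def extract_tests(raw_bytes):
    label, number = raw_bytes.decode().split(": ")
    return [(label, int(number))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators (Date TEXT, Indicator TEXT, Value INTEGER, "
    "PRIMARY KEY (Date, Indicator))"
)
with tempfile.TemporaryDirectory() as raw_dir:
    not_yet = crawl_once("2020-03-27", raw_dir, conn, fetch_nothing, extract_tests)
    updated = crawl_once("2020-03-27", raw_dir, conn, fetch_page, extract_tests)
    repeated = crawl_once("2020-03-27", raw_dir, conn, fetch_nothing, extract_tests)
rows = conn.execute("SELECT Date, Indicator, Value FROM indicators").fetchall()
print(not_yet, updated, repeated, rows)
```

Because the archived file is checked first, the third call never needs to fetch again and leaves the database unchanged.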

The convert_sqlite_to_csvs tool will extract the data from sqlite and update the CSV files.
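A minimal sketch of what exporting a table to CSV can look like with the standard library is shown below. It is not the convert_sqlite_to_csvs implementation; the `table_to_csv` helper, the column names, and the sort order are assumptions.

```python
import csv
import io
import sqlite3


def table_to_csv(conn, table, order_by, out):
    """Write every row of `table` to `out` as CSV, header first.

    `table` and `order_by` are interpolated into the SQL, so they must come
    from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


# In-memory stand-in for data/covid-19-uk.db with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicators (Date TEXT, Country TEXT, Indicator TEXT, Value INTEGER)"
)
conn.executemany(
    "INSERT INTO indicators VALUES (?, ?, ?, ?)",
    [
        ("2020-03-27", "Wales", "Tests", 2),
        ("2020-03-26", "Wales", "Tests", 1),
    ],
)

buffer = io.StringIO()
table_to_csv(conn, "indicators", "Date, Country, Indicator", buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Sorting on the key columns keeps the exported file stable from run to run, which makes diffs of the CSV files easier to review.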

./tools/update.sh Wales
./tools/update.sh Scotland
./tools/update.sh 'Northern Ireland'
./tools/update.sh UK
./tools/update.sh UK-daily-indicators
./tools/update.sh England
DATE=$(date +'%Y-%m-%d')
curl -L https://www.arcgis.com/sharing/rest/content/items/ca796627a2294c51926865748c4a56e8/data -o data/raw/NHSR_Cases_table-$DATE.csv

Check data consistency

./tools/check_indicators.py
./tools/check_totals.py
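What the check scripts verify is not described here. As one example of a consistency check that could apply to cumulative totals, the sketch below flags any column whose value drops from one date to the next. The `find_decreases` helper, the column names, and the sample numbers are assumptions, and real figures are occasionally revised downwards, so a hit is a prompt to look closer rather than proof of an error.

```python
import csv
import io

# Made-up sample in an assumed wide layout; blank cells mean "not reported".
SAMPLE_TOTALS = """\
Date,Tests,ConfirmedCases,Deaths
2020-03-25,100,10,1
2020-03-26,,12,2
2020-03-27,90,15,2
"""


def find_decreases(totals_file, columns):
    """Return (date, column, previous, current) for every drop in a cumulative column.

    Rows are assumed to be sorted by date; blank cells are skipped.
    """
    problems = []
    last_seen = {}
    for row in csv.DictReader(totals_file):
        for column in columns:
            cell = (row[column] or "").strip()
            if not cell:
                continue
            value = int(cell)
            if column in last_seen and value < last_seen[column]:
                problems.append((row["Date"], column, last_seen[column], value))
            last_seen[column] = value
    return problems


problems = find_decreases(
    io.StringIO(SAMPLE_TOTALS), ["Tests", "ConfirmedCases", "Deaths"]
)
print(problems)
```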

Update the Datasette instance: open https://glitch.com/edit/#!/covid-19-uk-data, then click on Tools > Terminal and run:

curl https://raw.githubusercontent.com/tomwhite/covid-19-uk-data/master/data/covid-19-uk.db -o data/covid-19-uk.db

Check: https://covid-19-uk-data.glitch.me/

Manual overrides

Sometimes it's necessary to fix data by hand. In this case the following tools are useful:

Repopulate the sqlite database from the CSV files:

rm data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t indicators -pk Date -pk Country -pk Indicator data/covid-19-indicators-uk.csv data/covid-19-uk.db
csvs-to-sqlite --replace-tables -t cases -pk Date -pk Country -pk AreaCode -pk Area data/covid-19-cases-uk.csv data/covid-19-uk.db
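For readers without csvs-to-sqlite installed, roughly the same result can be approximated with the standard library. This sketch is an assumption-laden stand-in, not a replacement: the `load_csv` helper is hypothetical, the Value column name is a guess, and unlike csvs-to-sqlite it does not infer column types, so every value is stored as text.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for data/covid-19-indicators-uk.csv.
SAMPLE_CSV = """\
Date,Country,Indicator,Value
2020-03-26,Wales,Deaths,1
2020-03-27,Wales,Deaths,3
"""


def load_csv(conn, table, csv_file, primary_key):
    """Replace `table` with the contents of `csv_file`, using a composite primary key."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames
    column_defs = ", ".join(f'"{name}"' for name in columns)
    key_defs = ", ".join(f'"{name}"' for name in primary_key)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({column_defs}, PRIMARY KEY ({key_defs}))')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        ([row[name] for name in columns] for row in reader),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # use data/covid-19-uk.db for the real thing
load_csv(conn, "indicators", io.StringIO(SAMPLE_CSV), ["Date", "Country", "Indicator"])

row_count = conn.execute("SELECT COUNT(*) FROM indicators").fetchone()[0]
# PRAGMA table_info reports a non-zero pk position for primary key columns.
pk_columns = [
    info[1] for info in conn.execute('PRAGMA table_info("indicators")') if info[5] > 0
]
print(row_count, pk_columns)
```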

Daily workflow (obsolete)

England (2pm, with area totals an hour or two later):

Make commands

  1. make england-all: Runs all of the UA Daily and Totals commands listed below in a single master command

UA Daily

  1. make england-ua-dailies: Runs all of the commands below
  2. make england-ua-dailies-download: Download the daily UAs
  3. make england-ua-dailies-generate: Generate the daily UAs (requires make england-ua-dailies-download to be run first)

Totals

  1. make england-totals: Runs all of the commands below
  2. make england-totals-download: Download a temp HTML file containing the totals
  3. make england-totals-generate: Generate the totals from the temp HTML file (requires make england-totals-download to be run first). Appends to ./data/covid-19-totals-uk.csv if the temp HTML file contains today's date
  4. make england-totals-cleanup: Remove the temp HTML file (requires make england-totals-download to be run first)

Manually running scripts

Wales (11am)

DATE=$(date +'%Y-%m-%d')
curl -L https://covid19-phwstatement.nhs.wales/ -o data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html
./tools/gen_daily_areas_wales.py data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html data/daily/covid-19-cases-$DATE-wales.csv
# Edit data/covid-19-totals-wales.csv (only have test numbers on Thursdays, leave column blank on other days)
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-wales-$DATE.html

Scotland (2pm)

DATE=$(date +'%Y-%m-%d')
curl -L https://www.gov.scot/coronavirus-covid-19/ -o data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html
./tools/gen_daily_areas_scotland.py data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html data/daily/covid-19-cases-$DATE-scotland.csv
# Edit data/covid-19-totals-scotland.csv with output from running the following (double check numbers)
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-scotland-$DATE.html

England (2pm):

DATE=$(date +'%Y-%m-%d')
# Edit data/covid-19-totals-uk.csv with output from running the following (double check numbers)
curl -L https://www.gov.uk/guidance/coronavirus-covid-19-information-for-the-public -o data/raw/coronavirus-covid-19-number-of-cases-in-uk-$DATE.html
./tools/extract_totals.py data/raw/coronavirus-covid-19-number-of-cases-in-uk-$DATE.html

England (6pm):

DATE=$(date +'%Y-%m-%d')
curl -L https://www.arcgis.com/sharing/rest/content/items/b684319181f94875a6879bbc833ca3a6/data -o data/raw/CountyUAs_cases_table-$DATE.csv
curl -L https://www.arcgis.com/sharing/rest/content/items/ca796627a2294c51926865748c4a56e8/data -o data/raw/NHSR_Cases_table-$DATE.csv
./tools/gen_daily_areas_england.py data/raw/CountyUAs_cases_table-$DATE.csv data/daily/covid-19-cases-$DATE-england.csv
# Edit data/covid-19-totals-uk.csv with output from running the following (double check numbers)
# Also edit data/covid-19-indicators-uk.csv
curl -L https://www.arcgis.com/sharing/rest/content/items/bc8ee90225644ef7a6f4dd1b13ea1d67/data -o data/raw/DailyIndicators-$DATE.xlsx
./tools/extract_indicators.py data/raw/DailyIndicators-$DATE.xlsx

Northern Ireland (2pm)

Northern Ireland (evening)

This is often no longer needed since the numbers come from the daily indicators

open https://www.publichealth.hscni.net/news/covid-19-coronavirus#situation-in-northern-ireland
# Edit data/covid-19-totals-northern-ireland.csv with output from running the following (double check numbers)
curl -L https://www.publichealth.hscni.net/news/covid-19-coronavirus -o ni-tmp.html
./tools/extract_totals.py ni-tmp.html

Consolidate and check

./tools/consolidate_daily_areas.py
./tools/convert_totals_to_indicators.py
./tools/check_indicators.py
./tools/check_totals.py
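The conversion from wide totals to tidy indicators in the steps above amounts to emitting one row per non-blank cell. The sketch below shows the idea only; the `totals_to_indicators` function and the column names are assumptions rather than the behaviour of convert_totals_to_indicators.py, and the sample numbers are made up.

```python
import csv
import io

# Made-up sample in an assumed wide layout; the blank Tests cell is skipped.
SAMPLE_TOTALS = """\
Date,Tests,ConfirmedCases,Deaths
2020-03-26,,12,2
2020-03-27,90,15,2
"""


def totals_to_indicators(totals_file, country):
    """Yield one tidy row per non-blank cell of a wide totals file."""
    for row in csv.DictReader(totals_file):
        for indicator, cell in row.items():
            if indicator == "Date" or not (cell or "").strip():
                continue
            yield {
                "Date": row["Date"],
                "Country": country,
                "Indicator": indicator,
                "Value": int(cell),
            }


tidy_rows = list(totals_to_indicators(io.StringIO(SAMPLE_TOTALS), "Wales"))
for tidy_row in tidy_rows:
    print(tidy_row)
```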

Contributors

tomwhite, desholmes
