
covid19canada's Introduction

⚠️ IMPORTANT NOTICE ⚠️

THIS DATASET HAS BEEN REPLACED WITH A NEW DATASET: CovidTimelineCanada

⚠️ Please use the new dataset from now on. The dataset in this repository will no longer be updated as of May 4, 2022.

🚨 Vaccine-related datasets have also been added to the new repository (vaccine coverage, vaccine administration) or will be added in the near future (vaccine distribution).

❗ To ease the transition to the new dataset, case and death datasets using the old column names, date format, province/territory names and health region names are being offered for download as CSV files. These files should be more-or-less drop-in replacements for the old case and death datasets. However, we encourage users to switch to the new dataset format, as this legacy format will not be supported indefinitely. Download links for the CSV files:

Epidemiological Data from the COVID-19 Outbreak in Canada

The COVID-19 Canada Open Data Working Group collects daily time series data on COVID-19 cases, deaths, recoveries, testing and vaccinations at the health region and province levels. Data are collected from publicly available sources such as government datasets and news releases. Updates are made nightly at 22:00 ET. See data_notes.txt for notes regarding the latest data update. Our data collection is mostly automated; see Covid19CanadaETL for details.

Our data dashboard is available at the following URL: https://art-bd.shinyapps.io/covid19canada/.


Accessing the data

❗ Before using our datasets, please read the Datasets section below. ⚠️

Our datasets are available in three different formats:

  • CSV format from this GitHub repository (to download all the latest data, select the green "Code" button and click "Download ZIP")
  • JSON format from our API
  • Google Drive

Note that retired datasets (retired_datasets) are only available on GitHub.

Datasets

Usage notes and caveats

The dataset in this repository was launched in March 2020 and has been maintained ever since. As a legacy dataset, it preserves many oddities in the data introduced by changes to COVID-19 reporting over time (see details below). A new, definitive COVID-19 dataset for Canada is currently being developed as CovidTimelineCanada, a part of the What Happened? COVID-19 in Canada project. While the new CovidTimelineCanada dataset is not yet stable (and thus should not be relied upon), it fixes many of the aforementioned oddities present in the legacy dataset in this repository.

  • ℹ️ See data_notes.txt for notes regarding issues affecting the dataset.
  • ℹ️ Ontario case, mortality and recovered data are retrieved from individual public health units (exceptions are listed here) and differ from values reported in the Ontario Ministry of Health dataset. For most public health units, we limit cases to confirmed cases (excluding probable cases).
  • ⚠️ Impossible values, such as negative case or death counts
    • Our dataset preserves some "impossible" values such as negative daily case or death counts. This is because our dataset reports primarily the cumulative value reported each day by the public health authority. Since historical data are sometimes revised (e.g., cases reassigned to different regions, fixing data quality issues, etc.), this sometimes results in negative values reported for a particular date.
  • ⚠️ Testing numbers are unreliable
    • For continuity, we generally report the first testing metric that each province reported: for some provinces this was the number of tests performed, for others the number of unique people tested. For calculating percent positivity, the number of tests performed should generally be used. The Public Health Agency of Canada provides a province-level time series of the number of tests performed; we supply a compatible version of this dataset in the official_datasets directory as phac_n_tests_performed_timeseries_prov.csv. This dataset should be used instead of ours for inter-provincial comparisons.
    • Additionally, some provinces have stopped directly reporting their COVID-19 testing numbers.
  • ⚠️ Recovered/active case counts are unreliable
    • The definition of "recovered" has changed over time and differs between provinces. For example, Quebec changed its definition of recovered on July 17, 2020, which created a massive spike on that date. For this reason, these data should be interpreted with caution.
    • Recovered and active case numbers for Ontario (and thus Canada) are incorrectly estimated prior to 2021-09-07 and should not be considered reliable.
    • Recovered and active case numbers for British Columbia are no longer available as of 2021-02-10. Values for this province (and thus Canada) should be discarded after this date. Several other provinces have also stopped reporting these values, including Saskatchewan, Nova Scotia and Newfoundland & Labrador.
  • ⚠️ Vaccine dose numbers are unreliable
    • Many provinces no longer report vaccine dose data as they did previously. The most reliable vaccine numbers are available weekly from the PHAC vaccine coverage map.
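Because the dataset primarily stores cumulative values, daily counts are usually obtained by differencing consecutive days, which is also where the negative values described above come from. The sketch below (plain Python, illustrative numbers only) shows that differencing, flags negative days and computes percent positivity from tests performed. The function names are hypothetical and not part of this repository; using new case counts as the positivity numerator is an assumption made for illustration.

```python
from typing import Optional


def daily_from_cumulative(cumulative: list[int]) -> list[int]:
    """Turn a cumulative series into day-over-day changes.

    The first day's change is the first cumulative value itself. A revision that
    lowers a cumulative total shows up as a negative daily value.
    """
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


def negative_days(daily: list[int]) -> list[int]:
    """Return the positions of days whose daily change is negative."""
    return [i for i, value in enumerate(daily) if value < 0]


def percent_positivity(new_positives: int, tests_performed: int) -> Optional[float]:
    """Percent positivity = new positives / tests performed * 100.

    Returns None when no tests were performed, to avoid dividing by zero.
    """
    if tests_performed <= 0:
        return None
    return 100.0 * new_positives / tests_performed


if __name__ == "__main__":
    cumulative_cases = [100, 120, 118, 150]  # illustrative values only
    daily_cases = daily_from_cumulative(cumulative_cases)
    print(daily_cases)                  # [100, 20, -2, 32]
    print(negative_days(daily_cases))   # [2]
    print(percent_positivity(32, 800))  # 4.0
```

Negative days are flagged rather than dropped, matching the dataset's choice to preserve reported values.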

The update date and time for our dataset is given in update_time.txt.

The following time series data are available at the health region level (as well as at the level of province and Canada-wide):

  • cases (confirmed and probable COVID-19 cases)
  • mortality (confirmed and probable COVID-19 deaths)

The following time series data are available at the province level (as well as Canada-wide):

  • recovered (COVID-19 cases considered resolved that did not end in death)
  • testing (definitions vary; see our technical report)
  • active cases (we use the formula active cases = confirmed cases - recovered - deaths, which explains the discrepancies between our active case numbers and those reported by official sources)
  • vaccine distribution (total doses distributed)
  • vaccine administration (total doses administered)
  • vaccine completion (second doses administered)
  • vaccine additional doses (third doses administered)

Note that definitions for each of these values differ between provinces. See our technical report for more details.
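The active case formula above can be applied day by day to aligned cumulative series, as in this minimal sketch. The numbers are illustrative and the function names are hypothetical; because the definition of "recovered" varies between provinces, the output inherits those inconsistencies.

```python
def active_cases(cumulative_cases: int, cumulative_recovered: int, cumulative_deaths: int) -> int:
    """Active cases = confirmed cases - recovered - deaths, using cumulative totals."""
    return cumulative_cases - cumulative_recovered - cumulative_deaths


def active_case_series(cases: list[int], recovered: list[int], deaths: list[int]) -> list[int]:
    """Apply the formula to cumulative series that cover the same dates in the same order."""
    if not (len(cases) == len(recovered) == len(deaths)):
        raise ValueError("series must cover the same dates")
    return [active_cases(c, r, d) for c, r, d in zip(cases, recovered, deaths)]


if __name__ == "__main__":
    # Illustrative cumulative values for three consecutive days.
    print(active_case_series([100, 150, 180], [20, 60, 120], [2, 3, 5]))  # [78, 87, 55]
```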

Several other important files are also available in the other folder:

  • Correspondence between health region names used in our dataset and HRUID values given in Esri Canada's health region map, with 2019 population values: other/hr_map.csv
  • Correspondence between province names used in our dataset and full province names and two-letter abbreviations, with 2019 population values: other/prov_map.csv
  • Correspondence between province names used in our dataset and full province names and two-letter abbreviations, with 2019 population values and new Saskatchewan health regions: other/prov_map_sk_new.csv
    • The new Saskatchewan health regions (13 health regions versus 6 in the original data) use unofficial estimates of 2020 population values provided by Statistics Canada and may differ from official data released by Statistics Canada at a later date.
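The population values in these mapping files make it possible to express counts per capita. A minimal sketch using only the standard csv module: the inline text stands in for other/prov_map.csv, and the column names "province" and "pop" are assumptions, so check the header of the real file before reusing this. Region names and values are placeholders, not real data.

```python
import csv
import io

# Illustrative stand-in for other/prov_map.csv. The column names "province" and
# "pop" are assumptions; check the header of the real file before reusing this.
EXAMPLE_PROV_MAP = """province,pop
Region A,1000000
Region B,250000
"""


def load_population(csv_text: str, key_col: str = "province", pop_col: str = "pop") -> dict[str, int]:
    """Map each region name to its population value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: int(row[pop_col]) for row in reader}


def rate_per_100k(count: int, population: int) -> float:
    """Express a count per 100,000 population."""
    return count * 100_000 / population


if __name__ == "__main__":
    populations = load_population(EXAMPLE_PROV_MAP)
    print(rate_per_100k(250, populations["Region A"]))  # 25.0
```

For the real file, the text could be read with `open("other/prov_map.csv", newline="")` and passed to `load_population` with the actual column names.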

We also have case and mortality datasets that combine our dataset with the official SK provincial dataset using the new 13 reporting zones (our dataset continues to use the old 6 reporting zones) in the hr_sk_new folder. Data for SK are only available from August 4, 2020 onward in these datasets.

Our individual-level case and mortality datasets are retired as of June 1, 2021 (see retired_datasets).

Recommended citation

Below is the current citation for the dataset:

  • Berry, I., O’Neill, M., Sturrock, S. L., Wright, J. E., Acharya, K., Brankston, G., Harish, V., Kornas, K., Maani, N., Naganathan, T., Obress, L., Rossi, T., Simmons, A. E., Van Camp, M., Xie, X., Tuite, A. R., Greer, A. L., Fisman, D. N., & Soucy, J.-P. R. (2021). A sub-national real-time epidemiological and vaccination database for the COVID-19 pandemic in Canada. Scientific Data, 8(1). doi: https://doi.org/10.1038/s41597-021-00955-2

Below is the previous citation for the dataset:

  • Berry, I., Soucy, J.-P. R., Tuite, A., & Fisman, D. (2020). Open access epidemiologic data and an interactive dashboard to monitor the COVID-19 outbreak in Canada. Canadian Medical Association Journal, 192(15), E420. doi: https://doi.org/10.1503/cmaj.75262

Methodology & data notes

Detailed information about our data collection methodology and sources, answers to frequently asked data questions and the technical report for our dataset are available on our website. Note that some of this information is out-of-date and will eventually be updated. Information on automated data collection is available in the Covid19CanadaETL GitHub repository.

The scripts used to prepare, update and validate the datasets in this repository are available in the scripts folder.

Acknowledgements

We would like to thank all individuals and organizations across Canada who have worked tirelessly to provide data to the public during this pandemic.

Additionally, we thank the following organizations/individuals for their support:

Public Health Agency of Canada / Joe Murray (JMA Consulting)

Contact us

You can learn more about the COVID-19 Canada Open Data Working Group at our website and reach out to us via our contact page.

covid19canada's People

Contributors

ishaberry, jeanpaulrsoucy


covid19canada's Issues

BC cases for reporting gaps

The file cases_timeseries_prov.csv could be improved for BC (and perhaps other provinces). The issue is that the province does not issue a report every day: there are gaps. The first report after a gap provides the case counts for each of the days during the gap, as well as the current day.

The problem is that the cases for the gap days are recorded as 0 in the database, and all the cases that occurred during the gap are lumped together into one number on the day of the next report.

Example (from today): If the case counts on (Saturday, Sunday, Monday) are (10, 6, 16) but there was no report issued on Saturday and Sunday, the database values are recorded as (0, 0, 32). It would be better if the cases were properly attributed to the correct days, to be consistent with the official reports.
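If official per-day counts are available for the gap, the lumped value can be moved back onto the correct dates. A minimal sketch, assuming daily counts are held in a dict keyed by date and that the official report gives a count for every day in the gap; the function name is hypothetical and the dates below are illustrative (a Saturday, Sunday and Monday).

```python
from datetime import date


def reattribute_gap(recorded: dict[date, int], official: dict[date, int]) -> dict[date, int]:
    """Return a copy of `recorded` with gap days overwritten by official per-day counts.

    The fix should only move cases between days, so a ValueError is raised if the
    official counts for the affected days do not add up to the recorded total.
    """
    recorded_total = sum(recorded.get(day, 0) for day in official)
    official_total = sum(official.values())
    if recorded_total != official_total:
        raise ValueError(
            f"totals differ: recorded {recorded_total}, official {official_total}"
        )
    fixed = dict(recorded)
    fixed.update(official)
    return fixed


if __name__ == "__main__":
    sat, sun, mon = date(2021, 1, 2), date(2021, 1, 3), date(2021, 1, 4)  # illustrative dates
    recorded = {sat: 0, sun: 0, mon: 32}   # gap days stored as 0, lumped onto Monday
    official = {sat: 10, sun: 6, mon: 16}  # per-day counts from the official report
    print(reattribute_gap(recorded, official) == official)  # True
```

The total check is a design choice: it keeps the correction from silently changing the cumulative series.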

If the current team does not have the ability to implement that procedure, I would be willing to make those updates, since I am tracking these numbers for my own purposes.

Date formatting for Ontario cases

Ontario cases 12 to 59 use MM-DD-YYYY date formatting, while the rest of the Ontario cases use DD-MM-YYYY.

If at all possible, it would be good to have YYYY-MM-DD for all.
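Converting both formats to YYYY-MM-DD is straightforward once it is known which rows use which format; that cannot be inferred from a string like 03-04-2020 alone. A sketch using Python's standard datetime module, where `day_first` is a hypothetical parameter the caller sets per row (e.g., False for the affected Ontario cases).

```python
from datetime import datetime


def to_iso_date(text: str, day_first: bool = True) -> str:
    """Convert 'DD-MM-YYYY' (day_first=True) or 'MM-DD-YYYY' (day_first=False) to 'YYYY-MM-DD'.

    strptime raises ValueError if the text does not match the chosen format,
    for example when the month field is above 12.
    """
    fmt = "%d-%m-%Y" if day_first else "%m-%d-%Y"
    return datetime.strptime(text, fmt).strftime("%Y-%m-%d")


if __name__ == "__main__":
    print(to_iso_date("31-05-2020"))                   # 2020-05-31
    print(to_iso_date("03-15-2020", day_first=False))  # 2020-03-15
```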

Redesign meta-data / README.md

The meta-data (primarily stored within README.md) is in need of an overhaul. Tasks will be added below as they arise:

  • Add instructions for downloading from GitHub for non-technical users
  • Explain differences between various datasets (e.g., recovered_cumulative and recovered_timeseries_prov.csv)

/timeseries_hr/recoveries file available?

Firstly, thank you for maintaining the spreadsheet and this repo.

I was looking in particular at Covid19Canada/timeseries_hr/. Would recoveries data be available there as well?

I am creating a web app providing info for each province and region, and I am missing the number of recoveries for each region ATM. Thanks again!

Tests-performed or people-tested?

What exactly do the testing numbers mean?

Are you reporting on people-tested, or tests-performed? I don't see that specified anywhere.

When I look at the government source data, it seems to be a mixed bag.
Some jurisdictions report on both tests-performed and people-tested, but most don't.

Examples:
Tests-performed: BC, MB
People-tested: QC, NL

Ontario data

I have switched the provincial/territorial charts on my site fully to your datasets, and I am noticing an oddity with the Ontario data. At first I thought it was a problem with the Canada data, as the numbers here differ greatly from the Johns Hopkins data (they report about 2,000 fewer cumulative cases), but on drilling down, all the provinces seem fine except Ontario.

As of this morning (2020/08/15, 11:00 Pacific), the Ontario government is reporting 40,565 cases (Ontario.ca), while the data here report 42,288 cases. Comparing against COVID-19 Case Data (Ontario.ca), I see the data start diverging on April 1, 2020. Is there a known reason for this difference?

BC Testing Numbers Discrepancy

There seems to be an issue with the testing numbers coming from the API: no tests are reported for 2-3 days, then all of the missing days are reported on a single day.


The data from the BC CDC website (which seems to be your source) has all of the testing numbers by day, which matches up with what they present on their ArcGIS dashboard. Any chance we can get that information updated in the API?

BC and Alberta weekend new case numbers

Neither BC nor Alberta new case numbers show weekend data: Saturday and Sunday numbers are merged with Monday, leading to 0s on weekends and hyper-inflated Monday counts. Both provinces do, however, report weekend breakdowns, and the data are available via the BC CDC and the Alberta government. Is there any appetite to correct the BC and Alberta weekend numbers?

Format of numeric columns

Thanks so much for the repo and maintaining the spreadsheet. In the spreadsheet, would it be possible to keep columns with numerical values, purely numeric? The cumulative_testing column on the Testing tab has asterisks on some numbers, which can cause (small) headaches when importing the data. Could the asterisk be in the column beside instead?

Cheers!
Jon
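Until the flag moves to its own column, a parser can split the asterisk off on import. A minimal sketch, assuming the asterisk is the only non-numeric marker in the column and that commas may appear as thousands separators; the function name is hypothetical.

```python
from typing import Optional


def parse_flagged_count(cell: str) -> tuple[Optional[int], bool]:
    """Split a cell such as '12345*' into (12345, True).

    Commas used as thousands separators are removed. Empty cells give
    (None, False) so that a missing value is not confused with zero.
    """
    text = cell.strip()
    flagged = "*" in text
    digits = text.replace("*", "").replace(",", "").strip()
    return (int(digits) if digits else None), flagged


if __name__ == "__main__":
    print(parse_flagged_count("12345*"))  # (12345, True)
    print(parse_flagged_count("678"))     # (678, False)
```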

Some more ideas for graphing the data

I created the following from the Johns Hopkins data:

https://rigsomelight.com/canada_covid

It contains some graphing ideas that I got from the Financial Times and talking to friends.

Mainly, it uses the number of days since the first case as the x value and the percentage of population as the y value. You may be interested in putting these on your dashboard.

I'm available if you need someone to work on it.
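Both axes suggested above can be computed from a daily case series and a population value. A sketch with illustrative numbers; the function name is hypothetical, and it assumes the y value is cumulative cases as a percentage of population, with day 0 being the first day with a nonzero cumulative count.

```python
from itertools import accumulate


def align_from_first_case(daily_cases: list[int], population: int) -> list[tuple[int, float]]:
    """Return (days since first case, cumulative cases as % of population) pairs.

    Day 0 is the first day on which the cumulative count is above zero; earlier
    days are dropped. An empty list is returned if there are no cases at all.
    """
    cumulative = list(accumulate(daily_cases))
    start = next((i for i, total in enumerate(cumulative) if total > 0), None)
    if start is None:
        return []
    return [
        (day, 100.0 * total / population)
        for day, total in enumerate(cumulative[start:])
    ]


if __name__ == "__main__":
    # Illustrative series and population, not real data.
    print(align_from_first_case([0, 0, 1, 3, 6], population=1000))
    # [(0, 0.1), (1, 0.4), (2, 1.0)]
```

Aligning on the first case lets regions whose outbreaks started on different dates be compared on one x axis.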

Add "Repatriated" to health region time series

Add to both hr_map.csv and actual timeseries_hr files for each stat.

This will allow the same range of data to be included in each of the timeseries (canada, prov, hr).

It will also simplify the function to construct the actual timeseries files from the raw files using update_data.R.

Add "Repatriated" to testing time series

Repatriated testing numbers are available from the PHAC CSV file.

Repatriated cases have already been added to the recovered file based on assumptions regarding recovery time (and lack of news reports indicating serious disease in identified repatriated travelers).

Source of recovered / deaths / tested

First of all, curating this data is nothing less than phenomenal, and the dashboard you have created is awesome. Thank you for this public service. I'm curious, though, about your source for the number recovered, deaths and number tested, as this information doesn't seem to be in the Google spreadsheet you linked from the dashboard. Can you point me in the right direction? Thanks!

Could hospitalization data be added?

At least one province (BC) has started publishing hospitalization numbers. This seems like an important statistic to follow. Could the available data be added to the spreadsheet?

For BC the daily reports are listed on http://www.bccdc.ca/health-info/diseases-conditions/covid-19/case-counts-press-statements. An example report is http://www.bccdc.ca/Health-Info-Site/Documents/BC_Surveillance_Summary_March_27_final.pdf. I've extracted some of the data here https://docs.google.com/spreadsheets/d/1uz-hq7ncFff92iSh63_G-oFew2O1F_2eHOfr1_zdREg/edit?usp=sharing

Recovery numbers for Ontario are lagging as case numbers are from PHUs

Thanks for curating this dataset and all your efforts!

I noticed that the number of recoveries is equal to that reported by Ontario (which is lagging) while case numbers are from individual PHUs (which are more recent). This makes it seem like Ontario has a larger number of active cases than it actually does.

For example, on July 28th at 21:00, according to this data there were 3,400 active cases with 34,567 recovered. If you sum up recoveries from individual PHUs, they come to around 36,200, meaning that in reality active cases are almost half of that.

Would it be possible for you to source recoveries from individual PHUs as well?

JSON API

Hi there,

I would like to ask the maintainers if it would be OK if we forked this into an (open, of course) JSON API?

Postal Codes

Hi all,

Love the effort so far. I am interested in creating a project that displays this data visually on a map to neighbourhood detail. This won’t be possible without postal codes.

How challenging would it be to get postal codes? I realised just the first three alphanumerics are good enough to show a neighbourhood close up (also mitigating any privacy concerns).

Ratnesh

I can't see where you're getting the recovery numbers for 5 jurisdictions.

code for Health regions

Hi,
Impressive work! I just wonder whether it is possible to add the HR_UID to each health region?
It seems that the health region names from different sources (e.g., shapefiles) can be very different.

thanks,

Guowen

Add Canada-wide time series

Add an additional set of time series aggregated to the level of the entire country. It should mirror what is available for the provincial time series:

  • active cases
  • cases
  • mortality
  • recovered
  • testing

Detailed Ontario data available in CSV format

Hello;

Detailed Province of Ontario data is available in CSV format at

https://data.ontario.ca/dataset/f4f86e54-872d-43f8-8a86-3892fd3cb5e6/resource/ed270bb8-340b-41f9-a7c6-e8ef587e6d11/download/covidtesting.csv

It is updated daily at approximately 10:15 AM Eastern. Lots of interesting stuff. Some examples are...

  • Column E is currently active cases, noticeably lower than your count

  • Column G is cumulative deaths. Subtract the previous day from the current day to get deaths occurring on the most recent day.

  • Column H is the cumulative number of positive tests. Subtract the previous day from the current day to get positives occurring on the most recent day. I believe Ontario currently counts the number of tests, not the number of people.

  • Column J is the number of tests run in the past 24 hours.

And I'm sure there is other stuff you'll be interested in. The province also has detailed daily PDF reports at https://covid-19.ontario.ca/covid-19-epidemiologic-summaries-public-health-ontario#daily

The PDF files have breakdowns by Public Health Unit going back to June 11 (June 9th and 10th data). I've been scraping the daily PDFs for daily case counts for each health unit. I can upload the daily PHU new cases data if you're interested. Note that with "adjustments", you'll see occasional days with negative case counts.

Make .xls more parsable

I'm writing a parser for the .xls for https://github.com/neherlab/covid19_scenarios, and ran into some minor issues. In particular

  • The data rows do not start on the same row in each worksheet (rows 4, 3 and 3 for cases, deaths and recovered)
  • province is used as the label for two columns on the recovered sheet
  • there is no 'health_region' column on the recovered sheet
  • date columns are labelled 'date_report', 'date_death_report' and 'date_recovered'; it would be easier if they were all called 'date_report'

It would be great if these could be unified to make parsing easier.
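Until the sheets are unified, a parser can work around these differences. A sketch, assuming each sheet has already been read into a list of rows (lists of strings) and that the header row is the first row containing a "province" cell; the alias mapping covers only the column names mentioned in this issue, and all names here are hypothetical.

```python
# Hypothetical mapping covering only the date column names mentioned in this issue.
DATE_COLUMN_ALIASES = {
    "date_death_report": "date_report",
    "date_recovered": "date_report",
}


def find_header_row(rows: list[list[str]], marker: str = "province") -> int:
    """Return the index of the first row containing `marker`, assumed to be the header."""
    for index, row in enumerate(rows):
        if marker in row:
            return index
    raise ValueError(f"no row contains {marker!r}")


def normalize_header(header: list[str]) -> list[str]:
    """Rename date columns to 'date_report' and make duplicate labels unique.

    A repeated label gets a numeric suffix, e.g. the second 'province' becomes 'province_2'.
    """
    counts: dict[str, int] = {}
    result = []
    for name in header:
        name = DATE_COLUMN_ALIASES.get(name, name)
        counts[name] = counts.get(name, 0) + 1
        result.append(name if counts[name] == 1 else f"{name}_{counts[name]}")
    return result


if __name__ == "__main__":
    sheet = [
        ["Title row"],
        [],
        ["date_recovered", "province", "province", "cumulative_recovered"],
        ["2020-05-31", "A", "B", "0"],  # placeholder values
    ]
    header_index = find_header_row(sheet)
    print(header_index)  # 2
    print(normalize_header(sheet[header_index]))
    # ['date_report', 'province', 'province_2', 'cumulative_recovered']
```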

Ontario Significant Discrepancies in cases.csv

Thank you very much for the work you do. Multiple provinces have some minor variances to official Canadian numbers due to timing. And this is fine. When it comes to Ontario, your data set shows almost 2,000 cases higher than Ontario/Canada officially reported to date. Please see the attached excel with COVID19 cases cross-validation. It is definitely not due to the timing, as the gap started a few months ago and keeps on widening. I am trying to understand the nature of the difference that is unique to Ontario. If you think the data need to be fixed, I can help. I am also happy to jump on a call to discuss.

Variances in COVID19 Daily New Cases Reporting by Province.xlsx

Ontario testing

Cumulative test numbers in Ontario for Mar-29 or Mar-30 appear incorrect: the cumulative number of tests on Mar-29 is higher than on Mar-30, which should not be possible for a cumulative count.

Add HR_UID to health region time series

Although HR_UID linkage is available in other/hr_map.csv, it would be much more convenient if it were present directly in the health region time series CSV files themselves.

Add official datasets adapted for compatibility with CCODWG datasets

Several provinces offer datasets (e.g., CSV files) that we do not use as direct inputs into our dataset. For example, the Ontario and BC datasets use different date schemes than our dataset, rendering them incompatible with our universal date scheme, which is public reporting date.

This addition will solve many previously discussed issues, such as #44, by allowing official datasets to serve as drop-in replacements for portions of our dataset.

The following datasets will be adapted for compatibility with the CCODWG datasets (additional datasets may be added later):

Geocoding for each health region

Hi, I'm using Leaflet to geolocate the health regions, but there are some health regions I cannot locate. Do you have any thoughts on that? I'm using Mapbox geocoding.

data entry error recovered_cumulative.csv

Dear team,

The cumulative recovered counts for SK for May 31, 2020 (most recent update) should be 582 instead of 682.

see line number 992 on data page, https://github.com/ishaberry/Covid19Canada/blob/master/recovered_cumulative.csv#L992

31-05-2020 | Saskatchewan | 682 <- should be 582

https://www.saskatchewan.ca/government/health-care-administration-and-provider-resources/treatment-procedures-and-guidelines/emerging-public-health-issues/2019-novel-coronavirus/cases-and-risk-of-covid-19-in-saskatchewan

Many thanks,
Kuan

Contributing to Spreadsheet

This is an awesome project. Thanks for doing this.

I've noticed that the data is outdated for my area of Windsor-Essex. I'd love to help contribute to the spreadsheet. I'm a student from the University of Windsor. Would it be possible to get access and contribute to your data set?

Add ESRI maps

Add ESRI maps (old SK borders and new SK borders).

Feature request: Hospitalizations

Quebec officials have suggested hospitalizations as a useful measure of the impact of the virus in different provinces. Would including this data be achievable?

Reported cases/deaths are not using proper health regions

Looking at the health region data, it looks like the names of the regions are incomplete. The full list of health regions is given at https://www150.statcan.gc.ca/n1/pub/82-402-x/2015002/app-ann/ap-an1-eng.htm

Among the values in the data I see "Fraser" but this is ambiguous and can't be mapped to the actual health regions which are:

  • 5921 Fraser East Health Service Delivery Area
  • 5922 Fraser North Health Service Delivery Area
  • 5923 Fraser South Health Service Delivery Area

Hoping this data is available somewhere upstream...

Tests - ON not including pending tests

Although I don't see it stated anywhere, you seem to be reporting on the total number of tests performed, including pending tests.

If that's the case, your reporting for Ontario seems to be off. For example, yesterday, on April 22, ON reported these numbers:

  • Total tests completed 184,531
  • Currently under investigation 6,845 (Samples with testing in progress.)

Your reported testing number for ON for that day is 184,531. If you are indeed reporting completed+pending, shouldn't you be reporting the sum of the above two numbers?
