ccodwg / covid19canada
Epidemiological Data from the COVID-19 Epidemic in Canada
Home Page: https://opencovid.ca/
License: Creative Commons Attribution 4.0 International
In the active_timeseries_prov.csv, the active cases in Quebec drops from 25,102 to 1,556 in one day (from 2020-07-16 to 2020-07-17). This amount of reduction seems unlikely. Could your team please validate and comment on these unusual figures?
Hello, your dataset was added to the CoronaWhy (https://www.coronawhy.org/) Data Lake on Dataverse as part of a common COVID-19 dataframe: http://datasets.coronawhy.org/dataset.xhtml?persistentId=doi:10.5072/FK2/Z1KPLZ
Would you be willing to help with maintenance of your dataset in Dataverse, e.g. adding the relevant metadata and keeping the dataset up to date? That will help make the dataset findable and accessible for the medical science community.
Neither BC's nor Alberta's new case numbers show weekend data: Saturday and Sunday numbers are merged into Monday, leading to 0s on weekends and hyper-inflated Monday counts. Both provinces do, however, report weekend breakdowns, and the data are available from the BC CDC and the Alberta government. Is there any appetite to correct the BC and Alberta weekend numbers?
Add ESRI maps (old SK borders and new SK borders).
The file cases_timeseries_prov.csv could be improved for BC (and perhaps other provinces). The issue is that the province does not issue a report every day: there are gaps. The first report after a gap provides the case counts for each of the days during the gap, as well as the current day.
The problem is that the cases for the gap days are recorded as 0 in the database, and all the cases that occurred during the gap are lumped together into one number on the day of the next report.
Example (from today): If the case counts on (Saturday, Sunday, Monday) are (10, 6, 16) but there was no report issued on Saturday and Sunday, the database values are recorded as (0, 0, 32). It would be better if the cases were properly attributed to the correct days, to be consistent with the official reports.
If the current team does not have the ability to implement that procedure, I would be willing to make those updates, since I am tracking these numbers for my own purposes.
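The correction described above could look roughly like the sketch below. It assumes both the recorded values and the official per-day breakdown are available as `{date: new_cases}` dictionaries (a hypothetical input format) and refuses to proceed if the two totals disagree, since redistribution should only move cases between days.

```python
from datetime import date


def redistribute_lumped_counts(lumped, breakdown):
    """Replace report-date counts with the per-day breakdown for a gap window.

    lumped:    {date: new_cases} keyed by public reporting date, where days
               without a report hold 0 and the next report holds the total.
    breakdown: {date: new_cases} for the same window, as listed in the first
               official report after the gap (hypothetical input format).
    """
    window = sorted(breakdown)
    # Redistribution should only move cases between days, never add or drop any.
    lumped_total = sum(lumped.get(day, 0) for day in window)
    if lumped_total != sum(breakdown.values()):
        raise ValueError("breakdown total does not match the lumped total")
    fixed = dict(lumped)
    for day in window:
        fixed[day] = breakdown[day]
    return fixed


# Illustrative dates: Saturday 2020-06-06 to Monday 2020-06-08.
sat, sun, mon = date(2020, 6, 6), date(2020, 6, 7), date(2020, 6, 8)
recorded = {sat: 0, sun: 0, mon: 32}
official = {sat: 10, sun: 6, mon: 16}
print(redistribute_lumped_counts(recorded, official))
```

Note the trade-off: after this step the affected rows no longer follow a pure public-reporting-date scheme, so a dataset that keys everything on reporting date would need to flag or document such rows.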
Hi there,
I would like to ask the maintainers whether it would be OK for us to fork this into an (open, of course) JSON API.
Dear team,
The cumulative recovered counts for SK for May 31, 2020 (most recent update) should be 582 instead of 682.
See line 992 of the data file: https://github.com/ishaberry/Covid19Canada/blob/master/recovered_cumulative.csv#L992
31-05-2020 | Saskatchewan | 682 <- should be 582
Many thanks,
Kuan
This is an awesome project. Thanks for doing this.
I've noticed that the data is outdated for my area of Windsor-Essex. I'd love to help contribute to the spreadsheet. I'm a student from the University of Windsor. Would it be possible to get access and contribute to your data set?
Add an additional set of time series aggregated to the level of the entire country. It should mirror what is available for the provincial time series:
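As a rough illustration of the national aggregation requested above, the sketch below sums daily provincial values by date. The `(province, date, value)` row layout is an assumption for illustration, and dates are assumed to be ISO `YYYY-MM-DD` strings so that they sort chronologically as text.

```python
from collections import defaultdict


def aggregate_to_canada(rows):
    """Sum provincial daily rows into a single national daily series.

    rows: iterable of (province, date_string, value) tuples, e.g. parsed from a
    provincial time series CSV (column layout assumed for illustration).
    """
    totals = defaultdict(int)
    for _province, day, value in rows:
        totals[day] += value
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return dict(sorted(totals.items()))


rows = [
    ("Ontario", "2020-04-01", 10),
    ("Quebec", "2020-04-01", 5),
    ("Ontario", "2020-04-02", 3),
]
print(aggregate_to_canada(rows))
```

This is only safe for daily counts. For cumulative columns, a province missing a row on some date would make a naive sum dip, so a national cumulative series is better built as a running total of the national daily series.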
At least one province (BC) has started publishing hospitalization numbers. This seems like an important statistic to follow. Could the available data be added to the spreadsheet?
For BC the daily reports are listed on http://www.bccdc.ca/health-info/diseases-conditions/covid-19/case-counts-press-statements. An example report is http://www.bccdc.ca/Health-Info-Site/Documents/BC_Surveillance_Summary_March_27_final.pdf. I've extracted some of the data here https://docs.google.com/spreadsheets/d/1uz-hq7ncFff92iSh63_G-oFew2O1F_2eHOfr1_zdREg/edit?usp=sharing
Cumulative test numbers in Ontario for Mar-29 or Mar-30 appear incorrect. The cumulative number of tests on Mar-29 is higher than on Mar-30, which should not be possible.
The numbers for cases and cumulative_cases for Ontario don't match those on the government website. Why is that?
Several provinces offer datasets (e.g., CSV files) that we do not use as direct inputs into our dataset. For example, the Ontario and BC datasets use different date schemes than our dataset, rendering them incompatible with our universal date scheme, which is the public reporting date.
This addition will solve many previously discussed issues, such as #44, by allowing official datasets to serve as drop-in replacements for portions of our dataset.
The following datasets will be adapted for compatibility with the CCODWG datasets (additional datasets may be added later):
Adding the province short codes (two-letter codes) to time series CSVs would help with usability of the datasets.
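A sketch of the short-code column suggested above, using the standard two-letter provincial and territorial abbreviations. The abbreviated spellings in the second dictionary (`BC`, `NL`, `PEI`, `NWT`) are assumptions about how names might appear in the time series files, and the `province` / `province_short` column names are hypothetical.

```python
# Standard two-letter abbreviations for provinces and territories.
PROVINCE_CODES = {
    "Alberta": "AB", "British Columbia": "BC", "Manitoba": "MB",
    "New Brunswick": "NB", "Newfoundland and Labrador": "NL",
    "Nova Scotia": "NS", "Northwest Territories": "NT", "Nunavut": "NU",
    "Ontario": "ON", "Prince Edward Island": "PE", "Quebec": "QC",
    "Saskatchewan": "SK", "Yukon": "YT",
}
# Assumed abbreviated spellings that may appear in the CSVs.
PROVINCE_CODES.update({"BC": "BC", "NL": "NL", "PEI": "PE", "NWT": "NT"})


def add_province_code(row):
    """Return a copy of a CSV row (dict) with a hypothetical province_short column.

    Names without a code (for example a non-geographic category) get None.
    """
    return {**row, "province_short": PROVINCE_CODES.get(row["province"])}


print(add_province_code({"province": "PEI", "cases": 2}))
```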
Looking at the health region data, it looks like the names of the regions are incomplete. The full list of health regions is available at https://www150.statcan.gc.ca/n1/pub/82-402-x/2015002/app-ann/ap-an1-eng.htm
Among the values in the data I see "Fraser" but this is ambiguous and can't be mapped to the actual health regions which are:
Hoping this data is available somewhere upstream...
I created the following from the Johns Hopkins data:
https://rigsomelight.com/canada_covid
It contains some graphing ideas that I got from the Financial Times and talking to friends.
Mainly, it uses the number of days since the first case as the X value and the percentage of the population as the Y value. You may be interested in putting these on your dashboard.
I'm available if you need someone to work on it.
Thank you very much for the work you do. Multiple provinces show minor variances from the official Canadian numbers due to timing, and this is fine. For Ontario, however, your dataset shows almost 2,000 more cases than Ontario/Canada have officially reported to date. Please see the attached Excel file with a COVID-19 case cross-validation. It is definitely not due to timing, as the gap started a few months ago and keeps widening. I am trying to understand the nature of this difference, which is unique to Ontario. If you think the data need to be fixed, I can help. I am also happy to jump on a call to discuss.
Variances in COVID19 Daily New Cases Reporting by Province.xlsx
Where are health_region values coming from? There are some variances from official health region/authorities boundaries.
Saskatchewan has 13 health regions/authorities (e.g. https://www150.statcan.gc.ca/n1/pub/82-402-x/2017001/maps-cartes/rm-cr10-eng.htm), but the Google Sheet has only 6: Central, Far North, North, Regina, Saskatoon and South.
Thanks
Thanks for curating this dataset and all your efforts!
I noticed that the number of recoveries is equal to that reported by Ontario (which is lagging) while case numbers are from individual PHUs (which are more recent). This makes it seem like Ontario has a larger number of active cases than it actually does.
For example, on July 28th, 21:00
According to this data, there are 3,400 active cases and 34,567 recoveries.
If you sum the recoveries from individual PHUs, they come to around 36,200, meaning the real number of active cases is about half of that.
Would it be possible for you to source recoveries from individual PHUs as well?
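The arithmetic behind this report can be sketched as follows, assuming the usual definition of active cases (cumulative cases minus cumulative recoveries minus cumulative deaths). Because cases and deaths stay fixed when only the recovery source changes, every additional recovery removes exactly one active case. The function names are illustrative, and 36,200 is the approximate PHU sum quoted above.

```python
def active_cases(cumulative_cases, cumulative_recovered, cumulative_deaths):
    """Active cases under the usual definition: not yet recovered or deceased."""
    return cumulative_cases - cumulative_recovered - cumulative_deaths


def active_after_recovered_swap(active, old_recovered, new_recovered):
    """Active count after replacing one recovered total with another.

    Cumulative cases and deaths are unchanged, so the active count falls by
    exactly the increase in recoveries.
    """
    return active - (new_recovered - old_recovered)


# July 28 figures from the report; 36,200 is an approximate PHU sum.
print(active_after_recovered_swap(3400, 34567, 36200))  # 1767
```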
Hi,
I can't see where you're getting the numbers for Recovered, for 5 jurisdictions.
I'm using these links as base:
NS
https://novascotia.ca/coronavirus/
QC
https://www.msss.gouv.qc.ca/professionnels/maladies-infectieuses/coronavirus-2019-ncov/
AB
https://www.alberta.ca/covid-19-alberta-data.aspx
NL
https://www.gov.nl.ca/covid-19/
Please advise.
Thanks so much for the repo and maintaining the spreadsheet. In the spreadsheet, would it be possible to keep columns with numerical values, purely numeric? The cumulative_testing column on the Testing tab has asterisks on some numbers, which can cause (small) headaches when importing the data. Could the asterisk be in the column beside instead?
Cheers!
Jon
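A sketch of the split requested above: the numeric part becomes an integer and the asterisk becomes a separate boolean flag. It assumes the asterisk trails the number and that thousands separators are commas; the function name is hypothetical.

```python
def split_flagged(value):
    """Split a cell such as '12,345*' into (12345, True).

    Assumes any asterisk trails the number. The flag can then live in its own
    column, leaving the numeric column purely numeric.
    """
    text = value.strip()
    flagged = text.endswith("*")
    number = int(text.rstrip("*").replace(",", ""))
    return number, flagged


print(split_flagged("12,345*"))  # (12345, True)
print(split_flagged("800"))      # (800, False)
```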
cases.csv is not being updated daily as it was before. I noticed today that the Google Sheet has been updated but cases.csv has not.
Should we now consider the Google Sheet the official, regularly updated data source?
Thanks for all your good work!
The meta-data (primarily stored within README.md) is in need of an overhaul. Tasks will be added below as they arise:
What exactly do the testing numbers mean?
Are you reporting on people-tested, or tests-performed? I don't see that specified anywhere.
When I look at the government source data, it seems to be a mixed bag.
Some jurisdictions report on both tests-performed and people-tested, but most don't.
Examples:
Tests-performed: BC, MB
People-tested: QC, NL
Hi,
Could you please publish a table showing the coordinates you used for each health region?
Thanks!
-OWN
First of all, curating this data is nothing less than phenomenal, and the dashboard you have created is awesome. Thank you for this public service. I'm curious, though, about your source for the numbers of recoveries, deaths and tests, as this information doesn't seem to be in the Google spreadsheet you linked from the dashboard. Can you point me in the right direction? Thanks!
These are available in the PHAC CSV.
Hello;
Detailed Province of Ontario data is available in CSV format at
It is updated daily at approximately 10:15 AM Eastern. Lots of interesting stuff. Some examples are...
Column E is current active cases, noticeably lower than your count.
Column G is cumulative deaths. Subtract the previous day's value from the current day's to get deaths occurring on the most recent day.
Column H is the cumulative number of positive tests. Subtract the previous day's value from the current day's to get positives occurring on the most recent day. I believe Ontario currently counts the number of tests, not the number of people.
Column J is the number of tests run in the past 24 hours.
And I'm sure there is other stuff you'll be interested in. The province also has detailed daily PDF reports at https://covid-19.ontario.ca/covid-19-epidemiologic-summaries-public-health-ontario#daily
The PDF files have breakdowns by Public Health Unit going back to June 11 (June 9th and 10th data). I've been scraping the daily PDFs for daily case counts for each health unit. I can upload the daily PHU new cases data if you're interested. Note that with "adjustments", you'll see occasional days with negative case counts.
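The "subtract the previous day" step described above can be sketched as a simple difference over an ordered cumulative series. The first value has no predecessor, so it is differenced against 0 and therefore equals the running total up to that day rather than a single day's count. Downward revisions ("adjustments") appear as negative daily values, matching the note about occasional negative case counts.

```python
def daily_from_cumulative(cumulative):
    """Convert an ordered cumulative series into daily new counts.

    Each day's value is today's total minus yesterday's; the first day is
    differenced against 0. Downward revisions yield negative values.
    """
    previous = [0] + cumulative[:-1]
    return [today - yesterday for yesterday, today in zip(previous, cumulative)]


print(daily_from_cumulative([10, 15, 15, 14]))  # [10, 5, 0, -1]
```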
Question about health regions: there is a Statistics Canada classification of health regions, but I found that many of the names do not match the values provided in the data.
https://www150.statcan.gc.ca/n1/pub/82-402-x/2017001/app-ann/ap-an1-eng.htm
I'm curious whether you are using a different definition of health regions, and if so, whether that information is available.
thanks!
Hi,
I believe you are missing two cases from PEI that were added on Sunday Oct 4th, 2020. See: https://www.cbc.ca/news/canada/prince-edward-island/pei-covid-19-two-cases-1.5750109
The case total is now 61.
The PEI case data still lists 59; however, the Public Health Agency of Canada site is reporting the 2 additional cases.
Thanks
I have switched the provincial/territorial charts on my site fully to your datasets, and I am noticing an oddity with the Ontario data. At first I thought it was a problem with the Canada data, as the numbers here differ greatly from the Johns Hopkins data (they report about 2,000 fewer cumulative cases), but on drilling down, all the provinces seem fine except Ontario.
As of this morning (2020/08/15, 1100h Pacific), the Ontario Government is reporting 40,565 cases, while the data here reports 42,288 cases. Comparing against COVID-19 Case Data (Ontario.ca), I see the data start diverging on April 01, 2020. I am curious whether there is a known reason for this difference.
Ontario cases from 12 to 59 have MM-DD-YYYY date formatting. The rest of the ontario cases are DD-MM-YYYY.
If at all possible, it would be good to have YYYY-MM-DD for all.
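A sketch of the requested normalization to `YYYY-MM-DD`. The format cannot be inferred from the string alone whenever the day is 12 or less (for example `04-03-2020`), so the sketch uses the case-number range given in the report to choose the format. The `case_id` integer argument and both function names are hypothetical.

```python
from datetime import datetime


def to_iso(date_str, month_first=False):
    """Rewrite a DD-MM-YYYY (or MM-DD-YYYY) date as ISO YYYY-MM-DD."""
    fmt = "%m-%d-%Y" if month_first else "%d-%m-%Y"
    return datetime.strptime(date_str, fmt).date().isoformat()


def normalize_ontario_date(case_id, date_str):
    # Per the report, Ontario cases 12 to 59 use MM-DD-YYYY; the rest use DD-MM-YYYY.
    return to_iso(date_str, month_first=12 <= case_id <= 59)


print(normalize_ontario_date(20, "03-15-2020"))   # 2020-03-15
print(normalize_ontario_date(100, "04-03-2020"))  # 2020-03-04
```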
Repatriated testing numbers are available from the PHAC CSV file.
Repatriated cases have already been added to the recovered file based on assumptions regarding recovery time (and lack of news reports indicating serious disease in identified repatriated travelers).
First, thanks for the great service!
I was wondering if you had an explanation for the variation in some of the data. For example, today's total number of cases on the dashboard (727) differs from the figure from Canada.ca's Public Health Services (621).
I pull data from https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/ and also see a similar discrepancy.
Are you assigning this case number, or is it assigned by the province? I'm looking to see whether I can use it for joining with an Ontario dataset.
Although HR_UID linkage is available in other/hr_map.csv, it would be much more convenient if it were present directly in the health region time series CSV files themselves.
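Until HR_UID is included directly, a join like the one below could attach it from other/hr_map.csv. The column names (`province`, `health_region`, `HR_UID`) are assumptions about the file layout. The lookup key includes the province in case the same region name appears in more than one province.

```python
def attach_hr_uid(rows, hr_map):
    """Add an HR_UID field to each health region time series row.

    Both inputs are lists of dicts (e.g. from csv.DictReader); the column
    names are assumed. Unmatched rows get HR_UID = None.
    """
    lookup = {(m["province"], m["health_region"]): m["HR_UID"] for m in hr_map}
    return [
        {**row, "HR_UID": lookup.get((row["province"], row["health_region"]))}
        for row in rows
    ]


hr_map = [{"province": "Saskatchewan", "health_region": "Regina", "HR_UID": "X1"}]
rows = [{"province": "Saskatchewan", "health_region": "Regina", "cases": "4"}]
print(attach_hr_uid(rows, hr_map))
```

The `HR_UID` value `X1` above is a placeholder, not a real identifier.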
Hi all,
Love the effort so far. I am interested in creating a project that displays this data on a map down to neighbourhood detail. This won’t be possible without postal codes.
How challenging would it be to get postal codes? I realised just the first three alphanumerics are good enough to show a neighbourhood close up (also mitigating any privacy concerns).
Ratnesh
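The first three characters of a Canadian postal code form the Forward Sortation Area (FSA), always in letter-digit-letter form. If postal codes were available, reducing them to FSAs could look like the sketch below; the function name is hypothetical.

```python
import re

# An FSA is letter, digit, letter (e.g. the first half of "N9B 3P4").
FSA_PATTERN = re.compile(r"[A-Z]\d[A-Z]")


def forward_sortation_area(postal_code):
    """Return the FSA (first three characters) of a Canadian postal code."""
    code = postal_code.replace(" ", "").upper()
    fsa = code[:3]
    if not FSA_PATTERN.fullmatch(fsa):
        raise ValueError(f"not a Canadian postal code: {postal_code!r}")
    return fsa


print(forward_sortation_area("n9b 3p4"))  # N9B
```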
Quebec officials have suggested hospitalizations as a useful measure of the impact of the virus in different provinces. Would including this data be achievable?
Just so you are aware: on the dashboard at https://art-bd.shinyapps.io/covid19canada/, if you try to order the table at the bottom of the overview page by "Cases per 100,000 population", the order is wrong for both ascending and descending order.
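The cause of this ordering bug is not confirmed. One common cause of the symptom is sorting the displayed values as text rather than as numbers. The sketch below, with hypothetical region names and values, shows how character-by-character comparison misorders rates and how converting to numbers fixes it.

```python
def order_by_rate(rates, numeric=True):
    """Return region names ordered by their 'cases per 100,000' value.

    rates maps region -> the value as displayed (a string). With
    numeric=False the strings are compared character by character.
    """
    if numeric:
        def key(item):
            return float(item[1].replace(",", ""))
    else:
        def key(item):
            return item[1]
    return [name for name, _ in sorted(rates.items(), key=key)]


rates = {"A": "9.5", "B": "120.3", "C": "45.0"}  # hypothetical values
print(order_by_rate(rates, numeric=False))  # ['B', 'C', 'A']
print(order_by_rate(rates))                 # ['A', 'C', 'B']
```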
I have been collecting QC ON and NS data on cases and testing. Please feel free to integrate:
https://docs.google.com/spreadsheets/d/1oB6lRKAlNg0LVNXXAXV9d07ka04JWQgBvcqVjHKc21M/edit?usp=sharing
Although I don't see it stated anywhere, you seem to be reporting on the total number of tests performed, including pending tests.
If that's the case, your reporting for Ontario seems to be off. For example, yesterday, on April 22, ON reported these numbers:
Your reported testing number for ON for that day is 184,531. If you are indeed reporting completed+pending, shouldn't you be reporting the sum of the above two numbers?
Add to both hr_map.csv and actual timeseries_hr files for each stat.
This will allow the same range of data to be included in each of the timeseries (canada, prov, hr).
It will also simplify the function to construct the actual timeseries files from the raw files using update_data.R.
There is an error on the label for the Oakville area; see the map below under cumulative confirmed cases.
Also, the email address [email protected] is not recognized by Gmail.
Daniel
Hi,
Impressive work! I just wonder whether it is possible to add the HR_UID to each health region.
It seems that health region names from different sources (e.g., shapefiles) can be very different.
thanks,
Guowen
Thanks!
Hi, I'm using Leaflet to geolocate the health regions, but there are some health regions I cannot locate. Do you have any thoughts on that? I'm using Mapbox geocoding.
I'm writing a parser for the .xls for https://github.com/neherlab/covid19_scenarios, and ran into some minor issues. In particular
Firstly, thank you for maintaining the spreadsheet and this repo.
I was looking at recoveries data in Covid19Canada/timeseries_hr/ in particular. Would that be available as well?
I am creating a web app providing info for each province and region, and I am currently missing the number of recoveries for each region. Thanks again!
There seems to be an issue with the testing numbers coming from the API: no tests are reported for 2-3 days, then all of the missing days are reported on a single day.
The data from the BC CDC website (which seems to be your source) has all of the testing numbers by day, which matches up with what they present on their ArcGIS dashboard. Any chance we can get that information updated in the API?