geodacenter / covid
COVID Atlas alpha code
Home Page: https://geodacenter.github.io/covid/
License: GNU General Public License v3.0
Following findings from the county validation team, we need to switch to a data source with fewer merging issues while retaining accuracy and CDC standards. Chats with @linqinyu and @SteveGoldstein led to consensus on moving to USAFacts, at least until we switch to a validated multi-source dataset down the road. Validation efforts will continue.
Another idea for later: include a data source drop-down so we could offer multiple sources that way.
For now, switch to USAFacts with the easy FIPS merge? @lixun910, are you interested in this one?
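The FIPS merge is "easy" because a county FIPS code is a fixed 5-digit identifier: a 2-digit state code plus a 3-digit county code. The usual pitfall is that some sources store it as an integer, which drops the leading zero (1001 instead of "01001"). Below is a minimal sketch of a zero-padded FIPS join. It assumes both tables expose a `fips` field and that the case table has a `confirmed` field. These field names and the helper names are hypothetical and do not come from the Atlas code.

```python
def normalize_fips(value) -> str:
    """Return a 5-digit, zero-padded county FIPS string (2-digit state + 3-digit county)."""
    return str(int(value)).zfill(5)


def merge_on_fips(cases: list[dict], counties: list[dict]) -> tuple[list[dict], list[str]]:
    """Attach case counts to county records by FIPS.

    Returns (merged rows, FIPS codes from the case table with no matching county).
    Field names "fips" and "confirmed" are assumptions for illustration.
    """
    county_by_fips = {normalize_fips(c["fips"]): c for c in counties}
    merged, unmatched = [], []
    for row in cases:
        key = normalize_fips(row["fips"])
        county = county_by_fips.get(key)
        if county is None:
            unmatched.append(key)  # keep for manual review instead of dropping silently
            continue
        merged.append({**county, "fips": key, "confirmed": row["confirmed"]})
    return merged, unmatched
```

For example, an integer FIPS of 1001 in the case table matches the county record keyed "01001". Any FIPS without a county record is returned in `unmatched` so it can be reviewed.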
From EJ: Is there a way to place the administrative labels above the visualization layer? They are currently hard to read. Since the map is aimed at a national audience, users won't know local place names without clicking, so it would be better to show the labels above the viz.
https://www.vox.com/2020/3/28/21197421/usa-coronavirus-covid-19-rural-america
https://news.uchicago.edu/story/state-level-data-misses-growing-coronavirus-hot-spots-us-including-south
also: NPR and maybe Wall Street Journal, Scientific American
Choropleths are great, but they make it hard to see small counties, e.g., Cook, Manhattan, San Francisco.
A bubble map, as in
https://blog.mapbox.com/notable-maps-visualizing-covid-19-and-surrounding-impacts-951724cc4bd8
would be a good additional option.
another social distancing proxy
Add search bar so users can enter address or place and have the map zoom accordingly.
The map needs visible zoom in/out buttons. Otherwise, on my laptop I can only zoom in (by double-clicking) and have to reload the page to zoom out. [High priority for usability!]
Marked as a priority by the health user testing group, as some testers have difficulty zooming both in the raw atlas and within the landing page.
We need someone to start reviewing the spreadsheet linked here and create a pilot testing location dataset that Xun could add as a layer to the map. We need to:
Qinyun's reference:
Link for the spreadsheet that includes all testing places (the point (1) I mentioned): https://docs.google.com/spreadsheets/d/1svnaZ2UG_ryFr8jjqVx7ZVZksBue4EQUJ4dolMDJx70/edit#gid=0.
Originally posted by @linqinyu in #3 (comment)
Reach out to 1point3acres crowdsourcing group to see if we can get access to daily county-level CSVs.
Some cases in the county-level dataset we'll be receiving will be listed as "unassigned." These are usually reassigned to the correct county within a day or two, but some states seem better about this than others. Something to think about and discuss: what will our protocol be?
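One possible protocol, offered only as a starting point for discussion, is to keep "unassigned" rows out of the county map while still counting them in state totals. That way state numbers stay complete until the cases are reassigned. The sketch assumes rows are dicts with `state`, `county`, and `confirmed` fields, and that unassigned rows are labeled with the literal county name "Unassigned". Both the field names and the label are assumptions, and the label may vary by source.

```python
from collections import defaultdict


def split_unassigned(rows: list[dict]) -> tuple[list[dict], dict[str, int]]:
    """Separate mappable county rows from 'unassigned' rows (summed per state)."""
    county_rows: list[dict] = []
    unassigned: dict[str, int] = defaultdict(int)
    for row in rows:
        # Assumed label; other sources may use a different marker.
        if row["county"].strip().lower() == "unassigned":
            unassigned[row["state"]] += row["confirmed"]
        else:
            county_rows.append(row)
    return county_rows, dict(unassigned)


def state_totals(rows: list[dict]) -> dict[str, int]:
    """State totals over ALL rows, so unassigned cases are still counted."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["state"]] += row["confirmed"]
    return dict(totals)
```

Tracking the per-state unassigned count separately would also show which states tend to leave cases unassigned for longer.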
Include the neighbors of cluster cores in the map app. Explaining what a cluster core is may confuse people, whereas visualizing the whole cluster is easier to understand.
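In LISA-style cluster maps, only the core locations have a significant local statistic; the cluster as a whole is the core plus its spatial neighbors. Below is a minimal sketch of expanding cores into full clusters. It assumes the cores come from an existing local cluster analysis, and that neighbors are taken from the same spatial weights and stored as a dict mapping a county ID to a set of neighbor IDs. This data structure is hypothetical.

```python
def expand_cluster(cores: set[str], neighbors: dict[str, set[str]]) -> set[str]:
    """Return the cluster cores together with every neighbor of a core."""
    members = set(cores)
    for core in cores:
        members |= neighbors.get(core, set())  # cores without neighbors add nothing
    return members
```

This should be run separately for each cluster type (e.g., high-high and low-low). A county that neighbors cores of two different types would need a tie-breaking rule before it is colored on the map.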
Calling volunteers! We need help updating a document that identifies how, and at what scale, each state health department records confirmed cases, testing (i.e., negative results), and deaths. This will be essential for confirming county-level cases daily and for identifying what can be automated vs. what will require human editors.
https://docs.google.com/spreadsheets/d/1b3ElJC8AnwnYfBupBmoJEZJ8D53YfF719R2AL4ierb0/edit?usp=sharing
Multiple options for daily visualizations:
Things to consider:
Who can help crawl this website?
How states respond with quarantine measures over time may be of interest, both as a visualization and for future research. We will collect and visualize statewide data on quarantine policies in the US, updated and validated weekly.
We need to identify someone interested in starting this collection in a Google spreadsheet. You can use the appropriate state name/ID from the states.geojson file in the data folder of this repo to make merging easier. Columns would indicate dates, and we'll need to define categories/codes for different types of policies. Please use this thread to discuss coding strategies for collaborative approval.
Originally posted by @linqinyu in #6 (comment)
Maintain a user session for the map so that it doesn't re-initialize and reset every time a new selection is made. For example, if a user zooms into a region, using the temporal slider or clicking a different variable forces them back out to the whole US. We could then offer a reset button as an option, rather than making reset the default for every action. I think this may be the "buggy" experience some have noted.
I have some sample code for that here: https://github.com/Makosak/chihealthaccess/blob/master/js/maps_lib.js but it would need to be retooled, and I'm a bit rusty.
The data are shared by the Berkeley group:
New York Times data will be updated here:
https://github.com/Yu-Group/covid19-severity-prediction/blob/master/data/nytimes
Add a mobility index dataset for counties (a proxy for social distancing; something > nothing), using Descartes Labs' mobility index data. Be sure to call this "Mobility Index" and credit Descartes Labs (link back to their GitHub).
https://github.com/descarteslabs/DL-COVID-19
UI consideration: Visualize this as choropleth and clustering as default? Or is there a better way?
Also, we will likely get at least one more county-level social distancing proxy dataset, so how should that be presented?
Note: this could be split into 2 tasks (data merge, JS viz), depends on who claims it!
Including these as feature requests/enhancements. Xun, feel free to take them all on or divvy them up.
On a 13-inch laptop screen, the right panel is too long and causes double scroll bars. A max-height of 96% does not account for the margins (24px on all sides in this case). One option to fix: use a max-height of calc(96% - 24px), or subtract more if you also want to account for the bottom margin. Calc is very well supported except for some known issues in IE. https://caniuse.com/#search=calc()
Colors on the map and the legend don't match very well, which makes comparison hard, particularly on the choropleth version. This may be a function of opacity. Some options: add an opacity toggle to switch between full and reduced map opacity (if you want to see what's behind the map), or add a basemap-colored layer underneath the legend and make the legend less opaque so it matches the map.
The y-axis text of the all-cases chart is cut off when there is a scroll bar on the right panel (and is very close to the edge otherwise). Recommend using a narrower SVG for this chart; making it responsive would also help. Also note that the padding on the right panel container pushes the SVGs to the right. You may want to make the SVGs no wider than the interior width of the panel (currently 344px).
Unnecessary bottom scroll bar appears on my laptop screen. I suspect this is from the right-margin on the right control panel, though the over-long SVG might also be problematic here. Option to fix: set the margin-right to 0 and the right property to 24px. This will absolutely position it 24px from the edge if that is the desired outcome.
Recommend adding medium-grey lines across the charts (based on y-axis ticks) to make them easier to get values from. Tooltips on the bar chart in particular would help.
Recommend some indication that the Data menu is actually a dropdown. It looks just like the buttons right now, and I didn't know what it was until I clicked on it.
Recommend clipping the Great Lakes out of the state geojson. It looks slightly unusual to keep them in.
Hello everyone!
I've been diving into the code this week to get a better feel for how data flows from 1P3A and USAFacts all the way to the front end. This is really fantastic work and I think will make a great foundation as we keep iterating on the back end!
I was hoping to kick off a conversation about how we might translate the hourly_update.py script into something that could take on the character of an automated data pipeline. I had a few ideas I wanted to share, and I look forward to your input as well @jkoschinsky @Makosak @lixun910 @linqinyu
Broadly speaking, I think it would be helpful to have two types of Python scripts we use to pipe data into the app: "fetchers" and "transforms".
A fetcher would be a simple script that runs on a schedule and dumps data as-is from one of our sources into file storage (we can think of this as an archive). For instance, there could be a fetcher for 1P3A cases that runs on the hour and drops a file called 1P3A_cases_<timestamp>.csv into an Amazon S3 bucket. That bucket would be solely for keeping snapshots of the source data in their original form.
When the fetcher is done, it would kick off a transform script that takes the source data, does whatever's needed to get it ready for the Atlas (e.g. aggregates cases by county/state), and puts the result into a separate S3 bucket that's just for derived data products.
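The fetcher/transform split described above can be sketched locally. This is a minimal, hypothetical sketch, not an adaptation of hourly_update.py. Local directories stand in for the two S3 buckets (raw snapshots vs. derived products). The input text stands in for a download from 1P3A, and the CSV columns `state`, `county`, and `confirmed` are assumed for illustration.

```python
import csv
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def fetch(raw_csv_text: str, raw_dir: Path, source: str = "1P3A_cases") -> Path:
    """Archive the source data unchanged as <source>_<timestamp>.csv."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{source}_{stamp}.csv"
    path.write_text(raw_csv_text)
    return path


def transform(raw_path: Path, derived_dir: Path) -> Path:
    """Aggregate confirmed cases by state and write a derived CSV."""
    totals: dict[str, int] = defaultdict(int)
    with raw_path.open(newline="") as f:
        for row in csv.DictReader(f):
            totals[row["state"]] += int(row["confirmed"])
    derived_dir.mkdir(parents=True, exist_ok=True)
    stamp = raw_path.stem.rsplit("_", 1)[-1]  # reuse the fetcher's timestamp
    out = derived_dir / f"state_cases_{stamp}.csv"
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["state", "confirmed"])
        for state in sorted(totals):
            writer.writerow([state, totals[state]])
    return out


def run_once(raw_csv_text: str, root: Path) -> Path:
    """One scheduled run: the fetcher archives the data, then kicks off the transform."""
    raw_path = fetch(raw_csv_text, root / "raw")
    return transform(raw_path, root / "derived")
```

Because the raw snapshots are never modified, a bug in the transform could be fixed and the transform rerun over past snapshots without refetching anything.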
In some ways this is quite similar to what's already happening (I see hourly_update.py has distinct steps for fetching and transforming, so I don't think that code would have to change much at all!). But I did want to point out a few potential benefits of packaging this process for the cloud, as well as externalizing data from the Git repo:
A few quick notes on error handling:
The unmatched.txt file that logs counties without matches would stay the same. It would just live in S3, and we could still review it manually and add exceptions to the code where they're needed.
As far as implementing this, it shouldn't be hard to adapt hourly_update.py into two AWS Lambda functions (one fetcher, one transform). We can schedule the fetcher to run on the hour with CloudWatch Events. On the front end, the app would just pull the CSV/JSON files it needs from S3 rather than GitHub.
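Below is a minimal sketch of the unmatched-county log combined with hand-maintained exceptions, the review loop described for unmatched.txt. The `EXCEPTIONS` entry is a hypothetical spelling variant. The (state, county) key shape and the function name are assumptions, not the existing script's structure. In the cloud version, the log would be written to S3 instead of a local path.

```python
from pathlib import Path

# Manual fixes added after reviewing unmatched.txt: source key -> shapefile key.
# Hypothetical example entry (a spelling variant between sources).
EXCEPTIONS: dict[tuple[str, str], tuple[str, str]] = {
    ("Louisiana", "La Salle"): ("Louisiana", "LaSalle"),
}


def match_counties(
    source_keys: list[tuple[str, str]],
    shapefile_keys: list[tuple[str, str]],
    log_path: Path,
) -> dict[tuple[str, str], tuple[str, str]]:
    """Map source (state, county) keys to shapefile keys, applying EXCEPTIONS first.

    Anything still unmatched is written to log_path, one tab-separated line per key.
    """
    known = set(shapefile_keys)
    matches: dict[tuple[str, str], tuple[str, str]] = {}
    unmatched: list[tuple[str, str]] = []
    for key in source_keys:
        target = EXCEPTIONS.get(key, key)
        if target in known:
            matches[key] = target
        else:
            unmatched.append(key)
    log_path.write_text("".join(f"{state}\t{county}\n" for state, county in unmatched))
    return matches
```

Each new entry in the log becomes either a new exception or a known non-county row (such as "Unassigned") to handle separately.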
I think that about sums it up—I hope I explained this well enough but please let me know if I can expand on anything! And of course, I don't want to overcomplicate a process that's been working very well, but it sounds like the Atlas may be reaching a scale of data where going to the cloud could have some tangible benefits.
Looking forward to hearing everyone's thoughts on next steps for the back end and continuing the conversation!
Generate a report for regional clusters. For each regional cluster, the current plan is to include the following information: a list of counties, # of counties, total population, total confirmed cases, total death counts, the fatality rate, confirmed cases per 1M, death counts per 1M, and a figure for this particular cluster.
The current plan is to update this report every three days (about twice a week); this may change later. The movie/GIF will also be updated along with this report.
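Below is a minimal sketch of the per-cluster summary fields listed above (the figure is left out). It assumes each county record carries a name, population, confirmed cases, and deaths. It also assumes that "fatality rate" means deaths divided by confirmed cases. Both assumptions should be checked against the intended definitions. Rates are computed from the cluster totals rather than by averaging county rates, so that large and small counties are weighted by population.

```python
from dataclasses import dataclass


@dataclass
class County:
    name: str
    population: int
    confirmed: int
    deaths: int


def cluster_report(counties: list[County]) -> dict:
    """Summary statistics for one regional cluster."""
    population = sum(c.population for c in counties)
    confirmed = sum(c.confirmed for c in counties)
    deaths = sum(c.deaths for c in counties)
    return {
        "counties": [c.name for c in counties],
        "n_counties": len(counties),
        "population": population,
        "confirmed": confirmed,
        "deaths": deaths,
        # Assumption: fatality rate = deaths / confirmed cases.
        "fatality_rate": deaths / confirmed if confirmed else 0.0,
        "confirmed_per_1m": confirmed * 1_000_000 / population if population else 0.0,
        "deaths_per_1m": deaths * 1_000_000 / population if population else 0.0,
    }
```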
Suggestions and comments are always welcome!
We need a volunteer lead to start updating the county case file we have from 1P3A and help determine a more efficient protocol/workflow process. Things to consider:
Once we have a protocol in place we will be able to better take advantage of volunteers and potential RAs to help update this on a regular (daily) basis
UI suggestion: Adding a logarithmic scale option for cases would be helpful in identifying growth trends.
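A log scale helps because constant exponential growth plots as a straight line: each doubling adds the same distance (log10 2) on the axis. Any bend in the line therefore signals a change in the growth rate. Below is a minimal sketch of the transform and a related doubling-time calculation. The function names are hypothetical. Zero counts cannot be placed on a log axis, so they are returned as None here.

```python
import math
from typing import Optional


def log10_series(cases: list[int]) -> list[Optional[float]]:
    """Values as they would sit on a log10 axis; zero has no logarithm, so it becomes None."""
    return [math.log10(c) if c > 0 else None for c in cases]


def doubling_time(c_start: int, c_end: int, days: float) -> float:
    """Days needed for cases to double, assuming constant exponential growth between two dates."""
    if c_start <= 0 or c_end <= c_start:
        raise ValueError("need 0 < c_start < c_end")
    return days * math.log(2) / math.log(c_end / c_start)
```

For example, going from 100 to 800 cases in 9 days is three doublings, so `doubling_time(100, 800, 9)` is about 3 days.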
Tagging Xun and Pedro: Pedro's code is almost ready to go and needs to be integrated. It would update daily based on new data.
Later we'll need to identify an easy way for groups to add their models in the format we need, etc.
Add COVIDCareMap hospitals as a point layer that can be toggled on/off, to help the hospital and planner crowd with planning: https://www.covidcaremap.org/maps/us-healthcare-system-capacity/#3.85/38.63/-93.09
We're already using this as data for county levels, but our healthcare "customers" noted that being able to explore hospitals as points with data attached would be super.
add README info, including:
anything else?
Requested by Indian Health Service
Based on stakeholder feedback, so users can access the About, Methods, and Blog pages from the map.
Suggesting a few potential enhancements to the map that I believe would increase the clarity of the message:
Happy to help if I can on this, just point me in the right direction!
On the countyV branch, county_validation.py fails if the data/validation/raw directory doesn't exist. I added a file, data/validation/raw/.placeholder (and modified .gitignore), to my fork. I started to submit a pull request but quit when I saw that 45 files would be in the commit, even though I thought only those two files had been touched.
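An alternative to committing a placeholder file is to have the script create the directory before it writes anything there. Below is a minimal sketch using the standard library's `os.makedirs` with `exist_ok=True`. It assumes the script writes under data/validation/raw relative to the working directory. The helper name `ensure_dir` is hypothetical.

```python
import os

# Assumed location, taken from the error described above.
RAW_DIR = os.path.join("data", "validation", "raw")


def ensure_dir(path: str = RAW_DIR) -> str:
    """Create path and any missing parents; does nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling `ensure_dir()` once near the top of county_validation.py, before the first write, would make both the placeholder file and the .gitignore change unnecessary.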
I didn't realize there were so many volunteers here. Anyone who knows d3.js is welcome to improve the d3.js charts (the line chart and the bar chart), e.g., by adding a mouseover function or a navigation line to the line chart.
The code that creates the line chart is in index.js, in the functions addTrendLine() and updateTrendLine().
The code that creates the bar chart is in index.js, in the function createTimeSlider().
Hi everyone who volunteers on this project:
We are acknowledging all the volunteers on
https://spatial.uchicago.edu/content/us-covid-19-atlas
via this page: https://spatial.uchicago.edu/content/volunteers-csds-covid-10-atlas
I'm missing the title, affiliation, and URL for the following volunteers; please add them to this ticket if you can:
Sihan Mao
John Steill
Steve Goldstein
Sean Kent
Steven R Wangen
Yuetian Luo
Brian Yangell
Will then update the page.
Thx!
The "Show Time Label" option displays very large labels that show only the year "2020". I don't know whether this is intentional (the data from the 1P3A API might still be lagged) or a bug. This has been a consistent issue in the browsers I use (Safari and Opera on Mac; Opera on Windows).
Two successive rough questions:
As is, the rate is expressed as a count per million population; I believe the Johns Hopkins figures are per 10,000. For smaller counties, a million is probably too large a denominator to be meaningful. Should we consider using 10,000?
Does anyone have the latest population data for states and counties?
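Here is a small worked example for the denominator question. The function name is hypothetical. Changing the multiplier rescales every county by the same factor (100 between per-1M and per-10k). County rankings and quantile-based map classes therefore stay the same, and the choice mainly affects how readable the numbers are.

```python
def rate(count: int, population: int, per: int) -> float:
    """Count per `per` residents, e.g. per=1_000_000 or per=10_000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population
```

For a hypothetical county of 20,000 residents with 3 cases, `rate(3, 20_000, 1_000_000)` returns 150.0 and `rate(3, 20_000, 10_000)` returns 1.5.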
The CSV file we're getting likely just has state name and county name. We will need to merge this with the GIS shapefiles. Note that county names are not unique, but state + county name pairs are. We need to identify someone to take on this merging issue, as we'll likely be merging daily.
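Below is a minimal sketch of a composite (state, county) merge key, plus a check that shows why county name alone cannot serve as the key. The normalization rules (lowercasing, collapsing whitespace, dropping a trailing " County") are illustrative assumptions. Real sources will need more rules, for example for parishes, boroughs, and independent cities.

```python
from collections import Counter


def county_key(state: str, county: str) -> tuple[str, str]:
    """(state, county) merge key with case, extra whitespace, and a ' County' suffix normalized."""
    state_norm = " ".join(state.split()).lower()
    county_norm = " ".join(county.split()).lower()
    if county_norm.endswith(" county"):
        county_norm = county_norm[: -len(" county")]
    return state_norm, county_norm


def repeated_county_names(keys: list[tuple[str, str]]) -> set[str]:
    """County names that occur in more than one state, so they cannot be a key on their own."""
    counts = Counter(county for _, county in keys)
    return {name for name, n in counts.items() if n > 1}
```

Before each daily merge, `len(set(keys)) == len(keys)` is a cheap check that the composite keys really are unique on both sides.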
Add data from the County Health Rankings group with socioeconomic variables