geodacenter / covid

COVID Atlas alpha code

Home Page: https://geodacenter.github.io/covid/

License: GNU General Public License v3.0

Python 11.92% R 0.87% Dockerfile 0.05% Shell 0.13% HTML 18.46% Jupyter Notebook 15.72% JavaScript 49.38% CSS 3.48%

covid's People

Contributors

stevegoldstein sihan-mao lin-k randomfractals lixun910 makosak qinyun-lin xukunxiang andres-castiblanco ryanwyg steph-yang mnsr002 acrucetta kenna-camper nermin-ghith jinfei1125 cuulee stuartlynn svijay77

covid's Issues

add differential lisa for day-to-day change of cases

Prepare (Secondary) USAFacts Data Stream

Following findings from county validation team, need to switch data sources with fewer merging issues while retaining accuracy and CDC standards. Chats with @linqinyu and @SteveGoldstein coalesced in sanctioning this move to USAFacts, at least until we flip to a validated multi-source dataset down the road. Validation efforts will continue.

Another idea down the road -- include a drop down of data source so we could include multiple that way.

For now -- switch to USAFacts with the easy FIPS merge? @lixun910 interested in this one?

Add map labels above data visualization layer

From EJ: Is there functionality to add the administrative labels above the visualization layer? It gets hard to read and given that you want this oriented toward a national audience they’re not going to know the local names without clicking and it would be better to put above the viz.

add press tab to website

https://www.vox.com/2020/3/28/21197421/usa-coronavirus-covid-19-rural-america
https://news.uchicago.edu/story/state-level-data-misses-growing-coronavirus-hot-spots-us-including-south

also: NPR and maybe Wall Street Journal, Scientific American

bubble chart option

Choropleths are great, but they make it hard to see small counties, e.g., Cook, Manhattan, San Francisco...
A bubble map, as in
https://blog.mapbox.com/notable-maps-visualizing-covid-19-and-surrounding-impacts-951724cc4bd8
would be a good additional option.

add averages for LISAs (see email thread)

Join Limited Engagement Index from Booth Team

another social distancing proxy

space out dates in histogram x-axis more, so they're more legible

Update viz based on stakeholder feedback

Swap left and right panels
Minimize both panels
Light map?
Less information in the tooltip
Full length panels

Add Search bar

Add search bar so users can enter address or place and have the map zoom accordingly.

Update Zoom Feature

Needs a visible zoom in/out button, otherwise can only zoom in on my laptop (by double-clicking). I have to reload the page to zoom out. [high priority for usability!]

Marked as a priority from health user testing group, as some are having difficulty zooming both in raw atlas, and within the landing page.

Testing locations

We need someone to start reviewing the spreadsheet linked here and create a pilot testing location dataset that Xun could add as a layer to the map. We need to:

copy this info over to new spreadsheet. We could use the "IMPORTRANGE" function in Google Spreadsheets to update automatically.
generate new column -- address formatted for the RCC Geocoder at UChicago
geocode the data using RCC geocoder
think about process for updating this daily
add updated file to data folder to close issue

Qinyun's reference:

Link for the spreadsheet that includes all testing places (the point (1) I mentioned): https://docs.google.com/spreadsheets/d/1svnaZ2UG_ryFr8jjqVx7ZVZksBue4EQUJ4dolMDJx70/edit#gid=0.

Originally posted by @linqinyu in #3 (comment)

Secure county-level data

Reach out to 1point3acres crowdsourcing group to see if we can get access to daily county-level CSVs.

For Discussion: Unassigned county cases

Several cases in the county-level dataset we'll be getting will be listed as "unassigned." These are usually updated within a day or two to the right county, but it seems that some states are better about this than others. Something to think about and discuss. What will our protocol be?

include neighbors of cluster cores in map app

include neighbors of cluster cores in map app since explaining to people what a cluster core is might be confusing vs just visualizing the whole cluster.

Need Volunteers: State Health Department COVID Case Overview

Calling volunteers! We need help updating a document to identify how, and at what scale, each state health department is recording confirmed cases, testing (i.e., negative cases), and deaths. This will be essential to help confirm county cases on a daily level, and identify what will be automated vs. what will require human editors.

https://docs.google.com/spreadsheets/d/1b3ElJC8AnwnYfBupBmoJEZJ8D53YfF719R2AL4ierb0/edit?usp=sharing

For Discussion: Visualization Protocol

Multiple options for daily visualizations:

Mapbox
Rshiny
Leaflet/JS
Py + vis interface

Things to consider:

Should have county-level visualizations enabled, fixed at most recent update
Pop-ups, interaction (click to get more info)
Temporal slider (maybe more important than animation?)
Interaction/linking/brushing? May be nice to have but potentially not priority

data: testing numbers, hospital beds by county

Who can help to crawl this website?

https://covid-19.direct/county/AZ/Maricopa

quarantine policy database generation

How states react with quarantine over time may be of interest as both a visualization, and for future research. We will collect and visualize state-wide data on quarantine policies in the US. This would be updated and validated on a weekly basis.

Need to ID someone who is interested in starting this collection in a Google spreadsheet. You can use the appropriate state name/id from the states.geojson file in the data folder of this repo to make merging easier. Columns would indicate time/dates and we'll need to identify different categories/coding for different types of policies. Please use the thread here for discussion of coding strategies for collaborative approval.

Originally posted by @linqinyu in #6 (comment)

User Session for Map w/ Reset as Optional, not Default

Generate a user session for the map, so that the map doesn't re-initialize and reset every time a new selection is made. For example, if a user zooms into a region, using the temporal slider or clicking a different variable forces them out to the whole US again. Then we could use a button to reset as an option, vs making it a default for any task. I think this may be the "buggy" experience some have been noting.

Have some sample code for that here: https://github.com/Makosak/chihealthaccess/blob/master/js/maps_lib.js but would need to be re-tooled, and I'm a bit rusty.

Add New York Times data & Predicted death at the county level

The data are shared by the Berkeley group:
New York Times data will be updated here:
https://github.com/Yu-Group/covid19-severity-prediction/blob/master/data/nytimes.

Join Descartes Labs Mobility Index data for visualization

Add mobility index dataset for counties (proxy for social distancing, something > nothing), using Descartes Labs' Mobility Index data. Be sure to call this "Mobility Index" and cite Descartes Labs as the source (link back to their GitHub).

https://github.com/descarteslabs/DL-COVID-19

UI consideration: Visualize this as choropleth and clustering as default? Or is there a better way?
Also we will likely get at least 1 more social-distancing-proxy county-level dataset, so thinking about how that is presented?

Note: this could be split into 2 tasks (data merge, JS viz), depends on who claims it!

More Design Requests from Health User Testing Group

Including these as feature requests/enhancements. Xun feel free to take 'em all on or divvy up.

On a 13 inch laptop screen, the right panel is too long and causes doubled scroll-bars. A max-height of 96% does not consider the margins (24px on all sides in this case). One option to fix: use a max-height of calc(96% - 24px) (or greater if you want to include the bottom margin). Calc is very well supported except for some known issues in IE. https://caniuse.com/#search=calc()

Color on the map and on the legend don't match very well, making it hard to compare on the choropleth version in particular. This may be a function of opacity. Some options: add an opacity toggle to switch between full and lower map opacity (if you want to see what's behind the map); add basemap-colored layer underneath the legend and then make the legend less opaque so it matches the map.

The all-cases chart y-axis text is cut off when there is a scroll bar on the right panel (and pretty close to the edge otherwise). Recommend using a smaller-width SVG for this chart; making it responsive would also help. Also note that the padding on the right panel container is pushing the SVGs to the right. You may want to make the SVGs no wider than the interior width of the panel (currently set at 344px).

Unnecessary bottom scroll bar appears on my laptop screen. I suspect this is from the right-margin on the right control panel, though the over-long SVG might also be problematic here. Option to fix: set the margin-right to 0 and the right property to 24px. This will absolutely position it 24px from the edge if that is the desired outcome.

Recommend adding medium-grey lines across the charts (based on y-axis ticks) to make them easier to get values from. Tooltips on the bar chart in particular would help.

Recommend some indication that the Data menu is actually a dropdown. It looks just like the buttons right now, and I didn't know what it was until I clicked on it.

Recommend clipping the Great Lakes out of the state geojson. It looks slightly unusual to keep them in.

Add Berkeley forecasting data

Data is available here: https://docs.google.com/spreadsheets/d/1ZSG7o4cV-G0Zg3wlgJpB2Zvg-vEN1i_76n2I-djL0Dk/edit#gid=1341003284

Automated data flows

Hello everyone!

I've been diving into the code this week to get a better feel for how data flows from 1P3A and USAFacts all the way to the front end. This is really fantastic work and I think will make a great foundation as we keep iterating on the back end!

I was hoping to kick off a conversation around how we might be able to translate the hourly_update.py script into something that could take on the character of an automated data pipeline. I had a few ideas I wanted to share, and looking forward to getting your input as well @jkoschinsky @Makosak @lixun910 @linqinyu

Broadly speaking, I think it would be helpful to have two types of Python scripts we use to pipe data into the app: "fetchers" and "transforms".

A fetcher would be a simple script that runs on a schedule and dumps data as-is from one of our sources into file storage (we can think of this as an archive). For instance, there could be a fetcher for 1P3A cases that runs on the hour and drops a file called 1P3A_cases_<timestamp>.csv into an Amazon S3 bucket. That bucket would be solely for keeping snapshots of the source data in its original form.

When the fetcher is done, it would kick off a transform script that takes the source data, does whatever's needed to get it ready for the Atlas (e.g. aggregates cases by county/state), and puts the result into a separate S3 bucket that's just for derived data products.
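
A minimal sketch of the fetcher/transform split described above, using only the Python standard library. Local directories stand in for the two S3 buckets, and everything named here is an assumption for illustration rather than code from hourly_update.py: the functions `fetch_snapshot` and `transform_snapshot`, the timestamp format, the `county_` output prefix, and the column names `state`, `county`, and `cases`.

```python
import csv
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def fetch_snapshot(raw_csv_text, archive_dir, now=None):
    """Fetcher: store the source data unchanged under a timestamped name."""
    now = now or datetime.now(timezone.utc)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme following the 1P3A_cases_<timestamp>.csv idea.
    path = archive_dir / f"1P3A_cases_{now:%Y%m%dT%H%M%SZ}.csv"
    path.write_text(raw_csv_text, encoding="utf-8")
    return path


def transform_snapshot(snapshot_path, derived_dir):
    """Transform: aggregate case rows by (state, county) into a derived file."""
    totals = defaultdict(int)
    with open(snapshot_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[(row["state"], row["county"])] += int(row["cases"])

    derived_dir = Path(derived_dir)
    derived_dir.mkdir(parents=True, exist_ok=True)
    out_path = derived_dir / ("county_" + Path(snapshot_path).name)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["state", "county", "cases"])
        for (state, county), cases in sorted(totals.items()):
            writer.writerow([state, county, cases])
    return out_path
```

Keeping the archive write free of any parsing means a malformed source file is still captured as-is, and the transform can be re-run later against any stored snapshot.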

In some ways this is quite similar to what's already happening (I see hourly_update.py has distinct steps for fetching and transforming, so I don't think that code would have to change much at all!) But I did want to point out a few potential benefits to packaging this process for the cloud, as well as externalizing data from the Git repo:

The S3 bucket that archives our source data could essentially become a warehouse of COVID data by the hour. My assumption is this will be very powerful not just for real-time analysis, but also if any new research interests come up down the line. (It sounds like this may have been a goal of the project from the start, but let me know if I got that wrong or this feels out of scope!)
Storing data in S3 will allow us to capture data at a high temporal resolution and add new sources without having to worry about the size of the Git repo. Also, having the snapshots side-by-side in a file store (vs. tracked in the Git history) should make it easier to iterate over them during any sort of time-series analysis.
Not a super high priority, but it's also nice to be able to "set it and forget it" by automating the flow of data 😄

A few quick notes on error handling:

It shouldn't be hard to set up alerts so if anything goes wrong we would get an email, Slack message, etc.
The unmatched.txt file that logs counties without matches would stay the same—it would just live in S3, but we could still review it manually and add exceptions to the code where they're needed.

As far as implementing this, it shouldn't be hard to adapt hourly_update.py into two AWS Lambda functions (one fetcher, one transform). We can schedule the fetcher to run on the hour with CloudWatch Events. On the front end, the app would just pull the CSV/JSON files it needs from S3 rather than GitHub.

I think that about sums it up—I hope I explained this well enough but please let me know if I can expand on anything! And of course, I don't want to overcomplicate a process that's been working very well, but it sounds like the Atlas may be reaching a scale of data where going to the cloud could have some tangible benefits.

Looking forward to hearing everyone's thoughts on next steps for the back end and continuing the conversation!

Generate summary report twice every week

Generate a report for regional clusters. For each regional cluster, the current plan is to include the following information: a list of counties, # of counties, total population, total confirmed cases, total death counts, the fatality rate, confirmed cases per 1M, death counts per 1M, and a figure for this particular cluster.
The current plan is to update this report every 3 days, twice a week. May change later. The movie/gif will also be updated with this report.
Suggestions and comments are always welcome!
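
As a rough sketch of the per-cluster figures listed above, the function below sums county records and derives the rates. The record keys (`name`, `population`, `cases`, `deaths`) and the function name `summarize_cluster` are hypothetical, and "fatality rate" is assumed here to mean deaths divided by confirmed cases.

```python
def summarize_cluster(counties):
    """Summarize one regional cluster from per-county records.

    Each record is a dict with hypothetical keys:
    'name', 'population', 'cases', 'deaths'.
    """
    population = sum(c["population"] for c in counties)
    cases = sum(c["cases"] for c in counties)
    deaths = sum(c["deaths"] for c in counties)
    return {
        "counties": [c["name"] for c in counties],
        "n_counties": len(counties),
        "population": population,
        "cases": cases,
        "deaths": deaths,
        # Assumption: fatality rate = deaths / confirmed cases.
        "fatality_rate": deaths / cases if cases else None,
        "cases_per_1m": cases * 1_000_000 / population if population else None,
        "deaths_per_1m": deaths * 1_000_000 / population if population else None,
    }
```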

Daily county 1P3A update

We need a volunteer lead to start updating the county case file we have from 1P3A and help determine a more efficient protocol/workflow process. Things to consider:

API seems to be ~2 days behind, but we need to track changes more efficiently.
update with new daily numbers as of 9am CST (or some better time?)
check if total assigned county cases (+ unassigned) equal total confirmed state cases
other parameters to consider?
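
The totals check in the list above could be sketched as follows. The input shapes and the function name `check_state_totals` are assumptions for illustration; "Unassigned" rows are treated as ordinary county rows so they count toward the state sum.

```python
def check_state_totals(county_rows, state_totals):
    """Return states whose county sum (including unassigned) differs from the state total.

    county_rows: iterable of (state, county, cases) tuples.
    state_totals: dict mapping state -> confirmed state total.
    """
    summed = {}
    for state, _county, cases in county_rows:
        summed[state] = summed.get(state, 0) + cases

    mismatches = {}
    for state, expected in state_totals.items():
        actual = summed.get(state, 0)
        if actual != expected:
            mismatches[state] = {"county_sum": actual, "state_total": expected}
    return mismatches
```

An empty result means every state reconciles; anything else is a short list for a human editor to review.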

Once we have a protocol in place we will be able to better take advantage of volunteers and potential RAs to help update this on a regular (daily) basis

Log scale for cases stats graph

UI suggestion: Adding a logarithmic scale option for cases would be helpful in identifying growth trends.
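
To illustrate why a log scale helps here: a series that multiplies by a constant factor each step becomes evenly spaced after taking logarithms. This is a small sketch of the data transformation only, not of the chart code; `log10_series` is a hypothetical helper, and zero or negative counts are mapped to None because the logarithm is undefined for them.

```python
import math


def log10_series(counts):
    """Return log10 of each count; non-positive counts become None."""
    return [math.log10(c) if c > 0 else None for c in counts]
```

For example, counts of 10, 100, and 1000 map to roughly 1, 2, and 3, so steady tenfold growth plots as a straight line.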

Add baseline-forecast model for counties

Tagging Xun and Pedro; Pedro's code is almost ready to go, and needs to be integrated. Would update daily based on new data.

Later we'll need to identify an easy way for groups to add their models in the format we need, etc.

add CSDS logo and "powered by GeoDa" with 2 URLs to map app

2 URLs:
https://geodacenter.github.io/
https://spatial.uchicago.edu/

2 logo options:

Add hospital data as point layer

Add COVIDCareMap hospitals as a point layer that can be toggled on/off, to help the hospital + planner crowd with planning: https://www.covidcaremap.org/maps/us-healthcare-system-capacity/#3.85/38.63/-93.09

We're already using this as data for county levels, but our healthcare "customers" noted that being able to explore hospitals as points with data attached would be super.

Add README info

add README info, including:

the content of each folder (where to download data, etc)
our workflow, collaborators
decisions we made when merging in data

anything else?

Add Native American Reservation Boundary as an optional clickable layer/overlay

Requested by Indian Health Service

Grab county-level shapes for country

since county is focus, make county default map and switch to left and put state on right?

Add navbar above map

Based on stakeholder feedback, so users can access about, methods, blog pages from map.

Add label (hover) to explain LISA legend

Suggesting a few potential enhancements to the map that I believe would increase the clarity of the message:

Outside users of this visualization may not understand "high-high", "high-low", "low-low" and "low-high" labels at first. I found the meaning under methods, but maybe an image/legend can be included under "Covid clusters" to further explain what they mean
[Opinion] the County, confirmed cases per 1M is a really high value add visual, and it might benefit from being the default value for the map.
Round cases per 1M population in the legend (seems to be doing this already in the hover) to make it more readable

Happy to help if I can on this, just point me in the right direction!

set start date for new cases (both charts) to late Feb to focus more on the increase?

data/validation/raw dir needs to exist

On the countyV branch, county_validation.py fails if the data/validation/raw directory doesn't exist. I added a file, data/validation/raw/.placeholder (as well as modifying .gitignore) to my fork. I started to submit a pull request but quit when I saw that 45 files would be in the commit, even though I thought only those two files had been touched.
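
An alternative to committing a placeholder file would be to create the directory at runtime before writing to it. A minimal sketch, assuming the script writes relative to the repo root; `ensure_raw_dir` is a hypothetical helper, and how county_validation.py actually builds its paths is not shown here.

```python
from pathlib import Path


def ensure_raw_dir(base="data/validation/raw"):
    """Create the raw-data directory (and any parents) if missing.

    exist_ok=True makes this a no-op when the directory is already there.
    """
    path = Path(base)
    path.mkdir(parents=True, exist_ok=True)
    return path
```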

d3.js helper

I didn't realize there were so many volunteers here. Anyone who knows d3.js is welcome to improve the charts made with d3.js: the line chart and the bar chart. E.g., add a mouse-over function to the line chart, or add a navigation line to the line chart.

The code to create the line chart is in index.js, in the functions:
addTrendLine() and updateTrendLine()

The code to create the bar chart is in index.js in function:
createTimeSlider()

Redesign landing page

LISA map: ring of coldspots around border

(based on conversation with Jamie):

FYI: On March 5, there's a ring of coldspots around the US border. This is probably an artifact of the weights?

Volunteers: please add title, affiliation and URL

Hi everyone who volunteers on this project:

We are acknowledging all the volunteers here:
https://spatial.uchicago.edu/content/us-covid-19-atlas
with this page: https://spatial.uchicago.edu/content/volunteers-csds-covid-10-atlas

I'm missing title, affiliation and URL for these volunteers if you can please add to this ticket:
Sihan Mao
John Steill
Steve Goldstein
Sean Kent
Steven R Wangen
Yuetian Luo
Brian Yangell

Will then update the page.
Thx!

Time Label

The "Show Time Label" option returns very big labels where only the year "2020" is available. I do not know if this is intentional as the data from 1p3a API might still be lagged or if this is a bug? This has been a consistent issue with the browsers I have (Mac, Safari and Opera, and Windows, Opera.)

Empirical Networks

Two successive rough questions:

Can we show that empirical mobility patterns are better predictors of transmission than simple spatial proximity?
Can we measure the response on the empirical network, to understand how that affects transmission --
a. How has the network structure changed (presumably, a LOT).
b. Do current (post-quarantine) networks predict loads better than status-quo networks?
c. Is the observed quarantining linked to reduced transmission in simple models?
d. If certain communities have quarantined more-effectively than others, does this affect their case load?

Negative counts in tooltip

For Weld County, CO (just north of Denver) on 4/26 there was a negative new deaths count:

Fairly certain this is just a glitch in the data but we might want to consider defaulting these values to something like "Error", just to prevent any confusion.
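
One way to implement that default, sketched in Python for illustration (the tooltip itself lives in the JavaScript front end). Daily counts are derived from a cumulative series, and any negative difference is replaced with None so the display layer can show a placeholder. Both function names are hypothetical, and the first day's cumulative total is treated as that day's new count.

```python
def daily_new_counts(cumulative):
    """Convert a cumulative series to daily new counts.

    A negative difference (the cumulative total went down, usually a
    source correction) is returned as None instead of a negative number.
    """
    new_counts = []
    previous = 0
    for total in cumulative:
        diff = total - previous
        new_counts.append(diff if diff >= 0 else None)
        previous = total
    return new_counts


def tooltip_text(value):
    """Format a daily count for display; None becomes the 'Error' label."""
    return "Error" if value is None else str(value)
```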

rate units

As is, the rate is expressed as a count per million population; I believe the Johns Hopkins figures are per 10,000. For smaller counties, a million is probably too large a denominator to be meaningful. Should we consider using 10,000?
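
To make the comparison concrete, a small helper with a configurable denominator (the name `rate_per` and its defaults are assumptions for illustration):

```python
def rate_per(count, population, per=10_000):
    """Return count per `per` residents, or None if population is missing or zero."""
    if not population:
        return None
    return count * per / population
```

For a hypothetical county of 5,000 people with 3 cases, `rate_per(3, 5000, per=1_000_000)` gives 600.0 while `rate_per(3, 5000)` gives 6.0; the second is closer to the scale of the actual count.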

population data

Does anyone have the latest population data for states and counties?

CSV-county merging

The CSV file we're getting likely just has state name and county name. We will need to merge this with the GIS shapefiles. Note that county names are not unique, but state + county names are unique. Need to identify someone to take this merging issue on, as we'll likely be merging on a daily basis.
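
A minimal sketch of a merge keyed on the (state, county) pair, using only the standard library. The lookup table would in practice come from the shapefile attributes; here it is a plain dict. The function names, the row keys, and the normalization rule (lowercasing and dropping a trailing " county") are assumptions for illustration.

```python
def normalize(name):
    """Lowercase and trim a name, dropping a trailing ' county' suffix."""
    name = name.strip().lower()
    if name.endswith(" county"):
        name = name[: -len(" county")]
    return name


def merge_cases_with_fips(case_rows, fips_lookup):
    """Attach a FIPS code to each case row by (state, county) name.

    case_rows: iterable of dicts with 'state', 'county', 'cases'.
    fips_lookup: dict mapping (state, county) name pairs to FIPS strings.
    Returns (matched_rows, unmatched_keys).
    """
    lookup = {
        (normalize(s), normalize(c)): fips for (s, c), fips in fips_lookup.items()
    }
    matched, unmatched = [], []
    for row in case_rows:
        key = (normalize(row["state"]), normalize(row["county"]))
        if key in lookup:
            matched.append({**row, "fips": lookup[key]})
        else:
            # Collected for manual review rather than silently dropped.
            unmatched.append((row["state"], row["county"]))
    return matched, unmatched
```

Because the key includes the state, two counties that share a name (e.g. Orange County in California and in Florida) resolve to different codes.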

Add CHR socioeconomic data

Add data from the County Health Rankings group with socioeconomic variables

https://trello-attachments.s3.amazonaws.com/5e993b9fe94b2f74c49ab6a3/5ea3399652fa1330e24b51d0/ed0f974ca7afd4151a4173eb745c02e3/CHR_Cleaned.csv