
covid's People

Contributors

acrucetta, actions-user, ariisrael, bertday, dependabot[bot], erinabbott5, jinfei1125, kenna-camper, lchen1733, lixun910, makosak, menghamo, nofurtherinformation, qinyun-lin, ryanwyg, sihan-mao, spaykin, steph-yang, svijay77, theuscovidatlas, vidal-anguiano


covid's Issues

Prepare (Secondary) USAFacts Data Stream

Following findings from the county validation team, we need to switch to a data source with fewer merging issues while retaining accuracy and CDC standards. Conversations with @linqinyu and @SteveGoldstein converged on approving a move to USAFacts, at least until we switch to a validated multi-source dataset down the road. Validation efforts will continue.

Another idea for down the road: add a data-source dropdown so the map can offer multiple sources.

For now -- switch to USAFacts with the easy FIPS merge? @lixun910 interested in this one?
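
A minimal sketch of what the FIPS merge could look like, assuming the USAFacts CSV carries a numeric county FIPS column (called countyFIPS here, which is an assumption) and that the county geometries are keyed by a five-character GEOID string. The sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample in the shape of a USAFacts-style CSV; the column names
# and values are assumptions for illustration, not real data.
SAMPLE_CSV = """countyFIPS,County Name,State,cases
1001,Autauga County,AL,12
17031,Cook County,IL,340
0,Statewide Unallocated,IL,5
"""


def normalize_fips(raw):
    """Left-pad a county FIPS code to five digits (e.g. '1001' -> '01001')."""
    return str(int(raw)).zfill(5)


def merge_cases(csv_text, geoids):
    """Return ({geoid: cases}, [unmatched rows]) for the given county GEOIDs."""
    matched, unmatched = {}, []
    for row in csv.DictReader(io.StringIO(csv_text)):
        fips = normalize_fips(row["countyFIPS"])
        if fips in geoids:
            matched[fips] = int(row["cases"])
        else:
            # Rows such as statewide/unallocated counts have no county polygon.
            unmatched.append(row)
    return matched, unmatched


matched, unmatched = merge_cases(SAMPLE_CSV, {"01001", "17031"})
```

Zero-padding matters because a spreadsheet or CSV reader will often drop the leading zero from codes such as 01001, and the join then silently misses those counties.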

Add map labels above data visualization layer

From EJ: Is there functionality to add the administrative labels above the visualization layer? It gets hard to read and given that you want this oriented toward a national audience they’re not going to know the local names without clicking and it would be better to put above the viz.

Add Search bar

Add search bar so users can enter address or place and have the map zoom accordingly.

Update Zoom Feature

Needs a visible zoom in/out button, otherwise can only zoom in on my laptop (by double-clicking). I have to reload the page to zoom out. [high priority for usability!]

Marked as a priority from health user testing group, as some are having difficulty zooming both in raw atlas, and within the landing page.

Testing locations

We need someone to start reviewing the spreadsheet linked here and create a pilot testing location dataset that Xun could add as a layer to the map. We need to:

  • copy this info over to a new spreadsheet. We could use the "IMPORTRANGE" function in Google Sheets to update it automatically.
  • generate a new column: an address formatted for the RCC Geocoder at UChicago
  • geocode the data using RCC geocoder
  • think about process for updating this daily
  • add updated file to data folder to close issue
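
The address-column step might be sketched as follows. The input format the RCC geocoder expects is not stated in this thread, so the single-line "street, city, state zip" layout and the column names (street, city, state, zip) are assumptions to adjust once the real spreadsheet and geocoder requirements are confirmed.

```python
def format_address(street, city, state, zip_code=""):
    """Join address parts into one line, skipping blanks and extra spaces."""
    parts = [" ".join(p.split()) for p in (street, city) if p and p.strip()]
    tail = " ".join(x.strip() for x in (state, zip_code) if x and x.strip())
    if tail:
        parts.append(tail)
    return ", ".join(parts)


# Made-up rows standing in for the testing-location spreadsheet.
rows = [
    {"street": "123  Main St ", "city": "Springfield", "state": "IL", "zip": "62701"},
    {"street": "", "city": "Chicago", "state": "IL", "zip": ""},
]
for row in rows:
    row["address"] = format_address(
        row["street"], row["city"], row["state"], row["zip"]
    )
```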

Qinyun's reference:

Link for the spreadsheet that includes all testing places (the point (1) I mentioned): https://docs.google.com/spreadsheets/d/1svnaZ2UG_ryFr8jjqVx7ZVZksBue4EQUJ4dolMDJx70/edit#gid=0.

Originally posted by @linqinyu in #3 (comment)

Secure county-level data

Reach out to 1point3acres crowdsourcing group to see if we can get access to daily county-level CSVs.

For Discussion: Unassigned county cases

Several cases in the county-level dataset we'll be getting will be listed as "unassigned." These are usually updated within a day or two to the right county, but it seems that some states are better about this than others. Something to think about and discuss. What will our protocol be?

For Discussion: Visualization Protocol

Multiple options for daily visualizations:

  • Mapbox
  • Rshiny
  • Leaflet/JS
  • Py + vis interface

Things to consider:

  • Should have county-level visualizations enabled, fixed at most recent update
  • Pop-ups, interaction (click to get more info)
  • Temporal slider (maybe more important than animation?)
  • Interaction/linking/brushing? May be nice to have but potentially not priority

quarantine policy database generation

How states react with quarantine over time may be of interest as both a visualization, and for future research. We will collect and visualize state-wide data on quarantine policies in the US. This would be updated and validated on a weekly basis.

We need to identify someone interested in starting this collection in a Google spreadsheet. Use the appropriate state name/ID from the states.geojson file in the data folder of this repo to make merging easier. Columns would indicate times/dates, and we'll need to define categories/codes for the different types of policies. Please use this thread to discuss coding strategies so they can be approved collaboratively.

Originally posted by @linqinyu in #6 (comment)

User Session for Map w/ Reset as Optional, not Default

Generate a user session for the map, so that the map doesn't re-initialize and reset every time a new selection is made. For example, if a user zooms into a region, using the temporal slider or clicking a different variable forces them out to the whole US again. Then we could use a button to reset as an option, vs making it a default for any task. I think this may be the "buggy" experience some have been noting.

Have some sample code for that here: https://github.com/Makosak/chihealthaccess/blob/master/js/maps_lib.js but would need to be re-tooled, and I'm a bit rusty.

Join Descartes Labs Mobility Index data for visualization

Add a mobility index dataset for counties (a proxy for social distancing; something > nothing), using the Descartes Labs mobility index data. Be sure to call this "Mobility Index" and credit Descartes Labs (link back to their GitHub).

https://github.com/descarteslabs/DL-COVID-19

UI consideration: Visualize this as choropleth and clustering as default? Or is there a better way?
Also we will likely get at least 1 more social-distancing-proxy county-level dataset, so thinking about how that is presented?

Note: this could be split into 2 tasks (data merge, JS viz), depends on who claims it!

More Design Requests from Health User Testing Group

Including these as feature requests/enhancements. Xun feel free to take 'em all on or divvy up.

  • On a 13 inch laptop screen, the right panel is too long and causes doubled scroll-bars. A max-height of 96% does not consider the margins (24px on all sides in this case). One option to fix: use a max-height of calc(96% - 24px) (or greater if you want to include the bottom margin). Calc is very well supported except for some known issues in IE. https://caniuse.com/#search=calc()

  • Color on the map and on the legend don't match very well, making it hard to compare on the choropleth version in particular. This may be a function of opacity. Some options: add an opacity toggle to switch between full and lower map opacity (if you want to see what's behind the map); add basemap-colored layer underneath the legend and then make the legend less opaque so it matches the map.

  • The all cases chart y-axis text is cut off when there is a scroll bar on the right panel (and pretty close to the edge otherwise). Recommend using a smaller width SVG for this chart, and/or making it responsive would also help. Also note that the padding on the right panel container is pushing the SVGs to the right. You may want to make the SVGs no wider than the interior width of the panel (currently set at 344px).

  • Unnecessary bottom scroll bar appears on my laptop screen. I suspect this is from the right-margin on the right control panel, though the over-long SVG might also be problematic here. Option to fix: set the margin-right to 0 and the right property to 24px. This will absolutely position it 24px from the edge if that is the desired outcome.

  • Recommend adding medium-grey lines across the charts (based on y-axis ticks) to make them easier to get values from. Tooltips on the bar chart in particular would help.

  • Recommend some indication that the Data menu is actually a dropdown. It looks just like the buttons right now, and I didn't know what it was until I clicked on it.

  • Recommend clipping the Great Lakes out of the state geojson. It looks slightly unusual to keep them in.

Automated data flows

Hello everyone!

I've been diving into the code this week to get a better feel for how data flows from 1P3A and USAFacts all the way to the front end. This is really fantastic work and I think will make a great foundation as we keep iterating on the back end!

I was hoping to kick off a conversation around how we might be able to translate the hourly_update.py script into something that could take on the character of an automated data pipeline. I had a few ideas I wanted to share, and looking forward to getting your input as well @jkoschinsky @Makosak @lixun910 @linqinyu

Broadly speaking, I think it would be helpful to have two types of Python scripts we use to pipe data into the app: "fetchers" and "transforms".

A fetcher would be a simple script that runs on a schedule and dumps data as-is from one of our sources into file storage (we can think of this as an archive). For instance, there could be a fetcher for 1P3A cases that runs on the hour and drops a file called 1P3A_cases_<timestamp>.csv into an Amazon S3 bucket. That bucket would be solely for keeping snapshots of the source data in its original form.

When the fetcher is done, it would kick off a transform script that takes the source data, does whatever's needed to get it ready for the Atlas (e.g. aggregates cases by county/state), and puts the result into a separate S3 bucket that's just for derived data products.
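
To make the split concrete, here is a rough sketch of a fetcher/transform pair. It writes to local directories as a stand-in for the two S3 buckets, and the column names (state, county, cases) and the output file name county_cases.csv are assumptions for illustration, not what hourly_update.py actually uses.

```python
import csv
import tempfile
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def fetch(raw_text, archive_dir, source="1P3A_cases", now=None):
    """Write the source payload unchanged to a timestamped archive file."""
    now = now or datetime.now(timezone.utc)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / f"{source}_{now:%Y%m%dT%H%M%SZ}.csv"
    path.write_text(raw_text, encoding="utf-8")
    return path


def transform(snapshot_path, derived_dir):
    """Aggregate case rows by (state, county) and write a derived CSV."""
    totals = defaultdict(int)
    with open(snapshot_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            totals[(row["state"], row["county"])] += int(row["cases"])
    derived_dir = Path(derived_dir)
    derived_dir.mkdir(parents=True, exist_ok=True)
    out = derived_dir / "county_cases.csv"
    with open(out, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["state", "county", "cases"])
        for (state, county), cases in sorted(totals.items()):
            writer.writerow([state, county, cases])
    return out


# Usage with a throwaway directory and made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    raw = "state,county,cases\nIL,Cook,3\nIL,Cook,2\nIN,Lake,1\n"
    snapshot = fetch(raw, Path(tmp) / "archive")
    derived = transform(snapshot, Path(tmp) / "derived")
    derived_lines = derived.read_text(encoding="utf-8").splitlines()
    snapshot_name = snapshot.name
```

Keeping the fetcher free of any parsing means the archive always holds exactly what the source served, so a transform bug can be fixed and replayed against old snapshots.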

In some ways this is quite similar to what's already happening (I see hourly_update.py has distinct steps for fetching and transforming, so I don't think that code would have to change much at all!). But I did want to point out a few potential benefits of packaging this process for the cloud, as well as externalizing data from the Git repo:

  • The S3 bucket that archives our source data could essentially become a warehouse of COVID data by the hour. My assumption is this will be very powerful not just for real-time analysis, but also if any new research interests come up down the line. (It sounds like this may have been a goal of the project from the start, but let me know if I got that wrong or this feels out of scope!)
  • Storing data in S3 will allow us to capture data at a high temporal resolution and add new sources without having to worry about the size of the Git repo. Also, having the snapshots side-by-side in a file store (vs. tracked in the Git history) should make it easier to iterate over them during any sort of time-series analysis.
  • Not a super high priority, but it's also nice to be able to "set it and forget it" by automating the flow of data 😄

A few quick notes on error handling:

  • It shouldn't be hard to set up alerts so if anything goes wrong we would get an email, Slack message, etc.
  • The unmatched.txt file that logs counties without matches would stay the same—it would just live in S3, but we could still review it manually and add exceptions to the code where they're needed.

As far as implementing this, it shouldn't be hard to adapt hourly_update.py into two AWS Lambda functions (one fetcher, one transform). We can schedule the fetcher to run on the hour with CloudWatch Events. On the front end, the app would just pull the CSV/JSON files it needs from S3 rather than GitHub.

I think that about sums it up—I hope I explained this well enough but please let me know if I can expand on anything! And of course, I don't want to overcomplicate a process that's been working very well, but it sounds like the Atlas may be reaching a scale of data where going to the cloud could have some tangible benefits.

Looking forward to hearing everyone's thoughts on next steps for the back end and continuing the conversation!

Generate summary report twice every week

Generate a report for regional clusters. For each regional cluster, the current plan is to include the following information: a list of counties, # of counties, total population, total confirmed cases, total death counts, the fatality rate, confirmed cases per 1M, death counts per 1M, and a figure for this particular cluster.
The current plan is to update this report every 3 days (roughly twice a week); this may change later. The movie/GIF will also be updated with each report.
Suggestions and comments are always welcome!
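
As a starting point for discussion, the per-cluster numbers could be computed along these lines. The field names are hypothetical, the sample counties are made up, and the fatality rate is assumed here to mean deaths divided by confirmed cases.

```python
def summarize_cluster(counties):
    """Summarize a cluster given dicts with name, population, cases, deaths."""
    population = sum(c["population"] for c in counties)
    cases = sum(c["cases"] for c in counties)
    deaths = sum(c["deaths"] for c in counties)
    return {
        "counties": [c["name"] for c in counties],
        "n_counties": len(counties),
        "population": population,
        "cases": cases,
        "deaths": deaths,
        # Assumption: fatality rate = deaths / confirmed cases.
        "fatality_rate": deaths / cases if cases else None,
        "cases_per_1m": cases * 1_000_000 / population if population else None,
        "deaths_per_1m": deaths * 1_000_000 / population if population else None,
    }


# Made-up cluster of two counties.
summary = summarize_cluster([
    {"name": "County A", "population": 600_000, "cases": 300, "deaths": 6},
    {"name": "County B", "population": 400_000, "cases": 200, "deaths": 4},
])
```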

Daily county 1P3A update

We need a volunteer lead to start updating the county case file we have from 1P3A and help determine a more efficient protocol/workflow process. Things to consider:

  • API seems to be ~2 days behind, but we need to track changes more efficiently.
  • update with new daily numbers as of 9am CST (or some better time?)
  • check if total assigned county cases (+ unassigned) equal total confirmed state cases
  • other parameters to consider?

Once we have a protocol in place we will be able to better take advantage of volunteers and potential RAs to help update this on a regular (daily) basis
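
The totals check in the list above could be sketched like this, assuming county rows (including an "Unassigned" row per state) and state totals are already loaded into plain Python structures. The field names and numbers are illustrative only.

```python
def check_state_totals(county_rows, state_totals):
    """Return {state: county_sum - state_total} for states that disagree."""
    sums = {}
    for row in county_rows:
        sums[row["state"]] = sums.get(row["state"], 0) + row["cases"]
    return {
        state: sums.get(state, 0) - total
        for state, total in state_totals.items()
        if sums.get(state, 0) != total
    }


# Made-up numbers: Illinois reconciles, Indiana is one case short.
county_rows = [
    {"state": "IL", "county": "Cook", "cases": 5},
    {"state": "IL", "county": "Unassigned", "cases": 2},
    {"state": "IN", "county": "Lake", "cases": 3},
]
mismatches = check_state_totals(county_rows, {"IL": 7, "IN": 4})
```

A negative difference means the county rows undercount the state total, which may simply reflect the lag between state and county reporting.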

Add baseline-forecast model for counties

Tagging Xun and Pedro; Pedro's code is almost ready to go, and needs to be integrated. Would update daily based on new data.

Later we'll need to identify an easy way for groups to add their models in the format we need, etc.

Add README info

add README info, including:

  • the content of each folder (where to download data, etc)
  • our workflow, collaborators
  • decisions we made when merging in data

anything else?

Add navbar above map

Based on stakeholder feedback: add a navbar so users can reach the About, Methods, and Blog pages from the map.

Add label (hover) to explain LISA legend

Suggesting a few potential enhancements to the map that I believe would increase the clarity of the message:

  • Outside users of this visualization may not understand "high-high", "high-low", "low-low" and "low-high" labels at first. I found the meaning under methods, but maybe an image/legend can be included under "Covid clusters" to further explain what they mean
  • [Opinion] the County, confirmed cases per 1M is a really high value add visual, and it might benefit from being the default value for the map.
  • Round cases per 1M population in the legend (seems to be doing this already in the hover) to make it more readable

Happy to help if I can on this, just point me in the right direction!

data/validation/raw dir needs to exist

On the countyV branch, county_validation.py fails if the data/validation/raw directory doesn't exist. I added a file, data/validation/raw/.placeholder (and modified .gitignore), in my fork. I started to submit a pull request but quit when I saw that 45 files would be in the commit, even though I thought only those two files had been touched.
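
An alternative to a placeholder file would be to have the script create the directory before writing to it. This is only a sketch; where such a call would go inside county_validation.py is not shown here, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_raw_dir(root="."):
    """Create data/validation/raw under root if it is missing."""
    raw_dir = Path(root) / "data" / "validation" / "raw"
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
    raw_dir.mkdir(parents=True, exist_ok=True)
    return raw_dir


# Usage in a throwaway directory; the second call is a no-op.
with tempfile.TemporaryDirectory() as tmp:
    created = ensure_raw_dir(tmp)
    created_ok = created.is_dir()
    ensure_raw_dir(tmp)
```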

d3.js helper

I didn't realize there were so many volunteers here. Anyone who knows d3.js is welcome to improve the charts made with it: the line chart and the bar chart. For example, add a mouseover function to the line chart, or add a navigation line to the line chart.

The code to create the line chart is in index.js, in the functions:
addTrendLine() and updateTrendLine()

The code to create the bar chart is in index.js in function:
createTimeSlider()

Time Label

The "Show Time Label" option produces very large labels that show only the year, "2020". I don't know whether this is intentional (the data from the 1P3A API might still be lagging) or a bug. It has been consistent across the browsers I have (Safari and Opera on Mac; Opera on Windows).

Empirical Networks

Two successive rough questions:

  1. Can we show that empirical mobility patterns are better predictors of transmission than simple spatial proximity?
  2. Can we measure the response on the empirical network, to understand how that affects transmission --
    a. How has the network structure changed (presumably, a LOT).
    b. Do current (post-quarantine) networks predict loads better than status-quo networks?
    c. Is the observed quarantining linked to reduced transmission in simple models?
    d. If certain communities have quarantined more-effectively than others, does this affect their case load?

Negative counts in tooltip

For Weld County, CO (just north of Denver) on 4/26 there was a negative new deaths count:

[Screenshot of the tooltip, taken 2020-04-27 at 6:06:48 PM]

Fairly certain this is just a glitch in the data but we might want to consider defaulting these values to something like "Error", just to prevent any confusion.
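
One way the suggested fallback could work, sketched in Python for illustration (the tooltip itself lives in the front-end JavaScript, and the function name is hypothetical):

```python
def format_daily_count(value):
    """Render a daily count for display, flagging negative values."""
    if value < 0:
        # Negative daily counts usually come from upstream corrections.
        return "Error"
    return f"{value:,}"


labels = [format_daily_count(v) for v in (-3, 0, 1234)]
```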

rate units

As is, the rate is expressed as a count per million population; I believe the Johns Hopkins figures are per 10,000. For smaller counties, a million is probably too large a denominator to be meaningful. Should we consider using 10,000?
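
To make the difference in denominators concrete, here is a small sketch with a made-up county of 5,000 people and 3 cases:

```python
def rate_per(count, population, per=1_000_000):
    """Return count scaled to a rate per `per` residents."""
    return count * per / population


# Hypothetical small county: 3 cases among 5,000 residents.
per_million = rate_per(3, 5_000)
per_ten_thousand = rate_per(3, 5_000, per=10_000)
```

Here the per-million figure is 600, larger than the county's whole case count could ever support, while the per-10,000 figure is 6.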

population data

Does anyone have the latest population data for states and counties?

CSV-county merging

The CSV file we're getting likely just has state name and county name. We will need to merge this with the GIS shapefiles. Note that county names are not unique, but state + county name pairs are. We need to identify someone to take this merging issue on, as we'll likely be merging on a daily basis.
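
A rough sketch of merging on the state + county pair, assuming the shapefile attribute table has been read into dicts with STATE_NAME, NAME, and GEOID fields (these field names and the sample rows are assumptions for illustration):

```python
def make_key(state, county):
    """Normalize a (state, county) pair so CSV and shapefile names line up."""
    county = county.strip().lower()
    for suffix in (" county", " parish"):
        if county.endswith(suffix):
            county = county[: -len(suffix)]
    return (state.strip().lower(), county)


# Two different counties that share the name "Washington".
shape_attrs = [
    {"STATE_NAME": "Illinois", "NAME": "Washington", "GEOID": "17189"},
    {"STATE_NAME": "Oregon", "NAME": "Washington", "GEOID": "41067"},
]
lookup = {make_key(a["STATE_NAME"], a["NAME"]): a["GEOID"] for a in shape_attrs}

csv_rows = [
    {"state": "Oregon", "county": "Washington County", "cases": 10},
    {"state": "Illinois", "county": "washington", "cases": 2},
    {"state": "Illinois", "county": "Unassigned", "cases": 1},
]
merged, unmatched = {}, []
for row in csv_rows:
    geoid = lookup.get(make_key(row["state"], row["county"]))
    if geoid is None:
        unmatched.append(row)  # keep for manual review
    else:
        merged[geoid] = row["cases"]
```

Collecting the unmatched rows rather than dropping them gives whoever runs the daily merge a short list of names to reconcile by hand.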
