covid-19_us_county-level_summaries's Introduction

County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects

[Figure: County-level Number of Intensive Care Unit Beds]

TL;DR: We gather a machine-readable dataset of socioeconomic factors that may affect the spread and/or consequences of epidemiological outbreaks, particularly the novel coronavirus (COVID-19). This dataset is envisioned to serve the data science, machine learning, and epidemiological modeling communities. If you want to contribute, please let us know!

Overview: Despite overoptimistic promises of an “American Resurrection” by Easter Sunday, many scientists and citizens fear that current mitigation strategies are likely insufficient to avert the collapse of the US healthcare system. Confirmed COVID-19 cases, hospitalizations, and, unfortunately, deaths are rapidly increasing; implementing an aggressive suppression strategy (“The Hammer”) seems to be the only viable option to buy time. How can we make the best use of the time these measures buy?

The machine learning community should actively engage in these discussions and contribute possible solutions to actionable problems. One interesting direction could be to identify the benefits and costs of different mitigation and suppression strategies. “Benefits” in this case would correspond to reductions in the effective reproduction number R, potential lives saved, and long-term socio-economic benefits, while “costs” could reflect the resulting burden on the healthcare system, short-term economic consequences, and possible long-term economic restructuring.

Many of the recent epidemiological predictions and analyses are performed for the US as a whole. However, identifying relationships between “benefits” and “costs” will likely require a much higher granularity of analysis. This is because highly localized contextual factors, such as population density, demographics or primary means of transportation, will affect critical parameters for computational epidemiological modeling, including the effective reproduction number R.

To facilitate research on such questions, we present a machine-readable dataset that aggregates relevant data from around 10 governmental and academic sources at the county level. In addition to county-level time-series data from the JHU CSSE COVID-19 Dashboard, our dataset contains more than 300 variables that summarize population estimates, demographics, ethnicity, housing, education, employment and income, climate, transit scores, and healthcare system-related metrics. A detailed description of all variables can be found here.

Structure

We accumulated statistics from different sources at county-level granularity.

  • ./data contains the aggregated machine-readable file counties.csv, with demographic, socioeconomic, health care, and education data for each county in the 50 states and Washington DC. Data are organized by FIPS codes, which are unambiguous identifiers for each county, since the same county name may appear in many states.
  • ./raw_data contains the raw datasets that were used to create the ./data folder.
  • ./model is under construction.
  • ./scripts contains scripts for making the raw_data machine-readable.
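
Because counties are keyed by FIPS code rather than by name, it may help to see how such a file can be read without losing leading zeros. The following is only a minimal sketch using the Python standard library: the column names `FIPS`, `State`, and `Area_Name` and the helper `load_counties` are assumptions made for illustration and may not match the actual header of counties.csv, and the two sample rows are an in-memory stand-in, not data taken from the file.

```python
import csv
import io


def load_counties(source, fips_column="FIPS"):
    """Read county rows from an open CSV file into a dict keyed by FIPS code.

    FIPS codes are kept as 5-character strings so that a leading zero
    (e.g. for Alabama counties) is not lost. `fips_column` is an assumed
    column name; adjust it to the real header.
    """
    counties = {}
    for row in csv.DictReader(source):
        fips = row[fips_column].strip().zfill(5)  # restore dropped leading zeros
        counties[fips] = row
    return counties


# Illustrative stand-in for counties.csv: two counties that share a name
# but sit in different states, so only the FIPS code tells them apart.
sample = io.StringIO(
    "FIPS,State,Area_Name\n"
    "1129,AL,Washington County\n"
    "24043,MD,Washington County\n"
)
counties = load_counties(sample)

print(counties["01129"]["State"])  # AL
print(counties["24043"]["State"])  # MD
```

To try this on the real file, one would open it with something like `open("data/counties.csv", newline="")` and pass the file object to `load_counties`, after checking the actual column names.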

Instructions for Adding Data

Please create a new directory in ./raw_data with a sensible name based on the type of data you are adding.

Citation

If you find our dataset or code useful, please consider citing our paper:

@article{killeenCountylevelDatasetInforming2020,
  title = {A {{County}}-Level {{Dataset}} for {{Informing}} the {{United States}}' {{Response}} to {{COVID}}-19},
  author = {Killeen, Benjamin D. and Wu, Jie Ying and Shah, Kinjal and Zapaishchykova, Anna and Nikutta, Philipp and Tamhane, Aniruddha and Chakraborty, Shreya and Wei, Jinchi and Gao, Tiger and Thies, Mareike and Unberath, Mathias},
  year = {2020},
  month = apr,
  archivePrefix = {arXiv},
  eprint = {1909.11730},
  eprinttype = {arxiv}
}

Acknowledgements

This dataset is the result of a herculean effort by a group of students and faculty at Johns Hopkins University. Special thanks go to Jie Ying Wu, Benjamin Killeen, Kinjal Shah, Anna Zapaishchykova, Philipp Nikutta, Aniruddha Tamhane, Shreya Chakraborty, Jinchi Wei, Tiger Gao, and Mareike Thies.

Additionally, we would like to thank our sources, which can be found in the data README.
