
ITF Dashboard Data Pipeline

ITF Internal Dashboard Refresh

This document has completed governance review per local and agency processes; nonetheless, this material remains a draft.

Project Description:

This repository houses the R functions and scripts used in the US Centers for Disease Control and Prevention (CDC) COVID-19 Response International Task Force (ITF) COVID-19 Dashboard.

As part of the CDC COVID-19 Response, the ITF Situational Awareness & Visualization (SAVI) Team created and maintains an internal Power BI dashboard to assist Task Force and response leadership with situational awareness of the global pandemic and response. The dashboard contains analyses of the most up-to-date global case and testing data from multiple sources. The Power BI report that generates the dashboard runs multiple R scripts to refresh and process the data as CSV files, which are then imported into Power BI for visualization. The R functions in this project read in case and testing data, apply algorithms, and populate the underlying data tables of the report. Access to this dashboard is currently limited to CDC staff.

The ITF has also created several curated Power BI views of global data on the public CDC COVID Data Tracker (https://covid.cdc.gov/covid-data-tracker/#global-counts-rates) to communicate to the general public the types of analyses that CDC is conducting using international data. The code in this repository is used to populate the data underlying those views in a Power BI dashboard.

Processing Steps

There are three steps in the processing pipeline:

  1. Pulling and processing data for all internal dashboards (itf_dashboard/0_output_data.R)
  2. Pulling and processing data for all external dashboards (covid_data_tracker/0_output_data.R)
  3. Writing out data to Azure Data Lake (export_data.R)

The GitHub Actions (GHA) workflow runs the first two steps in parallel and waits until both are complete before running the export step.

  • When running the pipeline manually, ensure that you've run the first two data pull steps before attempting to export.

  • If the data export script is successful, you should be able to see the files in Data Lake with an updated timestamp.
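When running by hand, the ordering above can be sketched in shell. The `echo` lines below are placeholders standing in for the actual `Rscript` calls (script paths from this README); only the parallel-then-wait structure is the point:

```shell
# Sketch of the pipeline ordering. Substitute the real calls, e.g.:
#   Rscript itf_dashboard/0_output_data.R
#   Rscript covid_data_tracker/0_output_data.R
#   Rscript export_data.R
echo "pull: internal dashboard data" > step1.log &   # step 1, in background
echo "pull: external dashboard data" > step2.log &   # step 2, in background
wait  # both pull steps must finish before the export runs
echo "export: write to Azure Data Lake" > step3.log  # step 3, last
```

The `wait` mirrors what the GHA workflow does: the export job only starts once both pull jobs have completed.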

GitHub Actions Workflow

Scheduling

The update process is scheduled for 19:35 UTC (15:35 EDT), Monday through Friday.

Altering the Workflow

By default, the GitHub Actions workflow runs the three R scripts mentioned above, and any changes made to those scripts on the master branch will automatically propagate to the workflow.

  • If you need to add an additional step, or modify the workflow for any reason, the process is defined in: .github/workflows/automated_dashboard_update.yaml.

  • If you want to create a new workflow altogether, you can define a new .yaml script in the .github/workflows folder.
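As a rough sketch, a workflow file with the schedule described above might look like the following. Only the file location and the cron expression (which corresponds to 19:35 UTC, Monday through Friday) come from this README; the workflow, job, and step names here are illustrative, not the actual contents of automated_dashboard_update.yaml:

```yaml
# Illustrative sketch only -- the real workflow is defined in
# .github/workflows/automated_dashboard_update.yaml
name: automated_dashboard_update
on:
  schedule:
    - cron: "35 19 * * 1-5"  # 19:35 UTC, Monday through Friday
  workflow_dispatch:          # allows manual triggering from the Actions tab
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Pull internal dashboard data
        run: Rscript itf_dashboard/0_output_data.R
```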

Debugging the Workflow

Occasionally, the GHA workflow will encounter an error during processing. Check the logs for the failed run in the repository's Actions tab to identify the failing step before re-running.

Manually triggering the GHA Workflow

You might want to trigger the workflow manually if you need a data update sooner than the scheduled refresh. You can do this from the workflow's page in the repository's Actions tab using the "Run workflow" button.

Manual SOP

This applies to cloning this repository and running the pipeline manually.

Prerequisites

Renv

This project uses {renv} to handle R package dependencies. When you clone the repo, you'll need to run the following to install all dependencies before proceeding:

renv::restore()

Azure Data Lake Credentials

The final write-out process requires an Azure Service Principal to transfer files to the Data Lake location. The GitHub Actions automation has these credentials stored internally, but you can request personal access from the current ITF-SAVI Lead.

  • Credentials are not strictly required to process the data, but without them you will be unable to write the data out to the Data Lake and update the dashboards.

Passing Data Lake Credentials via Environment Var

If you run this locally, you'll need to create a .Renviron file in the root directory.
The .Renviron file contains the environment variables needed to connect and write files to the Data Lake.

The file is parsed line by line and should have the following format:

AZURE_DL_PATH=XXXXX/XXXXXX
AZURE_TENANT_ID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
AZURE_APP_ID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
AZURE_APP_SECRET=XXXXXXXXXXXXXXXXXXXXXXXXXXXX-XXXXXXXX
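Before running the export, a small shell check like the one below can confirm that each expected variable is defined. The variable names come from this README; the file written here is a temporary placeholder, so when checking your real setup, point `renviron` at the .Renviron in the repository root instead:

```shell
# Write a placeholder .Renviron to a temp file, then verify that all four
# variables the export step expects are defined (values are dummies).
renviron="$(mktemp)"
cat > "$renviron" <<'EOF'
AZURE_DL_PATH=example/path
AZURE_TENANT_ID=00000000-0000-0000-0000-000000000000
AZURE_APP_ID=00000000-0000-0000-0000-000000000000
AZURE_APP_SECRET=example-secret
EOF

for var in AZURE_DL_PATH AZURE_TENANT_ID AZURE_APP_ID AZURE_APP_SECRET; do
  grep -q "^${var}=" "$renviron" || { echo "missing: ${var}"; exit 1; }
done
echo "ok: all required variables present" > renviron_check.log
```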

Data sources referenced:

The project uses several publicly-available data sources, including:

The COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, Cases and Deaths data sets:

Citation: Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1

The World Health Organization COVID-19 Global data set:

Our World In Data Testing data set (until June 23, 2022):

Citation: Max Roser, Hannah Ritchie, Esteban Ortiz-Ospina and Joe Hasell (2020) - "Coronavirus Pandemic (COVID-19)". Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/coronavirus' [Online Resource]

FIND Testing data set:

Standardized population data:

Continent classifications:

Public Domain

This repository constitutes a work of the United States Government and is not subject to domestic copyright protection under 17 USC § 105. This repository is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication. All contributions to this repository will be released under the CC0 dedication. By submitting a pull request you are agreeing to comply with this waiver of copyright interest.

License

The repository utilizes code licensed under the terms of the Apache Software License and therefore is licensed under ASL v2 or later.

The source code in this repository is free: you can redistribute it and/or modify it under the terms of the Apache Software License version 2, or (at your option) any later version.

The source code in this repository is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the Apache Software License for more details.

You should have received a copy of the Apache Software License along with this program. If not, see http://www.apache.org/licenses/LICENSE-2.0.html

Source code forked from other open-source projects retains its original license.

Privacy

This repository contains only non-sensitive, publicly available data and information. All material and community participation is covered by the Surveillance Platform Disclaimer and Code of Conduct. For more information about CDC's privacy policy, please visit http://www.cdc.gov/privacy.html.

Contributing

Anyone is encouraged to contribute to the repository by forking and submitting a pull request. (If you are new to GitHub, you might start with a basic tutorial.) By contributing to this project, you grant a world-wide, royalty-free, perpetual, irrevocable, non-exclusive, transferable license to all users under the terms of the Apache Software License v2 or later.

All comments, messages, pull requests, and other submissions received through CDC including this GitHub page are subject to the Presidential Records Act and may be archived. Learn more at http://www.cdc.gov/other/privacy.html.

Records

This repository is not a source of government records, but is a copy to increase collaboration and collaborative potential. All government records will be published through the CDC web site.

Notices

Please refer to CDC's Template Repository for more information about contributing to this repository, public domain notices and disclaimers, and code of conduct.

itf-dashboard's People

Contributors

beansrowning, kimkimroll, jamesfuller-cdc, boris-ning-usds


itf-dashboard's Issues

Update Testing source to FIND for Country/Area Summary Page

Relevant code

library(dplyr)  # provides %>%, select(), mutate(), recode(), filter()

owid_test_source <- "https://covid.ourworldindata.org/data/owid-covid-data.csv"

testing1 <- data.table::fread(owid_test_source, data.table = FALSE, showProgress = FALSE, verbose = FALSE) %>%
  select(iso_code, date, positive_rate, new_tests, total_tests,
         new_tests_smoothed_per_thousand, new_tests_per_thousand, tests_per_case) %>%
  mutate(iso_code = recode(iso_code, "OWID_KOS" = "XKX")) %>%  # remap Kosovo to its ISO 3166 code
  filter(!grepl("OWID", iso_code))  # drop OWID aggregate rows (continents, income groups, etc.)

# output.dir is defined elsewhere in the pipeline
data.table::fwrite(testing1, paste0(output.dir, "owid_testing.csv"), na = "", row.names = FALSE)

TODO

  • Use SaviR to pull FIND testing data
  • Write out to data lake as a new file and test with PBIX file
  • Update dashboard accordingly and merge into master
