data-engineering-challenge

Challenge: Wrangle the raw data into a usable format and design basic data infrastructure for a Global Health project

Context

As a Data Engineer on a Global Health project looking at vaccine supply and demand forecasting in the country of Verdania, you are investigating how to set up a data pipeline that pulls data from the country's health data system, wrangles it, and inserts it into your own database.

You have been given a sample data file obtained by querying regional_vaccine_supply, an API endpoint of the health data system that tracks vaccine supply to health facilities at the regional level. The data is in nested JSON format.

To perform efficient analytics, you need to flatten the JSON structure into a tabular format. The goal is to create a flat table that can be easily loaded into a database. In addition, the Machine Learning team would like CSVs of the data to start the modelling process. Information about each vaccine also needs to be extracted into a separate CSV file.
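As a rough illustration of what flattening means here, the sketch below turns a nested structure into flat rows and writes them as CSV using only the Python standard library. The sample payload is hypothetical: the brief does not show the real keys, so `region`, `facilities`, `deliveries`, `vaccine` and `doses` are assumptions, as are the helper names `flatten` and `rows_to_csv`. It is one possible approach, not a reference solution.

```python
import csv
import io
import json

# Hypothetical sample: every key below is an assumption for illustration,
# since the brief does not show the real payload.
SAMPLE = json.loads("""
{
  "vaccine_supply": [
    {
      "region": "North",
      "facilities": [
        {"facility": "Clinic A",
         "deliveries": [
           {"vaccine": "VaxA", "doses": 100},
           {"vaccine": "VaxB", "doses": 50}
         ]},
        {"facility": "Clinic B",
         "deliveries": [{"vaccine": "VaxA", "doses": 70}]}
      ]
    }
  ]
}
""")


def flatten(node, prefix=""):
    """Return a list of flat dicts, one per combination of nested values.

    Scalars become columns, nested dicts get dotted column names, and each
    list element produces its own row(s).
    """
    if isinstance(node, list):
        return [row for item in node for row in flatten(item, prefix)]
    if not isinstance(node, dict):
        return [{prefix: node}]
    rows = [{}]
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        # An empty list yields no child rows; keep the parent row anyway.
        children = flatten(value, name) or [{}]
        rows = [{**row, **child} for row in rows for child in children]
    return rows


def rows_to_csv(rows, handle):
    """Write rows to an open text handle, using the union of all keys as header."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


rows = flatten(SAMPLE["vaccine_supply"])
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, pass a handle opened with `open(path, "w", newline="")`. The same two helpers could in principle be pointed at the vaccine information section, provided its shape is similar.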

Challenge steps

The following is a guideline of the minimal steps to follow:

  1. Flatten the JSON Structure:

Write a Python program or script to flatten the nested JSON structure for vaccine_supply into a flat table. Each row should represent a unique combination of the data. Save this as a CSV file.

  2. Create a CSV for Vaccine Information:

Extract information about each vaccine from the "vaccine_information" section and save it as a separate CSV file named "vaccine_information.csv".

  3. Design a Relational Database Model:

Design a relational database model to store the data. Use SQL to create the necessary tables and define relationships.

  4. Documentation:

Provide a brief explanation of your code, including any assumptions or design choices made during the flattening process and database model design.

  5. Bonus - Version control:

Submit your challenge using a version control system, such as GitHub, to demonstrate use of versioning.
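For the relational model step, one possible starting point is sketched below with Python's built-in `sqlite3` module. All table and column names (`vaccine_information`, `region`, `vaccine_supply`, `doses` and so on) are assumptions made for illustration, since the brief does not list the fields in the source data. A real design would follow the actual payload and the target database.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions,
# not taken from the actual data file.
SCHEMA = """
CREATE TABLE vaccine_information (
    vaccine_id   INTEGER PRIMARY KEY,
    vaccine_name TEXT NOT NULL UNIQUE
);

CREATE TABLE region (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL UNIQUE
);

CREATE TABLE vaccine_supply (
    supply_id  INTEGER PRIMARY KEY,
    region_id  INTEGER NOT NULL REFERENCES region(region_id),
    vaccine_id INTEGER NOT NULL REFERENCES vaccine_information(vaccine_id),
    doses      INTEGER NOT NULL CHECK (doses >= 0)
);
"""

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Parent rows first, then a supply row that references both of them.
conn.execute("INSERT INTO vaccine_information (vaccine_id, vaccine_name) VALUES (1, 'VaxA')")
conn.execute("INSERT INTO region (region_id, region_name) VALUES (1, 'North')")
conn.execute("INSERT INTO vaccine_supply (region_id, vaccine_id, doses) VALUES (1, 1, 100)")

# A supply row pointing at an unknown vaccine should be rejected.
try:
    conn.execute("INSERT INTO vaccine_supply (region_id, vaccine_id, doses) VALUES (1, 99, 10)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# Joining back through the keys recovers the flat, readable view.
joined = conn.execute(
    """
    SELECT r.region_name, v.vaccine_name, s.doses
    FROM vaccine_supply AS s
    JOIN region AS r ON r.region_id = s.region_id
    JOIN vaccine_information AS v ON v.vaccine_id = s.vaccine_id
    """
).fetchall()
print(fk_enforced, joined)
```

Keeping vaccine details in their own table means each vaccine is described once and referenced by key from the supply rows, which mirrors the separate vaccine information CSV requested in step 2.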

Submission

Please submit through this Google Form using one of the following two options:

  1. A URL to the online repository in a version control system, such as GitHub.
  2. A zipped folder containing your code, CSV files and documentation.

Contributors

meganbeckett
