dend-datawarehouse's Introduction

Data Warehouse on AWS with Redshift

The purpose of this project is to build a suitable data model and use Python to load data stored in an S3 bucket and wrangle it into a star schema (see the ERD).

Prerequisites

  1. Install Python 3.x.

  2. This project is built with conda instead of pip. Install Anaconda, or modify the script to use pip.

  3. You also need an AWS Redshift cluster up and running (4 to 8 nodes suggested).

Main Goal

The company Sparkify needs to analyze its data to better understand how users (free/paid) use its services. With this data model we will be able to ask questions like When? Who? Where? and What? about the data. The task is to build an ETL pipeline that extracts data from S3 and stages it in Redshift, then transforms it into a star schema (dimension and fact tables) so that the analytics team can find insights easily.
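The staging step can be sketched as a small helper that builds a Redshift COPY statement. This is a minimal sketch, not the project's actual code: the function name `build_copy_sql`, the table name, the bucket path, the IAM role ARN, the region and the assumption that the source files are JSON are all hypothetical placeholders.

```python
def build_copy_sql(table, s3_path, iam_role_arn, region="us-west-2", json_option="auto"):
    """Build a Redshift COPY statement that loads JSON files from S3 into a staging table.

    All argument values are supplied by the caller; the defaults are assumptions
    made for illustration, not values taken from this project.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS JSON '{json_option}'\n"
        f"REGION '{region}';"
    )


if __name__ == "__main__":
    # Hypothetical staging table, bucket and role, for illustration only.
    print(
        build_copy_sql(
            "staging_events",
            "s3://example-bucket/log_data",
            "arn:aws:iam::123456789012:role/example-redshift-role",
        )
    )
```

The resulting string would then be executed against the cluster through whatever database driver the project uses; staging with COPY first keeps the later transformation into fact and dimension tables inside Redshift.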

Data Model

Song ERD

This data model is called a star schema. At its center is a fact table, songplays, that contains facts about each song play, such as user agent, location, session and the user's level, along with foreign key (FK) columns referencing 4 dimension tables:

  • Songs table with data about songs
  • Artists table
  • Users table
  • Time table

This model keeps the number of SQL JOINs needed for a query to a minimum and enables fast read queries.
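To illustrate why a star schema needs few joins, here is a minimal sketch that answers "which songs are played most?" with a single join from the fact table. SQLite stands in for Redshift so the example runs locally, and the column names and sample rows are assumptions made for illustration, not the project's actual DDL or data.

```python
import sqlite3

# SQLite stands in for Redshift here so the sketch runs locally.
# Column names are illustrative assumptions, not the project's real schema.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, first_name TEXT, level TEXT);
CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users (user_id),
    song_id TEXT REFERENCES songs (song_id),
    level TEXT,
    location TEXT
);
"""


def demo_connection():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "Ada", "paid"), (2, "Bob", "free")],
    )
    conn.executemany(
        "INSERT INTO songs VALUES (?, ?)",
        [("S1", "Song A"), ("S2", "Song B")],
    )
    conn.executemany(
        "INSERT INTO songplays VALUES (?, ?, ?, ?, ?)",
        [
            (1, 1, "S1", "paid", "Paris"),
            (2, 2, "S1", "free", "Lyon"),
            (3, 1, "S2", "paid", "Paris"),
        ],
    )
    return conn


def plays_per_song(conn):
    """Return (title, play_count) rows using one join from the fact table."""
    query = """
        SELECT s.title, COUNT(*) AS plays
        FROM songplays sp
        JOIN songs s ON s.song_id = sp.song_id
        GROUP BY s.title
        ORDER BY plays DESC, s.title
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(plays_per_song(demo_connection()))
```

Questions about users or time would follow the same pattern: one join from songplays to the relevant dimension table.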

Run it

Two steps:

  1. Launch create_tables.py to prepare the database
  2. Run etl.py to wrangle the data
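The two steps above can also be chained from a small driver. This is a sketch under the assumption that both scripts sit in the current directory and take no arguments; the `run_pipeline` helper is hypothetical and not part of the project.

```python
import subprocess
import sys


def run_pipeline(scripts=("create_tables.py", "etl.py"), runner=subprocess.run):
    """Run each script in order with the current Python interpreter.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so etl.py is never started if create_tables.py fails.
    """
    for script in scripts:
        runner([sys.executable, script], check=True)


if __name__ == "__main__":
    run_pipeline()
```

The `runner` parameter exists only so the ordering can be checked without a live Redshift cluster.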
