scd_in_warehouse's Introduction

SCD_in_Warehouse

Understand the various types of slowly changing dimensions (SCDs) and implement them in Hadoop Hive and Spark.

Complete Overview

Execution Flow

SCD 1

This method overwrites old with new data, and therefore does not track historical data.
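As a rough illustration of the overwrite behaviour, here is a minimal in-memory sketch in plain Python. It is not the project's Hive or Spark code; the function name `apply_scd1`, the key-to-row dictionary layout, and the sample customers are assumptions made for the example.

```python
def apply_scd1(target, source):
    """Return a new target in which source rows overwrite target rows
    that share the same key. No history is kept (SCD type 1)."""
    merged = dict(target)   # start from the existing dimension rows
    merged.update(source)   # new keys are added, existing keys are overwritten
    return merged


# Hypothetical sample data, keyed by customer id.
target = {
    1: {"name": "Asha", "city": "Pune"},
    2: {"name": "Ravi", "city": "Delhi"},
}
source = {
    2: {"name": "Ravi", "city": "Mumbai"},    # updated record
    3: {"name": "Meera", "city": "Chennai"},  # new record
}

result = apply_scd1(target, source)
for key in sorted(result):
    print(key, result[key])
```

After the call, customer 2 carries only the new city; the old value is gone, which is the defining trade-off of type 1.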

SCD 2

This method tracks historical data by creating multiple records for a given natural key in the dimension tables, each with a separate surrogate key and/or a different version number. History is unlimited, since every change is inserted as a new record.
This project uses the flag method:

1. Copy every source record into the temp table (customer_temp) with the flag set to true: new records that are not present in the target, updated records, and unchanged records.
2. Copy from the target every record that was updated in the source, with the flag set to false, and every record that is not present in the source, with the flag set to true.
3. After steps 1 and 2, overwrite the target, store.customer, with customer_temp.
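The steps above can be sketched in plain Python on lists of row dictionaries. This is only an in-memory approximation, not the project's Hive or Spark implementation. The column names (`customer_id`, `city`, `is_current`), the function name `apply_scd2_flag`, and the choice to carry older non-current rows over unchanged are assumptions for the example.

```python
def apply_scd2_flag(target, source):
    """Build the temp table for the SCD type 2 flag method.

    target rows carry an 'is_current' flag; source rows do not.
    """
    source_by_key = {row["customer_id"]: row for row in source}

    def attrs(row):
        # Compare rows on everything except the flag column.
        return {k: v for k, v in row.items() if k != "is_current"}

    temp = []

    # Step 1: every source row (new, updated, unchanged) becomes current.
    for row in source:
        temp.append({**row, "is_current": True})

    # Step 2: carry over the target rows that must survive.
    for row in target:
        key = row["customer_id"]
        if not row["is_current"]:
            temp.append(dict(row))                     # assumption: keep old history
        elif key not in source_by_key:
            temp.append(dict(row))                     # absent from source: stays current
        elif attrs(row) != source_by_key[key]:
            temp.append({**row, "is_current": False})  # superseded version
        # Unchanged rows were already copied from the source in step 1.

    return temp


# Hypothetical sample data.
customer = [
    {"customer_id": 1, "city": "Pune", "is_current": True},
    {"customer_id": 2, "city": "Delhi", "is_current": True},
    {"customer_id": 4, "city": "Kochi", "is_current": True},
]
source = [
    {"customer_id": 1, "city": "Pune"},     # unchanged
    {"customer_id": 2, "city": "Mumbai"},   # updated
    {"customer_id": 3, "city": "Chennai"},  # new
]

customer_temp = apply_scd2_flag(customer, source)
customer = customer_temp  # Step 3: overwrite the target with the temp table.
for row in customer:
    print(row)
```

Building a separate temp table and swapping it in at the end mirrors the overwrite in step 3, so the target is never left half-updated in this sketch.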

SCD 4

SCD type 4 handles rapidly changing attributes in dimension tables. The idea is to create a junk dimension, or a small dimension table, that holds all the possible values of the rapidly changing attributes of the dimension.

The Type 4 method is usually referred to as using "history tables": one table keeps the current data, and an additional table keeps a record of some or all changes. Both surrogate keys are referenced in the fact table to improve query performance.
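The current-table-plus-history-table idea can be sketched in plain Python as follows. This is an illustrative in-memory version only, not the project's code. The names `apply_scd4`, `archived_on`, and `load_date`, and the choice to archive only the superseded version of a changed row, are assumptions.

```python
def apply_scd4(current, history, source, load_date):
    """Update the current table in place and append superseded rows
    to the history table (SCD type 4, history-table variant)."""
    for key, new_row in source.items():
        old_row = current.get(key)
        if old_row is not None and old_row != new_row:
            # Archive the version that is about to be replaced.
            history.append({"customer_id": key, **old_row, "archived_on": load_date})
        # The current table always holds the latest version only.
        current[key] = dict(new_row)


# Hypothetical sample data, keyed by customer id.
current = {
    1: {"city": "Pune"},
    2: {"city": "Delhi"},
}
history = []
source = {
    1: {"city": "Pune"},     # unchanged: nothing is archived
    2: {"city": "Mumbai"},   # changed: the old row moves to history
    3: {"city": "Chennai"},  # new: added to current only
}

apply_scd4(current, history, source, load_date="2024-01-01")
print(current)
print(history)
```

Queries that need only the latest state read the small current table, while the history table grows separately with each change.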

Reference Link1
Reference Link2

Manual Triggering

Link

Airflow Output

subbu4696 / scd_in_warehouse
