massakera

followers: 8 following: 2 repos: 58 gists: 0

Name: Massaki

Type: User

Company: Igual

Bio: hello

Location: Curitiba

About me

📚 I like to read about programming languages theory and distributed systems
📺 I enjoy watching weird animes like Serial Experiments Lain and Paranoia Agent
🏋️‍♂️ I work out regularly to stay healthy and strong

Massaki's Projects

100-day-kafka

100 days of code challenge, but about Kafka (WIP)

building-a-data-warehouse-in-aws

An ETL pipeline that extracts data from AWS S3, stages it in AWS Redshift, and transforms it into a set of dimensional tables using a star schema.

craking-the-coding-intervirew-in-a-pythonista-style

These are Python solutions for the book Cracking the Coding Interview, 6th Edition by Gayle Laakmann McDowell.

In this project we will build an ETL pipeline that extracts data from a data lake hosted on S3, processes it using Spark deployed on an EMR cluster in AWS, and loads the data back into S3 as a set of dimensional tables in Parquet format.

data-modeling-for-sparkify

In this project, I’ve applied what I’ve learned about data modeling with Postgres and built an ETL pipeline using Python. I’ve defined fact and dimension tables for a star schema with a particular analytic focus, and written an ETL pipeline that transfers data from files in two local directories into these tables in Postgres using Python and SQL.
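As a rough illustration of the star schema idea described above, here is a minimal sketch. It is not the code from this repository: it assumes SQLite from the Python standard library as a stand-in for Postgres so that it runs without a server, and the table names (`users`, `songs`, `plays`) and the sample records are hypothetical.

```python
import sqlite3

# Hypothetical event records standing in for the source files.
EVENTS = [
    {"user_id": 1, "name": "Ana", "song_id": "S1", "title": "Intro", "ts": 1000},
    {"user_id": 2, "name": "Bruno", "song_id": "S1", "title": "Intro", "ts": 1005},
    {"user_id": 1, "name": "Ana", "song_id": "S2", "title": "Outro", "ts": 1010},
]


def build_star_schema(conn):
    """Create two dimension tables and one fact table that references them."""
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE plays (
            play_id INTEGER PRIMARY KEY AUTOINCREMENT,
            ts INTEGER NOT NULL,
            user_id INTEGER REFERENCES users(user_id),
            song_id TEXT REFERENCES songs(song_id)
        );
        """
    )


def load(conn, events):
    """Insert each event: dimensions are deduplicated, facts are appended."""
    for e in events:
        conn.execute(
            "INSERT OR IGNORE INTO users VALUES (?, ?)", (e["user_id"], e["name"])
        )
        conn.execute(
            "INSERT OR IGNORE INTO songs VALUES (?, ?)", (e["song_id"], e["title"])
        )
        conn.execute(
            "INSERT INTO plays (ts, user_id, song_id) VALUES (?, ?, ?)",
            (e["ts"], e["user_id"], e["song_id"]),
        )
    conn.commit()


def plays_per_song(conn):
    """Analytic query: join the fact table to a dimension and aggregate."""
    return conn.execute(
        """
        SELECT s.title, COUNT(*)
        FROM plays p JOIN songs s ON s.song_id = p.song_id
        GROUP BY s.title
        ORDER BY s.title
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load(conn, EVENTS)
print(plays_per_song(conn))
```

The fact table holds one row per event and only keys into the dimensions, so analytic queries reduce to a join plus an aggregate. With Postgres, the deduplicating inserts would typically use `ON CONFLICT DO NOTHING` instead of SQLite's `INSERT OR IGNORE`.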

data-modeling-with-apache-cassandra

In this project, I apply the concepts learned in data modeling with Apache Cassandra and complete an ETL pipeline using Python. I model the data by creating tables in Apache Cassandra to run queries against. Part of the ETL pipeline is provided: it combines a set of CSV files within a directory into a single streamlined CSV file, which is then used to model and insert data into Apache Cassandra tables.

data-pipelines-with-airflow

Orchestrating data pipelines with Apache Airflow. We will create custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks. The tasks need to be linked together to achieve a coherent and sensible data flow within the pipeline.
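The idea of linking tasks into a coherent flow can be sketched in plain Python. This is not Airflow code and not this repository's implementation: it assumes the standard-library `graphlib` module (Python 3.9+) as a stand-in for a scheduler, and the task names, the `ctx` dictionary, and the sample data are hypothetical.

```python
from graphlib import TopologicalSorter


def stage(ctx):
    # Stand-in for a staging step: pretend these rows came from a source.
    ctx["staged"] = [1, 2, 3]


def load_warehouse(ctx):
    # Stand-in for filling the warehouse from the staged data.
    ctx["warehouse"] = list(ctx["staged"])


def quality_check(ctx):
    # Stand-in for a data quality check: fail loudly on an empty table.
    if not ctx["warehouse"]:
        raise ValueError("quality check failed: warehouse is empty")
    ctx["checked"] = True


TASKS = {
    "stage": stage,
    "load_warehouse": load_warehouse,
    "quality_check": quality_check,
}

# Each task maps to the set of tasks that must finish before it runs.
DEPENDENCIES = {
    "stage": set(),
    "load_warehouse": {"stage"},
    "quality_check": {"load_warehouse"},
}


def run_pipeline():
    """Run every task in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name](ctx)
    return order, ctx


order, ctx = run_pipeline()
print(order)
```

In Airflow itself, each step would be an operator instance inside a DAG, and the same ordering is usually declared with the `>>` operator between tasks rather than a dependency dictionary.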