About me

  • 📚 I like to read about programming language theory and distributed systems
  • 📺 I enjoy watching weird anime like Serial Experiments Lain and Paranoia Agent
  • 🏋️‍♂️ I work out regularly to stay healthy and strong

Massaki's Projects

blog

Repo that hosts my blog.

building-a-data-warehouse-in-aws

Building an ETL pipeline that extracts data from AWS S3, stages it in AWS Redshift, and transforms it into a set of dimensional tables using a star schema.
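To make the staging-then-transform idea concrete, here is a minimal sketch of a star schema load. It assumes Python's built-in `sqlite3` as a local stand-in for Redshift, and every table and column name (`staging_events`, `dim_users`, `dim_songs`, `fact_songplays`) is hypothetical rather than taken from the project. Raw rows land in a staging table, each dimension table gets one row per distinct entity, and the fact table records one row per event keyed to those dimensions.

```python
import sqlite3

# Minimal star-schema sketch. sqlite3 stands in for Redshift; all table and
# column names below are hypothetical, not taken from the project.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# 1. Staging table: raw events loaded as-is.
cur.execute("""
    CREATE TABLE staging_events (
        user_id INTEGER, user_name TEXT, song_title TEXT, artist TEXT, ts INTEGER
    )""")
cur.executemany(
    "INSERT INTO staging_events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "Song A", "Artist X", 1000),
        (2, "Bo", "Song B", "Artist Y", 1001),
        (1, "Ana", "Song B", "Artist Y", 1002),
    ],
)

# 2. Dimension tables: one row per distinct user / song.
cur.execute("CREATE TABLE dim_users (user_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_songs (song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
cur.execute("""
    INSERT INTO dim_users (user_id, name)
    SELECT DISTINCT user_id, user_name FROM staging_events""")
cur.execute("""
    INSERT INTO dim_songs (title, artist)
    SELECT DISTINCT song_title, artist FROM staging_events""")

# 3. Fact table: one row per event, referencing the dimensions by key.
cur.execute("""
    CREATE TABLE fact_songplays (
        songplay_id INTEGER PRIMARY KEY, ts INTEGER,
        user_id INTEGER REFERENCES dim_users(user_id),
        song_id INTEGER REFERENCES dim_songs(song_id)
    )""")
cur.execute("""
    INSERT INTO fact_songplays (ts, user_id, song_id)
    SELECT e.ts, e.user_id, s.song_id
    FROM staging_events e
    JOIN dim_songs s ON s.title = e.song_title AND s.artist = e.artist""")
conn.commit()

# Analytic query: plays per song, answered by joining the fact to a dimension.
plays_per_song = cur.execute("""
    SELECT s.title, COUNT(*) FROM fact_songplays f
    JOIN dim_songs s ON s.song_id = f.song_id
    GROUP BY s.title ORDER BY s.title""").fetchall()
print(plays_per_song)  # [('Song A', 1), ('Song B', 2)]
```

In Redshift the staging step would typically be a `COPY` from S3 rather than row-by-row inserts, while the `INSERT ... SELECT` transforms carry over largely unchanged.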

data-lake-with-spark

In this project we build an ETL pipeline that extracts data from a data lake hosted on S3, processes it with Spark deployed on an AWS EMR cluster, and loads it back into S3 as a set of dimensional tables in Parquet format.

data-modeling-for-sparkify

In this project, I’ve applied what I’ve learned about data modeling with Postgres and built an ETL pipeline using Python. I’ve defined fact and dimension tables for a star schema with a particular analytic focus, and written an ETL pipeline that transfers data from files in two local directories into these Postgres tables using Python and SQL.

data-modeling-with-apache-cassandra

In this project, I apply data modeling concepts for Apache Cassandra and complete an ETL pipeline using Python. I model the data by creating Apache Cassandra tables that serve specific queries. Part of the ETL pipeline is provided: it combines a set of CSV files in a directory into a single streamlined CSV file, whose rows are then modeled and inserted into the Apache Cassandra tables.
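The "streamlined CSV" step can be sketched with Python's standard `csv` module. This is an illustrative sketch, not the provided pipeline code: the function name `build_event_datafile`, the column names, the sample files, and the rule of skipping rows whose first selected column is empty are all assumptions. Each row of the merged file would then be inserted into Cassandra tables whose primary keys match the queries they serve.

```python
import csv
import glob
import os
import tempfile


def build_event_datafile(input_dir: str, output_path: str, columns: list[str]) -> int:
    """Merge every *.csv file in input_dir into one CSV holding only `columns`.

    Hypothetical helper: rows whose first selected column is empty (for
    example, events with no song played) are skipped. Returns rows written.
    """
    written = 0
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(columns)  # header row
        # sorted() makes the merge order deterministic across runs
        for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    if not row.get(columns[0]):
                        continue
                    writer.writerow([row[c] for c in columns])
                    written += 1
    return written


# Usage with two small, made-up daily event files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "event_data")
    os.makedirs(src)
    samples = {
        "day1.csv": [["artist", "song", "sessionId", "page"],
                     ["Artist X", "Song A", "7", "NextSong"],
                     ["", "", "7", "Logout"]],
        "day2.csv": [["artist", "song", "sessionId", "page"],
                     ["Artist Y", "Song B", "9", "NextSong"]],
    }
    for name, rows in samples.items():
        with open(os.path.join(src, name), "w", newline="") as f:
            csv.writer(f).writerows(rows)

    out_path = os.path.join(tmp, "merged_events.csv")
    n_rows = build_event_datafile(src, out_path, ["artist", "song", "sessionId"])
    with open(out_path, newline="") as f:
        merged = list(csv.reader(f))

print(n_rows)  # 2
print(merged[1:])  # [['Artist X', 'Song A', '7'], ['Artist Y', 'Song B', '9']]
```

Keeping only the columns the queries need, and dropping rows that cannot satisfy them, keeps the later Cassandra inserts simple: one merged file, one pass per target table.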

data-pipelines-with-airflow

Orchestrating data pipelines with Apache Airflow. We create custom operators to perform tasks such as staging the data, filling the data warehouse, and running checks. The tasks are linked together to achieve a coherent and sensible data flow within the pipeline.
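Linking tasks into a coherent flow amounts to declaring a dependency graph and running tasks in an order that respects it. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) instead of Airflow, so it only illustrates the ordering idea; the task names and the `check_not_empty` quality check are hypothetical. In Airflow itself, the same edges would be declared between operator instances, for example with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key runs only after every task in its value set.
# (In Airflow these edges would be declared between operators, e.g. a >> b.)
dependencies = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_songplays_fact": {"stage_events", "stage_songs"},
    "load_user_dim": {"load_songplays_fact"},
    "load_song_dim": {"load_songplays_fact"},
    "run_quality_checks": {"load_user_dim", "load_song_dim"},
}

# static_order() yields every task only after all of its predecessors.
order = list(TopologicalSorter(dependencies).static_order())
print(order)


def check_not_empty(table: str, row_counts: dict[str, int]) -> None:
    """Hypothetical data quality check: fail if a loaded table has no rows."""
    if row_counts.get(table, 0) < 1:
        raise ValueError(f"Data quality check failed: {table} is empty")


check_not_empty("fact_songplays", {"fact_songplays": 3})  # passes silently
```

Running the quality check as the final node means it only fires once every load it depends on has finished, which is the same guarantee an Airflow DAG gives when the check task sits downstream of the loads.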

es-benchmark

A brief benchmark of using Python with Elasticsearch vs. PyO3 with Elasticsearch.
