NYC Zillow Listings with Apache Airflow & AWS

Author: Pharoah Evelyn

Overview

This repository documents a data pipeline that pulls listing data from a Zillow API into AWS using Apache Airflow.

Once the data lands in an S3 bucket, Lambda functions run the transformations and QuickSight handles the data visualization.

Business Problem

A real estate agency needs the latest listing updates for its clientele. It specifically wants to market the best listings and highlight which neighborhoods rank highest based on amenities and other listing details.

Data Preparation

I used RapidAPI to run a web scraper against Zillow for listings within the NYC area.

The API retrieves our sample data. I requested it in JSON format for demonstration purposes, but CSV was also an option.
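As an illustration of such a request, here is a minimal sketch in Python. RapidAPI authenticates calls with the X-RapidAPI-Key and X-RapidAPI-Host headers; the host, endpoint path, and query parameter below are placeholders, not the ones this project used.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url: str, api_key: str, api_host: str, params: dict) -> Request:
    """Build a GET request carrying the RapidAPI authentication headers."""
    url = f"{base_url}?{urlencode(params)}"
    headers = {"X-RapidAPI-Key": api_key, "X-RapidAPI-Host": api_host}
    return Request(url, headers=headers)


def fetch_json(request: Request, timeout: float = 30.0):
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical host, path, and search parameter, for illustration only.
request = build_request(
    "https://example-zillow.p.rapidapi.com/search",
    api_key="YOUR_RAPIDAPI_KEY",
    api_host="example-zillow.p.rapidapi.com",
    params={"location": "New York, NY"},
)
# listings = fetch_json(request)  # needs a valid key and network access
```

Keeping the request construction separate from the network call makes it easy to check the URL and headers without spending API quota.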

Methods Used

I incorporated this API call into an Airflow DAG, which stored the pulled data locally on my EC2 server and then copied it to an S3 bucket.
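A DAG of this shape could look roughly like the sketch below. It is not the project's actual DAG: the DAG id, task ids, file naming, environment variable names, and the use of the AWS CLI for the copy step are all assumptions, and the `schedule` argument assumes Airflow 2.4 or later. Airflow is imported inside `build_dag` so the helper functions can be run without it installed.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import Request, urlopen


def listings_filename(run_time: datetime) -> str:
    """Timestamped file name so each run writes a distinct file (hypothetical scheme)."""
    return f"zillow_listings_{run_time:%Y%m%d_%H%M%S}.json"


def save_listings(payload, out_dir: str, run_time: datetime) -> str:
    """Write the API payload to local disk and return the file path."""
    path = Path(out_dir) / listings_filename(run_time)
    path.write_text(json.dumps(payload))
    return str(path)


def extract_to_local(url: str, out_dir: str) -> str:
    """Call the API and store the JSON locally; the returned path goes to XCom."""
    request = Request(url, headers={
        # Environment variable names are assumptions for this sketch.
        "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
        "X-RapidAPI-Host": os.environ["RAPIDAPI_HOST"],
    })
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return save_listings(payload, out_dir, datetime.now(timezone.utc))


def build_dag(url: str, out_dir: str, bucket: str):
    """Wire the two steps together: extract to local disk, then copy to S3."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="zillow_listings",  # hypothetical id
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(
            task_id="extract_listings",
            python_callable=extract_to_local,
            op_kwargs={"url": url, "out_dir": out_dir},
        )
        # Pull the file path returned by the extract task and copy it with the AWS CLI.
        copy = BashOperator(
            task_id="copy_to_s3",
            bash_command=(
                "aws s3 cp {{ ti.xcom_pull(task_ids='extract_listings') }} "
                "s3://" + bucket + "/"
            ),
        )
        extract >> copy
    return dag
```

In a real DAG file, the result of `build_dag(...)` would be assigned to a module-level variable so the Airflow scheduler can discover it.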

I configured Lambda functions to trigger based on S3 PutObject activities:

  • Function #1 reacts to the raw S3 bucket, transforms that data into Parquet format, and places it into a second, transformed S3 bucket.
  • Function #2 reacts to the transformed S3 bucket, triggering a Glue Crawler to crawl the bucket and catalog the data from all files for visualization.
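The two functions described above might be sketched as follows. This is an illustration under assumptions, not the project's code: the environment variable names are hypothetical, the JSON is assumed to flatten cleanly with pandas, and the Lambda runtime is assumed to bundle pandas with a Parquet engine such as pyarrow. boto3 and pandas are imported inside the handlers so the helper functions can be exercised locally.

```python
import io
import json
import os
from urllib.parse import unquote_plus


def s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification.

    Object keys arrive URL-encoded in the event, so they are decoded here.
    """
    return [
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def parquet_key(json_key: str) -> str:
    """Swap the file extension of the raw key for .parquet."""
    base, _ = os.path.splitext(json_key)
    return base + ".parquet"


def transform_handler(event, context):
    """Function #1: convert each new raw JSON object to Parquet."""
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    target_bucket = os.environ["TRANSFORMED_BUCKET"]  # hypothetical variable name
    for bucket, key in s3_objects(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Assumes the payload flattens into rows; a real payload may need
        # the list of listings selected from it first.
        frame = pd.json_normalize(json.loads(body))
        buffer = io.BytesIO()
        frame.to_parquet(buffer, index=False)
        s3.put_object(Bucket=target_bucket, Key=parquet_key(key), Body=buffer.getvalue())


def crawler_handler(event, context):
    """Function #2: start the Glue Crawler when a Parquet file arrives."""
    import boto3

    glue = boto3.client("glue")
    # start_crawler raises an error if the crawler is already running.
    glue.start_crawler(Name=os.environ["CRAWLER_NAME"])  # hypothetical variable name
```

Writing to a separate transformed bucket, as the project does, keeps Function #1 from being triggered again by its own output.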

Data Visualization

Courtesy of Amazon QuickSight

Ways to improve this project

We could stop managing servers ourselves: instead of running Airflow on an EC2 instance, we could use Amazon Managed Workflows for Apache Airflow (MWAA), a managed service for operating Airflow in the cloud.

Furthermore, we could use Airflow itself to orchestrate the AWS services within DAGs for this scenario, instead of relying on Lambda functions.

Lastly, we could build a web scraper that collects data from every results page of the Zillow search, totaling roughly 20,000 records.
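The paging logic for such a scraper could be sketched like this. The `fetch_page` callable is hypothetical: it stands in for whatever request returns the listings on a given results page, and the sketch assumes an empty page marks the end of the search results.

```python
from typing import Callable, Iterator


def iter_listings(
    fetch_page: Callable[[int], list],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield listings page by page until a page comes back empty."""
    for page in range(1, max_pages + 1):
        listings = fetch_page(page)
        if not listings:
            break
        yield from listings


# Usage with an in-memory stand-in for the real page request.
fake_pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
all_listings = list(iter_listings(lambda page: fake_pages.get(page, [])))
```

The `max_pages` cap guards against looping indefinitely if the source never returns an empty page.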
