ETL-Houseslima-Properati

This project builds a pipeline that extracts data from the Properati web page (Properati is a real estate search site), processes it with AWS Lambda functions and stores it in a Redshift database. The pipeline is orchestrated with AWS Step Functions and scheduled with AWS EventBridge. The project also includes a Flask REST API for interacting with the database. The Flask app lets us retrieve data and is hosted on the AWS Lightsail container service.

The tools used for the project are AWS Lambda, Redshift, AWS Step Functions, AWS EventBridge, Flask, AWS Lightsail, Terraform and Docker.

Project's Architecture

[Architecture diagram: project_arch]

  1. Data is extracted from Properati.
  2. The extracted data is validated, cleaned and uploaded to Redshift.
  3. A Flask REST API is created for the database so we can interact with the data inside our data warehouse.
  4. Users can then analyze the data with any visualization tool they prefer or use the API to develop new solutions.
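As an illustration of step 2, here is a minimal sketch of what validating and cleaning one listing could look like. The field names and types come from the POST parameters of the Flask API described later in this README; the function name `clean_listing` and the coercion rules are assumptions made for the example, not the project's actual Lambda code.

```python
from typing import Any, Optional

# Field names and types taken from the API's POST parameters.
# Everything else in this sketch is a hypothetical illustration.
SCHEMA = {
    "id": int, "type": str, "title": str,
    "bedrooms": int, "bathrooms": int, "price": int, "surface": int,
    "district": str,
    "geo_lon": float, "geo_lat": float,
    "place_lon": float, "place_lat": float,
}


def clean_listing(raw: dict) -> Optional[dict]:
    """Coerce a raw listing to the schema; return None if it is invalid."""
    cleaned: dict = {}
    for name, caster in SCHEMA.items():
        value: Any = raw.get(name)
        if value is None:
            return None  # a required field is missing
        if isinstance(value, str):
            value = value.strip()  # drop stray whitespace from scraped text
        try:
            cleaned[name] = caster(value)
        except (TypeError, ValueError):
            return None  # value cannot be converted to the expected type
    return cleaned
```

Records for which `clean_listing` returns `None` would be dropped before the upload to Redshift; a real pipeline might log or quarantine them instead.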

Project's requirements

The following tools need to be installed locally for the solution to work correctly:

  1. AWS CLI for account configuration and Terraform provisioning.
  2. AWS CLI Lightsail plugin for deploying our containers and pushing the Docker images to the AWS Lightsail containers' repository.
  3. Terraform to provision the infrastructure.
  4. Docker to containerize the Flask REST API app image.

Start Pipeline

To test, go to the root folder and run:

pytest: This runs tests that check that the web page behaves as we expect.

  1. The first test checks that we receive a 200 response, meaning the web page exists and we have access to it.
  2. The second test checks that the limit of elements per page is 30.
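The two checks above could be written roughly as follows. This is a sketch, not the repository's test file: `FakeResponse` and `fetch_page` are hypothetical stand-ins that let the example run offline, and the real tests would request the Properati page over HTTP instead. Reading "limit" as "at most 30 elements" is also an assumption.

```python
from dataclasses import dataclass, field

PAGE_LIMIT = 30  # per this README: the limit of elements per page


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response from the listings page."""
    status_code: int = 200
    listings: list = field(default_factory=list)


def fetch_page(page: int) -> FakeResponse:
    """Hypothetical fetcher; a real test would perform the HTTP request here."""
    return FakeResponse(200, [{"id": i, "page": page} for i in range(PAGE_LIMIT)])


def test_page_is_reachable():
    # A 200 status means the page exists and we have access to it.
    assert fetch_page(1).status_code == 200


def test_page_limit():
    # No page should return more elements than the expected limit.
    assert len(fetch_page(1).listings) <= PAGE_LIMIT
```

pytest collects any function whose name starts with `test_`, so saving this as a file such as `test_properati.py` (a hypothetical name) and running `pytest` would execute both checks.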

To create the pipeline, Terraform provisions everything we need. Clone the repo and run the following commands inside the terraform folder:

  1. aws configure: This command configures the AWS CLI with the access keys of your AWS account.
  2. terraform init: This initializes Terraform in the folder.
  3. terraform apply: This creates our infrastructure. You will be prompted for a Redshift user and password.
  4. (Only run if you want to destroy the infrastructure) terraform destroy: This destroys the created infrastructure.

The pipeline is scheduled hourly, so we can either wait up to an hour for it to run or start the Step Functions state machines manually.
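A manual run can be started from the AWS console, or from a script. The sketch below assumes boto3 is installed and that you know the ARN of the state machine created by Terraform; the function name `start_pipeline`, the `manual-` name prefix and the empty input payload are choices made for this example, not part of the project.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def start_pipeline(sfn_client, state_machine_arn: str,
                   payload: Optional[dict] = None) -> str:
    """Start one execution of the state machine and return its execution ARN."""
    # Execution names must be unique per state machine, so use a timestamp.
    run_name = "manual-" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=run_name,
        input=json.dumps(payload or {}),  # Step Functions expects a JSON string
    )
    return response["executionArn"]


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
# import boto3
# arn = start_pipeline(boto3.client("stepfunctions"), "<your-state-machine-arn>")
```

Passing the client in as an argument keeps the function easy to try with a stub before pointing it at a real AWS account.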

Flask REST API

| Path | Request Type | Parameters |
| --- | --- | --- |
| /properties | GET | No parameters required. This request retrieves all the data from our database. |
| /properties | POST | id(int), type(str), title(str), bedrooms(int), bathrooms(int), price(int), surface(int), district(str), geo_lon(float), geo_lat(float), place_lon(float), place_lat(float) |
| /properties/<int:id> | GET | No parameters required. This request retrieves a specific property from our database by its id. |
  • The Flask API URL can be found in the AWS Lightsail container service.
  • The path URL/swagger-ui shows the documentation of the Flask API.
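To give an idea of how a client might call these endpoints, here is a small sketch that uses only the Python standard library. It assumes the API accepts and returns JSON, which this README does not state; check URL/swagger-ui for the actual request format. The helper names and the base URL placeholder are hypothetical.

```python
import json
import urllib.request


def list_properties_request(base_url: str) -> urllib.request.Request:
    """Build the GET request that retrieves all properties."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/properties", method="GET")


def get_property_request(base_url: str, property_id: int) -> urllib.request.Request:
    """Build the GET request for a single property, looked up by its id."""
    url = f"{base_url.rstrip('/')}/properties/{int(property_id)}"
    return urllib.request.Request(url, method="GET")


def create_property_request(base_url: str, listing: dict) -> urllib.request.Request:
    """Build the POST request that adds a property (JSON body is an assumption)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/properties",
        data=json.dumps(listing).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request):
    """Send a request and decode the response body as JSON."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (replace the placeholder with the URL shown in Lightsail):
# properties = send(list_properties_request("https://<your-lightsail-url>"))
```

Building the requests separately from sending them makes it possible to inspect the URL, method and body before anything goes over the network.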

Contributors

sebasmbk
