
airbyte-airflow-simple-scraper

Overview

This is a guide to setting up an Airbyte API connection orchestrated by Airflow.

  • Airflow orchestration DAG example (a code sketch follows below)

    Dag Example
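
A minimal sketch of such a DAG, using the AirbyteTriggerSyncOperator from the apache-airflow-providers-airbyte package; the dag_id, the Airflow connection name, and the schedule are illustrative placeholders, and the repo's actual DAG may differ:

    # Sketch only: trigger an existing Airbyte connection from Airflow.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator

    with DAG(
        dag_id="airbyte_api_sync",            # placeholder dag_id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        trigger_sync = AirbyteTriggerSyncOperator(
            task_id="trigger_airbyte_sync",
            airbyte_conn_id="airbyte_conn",   # Airflow connection pointing at the Airbyte server
            connection_id="<your-airbyte-connection-id>",  # copied from the Airbyte UI (Step 4)
            asynchronous=False,               # block until the sync finishes
            timeout=3600,
        )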

Setup

Step 1:

  • Create an AWS EC2 instance with the following settings (a scripted alternative is sketched after the list)

    • Name : airbyte
    • Amazon Machine Image (AMI) : Amazon Linux 2 Kernel 5.10 / 64-bit (x86)
    • Instance type : t2.large (8 GB memory)
    • Key pair (login) : Create a key pair for SSH login from your host
    • Network settings : Create a security group and allow SSH traffic from your host IP address
    • Configure storage : 15 GB gp2
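
If you prefer scripting the launch, here is a boto3 sketch of the same configuration; the region, AMI ID, and key-pair name are placeholders you must supply:

    # Sketch: launch the EC2 instance described above with boto3.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    ec2.run_instances(
        ImageId="ami-XXXXXXXX",    # Amazon Linux 2 Kernel 5.10, x86_64 (region-specific)
        InstanceType="t2.large",   # 2 vCPUs / 8 GB memory
        KeyName="airbyte-key",     # placeholder key-pair name
        MinCount=1,
        MaxCount=1,
        BlockDeviceMappings=[{
            "DeviceName": "/dev/xvda",
            "Ebs": {"VolumeSize": 15, "VolumeType": "gp2"},
        }],
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "airbyte"}],
        }],
    )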

Step 2:

  • Edit the security group inbound rules to expose the following ports (a scripted alternative is sketched after the list)

    • 8000 → for the Airbyte webapp
    • 8080 → for the Airflow webapp
    • 5555 → for Flower monitoring
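
The same rules can be added with boto3; the security group ID is the one created in Step 1, and you should restrict CidrIp to your own address rather than opening the ports to the world:

    # Sketch: open the Airbyte (8000), Airflow (8080) and Flower (5555) ports.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

    ec2.authorize_security_group_ingress(
        GroupId="sg-XXXXXXXX",  # security group created in Step 1
        IpPermissions=[
            {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
             "IpRanges": [{"CidrIp": "203.0.113.0/32"}]}  # placeholder: your IP
            for port in (8000, 8080, 5555)
        ],
    )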

Step 3:

  • SSH into your AWS instance and run the following commands to install Docker and clone the GitHub repo

    # Update packages, install Git and Docker, add the current user to the
    # docker group, clone the repo, and apply the new group in this shell
    sudo yum -y update && \
    sudo yum install -y git && \
    sudo yum install -y docker && \
    sudo usermod -a -G docker $USER && \
    git clone https://github.com/ahmedfarag9/airbyte-airflow-simple-scraper.git && \
    cd airbyte-airflow-simple-scraper && \
    newgrp docker
  • Prepare the environment

    bash install.sh
  • Build the Airflow image

    make build-airflow-image
  • Start the Airflow/Airbyte stack

    make start-airflow-airbyte-stack
  • Now your setup is ready!

  • The current terminal session will be used for the log stream

Step 4:

  • Access Airbyte at http://localhost:8000 and set up a connection (when browsing from your own machine, replace localhost with the instance's public IP; the same applies to the Airflow and Flower URLs below)

    Set up a Source

    Airbyte Source

    Set up a Destination

    Airbyte Destination

    Set up a Connection

    Airbyte Connection

    Go to the connections page, choose your connection, and copy the connection ID from the page URL (or fetch it via the API, as sketched below)

    Airbyte Connection Id
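
If you prefer not to copy the ID from the URL, Airbyte's configuration API can list connections. A sketch, assuming the default workspace and that the API is reachable through the webapp on port 8000 (newer Airbyte versions may additionally require the instance's basic-auth credentials):

    # Sketch: list Airbyte connections and their IDs via the config API.
    import requests

    BASE = "http://localhost:8000/api/v1"

    # connections/list is scoped to a workspace, so look one up first
    workspace = requests.post(f"{BASE}/workspaces/list", json={}).json()["workspaces"][0]
    connections = requests.post(
        f"{BASE}/connections/list",
        json={"workspaceId": workspace["workspaceId"]},
    ).json()["connections"]

    for conn in connections:
        print(conn["connectionId"], conn["name"])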

Step 5:

  • Open a new terminal session, SSH into your AWS instance, and run the following commands to set the API connection ID

    cd airbyte-airflow-simple-scraper/ && \
    bash auto_connection.sh
  • Then enter the copied Airbyte connection ID when prompted

Step 6:

  • Access Airflow at http://localhost:8080 and enter the credentials

    • USERNAME=airflow

    • PASSWORD=airflow

    Airflow Webapp

  • Activate the DAG to trigger Airbyte to fetch the API data (a REST API alternative is sketched below)

    Airflow Dag Trigger
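
The same activation and trigger can be done through Airflow's stable REST API. A sketch, assuming basic-auth is enabled for the API (as in the official Airflow docker-compose) and using the placeholder dag_id from the earlier sketch:

    # Sketch: unpause and trigger a DAG via Airflow's stable REST API.
    import requests

    AIRFLOW = "http://localhost:8080/api/v1"
    AUTH = ("airflow", "airflow")  # credentials from this step

    dag_id = "airbyte_api_sync"   # placeholder; use the dag_id shown in your Airflow UI

    # Unpause (activate) the DAG, then queue a run
    requests.patch(f"{AIRFLOW}/dags/{dag_id}", json={"is_paused": False}, auth=AUTH)
    requests.post(f"{AIRFLOW}/dags/{dag_id}/dagRuns", json={}, auth=AUTH)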

  • You will then see the task executed successfully in the Airflow task diagram

    Airflow Task Diagram

  • And the data sync triggered successfully in the Airbyte UI

    Airbyte Data Sync

  • Finally, you can access Flower for Celery task monitoring at http://localhost:5555

    Airflow Flower Monitoring


Commands

  • Stop, then start the Airflow/Airbyte stack again

    • Press Ctrl+C to stop the stack, then run the following command to start it again

      make start-airflow-airbyte-stack
  • Uninstall Airflow/Airbyte stack

    make uninstall-airflow-airbyte-stack
  • Remove containers and restart stack

    make restart-airflow-airbyte-stack
  • Purge then clean install everything

    make purge-then-clean-install
  • Refer to the Makefile for more info


Architecture

  • Airbyte --> data integration engine

    • UI: An easy-to-use graphical interface for interacting with the Airbyte API. (runs on port 8000)

    • Server: Handles connections between the UI and the API. (runs on port 8001)

    • Scheduler: The scheduler takes work requests from the API and sends them to the Temporal service to parallelize.

    • Worker: The worker connects to a source connector, pulls the data and writes it to a destination.

  • Airflow --> workflow management platform

    • Airflow consists of several components:

    • Postgres Database for Metadata --> Contains information about the status of tasks, DAGs, Variables, connections, etc.

    • Scheduler --> Reads from the Metadata database and is responsible for adding the necessary tasks to the queue

    • Executor --> Works closely with the Scheduler to determine what resources are needed to complete the tasks as they are queued

    • Web server --> HTTP Server provides access to DAG/task status information

  • Postgres --> Metadata Database

  • Celery Executor --> The remote executor used to scale out the number of workers.

  • Redis --> Used as a message broker, delivering messages to the Celery workers.

  • Flower --> Celery Monitoring Tool (runs on port 5555)
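
For reference, the official Airflow docker-compose wires these components together with environment variables like the following; the hostnames (postgres, redis) are its service names, and the repo's compose file may differ:

    # Sketch of the Celery-related settings from the official Airflow docker-compose
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
      AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0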
