event-based-processing

Event based ingestion and processing service to handle batch or streaming data.

Prerequisites

Python 3.11 or below. Python 3.12 removed some setup utilities that the Flink packages require.
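A setup script could guard against an unsupported interpreter before installing anything. The sketch below is illustrative only: the function name `is_supported_python` and the constant `MAX_SUPPORTED` are hypothetical and not part of the repo.

```python
import sys

# Highest Python version (major, minor) this README lists as supported.
MAX_SUPPORTED = (3, 11)


def is_supported_python(version=None):
    """Return True when the interpreter's major.minor is 3.11 or below."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    if not is_supported_python():
        major, minor = sys.version_info[:2]
        sys.exit(f"Python {major}.{minor} is too new; use 3.11 or below.")
    print("Python version OK")
```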

The repo uses Flink 1.17. The latest Java version that Flink 1.17 supports is Java 11, so configure your Maven build to target it.

Running in docker-compose

The docker-compose setup contains the following services:

  • flink-taskmanager
  • flink-jobmanager
  • kafka-test-producer
  • debezium
  • kafka-ui
  • kafka
  • pgadmin
  • postgres
  • zookeeper

UI and dashboard links

Setup

Build and run services

Run docker-compose build && docker-compose up.

Add Debezium connector

Add the Debezium connector by running ./data/add-connector.sh (you can also copy and paste its contents into your terminal).
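For orientation, registering a Debezium connector generally means sending a JSON definition to the Kafka Connect REST API. The sketch below shows that pattern in Python. It is not the contents of add-connector.sh: the connector name, the port 8083, the hostnames, the credentials, the topic prefix, and the table list are all assumptions, and `topic.prefix` applies to Debezium 2.x (older releases use `database.server.name`).

```python
import json
import urllib.request

# Assumption: the Kafka Connect (Debezium) REST API is published on localhost:8083.
CONNECT_URL = "http://localhost:8083/connectors"


def build_connector_payload(name="assessments-connector"):
    """Build a Debezium Postgres connector definition (all values are assumptions)."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "postgres",
            "database.port": "5432",
            "database.user": "postgres",
            "database.password": "postgres",
            "database.dbname": "real-estate",
            "topic.prefix": "cdc",
            "table.include.list": "public.assessments",
        },
    }


def register_connector(payload, url=CONNECT_URL):
    """POST the connector definition to Kafka Connect and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    print(register_connector(build_connector_payload()))
```

Check the values against add-connector.sh before relying on them; the script in the repo is the source of truth.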

Create a table in Postgres

Open the Postgres UI; the password is postgres.

Create a new connection to server postgres.

Open ./data/create-assessment-table.sql and run it to create the table and its data.

Create a bucket in Minio

cd data
python3 -m venv .venv-data
source .venv-data/bin/activate
pip install -r requirements.txt
python make_minio_bucket.py

Types of ingestion

Batch processing using Flink and Kafka

[TBD]

Stream processing using Flink and Kafka

Run the Kafka consumer from the kafka-consumer-iceberg folder.

Open the Postgres UI and go to the assessments table in the real-estate database.

Start making changes to the assessments table, and watch Debezium (CDC) pipe the changes into Kafka.
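To make sense of the messages that arrive in Kafka, it helps to know the shape of a Debezium change event: an envelope with an `op` code plus `before` and `after` row images. The sketch below parses such an event with the standard library only. The function name `summarize_change_event` and the sample column names are hypothetical; the real columns come from create-assessment-table.sql.

```python
import json

# Debezium operation codes found in the change event envelope.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def summarize_change_event(raw_value):
    """Return (operation, before, after) from a Debezium change event JSON string."""
    event = json.loads(raw_value)
    # When the JSON converter includes schemas, the envelope sits under "payload".
    envelope = event.get("payload", event)
    operation = OPERATIONS.get(envelope.get("op"), "unknown")
    return operation, envelope.get("before"), envelope.get("after")


# Hypothetical update event; the column names are made up for illustration.
sample = json.dumps({
    "payload": {
        "op": "u",
        "before": {"id": 1, "assessed_value": 100000},
        "after": {"id": 1, "assessed_value": 125000},
        "source": {"db": "real-estate", "table": "assessments"},
    }
})

operation, before, after = summarize_change_event(sample)
print(operation, before, after)
```

Note that `before` is only fully populated for updates and deletes when the table's replica identity allows it, so a consumer should tolerate `None` there.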

Submit a Test Flink Job

Package your jar.

mvn package

Use the Flink UI to upload the jar. Specify the entry point class, for example com.example.KafkaConsumerIceberg.
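As an optional alternative to the UI, Flink's REST API can start an already uploaded jar through `POST /jars/<jar_id>/run`, passing the entry point as `entryClass`. The sketch below only builds that request and does not send it. The base URL (port 8081), the helper name `build_run_request`, and the jar id are assumptions.

```python
import json
import urllib.request

# Assumption: the Flink JobManager REST API/UI is published on localhost:8081.
FLINK_URL = "http://localhost:8081"


def build_run_request(jar_id, entry_class, base_url=FLINK_URL):
    """Build the POST /jars/<jar_id>/run request for an already uploaded jar."""
    body = json.dumps({"entryClass": entry_class}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/jars/{jar_id}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical jar id; real ids are listed by GET /jars after an upload.
request = build_run_request("example_kafka-consumer.jar", "com.example.KafkaConsumerIceberg")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would submit the job; the response body contains the job id on success.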

Run the job. Monitor the flink-jobmanager container logs for errors.
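Scanning the logs by eye gets tedious, so a small filter can help. The sketch below reads recent output through `docker logs` and keeps only lines that look like failures. The container name `flink-jobmanager` is taken from the service list, but docker-compose may prefix it with the project name, so treat it as an assumption; the helper names are hypothetical.

```python
import re
import subprocess

# Matches common failure markers in Java/Flink log output.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def find_error_lines(lines):
    """Return the log lines that look like errors or exceptions."""
    return [line for line in lines if ERROR_PATTERN.search(line)]


def jobmanager_errors(container="flink-jobmanager", tail=500):
    """Read recent container logs via `docker logs` and keep only error lines."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container],
        capture_output=True,
        text=True,
        check=False,
    )
    # Docker forwards the container's stderr separately, so scan both streams.
    return find_error_lines((result.stdout + result.stderr).splitlines())


if __name__ == "__main__":
    for line in jobmanager_errors():
        print(line)
```

Run `docker ps` first to confirm the actual container name, then pass it as the `container` argument.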

Running in k8s

Follow the instructions in k8s/README.md.

Cleanup and Troubleshooting

Sometimes you need to delete the Docker volumes along with the containers:

docker-compose down -v

This clears all stored data and settings, so you will have to set everything up again from scratch.

If that doesn't help, you may have to remove your Docker data or reset Docker Desktop.

Resources

Data Engineering Resources.pdf

EventBased Processing.pdf

Contributors

rickzee, robertheiberger
