spark-kafka-kstreams

Beyond Streaming

Integration of Kafka, KStreams, Spark, Schema Registry, Kafka UI and more

Target Audience

The main focus of this repo is to help everyone who is looking to learn Kafka with KStreams and Spark Streaming. We abstract away the work of setting up the resources and running the different services, and we provide use cases and examples that can be run individually or together.

This repo is organized into the following sections:

  1. Kafka
    • KStreams
    • KsqlDB
    • Kafka Connect
    • Kafka Schema Registry
    • Kafka REST
  2. Spark Streaming
    • Connect KStreams with Spark Streaming
    • Connect Spark Streaming with KStreams with Avro Schema
  3. KAFKA-UI, to easily monitor and manage Kafka

Architecture

[Architecture diagram]

Setting up the Docker containers

To help you use these projects, we bring up the following set of containers:

| Container | Details |
|---|---|
| Zoo-Keeper | ZooKeeper is used by the Kafka brokers to determine which broker is the leader of a given partition and topic, and to perform leader elections. |
| Broker | Kafka broker. A Kafka cluster is a group of multiple Kafka brokers. |
| Schema Registry | Schema Registry provides a centralized repository for managing and validating schemas for topic message data, and for serializing and deserializing that data over the network. |
| Rest Proxy | The Confluent REST Proxy provides a RESTful interface to an Apache Kafka® cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. |
| Spark Master | Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. |
| Spark worker-1 | Spark worker node. |
| Spark worker-2 | Spark worker node. |
| Kafka UI | UI for Apache Kafka is a free, open-source web UI to monitor and manage Apache Kafka clusters. |
| KsqlDB Server | ksqlDB is an event streaming database for Apache Kafka. It is distributed, scalable, reliable, and real-time. |
| KsqlDB CLI | ksqlDB command-line client; it connects to one ksqlDB server per cluster. |
| Datagen | Generates sample data for the exercises. |
| Kafka Connect | Kafka Connect is a free, open-source component of Apache Kafka® that serves as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems. |
| MINIO | MinIO is a high-performance, S3-compatible object store. It is built for large-scale AI/ML, data lake, and database workloads. It runs on-prem and on any cloud (public or private), and from the data center to the edge. |

Setting up properties for running the exercise code

For easy setup, a ready-to-use docker-compose.yaml file brings up all the necessary containers.

All the default configurations are defined in:

src/main/resources/streams.properties
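The exercises read their settings from that file. As a minimal sketch, assuming the file uses simple key=value lines, the Python helper below parses such a file into a dict. It deliberately ignores Java properties features such as line continuations, escapes, and whitespace-only separators, and the keys shown in the usage note are hypothetical, not taken from the repo.

```python
def load_properties(text: str) -> dict[str, str]:
    """Parse simple Java-style `key=value` / `key: value` lines into a dict.

    Limitation (by design): no line continuations, escapes, or
    whitespace-only separators.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (Java properties allow '#' or '!')
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' separator
        for i, ch in enumerate(line):
            if ch in "=:":
                key, value = line[:i].strip(), line[i + 1:].strip()
                break
        else:
            # A key with no separator maps to an empty value
            key, value = line, ""
        props[key] = value
    return props
```

Usage: load_properties(Path("src/main/resources/streams.properties").read_text()) with pathlib.Path returns, for a hypothetical line bootstrap.servers=localhost:9092, an entry mapping "bootstrap.servers" to "localhost:9092". Splitting on the first separator keeps values that themselves contain ':' (such as host:port) intact.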

Running the exercises

Run the following script to start the containers:

sh start_containers.sh

1. Kafka KStreams

This section is cloned from learn-kafka-courses, and we only work in the kafka-streams submodule, whose exercises are run with ./gradlew runStreams -Pargs=

We have extended the above repo to meet our specific requirements and added a few extra modules.
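Kafka Streams itself is a Java library. To show what a simple stateless KStreams topology does to each record, here is a plain-Python illustration, not the Kafka Streams API: filter_records and map_values are hypothetical stand-ins for KStream#filter and KStream#mapValues, and the "order-<amount>" value format is an assumption made for the example.

```python
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (key, value), both strings for simplicity


def filter_records(records: Iterable[Record],
                   predicate: Callable[[str, str], bool]) -> Iterator[Record]:
    """Like KStream#filter: keep records whose (key, value) match the predicate."""
    for key, value in records:
        if predicate(key, value):
            yield key, value


def map_values(records: Iterable[Record],
               mapper: Callable[[str], str]) -> Iterator[Record]:
    """Like KStream#mapValues: transform the value, leave the key untouched."""
    for key, value in records:
        yield key, mapper(value)


PREFIX = "order-"  # hypothetical value format: "order-<amount>"


def topology(records: Iterable[Record]) -> list[Record]:
    """filter -> mapValues -> filter, applied lazily record by record."""
    stream = filter_records(records, lambda k, v: v.startswith(PREFIX))
    stream = map_values(stream, lambda v: v[len(PREFIX):])
    stream = filter_records(stream, lambda k, v: int(v) > 1000)
    return list(stream)
```

For example, topology([("a", "order-500"), ("b", "order-1500"), ("c", "junk")]) returns [("b", "1500")]. Because mapValues never touches the key, Kafka Streams can use it without forcing a repartition before a later grouping, which is why it is preferred over map when the key does not change.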

2. Spark Streaming

We have multiple examples and use cases with Spark Streaming.

Sub Projects
spark-python
spark-scala
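The specific use cases in these sub-projects are not described here. One pattern common in Spark Structured Streaming is counting events per tumbling window: Spark's window() function assigns each event to a window whose start is inclusive and whose end is exclusive. The plain-Python sketch below mimics that assignment for non-negative millisecond timestamps, assuming windows aligned to the epoch (no startTime offset). It illustrates the semantics only and is not PySpark code.

```python
from collections import Counter
from typing import Iterable


def tumbling_window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the tumbling window [start, start + size_ms) containing ts_ms.

    Assumes ts_ms >= 0 and windows aligned to epoch 0.
    """
    return ts_ms - (ts_ms % size_ms)


def count_per_window(timestamps_ms: Iterable[int], size_ms: int) -> dict[int, int]:
    """Count events per window, keyed by window start, in ascending order."""
    counts = Counter(tumbling_window_start(ts, size_ms) for ts in timestamps_ms)
    return dict(sorted(counts.items()))
```

For example, with 5-second windows, count_per_window([0, 4_999, 5_000, 12_000], 5_000) returns {0: 2, 5000: 1, 10000: 1}: an event at exactly 5000 ms falls into the window starting at 5000, not the one ending there.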

3. KStreams with Schema Registry

The registered schemas can be viewed in KAFKA-UI. More details to follow.
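Schemas are registered under a subject; with the default TopicNameStrategy, the value schema for a topic named orders lives under the subject orders-value. As a sketch, assuming Schema Registry is reachable at http://localhost:8081 (its default port; the mapping in docker-compose.yaml may differ) and using a hypothetical helper name, this builds the registration request with the standard library without sending it. Note that the "schema" field carries the Avro schema as a JSON string, not as a nested object.

```python
import json
import urllib.request

# Assumption: default Schema Registry port exposed on localhost
SCHEMA_REGISTRY_URL = "http://localhost:8081"


def register_schema_request(topic: str, avro_schema: dict) -> urllib.request.Request:
    """Build (but do not send) a POST registering a value schema for `topic`."""
    subject = f"{topic}-value"  # default TopicNameStrategy subject for values
    # The API expects the schema serialized as a string inside the JSON body
    body = json.dumps({"schema": json.dumps(avro_schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{SCHEMA_REGISTRY_URL}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )
```

To actually send it while the containers are running, pass the request to urllib.request.urlopen(); the response body contains the ID assigned to the schema.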

4. Kafka REST API

List of Commands
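As a sketch of one such command, assuming the REST Proxy listens on http://localhost:8082 (its default port) and using its v2 API, producing JSON records to a topic is a POST to /topics/<topic> with the application/vnd.kafka.json.v2+json content type. The helper name and topic below are hypothetical; the request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Iterable, Tuple

# Assumption: default Confluent REST Proxy port exposed on localhost
REST_PROXY_URL = "http://localhost:8082"


def produce_json_request(topic: str,
                         records: Iterable[Tuple[Any, Any]]) -> urllib.request.Request:
    """Build (but do not send) a REST Proxy v2 request producing JSON records."""
    body = json.dumps(
        {"records": [{"key": k, "value": v} for k, v in records]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{REST_PROXY_URL}/topics/{topic}",
        data=body,
        headers={
            # Embedded format (json) and API version (v2) of the request body
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )
```

Sending it with urllib.request.urlopen() returns the partition and offset of each produced record.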

5. Connect KStreams with Spark Streaming

We have multiple examples and use cases with Spark Streaming.

Sub Projects
spark-scala-enrich-topic
spark-scala-functions

6. Connect Spark Streaming with KStreams

Sub Projects
spark-scala-enrich-topic
spark-scala-functions
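When KStreams and Spark exchange Avro data through Schema Registry, each message uses the Confluent wire format: a zero magic byte, a 4-byte big-endian schema ID, and then the Avro-encoded payload. Spark's built-in from_avro expects plain Avro bytes, so the Spark side typically has to skip that 5-byte header first. A minimal standard-library sketch of the framing follows; the function names are hypothetical.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # big-endian: 1-byte magic + 4-byte schema ID


def encode_confluent(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the 5-byte Confluent header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + avro_payload


def decode_confluent(message: bytes) -> tuple[int, bytes]:
    """Split a framed message into (schema_id, raw Avro bytes)."""
    if len(message) < HEADER.size:
        raise ValueError("message too short for Confluent wire format")
    magic, schema_id = HEADER.unpack(message[:HEADER.size])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, message[HEADER.size:]
```

For example, encode_confluent(7, b"\x02hi") produces b"\x00\x00\x00\x00\x07\x02hi", and decoding that returns (7, b"\x02hi"). The raw Avro bytes returned by decode_confluent are what a plain Avro reader, such as Spark's from_avro, expects.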

7. KAFKA-UI

This project helps Kafka developers manage their clusters through a UI. With KAFKA-UI, you can use Kafka's maintenance and monitoring capabilities at the click of a button.

Contributors

ajithshetty