Giter Club home page Giter Club logo

kafka_demo's Introduction

Kafka Knowledge Base

This KB is intended to introduce the reader to the fundamental concepts of Kafka. Included in the Repo are a few demos on running kafka clusters as well as references pulled from various usefull articles found on the web.

Table of contents

Demos

Linux local server demo

Docker + Kafkajs (Node) demo

Docker multi network demo (Coming Soon)

References

Articles

Confluent: How to choose the number of partitions in a kafka cluster

StackOverflow : How to create a new consumer group in kafka

StackOverflow : Kafka server configuration - listeners vs. advertised.listeners

Confuent : Kafka Listeners – Explained

Conduktor: Producer Default Partitioner & Sticky Partitioner

Confluent: Top 5 Things Every Apache Kafka Developer Should Know

Books

O'Reilly: Kafka, The Definitive Guide - Neha Narkhede, Gwen Shapira & Todd Palino

Official Documentation

Apache Foundation: Kafka Documentation

Apache Foundation: Kafka Protocole Guide

Docker : Networking overview

Key Concepts / Glossary

  • BackPressure :
  • Big Data Ingestion :
  • Brokers : Brokers are kafka’s servers. They receive from producers and hand out data to consumers.
  • Cluster Controllers : A designated broker in a kafka cluster that reads/writes metadata to/from Zookeper and is responsible for assigning leadership across the brokers for each partition.
  • Clusters : A collection of kafka brokers.
  • Consumer Groups : A collection of consumers that are assigned to reading the same topic, they are assigned the same group-id. Kafka ensures that each partition of a topic is consumed by only one member of the group, a message is only read once by a single member of a consumer group.

    This process allows parallel consumption of a topic, enabling higher throughput, however the maximum benefit is limited to the number of partitions on a topic. You can have more consumers than partitions on a topics (N+1) where a consumer is idle to implement a hot failover (in case a consumer fails another is ready to take its place)
  • Disk Based Retention :
  • ETL / CDC :
  • Events : An event is a fact that happened in the past, they are representation of something that
  • Fault Tolerance : happened in a system as an array of bytes.
  • Java Message System :
  • Leaders :
  • Offsets: sequential identifiers that are unique to each record in the partition. When a new record is written to the end of the log file (partition )it is assigned a new offset that is incremented. They are useful for consumers who need to know where to pick up from the last time they read from the partition. Messages within the partition are always ordered but there is no guarantee of ordering ACROSS partitions
  • Partition : Partitions of topics hold a subset of records owned by a topic. Each partition is single log file that holds records in an append only fashion
    .
    A rough formula for determining the number of partitions needed is max(t/p ,t/c) where t is target throughput, p is producer throughput (on a single partition) and c is consumer throughput (on a single partition). One has to decide on a trade off between speed and reliability when it come to the number of partitions on a broker as in the case of failure, having too many partitions on a broker will slow down the process of electing a new partition leader (which are the only ones that can be written to from the producers) Partition keys: The key that indicates which partition a message will be written to. When producers send a message to a topic they use information from the application context to derive a key, using a hash function, this could be a userID or a deviceID. If the input that decides the partition key is not well distributed (1 clientID generating a disproportionate amount of records), a solution would be to let kafka decide the partition assignment using round robin.

    Using hashed keys may be necessary if you need messages coming from the same user/device to be consumer in the order that they were produced.
  • Redundancy : Each partition has replicas across a cluster’s broker which are continuously updating and “catching” to the partition leader. This allows for failover to the follower partitions in case the leader suffers a crash.
  • Replication :
  • Retention Policy :
  • Scalability : Kafka distributes partitions across its cluster’s brokers which leads to higher throughput, a topic can be read by multiple consumers concurrently because they don’t all have to read from the same partition.
  • Streams : Streams are related events that flow in sequence.
  • Topics : Topics group related events and store them. They are similar to tables in databases and folders in file systems. In * Kafka they are used to decouple the producers and consumers of data. Producers will "POST" while consumers will "GET" from a topic without having to be directly connected between themselves, they reach out to the kafka brokers (specifically the leaders of a topic) for such ends

kafka_demo's People

Contributors

karimwitani avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.