Giter Club home page Giter Club logo

hello-kafka-streams's Introduction

This is an equivalent of hello-samza project which transforms wikipedia irc channel events into wikipdia-raw topic and does some basic stateful aggregation of stats over this record stream. It uses only Apache Kafka components, namely Kafka Connect and Kafka Streams.

You can find a walk-through tutorial for this demo and more background about the APIs used in this post at Confluent's blog.

Quick Start

Prerequisites

The demo uses Java 1.8 features so you'll need the correct jdk to run it.

Some of the features of Kafka used in this demo are only available since the 0.10.x release.

Setup local environment

The master branch of this demo uses 0.10.x features of Apache Kafka so all you need to do is clone and install kafka 0.10.0.0 into your local maven (you'll need to have gradle command already installed):

$ git clone https://github.com/apache/kafka.git $KAFKA_HOME
$ cd $KAFKA_HOME
$ git checkout 0.10.0.0
$ gradle
$ ./gradlew -PscalaVersion=2.11 jar 
$
$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELLOW:
$ export SCALA_VERSION="2.11.8"; export SCALA_BINARY_VERSION="2.11";
$ ./bin/zookeeper-server-start.sh ./config/zookeeper.properties &
$ ./bin/kafka-server-start.sh ./config/server.properties

Initialize topics: if you're already running a local Zookeeper and Kafka and you have topic auto-create enabled on the broker you can skip the following setup, just note that if your default partitions number is 1 you will only be able to run a single instance demo.

$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELLOW:
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic wikipedia-raw --replication-factor 1 --partitions 4
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic wikipedia-parsed --replication-factor 1 --partitions 4
$ ./bin/kafka-topics.sh --zookeeper localhost --create --topic wikipedia-streams-wikipedia-edits-by-user-changelog \
                        --replication-factor 1 --partitions 4

Build and start the executable demo app

In another terminal cd into the directory where you have cloned the hello-kafka-streams project and use provided gradle wrapper to create executable jar.

$ ./gradlew build

Use the following command to start an instance of the integrated demo topology.

./build/scripts/hello-kafka-streams          # hello-kafka-streams.bat IF YOU ARE USING WINDOWS

If you have created topics more than 1 partition you can run the command above again in another terminal to see how the re-balance mechanism will scale both the embedded connect workers as well as the streams processors.

You should see changelog for the each user's cummulative number of edits being printed into the standard out.

Digging in

For inspecting the underlying intermediate topics, in yet another terminal cd into kafka home directory and run some console consumer commands:

$ cd $KAFKA_HOME
$ # IF YOU ARE USING WINDOWS, USE `.bat` IN PLACE OF `.sh` FOR THE LAUNCH SCRIPTS BELOW:
$ ./bin/kafka-console-consumer.sh --zookeeper localhost --topic wikipedia-raw
$ ./bin/kafka-console-consumer.sh --zookeeper localhost --topic wikipedia-parsed

See WikipediaStreamDemo.java for more details about the actual Topology.

hello-kafka-streams's People

Contributors

michal-harish avatar ewencp avatar mmerdes avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.