
Apache Kafka on Kubernetes

This project deploys an Apache Kafka 0.10.0.1 cluster on Kubernetes, along with Zookeeper 3.4.6. The code borrows heavily from Sylvain Hellegouarch's post, which is well worth reading. However, there are a couple of hacks, listed below under Known Issues.

To launch the Kafka cluster, you need a working Kubernetes cluster and the kubectl CLI tool in your path. First, create a kafka-cluster namespace on Kubernetes and set it as the current context:

$ kubectl create -f namespace-kafka.yaml
$ kubectl config set-context kafka --namespace=kafka-cluster --cluster=${CLUSTER_NAME} --user=${USER_NAME}
$ kubectl config use-context kafka
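The namespace manifest itself is tiny. A minimal sketch of what namespace-kafka.yaml presumably contains (the name must match the --namespace flag above):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: kafka-cluster
```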

Because Kafka depends on Zookeeper for broker coordination and cluster metadata, we need to deploy Zookeeper before we create the Kafka cluster. Following Sylvain's post, we launch three distinct deployments so that each instance can be assigned its own Zookeeper ID and given the full list of servers in the ensemble. As of now, we are unable to use a single deployment with three Zookeeper instances. Even worse, we need to use three distinct services as well.

$ kubectl create -f zookeeper-services.yaml
$ kubectl create -f zookeeper-cluster.yaml
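The reason three separate deployments are needed is that each instance must know its own ID and its peers. A hedged sketch of the env section for instance 1, following the conventions of the digitalwonderland/zookeeper image used in Sylvain's post (variable and service names are illustrative; check the actual zookeeper-cluster.yaml):

```yaml
env:
- name: ZOOKEEPER_ID        # this instance's myid; the other two deployments use "2" and "3"
  value: "1"
- name: ZOOKEEPER_SERVER_1  # one entry per Zookeeper service, same list in all three deployments
  value: zoo1
- name: ZOOKEEPER_SERVER_2
  value: zoo2
- name: ZOOKEEPER_SERVER_3
  value: zoo3
```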

After the Zookeeper cluster is launched, check that all three pods are Running.

$ kubectl get pods
NAME                           READY     STATUS    RESTARTS   AGE
zookeeper-deployment-1-dbauf   1/1       Running   0          2h
zookeeper-deployment-2-mp6nb   1/1       Running   0          2h
zookeeper-deployment-3-26ere   1/1       Running   0          2h

One of the instances should be LEADING, while the other two should be FOLLOWING. To check, look at the pod logs:

$ kubectl logs zookeeper-deployment-1-dbauf
...
2016-10-06 14:04:05,904 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@358] - LEADING - LEADER ELECTION TOOK - 2613

Next, let's create the Kafka service behind a load balancer.

$ kubectl create -f kafka-service.yaml

At this point, we need to set KAFKA_ADVERTISED_HOST_NAME in kafka-cluster.yaml before deploying the Kafka brokers. You can use either a custom DNS name or one generated by your cloud provider. To see the generated DNS name (on AWS, at least), type:

$ kubectl describe service kafka-service
Name:              kafka-service
...
LoadBalancer Ingress:      xxxxxx.us-east-1.elb.amazonaws.com
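Rather than eyeballing the describe output, you could pull the hostname out of the service's JSON (kubectl get service kafka-service -o json). A minimal Python sketch of the extraction logic, shown here against a hard-coded sample rather than a live cluster:

```python
import json

def elb_hostname(service_json: str) -> str:
    """Return the first load balancer ingress hostname (or IP) from a Service's JSON."""
    svc = json.loads(service_json)
    ingress = svc["status"]["loadBalancer"]["ingress"][0]
    return ingress.get("hostname") or ingress.get("ip")

# Sample shaped like `kubectl get service kafka-service -o json` output.
sample = json.dumps({
    "status": {"loadBalancer": {"ingress": [
        {"hostname": "xxxxxx.us-east-1.elb.amazonaws.com"}
    ]}}
})
print(elb_hostname(sample))  # -> xxxxxx.us-east-1.elb.amazonaws.com
```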

After you copy/paste the entry next to LoadBalancer Ingress into KAFKA_ADVERTISED_HOST_NAME in kafka-cluster.yaml, we are finally ready to create the Kafka cluster:

$ kubectl create -f kafka-cluster.yaml
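For reference, a hedged sketch of the broker environment that kafka-cluster.yaml likely wires up (variable names follow common Kafka Docker image conventions, and the Zookeeper service names are illustrative; check the actual manifest):

```yaml
env:
- name: KAFKA_ADVERTISED_HOST_NAME
  value: xxxxxx.us-east-1.elb.amazonaws.com   # the LoadBalancer Ingress hostname from above
- name: KAFKA_ADVERTISED_PORT
  value: "9092"
- name: KAFKA_ZOOKEEPER_CONNECT               # comma-separated list of the three Zookeeper services
  value: zoo1:2181,zoo2:2181,zoo3:2181
```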

If you wish to create a topic at deployment time, add a KAFKA_CREATE_TOPICS key to the environment variables with the topic name. For instance, the following environment variable creates the topic ramhiser with 2 partitions and a replication factor of 1:

env:
- name: KAFKA_CREATE_TOPICS
value: ramhiser:2:1
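The value follows the topic:partitions:replicas convention used by common Kafka Docker images (multiple topics can typically be comma-separated, e.g. ramhiser:2:1,logs:4:2). A small Python sketch that parses the format, just to make the fields explicit:

```python
from typing import NamedTuple

class TopicSpec(NamedTuple):
    name: str
    partitions: int
    replicas: int

def parse_create_topics(value: str) -> list:
    """Parse a KAFKA_CREATE_TOPICS value like 'ramhiser:2:1,logs:4:2'."""
    specs = []
    for entry in value.split(","):
        name, partitions, replicas = entry.strip().split(":")
        specs.append(TopicSpec(name, int(partitions), int(replicas)))
    return specs

print(parse_create_topics("ramhiser:2:1"))
# [TopicSpec(name='ramhiser', partitions=2, replicas=1)]
```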

Congrats! You now have a working Kafka cluster running on Kubernetes. A useful way to test your setup is with kafkacat. Once it is installed, you can pipe a local log file to Kafka using the load balancer hostname from above:

cat /var/log/system.log | kafkacat -P -b xxxxxx.us-east-1.elb.amazonaws.com:9092 -t ramhiser

And then to consume the same log, type:

kafkacat -C -b xxxxxx.us-east-1.elb.amazonaws.com:9092 -t ramhiser

Known Issues

  • The AWS ELB DNS name must be hardcoded in kafka-cluster.yaml
  • Zookeeper instances do not use a single replication controller
  • Scaling the number of instances via Kubernetes does not automatically replicate data to new brokers

kafka-kubernetes's People

Contributors

adrianmkng, ramhiser


kafka-kubernetes's Issues

Error connecting to node

Hi,
I've cloned your repo and changed the DNS name to kubernetes-master. The pods are running fine, but Kafka is generating the errors below:

[2021-03-30 08:46:32,929] WARN [Controller id=1001, targetBrokerId=1001] Error connecting to node kubernetes-master:9092 (id: 1001 rack: null) (org.apache.kafka.clients.NetworkClient)
java.net.UnknownHostException: kubernetes-master
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:110)
    at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:501)
    at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:458)
    at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:169)
    at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:1002)
    at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:311)
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:65)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:292)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:246)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)

Can you please help me out?

Update Replication Controllers to Deployments

In both Zookeeper and Kafka, I'm running replication controllers instead of deployments.

Why the update? From the Kubernetes docs...

Replica Set is the next-generation Replication Controller.
...
we recommend using Deployments instead of directly using Replica Sets, unless you require custom update orchestration or don’t require updates at all.
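A hedged sketch of what the migration might look like for one Zookeeper instance: a Deployment wraps the same pod template the replication controller already uses (labels, names, and the image are illustrative, and this uses the modern apps/v1 API group):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zookeeper-1
  template:
    metadata:
      labels:
        app: zookeeper-1
    spec:
      containers:
      - name: zookeeper
        image: digitalwonderland/zookeeper   # illustrative; reuse the image from the existing RC
```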

Remove LoadBalancer from Kafka cluster

Currently, a DNS name must be hardcoded in kafka-cluster.yaml; I'm getting that name from kafka-service after it's deployed, via kubectl get service kafka-service.

By removing the load balancer, there's no concern about exposing the cluster to the outside world (unless the use case warrants it), and there's no need to hardcode the DNS host.
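One way to do that (a sketch, not tested against this repo) is to drop type: LoadBalancer so the service defaults to ClusterIP, which gives the brokers a stable in-cluster DNS name that could serve as KAFKA_ADVERTISED_HOST_NAME for in-cluster clients:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: kafka-service
spec:
  # No `type: LoadBalancer`: defaults to ClusterIP, reachable only inside the
  # cluster at kafka-service.kafka-cluster.svc.cluster.local.
  ports:
  - port: 9092
  selector:
    app: kafka   # illustrative label; must match the Kafka pod template's labels
```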
