Kafka benchmarking

Scripts to run a performance test against a Kafka cluster.
They use the kafka-producer-perf-test/kafka-consumer-perf-test scripts, which are included in Kafka deployments.
In the open-source Kafka tgz these scripts have a .sh suffix, whereas in the Confluent distribution they don't; just be aware of that.

  • benchmark-producer.sh : execute one producer benchmark run with a dedicated set of properties
  • benchmark-consumer.sh : execute one consumer benchmark run with a dedicated set of properties
  • benchmark-suite-producer.sh : wrapper around benchmark-producer.sh to execute multiple benchmark runs with varying property settings

If the parameter --output-to-file is specified, the output of the benchmark execution is stored in a .txt file in the same directory as the benchmark-*.sh scripts. Repeating benchmark executions with the same properties will append the output to the existing output file.
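
For example, a producer benchmark run that persists its output to such a .txt file could look like this (a minimal sketch; --output-to-file is the parameter described above, the other values are illustrative):

./benchmark-producer.sh --bootstrap-servers localhost:9091 --topic bench-topic --output-to-file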

Prerequisites

  • a running Kafka cluster
  • a host which has the Kafka client tools installed (scripts kafka-topics, kafka-producer-perf-test, kafka-consumer-perf-test, ...)
  • the tools readlink and tee installed
  • ensure that your Kafka cluster has enough free disk space to hold the data created during the benchmark run(s)
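
A quick way to verify the required tools on the benchmark host (a minimal sketch; use the .sh-suffixed script names if you run the open-source Kafka tgz):

command -v kafka-topics kafka-producer-perf-test kafka-consumer-perf-test readlink tee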

Single benchmark execution

Producer benchmark

NOTE

The scripts for running the producer benchmark(s) can be found in the directory producer/scripts.

A single producer benchmark can be executed by calling benchmark-producer.sh directly with command-line parameters. You can set environment variables pointing to the executables of the kafka-topics command as well as the kafka-producer-perf-test command, as shown below.

  • Environment variables (see the export example after this list)

    | variable | description | example |
    |---|---|---|
    | KAFKA_TOPICS_CMD | how to execute the kafka-topics command (full or relative path) | /opt/kafka/bin/kafka-topics.sh |
    | KAFKA_BENCHMARK_CMD | how to execute the kafka-producer-perf-test command (full or relative path) | /opt/kafka/bin/kafka-producer-perf-test.sh |
  • Parameters are:

    | parameter | description | default |
    |---|---|---|
    | -p, --partitions <number> | <number> is an int telling how many partitions the benchmark topic shall have. Only required if you want the script to manage the topic for the benchmark. | 2 |
    | -r, --replicas <number> | <number> is an int telling how many replicas the benchmark topic shall have. Only required if you want the script to manage the topic for the benchmark. | 2 |
    | --num-records <number> | <number> specifies how many messages shall be created during the benchmark | 100000 |
    | --record-size <number> | <number> specifies how big (in bytes) each record shall be | 1024 |
    | --producer-props <string> | list of additional properties for the benchmark execution, e.g. acks, linger.ms, ... | 'acks=1 compression.type=none' |
    | --bootstrap-servers <string> | comma-separated list of <host>:<port> of your Kafka brokers. This parameter is mandatory. | |
    | --throughput <string> | specifies the throughput limit to use during the benchmark run | -1 |
    | --topic <string> | specifies the topic to use for the benchmark execution. This topic must exist before you execute this script. | |
    | --producer-config <config-file> | config file providing additional attributes to connect to the broker(s), mainly SSL & authentication | |
  • Output

    The script generates 2 output files

    • .txt: one file per execution, containing all messages printed during the performance test run
    • .csv: one file per day, containing the raw metrics (comma separated) plus comment lines with the start/finish time and the parameters of the test execution
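
Setting the environment variables before a run could look like this (a minimal sketch; the /opt/kafka paths are the example values from the table above and must match your installation):

export KAFKA_TOPICS_CMD=/opt/kafka/bin/kafka-topics.sh
export KAFKA_BENCHMARK_CMD=/opt/kafka/bin/kafka-producer-perf-test.sh
./benchmark-producer.sh --bootstrap-servers localhost:9091 --topic bench-topic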

NOTE

Topic management for the execution:
If you don't have a pre-existing topic and you don't want to manage the topic yourself, just omit the --topic parameter; the script will then create a topic before the benchmark run and delete it afterwards.
If you already have a topic you want to use for the benchmark execution, provide its name via --topic.


Usage examples

  • run benchmark with minimal parameters, use the existing topic bench-topic on local Kafka broker with port 9091:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --topic bench-topic
  • run benchmark with minimal parameters, let the script manage the topic on local Kafka broker with port 9091:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --partitions 5 --replicas 2
  • tuning for Throughput:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --partitions 3 --replicas 2 --record-size 1024 --num-records 200000 --producer-props 'acks=1 compression.type=lz4 batch.size=100000 linger.ms=50'
  • tuning for Latency:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --partitions 3 --replicas 2 --record-size 1024 --num-records 200000 --producer-props 'acks=1 compression.type=lz4 batch.size=10000 linger.ms=0'
  • tuning for Durability:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --partitions 3 --replicas 2 --record-size 1024 --num-records 200000 --producer-props 'acks=all compression.type=lz4 batch.size=10000 linger.ms=0'
  • run benchmark with minimal parameters, let the script manage the topic on the local Kafka broker with port 9091, and provide a producer config including SASL_PLAINTEXT info:

    ./benchmark-producer.sh --bootstrap-servers localhost:9091 --partitions 5 --replicas 2 --producer-config ./sample-producer-sasl.config

    where the content of sample-producer-sasl.config for SASL/PLAIN authentication (username + password) can be:

    security.protocol=SASL_PLAINTEXT
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="admin" password="admin-secret";
    

Consumer benchmark

NOTE

The scripts for running the consumer benchmark(s) can be found in the directory consumer/scripts.

A single consumer benchmark can be executed by calling benchmark-consumer.sh directly with command-line parameters.
Parameters are:

| parameter | description | default |
|---|---|---|
| --topic <string> | specifies the topic to use for the benchmark execution. This parameter is mandatory. | |
| --bootstrap-server <string> | comma-separated list of <host>:<port> of your Kafka brokers | localhost:9091 |
| --messages <number> | <number> specifies how many messages shall be consumed during the benchmark | 10000 |
| --fetch-max-wait-ms <number> | specifies how long the server waits to collect fetch-min-bytes before returning to the client (official doc) | 500 |
| --fetch-min-bytes <number> | the minimum amount of data the server should return for a fetch request; a value of 1 means do not wait and send data as soon as there is some (official doc) | 1 |
| --fetch-size | the amount of data to fetch in a single request | 1048576 |
| --enable-auto-commit <boolean> | if true, the consumer's offset will be periodically committed in the background | true |
| --isolation-level <string> | specifies how transactional messages are being read (official doc) | read_uncommitted |
| --consumer-config <config-file> | config file providing additional attributes to connect to the broker(s), mainly SSL & authentication | |
| --group-id <string> | the consumer group id for the consumer; important if you have ACLs enabled, to grant permissions for this group as well | consumer-benchmark |
| --verbose | if specified, additional text output will be printed to the terminal | |

Usage examples

  • run consumer benchmark with minimal arguments, using the Kafka broker on localhost:9091 and the topic name my-benchmark-topic:

    ./benchmark-consumer.sh --topic my-benchmark-topic

  • providing a config file containing e.g. properties for SASL authentication (as shown in sasl-properties.config):

    ./benchmark-consumer.sh --topic my-benchmark-topic --consumer-config ./sasl-properties.config

  • same as above, but with some more output on the commandline:

    ./benchmark-consumer.sh --topic my-benchmark-topic --consumer-config ./sasl-properties.config --verbose

  • tuning for Throughput:

    ./benchmark-consumer.sh --topic my-benchmark-topic --fetch-min-bytes 100000

  • tuning for Latency:

    ./benchmark-consumer.sh --topic my-benchmark-topic --fetch-size 25000 --fetch-max-wait-ms 100
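
  • combining parameters, e.g. consuming a larger number of messages from a remote broker with a dedicated consumer group (a sketch using parameters from the table above; host, port and counts are placeholders):

    ./benchmark-consumer.sh --bootstrap-server 000.111.222.333:9091 --topic my-benchmark-topic --messages 1000000 --group-id my-benchmark-group --verbose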

Batch of benchmark executions

The script benchmark-suite-producer.sh is just a wrapper around benchmark-producer.sh to run a variety of performance test runs against your Kafka cluster. It loops over the properties you want to change between test runs and calls benchmark-producer.sh once for each combination of properties.
The properties that can be iterated over are defined in benchmark-suite-producer.sh, in the variables section.

The only mandatory parameter is --bootstrap-servers, the bootstrap server(s) to connect to as a comma-separated list: --bootstrap-servers <host>:<port>

Parameters to adjust for your use case, in benchmark-suite-producer.sh, are (a sketch of a configured run follows the table):

| Parameter | Description | Example |
|---|---|---|
| PARTITIONS | space-separated list of partition counts for the benchmark topic | PARTITIONS="2 10" |
| REPLICAS | space-separated list of replica counts for the benchmark topic | REPLICAS="2" |
| NUM_RECORDS | space-separated list of numbers of records to produce | NUM_RECORDS="100000" |
| RECORD_SIZES | space-separated list of record sizes | RECORD_SIZES="1024 10240" |
| THROUGHPUT | space-separated list of desired throughput limits; "-1" means no limit, full speed | THROUGHPUT="-1" |
| ACKS | space-separated list of values for the producer property "acks"; valid values are "0 1 -1" | ACKS="0 -1" |
| COMPRESSION | compression type to use, e.g. "none", "lz4", ... | COMPRESSION="none" |
| LINGER_MS | space-separated list of desired values for the "linger.ms" Kafka property | LINGER_MS="0" |
| BATCH_SIZE | space-separated list of desired values for the "batch.size" Kafka property; "0" disables batching | BATCH_SIZE="10000" |
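
For illustration, a configured suite run could look like this (a minimal sketch, assuming the variables are edited directly inside benchmark-suite-producer.sh as described above; the values shown are examples, not defaults):

# inside benchmark-suite-producer.sh, section "variables"
PARTITIONS="2 10"
REPLICAS="2"
NUM_RECORDS="100000"
RECORD_SIZES="1024 10240"
THROUGHPUT="-1"
ACKS="0 -1"
COMPRESSION="lz4"
LINGER_MS="0 50"
BATCH_SIZE="10000 100000"

# then start the suite; only --bootstrap-servers is mandatory
./benchmark-suite-producer.sh --bootstrap-servers localhost:9091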

Docker

To be able to run the benchmarks within a container, you can use the provided Dockerfile to build such a container.
Alternatively, you'll find a prebuilt Docker image (gkoenig/kafka-producer-benchmark) on Docker Hub.

NOTE

  • you have to mount a local directory to the container path /tmp/output so that you can save the output files from the benchmark run on your local workstation.
  • ensure that the host directory (which you mount into the container) has permissions that allow the container to write file(s) to it

Producer benchmark usage examples

To be able to store the output files from the benchmark run on your local workstation/laptop, create a local dir and mount it into the container. Otherwise you won't be able to access the files after the benchmark run is finished.

minimal example

The following example shows a scenario with minimal parameters. It will run a producer benchmark, connecting to Kafka (unauthenticated) on port 9091 at IP 000.111.222.333 (you obviously have to provide the IP address of your own Kafka broker here!).

mkdir ./output
chmod 777 ./output
docker run  -v ./output/:/tmp/output gkoenig/kafka-producer-benchmark:0.1 --bootstrap-servers 000.111.222.333:9091

with authentication

If you have a Kafka cluster with authentication, you can provide additional parameters just as you would for a plain execution of the benchmark-producer.sh script. For example, if your brokers listen on port 9092 for SASL_PLAINTEXT authentication, specify that port and provide an additional producer config as shown below (again, you obviously have to provide the IP address of your own Kafka broker here):

git clone git@github.com:gkoenig/kafka-benchmarking.git
cd producer/scripts
mkdir ./output
chmod 777 ./output
docker run  -v ./output/:/tmp/output gkoenig/kafka-producer-benchmark:0.1 --bootstrap-servers 000.111.222.333:9092 --producer-config sample-producer-sasl.config

providing more parameters

You can provide the same list of parameters as when executing the producer benchmark outside of a Docker container, i.e. plainly on the terminal. A detailed description of the parameters can be found in the producer benchmark section above.

git clone git@github.com:gkoenig/kafka-benchmarking.git
cd producer/scripts
mkdir ./output
chmod 777 ./output
docker run  -v ./output/:/tmp/output gkoenig/kafka-producer-benchmark:0.1 --bootstrap-servers 000.111.222.333:9092 --producer-config sample-producer-sasl.config --num-records 2000000 --compression lz4

Kubernetes

Description

To run the producer benchmark container within a K8s cluster, you need to ensure the following prerequisites:

  • you need access to a persistent volume (to store the output of the benchmark run), mounted at /tmp/output within the container
  • !! only if you have to specify SASL properties to connect to your Kafka brokers: to pass the SASL properties, you need to create a ConfigMap from the file sample-producer-sasl.config (or whichever file holds your SASL configuration for the Kafka brokers) and use that ConfigMap as a volume in the container spec. Ensure that the mount point of this ConfigMap corresponds to the args entry specifying the --producer-config parameter (see example below). If you can connect to your brokers unauthenticated, you do not need this ConfigMap, and you have to delete the corresponding volumeMount from the Job yaml as well.

ConfigMap

# assuming you are in the root folder of the Github repo you cloned
cd producer
kubectl create configmap kafka-sasl --from-file=scripts/sample-producer-sasl.config 

Job specification

To execute the benchmark it is sufficient to create a Kubernetes Job, that

  • mounts the persistent volume to store the output
  • mounts the ConfigMap containing the SASL config (!! of course only if you have to specify SASL properties to connect to your Kafka brokers)

Ensure that you replace the IP (or hostname) and port of your Kafka broker inside producer-benchmark-job.yaml. The placeholder is set to 111.222.333.444:9092.

# assuming you are in the root folder of the Github repo you cloned
# if the job already exists, you first have to delete it, before it is created again
cd producer
# kubectl delete -f ./producer-benchmark-job.yaml
kubectl apply -f ./producer-benchmark-job.yaml
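
Once the Job is applied, you can follow its progress and read the benchmark output, for example (a sketch; producer-benchmark is an assumed Job name here, check metadata.name in producer-benchmark-job.yaml for the actual one):

kubectl get jobs
# the Job name below is an assumption, taken from the yaml's metadata.name
kubectl logs -f job/producer-benchmark
# the output files are written to the persistent volume mounted at /tmp/output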
