
cp-ansible's Introduction

CP-Ansible

Introduction

Ansible provides a simple way to deploy, manage, and configure the Confluent Platform services. This repository provides playbooks and templates to easily spin up a Confluent Platform installation. Specifically, this repository:

  • Installs Confluent Platform packages or archives.
  • Starts services using systemd scripts.
  • Provides configuration options for many security features, including encryption, authentication, and authorization.

The services that can be installed from this repository are:

  • ZooKeeper
  • Kafka
  • Schema Registry
  • REST Proxy
  • Confluent Control Center
  • Kafka Connect (distributed mode)
  • KSQL Server
  • Replicator

Documentation

You can find the documentation for running CP-Ansible at https://docs.confluent.io/current/installation/cp-ansible/index.html.

You can find supported configuration variables in VARIABLES.md.

Contributing

If you would like to contribute to the CP-Ansible project, please refer to CONTRIBUTE.md.

License

Apache 2.0

cp-ansible's People

Contributors

andrewegel, anuj-apdev, chromy96, confluentjenkins, confluentsemaphore, coughman, dennisfederico, domenicbove, emsixteeen, erikgb, gianluca-mascolo, guyiex, jeqo, joel-hamill, jumax, ldom, lucy-fan, mansisinha, maxzheng, mosheblumbergx, nerdynick, nsharma-git, patrick-premont, prestonmcgowan, rrbadiani, stan-is-hate, thecrazymonkey, utkarsh5474, wadhwa1, waliaabhishek


cp-ansible's Issues

Unable to override a single config key

In the process of making a Vagrantfile, while starting up a single broker, I wanted to limit log retention and decrease the replication factors, so I tried this:

brokers = ["cp-kafka"]
replication_factor = [3, brokers.length].min
ansible.groups = {
  "all:vars" => {"security_mode" => "plaintext"},
  "zookeeper" => ["cp-zk"],
  "broker" => ["cp-kafka"],
  "broker:vars" => {"kafka" => "{'broker': {'config': {"\
    "'log.retention.hours': 48,"\
    "'offsets.topic.replication.factor': #{replication_factor},"\
    "'transaction.state.log.replication.factor': #{replication_factor}"\
    "}}}"}
}

However, upon running, the play failed at the user/group-creation step because kafka.broker had been overridden to contain only the config dictionary; kafka.broker.group no longer existed.

I have the following workaround, but I wanted to know if there is an alternative before patching.

Prefix with kafka_ and replace . with _

"broker:vars" => {
    "kafka_log_retention_hours" => 48
    ...

Update the defaults.yml

kafka_log_retention_hours: 168
kafka_offsets_topic_replication_factor: 3
kafka_transaction_state_log_replication_factor: 3

kafka:
  broker:
    ...
    config:
      ...
      log.retention.hours: "{{kafka_log_retention_hours}}"
      ...

The same error is likely when setting any single property under kafka => broker, because the group variables are replaced rather than merged. One alternative is sketched below.
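A possible alternative, assuming you can change the control node's Ansible configuration: hash_behaviour = merge makes Ansible deep-merge dictionary variables instead of replacing them, so a partial kafka.broker.config override would keep the rest of the defaults. This is a global setting with its own trade-offs, not the project's documented approach:

# ansible.cfg on the control node
[defaults]
hash_behaviour = merge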

When creating broker datadir directories, you cannot specify multiple datadirs

If I override the datadir to be /dir01,/dir02,/dir03, the log.dirs parameter is correct in server.properties, but the play also tries to create the directories with a single mkdir command, which fails because it runs "mkdir /dir01,/dir02,/dir03" instead of three separate commands ("mkdir /dir01; mkdir /dir02; mkdir /dir03"). A possible fix is sketched below.
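A sketch of a task that creates each directory separately, assuming kafka.broker.datadir holds the comma-separated string from the report above:

- name: Create broker data directories
  file:
    path: "{{ item }}"
    state: directory
  with_items: "{{ kafka.broker.datadir.split(',') }}"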

Install community edition?

Is it possible to use your playbook to install only the components that are in the community edition?

I'd like to use the broker, schema registry and rest proxy components only.

Thanks

Namespace of Variable {{broker.id}} In Kafka Broker server properties file

In the template file for the Kafka broker properties, we use the variable {{kafka.broker.id}}, referring to the inventory file hosts.yml. But the {{kafka}} variable is also defined in the role's defaults.

This still runs correctly because of Ansible's variable precedence: the value of {{kafka.broker.id}} is taken first from the host's inventory variables. For clarity, though, it would be better not to mix namespaces.

I think we should use something explicit like {{ hostvars[inventory_hostname].kafka.broker.id }} to make this clear (inventory_hostname by itself is just a string, so it cannot be dotted into directly).

Setting file descriptors allowed per process

Given that Kafka should have its maximum number of open file descriptors set to a high value (https://docs.confluent.io/current/kafka/deployment.html#file-descriptors-and-mmap), the Ansible roles should let the user set the nofile limits.

According to https://unix.stackexchange.com/a/345596, these limits can be set using a systemd override file, which is already present in these roles:

[Service]
{% for key, value in kafka.connect.distributed.environment.items() %}
Environment="{{key}}={{value}}"
{% endfor %}

The format for NOFILE should be:

[Service]
{% for key, value in kafka.connect.distributed.environment.items() %}
Environment="{{key}}={{value}}"
{% endfor %}
LimitNOFILE=100000
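A sketch of how the limit could be made configurable, assuming a hypothetical nofile key alongside the existing environment dictionary in the role defaults:

[Service]
{% for key, value in kafka.connect.distributed.environment.items() %}
Environment="{{key}}={{value}}"
{% endfor %}
LimitNOFILE={{ kafka.connect.distributed.nofile | default(100000) }}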

The repo packages.confluent.io seems broken

I have installed the platform before and the package repo worked fine. Now I am trying another installation and I consistently get a failed connection on a Red Hat distribution:

This system is receiving updates from RHN Classic or Red Hat Satellite.
https://packages.confluent.io/rpm/5.1/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to packages.confluent.io:443; Operation now in progress"
Trying other mirror.
https://packages.confluent.io/rpm/5.1/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to packages.confluent.io:443; Operation now in progress"
Trying other mirror.
https://packages.confluent.io/rpm/5.1/repodata/repomd.xml: [Errno 14] curl#7 - "Failed connect to packages.confluent.io:443; Operation now in progress"

stdout/stderr for the exact error", "rc": 1}

Hi All,

I tried to install Confluent Kafka on Red Hat 7.4 but I am getting the error below.

PLAY [preflight] ***************************************************************

TASK [Gathering Facts] *********************************************************
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 13307 killed;
[ssh-agent] Stopped.
fatal: [127.58.7.166]: FAILED! => {"changed": false, "module_stderr": "Shared connection to 127.58.7.166 closed.\r\n", "module_stdout": "sudo: a password is required\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error", "rc": 1}

Please guide me to fix this issue.

Regards,
Deepak
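A hedged pointer, assuming the root cause is the "sudo: a password is required" line in the output above: the SSH user on the target either needs passwordless sudo, or the privilege-escalation password must be supplied at run time, e.g.:

ansible-playbook -i hosts.yml all.yml --ask-become-pass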

kafka-consumer-groups with SASL_SSL

I am trying to edit the offsets of certain consumer groups with the kafka-consumer-groups tool.
This is how I run the program: KAFKA_HEAP_OPTS="-Xmx2G" kafka-consumer-groups --bootstrap-server 172.31.31.132:9092 --list --command-config client.properties
Output:

[2018-11-08 13:51:23,607] ERROR [NetworkClient clientId=admin-1] Connection to node -1 failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
Error: Executing consumer group command failed due to SSL handshake failed

Client.properties content:

sasl.mechanism=PLAIN
# Configure SASL_SSL if SSL encryption is enabled, otherwise configure SASL_PLAINTEXT
security.protocol=SASL_SSL
ssl.ca.location=/var/ssl/private/snakeoil-ca-1.crt
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="client" \
  password="client-secret";

This is from our test deployment set up with the provided Ansible playbooks, hence the default values.
How should I run kafka-consumer-groups with SASL_SSL enabled?
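A hedged observation: ssl.ca.location is a librdkafka (C client) property; the Java-based kafka-consumer-groups tool ignores it and needs a truststore instead. A sketch of a client.properties the Java client would accept (truststore path and password are hypothetical):

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
# Java clients take a JKS/PKCS12 truststore rather than a CA certificate file
ssl.truststore.location=/var/ssl/private/client.truststore.jks
ssl.truststore.password=changeme
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="client" \
  password="client-secret";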

Kerberos support

Hi
Please could you add support for Kerberos? Why is "Setting up a Kerberos KDC or integration with Active Directory" out of scope?

SASL_PLAINTEXT is not enough. Via Ansible variables we could simply supply our own keytab, principal, path, and so on. This would be better for automation, testing, and debugging across all the nodes in the cluster. (A sketch of the broker-side settings such variables would feed appears at the end of this issue.)
Source:
https://docs.confluent.io/current/tutorials/cp-ansible/docs/index.html

Thank you very much, cheers
-Rudolf
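For reference, a hedged sketch of the standard Kafka GSSAPI (Kerberos) broker settings that such Ansible variables would feed; the keytab path, principal, and realm are hypothetical:

# server.properties
sasl.enabled.mechanisms=GSSAPI
sasl.kerberos.service.name=kafka

# kafka_server_jaas.conf
KafkaServer {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/kafka.keytab"
  principal="kafka/broker-1.example.com@EXAMPLE.COM";
};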

Ubuntu Installation steps missing

Team - please confirm whether only Red Hat variant OSes are supported by these Ansible scripts, as I couldn't find installation steps for Ubuntu/Debian platforms.

Make HEAP and JMX OPTS configurable

Hello,
This project is awesome for people getting started to roll out a Kafka cluster into production.
One issue, though, is trying to set KAFKA_HEAP_OPTS and JMX_OPTS in the broker configuration.
I tried adding

environment:
  KAFKA_HEAP_OPTS: '-Xms4g -Xmx4g .... '
  JMX_PORT: 9999
  KAFKA_JMX_OPTS: '-.. '

to the broker configuration role, but it doesn't seem to be taken into account.
So I ended up doing it manually. Any pointers on this? (A sketch of one possible form follows below.)

Would also be nice to have a restart and uninstall role.

Sorry for rambling, thanks again for your help.
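A sketch of how this could look if the role rendered a kafka.broker.environment dictionary into the systemd override file, the same way the Connect template shown earlier renders kafka.connect.distributed.environment (the broker-side variable is an assumption):

kafka:
  broker:
    environment:
      KAFKA_HEAP_OPTS: "-Xms4g -Xmx4g"
      JMX_PORT: "9999"
      KAFKA_JMX_OPTS: "-Dcom.sun.management.jmxremote.authenticate=false"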

Add a restart methodology

We should have a way to restart the services via a play. This is potentially quite complex, because rolling-restarting Kafka brokers and ZooKeeper servers is non-trivial. A minimal starting point is sketched below.
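A minimal sketch of a rolling restart play, assuming the confluent-kafka service name that appears in the systemctl output elsewhere on this page; a production-grade version would also wait for under-replicated partitions to clear before moving to the next node:

- hosts: broker
  serial: 1
  become: yes
  tasks:
    - name: Restart the Kafka broker, one node at a time
      systemd:
        name: confluent-kafka
        state: restarted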

Breakout playbooks

all.yml actually has several functions in it, which is great for setting up a POC. We should make some more targeted playbooks to handle cases after the initial setup (e.g. update a config and restart services).

The roles are not really usable like this

Using these roles as-is would require a lot of effort.
It would be better to have a single role with some subroles.

eg: cp-ansible/tasks/zookeeper/main.yml
kafka-broker/main.yml

Along with something like confluent_role: ["zookeeper", "kafka-broker"] that would include the correct tasks, as sketched below.
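A sketch of that dispatch idea (the task paths follow the layout proposed above and are otherwise hypothetical):

- name: Include the tasks for each requested component
  include_tasks: "tasks/{{ item }}/main.yml"
  loop: "{{ confluent_role }}"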

Confluent cli

I installed Confluent using this Ansible project and I am having problems with the confluent CLI tool:

[root@test1 ~]# confluent status
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

control-center is [DOWN]
ksql-server is [DOWN]
connect is [DOWN]
kafka-rest is [DOWN]
schema-registry is [DOWN]
kafka is [DOWN]
zookeeper is [DOWN]
[root@test1 ~]# confluent start
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html

Using CONFLUENT_CURRENT: /tmp/confluent.XUwsaQe7
Starting zookeeper
|Zookeeper failed to start
zookeeper is [DOWN]
Cannot start Kafka, Zookeeper is not running. Check your deployment

but ZooKeeper and Kafka are running and working:

[root@test1 ~]# systemctl |grep conflu
● confluent-kafka-connect.service                                                     loaded failed failed    Apache Kafka Connect - distributed
  confluent-kafka-rest.service                                                        loaded active running   A REST proxy for Apache Kafka
  confluent-kafka.service                                                             loaded active running   Apache Kafka - broker
  confluent-schema-registry.service                                                   loaded active running   RESTful Avro schema registry for Apache Kafka
  confluent-zookeeper.service                                                         loaded active running   Apache Kafka - ZooKeeper
[root@test1 ~]# kafka-topics --zookeeper localhost:2181 --list
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
__confluent.support.metrics
__consumer_offsets
_confluent-command
_confluent-controlcenter-5-0-0-1-AlertHistoryStore-changelog
_confluent-controlcenter-5-0-0-1-Group-ONE_MINUTE-changelog
_confluent-controlcenter-5-0-0-1-Group-THREE_HOURS-changelog
...
_confluent-metrics
_confluent-monitoring
_schemas
users

I'm new to Confluent, and the Confluent docs have a lot of examples using the confluent CLI that I can't use (like adding a new connector).
How can I make the confluent CLI work with this Ansible project?
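A hedged explanation: the confluent CLI manages its own throwaway development processes under CONFLUENT_CURRENT and knows nothing about the systemd units this project installs, so the two will always disagree. The installed services are managed with systemctl/journalctl instead, and connectors are added through the Connect REST API (port 8083 in the netstat output later on this page); connector.json here is a hypothetical payload file:

systemctl status confluent-kafka-connect
journalctl -u confluent-kafka-connect
curl -s -X POST -H "Content-Type: application/json" \
  --data @connector.json http://localhost:8083/connectors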

Add uninstall methodology

We should have a way to uninstall the various services. This can be useful for short-lived POC or dev environments that need to be upgraded without regard for existing data. A sketch follows below.
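A minimal sketch for RPM-based hosts (the package glob is an assumption, and this does not stop services or remove data directories):

- hosts: all
  become: yes
  tasks:
    - name: Remove Confluent Platform packages
      yum:
        name: "confluent-*"
        state: absent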

TASK [confluent.common : Install the Confluent Platform]

Getting this error:
fatal: [server1.westus2.cloudapp.azure.com]: FAILED! => {"changed": false, "msg": "No package matching 'confluent-platform-2.11' found available, installed or updated", "rc": 126, "results": ["No package matching 'confluent-platform-2.11' found available, installed or updated"]}

Documentation Shortcomings

Hi,
The following needs to be updated:

Confluent Platform 4.1 or higher
Ansible 2.5.x or higher (on control node)
Confluent Platform Ansible playbooks
passwordless ssh between all hosts
sudo access for ssh user for all hosts
Redhat Enterprise or Centos 7 only

The current playbooks support RHEL and CentOS, NOT Debian or Ubuntu, which I discovered after creating 7 Ubuntu VMs. I note that this is listed as a future enhancement, but that would not be apparent to those who read the blog post and then the documentation.

Loop is deprecated

fatal: [broker-1.example.com]: FAILED! => {"msg": "Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"}

Fixed it by using with_items, as sketched below.
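A sketch of the working form, assuming kafka.broker.datadir is a list in the role defaults:

- name: Create broker data directories
  file:
    path: "{{ item }}"
    state: directory
  with_items: "{{ kafka.broker.datadir }}"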

Add HTTPS to services

We have some services that have HTTP front ends. Those services should be secured with HTTPS in the SSL enabled configurations.

Systemd ulimits

In Confluent Slack

Normally, to change your number of open files you have to modify your kernel params (fs.file-max) and also your /etc/security/limits.conf. With systemd there is another step: you must modify the *.service file as well for the daemons to pick up the new ulimit params. This post shows the directives that apply. Once I added LimitNOFILE to the delivered /lib/systemd/system/confluent-*.service file, /proc/<pid>/limits showed the Java daemon recognizing the open-file limit. There should probably be either a documentation note or a change to the delivered systemd file, since that delivered file will now need to be managed on every upgrade.

https://confluentcommunity.slack.com/archives/C49R61XMM/p1537208485000100

Refer - https://unix.stackexchange.com/questions/345595/how-to-set-ulimits-on-service-with-systemd
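Rather than editing the packaged unit file (which is replaced on upgrade), a drop-in override survives upgrades; the roles already use this pattern (see the override.conf drop-in in the Control Center status output below). A sketch:

# /etc/systemd/system/confluent-kafka.service.d/override.conf
[Service]
LimitNOFILE=100000

followed by systemctl daemon-reload and a service restart.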

confluent-control-center sends ERR_INVALID_HTTP_RESPONSE

Hi there.

I used these Ansible playbooks to set up my Confluent platform on 7 EC2 instances provisioned in AWS:

  • 3 EC2 for [broker + zookeepers]
  • 1 EC2 for [Schema registry]
  • 1 EC2 for [Control center]
  • 1 EC2 for [Kafka Connect, Kafka Rest proxy]
  • 1 EC2 for [KSQL]

Currently, Kafka and the ZooKeepers are accessible from my local machine, which is fine. However, when trying to open Confluent Control Center [the exact address is http://ec2-34-243-126-24.eu-west-1.compute.amazonaws.com:9021/], it sends back an ERR_INVALID_HTTP_RESPONSE.

I logged into the Control Center machine and issued the following commands.
systemctl status confluent-control-center gave me:

● confluent-control-center.service - Confluent Control Center
   Loaded: loaded (/lib/systemd/system/confluent-control-center.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/confluent-control-center.service.d
           └─override.conf
   Active: active (running) since Fri 2019-02-08 16:00:03 UTC; 4min 36s ago
     Docs: http://docs.confluent.io/
 Main PID: 29496 (java)
    Tasks: 79
   Memory: 515.6M
      CPU: 1min 1.796s
   CGroup: /system.slice/confluent-control-center.service
           └─29496 java -cp /usr/share/java/confluent-control-center/*:/usr/share/java/monitoring-interceptors/*:/usr/share/java/rest-utils/*:/usr/share/java/confluent-common/*: -Xmx6g -server -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -D

Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: Feb 08, 2019 4:00:56 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: WARNING: A provider io.confluent.controlcenter.rest.ClusterResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluen
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: Feb 08, 2019 4:00:56 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: WARNING: A provider io.confluent.controlcenter.rest.StatusResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: Feb 08, 2019 4:00:56 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: WARNING: A provider io.confluent.controlcenter.rest.MetricsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluen
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: Feb 08, 2019 4:00:56 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: WARNING: A provider io.confluent.controlcenter.rest.AuthResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluent.c
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: Feb 08, 2019 4:00:56 PM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
Feb 08 16:00:56 ip-172-31-23-230 control-center-start[29496]: WARNING: A provider io.confluent.controlcenter.rest.CommandResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider io.confluen
root@ip-172-31-23-230:/var/log/confluent/control-center# 

curl localhost:9021 gave me a single character: P.

Additional Information

####### Zookeeper 1 + Broker 1
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1287/sshd       
tcp6       0      0 :::9092                 :::*                    LISTEN      29767/java      
tcp6       0      0 :::2181                 :::*                    LISTEN      29047/java      
tcp6       0      0 172.31.15.21:3888       :::*                    LISTEN      29047/java      
tcp6       0      0 :::44438                :::*                    LISTEN      29767/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1287/sshd       
tcp6       0      0 :::33335                :::*                    LISTEN      29047/java      

####### Zookeeper 2 + Broker 2
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1300/sshd       
tcp6       0      0 :::46173                :::*                    LISTEN      26737/java      
tcp6       0      0 :::9092                 :::*                    LISTEN      27472/java      
tcp6       0      0 :::2181                 :::*                    LISTEN      26737/java      
tcp6       0      0 172.31.11.9:2888        :::*                    LISTEN      26737/java      
tcp6       0      0 172.31.11.9:3888        :::*                    LISTEN      26737/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1300/sshd       
tcp6       0      0 :::41561                :::*                    LISTEN      27472/java      

####### Confluent Controll Center
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1262/sshd       
tcp6       0      0 :::9021                 :::*                    LISTEN      29496/java      
tcp6       0      0 :::41120                :::*                    LISTEN      29496/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1262/sshd       

####### Zookeeper 3 + Broker 3
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1297/sshd       
tcp6       0      0 :::9092                 :::*                    LISTEN      27430/java      
tcp6       0      0 :::2181                 :::*                    LISTEN      26686/java      
tcp6       0      0 :::44133                :::*                    LISTEN      26686/java      
tcp6       0      0 :::38921                :::*                    LISTEN      27430/java      
tcp6       0      0 172.31.0.127:3888       :::*                    LISTEN      26686/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1297/sshd       

####### Rest Proxy + Kafka Connect
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1261/sshd       
tcp6       0      0 :::46823                :::*                    LISTEN      22618/java      
tcp6       0      0 :::38802                :::*                    LISTEN      26780/java      
tcp6       0      0 :::8082                 :::*                    LISTEN      22618/java      
tcp6       0      0 :::8083                 :::*                    LISTEN      26780/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1261/sshd       

####### Schema Registry
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1300/sshd       
tcp6       0      0 :::38051                :::*                    LISTEN      24588/java      
tcp6       0      0 :::8081                 :::*                    LISTEN      24588/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1300/sshd       

####### KSQL server
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1266/sshd       
tcp6       0      0 :::38892                :::*                    LISTEN      24714/java      
tcp6       0      0 :::22                   :::*                    LISTEN      1266/sshd       
tcp6       0      0 :::8088                 :::*                    LISTEN      24714/java    

Any help will be appreciated! :)

Using headless JDK on Redhat?

Any reason why the headless JDK package isn't used on Red Hat? Specifically, has anyone tested the java-1.8.0-openjdk-headless package instead of java-1.8.0-openjdk?

ZooKeeper SASL

ZooKeeper is currently open. We can do ZooKeeper SASL, so we should have that in the SASL_SSL path. A sketch follows below.
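A sketch of what ZooKeeper SASL could look like using ZooKeeper's DigestLoginModule (credentials hypothetical; a Kerberos setup would be analogous):

# zookeeper_jaas.conf (ZooKeeper server side)
Server {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  user_kafka="kafka-secret";
};

# kafka_server_jaas.conf (broker side)
Client {
  org.apache.zookeeper.server.auth.DigestLoginModule required
  username="kafka"
  password="kafka-secret";
};

Brokers would additionally set zookeeper.set.acl=true so they create ACL-protected znodes.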

Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"

Hi all,

I'm a bit confused as to why, but I'm getting an unexpected failure... Can anyone help here?
It seems this is a problem in the Ansible playbooks, am I right?
These are completely fresh CentOS VMs in AWS.

TASK [confluent.kafka-broker : create broker data directories] *******************************************************************************************************************************************************************************
fatal: [instance1101]: FAILED! => {"msg": "Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"}
fatal: [instance2101]: FAILED! => {"msg": "Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"}
fatal: [instance2102]: FAILED! => {"msg": "Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"}
fatal: [instance1102]: FAILED! => {"msg": "Unexpected failure in finding the lookup named '{{kafka.broker.datadir}}' in the available lookup plugins"}
	to retry, use: --limit @/home/centos/cp-ansible/all.retry

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
instance1101 : ok=11   changed=0    unreachable=0    failed=1   
instance1102 : ok=11   changed=0    unreachable=0    failed=1   
instance2101 : ok=11   changed=0    unreachable=0    failed=1   
instance2102 : ok=11   changed=0    unreachable=0    failed=1  

Regards
-Sergei

Non deterministic zookeeper.properties and myid files generation

Using these roles, the zookeeper.properties and myid files are generated non-deterministically.

The list of hosts in the zookeeper group is in fact not a list but a dictionary (hash) in Ansible/Python (see the somewhat related ansible/ansible#35495).
So the code for myid.j2 (https://github.com/confluentinc/cp-ansible/blob/5.1.x/roles/confluent.zookeeper/templates/myid.j2):


{% for host in groups['zookeeper'] %}
{% if host == inventory_hostname %}
{{ loop.index }}
{% endif %}
{% endfor %}

may generate different results between runs. Moreover, users usually name their servers in a pattern like:

  • zookeeper-01
  • zookeeper-02

And with this code it may happen that zookeeper-01 is assigned myid=2 and vice versa, which is counter-intuitive.

The same issue exists with zookeeper.properties (https://github.com/confluentinc/cp-ansible/blob/5.1.x/roles/confluent.zookeeper/templates/zookeeper.properties.j2):

# Maintained by Ansible
{% for key, value in zookeeper.config.items() %}
{{key}}={{value}}
{% endfor %}
{% for host in groups['zookeeper'] %}
server.{{ loop.index }}={{ host }}:2888:3888
{% endfor %}

This code may (and it happened to me) generate this listing:

server.1=zookeeper-02:2888:3888
server.2=zookeeper-01:2888:3888
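A possible fix is to make the ordering deterministic by sorting the group, e.g. in zookeeper.properties.j2 (with the same | sort filter applied in myid.j2):

{% for host in groups['zookeeper'] | sort %}
server.{{ loop.index }}={{ host }}:2888:3888
{% endfor %}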

LOG_DIR in zookeeper is specified but not used

In roles/confluent.zookeeper/defaults/main.yml there is an entry for zookeeper:environment:LOG_DIR. This isn't used by ZooKeeper; it uses a configuration setting instead. Incidentally, that setting isn't mentioned, but it probably should be.

Kafka Connect Distributed can't communicate with broker

We run Confluent 5.0.1 from cp-ansible, with three nodes configured with connect-distributed.
Since yesterday we have had problems with Connect.

When I push a new connector:

{
    "name": "signal-elastic-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "tasks.max": "6",
        "topics": "signal-source",
        "topic.index.map": "signal-source:signal",
        "schema.ignore": "false",
        "connection.url": "http://elastic01:9200",
        "key.ignore": "true",
        "type.name": "signal",
        "name": "signal-elastic-sink"
    }
}

Syslog then fills with the following:
Nov 9 07:24:25 BROKER02 connect-distributed[16386]: [2018-11-09 07:24:25,451] ERROR [Producer clientId=confluent.monitoring.interceptor.consumer-4] Connection to node -1 failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient:663)
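A hedged reading: the failing client is the monitoring interceptor's own embedded producer, which does not inherit the worker's security settings and has to be configured separately. A sketch for the Connect worker properties, assuming the documented producer./consumer. passthrough prefixes for interceptor settings (truststore path and password hypothetical):

producer.confluent.monitoring.interceptor.security.protocol=SASL_SSL
producer.confluent.monitoring.interceptor.ssl.truststore.location=/var/ssl/private/client.truststore.jks
producer.confluent.monitoring.interceptor.ssl.truststore.password=changeme
producer.confluent.monitoring.interceptor.sasl.mechanism=PLAIN
producer.confluent.monitoring.interceptor.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="client" \
  password="client-secret";
# the same keys with the consumer. prefix cover sink-side consumers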

development environment - java memory update

Java memory is set up to use 1 GB. While this can be OK for production, there are cases where you need to adjust this value (increase or decrease) for Kafka and ZooKeeper, whether to improve performance or for testing.

There should be an option to let the broker listen (bind) on a specific IP address using the listeners configuration option

Currently, the listeners are set up in roles/confluent.kafka-broker/templates/server.properties.j2 (and the corresponding sasl_ssl and ssl versions) as:

# Maintained by Ansible
listeners=PLAINTEXT://:{{broker.config.port}}

{% include './includes/base_server.properties.j2' %}

# Confluent Support
{% include './includes/confluent_support.j2' %}

This leaves the user of the role powerless with respect to the broker's bind address. In cloud environments it is common for machines to have two network interfaces: one for public internet access and one for the VPC/internal network.

It would be great if the role let the user set the listen host/IP address in these template files, as sketched below.
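A sketch of how the template could expose this; broker.config.listen_host is a hypothetical new variable that defaults to binding all interfaces, preserving today's behavior:

# Maintained by Ansible
listeners=PLAINTEXT://{{ broker.config.listen_host | default('') }}:{{ broker.config.port }}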

ansible_ssh_user variable deprecated after Ansible version 2.0

In our hosts.yml file, we use a variable ansible_ssh_user=centos.

As of version 2.0 of Ansible, the variable has been renamed to ansible_user. https://ansible-tips-and-tricks.readthedocs.io/en/latest/ansible/inventory/
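An inventory line using the renamed variable might look like this (hostname hypothetical):

broker-1.example.com ansible_user=centos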

There should be a note in the documentation about certain assumptions we're making, like the SSH user being centos, or a step that says to change it to the user the playbook should connect as.

Let me know if I've misunderstood something!

Kafka expects to store log data in a persistent location

I have a complete installation and it is working quite well, but there is an ERROR message in the control-center log that is driving me crazy:

ERROR [control-center-heartbeat-1] broker=1 is storing logs in /tmp/kafka-logs, Kafka expects to store log data in a persistent location (io.confluent.controlcenter.healthcheck.HealthCheck)

How can I fix this issue?

My current configuration in /etc/kafka/server.properties is:

# Maintained by Ansible
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://10.251.64.5:9092

zookeeper.connect=10.251.64.8:2181,10.251.64.7:2181,10.251.64.5:2181
--override log.dir=/var/lib/kafka/data
broker.id=1

log.segment.bytes=1073741824
socket.receive.buffer.bytes=102400
socket.send.buffer.bytes=102400
confluent.metrics.reporter.topic.replicas=3
num.network.threads=8
ssl.endpoint.identification.algorithm=
num.io.threads=16
confluent.metrics.reporter.ssl.endpoint.identification.algorithm=
transaction.state.log.min.isr=2
zookeeper.connection.timeout.ms=6000
offsets.topic.replication.factor=3
socket.request.max.bytes=104857600
log.retention.check.interval.ms=300000
group.initial.rebalance.delay.ms=0
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
auto.create.topics.enable=False
num.recovery.threads.per.data.dir=2
transaction.state.log.replication.factor=3
confluent.metrics.reporter.bootstrap.servers=10.251.64.5:9092
log.retention.hours=168
num.partitions=1

# Confluent Support
confluent.support.metrics.enable=true
confluent.support.customer.id=anonymous
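A hedged diagnosis based on the pasted config: the line --override log.dir=/var/lib/kafka/data is a kafka-server-start command-line flag, not a valid server.properties entry, so the broker ignores it and falls back to the default /tmp/kafka-logs. A likely fix is to set the real (plural) property and restart the broker:

log.dirs=/var/lib/kafka/data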

Force users to specify the protocol

I've been thinking about how to use this project to do rolling upgrades. I think it makes sense if we force the user to specify inter.broker.protocol.version and log.message.format.version. This would prevent an on-the-fly upgrade if someone does yum upgrade.
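Pinning both in server.properties might look like this (the version values are an assumption and track the Kafka version underlying the installed release):

inter.broker.protocol.version=2.1
log.message.format.version=2.1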

Upgrading platform via playbook

Hi,

what would be the best way to upgrade an existing installation (installed with the playbooks) to a newer version, say 5.1.0 to 5.1.2?

Would it be sufficient to include an upgrade: yes in the respective -common task files, e.g. for Debian in roles/cp-ansible/roles/confluent.common/tasks/debian.yml:

- name: Install the Confluent Platform
  apt:
    name: "{{confluent.package_name}}"
    update_cache: yes
    upgrade: yes

Is this something worth externalizing?

Thanks!
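One caveat, offered as a hedged note: the apt module's upgrade parameter upgrades the whole system rather than a single named package; upgrading just the Confluent package is usually expressed with state: latest instead:

- name: Install or upgrade the Confluent Platform
  apt:
    name: "{{confluent.package_name}}"
    update_cache: yes
    state: latest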

Add support for debian/ubuntu installs

We're using systemd, so everything besides the yum tasks should translate well. There should be another playbook, say ubuntu.yml, to take care of the case ansible_distribution == 'Debian' or ansible_distribution == 'Ubuntu'. A dispatch sketch follows below.
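A sketch of the per-OS dispatch inside a role's tasks/main.yml (the file names follow the debian.yml convention mentioned elsewhere on this page and are otherwise assumptions):

- include_tasks: redhat.yml
  when: ansible_os_family == "RedHat"

- include_tasks: debian.yml
  when: ansible_os_family == "Debian"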

Break down playbook so dry runs are effective

Today, a dry run of the whole playbook from scratch fails at the installation step, because the repo cannot be added in a dry run (that requires a change to the file system). This enhancement would break the playbook into consumable chunks so that dry runs can be effective. A breakdown could be: 1) setup, 2) install, 3) start services, 4) other operations after services have been started. A sketch follows below.
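A sketch of that breakdown as imported playbooks (file names hypothetical):

# site.yml
- import_playbook: setup.yml
- import_playbook: install.yml
- import_playbook: start_services.yml
- import_playbook: operations.yml

A dry run could then target only the later stages, e.g. ansible-playbook start_services.yml --check.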
