Used: Apache ZooKeeper v3.4.11, Docker v1.10.3
Apache Kafka uses Apache ZooKeeper. Apache Kafka needs coordination, and Apache ZooKeeper is the piece of software that provides it. Coordinating distributed applications is ZooKeeper’s job. As part of my Kafka evaluation I investigated how to run Apache ZooKeeper in a production scenario for Apache Kafka. This is a detailed documentation and summary of my observations. I won’t go into detail on how coordination is done for Apache Kafka with ZooKeeper; I might explain it in another article. This article focuses on ZooKeeper in a production environment, with an emphasis on High Availability scenarios.
General
Some details about my scenario:
- Using 9 nodes (yes, that is my production; by the way, a Raspberry Pi is pretty cheap in case you want to try it out yourself)
- Node List: 1. alpha, 2. beta, 3. gamma, 4. delta, 5. epsilon, 6. zeta, 7. eta, 8. theta, 9. omega
- Deployment in Docker Containers with Ansible to above nodes, using this ZooKeeper Docker image
- Ports:
  - 2181 - the client port for Apache Kafka or other clients
  - 2888 - the port ZooKeeper uses to connect to other ZooKeeper peers for coordination
  - 3888 - the port for leader election, if the cluster needs to determine who is in charge
  - ports 2888 and 3888 must not be exposed via Docker networking
Getting Started
- From the Apache ZooKeeper Guide: when using Apache ZooKeeper in production, you should run it in replicated mode.
What does replicated mode mean?
- A replicated group of servers in the same application is called a quorum, and in replicated mode, all servers in the quorum have copies of the same configuration file.
Some notes on Kafka’s ZooKeeper usage:
- You have more than one ZooKeeper instance for Apache Kafka.
- If Kafka uses a ZooKeeper cluster, some call it an ensemble (Kafka in Action).
In the Kafka server.properties you can provide a connection string with all the ZooKeeper instances. What about a Load Balancer? Let’s leave that out of the equation.
Kafka’s server.properties
zookeeper.connect=alpha:2181,beta:2181,gamma:2181,delta:2181,epsilon:2181,zeta:2181,eta:2181,theta:2181,omega:2181
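A connection string like the one above can be generated from the node list instead of being typed by hand. This is a minimal Python sketch, assuming the node names from this article and the client port 2181; build_connect_string is a hypothetical helper, not part of Kafka or ZooKeeper.

```python
# Node names as used in this article (assumption: all listen on the same client port).
NODES = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", "omega"]


def build_connect_string(nodes, client_port=2181):
    """Join host:port pairs with commas, the format used by zookeeper.connect."""
    return ",".join(f"{node}:{client_port}" for node in nodes)


if __name__ == "__main__":
    # Prints one line that can be pasted into server.properties.
    print("zookeeper.connect=" + build_connect_string(NODES))
```

Note that the value is written without surrounding quotes, since a Java properties file would treat quotes as part of the value.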
Architecture Design
Some information from Apache ZooKeeper
Minimum Requirement
Apache ZooKeeper Getting Started:
For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers is inherently less stable than a single server, because there are two single points of failure.
Recommended Setup
Apache ZooKeeper Administration Guide:
Usually three servers is more than enough for a production install, but for maximum reliability during maintenance, you may wish to install five servers. With three servers, if you perform maintenance on one of them, you are vulnerable to a failure on one of the other two servers during that maintenance. If you have five of them running, you can take one down for maintenance, and know that you’re still OK if one of the other four suddenly fails.
Summary:
- minimum 3 servers
- an odd number of servers is better for majority election
- recommendation is 5 servers
For my specific scenario:
- With 9 servers, the majority election takes place with 5 servers!
- 3 ZooKeeper servers are not the majority in a cluster of 9!
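The numbers above follow from simple majority arithmetic: a quorum is more than half of the configured servers. The following Python sketch only illustrates that arithmetic; the function names are my own and are not a ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 9):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

It also shows why an odd number is preferred: 4 servers tolerate only one failure, the same as 3 servers.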
Deployment
As I have stated, Ansible is used to ship ZooKeeper in Docker containers.
Ansible Playbook
Here is the playbook.yml, explained in detail below.
# playbook for Apache ZooKeeper deployment with docker containers
# some problems with exposed ports, switch to network_mode host as workaround
---
- hosts: all
  vars:
    image: zookeeper
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    ids:
      alpha: 1
      beta: 2
      gamma: 3
      delta: 4
      epsilon: 5
      zeta: 6
      eta: 7
      theta: 8
      omega: 9
    mappings:
      ZOO_SERVERS: >
        server.1=alpha:2888:3888
        server.2=beta:2888:3888
        server.3=gamma:2888:3888
        server.4=delta:2888:3888
        server.5=epsilon:2888:3888
        server.6=zeta:2888:3888
        server.7=eta:2888:3888
        server.8=theta:2888:3888
        server.9=omega:2888:3888
      ZOO_MY_ID: "{{ids[ansible_hostname]}}"
  tasks:
    - name: deploy
      docker_container:
        env:
          "{{mappings}}"
        name: zookeeper
        image: zookeeper
        network_mode: host
        pull: true
        log_opt:
          max-file: "3"
          max-size: 25m
        state: started
        restart: yes
        restart_policy: always
        restart_retries: 10
        volumes:
          - "/var/opt/zookeeper:/data"
          - "/var/log/zookeeper:/datalog"
The environment variables are important.
- ZOO_MY_ID = Each ZooKeeper server needs a unique id. We pass it by using a dictionary, i.e. alpha has the id 1.
- ZOO_SERVERS = The server list is mandatory for Apache ZooKeeper. The lines are concatenated into a single line and passed as a docker environment argument.
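To make the two variables more tangible, here is a small Python sketch that derives both from the node list. It assumes the node order and the ports 2888 and 3888 from this article; server_ids and zoo_servers are hypothetical helper names, and the sketch only mirrors what the playbook variables ids and mappings contain.

```python
# Node names in the order used by this article (assumption).
NODES = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta", "omega"]


def server_ids(nodes):
    """Map each hostname to a 1-based id, like the ids dictionary in the playbook."""
    return {node: index for index, node in enumerate(nodes, start=1)}


def zoo_servers(nodes, peer_port=2888, election_port=3888):
    """Render the space-separated server list that ends up in ZOO_SERVERS."""
    ids = server_ids(nodes)
    return " ".join(
        f"server.{ids[node]}={node}:{peer_port}:{election_port}" for node in nodes
    )


if __name__ == "__main__":
    print("ZOO_MY_ID for alpha:", server_ids(NODES)["alpha"])
    print("ZOO_SERVERS:", zoo_servers(NODES))
```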
To deploy, run ansible-playbook playbook.yml.
Inspect Deployment
Each ZooKeeper server is shipped in a docker container with the name zookeeper. Inspecting the alpha container:
tan@alpha:~> docker exec -it zookeeper /bin/bash
bash-4.4# ls /conf/
configuration.xsl log4j.properties zoo.cfg zoo_sample.cfg
bash-4.4# cat /conf/zoo.cfg
clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
maxClientCnxns=60
server.1=alpha:2888:3888
server.2=beta:2888:3888
server.3=gamma:2888:3888
server.4=delta:2888:3888
server.5=epsilon:2888:3888
server.6=zeta:2888:3888
server.7=eta:2888:3888
server.8=theta:2888:3888
server.9=omega:2888:3888
- The passed arguments are used to write the ZooKeeper configuration in /conf.
- Pay attention: the docker image does not use the conf directory within the ZooKeeper shipment.
bash-4.4# cd conf/
bash-4.4# ls
bash-4.4# pwd
/zookeeper-3.4.11/conf
We have seen that the server list is written to the ZooKeeper configuration, but where is ZOO_MY_ID stored? ZooKeeper stores it in the file myid in the data directory. This is the mounted volume /var/opt/zookeeper.
On the docker host system:
tan@alpha:/var/opt/zookeeper> cat myid
1
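The id in myid has to match the server.N entry for the same host in zoo.cfg. A quick consistency check could look like the following Python sketch. It works on the text of both files, and parse_servers and myid_matches are hypothetical helpers written for this illustration.

```python
def parse_servers(zoo_cfg_text):
    """Return {id: host} for every server.N=host:peer_port:election_port line."""
    servers = {}
    for line in zoo_cfg_text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue
        key, value = line.split("=", 1)
        server_id = int(key[len("server."):])
        servers[server_id] = value.split(":")[0]
    return servers


def myid_matches(zoo_cfg_text, myid_text, hostname):
    """True if the id stored in myid belongs to hostname according to zoo.cfg."""
    servers = parse_servers(zoo_cfg_text)
    return servers.get(int(myid_text.strip())) == hostname


if __name__ == "__main__":
    cfg = "clientPort=2181\nserver.1=alpha:2888:3888\nserver.2=beta:2888:3888\n"
    print(myid_matches(cfg, "1\n", "alpha"))
```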
Leader Election
How does ZooKeeper elect its leader? As we know, this is a majority election. For demonstration purposes, I will start the cluster node by node to illustrate the behavior of ZooKeeper. I have found some bogus information on various blog pages, and I want to protect you from misinformation. As we know, with 9 configured servers, 5 running servers are mandatory for leader election.
Initial
Start Server 1: alpha
tan@alpha:/opt/ansible/zookeeper> ansible-playbook -l alpha playbook.yml
PLAY [all] *********************************************************************
TASK [setup] *******************************************************************
ok: [alpha]
TASK [deploy] ******************************************************************
changed: [alpha]
PLAY RECAP *********************************************************************
alpha: ok=2 changed=1 unreachable=0 failed=0
ZooKeeper allows us to issue four letter commands via telnet or nc (netcat). We can check its status with the stat command.
tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
Trying 10.22.62.124...
Connected to alpha.
Escape character is '^]'.
stat
This ZooKeeper instance is not currently serving requests
Connection closed by foreign host.
The message This ZooKeeper instance is not currently serving requests is important for us: this node is up, but not operational. Another command is ruok (are you ok?).
tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
Trying 10.22.62.124...
Connected to alpha.
Escape character is '^]'.
ruok
imok
Connection closed by foreign host.
ZooKeeper responds imok (I am ok). The alpha node is up.
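The same check can be scripted without telnet or nc. The following Python sketch sends a four letter command over a plain TCP socket and reads the reply until the server closes the connection. The helper names exchange and four_letter_word are my own; the host alpha and port 2181 are taken from this article.

```python
import socket


def exchange(conn, command):
    """Send a four letter command on a connected socket and read until EOF."""
    conn.sendall(command.encode("ascii"))
    chunks = []
    while True:
        data = conn.recv(4096)
        if not data:  # the server closes the connection after replying
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def four_letter_word(host, command, port=2181, timeout=5.0):
    """Open a TCP connection to the client port and run one command."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return exchange(conn, command)


# Example (needs a reachable ZooKeeper server, here assumed to be alpha):
# print(four_letter_word("alpha", "ruok"))
```

Splitting the socket handling from the connection setup keeps exchange usable with any connected socket, which makes it easy to try out without a running cluster.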
Start other nodes
Start ZooKeeper server 2 (beta) and check with netcat:
ansible-playbook -l beta playbook.yml
echo stats | nc beta 2181
This ZooKeeper instance is not currently serving requests
Repeat this until server 5 (epsilon).
tan@alpha:/opt/ansible/zookeeper> echo stats | nc epsilon 2181
Zookeeper version: 3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
Clients:
/10.22.62.128:56598[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x600000000
Mode: leader
Node count: 4
We see that ZooKeeper was able to elect a leader, as the majority vote could take place. Check the logs:
2018-03-08 10:42:59,758 [myid:5] - INFO [LearnerHandler-/10.22.62.124:55684:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
2018-03-08 10:42:59,758 [myid:5] - INFO [LearnerHandler-/10.22.190.126:39461:LearnerHandler@535] - Received NEWLEADER-ACK message from 4
2018-03-08 10:42:59,758 [myid:5] - INFO [LearnerHandler-/10.22.190.121:49735:LearnerHandler@535] - Received NEWLEADER-ACK message from 3
2018-03-08 10:42:59,775 [myid:5] - INFO [LearnerHandler-/10.22.62.126:39509:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
2018-03-08 10:42:59,776 [myid:5] - INFO [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Leader@962] - Have quorum of supporters, sids: [ 1,2,3,4,5 ]; starting up and setting last processed zxid: 0x600000000
Check the previous nodes:
tan@alpha:~> for i in alpha beta gamma delta; do echo $i: $(echo stats | nc $i 2181 | grep "Mode"); done
alpha: Mode: follower
beta: Mode: follower
gamma: Mode: follower
delta: Mode: follower
I have written a small gist that checks all nodes to find out which one is the leader.
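The gist itself is not reproduced here. As a rough illustration of the idea, this Python sketch extracts the Mode line from the status output of each node and reports the leader. parse_mode and find_leader are hypothetical names, and the input is assumed to be the text returned by the status command for each node.

```python
def parse_mode(stat_output):
    """Return the value of the 'Mode:' line, or None if the node is not serving."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(stat_by_node):
    """Return the name of the node that reports Mode: leader, or None."""
    for node, output in stat_by_node.items():
        if parse_mode(output) == "leader":
            return node
    return None


if __name__ == "__main__":
    sample = {
        "alpha": "Zxid: 0x600000000\nMode: follower\nNode count: 4\n",
        "epsilon": "Zxid: 0x600000000\nMode: leader\nNode count: 4\n",
    }
    print(find_leader(sample))
```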
Important: The startup order of the ZooKeeper servers is not relevant, i.e. alpha does not have to be started first.
High Availability Scenarios
- Having 9 nodes means that 5 nodes must be running in order to keep the cluster operational.
- 4 nodes can be altered or upgraded in the meantime.
- The recommended scenario is 5 nodes, which means 3 nodes must be alive.
- If only 2 nodes are alive, ZooKeeper will stop serving requests until the third node is up again.
Summary
- Running ZooKeeper in Replicated Mode is simple.
- Ansible and Docker are great and essential for maintaining a production cluster.
- Some details were intentionally left out, e.g. how much disk space a ZooKeeper server must or should have. This really depends on your use case with Apache Kafka or Apache Cassandra using Apache ZooKeeper.