
Apache ZooKeeper in Production: Replicated ZooKeeper

:heavy_exclamation_mark: This post is older than a year. Some of the information might no longer be accurate. :heavy_exclamation_mark:

Used: Apache ZooKeeper v3.4.11, Docker v1.10.3

Apache Kafka uses Apache ZooKeeper. Kafka needs coordination, and ZooKeeper is the piece of software that provides it: coordinating distributed applications is ZooKeeper’s job. As part of my Kafka evaluation, I investigated how to run Apache ZooKeeper in a production scenario for Apache Kafka. This is a detailed documentation and summary of my observations. I won’t go into detail on how coordination is done for Apache Kafka with ZooKeeper; I might explain that in another article. This article focuses on ZooKeeper in a production environment with regard to high availability scenarios.

General

Some details about my scenario:

  • Using 9 nodes (Yes, that is my production. By the way, a Raspberry Pi is pretty cheap in case you want to try it out yourself.)
  • Node List: 1. alpha, 2. beta, 3. gamma, 4. delta, 5. epsilon, 6. zeta, 7. eta, 8. theta, 9. omega
  • Deployment in Docker Containers with Ansible to above nodes, using this ZooKeeper Docker image
  • Ports:
    • 2181 - the client port for Apache Kafka or other clients
    • 2888 - the port ZooKeeper servers use to connect to their peers (followers connect to the leader)
    • 3888 - the port for leader election, if the cluster needs to determine who is in charge
    • ports 2888 and 3888 do not need to be published in Docker, because the containers run with host networking (see the playbook below)

Getting Started

  • From the Apache ZooKeeper guide: to use Apache ZooKeeper in production, you should run it in replicated mode.

What does replicated mode mean?

  • A replicated group of servers in the same application is called a quorum,
  • and in replicated mode, all servers in the quorum have copies of the same configuration file.

Some notes on Kafka’s ZooKeeper usage:

  • You have more than one ZooKeeper instance for Apache Kafka.
  • If Kafka uses a ZooKeeper cluster, some call it an ensemble (Kafka in Action).

In the Kafka server.properties you can provide a connection string with all the ZooKeeper instances. What about a Load Balancer? Let’s leave that out of the equation :wink:.

Kafka’s server.properties

zookeeper.connect=alpha:2181,beta:2181,gamma:2181,delta:2181,epsilon:2181,zeta:2181,eta:2181,theta:2181,omega:2181
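
With nine nodes the connection string is easy to mistype, so it can help to generate it. The following is a minimal sketch, assuming all servers listen on the client port 2181 used in this article; the helper name zookeeper_connect is hypothetical and not part of Kafka or ZooKeeper.

```python
# Hypothetical helper: build the value for Kafka's zookeeper.connect
# setting from a list of ZooKeeper host names.
NODES = ["alpha", "beta", "gamma", "delta", "epsilon",
         "zeta", "eta", "theta", "omega"]
CLIENT_PORT = 2181  # ZooKeeper client port used in this article


def zookeeper_connect(nodes, port=CLIENT_PORT):
    """Return a comma-separated list of host:port entries."""
    return ",".join(f"{node}:{port}" for node in nodes)


# Print the line as it would appear in server.properties.
print(f"zookeeper.connect={zookeeper_connect(NODES)}")
```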

Architecture Design

Some information from Apache ZooKeeper

Minimum Requirement

Apache ZooKeeper Getting Started:

For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers is inherently less stable than a single server, because there are two single points of failure.

Apache ZooKeeper Administration Guide:

Usually three servers is more than enough for a production install, but for maximum reliability during maintenance, you may wish to install five servers. With three servers, if you perform maintenance on one of them, you are vulnerable to a failure on one of the other two servers during that maintenance. If you have five of them running, you can take one down for maintenance, and know that you’re still OK if one of the other four suddenly fails.

Summary:

  • minimum 3 servers
  • an odd number of servers is better for majority election
  • recommendation is 5 servers

For my specific scenario:

  • With 9 servers, a majority requires 5 servers!
  • 3 ZooKeeper servers are not a majority in a cluster of 9!
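
The arithmetic behind these numbers can be written down in a few lines. This is only a sketch of the majority rule described above, not ZooKeeper code; the function names are my own.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Number of servers that may fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes, including even ones.
for n in (3, 4, 5, 6, 9):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

The loop also shows why odd sizes are preferred: 4 servers tolerate only one failure, the same as 3, and 6 servers tolerate two, the same as 5.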

Deployment

As stated above, Ansible is used to ship ZooKeeper in Docker containers.

Ansible Playbook

Here is the playbook.yml, explained in detail below.

# playbook for Apache Zookeeper deployment with docker containers
# some problems with exposed ports, switch to network_mode host as workaround
---

- hosts: all
  vars:
    image: zookeeper
    # currently unused: the task below runs with network_mode: host
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    ids:
      alpha:    1
      beta:     2
      gamma:    3
      delta:    4
      epsilon:  5
      zeta:     6
      eta:      7
      theta:    8
      omega:    9
    mappings:
      ZOO_SERVERS: >
        server.1=alpha:2888:3888
        server.2=beta:2888:3888
        server.3=gamma:2888:3888
        server.4=delta:2888:3888
        server.5=epsilon:2888:3888
        server.6=zeta:2888:3888
        server.7=eta:2888:3888
        server.8=theta:2888:3888
        server.9=omega:2888:3888
      ZOO_MY_ID: "{{ids[ansible_hostname]}}"
  tasks:
    - name: deploy
      docker_container:
        env:
          "{{mappings}}"
        name: zookeeper
        image: zookeeper
        network_mode: host
        pull: true
        log_opt:
          max-file: "3"
          max-size: 25m
        state: started
        restart: yes
        restart_policy: always
        restart_retries: 10
        volumes:
          - "/var/opt/zookeeper:/data"
          - "/var/log/zookeeper:/datalog"

The environment variables are important.

  • ZOO_MY_ID = Each ZooKeeper server needs a unique id. We look it up in a dictionary keyed by host name, e.g. alpha has the id 1.
  • ZOO_SERVERS = The server list is mandatory for Apache ZooKeeper in replicated mode. The folded block scalar (>) joins the lines into a single space-separated line, which is passed to the container as a Docker environment variable.
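
To make the two variables more concrete, here is a small sketch that derives both from the node list, assuming ids are simply assigned in list order as in the playbook. The function names are hypothetical; the playbook itself does this with plain YAML and a Jinja2 lookup.

```python
NODES = ["alpha", "beta", "gamma", "delta", "epsilon",
         "zeta", "eta", "theta", "omega"]
PEER_PORT = 2888      # peer communication
ELECTION_PORT = 3888  # leader election


def server_ids(nodes):
    """Assign ids 1..n in list order, mirroring the ids dictionary."""
    return {node: index for index, node in enumerate(nodes, start=1)}


def zoo_servers(nodes):
    """Build the space-separated server list for ZOO_SERVERS."""
    ids = server_ids(nodes)
    return " ".join(
        f"server.{ids[node]}={node}:{PEER_PORT}:{ELECTION_PORT}"
        for node in nodes
    )


def container_env(hostname, nodes=NODES):
    """Environment for the container running on the given host."""
    return {
        "ZOO_MY_ID": str(server_ids(nodes)[hostname]),
        "ZOO_SERVERS": zoo_servers(nodes),
    }


print(container_env("alpha"))
```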

To deploy, run ansible-playbook playbook.yml.

Inspect Deployment

Each ZooKeeper server is shipped in a docker container with the name zookeeper. Inspecting the alpha container:

tan@alpha:~> docker exec -it zookeeper /bin/bash
bash-4.4# cat /conf/
configuration.xsl  log4j.properties   zoo.cfg            zoo_sample.cfg
bash-4.4# cat /conf/zoo.cfg
clientPort=2181
dataDir=/data
dataLogDir=/datalog
tickTime=2000
initLimit=5
syncLimit=2
maxClientCnxns=60
server.1=alpha:2888:3888
server.2=beta:2888:3888
server.3=gamma:2888:3888
server.4=delta:2888:3888
server.5=epsilon:2888:3888
server.6=zeta:2888:3888
server.7=eta:2888:3888
server.8=theta:2888:3888
server.9=omega:2888:3888

  • The passed arguments are used to write the ZooKeeper configuration in /conf.
  • Note that the Docker image does not use the conf directory inside the ZooKeeper distribution, which is empty:

bash-4.4# cd conf/
bash-4.4# ls
bash-4.4# pwd
/zookeeper-3.4.11/conf

We have seen that the server list is written to the ZooKeeper configuration, but where is ZOO_MY_ID stored?

ZooKeeper stores it in myid in the data directory. This is the mounted volume /var/opt/zookeeper.

On the docker host system:

tan@alpha:/var/opt/zookeeper> cat myid
1
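
A mismatch between myid and the server list is an easy mistake to make, so a quick consistency check can be useful. The sketch below parses the server.N lines in the format shown in the zoo.cfg above (host:port:port, as written by this image for ZooKeeper 3.4) and checks that the id from myid is among them. The function names and the sample text are illustrative assumptions.

```python
import re

# Matches lines such as: server.1=alpha:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_servers(zoo_cfg_text):
    """Map server id -> (host, peer_port, election_port)."""
    servers = {}
    for line in zoo_cfg_text.splitlines():
        match = SERVER_LINE.match(line.strip())
        if match:
            sid, host, peer, election = match.groups()
            servers[int(sid)] = (host, int(peer), int(election))
    return servers


def myid_is_configured(myid_text, zoo_cfg_text):
    """True if the id stored in myid has a server.N entry."""
    return int(myid_text.strip()) in parse_servers(zoo_cfg_text)


# Shortened sample configuration for illustration only.
SAMPLE_CFG = """clientPort=2181
dataDir=/data
server.1=alpha:2888:3888
server.2=beta:2888:3888
"""

print(parse_servers(SAMPLE_CFG))
print(myid_is_configured("1\n", SAMPLE_CFG))
```

On a real host, the two inputs would be the contents of /var/opt/zookeeper/myid and the zoo.cfg from inside the container.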

Leader Election

How does ZooKeeper elect its leader? As we know, this is a majority election. I have found some bogus information on various blog pages and want to spare you the misinformation, so for demonstration purposes I will start the servers node by node to illustrate the behavior of ZooKeeper. As we know, with 9 configured servers, 5 running servers are required for leader election.

Initial

Start Server 1: alpha

tan@alpha:/opt/ansible/zookeeper> ansible-playbook -l alpha playbook.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [alpha]

TASK [deploy] ******************************************************************
changed: [alpha]

PLAY RECAP *********************************************************************
alpha: ok=2    changed=1    unreachable=0    failed=0

ZooKeeper allows us to issue four letter commands via telnet or nc (netcat). We check its status with the stat command.

tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
Trying 10.22.62.124...
Connected to alpha.
Escape character is '^]'.
stat
This ZooKeeper instance is not currently serving requests
Connection closed by foreign host.

The message This ZooKeeper instance is not currently serving requests is important for us: the node is up, but not operational. Another command is ruok (are you ok?).

tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
Trying 10.22.62.124...
Connected to alpha.
Escape character is '^]'.
ruok
imok
Connection closed by foreign host.

ZooKeeper responds imok (I am ok :smile:). The alpha node is up.
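
The same check can be scripted without telnet. The following is a minimal sketch using only Python's standard socket module; it assumes the server accepts four letter commands on the client port, as ZooKeeper 3.4.11 does here. The function names are my own.

```python
import socket


def four_letter_word(host, command, port=2181, timeout=5.0):
    """Send a four letter command and return the complete reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_alive(host, port=2181):
    """True if the server answers ruok with imok."""
    try:
        return four_letter_word(host, "ruok", port).strip() == "imok"
    except OSError:  # refused connection, timeout, unknown host, ...
        return False
```

For example, is_alive("alpha") would correspond to the telnet session above, and four_letter_word("alpha", "stat") to the status check.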

Start other nodes

Start ZooKeeper server 2 (beta) and check it with netcat:

ansible-playbook -l beta playbook.yml
echo stat | nc beta 2181
This ZooKeeper instance is not currently serving requests

Repeat this until server 5 (epsilon).

tan@alpha:/opt/ansible/zookeeper> echo stat | nc epsilon 2181
Zookeeper version: 3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
Clients:
 /10.22.62.128:56598[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 2
Sent: 1
Connections: 1
Outstanding: 0
Zxid: 0x600000000
Mode: leader
Node count: 4

We see that ZooKeeper was able to elect the leader as the majority vote could take place. Check the logs:

2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.62.124:55684:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.190.126:39461:LearnerHandler@535] - Received NEWLEADER-ACK message from 4
2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.190.121:49735:LearnerHandler@535] - Received NEWLEADER-ACK message from 3
2018-03-08 10:42:59,775 [myid:5] - INFO  [LearnerHandler-/10.22.62.126:39509:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
2018-03-08 10:42:59,776 [myid:5] - INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Leader@962] - Have quorum of supporters, sids: [ 1,2,3,4,5 ]; starting up and setting last processed zxid: 0x600000000

Check the previous nodes:

tan@alpha:~> for i in alpha beta gamma delta; do echo $i: $(echo stat | nc $i 2181 | grep "Mode"); done
alpha: Mode: follower
beta: Mode: follower
gamma: Mode: follower
delta: Mode: follower

I have written a small gist that checks all nodes to find out which one is the leader.
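
As an illustration of the idea (this is not the gist itself), the sketch below extracts the Mode line from the status output of each node and reports which node is the leader. It works on text that has already been collected, for example with nc as shown above; the sample outputs are shortened and the function names are hypothetical.

```python
def parse_mode(stat_output):
    """Return the value of the 'Mode:' line, or None if it is absent."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def find_leader(stat_by_node):
    """Return the nodes whose status output reports Mode: leader."""
    return [node for node, output in stat_by_node.items()
            if parse_mode(output) == "leader"]


# Shortened sample outputs for illustration only.
SAMPLE = {
    "alpha": "Zxid: 0x600000000\nMode: follower\nNode count: 4\n",
    "epsilon": "Zxid: 0x600000000\nMode: leader\nNode count: 4\n",
    "zeta": "This ZooKeeper instance is not currently serving requests\n",
}

print(find_leader(SAMPLE))
```

A node that is not serving requests has no Mode line, so parse_mode returns None for it and it is never reported as leader.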

Important: The startup order of the ZooKeeper servers is not relevant, i.e. alpha does not have to be started first.

High Availability Scenarios

  • With 9 nodes, at least 5 nodes must be running to keep the cluster operational.
  • Up to 4 nodes can be changed or upgraded in the meantime.

  • The recommended setup is 5 nodes, which means at least 3 nodes must be alive.
  • If only 2 of the 5 nodes are alive, ZooKeeper will stop serving requests until a third node is up again.
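
When planning maintenance, the scenarios above can be checked with a simple predicate. This is a sketch of the majority rule only, under the assumption that every configured server is a voting member; the function names are my own.

```python
def has_quorum(ensemble_size, alive):
    """True if the alive servers form a strict majority."""
    return alive > ensemble_size // 2


def can_take_down(ensemble_size, alive, count=1):
    """True if `count` more servers can be stopped without losing quorum."""
    return has_quorum(ensemble_size, alive - count)


print(has_quorum(9, 5))        # 5 of 9 alive
print(has_quorum(5, 2))        # 2 of 5 alive
print(can_take_down(9, 9, 4))  # stop 4 of 9 healthy servers
print(can_take_down(9, 9, 5))  # stop 5 of 9 healthy servers
```

The first and third calls evaluate to True, the second and fourth to False, matching the bullet points above.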

Summary

  • Running ZooKeeper in replicated mode is simple.
  • Ansible and Docker are great and essential for maintaining a production cluster.
  • Some details were intentionally left out, e.g. how much disk space a ZooKeeper server must or should have. This really depends on your use case, i.e. on the application that uses Apache ZooKeeper, such as Apache Kafka.