A Journey of a Thousand Miles Begins with a Single Step

Shard Allocation in a Elasticsearch Cluster

Shards are parts of an Apache Lucene Index, the storage unit of Elasticsearch. An index may consists of more than one shard. Elasticsearch distributes the storage to its nodes. In a regular case each shard (as primary) has a replica. Primary and Replica are never stored on the same node. If a node fails, the replica takes over as primary and Elasticsearch tries to allocate a replica shard in the remaining cluster nodes. Cluster Shard Allocation is a pretty decent mechanism to ensure high availability. This post gives some insights and recipes how to deal with cluster shard allocation in a hot-warm architecture.

Read more

Elasticsearch Certificates

Since Version 6 X-Pack Security for Elasticsearch requires Node to Node encryption to secure the Elasticsearch cluster. The main reason is, that no unknown node can join the cluster and gets data by shard allocation. Since V6, V6.1 and V6.2 the tool certgen became deprecated and was replaced by certutil. My use case scenario: Created certificates with certgen for my cluster and needed to generate a new certificate for a new data node.

Read more

Using Sidecar Container for Elasticsearch Configuration

Applications shipped in Docker containers are a major game changer, especially having a Elasticsearch cluster. My production cluster consists of 11 nodes. In the core, Elasticsearch is the same. Each node though has its specific configuration, settings and purpose. On top of that, Elasticsearch X-Pack Security in Version 6 requires that the communication within the cluster must run encrypted. This is accomplished by SSL certificates. Each node has its own private key and certificate. So I was facing with the problem, how to ship the node specific parts along with the core elasticsearch container. Use the core container as baseline and copy the configuration and certificate into the container? This would resolve in 11 specific images. Not in the spirit of reusability though. :thinking: The better approach or answer came by remembering the tech talk Docker Patterns by Roland Huss, given at the Java Conference (Javaland 2016). Use a configuration container as a sidecar!

Read more

Apache Kafka Management and Monitoring

Monitoring for Apache Kafka is crucial to know the moment when to act or scale out your Kafka clusters. Besides the CLI commands, metrics are also accessible over JMX and jconsole. A more convenient way is to have a GUI that displays it. This post focus on Kafka Manager, a administration GUI for Kafka by Yahoo.

Read more

Testing YAML

YAML (YAML Ain’t Markup Language) is a essential part of Ansible playbooks. If you have really long options to pass to programs, yaml offers several possibilities to maintain it readable and thus maintainable. This article demonstrates how how to split a string over multiple lines in yaml.

Read more

Apache ZooKeeper in Production: Replicated ZooKeeper

Apache Kafka uses Apache ZooKeeper. Apache Kafka needs coordination and Apache ZooKeeper is the piece of software which provides it. Coordinating distributed applications is ZooKeeper’s job. As part of my Kafka evaluation I investigated how to run Apache ZooKeeper in a production scenario for Apache Kafka. This a detailed documentation and summary of my observations. I won’t go into detail how coordination is done for Apache Kafka with ZooKeeper. I might explain it in another article. This article focus on ZooKeeper in a production environment concerning High Availability scenarios.

Read more

Live Debugging Elasticsearch

Elasticsearch offers the capability to alter the log level at runtime, for troubleshooting. I got some problems with TLS and this was really helpful and the good thing: No cluster downtime! Elasticsearch uses Apache Log4j 2

Read more

Configure Git Credentials

Gitlab and Github offers personal access tokens for git access over https. They are the only accepted method of authentication when you have Two-Factor Authentication (2FA) enabled. Since I have a Yubikey, I have to use a personal access token, if SSH is not viable, e.g. working in safe guarded environment. A token however has the advantage that it can expire, thus forcing me to exchange it more frequently to hinder attack scenarios.

Read more

Jaegertracing with Elasticsearch Storage

Distributed Tracing with Jaeger by Uber Technologies is pretty impressive. As default you use Apache Cassandra as storage. Jaeger is also capable of using Elasticsearch 5/6. It took me some time and some code diving on github to find the respective options for Elasticsearch. But I finally got it together in this docker-compose.yml. My Elasticsearch Cluster runs with a commercial X-Pack license so we have to pass some authentication.

Read more