This post is older than a year. Consider some information might not be accurate anymore.
Used: elasticsearch v5.6.8
Shards are parts of an Apache Lucene Index, the storage unit of Elasticsearch. An index may consists of more than one shard. Elasticsearch distributes the storage to its nodes. In a regular case each shard (as primary) has a replica. Primary and Replica are never stored on the same node. If a node fails, the replica takes over as primary and Elasticsearch tries to allocate a replica shard in the remaining cluster nodes. Cluster Shard Allocation is a pretty decent mechanism to ensure high availability. This post gives some insights and recipes how to deal with cluster shard allocation in a hot-warm architecture.
Index Details
Since this is about an index, use the Index API to check how many primaries and replicas an index has.
GET _cat/indices/metricbeat-6.2.2-2018.03.27?v
The output as plain text with headers.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open metricbeat-6.2.2-2018.03.27 H_sYkWaiRmS2H4dJesu9Pg 1 1 2024496 0 5.2gb 2.6gb
Above example shows us one primary and one replica for index metricbeat-6.2.2-2018.03.27
.
Cluster Scenario
Cluster Shard Allocation is a pretty cool feature of Elasticsearch. If you have a hot warm architecture you can e.g. prioritize data regarding IO. Hot and warm are just semantic values to describe Elasticsearch nodes regarding IO. For instance hot is super fast (SSD or SAN), warm slower (magnetic drives). My cluster setup regarding data nodes:
- 5 hot nodes
- 2 warm nodes
- 1 volatile node
The volatile node is for insignificant data, like Machine Learning (currently in evaluation ). You can define custom attributes in the elasticsearch.yml
like
and use box_type
as discriminator for data nodes in a hot-warm architecture.
Analyze Index Settings
If you wanna check if a allocation exists on the index
GET metricbeat-6.2.2-2018.03.27/_settings
The output
{
"metricbeat-6.2.2-2018.03.27": {
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"box_type": "volatile,hot"
}
}
},
"mapping": {
"total_fields": {
"limit": "10000"
}
},
"refresh_interval": "30s",
"number_of_shards": "1",
"provided_name": "metricbeat-6.2.2-2018.03.27",
"creation_date": "1522101603804",
"unassigned": {
"node_left": {
"delayed_timeout": "30m"
}
},
"number_of_replicas": "1",
"uuid": "H_sYkWaiRmS2H4dJesu9Pg",
"version": {
"created": "5060899"
}
}
}
}
}
"box_type": "volatile,hot"
on the include part means that the index can reside on a volatile
or hot
node. The combination is a OR conjunction.
Explain API
Elasticsearch offers a powerful explain API, to check that:
GET /_cluster/allocation/explain
{
"index": "metricbeat-6.2.2-2018.03.27",
"shard": 0,
"primary": true
}
Basically the shortened output tells you about the decisions the cluster (master node) has made. In the node_allocation_decisions
you find all the details.
{
"index": "metricbeat-6.2.2-2018.03.27",
"shard": 0,
"primary": true,
"current_state": "started",
"current_node": {
"name": "machine-learning-master",
"attributes": {
"box_type": "hot"
},
"weight_ranking": 2
},
"can_remain_on_current_node": "yes",
"can_rebalance_cluster": "throttled",
"can_rebalance_cluster_decisions": [],
"can_rebalance_to_other_node": "throttled",
"rebalance_explanation": "rebalancing is throttled",
"node_allocation_decisions": [
{
"node_name": "machine-learning-slave",
"node_attributes": {
"box_type": "volatile"
},
"node_decision": "throttled",
"weight_ranking": 1,
"deciders": [
{
"decider": "throttling",
"decision": "THROTTLE",
"explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},
{
"node_name": "hot-node",
"node_attributes": {
"box_type": "hot"
},
"node_decision": "no",
"weight_ranking": 6,
"deciders": [
{
"decider": "same_shard",
"decision": "NO",
"explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[metricbeat-6.2.2-2018.03.27][0], node[p87j8OP3R4GzTrgau_chsw], [R], s[STARTED], a[id=NeGf1la8TC-czqIMLspnXg]]"
}
]
},
{
"node_name": "warm-node",
"node_attributes": {
"box_type": "warm"
},
"node_decision": "no",
"weight_ranking": 7,
"deciders": [
{
"decider": "filter",
"decision": "NO",
"explanation": "node does not match index setting [index.routing.allocation.include] filters [box_type:\"volatile OR hot\"]"
}
]
}
//..
]
}
Change Shard Allocation
If you want to have a different allocation, following template list all available options:
PUT metricbeat-6.2.2-2018.03.27/_settings
{
"index": {
"routing": {
"allocation": {
"include": {
"box_type": "hot"
},
"exclude": {
"box_type": "warm"
},
"require": {
"box_type": ""
}
}
}
}
}
The use case for a custom routing is for instance a performance reason. For instance: You have to investigate 2 days/indices in the past, residing on the warm nodes, you can allocate them to the hot node during the investigation and let Elasticsearch Curator (the housekeeping tool) move them automatically back after the investigation.
Include
Above example would allocate the index on hot nodes only. If you add multiple values like volatile,hot
, Elasticsearch will also use the volatile node.
Exclude
In this example the warm node is excluded. To unset it use ""
. As tested in v5.6.8 null
does not work.
Require
If you choose require
instead of include
, multiple values are AND joined. If you use
PUT metricbeat-6.2.2-2018.03.27/_settings
{
"index.routing.allocation.require.box_type": "volatile,hot"
}
a node must have both attributes. In my above scenario, that won’t work. To unset it:
PUT metricbeat-6.2.2-2018.03.27/_settings
{
"index.routing.allocation.require.box_type": ""
}
Cluster Reroute
If you want to intervene in the master node decision, the reroute command allows to explicitly execute a cluster reroute allocation command including specific commands. A simple replica allocation to a specific data node. Elasticsearch allows to use the node names. Node ids are unique, but are hard to remember .
POST /_cluster/reroute
{
"commands": [
{
"allocate_replica": {
"index": "metricbeat-6.2.2-2018.03.27",
"shard": 0,
"node": "hot-node-3"
}
}
]
}
Summary
- Elasticsearch does a wonderful job keeping your indices distributed among data nodes.
- If you need to check the Cluster Routing examine the index settings.
- Specific Cluster Filtering with
include
,exclude
andrequire
allows a fine grained control on the distribution among data nodes. - The Cluster Allocation Explain gives you detailed information about the decisions.
- You can adjust or intervene in the allocation with the Cluster Reroute commands.
A template for cluster filtering