Coercion in Elasticsearch

This post is older than a year. Keep in mind that some information might no longer be accurate.

If a field's datatype is defined in the mapping, e.g. duration as integer, Elasticsearch by default coerces a value that arrives as a string. The string is stored unchanged in the document source, but it is indexed and interpreted as an integer. This can be misleading if you only look at the document itself.

The duration field in the following mapping uses the default behavior: coercion is on.

PUT vinh
{
  "mappings": {
    "logs": {
      "properties": {
        "duration": {
          "type": "integer"
        }
      }
    }
  }
}

Create two documents with the bulk API: the first holds the duration as an integer, the second as a string. The order does not matter; reversing it gives the same result.

POST _bulk
{ "index" : { "_index" : "vinh", "_type" : "logs", "_id" : "1" } }
{ "duration" : 10 }
{ "index" : { "_index" : "vinh", "_type" : "logs", "_id" : "2" } }
{ "duration" : "7" }

If you query the data, for example with GET vinh/_search, the second document still shows the string "7" in its _source.

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "vinh",
        "_type": "logs",
        "_id": "2",
        "_score": 1,
        "_source": {
          "duration": "7"
        }
      },
      {
        "_index": "vinh",
        "_type": "logs",
        "_id": "1",
        "_score": 1,
        "_source": {
          "duration": 10
        }
      }
    ]
  }
}

For Kibana, however, the field is a number and can therefore be used in visualizations.

Coercion on duration
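
One way to confirm that the string was indexed as a number is to run a numeric aggregation on the field. The following is a sketch against the index created above; the aggregation name total_duration is arbitrary.

GET vinh/_search
{
  "size": 0,
  "aggs": {
    "total_duration": {
      "sum": {
        "field": "duration"
      }
    }
  }
}

If both documents were indexed as described, the sum should come out as 17, because the string "7" takes part in the calculation as the integer 7.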

Some might argue that coercion must be expensive for Elasticsearch or Kibana. Whether or not that is the case, it is important to either send the data with the correct type or turn off coercion, so that the party generating the index events has to write the correct datatype.
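
A minimal sketch of turning coercion off, assuming the same mapping layout as above: set the coerce mapping parameter to false on the field. The index name vinh_strict is hypothetical and only used so that the example does not collide with the index created earlier.

PUT vinh_strict
{
  "mappings": {
    "logs": {
      "properties": {
        "duration": {
          "type": "integer",
          "coerce": false
        }
      }
    }
  }
}

With this mapping, a document that carries the duration as a string should be rejected at index time instead of being silently converted:

PUT vinh_strict/logs/2
{ "duration" : "7" }

Elasticsearch also has the index-level setting index.mapping.coerce, which switches coercion off for all fields of an index at once.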