Get distinct field values in Elasticsearch

:heavy_exclamation_mark: This post is more than a year old. Some of the information may no longer be accurate. :heavy_exclamation_mark:

The aggregations framework provides aggregated data based on a search query. It is built from simple building blocks called aggregations, which can be composed to build complex summaries of the data. There are several types of aggregations. To count the distinct values of a field, the cardinality aggregation is the right fit. Note that it returns an approximate count of the distinct values, not the values themselves.

A single-value metrics aggregation that calculates an approximate count of distinct values. Values can be extracted either from specific fields in the document or generated by a script.

Source: Elasticsearch Reference, Cardinality Aggregation

Example: A terminal may have more than one transaction. We want to count how many distinct terminals have made transactions in the last 15 minutes, restricted to origin EPDEUX and response code 00.

GET transactions/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "origin": "EPDEUX" }},
        { "match": { "respCode": "00" }},
        { "range": { "@timestamp": { "gte": "now-15m" }}}
      ]
    }
  },
  "aggs": {
    "distinct_terminals": {
      "cardinality": {
        "field": "terminalId"
      }
    }
  }
}
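
The same request body can also be assembled programmatically. The following is a minimal Python sketch, not part of the original post: the helper name `build_distinct_count_query` and its parameters are hypothetical. It also shows the optional `precision_threshold` setting of the cardinality aggregation, which trades memory for accuracy. The value 40000 is used here on the assumption that it is the highest supported setting.

```python
import json


def build_distinct_count_query(agg_name, field, filters,
                               since="now-15m", precision_threshold=None):
    """Build a search body that counts distinct values of `field`.

    `filters` maps field names to the values they must match.
    This helper is hypothetical and only mirrors the request shown above.
    """
    must = [{"match": {name: value}} for name, value in filters.items()]
    must.append({"range": {"@timestamp": {"gte": since}}})

    cardinality = {"field": field}
    if precision_threshold is not None:
        # Higher thresholds give more accurate counts at the cost of memory.
        cardinality["precision_threshold"] = precision_threshold

    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"must": must}},
        "aggs": {agg_name: {"cardinality": cardinality}},
    }


body = build_distinct_count_query(
    "distinct_terminals",
    "terminalId",
    {"origin": "EPDEUX", "respCode": "00"},
    precision_threshold=40000,
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET transactions/_search`, for example with any HTTP client.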

The response looks like this. The 84075 hits (transactions) involved approximately 29700 distinct terminals.

{
  "took": 602,
  "timed_out": false,
  "_shards": {
    "total": 33,
    "successful": 33
  },
  "hits": {
    "total": 84075,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distinct_terminals": {
      "value": 29700
    }
  }
}
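
To read the two numbers out of such a response in code, a short Python sketch could look like the following. The function name `summarize_distinct_count` is hypothetical. The sketch assumes the response has already been parsed into a dict, and it accepts `hits.total` either as a plain number, as in the response above, or as an object with a `value` key, as newer Elasticsearch versions return it.

```python
def summarize_distinct_count(response, agg_name):
    """Return (total_hits, distinct_count) from a parsed search response."""
    total = response["hits"]["total"]
    # Newer Elasticsearch versions report the total as an object
    # ({"value": ..., "relation": ...}); older ones as a plain number.
    if isinstance(total, dict):
        total = total["value"]
    distinct = response["aggregations"][agg_name]["value"]
    return total, distinct


# Trimmed copy of the response shown above.
response = {
    "hits": {"total": 84075, "max_score": 0, "hits": []},
    "aggregations": {"distinct_terminals": {"value": 29700}},
}

total, distinct = summarize_distinct_count(response, "distinct_terminals")
print(f"{distinct} terminals in {total} transactions, "
      f"about {total / distinct:.1f} transactions per terminal")
```

Because the cardinality value is an approximation, any ratio derived from it is approximate as well.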