
The _all meta field in Elasticsearch

:heavy_exclamation_mark: This post is older than a year. Consider some information might not be accurate anymore. :heavy_exclamation_mark:

Used: Elasticsearch v5.1

Are you wondering how Elasticsearch finds the text you are searching for? Learn more about the _all meta field.

Definition

"The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved." (Elasticsearch Reference 5.1)
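To make the definition concrete, here is a minimal Python sketch of the idea. It is a model for illustration, not how Elasticsearch implements the field internally. The function names `build_all` and `analyze` are hypothetical, and the tokenizer only roughly approximates the standard analyzer by lowercasing and splitting on non-alphanumeric characters.

```python
import re


def build_all(doc):
    """Concatenate all field values into one string, using space as a delimiter."""
    return " ".join(str(value) for value in doc.values())


def analyze(text):
    """Rough stand-in for the standard analyzer (assumption): lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


doc = {"pan": "4000000000000002", "firstname": "John", "lastname": "Legend"}

all_value = build_all(doc)   # what conceptually ends up in _all
tokens = analyze(all_value)  # the terms that become searchable

print(all_value)
print(tokens)
```

The field names are gone from the concatenated value, which is why a search on _all finds a term no matter which field it came from.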

Demo

Let’s look at some examples.

Setup Index

First, create the testdata index. _all is enabled by default in Elasticsearch 5.1, so the explicit setting below is only there for the demo :smirk:.

PUT testdata
{
  "settings": {
    "index": {
      "number_of_shards": 1
    }
  },
  "mappings": {
    "logs": {
      "_all": {
        "enabled": true
      }
    }
  }
}

Add some test data

Next, add some documents to search for. The data contains PANs (Primary Account Numbers, that is, credit card numbers). As you can see, the PAN occurs in a different field in each document.

POST testdata/logs
{
  "pan" : "4000000000000002",
  "firstname": "John",
  "lastname": "Legend",
  "profession": "Musician"
}
POST testdata/logs
{
  "fistname" : "4026000000000002",
  "lastname": "Danger"
}
POST testdata/logs
{
  "merchant": "tancomat 3000",
  "firstname": "Beat",
  "lastname": "Sommer",
  "profession": "issuer",
  "comment": "5100000000000008"
}

Search for PANs

To search for a PAN, we use a query_string query without naming a field.

When not explicitly specifying the field to search on in the query string syntax, the `index.query.default_field` will be used to derive which field to search on. It defaults to `_all` field.

Query String Query

GET testdata/_search
{
  "query": {
    "query_string": {
      "query": "/[3-9][0-9]{13,18}/"
    }
  }
}

This returns all documents that match the regexp for a PAN.

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NAF6bFiGlYewuu9_",
        "_score": 1,
        "_source": {
          "pan": "4000000000000002",
          "firstname": "John",
          "lastname": "Legend",
          "profession": "Musician"
        }
      },
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NCI_bFiGlYewuu-F",
        "_score": 1,
        "_source": {
          "fistname": "4026000000000002",
          "lastname": "Danger"
        }
      },
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NEF0bFiGlYewuu-L",
        "_score": 1,
        "_source": {
          "merchant": "tancomat 3000",
          "firstname": "Beat",
          "lastname": "Sommer",
          "profession": "issuer",
          "comment": "5100000000000008"
        }
      }
    ]
  }
}
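Why all three documents match can be checked offline with a small Python sketch. It assumes a simplified model of _all (values joined by spaces, then lowercased and split on non-alphanumerics), and the helper names are hypothetical. Lucene regular expressions are anchored, meaning the pattern must match a whole term, so the sketch uses `re.fullmatch` to mimic that.

```python
import re

# Same pattern as in the query_string query, without the surrounding slashes.
PAN_PATTERN = re.compile(r"[3-9][0-9]{13,18}")

docs = [
    {"pan": "4000000000000002", "firstname": "John",
     "lastname": "Legend", "profession": "Musician"},
    {"fistname": "4026000000000002", "lastname": "Danger"},
    {"merchant": "tancomat 3000", "firstname": "Beat", "lastname": "Sommer",
     "profession": "issuer", "comment": "5100000000000008"},
]


def all_terms(doc):
    """Simplified _all: join values with spaces, lowercase, split into terms."""
    text = " ".join(str(value) for value in doc.values())
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]


def matches_pan(doc):
    """True if any whole term of the simplified _all value matches the pattern."""
    return any(PAN_PATTERN.fullmatch(term) for term in all_terms(doc))


matching = [doc["lastname"] for doc in docs if matches_pan(doc)]
print(matching)
```

Because the match is per term, the `3000` in `tancomat 3000` is too short to count as a PAN, while each 16-digit number matches regardless of the field it was stored in.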

Exclude fields

You may want to exclude certain fields from the _all field. To do so, set include_in_all to false for that field in the type mapping. The example below excludes the date field.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text"
        },
        "content": {
          "type": "text"
        },
        "date": {
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}
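The effect of include_in_all can be modelled in a few lines of Python. This is a sketch under the assumption that _all is simply the space-joined values of the included fields. The function `build_all` and the sample document are hypothetical, and only the `properties` part of the mapping above is mirrored.

```python
# Field definitions mirroring the "properties" section of the mapping.
properties = {
    "title": {"type": "text"},
    "content": {"type": "text"},
    "date": {"type": "date", "include_in_all": False},
}


def build_all(doc, properties):
    """Join the values of every field that has not opted out of _all."""
    parts = []
    for field, value in doc.items():
        # include_in_all is treated as true when the field does not set it.
        if properties.get(field, {}).get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)


doc = {"title": "Some title", "content": "Some text", "date": "2017-01-19"}
print(build_all(doc, properties))
```

In this model the date value never reaches the catch-all string, so a search on _all cannot find it, while a search on the date field itself still works.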

Summary

  • _all is a metadata field
  • the values of all other fields are concatenated into one space-delimited string, which is analyzed and indexed but not stored
  • it is enabled by default in Elasticsearch 5.1

:warning: But remember: :warning:

The _all field is not free: it requires extra CPU cycles and uses more disk space. If not needed, it can be completely disabled or customised on a per-field basis.
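If the catch-all field is not needed, it can be switched off in the mapping. As a hedged sketch for Elasticsearch 5.x, the Python snippet below only builds and prints a request body; it does not talk to a cluster. The index name, the type name, and the choice of `content` as the replacement default field are assumptions for illustration.

```python
import json

# Hypothetical index and field names, mirroring the mapping example above.
body = {
    "settings": {
        # With _all disabled, query_string needs another default field.
        "index.query.default_field": "content"
    },
    "mappings": {
        "my_type": {
            "_all": {"enabled": False},
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"},
            },
        }
    },
}

# Print the JSON body that would follow `PUT my_index` in the console.
print(json.dumps(body, indent=2))
```

Setting `index.query.default_field` matters here because a query_string query without an explicit field would otherwise still target the disabled _all field.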
