This post is more than a year old. Some of the information may no longer be accurate.
Used: Elasticsearch v5.1
Are you wondering how Elasticsearch finds the text you are searching for? Learn more about the _all meta field.
Definition
The _all field is a special catch-all field which concatenates the values of all of the other fields into one big string, using space as a delimiter, which is then analyzed and indexed, but not stored. This means that it can be searched, but not retrieved. (Elasticsearch Reference 5.1)
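To make the definition more concrete, here is a minimal Python sketch of the concatenation step. It only illustrates the idea and is not how Elasticsearch implements it internally: the function name `build_all_field` is hypothetical, the document is assumed to be flat, and the analysis step that follows the concatenation is left out.

```python
def build_all_field(document):
    """Join all field values of a flat document into one space-delimited string."""
    return " ".join(str(value) for value in document.values())


doc = {
    "pan": "4000000000000002",
    "firstname": "John",
    "lastname": "Legend",
    "profession": "Musician",
}

all_value = build_all_field(doc)
print(all_value)  # 4000000000000002 John Legend Musician
```

The field names are gone in the resulting string, which is why a search on _all cannot tell you which field a value came from.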
Demo
Let’s look at some examples.
Setup Index
First create the testdata index. _all is enabled by default, but for this demo the mapping enables it explicitly.
PUT testdata
{
  "settings": {
    "index": {
      "number_of_shards": 1
    }
  },
  "mappings": {
    "logs": {
      "_all": {
        "enabled": true
      }
    }
  }
}
Add some test data
Now add some data to search for. It contains PANs (Primary Account Numbers, i.e. credit card numbers). Note that the PAN occurs in a different field in each document.
POST testdata/logs
{
  "pan": "4000000000000002",
  "firstname": "John",
  "lastname": "Legend",
  "profession": "Musician"
}
POST testdata/logs
{
  "fistname": "4026000000000002",
  "lastname": "Danger"
}
POST testdata/logs
{
  "merchant": "tancomat 3000",
  "firstname": "Beat",
  "lastname": "Sommer",
  "profession": "issuer",
  "comment": "5100000000000008"
}
Search for PANs
To search for a PAN, we simply use a query_string query.
When not explicitly specifying the field to search on in the query string syntax, the `index.query.default_field` will be used to derive which field to search on. It defaults to the `_all` field.
GET testdata/_search
{
  "query": {
    "query_string": {
      "query": "/[3-9][0-9]{13,18}/"
    }
  }
}
This returns all documents that match the regular expression for a PAN.
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NAF6bFiGlYewuu9_",
        "_score": 1,
        "_source": {
          "pan": "4000000000000002",
          "firstname": "John",
          "lastname": "Legend",
          "profession": "Musician"
        }
      },
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NCI_bFiGlYewuu-F",
        "_score": 1,
        "_source": {
          "fistname": "4026000000000002",
          "lastname": "Danger"
        }
      },
      {
        "_index": "testdata",
        "_type": "logs",
        "_id": "AVm2NEF0bFiGlYewuu-L",
        "_score": 1,
        "_source": {
          "merchant": "tancomat 3000",
          "firstname": "Beat",
          "lastname": "Sommer",
          "profession": "issuer",
          "comment": "5100000000000008"
        }
      }
    ]
  }
}
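All three documents match although the PAN sits in a different field each time, because the query runs against the terms of _all rather than against a named field. The Python sketch below mimics that behaviour under simplifying assumptions: splitting on whitespace and lowercasing stands in for the real analyzer, and `re.fullmatch` stands in for Lucene regular expressions, which always match a whole term. The helper names `all_terms` and `matches_pan` are hypothetical.

```python
import re

# Same pattern as in the query_string query, without the surrounding slashes.
PAN_PATTERN = re.compile(r"[3-9][0-9]{13,18}")


def all_terms(document):
    """Concatenate all values with spaces, then split into lowercase terms."""
    text = " ".join(str(value) for value in document.values())
    return text.lower().split()


def matches_pan(document):
    """Return True if any single term matches the PAN pattern as a whole."""
    return any(PAN_PATTERN.fullmatch(term) for term in all_terms(document))


docs = [
    {"pan": "4000000000000002", "firstname": "John",
     "lastname": "Legend", "profession": "Musician"},
    {"fistname": "4026000000000002", "lastname": "Danger"},
    {"merchant": "tancomat 3000", "firstname": "Beat", "lastname": "Sommer",
     "profession": "issuer", "comment": "5100000000000008"},
]

matching = [doc["lastname"] for doc in docs if matches_pan(doc)]
print(matching)  # ['Legend', 'Danger', 'Sommer']
```

Note that the term 3000 from "tancomat 3000" is too short to match on its own; the third document only matches because of the value in its comment field.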
Exclude fields
You may want to exclude certain fields from the _all search. To do so, set include_in_all to false for that field in the type mapping. The example below excludes the date field.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": {
          "type": "text"
        },
        "content": {
          "type": "text"
        },
        "date": {
          "type": "date",
          "include_in_all": false
        }
      }
    }
  }
}
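The effect of this mapping can be pictured with another small Python sketch. As before, it is an illustration under assumptions and not Elasticsearch code: `build_all_with_mapping` is a hypothetical helper, the sample document is made up, and fields without an include_in_all setting are treated as included.

```python
def build_all_with_mapping(document, properties):
    """Concatenate only the values whose field mapping does not opt out of _all."""
    parts = []
    for field, value in document.items():
        field_mapping = properties.get(field, {})
        # Assumption: a field is included unless include_in_all is set to False.
        if field_mapping.get("include_in_all", True):
            parts.append(str(value))
    return " ".join(parts)


# Mirrors the "properties" section of the my_index mapping above.
properties = {
    "title": {"type": "text"},
    "content": {"type": "text"},
    "date": {"type": "date", "include_in_all": False},
}

# Hypothetical sample document, not part of the original demo data.
doc = {"title": "Hello", "content": "The _all field", "date": "2017-01-19"}

result = build_all_with_mapping(doc, properties)
print(result)  # Hello The _all field
```

The date value never reaches the concatenated string, so a search on _all cannot find the document by its date, while a search on the date field itself still works.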
Summary
- _all is a metadata field: the values of all fields are concatenated into one string and indexed as a single field
- _all is enabled by default
But remember:
The _all field is not free: it requires extra CPU cycles and uses more disk space. If not needed, it can be completely disabled or customised on a per-field basis.