
Reindex Subset Data in Elasticsearch

This post is older than a year. Some information might no longer be accurate.

The Elasticsearch Reindex API is a powerful way to copy a subset of existing data into a new index. For a long-term statistics solution, you can aggregate data and store the aggregated values instead of the atomic details. In my company we have an index in which each document contains approximately 150 fields. For a long-term solution only 30 of them are relevant. The Reindex API can fetch just those 30 fields and store them in a new index.

The reindex template

curl -XPOST "http://elasticsearch:9200/_reindex" -H 'Content-Type: application/json' -d '{
  "source": {
    "index": "source-index-2017.07.26",
    "_source": [
      "field_1",
      "field_2",
      ..
      "field_30"
    ],
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "target-index-2017.07.26"
  }
}'

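The key is source filtering in the reindex request: the _source array inside source lists the fields to copy, and every other field is left out of the documents written to the destination index.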
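Because the match_all query selects every document and source filtering only removes fields, the target index should end up with the same number of documents as the source. A small sanity check along those lines, assuming the index names from the template and a hypothetical `ES_URL` environment variable, might be:

```shell
#!/usr/bin/env bash
# Sketch: compare document counts of source and target after a match_all
# reindex. ES_URL is an assumed environment variable,
# e.g. ES_URL=http://elasticsearch:9200
set -euo pipefail

SOURCE_INDEX="source-index-2017.07.26"
TARGET_INDEX="target-index-2017.07.26"

# Pull the number out of a _count response like {"count":42,"_shards":{...}}.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9]\{1,\}\).*/\1/p' <<<"$1"
}

# Ask Elasticsearch for the document count of one index.
count_docs() {
  extract_count "$(curl -s "$ES_URL/$1/_count")"
}

if [[ -n "${ES_URL:-}" ]]; then
  # Refresh the target so freshly reindexed documents are visible to _count.
  curl -s -XPOST "$ES_URL/$TARGET_INDEX/_refresh" > /dev/null
  src_count=$(count_docs "$SOURCE_INDEX")
  dest_count=$(count_docs "$TARGET_INDEX")
  if [[ "$src_count" == "$dest_count" ]]; then
    echo "OK: both indices hold $src_count documents"
  else
    echo "Mismatch: $src_count source vs $dest_count target documents" >&2
    exit 1
  fi
fi
```

If the counts differ, the failures array in the reindex response is the first place to look.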
