Reindexing Elasticsearch in Scala
The following Scala script reads from one index and writes to another index using the scan-and-scroll method. The script also takes a partial function through which documents from the source index can be modified before they are saved to the target index. It assumes each document has a field called "id" and a field called "submitDate", so that once the initial copy is done it can keep performing scan-and-scroll passes and keep the two indices in sync. Notes:
- The ESClient is an extension of the wabisabi library for Elasticsearch
- The actor initially performs a scan-scroll for documents with submitDate gte 1900
- Once the initial scan-scroll is done, it pauses for a minute and performs another scan-scroll, this time with submitDate gte the previous run's endTime (dateTime.now minus 1 minute)
- This way, a new run starts one minute after the previous one finishes, continually keeping the target index in sync with the source
- The partial function "processData" provides a hook for transforming each original document before it is saved to the new index
- Bulk indexing is used to save to the new index, hence the "id" field is required to determine the "id" of each new document
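The post does not show the query a single pass sends, so here is a minimal sketch of it, assuming `submitDate` is mapped as an Elasticsearch `date` field that accepts ISO-8601 strings. `SyncQuery`, `scanPath` and `rangeQuery` are hypothetical names, the `1m` scroll timeout is an assumed value, and the `search_type=scan&scroll=1m` parameters follow the Elasticsearch 1.x style scan API, which later versions removed.

```scala
import java.time.Instant

// Hypothetical helper for one scan-and-scroll pass over the source index.
object SyncQuery {
  // Lower bound for the initial full copy (the notes use submitDate gte 1900).
  val InitialLowerBound: Instant = Instant.parse("1900-01-01T00:00:00Z")

  // Elasticsearch 1.x style scan-and-scroll path; the 1m scroll timeout is an assumed value.
  def scanPath(sourceIndex: String): String =
    s"/$sourceIndex/_search?search_type=scan&scroll=1m"

  // Range query body matching documents whose submitDate is on or after `from`.
  // Instant.toString yields ISO-8601, e.g. 1900-01-01T00:00:00Z.
  def rangeQuery(from: Instant): String =
    s"""{"query":{"range":{"submitDate":{"gte":"$from"}}}}"""
}
```

Every later pass only changes the `from` value, so the same query body serves both the initial copy and the incremental runs.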
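The resync schedule described in the notes can be modelled without Akka as a small state transition. This sketch assumes the next lower bound is recorded when a run starts; the notes only say it is the previous run's endTime (dateTime.now minus 1 minute), and `java.time.Instant` stands in for that date type. `SyncLoop`, `SyncState` and `startRun` are hypothetical names.

```scala
import java.time.{Duration, Instant}

// Hypothetical model of the actor's resync loop, kept free of Akka so it runs standalone.
object SyncLoop {
  // Overlap kept between consecutive windows (the notes use one minute).
  val Overlap: Duration = Duration.ofMinutes(1)

  // `lowerBound` is the submitDate value the next run will query from (gte).
  final case class SyncState(lowerBound: Instant) {
    // Starting a run at `now` returns the bound to query with and the state for the
    // following run. Recording the bound at run start is an assumption.
    def startRun(now: Instant): (Instant, SyncState) =
      (lowerBound, SyncState(now.minus(Overlap)))
  }

  // The first run copies everything from 1900 onwards.
  val initial: SyncState = SyncState(Instant.parse("1900-01-01T00:00:00Z"))
}
```

Because each document is written under its own "id", the overlap between windows is harmless: a document seen by two consecutive runs is simply indexed again under the same `_id`, replacing the earlier copy. In a classic Akka actor, the one-minute pause could be scheduled with `context.system.scheduler.scheduleOnce`, sending the actor a message that triggers the next run.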
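A partial function suits the "processData" hook because it can declare which documents it handles. The sketch below models documents as `Map[String, String]` and uses a hypothetical rule on a `status` field; neither comes from the original script. Documents the function is not defined for are copied unchanged via `applyOrElse`, which is an assumption about the original behaviour.

```scala
// Hypothetical sketch of the processData hook; documents are modelled as string maps.
object Transform {
  type Doc = Map[String, String]

  // Example rule (hypothetical): only documents with a "status" field are handled,
  // and their status value is upper-cased.
  val processData: PartialFunction[Doc, Doc] = {
    case doc if doc.contains("status") => doc.updated("status", doc("status").toUpperCase)
  }

  // Applies `pf` where it is defined and copies every other document unchanged
  // (an assumption about how the original script treats unmatched documents).
  def transformAll(docs: Seq[Doc], pf: PartialFunction[Doc, Doc]): Seq[Doc] =
    docs.map(doc => pf.applyOrElse(doc, identity[Doc]))
}
```

If unmatched documents should be skipped rather than copied, `docs.collect(pf)` would keep only the documents the partial function accepts.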
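Bulk indexing is why the "id" field matters: each document becomes an action line naming its `_id`, followed by the document source, with every line (including the last) terminated by a newline. The helper below is a minimal sketch that renders such a body from string maps; `BulkBody` and `render` are hypothetical, the JSON quoting is deliberately minimal rather than a full encoder, and depending on the Elasticsearch version the action line may also need a `_type`.

```scala
// Hypothetical helper that renders documents as an Elasticsearch bulk request body.
object BulkBody {
  type Doc = Map[String, String]

  // Minimal JSON string quoting for this sketch (not a full JSON library).
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case c if c < ' ' => sb.append("\\").append('u').append("%04x".format(c.toInt))
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  // Serialises a flat string map as a JSON object.
  private def toJson(doc: Doc): String =
    doc.map { case (k, v) => s"${quote(k)}:${quote(v)}" }.mkString("{", ",", "}")

  // One action line plus one source line per document. The document's own "id"
  // field becomes the _id (a missing id throws NoSuchElementException), so a
  // document copied twice replaces its earlier copy. Every line ends with a newline.
  def render(targetIndex: String, docs: Seq[Doc]): String =
    docs.map { doc =>
      val index = quote(targetIndex)
      val id = quote(doc("id"))
      s"""{"index":{"_index":$index,"_id":$id}}""" + "\n" + toJson(doc) + "\n"
    }.mkString
}
```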
Labels: Actors, Elasticsearch, Scala