WYSIWYG

http://kufli.blogspot.com
http://github.com/karthik20522

Tuesday, April 14, 2015

ReIndexing Elasticsearch in Scala

The following Scala script reads from one index and writes to another index using the scan and scroll method. The script also takes a partial function, so values from the source index can be manipulated before they are saved into the target index. The script assumes each document has a field called "id" and a field called "submitDate"; with these it can keep performing scan and scroll once the preliminary index copy is done, to keep the indexes in sync. Notes:
  • The ESClient is an extension of the wabisabi library for Elasticsearch
  • The Actor initially performs a scan-scroll with submitDate gte 1900
  • Once the initial scan-scroll is done, it pauses for a minute and performs a scan-scroll again, using the previous run's endTime (dateTime.now minus 1 minute) as the lower bound on submitDate
  • This way, it runs again every minute after the previous run and continually keeps the indexes in sync
  • The partial function "processData" provides a way to manipulate the original data before saving it to the new index
  • Bulk indexing is used for saving to the new index, hence the "id" field is required to determine the "id" of the new document
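
The notes above can be sketched in plain Scala without the Actor or the ESClient. This is a minimal illustration under stated assumptions, not the original script: the names `ReindexSketch`, `Doc`, `transform`, `initialFrom`, `passEndTime`, `rangeQuery`, `toJson` and `bulkBody` are hypothetical, documents are modelled as flat string maps, the example `processData` body is invented, and the bulk action line carries only `_index` and `_id` (older Elasticsearch versions also expect a `_type`, either in the action line or in the request URL).

```scala
import java.time.Instant

object ReindexSketch {
  // Assumption: a document is a flat map of field name -> string value.
  type Doc = Map[String, String]

  // Hypothetical "processData": only defined for documents that have a title.
  val processData: PartialFunction[Doc, Doc] = {
    case doc if doc.contains("title") =>
      doc.updated("title", doc("title").trim.toLowerCase)
  }

  // Apply the partial function where it is defined; otherwise copy the document unchanged.
  def transform(doc: Doc, f: PartialFunction[Doc, Doc]): Doc =
    f.applyOrElse(doc, (d: Doc) => d)

  // Lower bound for the first pass: submitDate gte 1900.
  val initialFrom: Instant = Instant.parse("1900-01-01T00:00:00Z")

  // End time recorded for a pass (now minus 1 minute); the next pass starts from it.
  def passEndTime(now: Instant): Instant = now.minusSeconds(60)

  // Range query on submitDate that selects the documents for one pass.
  def rangeQuery(from: Instant): String =
    s"""{"query":{"range":{"submitDate":{"gte":"$from"}}}}"""

  // Simplified JSON string escaping (backslash and double quote only).
  private def escape(s: String): String =
    s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Serialize a flat document; keys are sorted so the output is deterministic.
  def toJson(doc: Doc): String =
    doc.toSeq.sortBy(_._1)
      .map { case (k, v) => "\"" + escape(k) + "\":\"" + escape(v) + "\"" }
      .mkString("{", ",", "}")

  // Bulk request body: one action line plus one source line per document,
  // each terminated by a newline. The "id" field becomes the target _id.
  def bulkBody(index: String, docs: Seq[Doc]): String =
    docs.map { doc =>
      require(doc.contains("id"), "bulk indexing needs an \"id\" field on every document")
      val action = "{\"index\":{\"_index\":\"" + escape(index) +
        "\",\"_id\":\"" + escape(doc("id")) + "\"}}"
      action + "\n" + toJson(doc) + "\n"
    }.mkString
}
```

In this sketch, one pass would run `rangeQuery` against the source index, page through the hits, map each hit through `transform(_, processData)`, and post `bulkBody` to the target index. The following pass would start from the `passEndTime` recorded for the previous one, so documents submitted around the boundary are picked up again and simply overwritten because they keep the same `_id`.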
Usage:
