WYSIWYG: ReIndexing Elasticsearch in Scala

The following Scala script reads from one index and writes to another index using the scan and scroll method. The script also takes a partial function, so the values from one index can be manipulated before they are saved into the other index. The script assumes you have a field called "id" and a field called "submitDate", so that it can keep performing scan and scroll once the preliminary index copy is done, to keep the indexes in sync. Notes:

The ESClient is an extension of the wabisabi library for Elasticsearch
The actor initially performs a scan-scroll for documents with submitDate gte 1900
Once the initial scan-scroll is done, it pauses for a minute and performs a scan-scroll again, using the endTime of the previous run (dateTime.now minus 1 minute) as the submitDate lower bound
This way, every minute after the previous run, it keeps the indexes in sync
The partial function "processData" provides a way to manipulate the original data before saving it to the new index
Bulk indexing is used for saving to the new index, hence the "id" field is required to determine the "id" of the new document
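
The notes above can be sketched in plain Scala, without any Elasticsearch client. This is only an illustration of the windowing logic, not the original script: the names ReindexSketch, Doc, initialSince, endTimeFor and runPass are made up for the example, documents are modelled as in-memory maps rather than JSON, and the transformation inside processData is a hypothetical placeholder.

```scala
import java.time.{Duration, Instant}

object ReindexSketch {
  // Assumption: a document is a plain map; the real script works with JSON from Elasticsearch.
  type Doc = Map[String, Any]

  // Lower bound for the very first pass (the notes use "gte 1900").
  val initialSince: Instant = Instant.parse("1900-01-01T00:00:00Z")

  // Hypothetical transformation standing in for "processData".
  // Documents without an "id" are skipped, since bulk indexing needs one.
  val processData: PartialFunction[Doc, Doc] = {
    case doc if doc.contains("id") => doc + ("reindexed" -> true)
  }

  // End time recorded for a pass: now minus one minute.
  def endTimeFor(now: Instant): Instant = now.minus(Duration.ofMinutes(1))

  // One pass: pick documents with submitDate >= since, transform them,
  // and key each one by its "id". Returns the documents to bulk-index
  // and the lower bound to use for the next pass.
  def runPass(source: Seq[Doc], since: Instant, now: Instant): (Seq[(String, Doc)], Instant) = {
    val selected = source.filter { doc =>
      doc.get("submitDate") match {
        case Some(d: Instant) => !d.isBefore(since)
        case _                => false
      }
    }
    val toIndex = selected.collect(processData).map(doc => doc("id").toString -> doc)
    (toIndex, endTimeFor(now))
  }
}
```

Because each pass starts one minute before the time the previous pass ran, some documents may be selected twice. With bulk indexing keyed on "id", a repeated document should simply overwrite its earlier copy in the new index.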

Usage:

Labels: Actors, Elasticsearch, Scala

Tuesday, April 14, 2015
