WYSIWYG

http://kufli.blogspot.com
http://github.com/karthik20522

Tuesday, April 14, 2015

ReIndexing Elasticsearch in Scala

The following Scala script reads from one index and writes to another index using the scan-and-scroll method. The script also takes in a function through which the values from one index can be manipulated before being saved into the other index. The script assumes you have a field called "id" and a field called "submitDate" so it can continually perform scan and scroll once the preliminary index copy is done, to keep the indexes in sync.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
import org.json4s._
import org.json4s.JsonDSL._
import spray.client.pipelining._
import spray.http._
import spray.httpx._
import spray.httpx.encoding._
import scala.concurrent._
import scala.concurrent.duration._
import akka.actor._
import scala.collection.mutable.ListBuffer
import scala.annotation.tailrec
 
class ElasticsearchReIndexerActor(esInputHost: String,
  esOutputHost: String,
  inputIndex: String,
  outputIndex: String,
  indexType: String,
  processData: (JValue) => JValue) extends Actor {

  import context.dispatcher

  // Required by the json4s `.extract[T]` calls below; shadows any outer definition.
  implicit val formats: Formats = DefaultFormats

  // NOTE(review): name intentionally kept as-is (misspelled "ouput") so existing callers compile.
  val ouputIndexClient = new ESClient(s"http://$esOutputHost", outputIndex, indexType, context)
  val pipeline = addHeader("Accept-Encoding", "gzip") ~> sendReceive ~> decode(Gzip) ~> unmarshal[HttpResponse]

  // Sync watermark: only documents with submitDate >= this value are copied on each pass.
  var lastUpdateDateTime: String = "1900-01-01"

  def receive = {
    case "init" =>
      // Clear any pending idle timeout so ReceiveTimeout cannot re-trigger "init" mid-scroll.
      context.setReceiveTimeout(Duration.Undefined)
      val scanId: String = (Await.result(getScanId(lastUpdateDateTime), 60 seconds) \\ "_scroll_id").extract[String]
      self ! scanId
    case scanId: String => iterateData(scanId)
    case ReceiveTimeout => self ! "init"
  }

  /**
   * Opens a scan+scroll search on the input index for documents whose
   * submitDate is >= `startDate`, and returns the parsed JSON response,
   * which carries the `_scroll_id` used to fetch subsequent pages.
   */
  def getScanId(startDate: String): Future[JValue] = {
    // Bug fix: the original read the mutable `lastUpdateDateTime` field instead of
    // its own `startDate` parameter (same value at the single call site, but fragile).
    println("Query data with date gte: " + startDate)

    // Bug fix: the range field was misspelled "submitData"; the documents use "submitDate".
    val esQuery = "{\"query\":{\"bool\":{\"must\":[{\"range\":{\"submitDate\":{\"gte\":\"" + startDate + "\"}}}]}}}"
    val esURI = s"http://$esInputHost/$inputIndex/$indexType/_search?search_type=scan&scroll=5m&size=50"
    pipeline(Post(esURI, esQuery)).map(r => parse(r.entity.asString))
  }

  /**
   * Fetches one scroll page, bulk-indexes its documents (after running them
   * through `processData`) into the output index, and either continues with
   * the next scroll id or pauses for a minute once the scroll is exhausted.
   */
  def iterateData(scanId: String): Unit = {
    val esURI = Uri(s"http://$esInputHost/_search/scroll?scroll=5m")
    // Legacy scroll API: the raw scroll id is POSTed as the request body.
    // NOTE(review): Await inside an actor blocks its dispatcher thread — tolerable for a
    // one-off migration tool, but prefer `pipe` (akka.pattern.pipe) in production code.
    val esResponse: HttpResponse = Await.result(pipeline(Post(esURI, scanId)), 60 seconds)
    val responseData: JValue = parse(esResponse.entity.asString)

    (responseData \ "hits" \ "hits" \ "_source") match {
      case JNothing | JNull | JArray(Nil) =>
        // Bug fix: the original threw "Result set is empty" here, which killed message
        // processing before the pause/re-sync branch below could ever run. An empty
        // page simply means the scroll is exhausted — pause and schedule the next sync.
        pauseAndScheduleResync()
      case JArray(dataList) =>
        // Bulk format: an action line ({"index": ...}) followed by the document itself.
        val bulkData: List[JValue] = dataList.flatMap { data =>
          val id = (data \ "id").extract[String]
          val action = ("index" -> (("_index" -> outputIndex) ~
            ("_type" -> indexType) ~ ("_id" -> id)))
          List(action, processData(data))
        }
        val bulkResponse: SearchQueryResponse = Await.result(ouputIndexClient.bulk(bulkData), 60 seconds)

        (responseData \\ "_scroll_id") match {
          case JNothing | JNull => pauseAndScheduleResync()
          case x => self ! x.extract[String] // continue with the next scroll page
        }
      case x => throw new Exception("UNKNOWN TYPE: " + x) // typo fixed from "UNKNWON"
    }
  }

  /** Records the current time as the sync watermark and arms a one-minute idle
   *  timeout; the resulting `ReceiveTimeout` re-sends "init" to resume syncing. */
  private def pauseAndScheduleResync(): Unit = {
    // NOTE(review): taking the watermark *after* the scroll finished can miss documents
    // submitted while the scroll ran — confirm this against the sync requirements.
    lastUpdateDateTime = DateTime.now.toString
    context.setReceiveTimeout(1.minute)
    println("Paused at: " + lastUpdateDateTime)
  }
}
Notes:
  • The ESClient is an extension of the wabisabi library for Elasticsearch
  • The Actor initially performs a scan-scroll with submit date gte 1900
  • Once the initial scan-scroll is done, it pauses for a minute and performs a scan-scroll again with the submitDate of previous endTime (dateTime.now minus 1 minute)
  • This way every minute after the previous run it will continually keep the index in sync
  • The function "processData" provides a way to manipulate the original data before saving it to the new index
  • Bulk-indexing is used for saving to the new index, hence the "id" field is required to determine the "id" of the new document
Usage:
1
2
3
4
5
6
7
8
9
10
11
12
// Wire up the re-indexer: source and destination cluster hosts, the index to copy
// from and to, the document type, and the per-document transform (here a no-op).
val esReindexActor = system.actorOf(Props(new ElasticsearchReIndexerActor(
   "localhost:9200",
   "localhost:9200",
   "inputIndex",
   "outputIndex",
   "someType",
   doNothing)), name = "esReIndexer")
 
// Kick off the initial full scan-and-scroll copy.
esReindexActor ! "init"
 }
 
  def doNothing(data: JValue): JValue = data

Labels: , ,