WYSIWYG

http://kufli.blogspot.com
http://github.com/karthik20522

Friday, November 28, 2014

Elasticsearch - No downtime reindexing

As you probably know, the mapping of an existing field in Elasticsearch cannot be changed; for example, you cannot change a property's type from a string to an int. The only way to make such a change is to copy the entire index into a brand new index with the new mappings.

Reindexing is a common and unavoidable practice, since changes to the data model affect how data is indexed in Elasticsearch. So while designing the system, it is a good idea to assign an alias to every index, because indexes can then be swapped in and out behind it. An alias is simply an alternate name that points to an index: clients query and index through the alias name rather than the real index name.
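A minimal sketch of creating such an alias, in Python using only the standard library. It assumes a cluster at http://localhost:9200 and uses hypothetical names: keyword_v1 for the real index and keyword for the alias that clients use. It builds the body for Elasticsearch's _aliases endpoint; nothing is sent until you call post_json yourself.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local cluster; adjust for yours


def add_alias_actions(index: str, alias: str) -> dict:
    """Body for POST /_aliases that points `alias` at `index`."""
    return {"actions": [{"add": {"index": index, "alias": alias}}]}


def post_json(path: str, body: dict, base_url: str = ES_URL) -> dict:
    """POST a JSON body to the cluster and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical names: "keyword_v1" is the real index, "keyword" the alias.
print(json.dumps(add_alias_actions("keyword_v1", "keyword")))
```

With a cluster running, post_json("/_aliases", add_alias_actions("keyword_v1", "keyword")) should come back with an acknowledgement, after which searches against /keyword hit keyword_v1.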

Now all you need to do is create a new index with the new mappings and copy the data over from the original index to the new one. For the bulk copy itself, I prefer a tool such as elasticsearch-dump.
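A sketch of these two steps in Python, with hypothetical names (old index keyword_v1, new index keyword_v2, fields name, usage_count and updated_date). The mapping body assumes the typeless format of Elasticsearch 7 and later; older releases, including the 1.x line current when this was written, nest properties under a type name and use string rather than keyword. The copy is expressed as an elasticsearch-dump (elasticdump) command line using its --input, --output and --type flags; check them against the README of the version you install.

```python
import json
import shlex

# Hypothetical names: "keyword_v1" is the index currently behind the alias,
# "keyword_v2" is the new index, and "usage_count" is the field whose type changes.
OLD_INDEX = "keyword_v1"
NEW_INDEX = "keyword_v2"
ES_URL = "http://localhost:9200"  # assumption: a local cluster


def new_index_body() -> dict:
    """Body for PUT /keyword_v2 with the corrected mapping (ES 7+ typeless format)."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "keyword"},
                "usage_count": {"type": "integer"},  # was a string in the old index
                "updated_date": {"type": "date"},
            }
        }
    }


def elasticdump_copy_cmd(source: str, dest: str, base_url: str = ES_URL) -> list:
    """argv that copies every document from `source` into `dest` with elasticdump."""
    return [
        "elasticdump",
        f"--input={base_url}/{source}",
        f"--output={base_url}/{dest}",
        "--type=data",  # documents only; the new mapping is created separately
    ]


print(json.dumps(new_index_body(), indent=2))
print(shlex.join(elasticdump_copy_cmd(OLD_INDEX, NEW_INDEX)))
```

Send the mapping with PUT /keyword_v2 before running the command (for example with subprocess.run(elasticdump_copy_cmd(OLD_INDEX, NEW_INDEX), check=True)). Only the data is copied: copying the old mapping across would defeat the purpose of the new index.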

In my case the copy runs from one keyword index into a second index.

Once the copy is done, all you need to do is remove the alias from the original index and assign it to the new index, ideally in a single _aliases request so that the switch is atomic. This way, calling clients that use the alias for querying and indexing see no impact.

But what about the documents that were updated during the scan and scroll process? Well, that's tricky, but if your model has an update-date property, you can always re-run elasticsearch-dump to fetch only the documents that were updated after a certain date and time.
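A sketch of the swap and the catch-up query in Python, using hypothetical names: alias keyword, old index keyword_v1, new index keyword_v2, and a date field updated_date. The _aliases endpoint applies all actions in one request atomically, so clients never see the alias missing. The range query selects documents changed at or after a timestamp; recording that timestamp just before starting the first copy means nothing modified during the copy is skipped. The query can be handed to elasticdump's --searchBody flag for the second run from keyword_v1 to keyword_v2.

```python
import json
from datetime import datetime, timezone

# Hypothetical names: alias "keyword", old index "keyword_v1",
# new index "keyword_v2", date field "updated_date".


def swap_alias_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases; both actions are applied atomically in one call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def updated_since_query(since: datetime, field: str = "updated_date") -> dict:
    """Search body selecting documents modified at or after `since`."""
    return {"query": {"range": {field: {"gte": since.isoformat()}}}}


# Example timestamp only: use the time recorded just before the first copy began.
copy_started = datetime(2014, 11, 28, 9, 0, tzinfo=timezone.utc)
print(json.dumps(swap_alias_actions("keyword", "keyword_v1", "keyword_v2")))
print(json.dumps(updated_since_query(copy_started)))
```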
