Elasticsearch - Dynamic Data Mapping
Data in Elasticsearch can be indexed without providing any information about its content: ES accepts dynamic properties and detects whether a property value is a string, integer, datetime, boolean, etc. In this article, let's get dynamic mapping set up the right way, along with some commonly performed search operations.
Let's start with a simple example. Consider the following object:
    $ curl -XPOST http://localhost:9200/keywords/keyword/61669 -d '{
        "keywordId": 61669,
        "keywordText": "Massaging",
        "keywordType": "Submitted"
    }'
Elasticsearch detects the field types and generates the following mapping:

    {
        "keywords": {
            "mappings": {
                "keyword": {
                    "properties": {
                        "keywordId": {
                            "type": "long"
                        },
                        "keywordText": {
                            "type": "string"
                        },
                        "keywordType": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    }
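To make the type detection concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Elasticsearch's implementation: the names `guess_type` and `infer_mapping` are made up for illustration, and date detection is reduced to a single ISO-style format as an assumption.

```python
import json
from datetime import datetime


def guess_type(value, date_detection=True):
    """Return the Elasticsearch 1.x type name a JSON value roughly maps to."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        if date_detection:
            # Assumption: only one date format is tried in this sketch.
            try:
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
                return "date"
            except ValueError:
                pass
        return "string"
    raise TypeError("unsupported value in this sketch: %r" % (value,))


def infer_mapping(doc, date_detection=True):
    """Build a mapping-like dict for a flat JSON document."""
    return {
        "properties": {
            name: {"type": guess_type(value, date_detection)}
            for name, value in doc.items()
        }
    }


doc = json.loads(
    '{"keywordId": 61669, "keywordText": "Massaging", "keywordType": "Submitted"}'
)
mapping = infer_mapping(doc)
print(json.dumps(mapping, indent=2))
```

Run on the object above, the sketch yields "long" for keywordId and "string" for the two text fields, which is the same shape as the mapping Elasticsearch generated.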
To make keywordType an exact-match field, create the index with an explicit mapping that marks it as "not_analyzed":

    $ curl -XPUT http://localhost:9200/keywords -d '{
        "mappings": {
            "keyword": {
                "dynamic": "true",
                "properties": {
                    "keywordType": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }'
keywordType is now "not_analyzed", which means a search on it is an exact match, including case (upper and lower). But how do I make keywordType a case-insensitive exact match? One way is to lowercase keywordType when it is indexed and have the calling system send lowercase search terms only. For this, the following mapping changes need to happen:
    $ curl -XPUT http://localhost:9200/keywords -d '{
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "analyzer_keyword": {
                            "tokenizer": "keyword",
                            "filter": "lowercase"
                        }
                    }
                }
            }
        },
        "mappings": {
            "keyword": {
                "dynamic": "true",
                "properties": {
                    "keywordType": {
                        "type": "string",
                        "analyzer": "analyzer_keyword"
                    }
                }
            }
        }
    }'
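The difference between the three ways of indexing keywordType can be sketched in a few lines of Python. This is only a rough model under stated assumptions: `standard_like` approximates the default analyzer as "split into words and lowercase", `matches` treats a query as a hit when any query token equals an indexed token, and the value "Not Submitted" is a made-up example.

```python
import re


def standard_like(text):
    """Rough stand-in for the default analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def not_analyzed(text):
    """not_analyzed: the whole value is one token, kept exactly as given."""
    return [text]


def analyzer_keyword(text):
    """keyword tokenizer (whole value as one token) plus lowercase filter."""
    return [text.lower()]


def matches(analyzer, indexed_value, query_text):
    """True if any analyzed query token equals an analyzed indexed token."""
    indexed = set(analyzer(indexed_value))
    return any(token in indexed for token in analyzer(query_text))


# Default analysis: a single word matches part of the value.
print(matches(standard_like, "Not Submitted", "submitted"))
# not_analyzed: exact match only, case included.
print(matches(not_analyzed, "Submitted", "submitted"))
print(matches(not_analyzed, "Submitted", "Submitted"))
# analyzer_keyword: exact match on the whole value, case ignored.
print(matches(analyzer_keyword, "Submitted", "SUBMITTED"))
print(matches(analyzer_keyword, "Not Submitted", "submitted"))
```

In this model the lowercase step runs on both the indexed value and the query text, which is why the case of the search term stops mattering once both sides go through the same analyzer.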
So far so good, but I don't want users to search across all fields, which Elasticsearch allows by default through "_all". I would rather have the user state which field they want to search on. Why? An "_all" search on an index with hundreds of fields is a very expensive operation; that's why! To disable "_all" search:
    # mapping configuration from above
    "mappings": {
        "keyword": {
            "_ttl": {
                "enabled": true,
                "default": "5d"
            },
            "dynamic": "true",
            "_all": { "enabled": false },
            ...
    }
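With "_all" disabled, each search has to name its field. As a small illustration, the Python sketch below builds the body of a match query for one field; the helper name `field_query` is hypothetical, and the resulting JSON is what a caller could send to the index's search endpoint.

```python
import json


def field_query(field, text):
    """Build a match query body that targets one explicitly named field."""
    if field == "_all":
        # The mapping above disables _all, so refuse it up front.
        raise ValueError("_all is disabled in this mapping; name a real field")
    return {"query": {"match": {field: text}}}


body = json.dumps(field_query("keywordType", "submitted"))
print(body)
```

Rejecting "_all" in the calling code is a design choice of this sketch, not something Elasticsearch requires; it simply surfaces the mistake before a request is made.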
    # mapping configuration from above
    ...
    "properties": {
        "keywordType": {
            "type": "string",
            "analyzer": "analyzer_keyword"
        },
        "blob": {
            "type": "string",
            "enabled": false
        }
    }
    ...
Since dynamic mapping is enabled, Elasticsearch parses through every single property to determine its type. As much as I would love for Elasticsearch to perform all the magic mappings, let's give it a helping hand by telling it that certain properties are of DateTime type based on their names.
    # mapping configuration from above
    ...
    "mappings": {
        "keyword": {
            "dynamic": "true",
            "date_detection": false,
            "dynamic_templates": [
                {
                    "date_index": {
                        "mapping": {
                            "type": "date"
                        },
                        "match": ".*Date|date",
                        "match_pattern": "regex"
                    }
                }
            ]
    ...
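To see which property names the pattern ".*Date|date" picks up, here is a short Python check. It assumes the regex has to match the whole field name (as Java's `String.matches` does); the field names other than those in the pattern are made up for illustration.

```python
import re

# Same pattern as the "match" value in the date_index template.
DATE_FIELD = re.compile(r".*Date|date")


def is_date_field(name):
    """True if the whole field name matches the date_index pattern."""
    return DATE_FIELD.fullmatch(name) is not None


for name in ["createdDate", "date", "Date", "dateCreated", "birthdate"]:
    print(name, is_date_field(name))
```

Under that assumption the pattern is case-sensitive and anchored, so names such as "dateCreated" or "birthdate" would not be treated as dates and would need their own alternative in the pattern.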
How about making every string an exact, lowercase match?
    # mapping configuration from above
    ...
    "dynamic_templates": [
        {
            "date_index": {
                "mapping": {
                    "type": "date"
                },
                "match": ".*Date|date",
                "match_pattern": "regex"
            }
        },
        {
            "string_index": {
                "mapping": {
                    "analyzer": "analyzer_keyword",
                    "type": "string"
                },
                "match": "*",
                "match_mapping_type": "string"
            }
        }
    ]
    ...
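With two templates in the list, order matters: the first template that matches a new field is the one applied. The Python sketch below models that selection for the two templates above. It is a simplification, not Elasticsearch's code: `pick_template` is a hypothetical helper, regex patterns are assumed to match the whole field name, and `fnmatchcase` stands in for the simple "*" wildcard (it also accepts "?" and "[]", which is beyond what this example needs).

```python
import re
from fnmatch import fnmatchcase

# The two dynamic templates from the mapping, in the same order.
TEMPLATES = [
    ("date_index", {
        "match": ".*Date|date",
        "match_pattern": "regex",
        "mapping": {"type": "date"},
    }),
    ("string_index", {
        "match": "*",
        "match_mapping_type": "string",
        "mapping": {"type": "string", "analyzer": "analyzer_keyword"},
    }),
]


def name_matches(rule, field):
    """Check the field name against the rule's regex or wildcard pattern."""
    if rule.get("match_pattern") == "regex":
        return re.fullmatch(rule["match"], field) is not None
    return fnmatchcase(field, rule["match"])


def pick_template(field, detected_type):
    """Return (template name, mapping) of the first matching template."""
    for name, rule in TEMPLATES:
        wanted = rule.get("match_mapping_type")
        if wanted is not None and wanted != detected_type:
            continue
        if name_matches(rule, field):
            return name, rule["mapping"]
    return None  # no template applies; default dynamic mapping is used


print(pick_template("createdDate", "string"))
print(pick_template("keywordText", "string"))
print(pick_template("keywordId", "long"))
```

Because "date_index" comes first, a string field named "createdDate" is mapped as a date before the catch-all "string_index" template is considered; swapping the two entries would send it to "string_index" instead.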
As an extra, Elasticsearch provides a way to match templates to index names. We give Elasticsearch a template file with mapping information, and when an index is created, ES matches the index name against the given template and automatically applies the mappings. This template needs to be saved in the "/etc/elasticsearch/templates" folder. An example template file:
    # /etc/elasticsearch/templates/keywords_template.json
    {
        "keywords_template": {
            "template": "keywords",
            "order": 0,
            "settings": {
                "index.number_of_shards": 7,
                "index.number_of_replicas": 1
            },
            "mappings": {
                "keyword": {
                    "dynamic": "true",
                    "dynamic_templates": [
                        {
                            "disable_string_index": {
                                "mapping": {
                                    "type": "string",
                                    "index": "not_analyzed",
                                    "enabled": false
                                },
                                "match": "*",
                                "match_mapping_type": "string"
                            }
                        }
                    ],
                    "_all": { "enabled": false }
                }
            }
        }
    }
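The way an index name is matched against templates can be sketched as follows. This is a simplified Python model under stated assumptions: the "template" value is treated as a wildcard pattern, templates with a lower "order" are applied first so higher ones override them, and only the settings are merged. The second template, "key_star_template", is hypothetical and exists only to show the ordering.

```python
from fnmatch import fnmatchcase

TEMPLATES = [
    # From the template file above.
    {
        "name": "keywords_template",
        "template": "keywords",
        "order": 0,
        "settings": {
            "index.number_of_shards": 7,
            "index.number_of_replicas": 1,
        },
    },
    # Hypothetical second template, added only to illustrate "order".
    {
        "name": "key_star_template",
        "template": "key*",
        "order": 1,
        "settings": {"index.number_of_replicas": 2},
    },
]


def settings_for(index_name, templates):
    """Merge settings of all templates whose pattern matches the index name."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    merged = {}
    # Lower order first, so templates with a higher order win on conflicts.
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(template.get("settings", {}))
    return merged


print(settings_for("keywords", TEMPLATES))
print(settings_for("users", TEMPLATES))
```

In this model, creating an index named "keywords" picks up 7 shards from the article's template and the replica count from the higher-order hypothetical one, while an index named "users" matches nothing and gets no template settings.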