Elasticsearch - Advanced settings and Tweaks
Now that we have Elasticsearch installed and confirmed working, we can look at more advanced settings and tweaks to improve its performance. For most use cases, the following three areas of Elasticsearch configuration need to be addressed:
- Memory configuration
- Threadpool configuration
- Data Store configuration
Memory configuration:
By default, Elasticsearch runs with a minimum heap size of 256MB and a maximum heap size of 1GB. On real-world servers with many gigabytes of memory available, a good rule of thumb is to give 50% of the server's memory to the Elasticsearch process. The heap size can be set with:

    $ export ES_HEAP_SIZE=2048m
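As a rough sketch of the 50% rule of thumb, the snippet below derives a heap size from the machine's total memory. It assumes a Linux host that exposes /proc/meminfo, and the helper name half_ram_mb is made up for illustration; it is not part of Elasticsearch.

```shell
# half_ram_mb TOTAL_KB: print half of the given memory size (in kB)
# as whole megabytes. Hypothetical helper for illustration only.
half_ram_mb() {
    echo $(( $1 / 1024 / 2 ))
}

# Read the total memory in kB from the kernel (Linux only) and export
# half of it as the heap size. Skipped silently on other systems.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if [ -n "$total_kb" ]; then
        export ES_HEAP_SIZE="$(half_ram_mb "$total_kb")m"
        echo "ES_HEAP_SIZE=$ES_HEAP_SIZE"
    fi
fi
```

On a server with 4GB of RAM this yields roughly the 2048m value used above; the exact figure depends on what the kernel reports as MemTotal.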
To keep the heap from being swapped out, the process memory can also be locked:

    #elasticsearch.yml
    bootstrap.mlockall: true
Whether the lock took effect can be checked through the nodes API. In the output below, mlockall is still false:

    $ curl 'http://localhost:9200/_nodes/process?pretty'
    "nodes" : {
      "Oaa7jVWNSeyHlmhyYCr32g" : {
        "name" : "Mister Fear",
        "transport_address" : "inet[/127.0.0.1:9300]",
        "host" : "ip-127-0-0-1",
        "ip" : "127.0.0.1",
        "version" : "1.3.2",
        "build" : "dee175d",
        "http_address" : "inet[/127.0.0.1:9200]",
        "attributes" : {
          "master" : "true"
        },
        "process" : {
          "refresh_interval_in_millis" : 1000,
          "id" : 3599,
          "max_file_descriptors" : 100000,
          "mlockall" : false
        }
      }
Note that you have to run ulimit -l unlimited before every Elasticsearch restart, or else mlockall falls back to false. This is probably because the user that the ES process runs as is not root.
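Since the flag can silently fall back to false, it may help to check it automatically after each restart. The following is a minimal sketch that scans a /_nodes/process response for the mlockall field; the function name mlockall_enabled is hypothetical, and the plain-text matching assumes the field is printed the way it appears in the output above.

```shell
# mlockall_enabled: read a /_nodes/process response on stdin and succeed
# only if at least one node is listed and none reports "mlockall" : false.
# Hypothetical helper for illustration only.
mlockall_enabled() {
    # Collect every '"mlockall" : <value>' occurrence; empty if none found.
    flags=$(grep -o '"mlockall" *: *[a-z]*' || true)
    [ -n "$flags" ] && ! printf '%s\n' "$flags" | grep -q false
}

# Example with a canned fragment instead of a live node:
if printf '%s\n' '"process" : { "mlockall" : false }' | mlockall_enabled; then
    echo "memory is locked"
else
    echo "memory is NOT locked, check ulimit -l"
fi
```

Against a running node the same function could be fed with curl -s 'http://localhost:9200/_nodes/process?pretty' | mlockall_enabled.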
Threadpool Configuration:
Elasticsearch holds several thread pools, each with a queue bound to it that allows pending requests to be held instead of discarded. For example, the index operation by default uses a fixed thread pool whose size equals the number of processors in the system, with a queue_size of 200. Once that queue is full, new requests are rejected and the following exception is returned to the client:

    EsRejectedExecutionException[rejected execution (queue capacity 200)..]

To overcome this limitation and increase the concurrency with which Elasticsearch processes requests, the following settings can be tweaked:
    #elasticsearch.yml
    #for search operation
    threadpool.search.type: fixed
    threadpool.search.size: 50
    threadpool.search.queue_size: 200

    #for bulk operations
    threadpool.bulk.type: fixed
    threadpool.bulk.size: 10
    threadpool.bulk.queue_size: 100

    #for indexing operations
    threadpool.index.type: fixed
    threadpool.index.size: 60
    threadpool.index.queue_size: 1000
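To make the sizing concrete, here is a simplified back-of-the-envelope model. It is an assumption for illustration, not a description of how Elasticsearch schedules work internally: a fixed pool holds at most size running requests plus queue_size waiting ones, and anything arriving beyond that before any request has finished is rejected. The function name rejected_count is hypothetical.

```shell
# rejected_count SIZE QUEUE_SIZE IN_FLIGHT: estimate how many requests of
# a simultaneous burst are rejected, assuming none completes meanwhile.
# Hypothetical helper for illustration only.
rejected_count() {
    capacity=$(( $1 + $2 ))      # running threads + queued requests
    if [ "$3" -gt "$capacity" ]; then
        echo $(( $3 - capacity ))
    else
        echo 0
    fi
}

# Index pool from the settings above: size 60, queue_size 1000.
rejected_count 60 1000 1500   # prints 440
rejected_count 60 1000 800    # prints 0
```

In practice requests finish while others arrive, so real rejection counts will be lower than this worst-case estimate.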
By default, ES assumes that you are going to use it mostly for searching and querying, so it allocates 90% of its total heap memory to searching (the index buffer defaults to 10%). This can be changed with the following setting. Note that the implications can be significant, since you are reducing the memory available for search!
    #elasticsearch.yml
    indices.memory.index_buffer_size: 30%
    #the setting above grants ES 30% of its heap memory for the index buffer
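A quick worked example of what the percentage means in absolute terms, assuming the 2048m heap configured earlier. The helper name index_buffer_mb is hypothetical, and the result is truncated to whole megabytes by integer arithmetic.

```shell
# index_buffer_mb HEAP_MB PERCENT: size of the index buffer in whole MB.
# Hypothetical helper for illustration only.
index_buffer_mb() {
    echo $(( $1 * $2 / 100 ))
}

index_buffer_mb 2048 10   # default share, prints 204
index_buffer_mb 2048 30   # setting shown above, prints 614
```

Raising the setting from 10% to 30% on this heap moves roughly 410MB from search to indexing.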
Store and indices Configuration:
The store module allows you to control how index data is stored. The index can be stored either in memory (no persistence) or on disk (the default). Unless your data is temporary, using the in-memory store is a bad idea, as you will lose the data upon restart. For disk-based storage, we need fast disk seeks whenever the data being looked up is not in memory. The most efficient option is mmapfs, which stores the index as memory-mapped files:

    #elasticsearch.yml
    index.store.type: mmapfs
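The store type can also be chosen per index at creation time instead of node-wide. The sketch below only builds the request body; the helper name create_index_body and the index name logs are assumptions for illustration, and the commented curl call needs a running node on localhost:9200.

```shell
# create_index_body STORE_TYPE: print a JSON body for index creation
# that sets the store type. Hypothetical helper for illustration only.
create_index_body() {
    printf '{"settings":{"index.store.type":"%s"}}\n' "$1"
}

create_index_body mmapfs
# To apply it to a new index (hypothetical name "logs"):
#   curl -XPUT 'http://localhost:9200/logs' -d "$(create_index_body mmapfs)"
```

Setting it per index keeps the node default untouched for indices that do not benefit from memory-mapped files.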