Elasticsearch Cluster Management
Index Management
Index Template
For time-based data such as stats, we want to create one index per day or per month, so that expired indices can be removed easily. Since most queries can be served from the most recent indices, Elasticsearch can make good use of its caches to improve performance. However, creating an index by hand every day is a tedious operational task, and an index template automates it. Below is an example of an index template, how to create it, and how it is used.
curl -XPUT localhost:9200/_template/dns_template -d '
{
  "template" : "dnslog-*",
  "settings" : {
    "number_of_shards" : 3,
    "number_of_replicas" : 1
  },
  "mappings" : {
    "log" : {
      "properties" : {
        "Timestamp" : {"type" : "date"},
        "URL" : {"type" : "string", "index" : "not_analyzed"},
        "IP Address" : {"type" : "ip"}
      }
    }
  }
}'
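Once the index template is created, it is applied automatically to any new index whose name matches the pattern. Indexing the first document into such an index creates the index with the template's settings and mappings.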
curl -XPOST 'http://localhost:9200/dnslog-2015-04-09/log' -d '
{
  "Timestamp" : "2015-04-09T14:12:12",
  "URL" : "opendns.com/enterprise-security",
  "IP Address" : "127.0.0.1"
}'
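The daily naming scheme and the removal of expired indices can be sketched in a few lines of Python. This is only an illustration: the helper names `daily_index_name` and `expired_indices` and the retention period are assumptions, not part of Elasticsearch, and the sketch only computes names without contacting a cluster.

```python
from datetime import date, datetime, timedelta

PREFIX = "dnslog-"  # matches the "dnslog-*" pattern of the template above


def daily_index_name(day, prefix=PREFIX):
    """Return the per-day index name, e.g. dnslog-2015-04-09."""
    return prefix + day.strftime("%Y-%m-%d")


def expired_indices(index_names, today, keep_days, prefix=PREFIX):
    """Return the names whose date suffix is older than keep_days (hypothetical helper)."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our time-based indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y-%m-%d").date()
        except ValueError:
            continue  # suffix is not a date; leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

Each name returned by `expired_indices` could then be removed with a `DELETE` request against that index, for example `curl -XDELETE localhost:9200/dnslog-2015-03-01`.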
Index Upgrade and Migration
Cluster Migration
Monitoring
What to monitor
- Cluster health
- Heap usage + GC patterns
- Load/CPU usage
- Index growth/statistics
- Query response time
- Cache evictions/usage
- Thread pools
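As a starting point for the first item, cluster health can be polled from the `_cluster/health` endpoint. The sketch below assumes a node reachable at `http://localhost:9200`; the function names are hypothetical, and the remaining items in the list would need other endpoints such as `_nodes/stats`.

```python
import json
from urllib.request import urlopen


def fetch_cluster_health(base_url="http://localhost:9200"):
    """GET /_cluster/health and return the parsed JSON body."""
    with urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_health(health):
    """Reduce a cluster health response to a one-line summary (hypothetical helper)."""
    return ("status={status} nodes={number_of_nodes} "
            "unassigned_shards={unassigned_shards}").format(**health)
```

Calling `summarize_health(fetch_cluster_health())` from a cron job or a monitoring agent gives a compact line that is easy to log or graph.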
Tools
# install the head plugin and access it via: http://<hostname>:9200/_plugin/head/
bin/plugin --install mobz/elasticsearch-head
Alert
We need a system component that continuously checks clusters for potential problems and creates an event to notify us through email or SMS. Alert types include:
- Disk capacity
- Unresponsive nodes
- High heap usage
- Unassigned shards
- Too many shards
http://qbox.io/blog/automatic-cluster-alerting
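The alert types above can be expressed as simple threshold rules. The following is a minimal sketch, not a complete alerting system: the thresholds, the `evaluate_alerts` function, and the layout of the `snapshot` dict are all assumptions made for illustration, and collecting the numbers and sending email or SMS are left to the caller.

```python
# Hypothetical thresholds; tune them for your own cluster.
DISK_USED_PCT_MAX = 85.0
HEAP_USED_PCT_MAX = 75.0
SHARDS_PER_NODE_MAX = 500


def evaluate_alerts(snapshot):
    """Return a list of alert messages for one cluster snapshot.

    snapshot is a plain dict assembled by the caller (assumed layout):
      expected_nodes, live_nodes   -> lists of node names
      disk_used_pct, heap_used_pct -> dict of node name to percent used
      unassigned_shards, total_shards -> ints
    """
    alerts = []
    missing = set(snapshot["expected_nodes"]) - set(snapshot["live_nodes"])
    for node in sorted(missing):
        alerts.append("unresponsive node: %s" % node)
    for node, pct in sorted(snapshot["disk_used_pct"].items()):
        if pct > DISK_USED_PCT_MAX:
            alerts.append("disk capacity: %s at %.0f%%" % (node, pct))
    for node, pct in sorted(snapshot["heap_used_pct"].items()):
        if pct > HEAP_USED_PCT_MAX:
            alerts.append("high heap usage: %s at %.0f%%" % (node, pct))
    if snapshot["unassigned_shards"] > 0:
        alerts.append("unassigned shards: %d" % snapshot["unassigned_shards"])
    live = max(len(snapshot["live_nodes"]), 1)  # avoid division by zero
    if snapshot["total_shards"] / live > SHARDS_PER_NODE_MAX:
        alerts.append("too many shards: %d across %d nodes"
                      % (snapshot["total_shards"], live))
    return alerts
```

An empty list means nothing needs attention; any other result can be handed to whatever notification channel is in use.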
Cluster Testing
Note: Configuration changes can be made in two places:
- elasticsearch.yml (read at node startup), and
- the cluster settings API (applied to a running cluster)
Test failure modes
- How many nodes can you lose and still accept reads and writes?
- What happens if one of your nodes runs out of disk space?
- What happens if you set ES_HEAP_SIZE too small?
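Before running the first test, it helps to have a rough expectation to compare against. The sketch below is back-of-the-envelope arithmetic only. It assumes that master election requires a majority of master-eligible nodes (for example, `discovery.zen.minimum_master_nodes` set to a majority) and that a shard stays available as long as one copy survives. The function names are hypothetical, and the real behaviour should still be confirmed by the test itself.

```python
def master_quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def tolerable_master_losses(master_eligible_nodes):
    """How many master-eligible nodes can fail while a majority remains."""
    return master_eligible_nodes - master_quorum(master_eligible_nodes)


def tolerable_copy_losses(number_of_replicas):
    """How many copies of one shard can be lost while one copy survives."""
    return number_of_replicas
```

With three master-eligible nodes and `number_of_replicas` set to 1, as in the template earlier, this estimate says the cluster should survive the loss of one node.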
Test rolling cluster restart
- For example, how do you upgrade the Elasticsearch version?
# Rolling restart process
# Disable shard allocation so the cluster does not start moving shards
# while our node is unavailable.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'
# Shut down this node.
$ curl -XPOST 'http://some-node:9200/_cluster/nodes/_local/_shutdown'
# Now start the node again, and verify that it has joined the cluster.
# Re-enable shard allocation.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
# Repeat on the other nodes.
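The same sequence can be scripted. The sketch below is one possible shape, under a few assumptions: the function names are hypothetical, only the two values used above (`all` and `none`) are accepted, and the actual restart of each node is left to whatever process manager is in use.

```python
import json
from urllib.request import Request, urlopen


def allocation_body(enable):
    """JSON body that sets cluster.routing.allocation.enable transiently."""
    if enable not in ("all", "none"):
        raise ValueError("expected 'all' or 'none', got %r" % (enable,))
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": enable}})


def set_allocation(base_url, enable):
    """PUT the setting to /_cluster/settings on the given node."""
    req = Request(base_url + "/_cluster/settings",
                  data=allocation_body(enable).encode("utf-8"),
                  headers={"Content-Type": "application/json"},
                  method="PUT")
    with urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rolling_restart_plan(nodes):
    """List the steps, in order, for restarting every node once."""
    steps = []
    for node in nodes:
        steps.append(("disable_allocation", node))
        steps.append(("restart_and_wait_for_join", node))
        steps.append(("enable_allocation", node))
    return steps
```

Printing `rolling_restart_plan(["node-1", "node-2"])` before touching the cluster is a cheap way to review the order of operations.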
Cluster rebalancing can be slow. You can speed it up by raising the recovery bandwidth limit and the maximum number of shards that a node will recover concurrently.
# The smaller your shards are, the higher you should set this value.
# Since logstash splits its data into daily indices, you may have
# a larger number of smaller shards.
cluster.routing.allocation.node_concurrent_recoveries: 5
indices.recovery.max_bytes_per_sec: 50mb
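To get a feel for what the bandwidth limit means in practice, the arithmetic below estimates a lower bound on the time one node needs to copy a given amount of shard data at the configured throttle. It is a rough sketch: the helper names are hypothetical, sizes are treated as binary multiples, and network speed, disk speed, and concurrent recoveries are ignored.

```python
def parse_size(text):
    """Parse a size such as '50mb' into bytes (binary multiples assumed)."""
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
    text = text.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * factor)
    return int(text)  # plain byte count


def min_recovery_seconds(total_size, max_bytes_per_sec="50mb"):
    """Lower bound on copy time at the throttle, in seconds."""
    return parse_size(total_size) / parse_size(max_bytes_per_sec)
```

For example, `min_recovery_seconds("100gb", "50mb")` shows how long 100 GB of shard data would take at the limit configured above, and doubling the limit halves that bound.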