Elasticsearch Cluster Management

Index Management

Index Template

For time-based data such as stats, we create one index per day or per month, so that expired data can be removed simply by deleting whole indices. Since most queries can be served by the most recent indices, Elasticsearch can make good use of its caches to improve performance. However, creating an index by hand every day is a tedious operational task, and an index template automates it. Below is an example of an index template, showing how to create it and how it is used.

curl -XPUT localhost:9200/_template/dns_template -d ' 
{    
    "template" : "dnslog-*",    
    "settings" : {        
        "number_of_shards" : 3,        
        "number_of_replicas": 1    
    },    
    "mappings" : {        
        "log" : {            
            "properties" : {               
                "Timestamp" : {"type" : "date"},                
                "URL" : {"type" : "string", "index": "not_analyzed"},                
                "IP Address" : {"type": "ip"}            
            }        
        }    
    } 
} '

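Once the index template is created, it is applied automatically whenever a new index whose name matches the pattern dnslog-* is created, for example when the first document of the day is indexed: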

curl -XPOST 'http://localhost:9200/dnslog-2015-04-09/log' -d '
{    
    "Timestamp" : "2015-04-09T14:12:12",    
    "URL" : "opendns.com/enterprise-security",    
    "IP Address" : "127.0.0.1" 
}'
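The naming scheme and the cleanup of expired indices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (daily_index_name, expired_indices) and the retention period are hypothetical and not part of Elasticsearch; only the dnslog-* pattern and the dnslog-YYYY-MM-DD name format come from the example above.

```python
from datetime import date
from fnmatch import fnmatch

# Pattern taken from the "template" field of the example index template.
TEMPLATE_PATTERN = "dnslog-*"


def daily_index_name(day: date, prefix: str = "dnslog") -> str:
    """Build the index name for one day, e.g. dnslog-2015-04-09."""
    return f"{prefix}-{day.isoformat()}"


def expired_indices(names, today: date, keep_days: int, prefix: str = "dnslog"):
    """Return the daily indices that are more than keep_days old.

    Names that do not match <prefix>-YYYY-MM-DD are ignored.
    keep_days is a hypothetical retention setting, not an Elasticsearch option.
    """
    expired = []
    for name in names:
        if not fnmatch(name, f"{prefix}-*"):
            continue
        try:
            day = date.fromisoformat(name[len(prefix) + 1:])
        except ValueError:
            continue  # matches the prefix but carries no valid date suffix
        if (today - day).days > keep_days:
            expired.append(name)
    return expired
```

The names returned by expired_indices could then be removed one by one with DELETE requests against the cluster.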

Index Upgrade and Migration

Cluster Migration

Monitoring

What to monitor

Cluster health
Heap usage + GC patterns
Load / CPU usage
Index growth / statistics
Query response time
Cache evictions / usage
Thread pools

Tools

# install the head plugin and access it via: http://<hostname>:9200/_plugin/head/
bin/plugin --install mobz/elasticsearch-head

Alert

We need a system component that continuously checks clusters for potential problems and creates an event to notify us through email or SMS. Alert types include:

Disk capacity
Unresponsive nodes
High heap usage
Unassigned shards
Too many shards
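As a rough sketch of such a checker, the function below turns the responses of the cluster health API (GET /_cluster/health) and the node stats API (GET /_nodes/stats) into a list of alert messages. The function name, the thresholds and the expected_nodes parameter are assumptions chosen for illustration. Fetching the JSON and sending the email or SMS are left out.

```python
def evaluate_alerts(health, node_stats, expected_nodes,
                    max_heap_percent=85, min_free_disk_ratio=0.15,
                    max_shards_per_node=500):
    """Return a list of human-readable alerts (empty when all checks pass).

    health     -- parsed JSON from GET /_cluster/health
    node_stats -- parsed JSON from GET /_nodes/stats
    All thresholds are illustrative defaults, not recommendations.
    """
    alerts = []

    # Unresponsive nodes: fewer nodes in the cluster than we expect.
    if health["number_of_nodes"] < expected_nodes:
        missing = expected_nodes - health["number_of_nodes"]
        alerts.append(f"unresponsive nodes: {missing} missing")

    # Unassigned shards.
    if health["unassigned_shards"] > 0:
        alerts.append(f"unassigned shards: {health['unassigned_shards']}")

    # Too many shards, measured as active shards per node.
    nodes = max(health["number_of_nodes"], 1)
    if health["active_shards"] / nodes > max_shards_per_node:
        alerts.append("too many shards per node")

    for node in node_stats["nodes"].values():
        name = node["name"]

        # High heap usage.
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap > max_heap_percent:
            alerts.append(f"high heap usage on {name}: {heap}%")

        # Disk capacity.
        fs = node["fs"]["total"]
        free_ratio = fs["available_in_bytes"] / fs["total_in_bytes"]
        if free_ratio < min_free_disk_ratio:
            alerts.append(f"low disk space on {name}: {free_ratio:.0%} free")

    return alerts
```

Keeping the evaluation separate from the HTTP calls makes the rules easy to test against recorded responses.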

http://qbox.io/blog/automatic-cluster-alerting

Cluster Testing

Note: Configuration changes can be made in either of two places:

elasticsearch.yml (read when the node starts)
Cluster update settings API (applied at runtime)

Test failure modes

How many nodes can you lose and still accept reads and writes?
What happens if one of your nodes runs out of disk space?
What happens if you set ES_HEAP_SIZE too small?
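Before running the first test, it helps to have an expectation to compare against. The sketch below computes a worst-case estimate under simplifying assumptions that are not stated in the text above: every node is a master-eligible data node, discovery.zen.minimum_master_nodes is set to a majority of the nodes, and all indices use the same number_of_replicas. The function names are hypothetical.

```python
def master_quorum(master_eligible_nodes: int) -> int:
    """Majority of master-eligible nodes (the usual minimum_master_nodes value)."""
    return master_eligible_nodes // 2 + 1


def tolerable_node_losses(master_eligible_nodes: int, number_of_replicas: int) -> int:
    """Worst-case number of nodes that can fail at the same time while
    a master can still be elected and every shard keeps at least one copy."""
    # Losing more than this many nodes leaves the cluster without a quorum.
    by_quorum = master_eligible_nodes - master_quorum(master_eligible_nodes)
    # Each shard has 1 primary + number_of_replicas copies on distinct nodes,
    # so losing number_of_replicas nodes always leaves at least one copy.
    by_copies = number_of_replicas
    return min(by_quorum, by_copies)
```

For a three-node cluster with one replica this gives 1, so stopping a single node should leave reads and writes working, while stopping two should not. The actual test then confirms or refutes the estimate.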

Test rolling cluster restart

For example, how do you upgrade the Elasticsearch version one node at a time?

# Rolling restart process
# Disable shard allocation so the cluster does not start relocating shards while our node is unavailable.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}'

# Shut down this node (the _shutdown API exists in Elasticsearch 1.x; on later versions stop the service instead).
$ curl -XPOST 'http://some-node:9200/_cluster/nodes/_local/_shutdown'

# Now start the node again, and verify that it's joined the cluster.

# Re-enable shard allocation.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'

# Repeat on other nodes.
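When scripting this procedure, the step that is easiest to get wrong is moving on to the next node before the cluster has recovered. A minimal sketch of that waiting step follows. The function names, the timeout values and the some-node host are assumptions; only the GET /_cluster/health endpoint and its "status" field come from Elasticsearch itself.

```python
import json
import time
import urllib.request


def fetch_health_http(base_url="http://some-node:9200"):
    """Fetch and parse the cluster health document from one node."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)


def wait_for_status(fetch_health, wanted="green", timeout_s=600, poll_s=5,
                    sleep=time.sleep):
    """Poll fetch_health() until the cluster status equals `wanted`.

    Returns True once the status is reached, or False if it has not been
    reached after roughly timeout_s seconds of waiting.
    fetch_health and sleep are injected so the loop can be tested offline.
    """
    waited = 0
    while True:
        if fetch_health()["status"] == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

A restart script would call wait_for_status(fetch_health_http) after re-enabling allocation and only then proceed to the next node. The health API also accepts a wait_for_status query parameter, which can replace the polling loop when a single blocking request is acceptable.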

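Cluster rebalancing can be slow. You can speed it up by raising the recovery bandwidth limit (indices.recovery.max_bytes_per_sec) and the maximum number of shards that a node will recover concurrently (cluster.routing.allocation.node_concurrent_recoveries).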

# The smaller your shards are, the higher you should set this value.
# Since Logstash splits its data into daily indices, you may have
# a larger number of smaller shards.

cluster.routing.allocation.node_concurrent_recoveries: 5
indices.recovery.max_bytes_per_sec: 50mb
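To get a feel for what the bandwidth limit means in practice, the following back-of-the-envelope calculation gives a lower bound on recovery time. It assumes the throttle is the only bottleneck, which is optimistic: disk and network speed can limit recovery further. The 100 GiB figure is an invented example, not a measurement.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def min_recovery_seconds(total_bytes: int, max_bytes_per_sec: int) -> float:
    """Lower bound on the time a node needs to recover total_bytes of shard
    data when recovery traffic is throttled to max_bytes_per_sec."""
    return total_bytes / max_bytes_per_sec


# Hypothetical node holding 100 GiB of shards, throttled to 50mb per second.
seconds = min_recovery_seconds(100 * GIB, 50 * MIB)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")  # 2048 s, about 34 min
```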

Technical Notes