Elasticsearch Cluster Management

Index Management

Index Template

For time-based data such as your stats, we want to create one index per day or per month, so that expired data can be removed simply by deleting whole indices. Since most queries only need the most recent indices, Elasticsearch can also make good use of its caches to improve performance. However, creating an index every day by hand is a tedious operational task, and an index template automates it. Below is an example of an index template, how to create it, and how it is used.

curl -XPUT localhost:9200/_template/dns_template -d '
{
    "template" : "dnslog-*",
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 1
    },
    "mappings" : {
        "log" : {
            "properties" : {
                "Timestamp" : {"type" : "date"},
                "URL" : {"type" : "string", "index" : "not_analyzed"},
                "IP Address" : {"type" : "ip"}
            }
        }
    }
}'
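The `"template"` value above is a name pattern: the template is applied to any index created later whose name matches it. As a rough illustration, the shell sketch below uses a hypothetical helper, `template_matches`, to show which index names `dnslog-*` would pick up. It assumes the pattern only uses the `*` wildcard, because shell globbing also gives `?` and `[...]` a special meaning that, as far as we know, Elasticsearch template patterns do not.

```shell
#!/usr/bin/env bash
# Sketch: predict whether an index template pattern matches a new index name.
# Assumes the pattern uses only "*" (shell "case" globbing also interprets
# "?" and "[...]", which Elasticsearch patterns do not).

# template_matches PATTERN INDEX_NAME -> exit status 0 if the name matches
template_matches() {
  local pattern=$1 index=$2
  case "$index" in
    $pattern) return 0 ;;   # unquoted on purpose so it is treated as a glob
    *) return 1 ;;
  esac
}

for idx in dnslog-2015-04-09 dnslog-2015-04 weblog-2015-04-09; do
  if template_matches 'dnslog-*' "$idx"; then
    echo "$idx: dns_template applies"
  else
    echo "$idx: no match"
  fi
done
```

Note that a monthly name such as `dnslog-2015-04` matches as well, so the same template covers daily and monthly naming schemes.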

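Once the index template exists, it is applied automatically whenever a new index whose name matches `dnslog-*` is created, for example when the first document is indexed into `dnslog-2015-04-09`. Indices that already exist are not affected.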

curl -XPOST 'http://localhost:9200/dnslog-2015-04-09/log' -d '
{
    "Timestamp" : "2015-04-09T14:12:12",
    "URL" : "opendns.com/enterprise-security",
    "IP Address" : "127.0.0.1"
}'
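With one index per day, removing expired data comes down to picking index names by date. A minimal sketch, assuming daily indices named `dnslog-YYYY-MM-DD` as in the example above; `expired_indices` is a hypothetical helper, and the loop only prints the `curl -XDELETE` requests so they can be reviewed before being run.

```shell
#!/usr/bin/env bash
# Sketch of retention for daily indices named dnslog-YYYY-MM-DD.
# YYYY-MM-DD strings sort in date order, so a plain string comparison
# against a cutoff date finds the expired indices.

# expired_indices CUTOFF_DATE INDEX... -> prints the indices older than CUTOFF_DATE
# Only pass dnslog-* names; other names are not parsed as dates.
expired_indices() {
  local cutoff=$1; shift
  local idx day
  for idx in "$@"; do
    day=${idx#dnslog-}          # strip the prefix, leaving YYYY-MM-DD
    if [[ "$day" < "$cutoff" ]]; then
      echo "$idx"
    fi
  done
}

# Print (but do not run) the delete request for each expired index.
for idx in $(expired_indices 2015-04-08 dnslog-2015-04-07 dnslog-2015-04-08 dnslog-2015-04-09); do
  echo "curl -XDELETE 'http://localhost:9200/$idx'"
done
```

In practice the cutoff would be computed (for example with GNU `date -d '30 days ago' +%F`) and the index list fetched from the cluster, for example via the `_cat/indices` API, rather than typed in.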

Index Upgrade and Migration

Cluster Migration


Monitoring

What to monitor

  • Cluster health
  • Heap usage and GC patterns
  • Load/CPU usage
  • Index growth/statistics
  • Query response time
  • Cache evictions/usage
  • Thread pools
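Most of these metrics are exposed by the cluster health and node stats APIs. As one small example, the sketch below extracts per-node heap usage from the response of `GET /_nodes/stats/jvm`, assuming each node reports a `heap_used_percent` field under `jvm.mem`. The function names are our own, and parsing JSON with `grep` is fragile; a real JSON tool such as `jq` is safer when it is available.

```shell
#!/usr/bin/env bash
# Sketch: read heap usage from a GET /_nodes/stats/jvm response on stdin.
# Regex-based parsing; assumes "heap_used_percent" appears once per node.

# heap_used_percents: print one heap_used_percent value per node
heap_used_percents() {
  grep -o '"heap_used_percent"[[:space:]]*:[[:space:]]*[0-9]*' |
    grep -o '[0-9][0-9]*$'
}

# max_heap_percent: print the highest heap_used_percent across all nodes
max_heap_percent() {
  heap_used_percents | sort -n | tail -n 1
}
```

For example: `curl -s 'http://localhost:9200/_nodes/stats/jvm' | max_heap_percent`, which can be recorded over time to spot GC pressure.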

Tools

# install the head plugin and access it via: http://<hostname>:9200/_plugin/head/
bin/plugin --install mobz/elasticsearch-head


Alert

We need a component that continuously checks the cluster for potential problems and raises an alert by email or SMS. Alert conditions include:

  • Disk capacity
  • Unresponsive nodes
  • High heap usage
  • Unassigned shards
  • Too many shards

http://qbox.io/blog/automatic-cluster-alerting
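Several of these conditions can be checked from `GET /_cluster/health`, which reports the cluster `status`, `number_of_nodes` and `unassigned_shards`. The sketch below turns that response into alert lines. `health_alerts` and its expected-node-count argument are hypothetical; disk and heap thresholds and the actual email/SMS delivery are left out, and the regex parsing assumes the default (cluster-level) health response.

```shell
#!/usr/bin/env bash
# Sketch: turn a GET /_cluster/health response (on stdin) into alert lines.
# health_alerts EXPECTED_NODES -> prints one line per problem, nothing if healthy

health_alerts() {
  local expected_nodes=$1 json status unassigned nodes
  json=$(cat)
  status=$(printf '%s' "$json" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')
  unassigned=$(printf '%s' "$json" | sed -n 's/.*"unassigned_shards"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')
  nodes=$(printf '%s' "$json" | sed -n 's/.*"number_of_nodes"[[:space:]]*:[[:space:]]*\([0-9]*\).*/\1/p')

  # yellow/red status, or an empty response from an unreachable node
  [ "$status" = "green" ] || echo "ALERT: cluster status is ${status:-unknown}"
  # shards that no node currently holds
  [ "${unassigned:-0}" -eq 0 ] || echo "ALERT: ${unassigned} unassigned shards"
  # nodes that have dropped out of the cluster
  [ "${nodes:-0}" -ge "$expected_nodes" ] || echo "ALERT: ${nodes:-0} of ${expected_nodes} nodes in cluster"
}
```

A cron job could run `curl -s 'http://localhost:9200/_cluster/health' | health_alerts 3` and forward any non-empty output to the notification channel.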


Cluster Testing

Note: Configuration changes can be done in both:

Test failure modes

  • How many nodes can you lose and still accept reads and writes?
  • What happens if one of your nodes runs out of disk space?
  • What happens if you set ES_HEAP_SIZE too small?
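For the first question, a rough upper bound can be worked out before running the test. The sketch assumes every node is master-eligible, that `discovery.zen.minimum_master_nodes` is set to the usual quorum of (master-eligible nodes / 2) + 1, and that the copies of each shard sit on different nodes. `quorum` and `tolerable_losses` are illustrative helpers, and the result is only a starting point: depending on version, write-consistency settings may block writes before this limit is reached.

```shell
#!/usr/bin/env bash
# Sketch: back-of-the-envelope answer to "how many nodes can we lose?"
# Assumes all nodes are master-eligible and shard copies are on distinct nodes.

# quorum N: the usual minimum_master_nodes value for N master-eligible nodes
quorum() { echo $(( $1 / 2 + 1 )); }

# tolerable_losses N REPLICAS: nodes that can fail while a master can still be
# elected AND every shard keeps at least one copy (the smaller of the two limits)
tolerable_losses() {
  local n=$1 replicas=$2
  local by_master=$(( n - $(quorum "$n") ))
  if [ "$by_master" -lt "$replicas" ]; then
    echo "$by_master"
  else
    echo "$replicas"
  fi
}
```

With the template above (`number_of_replicas: 1`) on three nodes, `tolerable_losses 3 1` gives 1, which is the number the failure test should confirm.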

Test rolling cluster restart

  • For example, how do you upgrade the Elasticsearch version?
# Rolling restart process
# Disable shard allocation so the cluster does not start re-allocating this node's shards while it is unavailable.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "none"
    }
}'

# Shutdown this node.
$ curl -XPOST 'http://some-node:9200/_cluster/nodes/_local/_shutdown'

# Now start the node again, and verify that it's joined the cluster.

# Re-enable shard allocation.
$ curl -XPUT 'http://some-node:9200/_cluster/settings' -d '{
    "transient" : {
        "cluster.routing.allocation.enable" : "all"
    }
}'

# Wait for the cluster to return to green, then repeat on the other nodes.
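Before moving on to the next node, it is worth waiting until the cluster reports green again. A minimal polling sketch; `wait_for_green` is a hypothetical helper, not an Elasticsearch tool, and it assumes the command passed to it prints a `_cluster/health` response.

```shell
#!/usr/bin/env bash
# Sketch: poll cluster health until the status is green or we give up.
# wait_for_green MAX_TRIES SLEEP_SECS CMD [ARGS...]
# CMD must print a _cluster/health JSON response, e.g.
#   curl -s 'http://some-node:9200/_cluster/health'

wait_for_green() {
  local tries=$1 pause=$2; shift 2
  local i status
  for (( i = 1; i <= tries; i++ )); do
    status=$("$@" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p')
    if [ "$status" = "green" ]; then
      return 0
    fi
    echo "attempt $i/$tries: status is ${status:-unreachable}" >&2
    if [ "$i" -lt "$tries" ]; then
      sleep "$pause"
    fi
  done
  return 1   # still not green after all attempts
}
```

For example, `wait_for_green 60 10 curl -s 'http://some-node:9200/_cluster/health'` polls for roughly ten minutes. The health API can also wait server-side with `?wait_for_status=green&timeout=...`, which avoids the client-side loop.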

Recovering and rebalancing shards can be slow. You can speed it up by raising the recovery bandwidth limit and the maximum number of concurrent shard recoveries per node.

# elasticsearch.yml
# The smaller your shards are, the higher you should set node_concurrent_recoveries.
# Since Logstash splits its data into daily indices, you may have
# a larger number of smaller shards.

cluster.routing.allocation.node_concurrent_recoveries: 5
indices.recovery.max_bytes_per_sec: 50mb
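The same two settings can also be changed at runtime through the cluster settings API, like the allocation setting in the rolling-restart steps, which is handy for temporarily speeding up recovery during maintenance. The sketch builds the request body with a hypothetical helper, `recovery_settings_body`, and only prints the `curl` command; as far as we know both settings are dynamic in Elasticsearch 1.x, but check the documentation for your version.

```shell
#!/usr/bin/env bash
# Sketch: build a transient cluster-settings body for the recovery settings.
# recovery_settings_body CONCURRENT_RECOVERIES MAX_BYTES_PER_SEC

recovery_settings_body() {
  printf '{"transient":{"cluster.routing.allocation.node_concurrent_recoveries":%d,"indices.recovery.max_bytes_per_sec":"%s"}}' "$1" "$2"
}

# Print (but do not run) the request; some-node is a placeholder host.
echo "curl -XPUT 'http://some-node:9200/_cluster/settings' -d '$(recovery_settings_body 5 50mb)'"
```

Transient settings are discarded on a full cluster restart, so the long-term values still belong in `elasticsearch.yml`.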

Load Testing