ElasticSearch Cheatsheet

Index Manipulation

  • field definition
    • type: string, integer, long, date, float, double, boolean, geo_point
    • index: analyzed, not_analyzed, no (default = analyzed)
    • store: true, false (default = false)
    • null_value: "na"
    • norms
    • similarity: default (TF/IDF), BM25
    • fielddata
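
A minimal sketch of one field definition using the options above, in the same ES 1.x-style syntax as the rest of this sheet. The field name "tag" is made up for illustration.

```shell
# Hypothetical field "tag": stored as one exact value (not_analyzed),
# not stored separately from _source, with "na" indexed in place of null.
FIELD_MAPPING='{
  "properties": {
    "tag": {
      "type": "string",
      "index": "not_analyzed",
      "store": false,
      "null_value": "na"
    }
  }
}'

# Print the body so it can be inspected before sending it anywhere.
printf '%s\n' "$FIELD_MAPPING"
```

To apply it, the body could be sent with: curl -s -XPUT 'http://localhost:9200/[index_name]/_mapping/[type]' -d "$FIELD_MAPPING"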

# delete index
curl -s -XDELETE 'http://localhost:9200/[index_name]/'

# create index with mapping and custom analysis settings
curl -s -XPUT 'http://localhost:9200/[index_name]/' -d '{
  "mappings": {
    "document": {
      "properties": {
        "content": {
          "type": "string",
          "analyzer" : "lowercase_with_stopwords"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    },
    "analysis": {
      "tokenizer" : {
        "pipe":{
           "type" : "pattern",
           "pattern" : "\\|"
        }
      },
      "filter" : {
        "stopwords_filter" : {
          "type" : "stop",
          "stopwords" : ["http", "https", "ftp", "www"]
        }
      },
      "analyzer": {
        "lowercase_with_stopwords": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": [ "stopwords_filter" ]
        },
        "csv": {
          "type": "custom",
          "tokenizer": "pipe",
          "filter":["trim", "lowercase"]
        }
      }
    }
  }
}'

# analyze view
http://localhost:9200/[index_name]/_analyze?analyzer=csv&text=Kobe%20Bryant|Lamar%20Odom&pretty

# index document
curl -s -XPUT 'http://localhost:9200/[index_name]/document/1?pretty=true' -d '{
  "content" : "Small content with URL http://example.com."
}'

# retrieve document
curl -s -XGET 'http://localhost:9200/[index_name]/document/1'

# refresh index
curl -s -XPOST 'http://localhost:9200/[index_name]/_refresh'

# search thru url
http://localhost:9200/[index]/[type]/_search?[parameter list]

parameter list:
  * q=title:elasticsearch
  * from=10&size=10 (default size = 10)
  * sort=date:asc (when sorting by a field, relevance scores are not computed by default)
  * _source=title,date (return only these fields; useful when the full document source is large)
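
As a sketch, the parameters above can be combined into a single URL. The index and type names (blog, post) are placeholders, not part of the original notes.

```shell
# Hypothetical index "blog" and type "post"; swap in real names.
BASE='http://localhost:9200/blog/post/_search'

# Second page of 10 hits for title:elasticsearch, oldest first,
# returning only the title and date fields of each document.
URL="${BASE}?q=title:elasticsearch&from=10&size=10&sort=date:asc&_source=title,date&pretty"

echo "$URL"
```

Run it with: curl -s "$URL" (keep the quotes so the shell does not interpret the & characters).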

# search with dsl
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
  "query" : {
    "query_string" : {
        "query" : "content:example"
    }
  }
}'

Search operators thru DSL

  • term (exact match; the query text is not analyzed, so on a not_analyzed field it must equal the whole field value)
  • match (the query text is analyzed and the resulting terms are combined with OR by default; set "operator": "and" to require all terms)
  • multi_match (like match, but runs against more than one field)
  • query_string: searches the _all field by default; use [field]:[value] to target a field. Supports compound and negated clauses, e.g. "name:nosql AND -description:mongodb"
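
A rough sketch of request bodies for three of the operators above. The field "tags" and its not_analyzed mapping are assumptions; es_search is a hypothetical helper, not an Elasticsearch command.

```shell
# term: the text is looked up as-is, so "tags" is assumed to be not_analyzed.
TERM_QUERY='{
  "query": { "term": { "tags": "nosql" } }
}'

# match with AND: every analyzed term must be present in "name".
MATCH_QUERY='{
  "query": {
    "match": {
      "name": { "query": "enterprise london", "operator": "and" }
    }
  }
}'

# query_string: compound query with a negated clause.
QUERY_STRING_QUERY='{
  "query": {
    "query_string": { "query": "name:nosql AND -description:mongodb" }
  }
}'

# Hypothetical helper: es_search <index> <json body>
# (Elasticsearch 6.0+ also needs -H "Content-Type: application/json".)
es_search() {
  curl -s -XGET "http://localhost:9200/$1/_search?pretty" -d "$2"
}
```

Example call: es_search get-together "$MATCH_QUERY"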

# phrase match
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
  "query" : {
    "match" : {
        "name" : {
            "type" : "phrase",
            "query": "enterprise london",
            "slop": 1
         }
    }
  },
  "_source":["name", "desc"]
}'

# multi match against more than one field
curl 'localhost:9200/get-together/_search' -d '{
  "query": {
    "multi_match": {
      "query": "elasticsearch hadoop",
      "fields": [ "name", "description" ]
    }
  } 
}'

# phrase prefix search
curl 'localhost:9200/get-together/group/_search' -d '
{
  "query": {
    "match": {
        "name": {
            "type": "phrase_prefix",
            "query": "Elasticsearch den",
            "max_expansions": 1
        } 
    }
  },
  "_source": ["name"]
}'

# compound queries
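
The notes give no example under this heading. One common way to build a compound query is the bool query, which combines clauses under must, should and must_not. The sketch below reuses the get-together index and its name/description fields from the examples above; the clause values are illustrative only.

```shell
# must: required clauses; should: optional clauses that raise the score;
# must_not: clauses that exclude a document.
BOOL_QUERY='{
  "query": {
    "bool": {
      "must":     [ { "match": { "name": "elasticsearch" } } ],
      "should":   [ { "match": { "description": "hadoop" } } ],
      "must_not": [ { "match": { "description": "mongodb" } } ]
    }
  }
}'

printf '%s\n' "$BOOL_QUERY"
```

Send it with: curl -s -XGET 'http://localhost:9200/get-together/_search?pretty' -d "$BOOL_QUERY"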

Aggregation

Aggregations can be categorized as either Metrics Aggregations or Bucket Aggregations. Metrics Aggregations return a value (single-value e.g. avg) or values (multi-value e.g. stats) calculated over documents returned by the query. Bucket aggregations define criteria to put documents into relevant groups (called buckets).

"aggregations" : {
    "<aggregation_name>" : {
        "<aggregation_type>" : {
            <aggregation_body>
        },
        ["aggregations" : { [<sub_aggregation>]* } ]
    }
    [,"<aggregation_name_2>" : { ... } ]*
}

Each aggregation can have:

  • name
  • type
    • value_count
    • cardinality (approximate count of distinct values)
    • terms (buckets documents by the values of a field. On an analyzed field the buckets are the individual analyzed terms; on a not_analyzed field each bucket is the whole field value)
    • avg
    • range
    • geo_distance (buckets documents by distance ranges from a specified origin and returns the count in each range)
  • body
  • sub-aggregation
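
A small sketch of how name, type, body and sub-aggregation fit together: a terms bucket aggregation with an avg metric nested inside it. The aggregation names (by_tag, avg_price) and fields (tags, price) are hypothetical.

```shell
# "size": 0 drops the search hits so only aggregation results come back.
AGG_BODY='{
  "size": 0,
  "aggregations": {
    "by_tag": {
      "terms": { "field": "tags" },
      "aggregations": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}'

printf '%s\n' "$AGG_BODY"
```

Send it with: curl -s -XGET 'http://localhost:9200/[index_name]/_search?pretty' -d "$AGG_BODY". Each by_tag bucket in the response should then carry its own avg_price value.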

ref: http://zaiste.net/2014/06/concisely_about_aggregations_in_elasticsearch/

Management

# install elasticsearch plugins
./bin/plugin --install mobz/elasticsearch-head
./bin/plugin --install lmenezes/elasticsearch-kopf/1.2
./bin/plugin --install elasticsearch/marvel/latest

# list installed plugins
./bin/plugin --list

# open the plugin UIs in a browser
http://localhost:9200/_plugin/head/
http://localhost:9200/_plugin/marvel/sense/index.html

# cluster health
http://localhost:9200/_cluster/health?pretty
http://localhost:9200/_cluster/health?level=indices&pretty
http://localhost:9200/_cluster/health?level=shards&pretty

# see which node is elected as a master
http://localhost:9200/_cluster/state/master_node,nodes?pretty 

# decommissioning a node (elasticsearch will move all shards from the decommissioned node to other nodes in the cluster)
curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
       "cluster.routing.allocation.exclude._ip" : "192.168.1.10"
    }
}'
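
To bring the node back into rotation, setting the same key to an empty string should clear the exclusion. A sketch, reusing the example IP from the command above; put_cluster_settings is a hypothetical helper.

```shell
# Exclude the node (same body as the decommission command).
EXCLUDE_BODY='{ "transient": { "cluster.routing.allocation.exclude._ip": "192.168.1.10" } }'

# Clear the exclusion so shards can be allocated to the node again.
INCLUDE_BODY='{ "transient": { "cluster.routing.allocation.exclude._ip": "" } }'

# Hypothetical helper: put_cluster_settings <json body>
put_cluster_settings() {
  curl -s -XPUT 'http://localhost:9200/_cluster/settings' -d "$1"
}
```

Example call: put_cluster_settings "$INCLUDE_BODY"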

# check index mapping
http://localhost:9200/[index_name]/_mapping/[type]?pretty

# list all aliases for an index
http://localhost:9200/[index_name]/_alias/*?pretty

# list nodes with plugins
http://localhost:9200/_nodes?plugin=true&pretty

Reference