Elasticsearch Cheatsheet
Index Manipulation
- field definition
- type: string, integer, long, date, float, double, boolean, geo_point
- index: analyzed, not_analyzed, no (default = analyzed)
- store: true, false (default = false)
- null_value: "na"
- norms
- similarity: default (TF/IDF), BM25
- fielddata
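The field options above can be combined in a single property definition. A hypothetical sketch (the field name "title" and the "na" null value are illustrative):

```shell
# Hypothetical field definition combining the mapping options listed above.
# "not_analyzed" stores the value as a single term; "store": true keeps the
# raw field retrievable; "null_value" substitutes "na" for missing values.
mapping='{
  "title": {
    "type": "string",
    "index": "not_analyzed",
    "store": true,
    "null_value": "na"
  }
}'
echo "$mapping"
```

This JSON would go inside the "properties" block of a mapping, as in the create-index example below.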
# delete index
curl -s -XDELETE 'http://localhost:9200/[index_name]/'
# create index with mapping and custom index
curl -s -XPUT 'http://localhost:9200/[index_name]/' -d '{
"mappings": {
"document": {
"properties": {
"content": {
"type": "string",
"analyzer" : "lowercase_with_stopwords"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"analysis": {
"tokenizer" : {
"pipe":{
"type" : "pattern",
"pattern" : "\\|"
}
},
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
},
"analyzer": {
"lowercase_with_stopwords": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
},
"csv": {
"type": "custom",
"tokenizer": "pipe",
"filter":["trim", "lowercase"]
}
}
}
}
}'
# analyze view
http://localhost:9200/[index_name]/_analyze?analyzer=csv&text=Kobe%20Bryant|Lamar%20Odom&pretty
# index document
curl -s -XPUT 'http://localhost:9200/[index_name]/document/1?pretty=true' -d '{
"content" : "Small content with URL http://example.com."
}'
# retrieve document
curl -s -XGET 'http://localhost:9200/[index_name]/document/1'
# refresh index
curl -s -XPOST 'http://localhost:9200/[index_name]/_refresh'
Search
# search through URL
http://localhost:9200/[index]/[type]/_search?[parameter list]
parameter list:
* q=title:elasticsearch
* from=10&size=10 (default size = 10)
* sort=date:asc (once sorting is used, the built-in relevancy scoring is skipped)
* _source=title,date (limits which fields are returned; useful when the raw documents are large)
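These parameters can be combined in one search URL. A sketch, using the hypothetical index "get-together" and illustrative field values:

```shell
# Build a search URL combining q, from/size paging, sort, and _source
# filtering (index and field values are illustrative).
index="get-together"
url="http://localhost:9200/${index}/_search?q=title:elasticsearch&from=10&size=10&sort=date:asc&_source=title,date"
echo "$url"
```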
# search with dsl
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
"query" : {
"query_string" : {
"query" : "content:example"
}
}
}'
Search operators through the DSL
- term (exact match on a not_analyzed field or whole term)
- match (by default matches terms with the OR operator; you can change it via "operator": "and")
- multi_match (similar to match, but it can apply to more than one field)
- query_string: searches the _all field by default, but you can use [field]:[value] to limit it. You can also perform compound and negative queries like "name:nosql AND -description:mongodb"
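A sketch of the match query with the default OR switched to AND, so that all terms must appear (the field name "description" is illustrative):

```shell
# Hypothetical match query requiring ALL terms to appear by switching
# the default OR operator to AND ("description" is an illustrative field).
body='{
  "query": {
    "match": {
      "description": {
        "query": "nosql mongodb",
        "operator": "and"
      }
    }
  }
}'
echo "$body"
```

Send it with: curl -s -XGET 'http://localhost:9200/[index_name]/_search?pretty' -d "$body"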
# phrase match
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
"query" : {
"match" : {
"name" : {
"type" : "phrase",
"query": "enterprise london"
"slop": 1
}
}
},
"_source":["name", "desc"]
}'
# multi match against > 1 fields
% curl 'localhost:9200/get-together/_search' -d'{
"query": {
"multi_match": {
"query": "elasticsearch hadoop",
"fields": [ "name", "description" ]
}
}
}'
# phrase prefix search
% curl 'localhost:9200/get-together/group/_search' -d '
{
"query": {
"match": {
"name": {
"type": "phrase_prefix",
"query": "Elasticsearch den",
"max_expansions": 1
}
}
},
"_source": ["name"]
}'
# compound queries
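Compound queries combine other queries with boolean clauses. A hypothetical bool query sketch (field names are illustrative): documents must match "nosql" in name and must not match "mongodb" in description.

```shell
# Hypothetical bool query: must clauses are required, must_not clauses
# exclude documents (field names "name"/"description" are illustrative).
body='{
  "query": {
    "bool": {
      "must": [ { "match": { "name": "nosql" } } ],
      "must_not": [ { "match": { "description": "mongodb" } } ]
    }
  }
}'
echo "$body"
```

Send it with: curl -s -XGET 'http://localhost:9200/[index_name]/_search?pretty' -d "$body"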
Aggregation
Aggregations can be categorized as either Metrics Aggregations or Bucket Aggregations. Metrics Aggregations return a value (single-value e.g. avg) or values (multi-value e.g. stats) calculated over documents returned by the query. Bucket aggregations define criteria to put documents into relevant groups (called buckets).
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
},
["aggregations" : { [<sub_aggregation>]* } ]
}
[,"<aggregation_name_2>" : { ... } ]*
}
Each aggregation can have:
- name
- type
- value_count
- cardinality (distinct count of unique values)
- terms (specify a field and show buckets. If the field is analyzed, you are bucketed by its analyzed terms. If the field is non-analyzed, you are bucketed by its whole phrase)
- avg
- range
- geo_distance - returns number of documents within a distance range from a specified origin
- body
- sub-aggregation
ref: http://zaiste.net/2014/06/concisely_about_aggregations_in_elasticsearch/
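Putting the skeleton together: a hypothetical terms bucket aggregation with an avg sub-aggregation nested inside it (the field names "tags" and "members" are illustrative):

```shell
# Hypothetical terms aggregation bucketing documents by "tags", with a
# nested avg sub-aggregation computed per bucket (field names illustrative).
body='{
  "aggregations": {
    "by_tag": {
      "terms": { "field": "tags" },
      "aggregations": {
        "avg_members": { "avg": { "field": "members" } }
      }
    }
  }
}'
echo "$body"
```

Send it with: curl -s -XGET 'http://localhost:9200/[index_name]/_search?pretty' -d "$body"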
Management
# install elasticsearch plugins
./bin/plugin --install mobz/elasticsearch-head
./bin/plugin --install lmenezes/elasticsearch-kopf/1.2
./bin/plugin --install elasticsearch/marvel/latest
# list installed plugins
./bin/plugin --list
# view it
http://localhost:9200/_plugin/head/
http://localhost:9200/_plugin/marvel/sense/index.html
# cluster health
http://localhost:9200/_cluster/health?pretty
http://localhost:9200/_cluster/health?level=indices&pretty
http://localhost:9200/_cluster/health?level=shards&pretty
# see which node is elected as a master
http://localhost:9200/_cluster/state/master_node,nodes?pretty
# decommissioning a node (elasticsearch will move all shards from the decommissioned node to other nodes in the cluster)
curl -XPUT localhost:9200/_cluster/settings -d '{
"transient" : {
"cluster.routing.allocation.exclude._ip" : "192.168.1.10"
}
}'
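Once the node has been serviced, allocation to it can be re-enabled by clearing the exclusion. A sketch (setting the transient exclude filter to an empty string removes it):

```shell
# Hypothetical sketch: clear the transient IP exclusion so shards can be
# allocated to the node again (empty string removes the filter).
body='{
  "transient" : {
    "cluster.routing.allocation.exclude._ip" : ""
  }
}'
echo "$body"
```

Send it with: curl -XPUT localhost:9200/_cluster/settings -d "$body"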
# check index mapping
http://localhost:9200/[index_name]/_mapping/[type]?pretty
# list all aliases of an index
http://localhost:9200/[index_name]/_alias/*?pretty
# list nodes with plugins
http://localhost:9200/_nodes?plugin=true&pretty