
Elasticsearch APIs
There are many APIs available for managing Elasticsearch. These APIs help us manage the cluster, indices, documents, searches, and so on. In this section, we will look at each of these APIs in detail.
We can use these APIs through Command Prompt, Console in Kibana, or any tool that can make calls to RESTful APIs.
Note
By default, Elasticsearch runs on port 9200 to listen to HTTP requests. Kibana uses the same port to connect to Elasticsearch. To learn more about Console, refer to Chapter 4, Kibana Interface, the Exploring Dev tools section.
Sense is a powerful plugin for Kibana that allows us to make calls to Elasticsearch APIs using a web interface. We will be learning about Sense in Chapter 8, Elasticsearch APIs. For this chapter, we will be using cURL, a command-line utility that allows us to make HTTP requests to access the APIs.
A typical cURL request against ES contains a verb, URL, and message body:
$ curl -X{Verb} 'url' -d '{message-body}'
Verbs are GET, PUT, POST, DELETE, and HEAD, and the URL scheme is either HTTP (by default) or HTTPS. The message body is usually the document we want to index, the query we want to perform, or some command we want Elasticsearch to follow.
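For instance, a plain GET against the root endpoint returns basic information about the node and the cluster, such as the node name, cluster name, and version:
$ curl -XGET 'http://localhost:9200/?pretty'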
Document APIs
These APIs allow users to add documents to an index, get those documents, and perform edit and delete operations. These APIs are divided into two groups, as discussed in the following sections.
Single document APIs
These APIs are applicable when operations are performed on one document at a time. They can be further divided as follows:
- Index API
- Get API
- Delete API
- Update API
Index API
The Index API adds a JSON document to the index. To understand this, let's take an example of a library in which we need to add a book:
$ curl -XPUT 'http://localhost:9200/library/book/1?pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven Javascript Development", "pages" : 240 }'
Tip
We have used pretty at the end of the URI to pretty-print the JSON output (if any).
This will automatically create an index named library if it is not already present, and add a book to it. If the library index is already present, it will add the document unless a document with the same ID exists.
Look at the following code pieces:
$ curl -XPUT 'http://localhost:9200/library/book/1?op_type=create&pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }'
$ curl -XPUT 'http://localhost:9200/library/book/1/_create?pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }'
In the first command, we have set the op_type parameter to specify the operation type, with the value create. We can achieve the same thing by appending /_create to the end of the URI, as in the second command. Both will result in the same output and will add the document to the index. If we do not specify an operation type (as in the command at the beginning of this section), the default is index, which creates the document if it is missing and overwrites it otherwise. Any of the three commands mentioned previously will create the library index if it does not exist and add an entry with the provided author, title, and pages.
Note
When using a create command explicitly, either via the _create endpoint or the op_type parameter, we will get a version_conflict_engine_exception if a document with the same ID is already present.
Let's analyze the output of the preceding code:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "created" : true }
The output shows that the document has been added to an index named library and the type assigned is book. The document gets the id specified as 1, with _version as 1. The value of result is set to created, and the value true of created shows that the operation was successful. If the same document was already present in the index while performing the same operation, we would get the value of result as updated, _version would be incremented to the next value, that is, 2 in this case, and the value of created would be false. The _shards section tells us that there are a total of 2 shard copies, out of which the document was written to 1 shard successfully. If the value of successful is at least one, the operation is considered successful.
There are five primary shards by default, with one replica for each. The document is routed to one of the primary shards, and the total of 2 counts that primary shard plus its replica. The number of successful shards depends on your cluster settings as well. If there is only one node, the replica cannot be allocated, and if you notice, the cluster/node health will be yellow; in that case, the value of successful will be less than the total. If you have more than one node set up in the cluster, the values will likely be the same. The following is the output of the same call made to a cluster with two nodes:
{ "_index": "library", "_type": "book", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "created": true }
As we can see, the document was properly replicated. All we need is a positive value for the successful field in the output. The document is copied to the replica shards as soon as they become available.
Note that we have specified the ID 1 in the URI, /library/book/1/_create. In case we want this ID to be generated automatically, we can use the POST verb instead of PUT with curl.
If we use POST, Elasticsearch will generate the ID automatically and will create the index if it doesn't already exist.
Let's add another book:
$ curl -XPOST 'http://localhost:9200/library/book?pretty' -d '{
"author" : "Yuvraj Gupta",
"title" : "Kibana Essentials",
"pages" : 210
}'
Let's analyze what the output will be now:
{ "_index" : "library", "_type" : "book", "_id" : "AVSPSXXDxiIiqkaLJfTy", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "created" : true }
This time everything is similar, but the value of _id has been automatically generated by Elasticsearch.
Routing
By default, Elasticsearch decides which shard to put a document into based on the id of the document: it calculates a hash of the id and uses it to select a shard. Instead of letting Elasticsearch base this decision on the id, we can provide a value that will be used to calculate the hash. This process of controlling shard allocation is called routing. We can use the routing query parameter with the URI, as shown in the following command:
$ curl -XPOST 'http://localhost:9200/library/book?pretty&routing=books' -d '{ "author" : "Yuvraj Gupta", "title" : "Kibana Essentials", "pages" : 210 }'
In the preceding example, the Kibana Essentials book will be placed into a shard based on the hash calculated from the books routing value.
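Note that when a document is indexed with a routing value, the same routing value must be supplied to get it by ID; otherwise the request may be sent to the wrong shard. A sketch, with {id} as a placeholder for the auto-generated ID:
$ curl -XGET 'http://localhost:9200/library/book/{id}?routing=books&pretty'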
Get API
The Index API helped us add a document to the index, and the Get API is used to retrieve a document using its ID. Let's try to get the document we stored with ID 1:
$ curl -XGET 'http://localhost:9200/library/book/1?pretty'
We used the GET verb to get a document and provided the URI with the ID of the document. The output will be as shown in the following code:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "author" : "Ravi Kumar Gupta", "title" : "Test-Driven Javascript Development", "pages" : 240 } }
We can see that the output includes information related to the document, including the index name, type, ID, and version. The value of found shows whether a document with that ID exists or not. _source contains the actual document we indexed.
To check whether a document exists, we can use the HEAD verb as well. Let's see the operations:
$ curl -XHEAD -i 'http://localhost:9200/library/book/2'
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

$ curl -XHEAD -i 'http://localhost:9200/library/book/1'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
The _source field in the result of a get shows the actual document we indexed. If you are familiar with Kibana Console and try the same command, you might see an error. This is a known issue specific to Kibana Console and it is filed here: https://github.com/elastic/kibana/issues/9141. We will be learning about Kibana Console in Chapter 4, Kibana Interface.
There might be situations where we don't want to retrieve the whole content and only a few fields are needed, or sometimes we don't want the source at all:
curl -XGET 'http://localhost:9200/library/book/1?_source=false'
The preceding code will exclude the source from the output. To skip some fields, for example, if we don't want to get the pages of the book:
curl -XGET 'http://localhost:9200/library/book/1?_source_exclude=pages'
To include only the author and skip all other fields:
curl -XGET 'http://localhost:9200/library/book/1?_source_include=author'
We can use both _source_include and _source_exclude together. This is helpful when documents have many fields and you want to reduce the network overhead by requesting fewer of them. To understand this, add a book with categories:
curl -XPUT "http://localhost:9200/library/book/4/_create?pretty" -d'
{
"author" : "Ravi Kumar Gupta",
"title" : "Test-Driven JavaScript Development",
"pages" : 240,
"category" : [
{"name":"Technology","subcategory": "javascript"},
{"name":"Methodology","subcategory": "development"}
]
}'
Now if we want to get the document with category and skip subcategory, use the following command:
curl -XGET 'http://localhost:9200/library/book/4?_source_include=category&_source_exclude=*.subcategory'
The response to the preceding command will be as follows:
{ "_index": "library", "_type": "book", "_id": "4", "_version": 1, "found": true, "_source": { "category": [ { "name": "Technology" }, { "name": "Methodology" } ] } }
When we need to use only _source_include, we can also use the following:
curl -XGET 'http://localhost:9200/library/book/1?_source=author'
This will include only the author, excluding all other fields.
Sometimes we might want only the source; in that case, use the following:
curl -XGET 'http://localhost:9200/library/book/1/_source'
We can use _source_include and _source_exclude here as well to include or exclude fields of the source. Also, with the same URI and the HEAD verb, you can find out whether a document exists or not.
Sometimes, we may not care about the type a document belongs to and simply want to retrieve the document by its ID. For such cases, we can specify _all for the type and we will get the first document that matches the ID in any type:
curl -XGET 'http://localhost:9200/library/_all/1/_source'
curl -XGET 'http://localhost:9200/library/_all/1'
While creating an index, we can explicitly specify which fields are to be stored. For example, if we create a library index with the following mappings:
curl -XPUT "http://localhost:9200/library" -d' { "mappings": { "book": { "properties": { "author": { "type": "keyword", "store": true }, "pages": { "type": "integer", "store": false } } } } }'
Please note that the preceding command will throw an index_already_exists_exception in case the library index is already present. After this, we add a book with ID 1, similar to what we added previously. This allows us to get the stored fields as arrays by passing the stored_fields parameter with the following call:
curl -XGET "http://localhost:9200/library/book/1?stored_fields=author,pages"
We will get the following output:
{ "_index": "library", "_type": "book", "_id": "1", "_version": 1, "found": true, "fields": { "author": [ "Ravi Kumar Gupta" ] } }
All of the stored fields will be returned as arrays. Note that pages is absent from the output because it was mapped with store set to false.
Delete API
The Delete API helps us delete a document from the index. We can use the DELETE verb for this purpose:
curl -XDELETE 'http://localhost:9200/library/book/1?pretty'
This will result in the following JSON:
{ "found" : true, "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 2, "result" : "deleted", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 } }
If the document exists, the value of found will be true and the document will be deleted.
Note
Every write operation, including delete, will increase the version.
Update API
The Update API helps us update a document. The update happens using a script that we provide. While updating, Elasticsearch gets the document from the index, runs the script on it, and indexes the result back. Let's try to add a category to the book we added earlier:
curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.category = params.category",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  }
}'
Note
Earlier versions of Elasticsearch used the Groovy language for scripting by default. With the latest versions of Elasticsearch, such as 5.x, a new scripting language was developed and embedded into Elasticsearch by default. This language is named Painless and it has a syntax similar to Groovy. We will be learning more about it later in this chapter. Wherever we need to use a script, we can simply set the lang field to painless. We will be using Painless scripts by default throughout the chapter.
In case you get a document missing error, it may be because you deleted the book with ID 1 while trying the operations in the Delete API section. Just add the book again to try this out.
Here, we are running an inline script that will add a category field if none exists. If a category field already exists, it will update it:
"inline" : "ctx._source.category = params.category"
In the preceding snippet, we assign ctx._source.category the value of the category parameter from params.
The ctx map contains the _index, _type, _id, _version, _routing, _parent, _timestamp, _ttl, and _source variables.
Running the preceding command will result in the following JSON:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 } }
Now the book with ID 1 will have category set to Technical. In case we need to delete a field, we can do so by using ctx._source.remove():
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.remove(\"category\")",
    "lang" : "painless"
  }
}'
We can even put a condition and then do an update:
"inline" : "ctx._source.category.contains(category) ? ctx.op = "delete" : ctx.op = "none"
As per the preceding snippet, it will first check whether the category field contains the value of the category parameter (Technical), and if it does, it will delete the document; otherwise it will do nothing.
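Putting this together, a complete call could look like the following sketch (assuming the book with ID 1 still has its category field):
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "if (ctx._source.category.contains(params.category)) { ctx.op = \"delete\" } else { ctx.op = \"none\" }",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  }
}'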
We can also update a document partially and in this case, we need to provide only the field that we want to update:
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{ "doc" : { "pages" : 250 } }'
The preceding code will update only the pages field of the book with ID 1; the value is merged into the existing document. In case both script and doc are present, doc will be ignored and only the script will run.
A document will be re-indexed only if the new source differs from the old one. If we want the document to always be updated, even if the source did not change, we can set detect_noop to false:
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{ "doc" : { "pages" : 250 }, "detect_noop" : false }'
Refer to the following code:
$ curl -XPOST 'http://localhost:9200/library/book/3/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.category = params.category",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  },
  "upsert" : {
    "category" : "Technical"
  }
}'
If no document exists with the supplied ID (3 in this case), a new document will be created from the values inside upsert. We can also use scripted_upsert, in which case the script handles the initialization of the document instead of upsert; for that, we set scripted_upsert to true.
We can skip adding an upsert in case we want to use the content of doc as the new document. For this, we can set doc_as_upsert to true and skip the upsert in the operation:
$ curl -XPOST 'http://localhost:9200/library/book/5/_update?pretty' -d '{ "doc" : { "pages" : 250 }, "doc_as_upsert" : true }'
Multi-document APIs
These APIs support operations on multiple documents. Similar to the single document APIs, these can also be divided. Let's take a look at each of them.
Multi-get API
As the name suggests, this API helps us get multiple documents at a time. We need to provide index, which is mandatory, type, which is optional, and id. If we want to get the documents with IDs 1 and 5, this is how we can do so:
$ curl -XGET 'http://localhost:9200/_mget?pretty' -d '{
  "docs" : [
    { "_index" : "library", "_id" : 1 },
    { "_index" : "library", "_id" : 5 }
  ]
}'
We provide an array to docs and as a result we get an array of documents. If we know that all the documents are going to come from the same index, we can specify the index name in the URI and use 'http://localhost:9200/library/_mget?pretty' instead of 'http://localhost:9200/_mget?pretty'. And similarly for the type as well: we can use 'http://localhost:9200/library/book/_mget?pretty' instead of 'http://localhost:9200/_mget?pretty'.
Now only _id is repeated. In such cases, we can further shorten the request by using the ids parameter, which contains an array of IDs:
curl -XGET 'http://localhost:9200/library/_mget?pretty' -d '{ "ids" : [1, 5] }'
Or as follows:
curl -XGET 'http://localhost:9200/library/book/_mget?pretty' -d '{ "ids" : [1, 5] }'
If we don't want to specify the type field, we can either leave it empty or use _all, in which case the first document matching the ID in any type will be returned.
In the result set, all documents will contain the source field as well. In case we want to skip the source or want only some fields, we can use _source, _source_include, and _source_exclude as URL parameters, as we did with the Get API on a single document. Possible usage for the source is shown in the following code:
curl -XGET 'http://localhost:9200/library/book/_mget?pretty' -d '{
  "docs" : [
    {
      "_index" : "library",
      "_id" : 1,
      "_source" : ["author"]
    },
    {
      "_index" : "library",
      "_id" : 5,
      "_source" : {
        "include" : ["author"],
        "exclude" : ["pages"]
      }
    }
  ]
}'
For the first document, we are requesting only the author and excluding everything else. The second one has a similar effect, as we include only author and exclude pages.
Similarly, we can pass stored fields as well, with an array of field names as the value. Only the stored_fields supplied in the array will be returned. We can even set a default list of stored_fields to be returned in the URL, as well as per document:
$ curl 'http://localhost:9200/library/book/_mget?stored_fields=author' -d '{
  "docs" : [
    { "_id" : 1 },
    { "_id" : 5, "stored_fields" : ["pages"] }
  ]
}'
The preceding operation will get the author field for all of the documents for which fields are not defined explicitly. For the document with an ID of 5, only the pages field will be returned, provided that pages is mapped as a stored field.
Bulk API
There are times when we need to operate on a large number of documents; the Bulk API comes in handy for such situations. We can perform index, create, delete, and update actions with this API. We can use this API on the following endpoints:
/_bulk
/index/_bulk
/index/type/_bulk
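The request body uses a newline-delimited format: each action is described by one line of JSON metadata, followed, for index and create actions, by the document source on the next line, and the body must end with a newline. As a minimal sketch, assume we put the following lines in a file named books (a hypothetical filename):
{ "index" : { "_index" : "library", "_type" : "book", "_id" : "10" } }
{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }
{ "delete" : { "_index" : "library", "_type" : "book", "_id" : "5" } }
We can then send the file using --data-binary so that the newlines are preserved:
$ curl -XPOST 'http://localhost:9200/_bulk?pretty' --data-binary @books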
Search APIs
These APIs help users search one or more indices. Let's get into these APIs in more detail.
Search API
This API allows us to search one or more indices and zero or more types. The search operation can be done in two ways: by putting query parameters in the search URI, or by using a Domain Specific Language (DSL) query in the request body. The operation returns the number of hits, that is, the number of matching results.
Query parameters
With this approach, we use q= to specify the search query. For example, if we want to search for all the books whose author contains gupta, we can use the GET verb:
$ curl -XGET 'http://localhost:9200/library/_search?q=author:gupta&pretty'
This will result in the following JSON:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 0.37158427, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "AVSPSXXDxiIiqkaLJfTy", "_score" : 0.37158427, "_source" : { "author" : "Yuvraj Gupta", "title" : "Kibana Essentials", "pages" : 210 } }, { "_index" : "library", "_type" : "book", "_id" : "1", "_score" : 0.2972674, "_source" : { "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 250, "category" : "Technical" } } ] } }
As we can see, we got two hits: Kibana Essentials and Test-Driven JavaScript Development. Unlike the POST/PUT operations, results from GET search operations report the total number of shards of the index in the _shards section of the result.
In the previous call, we did not specify any type; if we want to, we can call the following:
$ curl -XGET 'http://localhost:9200/library/book/_search?q=author:gupta&pretty'
If there are multiple indices or multiple types, we can search those using the following:
$ curl -XGET 'http://localhost:9200/library,users/_search?q=author:gupta&pretty'
$ curl -XGET 'http://localhost:9200/library/book,journal/_search?q=author:gupta&pretty'
Note
In case an index is not present, an index_not_found_exception will be thrown.
If we want to search all indices and all types, we can skip specifying any in the URI:
$ curl -XGET 'http://localhost:9200/_search?q=author:gupta&pretty'
We can provide as many indices and types (comma separated) as we want.
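The second way mentioned earlier is to send a DSL query in the request body. As a sketch, the following is equivalent to the q=author:gupta search we performed previously; the same query body is reused by the Count, Validate, and Explain APIs discussed later in this section:
$ curl -XGET 'http://localhost:9200/library/_search?pretty' -d '{
  "query" : {
    "term" : { "author" : "gupta" }
  }
}'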
Search shard API
This API helps us get the indices and shards against which a search would be executed. We can provide comma separated indices and types. To understand this, let's see the following URI:
$ curl -XGET 'http://localhost:9200/library/_search_shards?pretty'
This will return all of the shards available for searching and indexing. We can provide a routing value as well:
$ curl -XGET 'http://localhost:9200/library/_search_shards?pretty&routing=gupta'
Now we will get a shorter list because we have specified a routing value. We can specify multiple routing values (comma separated).
Multi-search APIs
This API allows us to perform multiple search operations in a single request. We need to provide a file that contains a header and a body part for each search request. The header part specifies the index and type to search on, search_type, preference, and routing. The body part contains the query, from, size, aggregations, and so on. For example, consider the following:
{"index" : "library", "type" : "book" } {"query" : {"term" :{"author" : "gupta"}} }
In the preceding code, the first line is the header and the second line is the body part. There can be as many pairs as we want in the file. Once we have the queries ready, we can run the operation using the GET
verb. Let's assume that we put all of our queries in a file named queries
:
$ curl -XGET 'http://localhost:9200/_msearch?pretty' --data-binary @queries; echo
Using --data-binary, we specify the file containing all of the headers and bodies; @queries is the filename.
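For instance, a queries file holding two searches might look like the following sketch; an empty header {} runs that search against all indices:
{"index" : "library", "type" : "book"}
{"query" : {"term" : {"author" : "gupta"}}}
{}
{"query" : {"match_all" : {}}, "size" : 2}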
Count API
As the name suggests, this API call returns the number of matches for a query. We can use the _count endpoint to get the number of results:
$ curl -XGET 'http://localhost:9200/library/book/_count?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
And the outcome of this operation will be as follows:
{ "count" : 2, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 } }
The result shows that there are two matches across five shards.
Validate API
Suppose we are about to run a query on sensitive data, or the query is expected to take too much time; it would be useful to validate the query before running it. This API tells us whether a query is valid without executing it.
To use this API, we use the /_validate/query endpoint:
$ curl -XGET 'http://localhost:9200/library/book/_validate/query?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
The operation will return true or false in the valid field.
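We can also pass the explain parameter to get more detail on the validation, which is especially useful for finding out why a query is invalid. For example:
$ curl -XGET 'http://localhost:9200/library/book/_validate/query?explain=true&pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'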
Explain API
This API explains the score calculation for a query and a specific document. We can use the /_explain endpoint for this purpose. For this operation, we need to provide a single index and a single type:
$ curl -XGET 'http://localhost:9200/library/book/1/_explain?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
We executed this on the book with ID 1, with a query that expects gupta in the author name.
Profile API
Sometimes we might encounter a slow-running query, and this API can help us with low-level detail to find which component of the query is taking time, so that we can analyze it and take action. This API is still experimental in ES 5.x. To enable profiling on a call, use the following:
$ curl -XGET "http://localhost:9200/library/_search?q=author:gupta&pretty" -d' { "profile": true }'
In the results, we usually have the hits along with basic information, but now there will also be detailed profiling information. Profiling is done for each shard of the index being searched, and contains the complete execution details of the query we made.
X-Pack also offers a profiler, which is covered in Chapter 9, X-Pack: Security and Monitoring, under the Understanding Profiler section.
Field stats API
This API gives us statistics about one or more fields. We use the _field_stats endpoint for this purpose. For example:
$ curl -XGET 'http://localhost:9200/_field_stats?pretty&fields=pages'
The preceding operation is performed at the cluster level by default. Besides fields, we can also pass a level parameter to define whether stats are computed at the cluster level or the indices level. The result shows the shard counts, document count, minimum value, maximum value, density, and so on:
{ "_shards": { "total": 5, "successful": 5, "failed": 0 }, "indices": { "_all": { "fields": { "pages": { "type": "integer", "max_doc": 2, "doc_count": 2, "density": 100, "sum_doc_freq": -1, "sum_total_term_freq": 2, "searchable": true, "aggregatable": true, "min_value": 210, "min_value_as_string": "210", "max_value": 240, "max_value_as_string": "240" } } } } }
If you see any value as -1, it means the measurement for that field is not available for one or more shards.
This API is helpful for finding out minimum and maximum values, counts of terms, and so on. This API is also experimental and may be removed in future releases.
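To get the statistics broken down per index rather than aggregated across the cluster, we can set the level parameter to indices, as in the following sketch:
$ curl -XGET 'http://localhost:9200/_field_stats?pretty&fields=pages&level=indices'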
Indices APIs
The Indices APIs help us build and manage indices, settings, mappings, aliases, and templates. In this section, we will take a closer look at what they offer.
Managing indices
This section introduces the endpoints that help us to create, update, and delete indices and settings.
Creating an index
To create an index, we use the PUT verb with curl. While creating an index, we can specify settings such as the number of shards, mappings, aliases, and so on. To create an index with all default settings, we can use the following:
$ curl -XPUT 'localhost:9200/library'
This will try to create an index named library, and if it succeeds, the output will be:
{"acknowledged":true}
This operation will create five primary shards and five replica shards (one for each primary) for this index. We can change these settings, and also provide mappings and aliases. Let's see a more complex example:
$ curl -XPUT 'http://localhost:9200/library' -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  }
}'
This will create two primary shards and one replica for each primary shard for this index.
Note
If an index is already created and you try to create it again, you are likely to get an index_already_exists_exception.
Checking if an index exists
We can use the HEAD verb with curl to check whether an index exists. For example, to check whether the library index exists:
$ curl -XHEAD -i 'http://localhost:9200/library'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
If the status code is 404, the index does not exist.
Getting index information
We can get information about the index using the following:
$ curl -XGET 'localhost:9200/library?pretty'
This will show information about the library index, which includes aliases, mappings, settings, and warmers. Settings contain shard information, creation time, version, and so on:
{ "library": { "aliases": {}, "mappings": { "book": { "properties": { "author": { "type": "keyword", "store": true }, "pages": { "type": "integer" }, "title": { "type": "keyword", "store": true } } } }, "settings": { "index": { "creation_date": "1484693593662", "number_of_shards": "5", "number_of_replicas": "1", "uuid": "kW1tscieT6OaWClNnhguAQ", "version": { "created": "5010199" }, "provided_name": "library" } } } }
If there is no information available, an empty object will be returned, as it is for aliases here.
Note
If an index is not present and you try to get any information about it, you are likely to get an index_not_found_exception.
Managing index settings
If we want to get just the settings, we can use /{index}/_settings:
$ curl -XGET 'http://localhost:9200/library/_settings?pretty'
{index} can also take multiple indices, comma separated. We can also use wildcards to match indices.
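For example, assuming we have several indices whose names begin with library, a wildcard request could look like this:
$ curl -XGET 'http://localhost:9200/library*/_settings?pretty'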
We can also update settings after an index is created, using the same {index}/_settings endpoint with the PUT verb. The settings will be updated dynamically. If we do not specify an index, the settings will be updated for all indices:
$ curl -XPUT 'http://localhost:9200/library/_settings?pretty' -d '{
  "index" : {
    "number_of_replicas" : 2
  }
}'
The preceding operation will set two replica shards for each primary shard on the library index.
So far, we have seen operations for creating and editing an index. We can also monitor indices by checking statistics, shard stores, recovery information, and segments.
Getting index stats
This operation helps us get stats about an index. It shows information about documents, index size, indexing, search, fielddata, flush, merge, request cache, refresh, suggest, translog, warmer, and other statistics. To get the stats of an index, run the following:
$ curl -XGET 'http://localhost:9200/library/_stats?pretty'
We can supply multiple comma separated indices to get stats for several indices at a time. We can also specify which specific stats to get, as shown in the following command:
$ curl -XGET 'http://localhost:9200/library/_stats/docs,search?pretty'
Getting index segments
Using the _segments endpoint on one or more indices, we can get low-level segment information for an index:
$ curl -XGET 'http://localhost:9200/library/_segments?pretty'
Getting index recovery information
This API provides information about index shard recoveries. It gives a very detailed view for each shard of an index. We can use the _recovery endpoint on one or more indices:
$ curl -XGET 'http://localhost:9200/library/_recovery?pretty'
Getting shard stores information
This API helps us get the shard stores for one or more indices. To get shard stores, we can use the _shard_stores endpoint:
$ curl -XGET 'http://localhost:9200/library/_shard_stores?pretty'
This returns information for all of the shards; the following is one such shard:
"0" : { "stores" : [ { "re4j45-yTp6VZgMN7dP2Sg" : { "name" : "xB20COp", "ephemeral_id" : "SK9sdP0nRPuCKs51VtvNqg", "transport_address" : "127.0.0.1:9300", "attributes" : { } }, "allocation_id" : "9I_4rywATD6CV8lqvFIFaA", "allocation" : "primary" } ] }
It shows the stores with node name, address, attributes, version, and allocation type for each store. If you are running only one node, information about replica shards will not be present.
Index aliases
Aliases are just like the ones we have in a Unix OS: in Unix they are for commands, here they are for indices.
We can create an alias for a single index or multiple indices. Once an alias is set and we use it instead of index names in API calls, Elasticsearch will replace the alias with the actual index names and execute the operations. Aliasing is useful when we have many indices and a reason to group them.
For example, say we have a production setup that includes database, web, and application servers, and we have set up the Elastic Stack to collect logs from all of them into Elasticsearch. There are other indices as well, but we want to perform operations on all of the log-related indices. We can create an alias, for example, prod_logs, for all those indices, and then perform whatever operations we want by simply using prod_logs in place of the indices rather than typing in all of their names. We can create aliases using the _aliases endpoint.
We can add and remove aliases for an index, as in the following snippet:
$ curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "library", "alias" : "alias1" } },
    { "remove" : { "index" : "library", "alias" : "alias1" } }
  ]
}'
We can provide as many actions as we want. The first action adds an alias alias1 for the index library, and the second action removes it.
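Returning to the earlier scenario, a sketch for grouping several log indices under one alias and then searching through it could look as follows (db_logs, web_logs, and app_logs are hypothetical index names):
$ curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "db_logs", "alias" : "prod_logs" } },
    { "add" : { "index" : "web_logs", "alias" : "prod_logs" } },
    { "add" : { "index" : "app_logs", "alias" : "prod_logs" } }
  ]
}'
$ curl -XGET 'http://localhost:9200/prod_logs/_search?q=error&pretty'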
Tip
An alias name cannot be the same as an index name.
Mappings
When we add a document to an index, Elasticsearch creates mappings based on the data inside the document. These mappings help identify and define whether a field should be full text, numeric, or date.
In Elasticsearch, to divide documents within an index into logical groups, every index has one or more mapping types. Every mapping type has meta-fields (_index, _type, _id, _source) and a list of fields for the type. Every field has a data type. These data types can be string, long, double, date, boolean, ip (IP address), object, nested, and specialized types related to geo locations: geo_point and geo_shape.
We can get the mappings of an index using the _mapping endpoint on an index:
$ curl http://localhost:9200/library/_mapping?pretty
If there are multiple types and we want to get the mappings for a specific type, we use the {index}/_mapping/{type} endpoint:
$ curl http://localhost:9200/library/_mapping/book?pretty
This will output a properties map containing all of the fields:
{ "library": { "mappings": { "book": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "pages": { "type": "long" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } } }
The book type contains these three fields, and we can see that each field has a type defined along with other relevant properties.
We can also get the mapping for specific field(s) using {index}/_mapping/{type}/field/{fields}. Multiple fields can be supplied comma separated:
$ curl localhost:9200/library/_mapping/field/title?pretty
We can also check whether a type exists using the HEAD verb:
$ curl -XHEAD -i 'http://localhost:9200/library/_mapping/book'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Status code 200 acknowledges that the type exists; if it doesn't exist, the code would be 404.
We can also add mappings using the Put Mappings API, which allows us to add a new type or new field.
To add a new type to an index, for example, paper to library, run the following:
$ curl -XPUT 'http://localhost:9200/library/_mapping/paper' -d'
{
  "properties": {
    "abstract": {
      "type": "text"
    }
  }
}'
Executing this will return acknowledged as true if it was successful, creating a paper mapping type with an abstract field.
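Similarly, we can add a new field to an existing type. As a sketch, the following adds a hypothetical isbn field to the book type:
$ curl -XPUT 'http://localhost:9200/library/_mapping/book' -d'
{
  "properties": {
    "isbn": {
      "type": "keyword"
    }
  }
}'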
Closing, opening, and deleting an index
Sometimes, we might want to stop reads/writes on a specific index for maintenance or recovery purposes. When an index is closed, it has no overhead on the cluster except for maintaining its metadata. We can close and open an index using /{index}/_close and /{index}/_open, respectively. To do so, we use the POST verb with curl:
$ curl -XPOST 'http://localhost:9200/library/_close'
$ curl -XPOST 'http://localhost:9200/library/_open'
If we don't want an index at all, we can delete it using the DELETE verb:
$ curl -XDELETE 'http://localhost:9200/library'
This will delete the index from the cluster as if it was never there. We can also use a wildcard to delete multiple indices, for example, to delete all indices with names starting with test:
$ curl -XDELETE 'http://localhost:9200/test*'
This checks whether there are any indices with names starting with test and deletes all of them. If no matching index is present and we are using wildcards, we won't get any errors, just an acknowledgement:
{ "acknowledged": true }
But if we delete an index without wildcards and the index does not exist, we will get an index_not_found_exception.
Other operations
There are more operations that this API supports, such as clearing the cache, upgrading Elasticsearch indices, force merge, refresh, and flush:
- Clearing Cache: The _cache/clear endpoint helps us to clear the cache for one or more indices:
$ curl -XPOST "http://localhost:9200/library/_cache/clear"
- Flush: Using _flush helps us to free memory from one or more indices by clearing the transaction logs and by flushing data to index storage:
$ curl -XPOST "http://localhost:9200/library/_flush"
- Refresh: This API provides the _refresh endpoint to refresh one or more indices:
$ curl -XPOST "http://localhost:9200/library/_refresh"
- Upgrade API: This API helps us to upgrade indices from older Elasticsearch versions to new versions. We can use the _upgrade endpoint to upgrade one or more indices. This process will usually take time:
$ curl -XPOST 'http://localhost:9200/library/_upgrade?pretty'
This will give us the following output:
{ "_shards": { "total": 15, "successful": 10, "failed": 0 }, "upgraded_indices": { "library": { "upgrade_version": "5.1.1", "oldest_lucene_segment_version": "6.3.0" } } }
As we can see, the library index was upgraded successfully and the version it is upgraded to is specified.
Cat APIs
These APIs help us print information about nodes, indices, fields, tasks, and plugins in a human-readable format rather than JSON; the output is rendered as tables printed on the console.
We will learn more about cat APIs in Chapter 8, Elasticsearch APIs, The cat APIs section.
Cluster APIs
These APIs allow us to know about the cluster state, health, statistics, node statistics, and node information. We will learn about the Cluster APIs in Chapter 8, Elasticsearch APIs, The cluster APIs section.