
Elasticsearch APIs
There are many APIs available for managing Elasticsearch. These APIs help us manage the cluster, indices, documents, searches, and so on. In this section, we will look at each of these APIs in detail.
We can use these APIs through Command Prompt, Console in Kibana, or any tool that can make calls to RESTful APIs.
Note
By default, Elasticsearch runs on port 9200 to listen to HTTP requests. Kibana uses the same port to connect to Elasticsearch. To learn more about Console, refer to Chapter 4, Kibana Interface, the Exploring Dev tools section.
Sense is a powerful plugin for Kibana that allows us to make calls to Elasticsearch APIs using a web interface. We will be learning about Sense in Chapter 8, Elasticsearch APIs. For this chapter, we will be using cURL, a command-line utility that allows us to make HTTP requests to access the APIs.
A typical cURL request against ES contains a verb, URL, and message body:
$ curl -X{Verb} 'url' -d '{message-body}'
Verbs are GET, PUT, POST, DELETE, and HEAD, and the URL scheme is either HTTP (by default) or HTTPS. The message body is usually the document we want to index, the query we want to perform, or some command we want Elasticsearch to follow.
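For instance, a plain GET against the root endpoint returns basic information about the node and the cluster, such as the node name, cluster name, and version:
$ curl -XGET 'http://localhost:9200/?pretty'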
Document APIs
These APIs allow users to add documents to an index, get those documents, and perform edit and delete operations. These APIs are divided into two groups, as discussed in the following sections.
Single document APIs
These APIs are applicable when operations are performed on one document at a time. They can be further divided as follows:
- Index API
- Get API
- Delete API
- Update API
Index API
The Index API adds a JSON document to the index. To understand this, let's take an example of a library in which we need to add a book:
$ curl -XPUT 'http://localhost:9200/library/book/1?pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven Javascript Development", "pages" : 240 }'
Tip
We have used pretty at the end of the URI to pretty-print the JSON output (if any).
This will automatically create an index named library if it is not already present, and add a book to it. If the library index is already present, it will add the document unless a document with the same ID exists.
Look at the following code pieces:
$ curl -XPUT 'http://localhost:9200/library/book/1?op_type=create&pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }'
$ curl -XPUT 'http://localhost:9200/library/book/1/_create?pretty' -d '{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }'
In the first command, we have set the op_type parameter to specify the operation type, with the value create. We can achieve the same thing by appending /_create to the end of the URI, as in the second command. Both will result in the same output and will add the document to the index. If we do not specify an operation type (as in the command at the beginning of this section), the default is index, which creates the document if it is missing and overwrites it otherwise. Any of the three commands mentioned previously will create the library index if it does not exist and add an entry with the provided author, title, and pages.
Note
When using a create command explicitly, either via the _create endpoint or the op_type parameter, we will get a version_conflict_engine_exception if a document with the same ID is already present.
Let's analyze the output of the preceding code:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "created" : true }
The output shows that the document has been added to an index named library and the type assigned is book. The document gets the id specified as 1, with _version as 1. The value of result is set to created, and the value true of created shows that the operation was successful. If the same document was already present in the index while performing the same operation, we would get the value of result as updated, _version would be incremented to the next value, that is, 2 in this case, and the value of created would be false. The _shards section tells us that there are a total of 2 shard copies, out of which the document was written to 1 shard successfully. If the value of successful is at least one, the operation is considered successful.
There are five primary shards by default, with one replica for each. The document is routed to one of the primary shards, and the total of 2 counts that primary shard plus its replica. The number of successful shards depends on your cluster settings as well. If there is only one node, the replica cannot be allocated, and if you notice, the cluster/node health will be yellow; in that case, the value of successful will be less than the total. If you have more than one node set up in the cluster, the values will likely be the same. The following is the output of the same call made to a cluster with two nodes:
{ "_index": "library", "_type": "book", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 2, "failed": 0 }, "created": true }
As we can see, the document was properly replicated. All we need is a positive value for the successful field in the output. The document is copied to the replica shards as soon as they become available.
Note that we have specified the ID 1 in the URI, /library/book/1/_create. In case we want this ID to be generated automatically, we can use the POST verb instead of PUT with curl.
If we use POST, Elasticsearch will generate the ID automatically and will create the index if it doesn't already exist.
Let's add another book:
$ curl -XPOST 'http://localhost:9200/library/book?pretty' -d '{
"author" : "Yuvraj Gupta",
"title" : "Kibana Essentials",
"pages" : 210
}'
Let's analyze what the output will be now:
{ "_index" : "library", "_type" : "book", "_id" : "AVSPSXXDxiIiqkaLJfTy", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "created" : true }
This time everything is similar, but the value of _id has been automatically generated by Elasticsearch.
Routing
By default, Elasticsearch decides which shard to put a document into based on the id of the document: it calculates a hash of the id and uses it to select a shard. Instead of letting Elasticsearch base this decision on the id, we can provide a value that will be used to calculate the hash. This process of controlling shard allocation is called routing. We can use the routing query parameter with the URI, as shown in the following command:
$ curl -XPOST 'http://localhost:9200/library/book?pretty&routing=books' -d '{ "author" : "Yuvraj Gupta", "title" : "Kibana Essentials", "pages" : 210 }'
In the preceding example, the Kibana Essentials book will be placed into a shard based on the hash calculated from the books routing value.
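Note that when a document is indexed with a routing value, the same routing value must be supplied to get it by ID; otherwise the request may be sent to the wrong shard. A sketch, with {id} as a placeholder for the auto-generated ID:
$ curl -XGET 'http://localhost:9200/library/book/{id}?routing=books&pretty'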
Get API
The Index API helped us add a document to the index, and the Get API is used to retrieve a document using its ID. Let's try to get the document we stored with ID 1:
$ curl -XGET 'http://localhost:9200/library/book/1?pretty'
We used the GET verb to get a document and provided the URI with the ID of the document. The output will be as shown in the following code:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "author" : "Ravi Kumar Gupta", "title" : "Test-Driven Javascript Development", "pages" : 240 } }
We can see that the output includes information related to the document, including the index name, type, ID, and version. The value of found shows whether a document with that ID exists or not. _source contains the actual document we indexed.
To check whether a document exists, we can use the HEAD verb as well. Let's see the operations:
$ curl -XHEAD -i 'http://localhost:9200/library/book/2'
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=UTF-8
Content-Length: 0

$ curl -XHEAD -i 'http://localhost:9200/library/book/1'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
The _source field in the result of a get shows the actual document we indexed. If you are familiar with Kibana Console and try the same command, you might see an error. This is a known issue specific to Kibana Console and it is filed here: https://github.com/elastic/kibana/issues/9141. We will be learning about Kibana Console in Chapter 4, Kibana Interface.
There might be situations where we don't want to retrieve the whole content and only a few fields are needed, or sometimes we don't want the source at all:
curl -XGET 'http://localhost:9200/library/book/1?_source=false'
The preceding code will exclude the source from the output. To skip some fields, for example, if we don't want to get the pages of the book:
curl -XGET 'http://localhost:9200/library/book/1?_source_exclude=pages'
To include only the author and skip all other fields:
curl -XGET 'http://localhost:9200/library/book/1?_source_include=author'
We can use both _source_include and _source_exclude together. This is helpful when documents have many fields and you want to reduce the network overhead by requesting fewer of them. To understand this, add a book with categories:
curl -XPUT "http://localhost:9200/library/book/4/_create?pretty" -d'
{
"author" : "Ravi Kumar Gupta",
"title" : "Test-Driven JavaScript Development",
"pages" : 240,
"category" : [
{"name":"Technology","subcategory": "javascript"},
{"name":"Methodology","subcategory": "development"}
]
}'
Now if we want to get the document with category and skip subcategory, use the following command:
curl -XGET 'http://localhost:9200/library/book/4?_source_include=category&_source_exclude=*.subcategory'
The response to the preceding command will be as follows:
{ "_index": "library", "_type": "book", "_id": "4", "_version": 1, "found": true, "_source": { "category": [ { "name": "Technology" }, { "name": "Methodology" } ] } }
When we need to use only _source_include, we can also use the following:
curl -XGET 'http://localhost:9200/library/book/1?_source=author'
This will include only the author, excluding all other fields.
Sometimes we might want only the source; in that case, use the following:
curl -XGET 'http://localhost:9200/library/book/1/_source'
We can use _source_include and _source_exclude here as well to include or exclude fields of the source. Also, with the same URI and the HEAD verb, you can find out whether a document exists or not.
Sometimes, we may not care about the type a document belongs to and simply want to retrieve the document by its ID. For such cases, we can specify _all for the type and we will get the first document that matches the ID in any type:
curl -XGET 'http://localhost:9200/library/_all/1/_source'
curl -XGET 'http://localhost:9200/library/_all/1'
While creating an index, we can explicitly specify which fields are to be stored. For example, if we create a library index with the following mappings:
curl -XPUT "http://localhost:9200/library" -d' { "mappings": { "book": { "properties": { "author": { "type": "keyword", "store": true }, "pages": { "type": "integer", "store": false } } } } }'
Please note that the preceding command will throw an index_already_exists_exception in case the library index is already present. After this, we add a book with ID 1, similar to what we added previously. This allows us to get the stored fields as arrays by passing the stored_fields parameter with the following call:
curl -XGET "http://localhost:9200/library/book/1?stored_fields=author,pages"
We will get the following output:
{ "_index": "library", "_type": "book", "_id": "1", "_version": 1, "found": true, "fields": { "author": [ "Ravi Kumar Gupta" ] } }
All of the stored fields will be returned as arrays. Note that pages is absent from the output because it was mapped with store set to false.
Delete API
The Delete API helps us delete a document from the index. We can use the DELETE verb for this purpose:
curl -XDELETE 'http://localhost:9200/library/book/1?pretty'
This will result in the following JSON:
{ "found" : true, "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 2, "result" : "deleted", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 } }
If the document exists, the value of found will be true and the document will be deleted.
Note
Every write operation, including delete, will increase the version.
Update API
The Update API helps us update a document. The update happens using a script that we provide. While updating, Elasticsearch gets the document from the index, runs the script on it, and indexes the result back. Let's try to add a category to the book we added earlier:
curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.category = params.category",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  }
}'
Note
Earlier versions of Elasticsearch used the Groovy language for scripting by default. With the latest versions of Elasticsearch, such as 5.x, a new scripting language was developed and embedded into Elasticsearch by default. This language is named Painless and it has a syntax similar to Groovy. We will be learning more about it later in this chapter. Wherever we need to use a script, we can simply set the lang field to painless. We will be using Painless scripts by default throughout the chapter.
In case you get a document missing error, it may be because you deleted the book with ID 1 while trying the operations in the Delete API section. Just add the book again to try this out.
Here, we are running an inline script that will add a category field if none exists. If a category field already exists, it will update it:
"inline" : "ctx._source.category = params.category"
In the preceding snippet, we assign ctx._source.category the value of the category parameter from params.
The ctx map contains the _index, _type, _id, _version, _routing, _parent, _timestamp, _ttl, and _source variables.
Running the preceding command will result in the following JSON:
{ "_index" : "library", "_type" : "book", "_id" : "1", "_version" : 2, "result" : "updated", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 } }
Now the book with ID 1 will have category set to Technical. In case we need to delete a field, we can do so by using ctx._source.remove():
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.remove(\"category\")",
    "lang" : "painless"
  }
}'
We can even put a condition and then do an update:
"inline" : "ctx._source.category.contains(category) ? ctx.op = "delete" : ctx.op = "none"
As per the preceding snippet, it will first check whether the category field contains the value of the category parameter (Technical), and if it does, it will delete the document; otherwise it will do nothing.
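Putting this together, a complete call could look like the following sketch (assuming the book with ID 1 still has its category field):
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{
  "script" : {
    "inline" : "if (ctx._source.category.contains(params.category)) { ctx.op = \"delete\" } else { ctx.op = \"none\" }",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  }
}'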
We can also update a document partially and in this case, we need to provide only the field that we want to update:
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{ "doc" : { "pages" : 250 } }'
The preceding code will update only the pages field of the book with ID 1; the value is merged into the existing document. In case both script and doc are present, doc will be ignored and only the script will run.
A document will be re-indexed only if the new source differs from the old one. If we want the document to always be updated, even if the source did not change, we can set detect_noop to false:
$ curl -XPOST 'http://localhost:9200/library/book/1/_update?pretty' -d '{ "doc" : { "pages" : 250 }, "detect_noop" : false }'
Refer to the following code:
$ curl -XPOST 'http://localhost:9200/library/book/3/_update?pretty' -d '{
  "script" : {
    "inline" : "ctx._source.category = params.category",
    "lang" : "painless",
    "params" : {
      "category" : "Technical"
    }
  },
  "upsert" : {
    "category" : "Technical"
  }
}'
If no document exists with the supplied ID (3 in this case), a new document will be created from the values inside upsert. We can also use scripted_upsert, in which case the script handles the initialization of the document instead of upsert; for that, we set scripted_upsert to true.
We can skip adding an upsert in case we want to use the content of doc as the new document. For this, we can set doc_as_upsert to true and skip the upsert in the operation:
$ curl -XPOST 'http://localhost:9200/library/book/5/_update?pretty' -d '{ "doc" : { "pages" : 250 }, "doc_as_upsert" : true }'
Multi-document APIs
These APIs support operations on multiple documents. Similar to the single document APIs, these can also be divided. Let's take a look at each of them.
Multi-get API
As the name suggests, this API helps us get multiple documents at a time. We need to provide index, which is mandatory, type, which is optional, and id. If we want to get the documents with IDs 1 and 5, this is how we can do so:
$ curl -XGET 'http://localhost:9200/_mget?pretty' -d '{
  "docs" : [
    { "_index" : "library", "_id" : 1 },
    { "_index" : "library", "_id" : 5 }
  ]
}'
We provide an array to docs and as a result we get an array of documents. If we know that all the documents are going to come from the same index, we can specify the index name in the URI and use 'http://localhost:9200/library/_mget?pretty' instead of 'http://localhost:9200/_mget?pretty'. And similarly for the type as well: we can use 'http://localhost:9200/library/book/_mget?pretty' instead of 'http://localhost:9200/_mget?pretty'.
Now only _id is repeated. In such cases, we can further shorten the request by using the ids parameter, which contains an array of IDs:
curl -XGET 'http://localhost:9200/library/_mget?pretty' -d '{ "ids" : [1, 5] }'
Or as follows:
curl -XGET 'http://localhost:9200/library/book/_mget?pretty' -d '{ "ids" : [1, 5] }'
If we don't want to specify the type field, we can either leave it empty or use _all, in which case the first document matching the ID in any type will be returned.
In the result set, all documents will contain the source field as well. In case we want to skip the source or want only some fields, we can use _source, _source_include, and _source_exclude as URL parameters, as we did with the Get API on a single document. Possible usage for the source is shown in the following code:
curl -XGET 'http://localhost:9200/library/book/_mget?pretty' -d '{
  "docs" : [
    {
      "_index" : "library",
      "_id" : 1,
      "_source" : ["author"]
    },
    {
      "_index" : "library",
      "_id" : 5,
      "_source" : {
        "include" : ["author"],
        "exclude" : ["pages"]
      }
    }
  ]
}'
For the first document, we are requesting only the author and excluding everything else. The second one has a similar effect, as we include only author and exclude pages.
Similarly, we can pass stored fields as well, with an array of field names as the value. Only the stored_fields supplied in the array will be returned. We can even set a default list of stored_fields to be returned in the URL, as well as per document:
$ curl 'http://localhost:9200/library/book/_mget?stored_fields=author' -d '{
  "docs" : [
    { "_id" : 1 },
    { "_id" : 5, "stored_fields" : ["pages"] }
  ]
}'
The preceding operation will get the author field for all of the documents for which fields are not defined explicitly. For the document with an ID of 5, only the pages field will be returned, provided that pages is mapped as a stored field.
Bulk API
There are times when we need to operate on a large number of documents; the Bulk API comes in handy for such situations. We can perform index, create, delete, and update actions with this API. We can use this API on the following endpoints:
/_bulk
/index/_bulk
/index/type/_bulk
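The request body uses a newline-delimited format: each action is described by one line of JSON metadata, followed, for index and create actions, by the document source on the next line, and the body must end with a newline. As a minimal sketch, assume we put the following lines in a file named books (a hypothetical filename):
{ "index" : { "_index" : "library", "_type" : "book", "_id" : "10" } }
{ "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 240 }
{ "delete" : { "_index" : "library", "_type" : "book", "_id" : "5" } }
We can then send the file using --data-binary so that the newlines are preserved:
$ curl -XPOST 'http://localhost:9200/_bulk?pretty' --data-binary @books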
Search APIs
These APIs help users search one or more indices. Let's get into these APIs in more detail.
Search API
This API allows us to search one or more indices and zero or more types. The search operation can be done in two ways: by putting query parameters in the search URI, or by using a Domain Specific Language (DSL) query in the request body. The operation returns the number of hits, that is, the number of matching results.
Query parameters
With this approach, we use q= to specify the search query. For example, if we want to search for all the books whose author contains gupta, we can use the GET verb:
$ curl -XGET 'http://localhost:9200/library/_search?q=author:gupta&pretty'
This will result in the following JSON:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 0.37158427, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "AVSPSXXDxiIiqkaLJfTy", "_score" : 0.37158427, "_source" : { "author" : "Yuvraj Gupta", "title" : "Kibana Essentials", "pages" : 210 } }, { "_index" : "library", "_type" : "book", "_id" : "1", "_score" : 0.2972674, "_source" : { "author" : "Ravi Kumar Gupta", "title" : "Test-Driven JavaScript Development", "pages" : 250, "category" : "Technical" } } ] } }
As we can see, we got two hits: Kibana Essentials and Test-Driven JavaScript Development. Unlike the POST/PUT operations, results from GET search operations report the total number of shards of the index in the _shards section of the result.
In the previous call, we did not specify any type; if we want to, we can call the following:
$ curl -XGET 'http://localhost:9200/library/book/_search?q=author:gupta&pretty'
If there are multiple indices or multiple types, we can search those using the following:
$ curl -XGET 'http://localhost:9200/library,users/_search?q=author:gupta&pretty'
$ curl -XGET 'http://localhost:9200/library/book,journal/_search?q=author:gupta&pretty'
Note
In case an index is not present, an index_not_found_exception will be thrown.
If we want to search all indices and all types, we can skip specifying any in the URI:
$ curl -XGET 'http://localhost:9200/_search?q=author:gupta&pretty'
We can provide as many indices and types (comma separated) as we want.
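The second way mentioned earlier is to send a DSL query in the request body. As a sketch, the following is equivalent to the q=author:gupta search we performed previously; the same query body is reused by the Count, Validate, and Explain APIs discussed later in this section:
$ curl -XGET 'http://localhost:9200/library/_search?pretty' -d '{
  "query" : {
    "term" : { "author" : "gupta" }
  }
}'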
Search shard API
This API helps us get the indices and shards against which a search would be executed. We can provide comma separated indices and types. To understand this, let's see the following URI:
$ curl -XGET 'http://localhost:9200/library/_search_shards?pretty'
This will return all of the shards available for searching and indexing. We can provide a routing value as well:
$ curl -XGET 'http://localhost:9200/library/_search_shards?pretty&routing=gupta'
Now we will get a shorter list because we have specified a routing value. We can specify multiple routing values (comma separated).
Multi-search APIs
This API allows us to perform multiple search operations in a single request. We need to provide a file that contains a header and a body part for each search request. The header part specifies the index and type to search on, search_type, preference, and routing. The body part contains the query, from, size, aggregations, and so on. For example, consider the following:
{"index" : "library", "type" : "book" } {"query" : {"term" :{"author" : "gupta"}} }
In the preceding code, the first line is the header and the second line is the body part. There can be as many pairs as we want in the file. Once we have the queries ready, we can run the operation using the GET
verb. Let's assume that we put all of our queries in a file named queries
:
$ curl -XGET 'http://localhost:9200/_msearch?pretty' --data-binary @queries; echo
Using --data-binary, we specify the file containing all of the headers and bodies; @queries is the filename.
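For instance, a queries file holding two searches might look like the following sketch; an empty header {} runs that search against all indices:
{"index" : "library", "type" : "book"}
{"query" : {"term" : {"author" : "gupta"}}}
{}
{"query" : {"match_all" : {}}, "size" : 2}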
Count API
As the name suggests, this API call returns the number of matches for a query. We can use the _count endpoint to get the number of results:
$ curl -XGET 'http://localhost:9200/library/book/_count?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
And the outcome of this operation will be as follows:
{ "count" : 2, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 } }
The result shows that there are two matches across five shards.
Validate API
Suppose we are about to run a query on sensitive data, or the query is expected to take too much time; it would be useful to validate the query before running it. This API tells us whether a query is valid without executing it.
To use this API, we use the /_validate/query endpoint:
$ curl -XGET 'http://localhost:9200/library/book/_validate/query?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
The operation will return true or false in the valid field.
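We can also pass the explain parameter to get more detail on the validation, which is especially useful for finding out why a query is invalid. For example:
$ curl -XGET 'http://localhost:9200/library/book/_validate/query?explain=true&pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'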
Explain API
This API explains the score calculation for a query and a specific document. We can use the /_explain endpoint for this purpose. For this operation, we need to provide a single index and a single type:
$ curl -XGET 'http://localhost:9200/library/book/1/_explain?pretty' -d '{
  "query" : {
    "term" : {"author" : "gupta"}
  }
}'
We executed this on the book with ID 1, with a query that expects gupta in the author name.
Profile API
Sometimes we might encounter a slow-running query, and this API can help us with low-level detail to find which component of the query is taking time, so that we can analyze it and take action. This API is still experimental in ES 5.x. To enable profiling on a call, use the following:
$ curl -XGET "http://localhost:9200/library/_search?q=author:gupta&pretty" -d' { "profile": true }'
In the results, we usually have the hits along with basic information, but now there will also be detailed profiling information. Profiling is done for each shard of the index being searched, and contains the complete execution details of the query we made.
X-Pack also offers a profiler, which is covered in Chapter 9, X-Pack: Security and Monitoring, under the Understanding Profiler section.
Field stats API
This API gives us statistics about one or more fields. We use the _field_stats endpoint for this purpose. For example:
$ curl -XGET 'http://localhost:9200/_field_stats?pretty&fields=pages'
The preceding operation is performed at the cluster level by default. Besides fields, we can also pass a level parameter to define whether stats are computed at the cluster level or the indices level. The result shows the shard counts, document count, minimum value, maximum value, density, and so on:
{ "_shards": { "total": 5, "successful": 5, "failed": 0 }, "indices": { "_all": { "fields": { "pages": { "type": "integer", "max_doc": 2, "doc_count": 2, "density": 100, "sum_doc_freq": -1, "sum_total_term_freq": 2, "searchable": true, "aggregatable": true, "min_value": 210, "min_value_as_string": "210", "max_value": 240, "max_value_as_string": "240" } } } } }
If you see any value as -1, it means the measurement for that field is not available for one or more shards.
This API is helpful for finding out minimum and maximum values, counts of terms, and so on. This API is also experimental and may be removed in future releases.
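To get the statistics broken down per index rather than aggregated across the cluster, we can set the level parameter to indices, as in the following sketch:
$ curl -XGET 'http://localhost:9200/_field_stats?pretty&fields=pages&level=indices'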
Indices APIs
The Indices APIs help us build and manage indices, settings, mappings, aliases, and templates. In this section, we will take a closer look at what they offer.
Managing indices
This section introduces the endpoints that help us to create, update, and delete indices and settings.
Creating an index
To create an index, we use the PUT verb with curl. While creating an index, we can specify settings such as the number of shards, mappings, aliases, and so on. To create an index with all default settings, we can use the following:
$ curl -XPUT 'localhost:9200/library'
This will try to create an index named library, and if it succeeds, the output will be:
{"acknowledged":true}
This operation will create five primary shards and five replica shards (one for each primary) for this index. We can change these settings, and also provide mappings and aliases. Let's see a more complex example:
$ curl -XPUT 'http://localhost:9200/library' -d '{
  "settings" : {
    "number_of_shards" : 2,
    "number_of_replicas" : 1
  }
}'
This will create two primary shards and one replica for each primary shard for this index.
Note
If an index is already created and you try to create it again, you are likely to get an index_already_exists_exception.
Checking if an index exists
We can use the HEAD verb with curl to check whether an index exists. For example, to check whether the library index exists:
$ curl -XHEAD -i 'http://localhost:9200/library'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
If the status code is 404, the index does not exist.
Getting index information
We can get information about the index using the following:
$ curl -XGET 'localhost:9200/library?pretty'
This will show information about the library index, which includes aliases, mappings, settings, and warmers. Settings contain shard information, creation time, version, and so on:
{ "library": { "aliases": {}, "mappings": { "book": { "properties": { "author": { "type": "keyword", "store": true }, "pages": { "type": "integer" }, "title": { "type": "keyword", "store": true } } } }, "settings": { "index": { "creation_date": "1484693593662", "number_of_shards": "5", "number_of_replicas": "1", "uuid": "kW1tscieT6OaWClNnhguAQ", "version": { "created": "5010199" }, "provided_name": "library" } } } }
If there is no information available, an empty object will be returned, as it is for aliases here.
Note
If an index is not present and you try to get any information about it, you are likely to get an index_not_found_exception.
Managing index settings
If we want to get just the settings, we can use /{index}/_settings:
$ curl -XGET 'http://localhost:9200/library/_settings?pretty'
{index} can also take multiple indices, comma separated. We can also use wildcards to match indices.
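For example, assuming we have several indices whose names begin with library, a wildcard request could look like this:
$ curl -XGET 'http://localhost:9200/library*/_settings?pretty'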
We can also update settings after an index is created, using the same {index}/_settings endpoint with the PUT verb. The settings will be updated dynamically. If we do not specify an index, the settings will be updated for all indices:
$ curl -XPUT 'http://localhost:9200/library/_settings?pretty' -d '{
  "index" : {
    "number_of_replicas" : 2
  }
}'
The preceding operation will set two replica shards for each primary shard on the library index.
So far, we have seen operations for creating and editing an index. We can also monitor indices by checking statistics, shard stores, recovery information, and segments.
Getting index stats
This operation helps us get stats about an index. It shows information about documents, index size, indexing, search, fielddata, flush, merge, request cache, refresh, suggest, translog, warmer, and other statistics. To get the stats of an index, run the following:
$ curl -XGET 'http://localhost:9200/library/_stats?pretty'
We can supply multiple comma separated indices to get stats for several indices at a time. We can also specify which specific stats to get, as shown in the following command:
$ curl -XGET 'http://localhost:9200/library/_stats/docs,search?pretty'
Getting index segments
Using the _segments endpoint on one or more indices, we can get low-level segment information for an index:
$ curl -XGET 'http://localhost:9200/library/_segments?pretty'
Getting index recovery information
This API provides information about index shard recoveries. It gives a very detailed view for each shard of an index. We can use the _recovery endpoint on one or more indices:
$ curl -XGET 'http://localhost:9200/library/_recovery?pretty'
Getting shard stores information
This API helps us get the shard stores for one or more indices. To get shard stores, we can use the _shard_stores endpoint:
$ curl -XGET 'http://localhost:9200/library/_shard_stores?pretty'
This returns information for all of the shards; the following is one such shard:
"0" : { "stores" : [ { "re4j45-yTp6VZgMN7dP2Sg" : { "name" : "xB20COp", "ephemeral_id" : "SK9sdP0nRPuCKs51VtvNqg", "transport_address" : "127.0.0.1:9300", "attributes" : { } }, "allocation_id" : "9I_4rywATD6CV8lqvFIFaA", "allocation" : "primary" } ] }
It shows the stores with node name, address, attributes, version, and allocation type for each store. If you are running only one node, information about replica shards will not be present.
Index aliases
Aliases are just like the ones we have in a Unix OS: in Unix they are for commands, here they are for indices.
We can create an alias for a single index or multiple indices. Once an alias is set and we use it instead of index names in API calls, Elasticsearch will replace the alias with the actual index names and execute the operations. Aliasing is useful when we have many indices and a reason to group them.
For example, say we have a production setup that includes database, web, and application servers, and we have set up the Elastic Stack to collect logs from all of them into Elasticsearch. There are other indices as well, but we want to perform operations on all of the log-related indices. We can create an alias, for example, prod_logs, for all those indices, and then perform whatever operations we want by simply using prod_logs in place of the indices rather than typing in all of their names. We can create aliases using the _aliases endpoint.
We can add and remove aliases for an index, as in the following snippet:
$ curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "library", "alias" : "alias1" } },
    { "remove" : { "index" : "library", "alias" : "alias1" } }
  ]
}'
We can provide as many actions as we want. The first action adds an alias alias1 for the index library, and the second action removes it.
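Returning to the earlier scenario, a sketch for grouping several log indices under one alias and then searching through it could look as follows (db_logs, web_logs, and app_logs are hypothetical index names):
$ curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions" : [
    { "add" : { "index" : "db_logs", "alias" : "prod_logs" } },
    { "add" : { "index" : "web_logs", "alias" : "prod_logs" } },
    { "add" : { "index" : "app_logs", "alias" : "prod_logs" } }
  ]
}'
$ curl -XGET 'http://localhost:9200/prod_logs/_search?q=error&pretty'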
Tip
An alias name cannot be the same as an index name.
Mappings
When we add a document to an index, Elasticsearch creates mappings based on the data inside the document. These mappings help identify and define whether a field should be full text, numeric, or date.
In Elasticsearch, to divide documents within an index into logical groups, every index has one or more mapping types. Every mapping type has meta-fields (_index, _type, _id, _source) and a list of fields for the type. Every field has a data type. These data types can be string, long, double, date, boolean, ip (IP address), object, nested, and specialized types related to geo locations: geo_point and geo_shape.
We can get the mappings of an index using the _mapping endpoint on an index:
$ curl http://localhost:9200/library/_mapping?pretty
If there are multiple types and we want to get the mappings for a specific type, we use the {index}/_mapping/{type} endpoint:
$ curl http://localhost:9200/library/_mapping/book?pretty
This will output a properties map containing all of the fields:
{ "library": { "mappings": { "book": { "properties": { "author": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "pages": { "type": "long" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } } } }
The book type contains these three fields, and we can see that each field has a type defined along with other relevant properties.
We can also get the mapping for specific field(s) using {index}/_mapping/{type}/field/{fields}. Multiple fields can be supplied comma separated:
$ curl localhost:9200/library/_mapping/field/title?pretty
We can also check whether a type exists using the HEAD verb:
$ curl -XHEAD -i 'http://localhost:9200/library/_mapping/book'
HTTP/1.1 200 OK
Content-Type: text/plain; charset=UTF-8
Content-Length: 0
Status code 200 acknowledges that the type exists; if it doesn't exist, the code would be 404.
We can also add mappings using the Put Mappings API, which allows us to add a new type or new field.
To add a new type to an index, for example, paper to library, run the following:
$ curl -XPUT 'http://localhost:9200/library/_mapping/paper' -d'
{
  "properties": {
    "abstract": {
      "type": "text"
    }
  }
}'
Executing this will return acknowledged as true if it was successful, creating a paper mapping type with an abstract field.
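Similarly, we can add a new field to an existing type. As a sketch, the following adds a hypothetical isbn field to the book type:
$ curl -XPUT 'http://localhost:9200/library/_mapping/book' -d'
{
  "properties": {
    "isbn": {
      "type": "keyword"
    }
  }
}'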
Closing, opening, and deleting an index
Sometimes, we might want to stop reads/writes on a specific index for maintenance or recovery purposes. When an index is closed, it has no overhead on the cluster except for maintaining its metadata. We can close and open an index using /{index}/_close and /{index}/_open, respectively. To do so, we use the POST verb with curl:
$ curl -XPOST 'http://localhost:9200/library/_close'
$ curl -XPOST 'http://localhost:9200/library/_open'
If we don't want an index at all, we can delete it using the DELETE verb:
$ curl -XDELETE 'http://localhost:9200/library'
This will delete the index from the cluster as if it was never there. We can also use a wildcard to delete multiple indices, for example, to delete all indices with names starting with test:
$ curl -XDELETE 'http://localhost:9200/test*'
This checks whether there are any indices with names starting with test and deletes all of them. If no matching index is present and we are using wildcards, we won't get any errors, just an acknowledgement:
{ "acknowledged": true }
But if we delete an index without wildcards and the index does not exist, we will get an index_not_found_exception.
Other operations
There are more operations that this API supports, such as clearing the cache, upgrading Elasticsearch indices, force merge, refresh, and flush:
- Clearing Cache: The _cache/clear endpoint helps us to clear the cache for one or more indices:
$ curl -XPOST "http://localhost:9200/library/_cache/clear"
- Flush: Using _flush helps us to free memory from one or more indices by clearing the transaction logs and by flushing data to index storage:
$ curl -XPOST "http://localhost:9200/library/_flush"
- Refresh: This API provides the _refresh endpoint to refresh one or more indices:
$ curl -XPOST "http://localhost:9200/library/_refresh"
- Upgrade API: This API helps us to upgrade indices from older Elasticsearch versions to new versions. We can use the _upgrade endpoint to upgrade one or more indices. This process will usually take time:
$ curl -XPOST 'http://localhost:9200/library/_upgrade?pretty'
This will give us the following output:
{ "_shards": { "total": 15, "successful": 10, "failed": 0 }, "upgraded_indices": { "library": { "upgrade_version": "5.1.1", "oldest_lucene_segment_version": "6.3.0" } } }
As we can see, the library index was upgraded successfully and the version it is upgraded to is specified.
Cat APIs
These APIs help us print information about nodes, indices, fields, tasks, and plugins in a human-readable format rather than JSON; the output is rendered as tables printed on the console.
We will learn more about cat APIs in Chapter 8, Elasticsearch APIs, The cat APIs section.
Cluster APIs
These APIs allow us to know about the cluster state, health, statistics, node statistics, and node information. We will learn about the Cluster APIs in Chapter 8, Elasticsearch APIs, The cluster APIs section.