Author: Daan Aikema | Tags: Technical blog

Elasticsearch is awesome. Its many possibilities make this tool a must-have when you want to work with data: searching data, running analyses, or storing logs, it can do it all. But Elasticsearch has some downsides. Putting data into Elasticsearch is easy: it will create a mapping automatically for every field, or use the mapping you have defined for it. But when you want to change this mapping for a specific field, you are in trouble. Elasticsearch won't let you do this. And that makes sense: what should happen to the current data? You have to reindex all the documents to fit the new mapping.

The process for reindexing all documents in Elasticsearch without losing data (steps 2 and 5 use the _reindex API, sketched after the list):

  1. Create a new temporary index
  2. Reindex the data from the current index to this temporary index
  3. Delete the current index
  4. Create it again and apply the new mapping
  5. Reindex the documents from the temporary index to the newly created current index
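
A minimal _reindex call, with illustrative index names, looks like this:

POST _reindex
{
  "source": { "index": "products" },
  "dest": { "index": "products_tmp" }
}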

The downside of this process is that the index is unavailable (and with it all search functionality that uses it) from the moment the current index is removed until it has been fully recreated and repopulated. This implies that changing the mapping of an index needs to be done during a maintenance window.

However, if the index is wrapped in an alias, we can skip the last two steps and the temporary index becomes the current one. This way the index is never unavailable: one moment a request hits the old index, the next moment it hits the new one. In this article I will describe how I've used aliases to achieve zero downtime in Elasticsearch.

Note: For our customer we use Elasticsearch 5.6. I've changed the examples in this article to version 7.1, but you can use the same workaround in versions 5 and 6; you only have to check the documentation to see whether some fields differ in those versions.

Setup

 

During one of my projects as an Integration Engineer I encountered the following simplified example.

 


 

Product information was spread across separate databases, and to enable high-performance search of the scattered data I used Elasticsearch to create an index with the combined data.

Simplified, this is a document for this index:
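
The fields below are illustrative; they stand in for the combined product data coming from the separate source systems:

{
  "id": "1234",
  "name": "Example product",
  "brand": "ACME",
  "price": 9.99,
  "in_stock": true
}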

Create index template

First we will define a mapping for the product index. I like to use an index template for this. In such a template you define a pattern for the index name, and every new index whose name matches that pattern automatically picks up the template:

PUT _template/product
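
A minimal body for this request could look as follows; the field mappings are purely illustrative, the essential part is the index pattern:

{
  "index_patterns": ["products_v*"],
  "mappings": {
    "properties": {
      "id": { "type": "keyword" },
      "name": { "type": "text" },
      "brand": { "type": "keyword" },
      "price": { "type": "float" },
      "in_stock": { "type": "boolean" }
    }
  }
}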

As you can see, I use the pattern "products_v*". All my indices in the product namespace carry a version suffix for clarity.

Create index based on the template

Next, we create a new index that uses the template from the previous step. Because its name matches the "products_v*" pattern, the template is applied automatically and no request body is needed:

PUT products_v1

You can check if the mapping is automatically applied to this index:

GET products_v1/_mapping

This should return the mapping for the products index.
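
Assuming the illustrative template above, the response looks roughly like this:

{
  "products_v1": {
    "mappings": {
      "properties": {
        "id": { "type": "keyword" },
        "name": { "type": "text" },
        "brand": { "type": "keyword" },
        "price": { "type": "float" },
        "in_stock": { "type": "boolean" }
      }
    }
  }
}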

Create the alias

The last step is to define an alias for this index. Consumers will use this alias to query the products. We will point this alias to the first version of the products index.
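
The alias name is up to you; here I assume it is simply called products. Pointing it at products_v1 is done with the _aliases API:

POST _aliases
{
  "actions": [
    { "add": { "index": "products_v1", "alias": "products" } }
  ]
}

From now on, consumers search against the products alias instead of products_v1.

Change the mapping and swap the alias

Now suppose the mapping has to change. The sketch below introduces a second index, products_v2: update the template with the new mapping and create the next version of the index. Because its name also matches the "products_v*" pattern, the updated template is applied automatically:

PUT products_v2

Then copy all documents from the old index into the new one with the _reindex API:

POST _reindex
{
  "source": { "index": "products_v1" },
  "dest": { "index": "products_v2" }
}

Finally, point the alias to the new index. Removing and adding happen in one atomic call, so consumers switch from products_v1 to products_v2 without ever hitting a missing index:

POST _aliases
{
  "actions": [
    { "remove": { "index": "products_v1", "alias": "products" } },
    { "add": { "index": "products_v2", "alias": "products" } }
  ]
}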

Remove the old index

And lastly, remove the first products index:

DELETE /products_v1

And there you have it! Zero-downtime deployment in Elasticsearch. We've shown an example of how to use it during a mapping change, but it is just as useful when rebuilding an index involves a lot of sub-tasks that have to finish before consumers can request the data.
