Elasticsearch is awesome. Its many possibilities make it a must-have tool when you work with data. Search data, run analyses or store logs: it can do it all. But Elasticsearch has some downsides. Putting data into Elasticsearch is easy. It automatically creates a mapping for every new field, or it uses the mapping you have defined for that field. But when you want to change the mapping of an existing field, you are in trouble, because Elasticsearch won’t let you. And that makes sense: what should happen to the data that is already indexed? You have to reindex all documents so that they fit the new mapping.
The process to reindex all documents in Elasticsearch without losing data:
- Create a new temporary index
- Reindex the data from the current index into this temporary index
- Delete the current index
- Create it again and apply the new mapping
- Reindex the documents from the temporary index into the newly created current index
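The five steps above can be sketched as a sequence of REST requests. This is a minimal, standard-library Python sketch that only builds the requests and does not send them. The cluster URL `http://localhost:9200`, the index names `products` and `products_tmp` and the mapping change are hypothetical. To run it for real, send each request in order with `urllib.request.urlopen` and check every response before continuing.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def classic_reindex_plan(index, tmp_index, new_mappings):
    """The five steps as requests; `index` is unavailable from step 3 until step 5 finishes."""
    return [
        # 1. Create a temporary index (dynamic mapping; _reindex copies each document's _source).
        es_request("PUT", f"/{tmp_index}"),
        # 2. Copy all documents into it; refresh=true makes them searchable for step 5.
        es_request("POST", "/_reindex?refresh=true",
                   {"source": {"index": index}, "dest": {"index": tmp_index}}),
        # 3. Delete the current index: from here on, searches against it fail.
        es_request("DELETE", f"/{index}"),
        # 4. Recreate it with the new mapping.
        es_request("PUT", f"/{index}", {"mappings": new_mappings}),
        # 5. Copy the documents back into the recreated index.
        es_request("POST", "/_reindex?refresh=true",
                   {"source": {"index": tmp_index}, "dest": {"index": index}}),
    ]


# Hypothetical mapping change: store "name" as a keyword instead of text.
plan = classic_reindex_plan("products", "products_tmp",
                            {"properties": {"name": {"type": "keyword"}}})
for req in plan:
    print(req.get_method(), req.full_url)
```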
The downside of this process is that the index, and with it all search functionality that depends on it, is unavailable from the moment the current index is deleted until it has been fully recreated and refilled. This means that changing the mapping of an index has to be done during a maintenance window.
However, if the index is wrapped in an alias, we can skip the last two steps: the temporary index simply becomes the current one by pointing the alias at it, and the old index is only deleted once the alias has moved. This way the index is never unavailable. Switching an alias is atomic: one request still reaches the old index, and the next request reaches the new one. In this article I will describe how I’ve used aliases to achieve zero downtime in Elasticsearch.
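With an alias, the same change becomes: build the new version next to the old one, copy the data, move the alias, and only then delete the old version. This minimal standard-library Python sketch builds those requests without sending them. The index names `products_v1` and `products_v2`, the alias `products` and the cluster URL are hypothetical. The alias move is a single `_aliases` request containing a `remove` and an `add` action. Elasticsearch applies the actions of one `_aliases` request atomically, so there is no moment at which the alias points nowhere.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def alias_reindex_plan(alias, old_index, new_index, new_mappings):
    """Move `alias` from old_index to a new index with a new mapping, without downtime."""
    return [
        # Create the new version; the alias still targets old_index, so consumers see no change.
        es_request("PUT", f"/{new_index}", {"mappings": new_mappings}),
        # Copy all documents into the new version.
        es_request("POST", "/_reindex?refresh=true",
                   {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # Switch the alias: both actions are applied atomically in one request.
        es_request("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
        # The old version no longer receives traffic through the alias and can be deleted.
        es_request("DELETE", f"/{old_index}"),
    ]


plan = alias_reindex_plan("products", "products_v1", "products_v2",
                          {"properties": {"name": {"type": "keyword"}}})
for req in plan:
    print(req.get_method(), req.full_url)
```

Unlike the five-step process, nothing is deleted until the alias already points at the new index. The sketch assumes that no documents are written to the old index while the reindex runs; writes made in that window would not be copied.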
Note: For our customer we use Elasticsearch 5.6. I’ve adapted the examples in this article to version 7.1, but the same approach works in versions 5 and 6. Check the documentation of your version, because some request fields differ in earlier versions.
During one of my projects as an Integration Engineer, I encountered the following situation (simplified here):
Product information was spread across separate databases. To enable high-performance search over this scattered data, I used Elasticsearch to build an index containing the combined data.
In simplified form, a document in this index looks like this:
Create index template
First, we define a mapping for the product index. I like to use index templates for this. In a template you define a pattern for index names, and every new index whose name matches that pattern automatically gets the template’s mapping when it is created:
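For Elasticsearch 7.1, such a template can be created with the `PUT _template/<name>` API. This minimal standard-library Python sketch builds that request without sending it. The template name `products`, the fields `name` and `price` and the cluster URL are hypothetical. Two of the version differences mentioned in the note above apply here: 7.x mappings are typeless, while in 5.x and 6.x the `properties` are nested under a mapping type name. In addition, 5.x uses a `template` field instead of `index_patterns`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def products_template_request():
    """PUT _template/products: applies to every new index whose name matches products_v*."""
    body = {
        "index_patterns": ["products_v*"],
        "mappings": {
            # Hypothetical fields; replace them with the fields of your own documents.
            "properties": {
                "name": {"type": "text"},
                "price": {"type": "double"},
            }
        },
    }
    return es_request("PUT", "/_template/products", body)


req = products_template_request()
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```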
As you can see, I use the pattern “products_v*”. All indices in the product namespace carry a version number, so it is always clear which version of the mapping an index uses.
Create index based on the template
Next, we create a new index that uses the template we created in the previous step:
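Because the template supplies the mapping, creating the index needs no request body, only a name that matches the template pattern. This is a minimal sketch, assuming the hypothetical name `products_v1` for the first version and a cluster at `http://localhost:9200`. The pattern check with `fnmatch` is an added safeguard and only approximates Elasticsearch wildcard matching.

```python
import fnmatch
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster
TEMPLATE_PATTERN = "products_v*"


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def create_index_request(name):
    """PUT /<name> without a body: the mapping comes from the matching template."""
    # Fail early if the name would not pick up the template.
    if not fnmatch.fnmatchcase(name, TEMPLATE_PATTERN):
        raise ValueError(f"{name!r} does not match {TEMPLATE_PATTERN!r}")
    return es_request("PUT", f"/{name}")


req = create_index_request("products_v1")
print(req.get_method(), req.full_url)
```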
You can check whether the mapping has been applied automatically to this index:
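The check is a `GET <index>/_mapping` request. This sketch builds that request and adds a hypothetical helper, `field_type`, that reads a field’s type from the parsed response. In 7.x the response has the shape `{index: {"mappings": {"properties": ...}}}`. The index name and the sample response below are illustrative only and were not captured from a real cluster.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def mapping_request(index):
    """GET /<index>/_mapping returns the mapping the index actually uses."""
    return es_request("GET", f"/{index}/_mapping")


def field_type(mapping_response, index, field):
    """Return the mapped type of a top-level field, or None if it is not mapped (7.x shape)."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    return properties.get(field, {}).get("type")


req = mapping_request("products_v1")
print(req.get_method(), req.full_url)

# Illustrative 7.x response shape with a hypothetical field, not real output.
sample = {"products_v1": {"mappings": {"properties": {"price": {"type": "double"}}}}}
print(field_type(sample, "products_v1", "price"))
```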
This should return the mapping for the products index.
Create the alias
The last step is to define an alias for this index. Consumers query the products through this alias instead of the index name. We point the alias at the first version of the products index:
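Adding an alias is an `add` action in the `_aliases` API. This is a minimal sketch, assuming the hypothetical alias name `products` and the first version `products_v1`. The alias name must not be the same as any existing index name, which is one reason to keep the version suffix on the indices.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def add_alias_request(alias, index):
    """POST /_aliases with a single add action: `alias` starts resolving to `index`."""
    return es_request("POST", "/_aliases",
                      {"actions": [{"add": {"index": index, "alias": alias}}]})


req = add_alias_request("products", "products_v1")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Consumers then send their searches to `products/_search` and never need to know which version is behind it.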
Remove the old index
Finally, remove the first version of the products index:
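Deleting is a plain `DELETE <index>` request, and it cannot be undone. The sketch below therefore adds a safeguard. It takes the parsed response of `GET _alias/<alias>`, which maps each index behind the alias to its aliases, and refuses to delete an index that the alias still points at. The helper name `delete_old_index_request` and the sample response are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: adjust to your cluster


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def delete_old_index_request(old_index, alias, alias_response):
    """Build DELETE /<old_index>, but only if `alias` no longer points at it.

    alias_response is the parsed body of GET /_alias/<alias>, for example
    {"products_v2": {"aliases": {"products": {}}}}.
    """
    still_used = alias in alias_response.get(old_index, {}).get("aliases", {})
    if still_used:
        raise ValueError(f"alias {alias!r} still points at {old_index!r}")
    return es_request("DELETE", f"/{old_index}")


# Illustrative response after the switch: the alias only resolves to products_v2.
after_switch = {"products_v2": {"aliases": {"products": {}}}}
req = delete_old_index_request("products_v1", "products", after_switch)
print(req.get_method(), req.full_url)
```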
And there you have it: zero-downtime deployment in Elasticsearch. We’ve shown how to use it for a mapping change, but the same approach works whenever updating an index normally involves many subtasks before consumers can query the data. Prepare the new version in the background and switch the alias once it is complete.