Big, popular forums start out as small forums. One day we will find that one shard in our shared index is doing a lot more work than the other shards, because it holds the documents for a forum that has become very popular. That forum now needs its own index.
The index aliases that we’re using to fake an index per user give us a clean migration path for the big forum.
The first step is to create a new index dedicated to the forum, with enough shards to allow for expected growth:
PUT /baking_v1
{
  "settings": {
    "number_of_shards": 3
  }
}
The next step is to migrate the data from the shared index into the dedicated index, which can be done using a scroll query and the bulk API. As soon as the migration is finished, the index alias can be updated to point to the new index:
POST /_aliases
{
  "actions": [
    { "remove": { "alias": "baking", "index": "forums"    }},
    { "add":    { "alias": "baking", "index": "baking_v1" }}
  ]
}
Updating the alias is atomic; it’s like throwing a switch. Your application continues talking to the baking alias and is completely unaware that it now points to a new dedicated index.
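To make the copy step more concrete, here is a minimal Python sketch of how one page of scroll results might be turned into a bulk request body for the new index. The function name bulk_index_lines is hypothetical, and the sketch assumes each hit carries the usual _id and _source fields (plus _type on older versions). Fetching the scroll pages and posting the body to the bulk API is left to whichever client you use.

```python
import json


def bulk_index_lines(hits, target_index):
    """Build a newline-delimited bulk body that indexes `hits` into `target_index`.

    `hits` is one page of search/scroll results, i.e. a list of dicts
    with at least "_id" and "_source" keys (an assumption of this sketch).
    """
    lines = []
    for hit in hits:
        # Reuse the original _id so that re-running the copy overwrites
        # documents instead of duplicating them.
        meta = {"_index": target_index, "_id": hit["_id"]}
        # Older Elasticsearch versions use mapping types; carry it over if present.
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        # No routing value is set: the dedicated index relies on default sharding.
        lines.append(json.dumps({"index": meta}, sort_keys=True))
        lines.append(json.dumps(hit["_source"], sort_keys=True))
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"
```

Each page returned by the scroll would be passed through a function like this and sent as one bulk request, repeating until the scroll returns no more hits.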
The dedicated index no longer needs the filter or the routing values. We can just rely on the default sharding that Elasticsearch does using each document’s _id field.
The last step is to remove the old documents from the shared index, which can be done by searching using the original routing value and forum ID and performing a bulk delete.
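The cleanup can be sketched in the same spirit. The hypothetical helper below takes the IDs found by that search and builds a bulk body of delete actions against the shared index. It assumes the routing value must accompany each delete so that the request reaches the shard holding the document. The metadata key used for routing has differed between Elasticsearch versions, so it is a parameter here rather than a fixed name, and older versions may also expect a _type.

```python
import json


def bulk_delete_lines(doc_ids, index, routing, routing_key="routing", doc_type=None):
    """Build a newline-delimited bulk body that deletes `doc_ids` from `index`.

    `routing_key` is the bulk metadata key carrying the routing value;
    check the name your Elasticsearch version expects (assumption: "routing").
    """
    lines = []
    for doc_id in doc_ids:
        meta = {"_index": index, "_id": doc_id, routing_key: routing}
        if doc_type is not None:
            meta["_type"] = doc_type
        # A delete action is a single line; unlike index, it has no source line.
        lines.append(json.dumps({"delete": meta}, sort_keys=True))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

For the forum in this example, the call might look like bulk_delete_lines(ids, "forums", "baking"), where ids are collected from a search that used the original routing value and filtered on the forum ID.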
The beauty of this index-per-user model is that it allows you to reduce resources, keeping costs low, while still giving you the flexibility to scale out when necessary, and with zero downtime.