Ensure new indexes can be created during deployment #210

Open
mrdavidlaing opened this issue Mar 7, 2016 · 8 comments

Comments

@mrdavidlaing
Member

Following on from the [known limitations](https://github.com/logsearch/logsearch-boshrelease/releases/tag/v201.0.0#known-limitations) in v201 and the discussion at #209 ...

As of v201, shard allocation is disabled at the beginning of any deployment that affects Elasticsearch nodes, and then manually re-enabled after the deployment with `bosh run errand enable_shard_allocation`.

This means that:

a. Unnecessary shard movements are avoided during deployment, speeding up deploys
b. Primary indices remain "green" throughout the deployment so that
c. New data can be written to existing indexes during the deployment

BUT:

d. New indexes cannot be created during the deployment
e. Index replicas remain unallocated until the `enable_shard_allocation` errand is run.

The purpose of this issue is to capture ideas for alternative techniques that can remove the two remaining limitations, d and e.
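
For context, the current disable/re-enable behaviour maps onto the Elasticsearch cluster settings API roughly as follows. This is a sketch only, not the release's actual scripts; the exact setting names and whether `transient` or `persistent` is appropriate depend on the Elasticsearch version shipped.

```bash
# Sketch: what "disable shard allocation" / enable_shard_allocation boil down to.
# The release's pre-start/errand scripts may use different settings or endpoints.

# Before the deploy touches Elasticsearch nodes: stop allocating shards
curl -s -XPUT 'http://127.0.0.1:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "none"}}'

# After the deploy (today: the enable_shard_allocation errand): allow allocation again
curl -s -XPUT 'http://127.0.0.1:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
```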

@mrdavidlaing
Member Author

@voelzmo, @dpb587 - Any thoughts on how we should proceed?

@dpb587
Contributor

dpb587 commented Mar 7, 2016

Might also want to experiment with `cluster.routing.allocation.cluster_concurrent_rebalance` and other throttling settings: reduce them pre-deploy, reset them post-deploy. By setting them extremely low, you'd effectively put rebalancing on pause, but still allow new primaries and replicas to be allocated as necessary. Used it a few times at cityindex. Important to note that index and shard size is a factor, though: if other disaster scenarios occurred during the deployment, they'd cause additional strain and delay on the cluster. And once a shard does start transferring, the transfer can't be interrupted even if the original node comes back online mid-transfer.
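
A minimal sketch of what that throttling could look like, assuming the standard cluster settings API (a value of 0 effectively pauses rebalancing while still allowing new primaries and replicas to be allocated):

```bash
# Sketch only: pause rebalancing for the duration of the deploy, then restore the default.

# Pre-deploy: no concurrent rebalances, so existing shards stay put
curl -s -XPUT 'http://127.0.0.1:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.cluster_concurrent_rebalance": 0}}'

# Post-deploy: restore the default (2) so the cluster can rebalance itself again
curl -s -XPUT 'http://127.0.0.1:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.cluster_concurrent_rebalance": 2}}'
```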

You may also be able to take advantage of how BOSH operates on a single AZ at a time in cloud-config-aware environments. If operators are using allocation awareness settings to distribute their shards across AZs for HA, drain could use a document to persist which AZ is currently being restarted. If it's this node's AZ, restart. If not, wait until the cluster is green (or until the document points at this AZ), then update the document and restart. This approach would require operators to use cloud-config and specific index awareness settings, though. A rough sketch of that drain logic follows.
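
A hedged sketch of that drain idea, assuming allocation awareness is configured on AZ, a `MY_AZ` value is rendered into the script from the deployment, and a coordination document is kept in Elasticsearch (all of these names are made up, none come from the release):

```bash
#!/bin/bash
# Hypothetical drain sketch, not part of the release. Assumes shard allocation
# awareness is configured on AZ and MY_AZ is rendered into this script.
set -e

ES="http://127.0.0.1:9200"
DOC="$ES/drain-coordination/az/current"   # hypothetical coordination document

doc_is_my_az() {
  # _source returns the document exactly as we store it below
  curl -s "$DOC/_source" 2>/dev/null | grep -q "\"az\":\"$MY_AZ\""
}

if ! doc_is_my_az; then
  # Another AZ went first: wait for green, or for the doc to point at our AZ
  until curl -s "$ES/_cluster/health" | grep -q '"status":"green"'; do
    doc_is_my_az && break
    sleep 10
  done
  # Claim the coordination doc for this AZ before restarting
  curl -s -XPUT "$DOC" -H 'Content-Type: application/json' \
    -d "{\"az\":\"$MY_AZ\"}" > /dev/null
fi

echo 0   # old-style BOSH drain scripts print the number of seconds to wait on stdout
```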

A couple other things that concerned me when initially looking at the new drain script...

  • any scale-down operation has the potential to unexpectedly lose data, because there's no waiting to ensure shards have been rebalanced away from the outgoing VM. If a quorum or a shard's replica set is on the VMs BOSH decides to terminate, the cluster can't become green.
  • when an index has >1 replica, the index will become unstable after quorum-1 VMs holding the shard have been restarted, since replicas will not automatically be reopened. Writes will fail and reads may fail.

I don't think there's an easy solution though. I'll think about it some more.

@mrdavidlaing
Member Author

@dpb587 Excellent points; thank you. Looks like we have some experimenting to do with cluster.routing.allocation.cluster_concurrent_rebalance

We're considering scale down as a special case, and attempting to document the gotchas.

We're also telling people not to have more "in flight" nodes than replicas.

> when an index has >1 replica, the index will become unstable after quorum-1 VMs holding the shard have been restarted, since replicas will not automatically be reopened. Writes will fail and reads may fail.

True. Meaning that the cluster will go red during upgrades (with logs backing up in the queue), but it should then recover once the `enable_shard_allocation` errand is run.

@voelzmo

voelzmo commented Jul 6, 2016

@mrdavidlaing any updates on the topic in the meantime? Is it still the case that I have to run an errand to re-enable shard allocation after a successful deployment?

@mrdavidlaing
Member Author

@voelzmo You still need to run the errand to re-enable shard allocation post-deployment.

@jmcarp

jmcarp commented Sep 23, 2016

Just curious--could re-enabling shard allocation be moved to a post-deploy script instead of an errand so that operators don't have to run the errand manually or add it to CI?

@mrdavidlaing
Member Author

Joshua,

Definitely - we're just waiting for that functionality to become widely available (i.e., people have upgraded their BOSHes).


@voelzmo

voelzmo commented Sep 24, 2016

Even when people have upgraded their Directors, keep in mind that post-deploy is still optional and disabled by default: http://bosh.io/jobs/director?source=github.com/cloudfoundry/bosh&version=257.15#p=director.enable_post_deploy
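
For illustration, the re-enable step could eventually ship as a BOSH post-deploy script along these lines. This is a hedged sketch assuming the standard post-deploy mechanism (a script rendered to /var/vcap/jobs/<job>/bin/post-deploy, run only when the director has director.enable_post_deploy set to true), not the errand's actual implementation:

```bash
#!/bin/bash
# Sketch: re-enable shard allocation automatically after every deploy.
# Rendered to /var/vcap/jobs/<job>/bin/post-deploy; only runs when the
# director enables post-deploy scripts, per the note above.
set -e

curl -s -XPUT 'http://127.0.0.1:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"transient": {"cluster.routing.allocation.enable": "all"}}'
```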
