elasticsearch_master fails when minimum_master_nodes is set to >1 #167

petar-petrov-sap opened this issue Aug 12, 2015 · 5 comments

@petar-petrov-sap

Hi,

Since Kibana 4.1.1 was added as a separate job in elasticsearch_master, it can no longer be used in a cluster environment, because the following happens:

  1. When the deployment is installed for the first time, elasticsearch_master_z1/0 is created. The VM contains three jobs: api, elasticsearch and kibana.
  2. The elasticsearch and api jobs get started, and monit shows them as running. Kibana is waiting to start.
  3. The Elasticsearch process runs, but it is configured with minimum_master_nodes: 2, therefore http://127.0.0.1:9200 returns 503 (waiting for another master to become available).
  4. The Kibana ctl script waits for Elasticsearch to return 200 (which it never does) and fails after 5 minutes (the timeout is set in the ctl script).

By deleting the wait-for-elasticsearch-200-response check in the Kibana ctl script, the cluster deploys and starts as expected.

Kind regards,
Petar Petrov

dpb587 commented Aug 13, 2015

Some background: Kibana 4 will not start successfully if Elasticsearch is unavailable, because Kibana needs to load and use configuration from Elasticsearch. This is why Kibana's startup is blocked by the Elasticsearch health check in the control script. If you know this Kibana behavior has changed, do let us know.
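For context, the check being discussed has roughly the following shape. This is a minimal sketch, not the release's actual ctl script; the URL, timeout, and variable names are assumptions for illustration:

```bash
#!/bin/bash
# Hypothetical sketch of the wait-for-elasticsearch-200 check in the Kibana
# ctl script; the real script's URL, timeout, and variable names may differ.
ELASTICSEARCH_URL="http://127.0.0.1:9200"
TIMEOUT=300   # give up after 5 minutes
WAITED=0

# Block Kibana startup until Elasticsearch answers with HTTP 200.
until [ "$(curl -s -o /dev/null -w '%{http_code}' "$ELASTICSEARCH_URL")" = "200" ]; do
  if [ "$WAITED" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for Elasticsearch at $ELASTICSEARCH_URL" >&2
    exit 1
  fi
  sleep 5
  WAITED=$((WAITED + 5))
done

# ...continue with the normal Kibana start once Elasticsearch responds.
```

With minimum_master_nodes: 2 and only one master up, Elasticsearch keeps answering 503, so a loop like this never succeeds and eventually hits the timeout.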

With that in mind, I consider the scenario you're describing to be expected, although inconvenient: it is a sort of chicken-and-egg problem. In your scenario, by removing the wait-for-elasticsearch-200-response check, Kibana was able to start, but after 60 seconds it timed out and exited with an error. That 60-second window was long enough for monit to consider it running, so the deployment moved on to provisioning your next VMs, which eventually brought the cluster into good shape, where Kibana could stay up.

From a deployment perspective, your options are:

  1. Don't configure elasticsearch to need 2 master nodes until 2 master nodes are actually available.
  2. Don't deploy kibana until the datastore it needs is available.

The only change I can think of is to switch from a 200 status check to a simple port-openness check (nc -z). This would ensure we can at least talk to the expected, configured port before moving on, and would therefore be a bit more forgiving in the scenario you describe. The downside is that if the operator really did misconfigure the upstream settings, the problem may be harder to diagnose.
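A minimal sketch of what that alternative could look like, assuming the same illustrative host, port, and timeout values as above:

```bash
#!/bin/bash
# Hypothetical port-openness check that could replace the HTTP 200 check;
# host, port, and timeout values are assumptions for illustration.
ELASTICSEARCH_HOST="127.0.0.1"
ELASTICSEARCH_PORT=9200
TIMEOUT=300
WAITED=0

# Only verify that something is listening on the configured port; do not
# require the cluster to be healthy (it may still return 503 at this point).
until nc -z "$ELASTICSEARCH_HOST" "$ELASTICSEARCH_PORT"; do
  if [ "$WAITED" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for ${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}" >&2
    exit 1
  fi
  sleep 5
  WAITED=$((WAITED + 5))
done
```

Since Elasticsearch binds its HTTP port even while it is waiting for more masters (which is why it can return 503 at all), this check would pass in the single-master scenario above, at the cost of no longer catching an unhealthy or misconfigured cluster.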

@mrdavidlaing, you're more familiar with kibana4 - what do you think of switching to a naive port check for the purposes of this scenario, and is that something we can take a PR for?

dpb587 commented Aug 26, 2015

@mrdavidlaing, thoughts?

@mrdavidlaing

@dpb587, @petar-petrov-sap - The way Kibana starts up and checks for Elasticsearch availability will be overhauled as part of the Kibana 4.2 release. I'd advise against making any changes until we know how that behaviour will change.

Until then, we should change our example templates (bosh-lite and spiff) to have a separate Kibana job (rather than co-locating it) at the end of the manifest, so BOSH only deploys it AFTER the ES nodes have been set up - something like the sketch below.
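To illustrate the intent (a hypothetical manifest excerpt; the job and template names are placeholders, not the release's actual example templates):

```yaml
# Hypothetical manifest excerpt; job and template names are placeholders.
jobs:
- name: elasticsearch_master_z1
  instances: 2
  templates:
  - name: api
  - name: elasticsearch
  # kibana is no longer co-located here

# Listed last so BOSH deploys it only after the Elasticsearch masters are up.
- name: kibana_z1
  instances: 1
  templates:
  - name: kibana
```

Because BOSH (by default) updates jobs in the order they appear in the manifest, Kibana's health check would then run against a cluster that already satisfies minimum_master_nodes.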

@petar-petrov-sap - Any chance you could submit a PR with that alternate template structure?

@petar-petrov-sap

Currently we benefit from the fact that Kibana and Elasticsearch are on the same VM (see #166). Splitting them does not fit our setup.

@mrdavidlaing

@petar-petrov-sap - you can keep the nginx proxy to Kibana on the same api box while still hosting Kibana on a different VM to get around the startup issue.

I'm afraid we can't think of a better short-term solution to this issue.
