elasticsearch_master fails when minimum_master_nodes is set to >1 #167

petar-petrov-sap opened this issue Aug 12, 2015 · 5 comments

@petar-petrov-sap

Hi,

Since Kibana 4.1.1 was added as a separate job in elasticsearch_master, it can no longer be used in a cluster environment, because the following happens:

  1. When the deployment is installed for the first time, elasticsearch_master_z1/0 is created. The VM contains three jobs: api, elasticsearch and kibana.
  2. The elasticsearch and api jobs get started, and monit shows them as running. Kibana is waiting to start.
  3. The Elasticsearch process runs, but it is configured with minimum_master_nodes: 2, therefore http://127.0.0.1:9200 returns 503 (waiting for another master to become available).
  4. The Kibana ctl script waits for Elasticsearch to return 200 (which it never does) and fails after 5 minutes (the timeout is set in the ctl script).

By deleting the wait-for-elasticsearch-200-response check in the Kibana ctl script, the cluster deploys and starts as expected.

Kind regards,
Petar Petrov

dpb587 commented Aug 13, 2015

Some background: Kibana 4 will not start successfully if Elasticsearch is unavailable, because Kibana needs to load and use configuration from Elasticsearch. This is why Kibana's startup is blocked by the Elasticsearch health check in the control script. If you know this Kibana behavior has changed, do let us know.
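For context, the check being discussed has roughly the following shape. This is a minimal sketch, not the release's actual ctl script; the URL, timeout, and variable names are assumptions for illustration:

```bash
#!/bin/bash
# Hypothetical sketch of the wait-for-elasticsearch-200 check in the Kibana
# ctl script; the real script's URL, timeout, and variable names may differ.
ELASTICSEARCH_URL="http://127.0.0.1:9200"
TIMEOUT=300   # give up after 5 minutes
WAITED=0

# Block Kibana startup until Elasticsearch answers with HTTP 200.
until [ "$(curl -s -o /dev/null -w '%{http_code}' "$ELASTICSEARCH_URL")" = "200" ]; do
  if [ "$WAITED" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for Elasticsearch at $ELASTICSEARCH_URL" >&2
    exit 1
  fi
  sleep 5
  WAITED=$((WAITED + 5))
done

# ...continue with the normal Kibana start once Elasticsearch responds.
```

With minimum_master_nodes: 2 and only one master up, Elasticsearch keeps answering 503, so a loop like this never succeeds and eventually hits the timeout.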

With that in mind, I consider the scenario you're describing to be expected, although inconvenient: it is a sort of chicken-and-egg problem. In your scenario, by removing the wait-for-elasticsearch-200-response check, Kibana was able to start, but after 60 seconds it timed out and exited with an error. That 60-second window was long enough for monit to consider it running, so the deployment moved on to provisioning your next VMs, which eventually brought the cluster into good shape, where Kibana could stay up.

From a deployment perspective, your options are:

  1. Don't configure elasticsearch to need 2 master nodes until 2 master nodes are actually available.
  2. Don't deploy kibana until the datastore it needs is available.

The only change I can think of is to switch from a 200 status check to a simple port-openness check (nc -z). This would ensure we can at least talk to the expected, configured port before moving on, and would therefore be a bit more forgiving in the scenario you describe. The downside is that if the operator really did misconfigure the upstream settings, the problem may be harder to diagnose.
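A minimal sketch of what that alternative could look like, assuming the same illustrative host, port, and timeout values as above:

```bash
#!/bin/bash
# Hypothetical port-openness check that could replace the HTTP 200 check;
# host, port, and timeout values are assumptions for illustration.
ELASTICSEARCH_HOST="127.0.0.1"
ELASTICSEARCH_PORT=9200
TIMEOUT=300
WAITED=0

# Only verify that something is listening on the configured port; do not
# require the cluster to be healthy (it may still return 503 at this point).
until nc -z "$ELASTICSEARCH_HOST" "$ELASTICSEARCH_PORT"; do
  if [ "$WAITED" -ge "$TIMEOUT" ]; then
    echo "Timed out waiting for ${ELASTICSEARCH_HOST}:${ELASTICSEARCH_PORT}" >&2
    exit 1
  fi
  sleep 5
  WAITED=$((WAITED + 5))
done
```

Since Elasticsearch binds its HTTP port even while it is waiting for more masters (which is why it can return 503 at all), this check would pass in the single-master scenario above, at the cost of no longer catching an unhealthy or misconfigured cluster.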

@mrdavidlaing, you're more familiar with kibana4 - what do you think of switching to a naive port check for the purposes of this scenario, and is that something we can take a PR for?

dpb587 commented Aug 26, 2015

@mrdavidlaing, thoughts?

@mrdavidlaing

@dpb587, @petar-petrov-sap - The way Kibana starts up and checks for Elasticsearch availability will be overhauled as part of the Kibana 4.2 release. I'd advise against making any changes until we know how that behaviour will change.

Until then, we should change our example templates (bosh-lite and spiff) to have a separate Kibana job (rather than co-locating it) at the end of the manifest, so BOSH only deploys it AFTER the ES nodes have been set up - something like the sketch below.
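To illustrate the intent (a hypothetical manifest excerpt; the job and template names are placeholders, not the release's actual example templates):

```yaml
# Hypothetical manifest excerpt; job and template names are placeholders.
jobs:
- name: elasticsearch_master_z1
  instances: 2
  templates:
  - name: api
  - name: elasticsearch
  # kibana is no longer co-located here

# Listed last so BOSH deploys it only after the Elasticsearch masters are up.
- name: kibana_z1
  instances: 1
  templates:
  - name: kibana
```

Because BOSH (by default) updates jobs in the order they appear in the manifest, Kibana's health check would then run against a cluster that already satisfies minimum_master_nodes.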

@petar-petrov-sap - Any chance you could submit a PR with that alternate template structure?

@petar-petrov-sap

Currently we benefit from the fact that Kibana and Elasticsearch are on the same VM (see #166). Splitting them does not fit our setup.

@mrdavidlaing

@petar-petrov-sap - you can keep the nginx proxy to Kibana on the same api box while still hosting Kibana on a different VM to get around the startup issue.

I'm afraid we can't think of a better short-term solution to this issue.
