support health check information? #24

ogrodnek · 2016-10-06T17:18:34Z

First, thanks for the great mesos and marathon tooling -- it's all been extremely solid and reliable!

I am using marathon-consul along with haproxy-consul for load balancing.

As soon as a task is running, it's added to consul and thus to the haproxy config. If an application has a non-trivial startup time, I start to see 503s. That is, tasks are added to the LB before they report healthy.

I've been trying to figure out the best way to solve this. The app definition in consul does have health check info, so theoretically I could translate the marathon checks to haproxy checks in my haproxy-consul template. Practically, though, the marathon checks could be more complex than haproxy can express (e.g. some of the apps are using "COMMAND" marathon health checks). Doing the translation in the template also seems non-trivial, and there's a chance haproxy and marathon won't agree on a task's health.

My current thinking is that it makes sense to leave health check evaluation to marathon, and include the result in the KV store (perhaps /<app>/<task>/healthCheck ?)

Then in the template I can make sure that a task whose app defines a health check is only rendered if the task's health check is passing (i.e. not unknown or failing).
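The rendering rule above can be sketched in Go roughly as follows. This is only an illustration of the intended filter, not code from marathon-consul or the haproxy-consul template: the `Task` struct is a hypothetical, trimmed-down view of what would be stored in the KV, and `routable` is a made-up helper name. The `alive` flag is assumed to match what marathon reports in a task's `healthCheckResults`.

```go
package main

import "fmt"

// HealthCheckResult keeps only the "alive" flag that marathon reports
// for each health check result of a task.
type HealthCheckResult struct {
	Alive bool `json:"alive"`
}

// Task is a hypothetical, trimmed-down view of a task stored in the KV.
type Task struct {
	ID                 string              `json:"id"`
	HealthCheckResults []HealthCheckResult `json:"healthCheckResults"`
}

// routable reports whether a task should be rendered into the LB config.
// Apps without health checks are always rendered. Apps with health checks
// are rendered only when every reported result is alive; a task with no
// results yet is treated as unknown and left out.
func routable(appHasHealthChecks bool, t Task) bool {
	if !appHasHealthChecks {
		return true
	}
	if len(t.HealthCheckResults) == 0 {
		return false // unknown: no result reported yet
	}
	for _, r := range t.HealthCheckResults {
		if !r.Alive {
			return false // failing
		}
	}
	return true
}

func main() {
	starting := Task{ID: "my-app.1"}
	healthy := Task{ID: "my-app.2", HealthCheckResults: []HealthCheckResult{{Alive: true}}}
	for _, t := range []Task{starting, healthy} {
		fmt.Println(t.ID, routable(true, t))
	}
}
```

Treating "no results yet" as not routable is what keeps slow-starting tasks out of the LB, at the cost of also hiding any task whose results never get populated.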

I believe this can be done by adding support for the following event types from the event bus:

add_health_check_event
remove_health_check_event
failed_health_check_event
health_status_changed_event

and including this new information in the KV.
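As a rough sketch of the first step, an event handler could peek at the `eventType` field of each event-bus payload and pick out the four health-related types listed above. The function and variable names here (`classify`, `healthEventTypes`) are hypothetical and not taken from marathon-consul; the sample payload only includes fields assumed to be present on a `health_status_changed_event`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventEnvelope decodes only the field needed to route an event.
type eventEnvelope struct {
	EventType string `json:"eventType"`
}

// healthEventTypes is the set of event types proposed in this issue.
var healthEventTypes = map[string]bool{
	"add_health_check_event":      true,
	"remove_health_check_event":   true,
	"failed_health_check_event":   true,
	"health_status_changed_event": true,
}

// classify returns the event type and whether it is health-related.
func classify(raw []byte) (string, bool, error) {
	var env eventEnvelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return "", false, err
	}
	return env.EventType, healthEventTypes[env.EventType], nil
}

func main() {
	raw := []byte(`{"eventType":"health_status_changed_event","appId":"/my-app","taskId":"my-app.123","alive":true}`)
	name, isHealth, err := classify(raw)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(name, isHealth)
}
```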

I am happy to submit a PR for this, and would love to get any feedback on this approach, if there's a better way, etc.

thanks!

ogrodnek · 2016-10-17T22:37:08Z

I've simplified this a bit by including a HealthCheckResults field directly in the task (already reported by marathon for GET /v2/apps/<app>/tasks), and updating it in the KV when the health_status_changed_event is fired.
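To make the idea concrete, here is a minimal sketch of that update, under several assumptions that are not from the PR: the KV is stood in for by an in-memory map, the `tasks/<taskId>` key layout is made up, and the stored result is reduced to a single `alive` flag taken from the event. The actual change may well re-read the full results from marathon instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// healthStatusChanged holds the fields of a health_status_changed_event
// that this sketch relies on.
type healthStatusChanged struct {
	EventType string `json:"eventType"`
	AppID     string `json:"appId"`
	TaskID    string `json:"taskId"`
	Alive     bool   `json:"alive"`
}

// taskKey is a hypothetical KV layout; the real key scheme may differ.
func taskKey(taskID string) string {
	return "tasks/" + taskID
}

// applyHealthEvent rewrites the stored task JSON so that its
// healthCheckResults field reflects the event's alive flag.
// kv is an in-memory stand-in for the consul KV store.
func applyHealthEvent(kv map[string][]byte, raw []byte) error {
	var ev healthStatusChanged
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}
	if ev.EventType != "health_status_changed_event" {
		return nil // some other event; leave the KV untouched
	}
	key := taskKey(ev.TaskID)
	stored, ok := kv[key]
	if !ok {
		return fmt.Errorf("task %q not found in KV", ev.TaskID)
	}
	var task map[string]interface{}
	if err := json.Unmarshal(stored, &task); err != nil {
		return err
	}
	task["healthCheckResults"] = []map[string]bool{{"alive": ev.Alive}}
	updated, err := json.Marshal(task)
	if err != nil {
		return err
	}
	kv[key] = updated
	return nil
}

func main() {
	kv := map[string][]byte{
		taskKey("my-app.123"): []byte(`{"id":"my-app.123","healthCheckResults":null}`),
	}
	ev := []byte(`{"eventType":"health_status_changed_event","appId":"/my-app","taskId":"my-app.123","alive":true}`)
	if err := applyHealthEvent(kv, ev); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(kv[taskKey("my-app.123")]))
}
```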

PR is here: #25

would love any feedback,
thanks!

bwhaley · 2016-11-15T05:35:03Z

We are using this in production now and it's working great. Really helps out with the lag time between new tasks being added and actually being alive and healthy. 👍

yogeshnath · 2017-06-16T19:43:53Z

I recently updated to this version to get rid of 503s. It works great if the health check is HTTP or TCP.

However, if the health check is based on a command (say curl), Marathon shows the task as healthy but the consul JSON has "healthCheckResults" set to null. So now haproxy fails to pick it up, based on the healthy value ("healthy":false).

any reason why?

ogrodnek mentioned this issue Oct 17, 2016

update task health status based on health_status_changed_event #25

Open
