log response time to Librato #1569

chadwhitacre · 2013-10-09T14:16:26Z

Reticketed from #1567.

chadwhitacre · 2013-10-09T14:21:46Z

Is it silly to suggest doing this instead of using New Relic (#1508)?

chadwhitacre · 2013-10-09T15:33:57Z

Actually, this might work out great. Librato has a Heroku integration that works with Heroku logging:

https://devcenter.heroku.com/articles/librato

All we have to do is print specially-formed messages to stdout, and apparently Librato will pick those up and process them into metrics for us. That, I think, is cool. That's waaaaaay simpler than integrating New Relic.

I'm testing this out on gittip.whit537.org.

chadwhitacre · 2013-10-09T16:13:02Z

I'm talking to Librato in their Campfire. @librato-peter is walking me through configuring the non-Heroku Librato account I set up yesterday (#1567) for use w/ Heroku.

1. Create a "Record Only" API key TTW on Librato.
2. Request it using http://dev.librato.com/v1/get/api_tokens/:id, with ?logs_uri=1.
3. heroku drains:add DRAIN_URL

chadwhitacre · 2013-10-09T18:37:17Z

Use the same log drain for multiple Heroku apps (qa, prod), and append ?source-prefix=qa to the drain URL to vary the "source" in Librato:

http://support.metrics.librato.com/knowledgebase/articles/47904-what-is-a-source-
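
For reference, here is a minimal Python sketch of deriving one drain URL per app from a shared base by setting the `source-prefix` query parameter. The base URL and the helper name `with_source_prefix` are made up for illustration; the real drain URL is whatever Librato hands back for the API token.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_source_prefix(drain_url, prefix):
    """Return drain_url with source-prefix=<prefix> set in its query string."""
    parts = urlsplit(drain_url)
    # Keep existing parameters, but drop any earlier source-prefix value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "source-prefix"]
    query.append(("source-prefix", prefix))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical base URL, used only to show the shape of the result.
BASE = "https://drain.example.com/logs?token=abc"
for env in ("qa", "production"):
    print(with_source_prefix(BASE, env))
```

Each resulting URL would then be passed to `heroku drains:add` for the matching app.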

chadwhitacre · 2013-10-09T18:49:53Z

You can vary the source via stdout as well:

so if you have set your source-prefix at the drain-level to "foo"
and then you put source=bar in a log line
it should show up as "foo.bar" in your librato metrics

E.g.:

$stdout.puts("source=us-east measure#web.latency=4ms")
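
The example above is Ruby. A rough Python equivalent might look like the sketch below; the helper names `librato_line` and `emit` are hypothetical, and only the `source=` and `measure#name=value` fields shown above are assumed.

```python
import sys


def librato_line(metric, value, units="", source=None):
    """Format one log line in the measure#name=value style shown above."""
    fields = []
    if source is not None:
        fields.append("source={}".format(source))
    fields.append("measure#{}={}{}".format(metric, value, units))
    return " ".join(fields)


def emit(metric, value, units="", source=None, stream=None):
    """Write the formatted line to stdout (or the given stream)."""
    out = stream if stream is not None else sys.stdout
    out.write(librato_line(metric, value, units, source) + "\n")
    out.flush()  # flush so the line is not held back in a buffer


emit("web.latency", 4, "ms", source="us-east")
# prints: source=us-east measure#web.latency=4ms
```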

chadwhitacre · 2013-10-09T19:03:25Z

Okay! I've set up logging of requests per minute and response time in qa, with a public dashboard here:

https://metrics.librato.com/share/dashboards/42hmoj9j?duration=3600&source=qa

The same dashboard will show us prod when we start logging in prod. PR coming ...
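
In case it helps reviewers picture it, here is a sketch of the general idea as a plain WSGI wrapper. This is not necessarily how the PR does it: the class name `ResponseTimeLogger` and the metric name `web.response_time` are assumptions made for this example.

```python
import sys
import time


class ResponseTimeLogger:
    """WSGI wrapper that prints one measure# line per request."""

    def __init__(self, app, metric="web.response_time", stream=None):
        self.app = app
        self.metric = metric      # assumed metric name
        self.stream = stream      # defaults to stdout at call time

    def __call__(self, environ, start_response):
        start = time.monotonic()
        try:
            return self.app(environ, start_response)
        finally:
            # Covers the time until the app returns its response iterable;
            # time spent streaming the body afterwards is not included.
            elapsed_ms = (time.monotonic() - start) * 1000.0
            out = self.stream if self.stream is not None else sys.stdout
            out.write("measure#{}={:.0f}ms\n".format(self.metric, elapsed_ms))
            out.flush()
```

Wrapping is a one-liner, e.g. `application = ResponseTimeLogger(application)`, where `application` stands for whatever WSGI callable the site exposes.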

zbynekwinkler · 2013-10-10T08:44:15Z

Looks great. But response time of 7s?? Mean time almost 1s :(. I take it that is not typical for production.

chadwhitacre · 2013-10-10T18:25:28Z

Median in production looks like it's in the 10-20ms range. Our p95 and p99 are pretty bad though.
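
To illustrate how a fast median and bad tail percentiles can coexist, here is a small sketch using the nearest-rank method. The sample timings are made up and are not our production data; Librato may compute percentiles differently.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


# Invented response times in ms: mostly fast, with a few very slow requests.
times_ms = [12] * 90 + [15] * 4 + [900] * 4 + [7000] * 2

for p in (50, 95, 99):
    print("p{} = {}ms".format(p, percentile(times_ms, p)))
# prints:
# p50 = 12ms
# p95 = 900ms
# p99 = 7000ms
```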

Anyway, this is live now! 💃

https://metrics.librato.com/share/dashboards/42hmoj9j?duration=3600&source=production

zbynekwinkler · 2013-10-12T15:31:13Z

Why aren't we using https://metrics.librato.com/metrics/router.service instead? It requires no work from us: it is supplied by default by Heroku and understood by Librato. For details see https://devcenter.heroku.com/articles/http-routing#heroku-router-log-format and also https://devcenter.heroku.com/articles/librato, where they say:

> As these details are sourced directly from the Heroku routing layer itself, it’s the only true measure of performance as experienced by your customers, accounting for any delays introduced by Heroku in addition to your application’s processing.

I am in for anything we do not have to maintain :).
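
For anyone curious what the router already gives us, here is a sketch that pulls the `service=` time out of a router log line. The sample line is illustrative only, written in the shape described in the router log format article; the function name `service_ms` is made up.

```python
import re

# Matches the service=<n>ms field of a Heroku router log line.
SERVICE_RE = re.compile(r"\bservice=(\d+)ms\b")


def service_ms(router_line):
    """Return the service time in ms, or None if the field is absent."""
    match = SERVICE_RE.search(router_line)
    return int(match.group(1)) if match else None


# Illustrative line, not copied from our logs.
sample = ('at=info method=GET path="/" host=example.herokuapp.com '
          'dyno=web.1 connect=1ms service=18ms status=200 bytes=13')
print(service_ms(sample))  # prints: 18
```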

zbynekwinkler · 2013-10-12T18:14:54Z

And for the p95 and p99 - we already know some of the pages causing this: #1417, #1491 or #1585. When the time comes we can add logging to sentry for all requests above some threshold to identify the offending pages. In the future I'd like to make the response times a bit more stable.
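
A possible shape for the threshold logging, as a sketch only: it uses the standard `logging` module, and forwarding those records to Sentry (for example through a logging handler) is assumed rather than shown. The threshold value and the name `report_if_slow` are placeholders.

```python
import logging

log = logging.getLogger("slow_requests")

SLOW_THRESHOLD_MS = 1000  # placeholder; the real cutoff is still to be decided


def report_if_slow(path, elapsed_ms, threshold_ms=SLOW_THRESHOLD_MS):
    """Log a warning for requests at or above the threshold.

    Returns True if the request was reported, False otherwise.
    """
    if elapsed_ms < threshold_ms:
        return False
    log.warning("slow request: path=%s elapsed=%.0fms threshold=%dms",
                path, elapsed_ms, threshold_ms)
    return True
```

Calling this with the request path and the measured time after each response would leave a record only for the offending pages.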

chadwhitacre · 2013-10-14T20:43:43Z

> And for the p95 and p99 - we already know some of the pages causing this: #1417, #1491 or #1585. When the time comes we can add logging to sentry for all requests above some threshold to identify the offending pages.

Good call.

chadwhitacre · 2013-10-14T20:47:27Z

> Why aren't we using https://metrics.librato.com/metrics/router.service instead?

Good find. I'd say let's add that as an additional instrument on the Application Health dashboard. Since we already have the internal response time metric it's work to remove it, and it'll be interesting to see how much time is eaten up by Heroku, no?

If you'd prefer to back out the internal response time tracking lemme know.

zbynekwinkler · 2013-10-14T20:56:22Z

I've checked, the times are almost the same.

The internal time tracking can be used in the future for identifying the slow pages. Let's deal with it then.

chadwhitacre · 2013-10-14T20:58:24Z

Okay.

This was referenced Oct 9, 2013

Emit log messages for Librato to pick up #1570

Merged

log db time to Librato #1571

Closed

Add New Relic heroku addon #1508

Closed

chadwhitacre mentioned this issue Oct 10, 2013

Backup DB without slowing down the site #1327

Closed

chadwhitacre closed this as completed Oct 10, 2013
