This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

log response time to Librato #1569

Closed
chadwhitacre opened this issue Oct 9, 2013 · 14 comments

Comments

@chadwhitacre
Contributor

Reticketed from #1567.

@chadwhitacre
Contributor Author

Is it silly to suggest doing this instead of using New Relic (#1508)?

@chadwhitacre
Contributor Author

Actually, this might work out great. Librato has a Heroku integration that works with Heroku logging:

https://devcenter.heroku.com/articles/librato

All we have to do is print specially-formed messages to stdout, and apparently Librato will pick those up and process them into metrics for us. That, I think, is cool. That's waaaaaay simpler than integrating New Relic.

I'm testing this out on gittip.whit537.org.
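
For illustration, a minimal Ruby sketch of what "printing specially-formed messages to stdout" could look like. The `measure#name=value` shape follows the Heroku article linked above; the helper names `metric_token` and `timed` and the metric name `web.response_time` are made up for this example and are not taken from the Gittip codebase.

```ruby
# Build one metric token such as "measure#web.latency=4ms".
# kind is the token prefix (e.g. "measure"); unit is optional.
def metric_token(kind, name, value, unit = "")
  format("%s#%s=%s%s", kind, name, value, unit)
end

# Run the given block, print its wall-clock duration as a measure
# token, and return the block's own result unchanged.
def timed(name, out: $stdout)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  out.puts(metric_token("measure", name, (elapsed * 1000).round, "ms"))
  result
end

# Hypothetical usage: time a stand-in for request handling.
timed("web.response_time") { sleep 0.01 }
```

Writing to an injectable `out` keeps the helper easy to test; in a deployed app the default `$stdout` is what the log drain would see.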

@chadwhitacre
Contributor Author

I'm talking to Librato in their Campfire. @librato-peter is walking me through configuring the non-Heroku Librato account I set up yesterday (#1567) for use w/ Heroku.

  1. Create a "Record Only" API key through the web (TTW) on Librato
  2. Request it using http://dev.librato.com/v1/get/api_tokens/:id, with ?logs_uri=1.
  3. heroku drains:add DRAIN_URL

@chadwhitacre
Contributor Author

Use the same log drain for multiple Heroku apps (qa, prod), and append ?source-prefix=qa to the drain URL to vary the "source" in Librato:

http://support.metrics.librato.com/knowledgebase/articles/47904-what-is-a-source-
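
A small sketch of deriving per-app drain URLs from one base URL. Only the `source-prefix` query parameter comes from the comment above; the base URL `https://drain.example.com/logs` is a placeholder, not a real Librato drain URL, and `with_source_prefix` is a hypothetical helper.

```ruby
require "uri"

# Return drain_url with its source-prefix query parameter set to prefix,
# replacing any existing source-prefix and keeping other parameters.
def with_source_prefix(drain_url, prefix)
  uri = URI.parse(drain_url)
  params = URI.decode_www_form(uri.query || "")
  params.reject! { |key, _value| key == "source-prefix" }
  params << ["source-prefix", prefix]
  uri.query = URI.encode_www_form(params)
  uri.to_s
end

base = "https://drain.example.com/logs" # placeholder URL
%w[qa production].each do |app|
  puts with_source_prefix(base, app)
end
```

Each resulting URL would then be passed to `heroku drains:add` for the matching app.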

@chadwhitacre
Contributor Author

You can vary the source via stdout as well:

so if you have set your source-prefix at the drain-level to "foo"
and then you put source=bar in a log line
it should show up as "foo.bar" in your librato metrics

E.g.:

$stdout.puts("source=us-east measure#web.latency=4ms")
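
To make the "foo.bar" rule concrete, here is a sketch that models the behavior described above. It is only a model of what Librato was said to do, not Librato's code. The helper names are hypothetical, and what happens when either part is missing is my assumption.

```ruby
# Extract the source=... value from a log line, or nil if absent.
def source_in(line)
  line[/(?:\A|\s)source=(\S+)/, 1]
end

# Combine the drain-level prefix with the per-line source the way the
# comment above describes: prefix "foo" + source "bar" => "foo.bar".
# Assumption: a missing or empty part is simply left out.
def effective_source(drain_prefix, line_source)
  [drain_prefix, line_source].compact.reject(&:empty?).join(".")
end

line = "source=us-east measure#web.latency=4ms"
puts effective_source("foo", source_in(line))
```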

@chadwhitacre
Contributor Author

Okay! I've set up logging of requests per minute and response time in qa, with a public dashboard here:

https://metrics.librato.com/share/dashboards/42hmoj9j?duration=3600&source=qa

The same dashboard will show us prod when we start logging in prod. PR coming ...

[image: capture]

@zbynekwinkler
Contributor

Looks great. But response time of 7s?? Mean time almost 1s :(. I take it that is not typical for production.

@chadwhitacre
Contributor Author

Median in production looks like it's in the 10-20ms range. Our p95 and p99 are pretty bad though.

Anyway, this is live now! 💃

https://metrics.librato.com/share/dashboards/42hmoj9j?duration=3600&source=production
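
As a side note on how a 10-20ms median can coexist with bad p95 and p99 numbers, here is a nearest-rank percentile sketch. The sample latencies are invented for the example and are not measurements from Gittip; Librato may compute percentiles differently.

```ruby
# Nearest-rank percentile: the smallest value such that at least
# pct percent of the samples are less than or equal to it.
def percentile(values, pct)
  raise ArgumentError, "values must not be empty" if values.empty?
  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Invented data: most requests are fast, a few are very slow.
latencies_ms = [15] * 94 + [900] * 4 + [7000] * 2

[50, 95, 99].each do |pct|
  puts "p#{pct}=#{percentile(latencies_ms, pct)}ms"
end
```

With this made-up distribution the median stays at 15ms while the tail percentiles are dominated by the handful of slow requests.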

@zbynekwinkler
Contributor

Why aren't we using https://metrics.librato.com/metrics/router.service instead? It takes no work on our side: it is supplied by default by Heroku and understood by Librato. For details see https://devcenter.heroku.com/articles/http-routing#heroku-router-log-format and also https://devcenter.heroku.com/articles/librato, where they say:

As these details are sourced directly from the Heroku routing layer itself, it’s the only true measure of performance as experienced by your customers, accounting for any delays introduced by Heroku in addition to your application’s processing.

I am all for anything we do not have to maintain :).
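
For reference, a rough sketch of pulling the `service` time out of a router log line in the key=value format documented at the http-routing link above. The sample line is illustrative (example host and IP), and `router_fields` and `service_ms` are hypothetical helpers; Librato does this parsing itself, so nothing like this would be needed in the app.

```ruby
# Split a key=value router log line into a Hash, stripping the
# double quotes that surround values such as path and fwd.
def router_fields(line)
  line.scan(/(\w+)=("[^"]*"|\S+)/).to_h do |key, value|
    [key, value.delete_prefix('"').delete_suffix('"')]
  end
end

# Service time in milliseconds, or nil when the field is missing.
def service_ms(line)
  value = router_fields(line)["service"]
  value && value.to_i
end

sample = 'at=info method=GET path="/" host=example.herokuapp.com ' \
         'fwd="203.0.113.7" dyno=web.1 connect=1ms service=18ms ' \
         'status=200 bytes=13'
puts service_ms(sample)
```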

@zbynekwinkler
Contributor

And for the p95 and p99 - we already know some of the pages causing this: #1417, #1491 or #1585. When the time comes we can add logging to sentry for all requests above some threshold to identify the offending pages. In the future I'd like to make the response times a bit more stable.
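
A sketch of the threshold idea, under stated assumptions: the 1000ms threshold is an arbitrary placeholder, `report_if_slow` is a hypothetical helper, and the `reporter` callable stands in for whatever Sentry client call would actually be used (none is shown here, to avoid guessing at its API).

```ruby
SLOW_THRESHOLD_MS = 1000 # placeholder value, not a project decision

# Call reporter with a short message when a request took longer than
# the threshold. Returns true if the request was reported.
def report_if_slow(path, elapsed_ms,
                   threshold_ms: SLOW_THRESHOLD_MS,
                   reporter: ->(message) { warn(message) })
  return false if elapsed_ms <= threshold_ms
  reporter.call(
    "slow request: path=#{path} elapsed=#{elapsed_ms}ms " \
    "threshold=#{threshold_ms}ms"
  )
  true
end

# Hypothetical usage with made-up paths and timings.
report_if_slow("/about/", 18)      # below threshold, nothing reported
report_if_slow("/some/page", 7000) # reported via the default reporter
```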

@chadwhitacre
Contributor Author

And for the p95 and p99 - we already know some of the pages causing this: #1417, #1491 or #1585. When the time comes we can add logging to sentry for all requests above some threshold to identify the offending pages.

Good call.

@chadwhitacre
Contributor Author

Why aren't we using https://metrics.librato.com/metrics/router.service instead?

Good find. I'd say let's add that as an additional instrument on the Application Health dashboard. Since we already have the internal response time metric, it would be work to remove it, and it'll be interesting to see how much time is eaten up by Heroku, no?

If you'd prefer to back out the internal response time tracking, lemme know.

@zbynekwinkler
Contributor

I've checked; the times are almost the same.

The internal time tracking can be used in the future for identifying the slow pages. Let's deal with it then.

@chadwhitacre
Contributor Author

Okay.
