
[patch / rfc] 0.9.x: Fetch all request targets in a single http query to the backends #1045

Closed
jraby opened this issue Dec 2, 2014 · 0 comments


jraby commented Dec 2, 2014

Last week I tested the 0.9.x branch with the new bulk fetching feature, with high hopes of a performance improvement. Our use case includes multiple backends 'far away' from the frontend (80ms ping latency), so reducing the number of HTTP calls between the frontend and the backends would greatly help us.
The speedup was definitely there, but somehow not as impressive as others have reported. Probably because we try to use aggregated metrics instead of wildcards in our most frequent queries, so the number of HTTP requests was already pretty low.

So I set out to find a way to improve the performance, by fetching all targets in a single HTTP request instead of doing one HTTP call per target (per remote backend).

I've come up with a patch which can be found here: datacratic#2

I've written a bit of background and context in that pull request, but here's an overview of how it works:

When a render request comes in:

  • Have a function extract all the pathExpressions (metric names/patterns) from the targets
  • Fetch all pathExpressions from all backends and store the results in a hash table keyed per backend and per metric name + time range for easy lookups:
    cache[remoteBackendName][originalTargetWithWildcards-startTime-endTime] = [ series_from_remote_backend ]
  • Store that hash table in the requestContext (which holds the targets list, startTime, endTime, etc.)
  • Continue processing as before (via the recursive parser, evaluateTarget)
  • In the fetchData method, do a hash table lookup for the requested data.
    If the data is there, or if there is no data but a prefetch call was made for that pathExpr, skip the remote fetch and use the data from the cache.
    If there is a cache miss, do a regular remote fetch. This case is pretty rare; it happens when functions need data outside the time range of the original query, for example timeshift, movingAverage, etc.
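To make the flow above concrete, here's a minimal Python sketch of the prefetch cache and the fetchData-style lookup. All names here (PrefetchCache, fetch_data, remote_fetch) are illustrative, not the actual Graphite-web internals from the patch:

```python
class PrefetchCache:
    """Results of bulk prefetches, keyed per backend and per
    (path expression, time range), as in the patch overview above."""

    def __init__(self):
        self._data = {}           # backend -> {key -> series list}
        self._prefetched = set()  # (backend, key) pairs we asked for

    @staticmethod
    def _key(path_expr, start, end):
        # Mirrors the "originalTargetWithWildcards-startTime-endTime" key
        return "%s-%d-%d" % (path_expr, start, end)

    def store(self, backend, path_expr, start, end, series_list):
        key = self._key(path_expr, start, end)
        self._data.setdefault(backend, {})[key] = series_list
        self._prefetched.add((backend, key))

    def lookup(self, backend, path_expr, start, end):
        """Return (hit, series_list). A hit with an empty list means a
        prefetch was made but the backend had nothing matching, so the
        caller should still skip the remote fetch."""
        key = self._key(path_expr, start, end)
        if (backend, key) in self._prefetched:
            return True, self._data.get(backend, {}).get(key, [])
        return False, None


def fetch_data(cache, backend, path_expr, start, end, remote_fetch):
    """fetchData-style lookup: serve from the cache when a prefetch
    covered this pathExpr + time range, fall back to a regular remote
    fetch on a miss (e.g. timeshift or movingAverage asking for data
    outside the original query's time range)."""
    hit, series = cache.lookup(backend, path_expr, start, end)
    if hit:
        return series
    return remote_fetch(backend, path_expr, start, end)
```

The key detail is tracking which prefetches were made separately from what data came back, so an empty prefetch result is distinguishable from a true cache miss.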

I've done some testing and it seems to be faster in all our use cases (low latency backends and high latency backends). You can see the results of those rudimentary tests at the bottom of the pull request.

Unfortunately, the patch builds on another patch that we use to fetch data from backends in parallel, so I couldn't open a proper pull request for review.
Nonetheless, I thought you might be interested in seeing this since the speedup is quite significant, and I'd also appreciate your input on the hackier parts of the patch.

Thanks.
