
Increase the size of the history that the query cache tracks. #20116


Conversation


jpountz commented Aug 23, 2016

The current value is 256, which is conservative. There have been a couple of
cases indicating that it might be too low to allow caching at all when queries
are composed of tens of filters, so I would like to experiment with a slightly
higher value: 1024.

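To make the problem case concrete, here is a hedged sketch using Lucene's BooleanQuery API (the class name, field, and values are made up for illustration). Each FILTER clause is an individual candidate for the query cache, so a single request like this can push dozens of entries through a 256-slot history on its own:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical illustration of the problem case: a single search built
// from many filter clauses. Each FILTER clause is tracked separately by
// the query-cache history, so one request can flush much of the window.
public class ManyFiltersExample {
    public static Query manyFilters(String field, int numFilters) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (int i = 0; i < numFilters; i++) {
            builder.add(new TermQuery(new Term(field, "value" + i)), Occur.FILTER);
        }
        return builder.build();
    }
}
```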

jimczi commented Aug 23, 2016

LGTM. It's cheap to do it (?) and it should catch more use cases.


jpountz commented Aug 23, 2016

It's cheap to do it

I think so: the data structure is a ring buffer kept in sync with a hash table that stores the counts of the individual elements, so updates run in constant time.
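For illustration, a minimal sketch of such a structure (the class name is made up; Lucene's real implementation is more compact but follows the same idea):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the structure described above: a fixed-size ring
// buffer of recently seen query hashes, kept in sync with a hash table
// of per-element counts, so both recording a query and looking up its
// frequency are O(1).
public class FrequencyRingBuffer {
    private final int[] buffer;
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int position = 0;
    private int size = 0;

    public FrequencyRingBuffer(int capacity) {
        this.buffer = new int[capacity];
    }

    /** Record one occurrence, evicting the oldest entry once full. */
    public void add(int queryHash) {
        if (size == buffer.length) {
            int evicted = buffer[position]; // position wraps to the oldest slot
            counts.compute(evicted, (k, v) -> v == 1 ? null : v - 1);
        } else {
            size++;
        }
        buffer[position] = queryHash;
        counts.merge(queryHash, 1, Integer::sum);
        position = (position + 1) % buffer.length;
    }

    /** How many times this hash occurs in the current window. */
    public int frequency(int queryHash) {
        return counts.getOrDefault(queryHash, 0);
    }
}
```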


rmuir commented Aug 23, 2016

How can we measure this? I am concerned that the ones asking for a larger buffer could simply be the 'vocal minority' with esoteric use cases. You can never please everyone, and there is a risk in diverging from Lucene's defaults.


jpountz commented Aug 23, 2016

It is definitely hard to know whether the vocal ones are a minority or a majority. However, I still think 1024 would be a reasonable value.

Regarding diverging from Lucene's defaults: Lucene cannot make assumptions about things like heap size, so its defaults of 256 for the history size and 32MB for the cache size make sense. Elasticsearch, on the other hand, is a server-side application with a default heap size of 2GB, so it can afford to use memory a bit more aggressively.


rmuir commented Aug 23, 2016

I almost buy the argument about the server-side app, but I have the usual concerns given Elasticsearch's history of over-caching (to the point of doing more harm than good).

On the other hand, it has been far more common to see slowdowns when Elasticsearch diverges from Lucene. If we change this number here unscientifically to 1024, who is to stop the next PR from arguing it should be 4096, and so on?

I guess I'm not buying that it's necessarily a performance win. If a query is really common enough to justify caching it, seeing it N times within the last M queries is an OK heuristic. But I only see this adjusting M and not N. By definition it's going to cache rarer queries, and I have concerns about that.
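To put rough numbers on the N-within-M trade-off, here is a toy simulation (not Elasticsearch code; the class name, the spacing of 150, and the threshold of 5 are illustrative assumptions). A query recurring once every 150 requests never accumulates 5 occurrences inside a 256-entry window, but does inside a 1024-entry one:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class HistorySizeDemo {
    // Returns true if the recurring "hot" query ever reaches minFrequency
    // occurrences within a sliding window of the last historySize queries.
    static boolean everCacheable(int historySize, int minFrequency) {
        ArrayDeque<Integer> window = new ArrayDeque<>();
        Map<Integer, Integer> counts = new HashMap<>();
        final int HOT = -1; // id of the recurring query
        for (int i = 0; i < 10_000; i++) {
            int query = (i % 150 == 0) ? HOT : i; // all other queries are unique
            window.addLast(query);
            counts.merge(query, 1, Integer::sum);
            if (window.size() > historySize) {
                int evicted = window.removeFirst();
                counts.merge(evicted, -1, Integer::sum);
            }
            if (counts.getOrDefault(HOT, 0) >= minFrequency) {
                return true; // the policy would cache the hot query now
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("history=256:  " + everCacheable(256, 5));  // false
        System.out.println("history=1024: " + everCacheable(1024, 5)); // true
    }
}
```

This is exactly the tension above: a larger M lets genuinely recurring queries get cached, but by the same mechanism it admits rarer ones.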

@aleph-zero
Contributor

@jpountz Would it be possible to make this a configurable setting, with reasonable bounds checking to ensure it's not set to something absurd, as @rmuir mentions?


TwP commented Sep 26, 2016

I would like to amplify @aleph-zero's request to make this a configurable setting. A common use case of ours generates terms queries with more than a thousand terms. A limit of 256 entries in the ring buffer ensures that none of those terms will ever be cached. As users paginate through search results, that same ~1,000-term query is sent again for every page. We noticed a 5x increase in 99th-percentile query times when switching from ES 1.7.5 to ES 2.3.5.

I am also going to crosslink a few other issues:

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Nov 15, 2016
There have been reports that the query cache did not manage to speed up search
requests when the query includes a large number of different sub-queries, since
a single request may exhaust the whole history (256 queries) while the query
cache only starts caching queries once they appear multiple times in the
history (elastic#16031). On the other hand, increasing the size of the query
cache history is a bit controversial (elastic#20116), so this pull request
proposes a different approach: never cache term queries, and do not add them to
the history of queries either. The reasoning is that these queries should be
fast anyway, regardless of caching, so taking them out of the equation should
not cause any slowdown. And because they are no longer added to the cache
history, other queries have a greater chance of being cached.
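A sketch of what that policy could look like against Lucene's QueryCachingPolicy interface (hypothetical class name, and assuming the single-argument shouldCache signature of recent Lucene; the actual change in #21566 may differ):

```java
import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.UsageTrackingQueryCachingPolicy;

// Sketch of the approach described above (hypothetical class name; the
// real change may differ): term queries are neither cached nor recorded
// in the history, so they no longer evict other queries' history entries.
public class TermQuerySkippingPolicy implements QueryCachingPolicy {
    private final QueryCachingPolicy delegate = new UsageTrackingQueryCachingPolicy();

    @Override
    public void onUse(Query query) {
        if (!(query instanceof TermQuery)) {
            delegate.onUse(query); // term queries never enter the history
        }
    }

    @Override
    public boolean shouldCache(Query query) throws IOException {
        // Term queries are cheap to re-execute, so never cache them.
        return !(query instanceof TermQuery) && delegate.shouldCache(query);
    }
}
```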
jpountz added a commit that referenced this pull request Nov 16, 2016

jpountz commented Nov 16, 2016

Superseded by #21566.

jpountz closed this Nov 16, 2016
jpountz removed the v5.1.1 label Nov 16, 2016