
Increase the size of the history that the query cache tracks. #20116


Conversation


jpountz commented Aug 23, 2016

The current value is 256, which is conservative. There have been a couple of
cases indicating that it might be too low to allow caching at all when queries
are composed of tens of filters, so I would like to experiment with a slightly
higher value: 1024.

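To make the problem case concrete, here is a hedged sketch using Lucene's BooleanQuery API (the class name, field, and values are made up for illustration). Each FILTER clause is an individual candidate for the query cache, so a single request like this can push dozens of entries through a 256-slot history on its own:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Hypothetical illustration of the problem case: a single search built
// from many filter clauses. Each FILTER clause is tracked separately by
// the query-cache history, so one request can flush much of the window.
public class ManyFiltersExample {
    public static Query manyFilters(String field, int numFilters) {
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (int i = 0; i < numFilters; i++) {
            builder.add(new TermQuery(new Term(field, "value" + i)), Occur.FILTER);
        }
        return builder.build();
    }
}
```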

jimczi commented Aug 23, 2016

LGTM. It's cheap to do it (?) and it should catch more use cases.


jpountz commented Aug 23, 2016

It's cheap to do it

I think so: the data structure is a ring buffer kept in sync with a hash table that stores the counts of the individual elements, so updates run in constant time.
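For illustration, a minimal sketch of such a structure (the class name is made up; Lucene's real implementation is more compact but follows the same idea):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the structure described above: a fixed-size ring
// buffer of recently seen query hashes, kept in sync with a hash table
// of per-element counts, so both recording a query and looking up its
// frequency are O(1).
public class FrequencyRingBuffer {
    private final int[] buffer;
    private final Map<Integer, Integer> counts = new HashMap<>();
    private int position = 0;
    private int size = 0;

    public FrequencyRingBuffer(int capacity) {
        this.buffer = new int[capacity];
    }

    /** Record one occurrence, evicting the oldest entry once full. */
    public void add(int queryHash) {
        if (size == buffer.length) {
            int evicted = buffer[position]; // position wraps to the oldest slot
            counts.compute(evicted, (k, v) -> v == 1 ? null : v - 1);
        } else {
            size++;
        }
        buffer[position] = queryHash;
        counts.merge(queryHash, 1, Integer::sum);
        position = (position + 1) % buffer.length;
    }

    /** How many times this hash occurs in the current window. */
    public int frequency(int queryHash) {
        return counts.getOrDefault(queryHash, 0);
    }
}
```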


rmuir commented Aug 23, 2016

How can we measure this? I am concerned that the ones asking for a larger buffer could simply be the 'vocal minority' with esoteric use cases. You can never please everyone, and there is a risk in diverging from Lucene's defaults.


jpountz commented Aug 23, 2016

It is definitely hard to know whether the vocal ones are a minority or a majority. However, I still think 1024 would be a reasonable value.

Regarding diverging from Lucene's defaults: Lucene cannot make assumptions about things like heap size, so its defaults of 256 for the history size and 32MB for the cache size make sense. Elasticsearch, on the other hand, is a server-side application with a default heap size of 2GB, so it can afford to use memory a bit more aggressively.


rmuir commented Aug 23, 2016

I almost buy the argument about the server-side app, but I have the usual concerns given Elasticsearch's history of over-caching (to the point of doing more harm than good).

On the other hand, it has been far more common to see slowdowns when Elasticsearch diverges from Lucene. If we change this number here unscientifically to 1024, who is to stop the next PR from arguing it should be 4096, and so on?

I guess I'm not buying that it's necessarily a performance win. If a query is really common enough to justify caching it, seeing it N times within the last M queries is an OK heuristic. But I only see this adjusting M and not N. By definition it's going to cache rarer queries, and I have concerns about that.
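To put rough numbers on the N-within-M trade-off, here is a toy simulation (not Elasticsearch code; the class name, the spacing of 150, and the threshold of 5 are illustrative assumptions). A query recurring once every 150 requests never accumulates 5 occurrences inside a 256-entry window, but does inside a 1024-entry one:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class HistorySizeDemo {
    // Returns true if the recurring "hot" query ever reaches minFrequency
    // occurrences within a sliding window of the last historySize queries.
    static boolean everCacheable(int historySize, int minFrequency) {
        ArrayDeque<Integer> window = new ArrayDeque<>();
        Map<Integer, Integer> counts = new HashMap<>();
        final int HOT = -1; // id of the recurring query
        for (int i = 0; i < 10_000; i++) {
            int query = (i % 150 == 0) ? HOT : i; // all other queries are unique
            window.addLast(query);
            counts.merge(query, 1, Integer::sum);
            if (window.size() > historySize) {
                int evicted = window.removeFirst();
                counts.merge(evicted, -1, Integer::sum);
            }
            if (counts.getOrDefault(HOT, 0) >= minFrequency) {
                return true; // the policy would cache the hot query now
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("history=256:  " + everCacheable(256, 5));  // false
        System.out.println("history=1024: " + everCacheable(1024, 5)); // true
    }
}
```

This is exactly the tension above: a larger M lets genuinely recurring queries get cached, but by the same mechanism it admits rarer ones.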

@aleph-zero
Contributor

@jpountz Would it be possible to make this a configurable setting, with reasonable bounds checking to ensure it's not set to something absurd, as @rmuir mentions?


TwP commented Sep 26, 2016

I would like to amplify @aleph-zero's request to make this a configurable setting. A common use case of ours generates terms queries with more than a thousand terms. A limit of 256 entries in the ring buffer ensures that none of those terms will ever be cached. As users paginate through search results, that same ~1,000-term query is sent again for every page. We noticed a 5x increase in 99th-percentile query times when switching from ES 1.7.5 to ES 2.3.5.

I am also going to crosslink a few other issues:

jpountz added a commit to jpountz/elasticsearch that referenced this pull request Nov 15, 2016
There have been reports that the query cache did not manage to speed up search
requests when the query includes a large number of different sub-queries, since
a single request may exhaust the whole history (256 queries) while the query
cache only starts caching queries once they appear multiple times in the
history (elastic#16031). On the other hand, increasing the size of the query
cache history is a bit controversial (elastic#20116), so this pull request
proposes a different approach: never cache term queries, and do not add them to
the history of queries either. The reasoning is that these queries should be
fast anyway, regardless of caching, so taking them out of the equation should
not cause any slowdown. And because they are no longer added to the cache
history, other queries have a greater chance of being cached.
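A sketch of what that policy could look like against Lucene's QueryCachingPolicy interface (hypothetical class name, and assuming the single-argument shouldCache signature of recent Lucene; the actual change in #21566 may differ):

```java
import java.io.IOException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.UsageTrackingQueryCachingPolicy;

// Sketch of the approach described above (hypothetical class name; the
// real change may differ): term queries are neither cached nor recorded
// in the history, so they no longer evict other queries' history entries.
public class TermQuerySkippingPolicy implements QueryCachingPolicy {
    private final QueryCachingPolicy delegate = new UsageTrackingQueryCachingPolicy();

    @Override
    public void onUse(Query query) {
        if (!(query instanceof TermQuery)) {
            delegate.onUse(query); // term queries never enter the history
        }
    }

    @Override
    public boolean shouldCache(Query query) throws IOException {
        // Term queries are cheap to re-execute, so never cache them.
        return !(query instanceof TermQuery) && delegate.shouldCache(query);
    }
}
```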
jpountz added a commit that referenced this pull request Nov 16, 2016

jpountz commented Nov 16, 2016

Superseded by #21566.

jpountz closed this Nov 16, 2016
jpountz removed the v5.1.1 label Nov 16, 2016