Increase the size of the history that the query cache tracks. #20116
Conversation
The current value is 256 which is conservative. There have been a couple cases that indicated that it might be too low to allow caching at all in the case of queries that are composed of tens of filters, so I would like to experiment with a slightly higher value: 1024.
LGTM. It's cheap to do (?) and it should catch more use cases.
I think so, the data structure is a ring buffer kept in sync with a hash table that stores the counts of individual elements, so updates are constant-time.
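The structure described in that comment, a fixed-size ring of recently seen keys kept in sync with a table of per-key counts so that both recording a key and reading its frequency are O(1), can be sketched roughly as follows. This is an illustrative Python sketch only, not Lucene's actual code; all names are made up for the example.

```python
from collections import deque

class FrequencyTrackingRingBuffer:
    """Illustrative sketch: a ring buffer of the last `max_size` keys,
    mirrored by a hash table of counts, so add() and frequency() are O(1)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.buffer = deque()   # the ring: last max_size keys, oldest first
        self.counts = {}        # key -> number of occurrences inside the ring

    def add(self, key):
        # Evict the oldest key once the ring is full, keeping counts in sync.
        if len(self.buffer) == self.max_size:
            evicted = self.buffer.popleft()
            self.counts[evicted] -= 1
            if self.counts[evicted] == 0:
                del self.counts[evicted]
        self.buffer.append(key)
        self.counts[key] = self.counts.get(key, 0) + 1

    def frequency(self, key):
        # Constant-time lookup thanks to the mirrored hash table.
        return self.counts.get(key, 0)
```

With `max_size` set to 256 (the default under discussion) or 1024 (the proposed value), a caching policy can cheaply ask how often a query was recently seen before deciding to cache it.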
How can we measure this? I am concerned the ones asking for a larger buffer could simply be the 'vocal minority' with esoteric use cases. You can never please everyone, and there is a risk in diverging from Lucene's defaults.
It is definitely hard to know whether vocal ones are a minority or a majority. However, I think 1024 would still be a reasonable value? Regarding diverging from Lucene defaults, Lucene cannot make assumptions about things like heap size, so I think the defaults of 256 for the history size and 32MB for the cache size make sense. On the other hand, elasticsearch is a server-side application and has a default heap size of 2GB so it is fine to use memory a bit more aggressively?
I almost buy the argument about it being a server-side app, but I have the usual concerns given elasticsearch's history of overcaching (to the point of doing more harm than good). On the other hand, it's been far more common to see slowdowns when it diverges from Lucene. If we change this number here unscientifically to 1024, who is to stop the next PR from arguing it should be 4096, and so on? I guess I'm not buying that it's necessarily a performance win. If a query is really common enough to justify caching it, seeing it N times within M queries is an ok heuristic. But I only see this adjusting M and not N. By definition it's going to cache more rare stuff, and I have concerns about that.
I would like to amplify @aleph-zero's request to make this a configurable setting. A common use case we have generates […]. I am also going to crosslink a few other issues, too:
There have been reports that the query cache did not manage to speed up search requests when the query includes a large number of different sub queries since a single request may manage to exhaust the whole history (256 queries) while the query cache only starts caching queries once they appear multiple times in the history (elastic#16031). On the other hand, increasing the size of the query cache is a bit controversial (elastic#20116) so this pull request proposes a different approach that consists of never caching term queries, and not adding them to the history of queries either. The reasoning is that these queries should be fast anyway, regardless of caching, so taking them out of the equation should not cause any slow down. On the other hand, the fact that they are not added to the cache history anymore means that other queries have greater chances of being cached.
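To see why a 256-entry history can prevent caching entirely for requests built from hundreds of distinct sub queries, here is a small simulation. It is illustrative only, not Lucene code: the "seen at least twice" threshold and all names are simplifying assumptions for the sketch, not Lucene's exact caching policy.

```python
from collections import deque, Counter

def simulate_cache(history_size, queries, min_frequency=2):
    """Return the set of queries that would become cacheable under a
    simplified 'seen min_frequency times within the history' heuristic."""
    history = deque(maxlen=history_size)  # oldest entry drops off automatically
    cached = set()
    for q in queries:
        history.append(q)
        # Counter(...) is O(n) here for clarity; the real structure keeps
        # counts in sync with the ring so this lookup is O(1).
        if Counter(history)[q] >= min_frequency:
            cached.add(q)
    return cached

# The same request executed twice, each composed of 300 distinct filters:
stream = [f"filter_{i}" for i in range(300)] * 2

# With a 256-entry history, the request's own tail evicts its head before
# any filter is ever seen twice, so nothing becomes cacheable:
print(len(simulate_cache(256, stream)))   # -> 0

# A larger history spans both occurrences of every filter:
print(len(simulate_cache(1024, stream)))  # -> 300
```

This is the effect the paragraph above describes: enlarging the history (or, as the superseding change does, keeping cheap term queries out of it) gives repeated expensive queries a chance to be counted twice before they fall out of the window.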
Superseded by #21566.