term_queries are not cached #23343

Closed
sundarv85 opened this issue Feb 24, 2017 · 7 comments
Comments

@sundarv85

Elasticsearch version: 5.2.0

Plugins installed: None

JVM version: 1.8.0_74

OS version: macOS 10.12.3

In our case, almost all of our queries are bool queries with term filters, so we would like to enable term query caching via the index.queries.cache.term_queries setting, as described in pull request #21566.

Steps to reproduce:

  1. https://gist.github.com/sundarv85/5aec5d8a50958bee5891e10d11b87fae
  2. I ran the above gist, but the term queries still do not appear to be cached: the "query_cache" object in GET _nodes/stats?pretty is empty. Am I missing something here?
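
For readers who cannot open the gist, here is a minimal sketch of what an index-creation body carrying this setting might look like. The setting name comes from the issue text. The boolean value, the placement under "settings", and the helper name `term_query_cache_settings` are assumptions for illustration, not taken from the gist or from #21566.

```python
import json


def term_query_cache_settings(enabled=True):
    """Build a hypothetical index-creation body that sets
    index.queries.cache.term_queries (value type assumed to be boolean)."""
    return {"settings": {"index.queries.cache.term_queries": enabled}}


body = term_query_cache_settings()
# This JSON would be sent as the body of an index-creation request.
print(json.dumps(body, indent=2))
```

Whether the setting can be changed on an existing index is not stated in this thread, so the sketch only covers index creation.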
@jpountz
Contributor

jpountz commented Feb 24, 2017

Your recreation does not cache the term query because we only cache on segments that both have more than 10k documents and account for more than 3% of the shard size.
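
The rule described above can be sketched as a small predicate. This is only an illustration of the two thresholds mentioned in the comment, not the actual Lucene or Elasticsearch code. It assumes "shard size" is measured in documents, and the names `segment_is_cacheable`, `MIN_SEGMENT_DOCS`, and `MIN_SHARD_FRACTION` are made up for the example.

```python
MIN_SEGMENT_DOCS = 10_000    # segment must hold more than 10k documents
MIN_SHARD_FRACTION = 0.03    # and account for more than 3% of the shard


def segment_is_cacheable(segment_docs, shard_docs):
    """Return True if a segment passes both thresholds (assumed doc-count based)."""
    if shard_docs <= 0:
        return False
    big_enough = segment_docs > MIN_SEGMENT_DOCS
    large_share = segment_docs / shard_docs > MIN_SHARD_FRACTION
    return big_enough and large_share


# A tiny test index never qualifies, even if one segment is the whole shard.
print(segment_is_cacheable(5, 5))
# A 20k-document segment in a 100k-document shard passes both checks.
print(segment_is_cacheable(20_000, 100_000))
```

This is why a small recreation with a handful of documents shows an empty query cache regardless of the setting.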

Why do you want term queries to be cached? The term filter in your recreation should execute very quickly without any caching.

@jpountz jpountz closed this as completed Feb 24, 2017
@sundarv85
Author

sundarv85 commented Feb 24, 2017

Ah! Thanks for the info about the thresholds (more than 10k documents and more than 3% of the shard size).

We are currently facing an issue: we have 12 indexes with 5 shards each, holding 90 million documents in total. Each of these documents contains approx. 5 nested documents, so in total this is around 206 million documents (the output from _stats).

When we query all 12 weeks with a range query, it takes somewhere between 4 and 60 seconds. The query contains a range on the time duration, then a couple of term queries, and we then aggregate on all of them. I'm looking for ways to improve the performance.
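
To make the query shape concrete, here is a rough sketch of a request body with a range filter, a couple of term filters, and an aggregation. The field names (`timestamp`, `status`, `region`), the values, and the aggregation name `by_status` are all hypothetical. The actual query is not shown in this thread.

```python
import json

# Hypothetical field names and values, for illustration only.
search_body = {
    "size": 0,  # only the aggregation results are needed
    "query": {
        "bool": {
            "filter": [
                {"range": {"timestamp": {"gte": "now-12w", "lte": "now"}}},
                {"term": {"status": "active"}},
                {"term": {"region": "eu"}},
            ]
        }
    },
    "aggs": {
        "by_status": {"terms": {"field": "status"}}
    },
}

print(json.dumps(search_body, indent=2))
```

Putting the range and term clauses under `filter` keeps them in filter context, which is the context where query caching applies.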

@jpountz
Contributor

jpountz commented Feb 24, 2017

Interesting, we have improvements to nested and range queries coming to Elasticsearch 5.4, which should yield sizable improvements: #23079 and #23119. In the meantime, the only workaround I can think of is to add the list of the types you are searching in to the URL, e.g. index/type1,type2/_search rather than index/_search, which might improve the way nested docs are masked depending on your query. Other than that I am afraid there isn't much that can be done.
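
The URL workaround above can be sketched as a small path builder. The helper name `search_path` is made up for this example, and the index and type names are the placeholders used in the comment.

```python
def search_path(index, types=()):
    """Build a search path, adding a comma-separated type list when given."""
    if types:
        return f"{index}/{','.join(types)}/_search"
    return f"{index}/_search"


# Without types: searches the whole index.
print(search_path("index"))
# With types: restricts the search to the listed types.
print(search_path("index", ["type1", "type2"]))
```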

@sundarv85
Author

sundarv85 commented Feb 24, 2017

Thanks. We will upgrade to 5.4 when it is released. Meanwhile, what I hope will help us is increasing the request cache from 1% to 10% of the heap size: since we use aggs so often, I expect the request cache to be a valuable fallback. For example, in one of the shards I see

          "evictions": 68562,
          "hit_count": 13057,
          "miss_count": 70268

So increasing the request cache should reduce the miss_count. I will test this setup and update here if it helps.
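
As a quick way to read the numbers above, the hit ratio can be computed from hit_count and miss_count. This is just arithmetic on the quoted stats. The helper name `hit_ratio` is made up for the example.

```python
def hit_ratio(hit_count, miss_count):
    """Fraction of cache lookups that were served from the cache."""
    total = hit_count + miss_count
    return hit_count / total if total else 0.0


# Values copied from the shard stats quoted above.
stats = {"evictions": 68562, "hit_count": 13057, "miss_count": 70268}
ratio = hit_ratio(stats["hit_count"], stats["miss_count"])
print(f"request cache hit ratio: {ratio:.1%}")
```

With these stats the ratio comes out at roughly 16%, and the eviction count is close to the miss count, which is consistent with the cache being too small for the workload.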

One more thing that would be helpful is if the profile API could also show whether the response was retrieved from the cache.

@jpountz
Contributor

jpountz commented Feb 24, 2017

Actually the profile API disables caching entirely, so it never goes to the cache. Maybe you should add a thumbs up to this issue if you are interested in figuring out what gets cached? #20108

@sundarv85
Author

@jpountz Eventually we enabled the term_queries cache in production and have seen improvements in our network speed. It would be good if this could be exposed in the documentation as well, as we do not want to lose this feature in upcoming Elasticsearch versions.

@yitian134

> @jpountz Eventually we enabled the term_queries cache in production and have seen improvements in our network speed. It would be good if this could be exposed in the documentation as well, as we do not want to lose this feature in upcoming Elasticsearch versions.

@sundarv85 Hi, I have encountered similar problems. How did you enable the term_queries cache? By increasing the cache size? Looking forward to your reply.
