match-phase degradation taking a long time and returning more hits than coverage #31445
Comments
Hey, could you share what type of queries these are (range search, terms, vector search, etc.)? Capped range search has better guarantees, but filters are then applied as post-filters (see the practical doc).
👋 sure! The queries which are causing this issue are relatively simple attribute matching - specifically we're querying for multiple terms in a fieldset of attribute fields which have type
Our match-phase attribute is defined as:
Match-phase looks like:
We use a single-phase ranking function which is decently expensive - a single content node can rank ~50k documents in 100ms (single-threaded). This is the primary reason for attempting to use match-phase degradation. Unfortunately the actual queries we need to use have a lot of filters so capped range search wouldn't be feasible. We're currently experimenting with two-phase ranking as a workaround for now. Basically first-phase ranking on the
Thanks! If ranking is the main cost, then phased ranking is one alternative with better guarantees.
So I'm assuming that the same type of request (same query value, same filter) consistently produces a higher hit count than you expect? Or do you see the same type of request (same query value, same filter) produce non-deterministic behaviour?
I will look into this and see if this is expected behaviour, and what can be done to improve it.
Thanks! I noticed that the issue is intermittent again - some content groups are behaving as expected (fast response, fewer matched documents) while others are slow for the exact same query. I've included both results below as well.
Our rank-profile, expected:
Our rank-profile, slow:
ranking=unranked:
ranking=unranked&ranking.matching.termwiseLimit=0.01:
Thanks, this is what I need. The issue is that the hit ratio you have (3062770/232863029) is outside the range the heuristics have been tested for. Based on the numbers here, the estimate is way off. This indicates that the model the heuristics are based on misses something vital. There is no simple solution; getting this right will require a deep dive with extensive benchmarking, so no immediate resolution is expected. You can try to experiment with match-phase.max-hits. I suspect it will change the sweet spot. Try increasing it to 1000 and 10000, and also reducing it to 100, and see what happens. That will also give some indication of what is wrong with the heuristics.
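For reference, the hit ratio quoted above works out to roughly 1.3%. A minimal sketch of the arithmetic, using only the two numbers from the comment (the variable names are illustrative and not part of Vespa):

```python
# Numbers quoted in the comment above: hits / total documents.
total_hits = 3_062_770
total_docs = 232_863_029

# Fraction of the corpus that the query matches.
hit_ratio = total_hits / total_docs

print(f"hit ratio: {hit_ratio:.2%}")
```

So a little over one document in a hundred matches, which is the regime the heuristics reportedly were not tested for.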
Thanks for looking into this. We've tried adjusting max-hits and reducing it further seems to help, but we need to account for the drop in quality as well so I'm not sure if that's an option. We see the issue happening on two clusters and the biggest commonality is a large number of documents per content node. Are there any general guidelines you have regarding the max number of documents per node? In our most extreme cluster we have 100M docs per node (and we see the same behaviour on this cluster as well). I assume the match-phase algorithm (and associated heuristics) run on each content node, so would increasing the nodes (and decreasing the docs per node) be a potential solution?
No, it might offset the issue, but it is not a real solution. The inherent problem here is to find the crossover point where two graphs meet. Simplified, it is to find the solution for
👍 got it, thanks for the explanation. One interesting consideration for our use case is that we know which queries are expensive and should be limited. It would be great if there was a way to "force" match-phase to trigger - if I'm understanding correctly that would essentially bypass the heuristics and give us more customisable (and predictable) behaviour. We have a solution using two-phased ranking, but since these queries have a high hit count (as you've seen), even the cheap first-phase ranking ends up being expensive.
Describe the bug
In one of our production Vespa clusters, we are using match-phase degradation to speed up queries which match a large number of documents. Under certain scenarios (which are relatively common in our application), we see match-phase being triggered, but the query takes a long time and returns more hits than expected (with a max-hits value of 250 and 12 content nodes per group), and also more than specified in the coverage.documents response field. See below for an example:
To Reproduce
I'm not sure how easy this would be to reproduce. If requested, I can try to provide an example.
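As a rough sanity check on "more hits than expected" in the bug description, the hit budget can be sketched as below. This assumes that max-hits is applied independently on each content node and that a query is served by all 12 nodes of one group; the function name is hypothetical and not part of Vespa.

```python
def expected_hit_budget(max_hits_per_node: int, nodes_per_group: int) -> int:
    """Approximate upper bound on match-phase hits for one group.

    Assumption: max-hits is a per-content-node target, so the
    group-wide budget is the per-node value times the node count.
    """
    return max_hits_per_node * nodes_per_group


# Values from the bug description: max-hits of 250, 12 content nodes per group.
budget = expected_hit_budget(max_hits_per_node=250, nodes_per_group=12)
print(f"approximate hit budget per group: {budget}")
```

Since max-hits is an approximate target rather than a hard cap, this is only the order of magnitude one would expect when degradation works as intended.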
Expected behavior
I expect match-phase degradation to be faster and return fewer documents. It works for some queries, as shown below:
Environment (please complete the following information):
Self-hosted
Vespa version: 8.323.45