Query planner optimization (or AST optimization?) #2470

Closed
cjrh opened this issue Aug 1, 2024 · 2 comments
@cjrh
Collaborator

cjrh commented Aug 1, 2024

A query like this:

+(
    +(text:"a" text:"b" text:"c" ...) 
    -(text:"x" text:"y" text:"z" ...)
)
+field_id:[0 TO 10]

On a large dataset, we don't want the inner tests on the text values to run for documents whose field_id falls outside the range given at the outer level of the query. This is a toy example, and I understand that in this specific case I could remove the grouping around the text clauses and make everything top-level. My question is really about whether, and how, an outer clause is applied to nested levels at all.

Is there an existing optimization layer that does this? Currently I repeat range constraints within the deepest nested levels of my queries to "avoid work", but I don't know whether this is necessary. I've seen some recent work on AST optimizations in #2449 and #2461, and comments like this, but it's difficult to get an overall sense of what is and isn't done with respect to my question.

TL;DR: "Should I repeat my top-level range constraints in nested layers for performance?"

@cjrh
Collaborator Author

cjrh commented Aug 1, 2024

One of the examples mentioned in #2449 is this:

(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

My example above might then look like this:

(a NOT b) AND c ==> (a AND c NOT b)

if that is useful for answering the question.
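To make the two rewrites above concrete, here is a minimal Python sketch of that kind of flattening on a toy AST. It is not the code from #2449 or anything in the library: `Term`, `Bool`, and `flatten` are hypothetical names invented for this illustration, and the sketch only models required (`must`) and excluded (`must_not`) clauses, treating each of `a`, `b`, `c` as an opaque sub-query.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class Term:
    """Hypothetical leaf node: an opaque sub-query such as `a`."""
    name: str


@dataclass
class Bool:
    """Hypothetical boolean node: all `must` match, no `must_not` matches."""
    must: List["Node"] = field(default_factory=list)
    must_not: List["Node"] = field(default_factory=list)


Node = Union[Term, Bool]


def flatten(node: Node) -> Node:
    """Hoist required boolean children into their parent."""
    if isinstance(node, Term):
        return node
    must: List[Node] = []
    must_not: List[Node] = []
    for child in (flatten(c) for c in node.must):
        if isinstance(child, Bool) and child.must:
            # (a NOT b) AND c  ==>  a AND c NOT b
            must.extend(child.must)
            must_not.extend(child.must_not)
        else:
            # Leaves and pure-negative groups are kept as they are.
            must.append(child)
    must_not.extend(flatten(c) for c in node.must_not)
    return Bool(must, must_not)


if __name__ == "__main__":
    a, b, c = Term("a"), Term("b"), Term("c")
    nested = Bool(must=[Bool(must=[a], must_not=[b]), c])
    print(flatten(nested))
```

The guard on `child.must` is a deliberate choice in this sketch: a group with only excluded clauses is left in place rather than merged, so the rewrite never has to decide what a purely negative sub-query means.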

@cjrh cjrh added the question label Aug 19, 2024
@cjrh
Collaborator Author

cjrh commented Aug 19, 2024

I got feedback on the Discord:

We do have seek in DocSet, which is used to skip docids in AND queries
How effective it is depends on the implementation, in the worst case it just calls advance
There's no documentation but you can check the DocSet implementations
Generally there's some room for improvement here.

I take this to mean that the answer depends on the specific DocSet implementations involved in the query, and that I should look there. Closing.
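For anyone landing here later, the idea behind the Discord feedback can be sketched with a toy model in Python. This is not the library's DocSet code: `SortedDocSet`, `intersect`, the `steps` counter, and the `TERMINATED` value used here are assumptions made up for illustration. The sketch only shows why an AND over a selective clause lets the other side skip doc ids through `seek` instead of visiting every one.

```python
from bisect import bisect_left

TERMINATED = 2**31 - 1  # sentinel doc id meaning "no more documents" (assumed)


class SortedDocSet:
    """Toy cursor over a sorted list of doc ids."""

    def __init__(self, doc_ids):
        self._docs = sorted(doc_ids)
        self._pos = 0
        self.steps = 0  # number of cursor moves, to show how much work was done

    def doc(self):
        if self._pos < len(self._docs):
            return self._docs[self._pos]
        return TERMINATED

    def advance(self):
        """Move to the next doc id."""
        self._pos += 1
        self.steps += 1
        return self.doc()

    def seek(self, target):
        """Jump to the first doc id >= target using binary search."""
        self._pos = bisect_left(self._docs, target, self._pos)
        self.steps += 1
        return self.doc()


def intersect(left, right):
    """Leapfrog intersection: the cursor that is behind seeks to the other."""
    out = []
    a, b = left.doc(), right.doc()
    while a != TERMINATED and b != TERMINATED:
        if a == b:
            out.append(a)
            a = left.advance()
        elif a < b:
            a = left.seek(b)
        else:
            b = right.seek(a)
    return out


if __name__ == "__main__":
    text_matches = SortedDocSet(range(0, 1000, 2))  # broad inner clause
    in_range = SortedDocSet(range(0, 11))           # selective outer clause
    print(intersect(text_matches, in_range), text_matches.steps)
```

In this model the broad cursor stops moving as soon as the selective one is exhausted, so repeating the range constraint inside the nested group would save nothing. Whether a real query behaves this way depends on how each DocSet involved implements `seek`, which is exactly the caveat in the feedback above.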

@cjrh cjrh closed this as completed Aug 19, 2024