-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for <-> operator rewriting #189
add support for <-> operator rewriting #189
Conversation
{ | ||
check_stack_depth(); | ||
|
||
switch(nodeTag(plan)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Are these all of the cases that need to be covered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not familiar enough with all of them to say for sure. I removed some I know are definitely unnecessary. The remaining ones I couldn't definitively rule out so I was hoping to get some additional eyes on them
some very rough and possibly erroneous notes.
SubqueryScan
I think this is required, since a subquery could include the case we're trying to addressCteScan
Unecessary if the CTE doesn't use the index, I'm not sure what the expected behavior isJoin
One of both of the tables being joined can be generated by a select so I think this is requiredAgg
I think that this can cover what is essentially a subquery e.g.ANY(SELECT ....)
Group
This can probably be removed, I wasn't sure if it could have a subquery with an order by but at least for a single query I think it's 1 or the other it is possible this could end up parenting a CTE though I don't know enough to saySort
I thinkORDDER BY
can generate a sort meriting its inclusion, but I'm not sureUnique
This can probably be removed, I don't think you can call unique on an expression, unless you can dounique(v <-> ...)
NestLoop
loops can be seqscans, I think this is requiredResult
I couldn't find much information on this plan type, and so included it to be safeLimit
When I looked at this it seemed like the limit becomes the root node, so I think we need thisAppend
Not 100% sure where this will end up in the tree, but I think you can have a union of two scans so this should be checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha just realized I misunderstood the question, apologies. I had assumed that plan tree walker was covering and just removed the nodes I have explicit checks for. I don't know enough about the remaining plan types to say if they might require checks. I will try to read more about them. Immediately T_IncrementalSort
feels like it should probably be included so my guess is that I'm missing several others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, I was just asking if this was a deliberately curated subset or just what you ended up covering. I don't know if removing is the move, since I know the operator can end up in funny subqueries and joins.
Assuming you meant to refer to the plan walker I wrote -- there's around 30 plan nodes I believe, which you can see in nodes.h. I only covered the ones that I was able to generate tests for. It probably would be better for me to have added the other cases as well. I just couldn't figure out tests for the rest. For example, I couldn't trigger T_Join (joins ended up using a different plan node), and it ended up getting removed in Postgres 16. I'm not sure if it gets triggered at all in Postgres 15 -- I think it gets triggered in earlier Postgres versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that makes sense. You got way more than I've been able to generate! Reading through the additional types it seems like most if not all of the ones that were left out can be safely ignored. I'll try going over the rest of them tomorrow. It's funny that T_Join is hard to trigger, you'd expect it would be relatively common.
One option I could implement pretty easily is to call the base mutator in the default so that we get some coverage on the more exotic types, this could be combined with a check that returns the path as is for ones we think are safe to ignore too.
Can you add some tests where enable_seqscan is on, seqscan is used with this operator, does not throw an error, and the results are properly sorted? |
I added a couple tests in and removed the note about the query mutator. I'll work more on figuring out what is and isn't necessary in the case statement |
Pasting benchmarking results from CI/CD since they didn't auto-comment. Would you be able to run benchmarking locally? Wondering if this is attributable to noise or if this does cause notable performance degradation.
|
this is what I get when I compare against main, I think it may be noise but I'm not sure
|
These commits should improve coverage. I added tests for the additional cases that are now handled and made some notes for cases I'm confident we can ignore. The extra tags on the default are a bit gross but it felt good to have some explicit reference to the types of plans that are ignored and why. I can remove them if that's preferable. There were some types I couldn't generate or wasn't sure we should support, I've included notes on them below. I may have missed some too
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
I will need to dig into this a little locally in an IDE before we merge, so may take a couple of days before I get back to it.
src/hooks/op_rewrite.h
Outdated
@@ -0,0 +1,15 @@ | |||
#ifndef _OP_REWRITE_H_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you follow the same convention for what the include guards should be. Look at other header files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha my bad, pushed a fix
if(is_plan_node(node)) { | ||
return (Node *)plan_tree_mutator((Plan *)node, ctx); | ||
} else { | ||
return expression_tree_mutator(node, operator_rewriting_mutator, ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are all nodes either plan nodes or expression tree nodes? are there other options?
if not, can we have an invariant assertion (ldb_invariant
) in the else
branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'm not sure, @dqii is more knowledgeable than I am about these systems, she may know. Otherwise I will try to figure out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this thread seems to imply that the plan tree is made exclusively of exprs and plans. It's over a decade old but looking at some of the code in the executor I think it's probably accurate still. The executor doesn't check node types at all and the functions I've inspected all appear to expect expressions. Intuitively it sorta makes sense that the executor takes only exprs, statements and plans. Since statements are a separate tree I think it's safe to enforce the invariant
test/sql/hnsw_select.sql
Outdated
-- SampleScan | ||
EXPLAIN (COSTS false) SELECT * FROM small_world TABLESAMPLE BERNOULLI (20) ORDER BY v <-> '{1,1,1}' ASC; | ||
-- SetOpt/HashSetOp | ||
EXPLAIN (COSTS false) (SELECT * FROM small_world ORDER BY v<-> '{1,0,1}' ASC )EXCEPT (SELECT * FROM small_world ORDER by v <-> '{1,1,1}' ASC LIMIT 5); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add another kind of test for these (described below)?
In sequential scans the use of <->
operator should be identical to corresponding _dist
function.
So, we should be able to run both and have a helper function that compares the output line by line and ensure they are the same.
Can you add this helper function and use it for these tests?
Not exactly sure what's going on in pg11. To generate this error you can run
However the following queries work just fine
their query plans are identical
However if I remove handling of Result nodes it does not fix it. Internally the plan tree of the failing query is much more complex because it includes the function. If I log the node tags the working query shows a single
If I comment out any one of the following lines it still fails but if I comment out all three it works generating a
if I short circuit any of the individual node types it doesn't seem to do anything though I may have missed a couple. Short circuiting both While including the function in the query is what causes the issue, calling it alone works fine This is the full query plan when I replace the function in the breaking query, You can see more of the nodes that are being printed but not all of them
I'm missing something, but I've spent way more time than I planned on this so I'm writing it up and putting a pin in it for now. If anyone else has ideas please please please let me know UPDATE: |
update tests
… operator rewriting
4e95ed1
to
1d0c513
Compare
After going over ASan logs and bisecting to this commit my understanding of this issue is still incomplete. However the general shape is that in queries with multiple subplans, e.g. inserts with multiple rows each calling a function, the operator rewriting code visits the same nodes repeatedly. At the same time, postgres executes the query incrementally, executing subplans corresponding to e.g. plgpgsql functions as they are encountered. These functions are traversed and executed in their own memory context which lives only as long as the subplan's execution and is then destroyed. Because |
This adds a hook that rewrites the lantern vector distance operator to the correct function in the case that a sequential scan is done on a table with an index on it
Fixes #148