Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for <-> operator rewriting #189

Merged
merged 19 commits into from
Nov 18, 2023

Conversation

ezra-varady
Copy link
Collaborator

@ezra-varady ezra-varady commented Oct 2, 2023

This adds a hook that rewrites the lantern vector distance operator to the correct function in the case that a sequential scan is done on a table with an index on it

Fixes #148

{
check_stack_depth();

switch(nodeTag(plan)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Are these all of the cases that need to be covered?

Copy link
Collaborator Author

@ezra-varady ezra-varady Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar enough with all of them to say for sure. I removed some I know are definitely unnecessary. The remaining ones I couldn't definitively rule out so I was hoping to get some additional eyes on them

some very rough and possibly erroneous notes.

  • SubqueryScan I think this is required, since a subquery could include the case we're trying to address
  • CteScan Unecessary if the CTE doesn't use the index, I'm not sure what the expected behavior is
  • Join One of both of the tables being joined can be generated by a select so I think this is required
  • Agg I think that this can cover what is essentially a subquery e.g. ANY(SELECT ....)
  • Group This can probably be removed, I wasn't sure if it could have a subquery with an order by but at least for a single query I think it's 1 or the other it is possible this could end up parenting a CTE though I don't know enough to say
  • Sort I think ORDDER BY can generate a sort meriting its inclusion, but I'm not sure
  • UniqueThis can probably be removed, I don't think you can call unique on an expression, unless you can do unique(v <-> ...)
  • NestLoop loops can be seqscans, I think this is required
  • Result I couldn't find much information on this plan type, and so included it to be safe
  • Limit When I looked at this it seemed like the limit becomes the root node, so I think we need this
  • AppendNot 100% sure where this will end up in the tree, but I think you can have a union of two scans so this should be checked

Copy link
Collaborator Author

@ezra-varady ezra-varady Oct 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha just realized I misunderstood the question, apologies. I had assumed that plan tree walker was covering and just removed the nodes I have explicit checks for. I don't know enough about the remaining plan types to say if they might require checks. I will try to read more about them. Immediately T_IncrementalSort feels like it should probably be included so my guess is that I'm missing several others

Copy link
Contributor

@dqii dqii Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I was just asking if this was a deliberately curated subset or just what you ended up covering. I don't know if removing is the move, since I know the operator can end up in funny subqueries and joins.

Assuming you meant to refer to the plan walker I wrote -- there's around 30 plan nodes I believe, which you can see in nodes.h. I only covered the ones that I was able to generate tests for. It probably would be better for me to have added the other cases as well. I just couldn't figure out tests for the rest. For example, I couldn't trigger T_Join (joins ended up using a different plan node), and it ended up getting removed in Postgres 16. I'm not sure if it gets triggered at all in Postgres 15 -- I think it gets triggered in earlier Postgres versions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes sense. You got way more than I've been able to generate! Reading through the additional types it seems like most if not all of the ones that were left out can be safely ignored. I'll try going over the rest of them tomorrow. It's funny that T_Join is hard to trigger, you'd expect it would be relatively common.

One option I could implement pretty easily is to call the base mutator in the default so that we get some coverage on the more exotic types, this could be combined with a check that returns the path as is for ones we think are safe to ignore too.

src/hooks/op_rewrite.c Outdated Show resolved Hide resolved
@dqii
Copy link
Contributor

dqii commented Oct 2, 2023

Can you add some tests where enable_seqscan is on, seqscan is used with this operator, does not throw an error, and the results are properly sorted?

@ezra-varady
Copy link
Collaborator Author

I added a couple tests in and removed the note about the query mutator. I'll work more on figuring out what is and isn't necessary in the case statement

@dqii
Copy link
Contributor

dqii commented Oct 3, 2023

Pasting benchmarking results from CI/CD since they didn't auto-comment.

Would you be able to run benchmarking locally? Wondering if this is attributable to noise or if this does cause notable performance degradation.

------------------------------------------------------------------------------------------
| Metric                          |                Old |                New | Pct Change |
|---------------------------------|--------------------|--------------------|------------|
| Recall (after create)           |              0.740 |              0.740 |          - |
| Recall (after insert)           |              0.784 |              0.760 |     -3.06% |
| Select bulk TPS                 |            233.376 |            189.404 |    -18.84% |
| Select bulk latency (ms)        |             33.162 |             39.408 |    +18.83% |
| Select bulk latency (stddev ms) |              9.651 |             13.803 |    +43.02% |
| Create latency (ms)             |           1568.790 |           1580.457 |     +0.74% |
| Insert bulk TPS                 |             14.084 |             11.751 |    -16.56% |
| Insert bulk latency (ms)        |             70.990 |             85.077 |    +19.84% |
| Insert bulk latency (stddev ms) |              3.069 |              4.474 |    +45.78% |
| Disk usage (bytes)              |        6348800.000 |        6348800.000 |          - |
------------------------------------------------------------------------------------------

@ezra-varady
Copy link
Collaborator Author

this is what I get when I compare against main, I think it may be noise but I'm not sure

------------------------------------------------------------------------------------------
| metric                          |                old |                new | pct change |
|---------------------------------|--------------------|--------------------|------------|
| recall (after create)           |              0.740 |              0.740 |          - |
| recall (after insert)           |              0.760 |              0.760 |          - |
| select bulk tps                 |            120.028 |            113.468 |      -0.05 |
| select bulk latency (ms)        |             64.342 |             69.510 |      +0.08 |
| select bulk latency (stddev ms) |             14.934 |             13.342 |      -0.11 |
| create latency (ms)             |           1613.283 |           1639.485 |      +0.02 |
| insert bulk tps                 |              8.655 |              8.627 |      -0.00 |
| insert bulk latency (ms)        |            115.543 |            116.007 |      +0.00 |
| insert bulk latency (stddev ms) |              5.825 |              9.941 |      +0.71 |
| disk usage (bytes)              |        6348800.000 |        6348800.000 |          - |
------------------------------------------------------------------------------------------ 

@ezra-varady
Copy link
Collaborator Author

These commits should improve coverage. I added tests for the additional cases that are now handled and made some notes for cases I'm confident we can ignore. The extra tags on the default are a bit gross but it felt good to have some explicit reference to the types of plans that are ignored and why. I can remove them if that's preferable. There were some types I couldn't generate or wasn't sure we should support, I've included notes on them below. I may have missed some too

  • T_MergeJoin/T_MergeAppend I couldn't generate tests for these, it depends on both halves of the being presorted so I'm not sure if they can have ORDER BY
  • T_Gather/T_GatherMerge These trigger on scans of very large tables. I ran into an issue where the index scan is was always preferable in this case so I can't create tests may be safe to exclude
  • T_RecursiveUnion I can't get rewriting working when I tried. I think it's maybe ok not to support though, semantically source doesn't really have the index and an order by won't work in the internal select
explain WITH RECURSIVE source (counter) AS (
SELECT id, v from small_world where id = 1
UNION ALL
SELECT counter + 1, v from source where counter  < 10
)
SELECT * FROM source ORDER BY v <-> '{1,1,1}';
  • T_SubPlan already works without handling the case, so I didn't include it, (dropping the limit should make the internal loop into a seqscan if it's using the index, your return will be empty)
EXPLAIN SELECT * FROM small_world WHERE small_world.v NOT IN (SELECT small_world.v FROM small_world ORDER BY v <-> '{1,1,1}' LIMIT 5) ORDER BY v <-> '{1,1,1}';
  • T_CustomScan: my impression is that since it's custom we should not interfere but I'm not sure

  • T_NamedTupleStoreScan: I can't find any information on this

  • T_TidScan/T_TidRangeScan I think these can have an ORDER BY on a table with an index so they should probably be supported but they felt weird and internal so I wanted a second opinion

@var77 var77 mentioned this pull request Oct 5, 2023
Copy link
Contributor

@Ngalstyan4 Ngalstyan4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!
I will need to dig into this a little locally in an IDE before we merge, so may take a couple of days before I get back to it.

@@ -0,0 +1,15 @@
#ifndef _OP_REWRITE_H_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you follow the same convention for what the include guards should be. Look at other header files

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha my bad, pushed a fix

if(is_plan_node(node)) {
return (Node *)plan_tree_mutator((Plan *)node, ctx);
} else {
return expression_tree_mutator(node, operator_rewriting_mutator, ctx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are all nodes either plan nodes or expression tree nodes? are there other options?
if not, can we have an invariant assertion (ldb_invariant) in the else branch?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure, @dqii is more knowledgeable than I am about these systems, she may know. Otherwise I will try to figure out

Copy link
Collaborator Author

@ezra-varady ezra-varady Oct 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this thread seems to imply that the plan tree is made exclusively of exprs and plans. It's over a decade old but looking at some of the code in the executor I think it's probably accurate still. The executor doesn't check node types at all and the functions I've inspected all appear to expect expressions. Intuitively it sorta makes sense that the executor takes only exprs, statements and plans. Since statements are a separate tree I think it's safe to enforce the invariant

-- SampleScan
EXPLAIN (COSTS false) SELECT * FROM small_world TABLESAMPLE BERNOULLI (20) ORDER BY v <-> '{1,1,1}' ASC;
-- SetOpt/HashSetOp
EXPLAIN (COSTS false) (SELECT * FROM small_world ORDER BY v<-> '{1,0,1}' ASC )EXCEPT (SELECT * FROM small_world ORDER by v <-> '{1,1,1}' ASC LIMIT 5);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add another kind of test for these (described below)?

In sequential scans the use of <-> operator should be identical to corresponding _dist function.
So, we should be able to run both and have a helper function that compares the output line by line and ensure they are the same.

Can you add this helper function and use it for these tests?

@ezra-varady
Copy link
Collaborator Author

ezra-varady commented Oct 10, 2023

Not exactly sure what's going on in pg11. To generate this error you can run

INSERT INTO sift_base10k (id, v) (nextval('serial'), random_array(128, 0, 128));

However the following queries work just fine

SELECT random_array as v FROM  random_array(128, 0, 128) \gset
INSERT INTO sift_base10k (id, v) (nextval('serial'), :'v');

their query plans are identical

        QUERY PLAN
---------------------------------
Insert on sift_base10k
    ->    Result

However if I remove handling of Result nodes it does not fix it. Internally the plan tree of the failing query is much more complex because it includes the function. If I log the node tags the working query shows a single T_ModifyTable node is processed. The failing query shows the following (note the labels are mine and may be incorrect)

12 (T_Modify)
10 (T_Result)
222 (T_List)
119 (T_SubPlan)
222 (T_List)
146 (T_TargetEntry)
106 (T_Param)
42 (T_Agg)
27 (T_SubqueryScan)
222 (T_List)
146 (T_TargetEntry)
107 (T_AggRef)
222 (T_List)
146 (T_TargetEntry)
113 (T_OpExpr)

If I comment out any one of the following lines it still fails but if I comment out all three it works generating a result an agg and a subqueryscan. Notably the OpExpr is not actually getting rewritten afaict, so the tree shouldn't be modified at all making this issue especially odd

 34     plan->initPlan = (List *)operator_rewriting_mutator((Node *)plan->initPlan, context);
 35     // checking qual and target list at the end covers some edge cases, if you modify this leave them here
 36     plan->qual = (List *)operator_rewriting_mutator((Node *)plan->qual, context);
 37     plan->targetlist = (List *)operator_rewriting_mutator((Node *)plan->targetlist, context);

if I short circuit any of the individual node types it doesn't seem to do anything though I may have missed a couple. Short circuiting both Agg and Result nodes will fix it, but this is cutting off basically the entire tree. neither alone is sufficient.

While including the function in the query is what causes the issue, calling it alone works fine select random_array(128, 0, 128) and expanding it to select array_agg(random() *128) from generate_series(0, 127); doesn't generate errors though substituting this for the function in the insert breaks it.

This is the full query plan when I replace the function in the breaking query, You can see more of the nodes that are being printed but not all of them

Insert on sift_base10k
    InitPlan 1 (returns $0)
      ->     Aggregate
                 ->   Function scan on generate_series
    ->   Result

I'm missing something, but I've spent way more time than I planned on this so I'm writing it up and putting a pin in it for now. If anyone else has ideas please please please let me know

UPDATE:
It complains about a node with an invalid type, 0, 1024, and 219 show up also a very big number once in a while. 219 is the string type. If you look at it's contents it's <->. In fact a list holding this string is initialized to find the oid of the operator in ldb_get_operator_oids. This list is also freed. If you don't free it you get the same error, but the invalid node will always have type 0! Furthermore, if I list_free_deep the original list on the node, it recurses till it hits the max stack depth in what appears to be a circular reference error

@ezra-varady ezra-varady force-pushed the ezra/operator-rewriting branch from 4e95ed1 to 1d0c513 Compare October 26, 2023 19:43
@ezra-varady
Copy link
Collaborator Author

After going over ASan logs and bisecting to this commit my understanding of this issue is still incomplete. However the general shape is that in queries with multiple subplans, e.g. inserts with multiple rows each calling a function, the operator rewriting code visits the same nodes repeatedly. At the same time, postgres executes the query incrementally, executing subplans corresponding to e.g. plgpgsql functions as they are encountered. These functions are traversed and executed in their own memory context which lives only as long as the subplan's execution and is then destroyed. Because expression_tree_ mutator makes copies of List nodes when it visits one of these nodes in the context of the subplan it replaces the global node with one whose lifetime is only that of the subplan's execution. This causes a use after free when the same node is revisited in a later subplan. The commits that fix the issue alter the way that generate_series cost is estimated, suggesting that while we were only triggering this in pg11, it's possible that it could be triggered in later versions given the right conditions, hence my applying the patches regardless of version. By tying the lifetime of objects allocated within the mutator to the lifetime of the root plan we can avoid this issue. It is still not totally clear to me what conditions trigger this, there's supposition contained here. Notably short circuiting on subplans does not fix the issue, so there's something missing from this model, however the fixes should work regardless

@Ngalstyan4 Ngalstyan4 merged commit 1c2ff53 into lanterndata:main Nov 18, 2023
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite distance operator <-> into the specific distance function if existing index won't trigger
3 participants