add support for <-> operator rewriting #189

ezra-varady · 2023-10-02T19:53:36Z

This adds a hook that rewrites the lantern vector distance operator to the correct function in the case that a sequential scan is done on a table with an index on it

Fixes #148

dqii · 2023-10-02T20:33:04Z

src/hooks/op_rewrite.c

+{
+    check_stack_depth();
+
+    switch(nodeTag(plan)) {


Question: Are these all of the cases that need to be covered?

I'm not familiar enough with all of them to say for sure. I removed some I know are definitely unnecessary. The remaining ones I couldn't definitively rule out so I was hoping to get some additional eyes on them

some very rough and possibly erroneous notes.

SubqueryScan I think this is required, since a subquery could include the case we're trying to address

CteScan Unecessary if the CTE doesn't use the index, I'm not sure what the expected behavior is

Join One of both of the tables being joined can be generated by a select so I think this is required

Agg I think that this can cover what is essentially a subquery e.g. ANY(SELECT ....)

Group This can probably be removed, I wasn't sure if it could have a subquery with an order by but at least for a single query I think it's 1 or the other it is possible this could end up parenting a CTE though I don't know enough to say

Sort I think ORDDER BY can generate a sort meriting its inclusion, but I'm not sure

UniqueThis can probably be removed, I don't think you can call unique on an expression, unless you can do unique(v <-> ...)

NestLoop loops can be seqscans, I think this is required

Result I couldn't find much information on this plan type, and so included it to be safe

Limit When I looked at this it seemed like the limit becomes the root node, so I think we need this

AppendNot 100% sure where this will end up in the tree, but I think you can have a union of two scans so this should be checked

Haha just realized I misunderstood the question, apologies. I had assumed that plan tree walker was covering and just removed the nodes I have explicit checks for. I don't know enough about the remaining plan types to say if they might require checks. I will try to read more about them. Immediately T_IncrementalSort feels like it should probably be included so my guess is that I'm missing several others

No worries, I was just asking if this was a deliberately curated subset or just what you ended up covering. I don't know if removing is the move, since I know the operator can end up in funny subqueries and joins.

Assuming you meant to refer to the plan walker I wrote -- there's around 30 plan nodes I believe, which you can see in nodes.h. I only covered the ones that I was able to generate tests for. It probably would be better for me to have added the other cases as well. I just couldn't figure out tests for the rest. For example, I couldn't trigger T_Join (joins ended up using a different plan node), and it ended up getting removed in Postgres 16. I'm not sure if it gets triggered at all in Postgres 15 -- I think it gets triggered in earlier Postgres versions.

Ah, that makes sense. You got way more than I've been able to generate! Reading through the additional types it seems like most if not all of the ones that were left out can be safely ignored. I'll try going over the rest of them tomorrow. It's funny that T_Join is hard to trigger, you'd expect it would be relatively common.

One option I could implement pretty easily is to call the base mutator in the default so that we get some coverage on the more exotic types, this could be combined with a check that returns the path as is for ones we think are safe to ignore too.

src/hooks/op_rewrite.c

dqii · 2023-10-02T20:46:31Z

Can you add some tests where enable_seqscan is on, seqscan is used with this operator, does not throw an error, and the results are properly sorted?

ezra-varady · 2023-10-02T22:00:22Z

I added a couple tests in and removed the note about the query mutator. I'll work more on figuring out what is and isn't necessary in the case statement

dqii · 2023-10-03T06:18:54Z

Pasting benchmarking results from CI/CD since they didn't auto-comment.

Would you be able to run benchmarking locally? Wondering if this is attributable to noise or if this does cause notable performance degradation.

------------------------------------------------------------------------------------------
| Metric                          |                Old |                New | Pct Change |
|---------------------------------|--------------------|--------------------|------------|
| Recall (after create)           |              0.740 |              0.740 |          - |
| Recall (after insert)           |              0.784 |              0.760 |     -3.06% |
| Select bulk TPS                 |            233.376 |            189.404 |    -18.84% |
| Select bulk latency (ms)        |             33.162 |             39.408 |    +18.83% |
| Select bulk latency (stddev ms) |              9.651 |             13.803 |    +43.02% |
| Create latency (ms)             |           1568.790 |           1580.457 |     +0.74% |
| Insert bulk TPS                 |             14.084 |             11.751 |    -16.56% |
| Insert bulk latency (ms)        |             70.990 |             85.077 |    +19.84% |
| Insert bulk latency (stddev ms) |              3.069 |              4.474 |    +45.78% |
| Disk usage (bytes)              |        6348800.000 |        6348800.000 |          - |
------------------------------------------------------------------------------------------

ezra-varady · 2023-10-03T07:49:59Z

this is what I get when I compare against main, I think it may be noise but I'm not sure

------------------------------------------------------------------------------------------
| metric                          |                old |                new | pct change |
|---------------------------------|--------------------|--------------------|------------|
| recall (after create)           |              0.740 |              0.740 |          - |
| recall (after insert)           |              0.760 |              0.760 |          - |
| select bulk tps                 |            120.028 |            113.468 |      -0.05 |
| select bulk latency (ms)        |             64.342 |             69.510 |      +0.08 |
| select bulk latency (stddev ms) |             14.934 |             13.342 |      -0.11 |
| create latency (ms)             |           1613.283 |           1639.485 |      +0.02 |
| insert bulk tps                 |              8.655 |              8.627 |      -0.00 |
| insert bulk latency (ms)        |            115.543 |            116.007 |      +0.00 |
| insert bulk latency (stddev ms) |              5.825 |              9.941 |      +0.71 |
| disk usage (bytes)              |        6348800.000 |        6348800.000 |          - |
------------------------------------------------------------------------------------------

ezra-varady · 2023-10-05T05:15:50Z

These commits should improve coverage. I added tests for the additional cases that are now handled and made some notes for cases I'm confident we can ignore. The extra tags on the default are a bit gross but it felt good to have some explicit reference to the types of plans that are ignored and why. I can remove them if that's preferable. There were some types I couldn't generate or wasn't sure we should support, I've included notes on them below. I may have missed some too

T_MergeJoin/T_MergeAppend I couldn't generate tests for these, it depends on both halves of the being presorted so I'm not sure if they can have ORDER BY
T_Gather/T_GatherMerge These trigger on scans of very large tables. I ran into an issue where the index scan is was always preferable in this case so I can't create tests may be safe to exclude
T_RecursiveUnion I can't get rewriting working when I tried. I think it's maybe ok not to support though, semantically source doesn't really have the index and an order by won't work in the internal select

explain WITH RECURSIVE source (counter) AS (
SELECT id, v from small_world where id = 1
UNION ALL
SELECT counter + 1, v from source where counter  < 10
)
SELECT * FROM source ORDER BY v <-> '{1,1,1}';

T_SubPlan already works without handling the case, so I didn't include it, (dropping the limit should make the internal loop into a seqscan if it's using the index, your return will be empty)

EXPLAIN SELECT * FROM small_world WHERE small_world.v NOT IN (SELECT small_world.v FROM small_world ORDER BY v <-> '{1,1,1}' LIMIT 5) ORDER BY v <-> '{1,1,1}';

T_CustomScan: my impression is that since it's custom we should not interfere but I'm not sure
T_NamedTupleStoreScan: I can't find any information on this
T_TidScan/T_TidRangeScan I think these can have an ORDER BY on a table with an index so they should probably be supported but they felt weird and internal so I wanted a second opinion

Ngalstyan4

This is great!
I will need to dig into this a little locally in an IDE before we merge, so may take a couple of days before I get back to it.

Ngalstyan4 · 2023-10-04T18:01:05Z

src/hooks/op_rewrite.h

@@ -0,0 +1,15 @@
+#ifndef _OP_REWRITE_H_


Make sure you follow the same convention for what the include guards should be. Look at other header files

Haha my bad, pushed a fix

Ngalstyan4 · 2023-10-04T18:38:14Z

src/hooks/op_rewrite.c

+    if(is_plan_node(node)) {
+        return (Node *)plan_tree_mutator((Plan *)node, ctx);
+    } else {
+        return expression_tree_mutator(node, operator_rewriting_mutator, ctx);


are all nodes either plan nodes or expression tree nodes? are there other options?
if not, can we have an invariant assertion (ldb_invariant) in the else branch?

Hmm, I'm not sure, @dqii is more knowledgeable than I am about these systems, she may know. Otherwise I will try to figure out

this thread seems to imply that the plan tree is made exclusively of exprs and plans. It's over a decade old but looking at some of the code in the executor I think it's probably accurate still. The executor doesn't check node types at all and the functions I've inspected all appear to expect expressions. Intuitively it sorta makes sense that the executor takes only exprs, statements and plans. Since statements are a separate tree I think it's safe to enforce the invariant

Ngalstyan4 · 2023-10-09T16:35:49Z

test/sql/hnsw_select.sql

+-- SampleScan
+EXPLAIN (COSTS false) SELECT * FROM small_world TABLESAMPLE BERNOULLI (20) ORDER BY v <-> '{1,1,1}' ASC;
+-- SetOpt/HashSetOp
+EXPLAIN (COSTS false) (SELECT * FROM small_world ORDER BY v<-> '{1,0,1}' ASC )EXCEPT (SELECT * FROM small_world ORDER by v <-> '{1,1,1}' ASC LIMIT 5);


Can you add another kind of test for these (described below)?

In sequential scans the use of <-> operator should be identical to corresponding _dist function.
So, we should be able to run both and have a helper function that compares the output line by line and ensure they are the same.

Can you add this helper function and use it for these tests?

ezra-varady · 2023-10-10T07:37:25Z

Not exactly sure what's going on in pg11. To generate this error you can run

INSERT INTO sift_base10k (id, v) (nextval('serial'), random_array(128, 0, 128));

However the following queries work just fine

SELECT random_array as v FROM  random_array(128, 0, 128) \gset
INSERT INTO sift_base10k (id, v) (nextval('serial'), :'v');

their query plans are identical

        QUERY PLAN
---------------------------------
Insert on sift_base10k
    ->    Result

However if I remove handling of Result nodes it does not fix it. Internally the plan tree of the failing query is much more complex because it includes the function. If I log the node tags the working query shows a single T_ModifyTable node is processed. The failing query shows the following (note the labels are mine and may be incorrect)

12 (T_Modify)
10 (T_Result)
222 (T_List)
119 (T_SubPlan)
222 (T_List)
146 (T_TargetEntry)
106 (T_Param)
42 (T_Agg)
27 (T_SubqueryScan)
222 (T_List)
146 (T_TargetEntry)
107 (T_AggRef)
222 (T_List)
146 (T_TargetEntry)
113 (T_OpExpr)

If I comment out any one of the following lines it still fails but if I comment out all three it works generating a result an agg and a subqueryscan. Notably the OpExpr is not actually getting rewritten afaict, so the tree shouldn't be modified at all making this issue especially odd

 34     plan->initPlan = (List *)operator_rewriting_mutator((Node *)plan->initPlan, context);
 35     // checking qual and target list at the end covers some edge cases, if you modify this leave them here
 36     plan->qual = (List *)operator_rewriting_mutator((Node *)plan->qual, context);
 37     plan->targetlist = (List *)operator_rewriting_mutator((Node *)plan->targetlist, context);

if I short circuit any of the individual node types it doesn't seem to do anything though I may have missed a couple. Short circuiting both Agg and Result nodes will fix it, but this is cutting off basically the entire tree. neither alone is sufficient.

While including the function in the query is what causes the issue, calling it alone works fine select random_array(128, 0, 128) and expanding it to select array_agg(random() *128) from generate_series(0, 127); doesn't generate errors though substituting this for the function in the insert breaks it.

This is the full query plan when I replace the function in the breaking query, You can see more of the nodes that are being printed but not all of them

Insert on sift_base10k
    InitPlan 1 (returns $0)
      ->     Aggregate
                 ->   Function scan on generate_series
    ->   Result

I'm missing something, but I've spent way more time than I planned on this so I'm writing it up and putting a pin in it for now. If anyone else has ideas please please please let me know

UPDATE:
It complains about a node with an invalid type, 0, 1024, and 219 show up also a very big number once in a while. 219 is the string type. If you look at it's contents it's <->. In fact a list holding this string is initialized to find the oid of the operator in ldb_get_operator_oids. This list is also freed. If you don't free it you get the same error, but the invalid node will always have type 0! Furthermore, if I list_free_deep the original list on the node, it recurses till it hits the max stack depth in what appears to be a circular reference error

update tests

… operator rewriting

ezra-varady · 2023-10-26T20:55:24Z

After going over ASan logs and bisecting to this commit my understanding of this issue is still incomplete. However the general shape is that in queries with multiple subplans, e.g. inserts with multiple rows each calling a function, the operator rewriting code visits the same nodes repeatedly. At the same time, postgres executes the query incrementally, executing subplans corresponding to e.g. plgpgsql functions as they are encountered. These functions are traversed and executed in their own memory context which lives only as long as the subplan's execution and is then destroyed. Because expression_tree_ mutator makes copies of List nodes when it visits one of these nodes in the context of the subplan it replaces the global node with one whose lifetime is only that of the subplan's execution. This causes a use after free when the same node is revisited in a later subplan. The commits that fix the issue alter the way that generate_series cost is estimated, suggesting that while we were only triggering this in pg11, it's possible that it could be triggered in later versions given the right conditions, hence my applying the patches regardless of version. By tying the lifetime of objects allocated within the mutator to the lifetime of the root plan we can avoid this issue. It is still not totally clear to me what conditions trigger this, there's supposition contained here. Notably short circuiting on subplans does not fix the issue, so there's something missing from this model, however the fixes should work regardless

dqii reviewed Oct 2, 2023

View reviewed changes

src/hooks/op_rewrite.c Outdated Show resolved Hide resolved

var77 mentioned this pull request Oct 5, 2023

Add parallel tests #192

Merged

Ngalstyan4 reviewed Oct 6, 2023

View reviewed changes

Ngalstyan4 reviewed Oct 9, 2023

View reviewed changes

Ngalstyan4 mentioned this pull request Oct 13, 2023

Changes to address issue 148 #185

Closed

ezra-varady added 18 commits October 26, 2023 09:02

rewriting but not aware of index column

cd9b8c1

working operator rewriting

54eba82

only rewrite if an index is present

0831fa5

update tests

pare down footprint of changes a bit

fc3a63a

cleanup merge errors in tests

9718bb7

fix version guarded stuff

a802f48

clang format

9194842

more pgv11 compat includes and some additional cleanup

58e0b96

remove comment, add tests

352f065

improve coverage, add some tests

442aa50

fixup tests

c63a49a

add guard to case

cf4a683

clang-format

7d49cdb

fix header include guard

69e5250

check that results are the same when operators are rewritten

5d49fa5

add expected output

64ca0e8

possible fix

fcddebd

add parallel tests to sanitizer container. Fix assertion violation in…

1d0c513

… operator rewriting

ezra-varady force-pushed the ezra/operator-rewriting branch from 4e95ed1 to 1d0c513 Compare October 26, 2023 19:43

fix sanitizers not triggering

5a9949d

Ngalstyan4 merged commit 1c2ff53 into lanterndata:main Nov 18, 2023
32 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for <-> operator rewriting #189

add support for <-> operator rewriting #189

ezra-varady commented Oct 2, 2023 •

edited by Ngalstyan4

Loading

dqii Oct 2, 2023

ezra-varady Oct 2, 2023 •

edited

Loading

ezra-varady Oct 2, 2023 •

edited

Loading

dqii Oct 3, 2023 •

edited

Loading

ezra-varady Oct 3, 2023

dqii commented Oct 2, 2023

ezra-varady commented Oct 2, 2023

dqii commented Oct 3, 2023 •

edited

Loading

ezra-varady commented Oct 3, 2023

ezra-varady commented Oct 5, 2023

Ngalstyan4 left a comment

Ngalstyan4 Oct 4, 2023

ezra-varady Oct 7, 2023

Ngalstyan4 Oct 4, 2023

ezra-varady Oct 7, 2023

ezra-varady Oct 9, 2023 •

edited

Loading

Ngalstyan4 Oct 9, 2023

ezra-varady commented Oct 10, 2023 •

edited

Loading

ezra-varady commented Oct 26, 2023

add support for <-> operator rewriting #189

add support for <-> operator rewriting #189

Conversation

ezra-varady commented Oct 2, 2023 • edited by Ngalstyan4 Loading

dqii Oct 2, 2023

Choose a reason for hiding this comment

ezra-varady Oct 2, 2023 • edited Loading

Choose a reason for hiding this comment

ezra-varady Oct 2, 2023 • edited Loading

Choose a reason for hiding this comment

dqii Oct 3, 2023 • edited Loading

Choose a reason for hiding this comment

ezra-varady Oct 3, 2023

Choose a reason for hiding this comment

dqii commented Oct 2, 2023

ezra-varady commented Oct 2, 2023

dqii commented Oct 3, 2023 • edited Loading

ezra-varady commented Oct 3, 2023

ezra-varady commented Oct 5, 2023

Ngalstyan4 left a comment

Choose a reason for hiding this comment

Ngalstyan4 Oct 4, 2023

Choose a reason for hiding this comment

ezra-varady Oct 7, 2023

Choose a reason for hiding this comment

Ngalstyan4 Oct 4, 2023

Choose a reason for hiding this comment

ezra-varady Oct 7, 2023

Choose a reason for hiding this comment

ezra-varady Oct 9, 2023 • edited Loading

Choose a reason for hiding this comment

Ngalstyan4 Oct 9, 2023

Choose a reason for hiding this comment

ezra-varady commented Oct 10, 2023 • edited Loading

ezra-varady commented Oct 26, 2023

ezra-varady commented Oct 2, 2023 •

edited by Ngalstyan4

Loading

ezra-varady Oct 2, 2023 •

edited

Loading

ezra-varady Oct 2, 2023 •

edited

Loading

dqii Oct 3, 2023 •

edited

Loading

dqii commented Oct 3, 2023 •

edited

Loading

ezra-varady Oct 9, 2023 •

edited

Loading

ezra-varady commented Oct 10, 2023 •

edited

Loading