Avoid nested loop in query planner execution #612

slin30 · 2023-09-18T22:16:37Z

Move relation name not equals from outer SELECT where clause to dependency CTE
- Seems to more reliably push query planner to avoid a nested loop
- Should preserve original intent, i.e. to exclude self-references from final result
- Execution plan suggests this reduces the row estimate for the outer SELECT
- Real-world testing on internal company cluster under conditions where nested loop is used for original, shows that the new approach continues to perform and behave as expected (no nested loop)

resolves #609

Problem

copied from issue Current Behavior

When running the statement in relations.sql, our Redshift cluster query planner will, seemingly at random, expect to use a nested loop join. This reflects in extended execution times, adding anywhere from 1-6 minutes to the start of a run/step.

Solution

copied from issue Additional Context

The original logic that threw the planner for a loop (no pun intended) was the WHERE != in the outer statement. Removing this or setting it to =, or pushing it into the select as an = flag, works fine. For the latter, if I subsequently attempt to filter on the flag (with NOT), this predictably ends up with the same nested loop execution plan -- it's the attempt to filter on not equals in the outer select that triggers the nested loop (when the anomaly manifests).

My adjustment simply pushes the evaluation up to the dependency CTE. My thinking was that in doing so, the planner should choose a more efficient path under more conditions.

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
- NOT run on my end -- simply updated the sole relations.sql file
- Update: while this is not a general test, the updated version has been running in our internal dbt implementation (via local macro override) as of Sept 22, 2023. This (a) resolved the recurrence of the nested loop plan and (b) has been running without issue since. Please see comment in issue.
This PR includes tests, or tests are not required/relevant for this PR
- Not aware if unit tests are supplied for this specific macro, but if they are, I've not run them.
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX
- This PR is for one specific macro, and it definitely needs to be reviewed and tested before merging.

Supplemental

Please see local benchmark comparing the original versus new, 10 runs of each with cache=off, a new session for each run, and a one-second pause between each run. This comparison is under normal conditions, i.e. the query planner does not expect to use a nested loop for the original. It should be a fair comparison (for our environment) of the performance delta, if any, for the two versions (my conclusion is that there is no performance difference).

Also, a row count of the results for each run, across the two groups. This is a light QC check on result equality across runs and between groups. Note y-axes are zoomed in to highlight any diffs and are relative to each metric.

- Move relation name not equals from outer SELECT `where` clause to dependency CTE - Seems to more reliably push query planner to avoid a nested loop - Should preserve original intent, i.e. to exclude self-references from final result - Execution plan suggests this reduces the row estimate for the outer SELECT - Real-world testing on internal company cluster under conditions where nested loop is used for original, shows that the new approach continues to perform and behave as expected (no nested loop)

dbeatty10 · 2023-09-22T17:55:26Z

@slin30 this is looking awesome! 🤩

Could you add a changelog entry via changie new?

slin30 · 2023-09-22T18:02:04Z

@slin30 this is looking awesome! 🤩

Could you add a changelog entry via changie new?

Thanks, @dbeatty10 -- let me go through the instructions in more detail, but at first glance it seems straightforward. Just need to find some focus time. Let me target before this coming Monday.

slin30 · 2023-09-23T13:16:25Z

@dbeatty10 hopefully ab3b41f does the trick. First time using changie and it's quite neat. Might incorporate into our internal workflow.

* Adjust not equals application order - Move relation name not equals from outer SELECT `where` clause to dependency CTE - Seems to more reliably push query planner to avoid a nested loop - Should preserve original intent, i.e. to exclude self-references from final result - Execution plan suggests this reduces the row estimate for the outer SELECT - Real-world testing on internal company cluster under conditions where nested loop is used for original, shows that the new approach continues to perform and behave as expected (no nested loop) * Add changie entry

slin30 requested a review from a team as a code owner September 18, 2023 22:16

slin30 requested a review from Fleid September 18, 2023 22:16

cla-bot bot added the cla:yes label Sep 18, 2023

slin30 mentioned this pull request Sep 22, 2023

[ADAP-900] [Bug] Nested loop in query planner execution of relations.sql #609

Closed

2 tasks

dbeatty10 added the ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering label Sep 22, 2023

dbeatty10 changed the title ~~Adjust not equals application order~~ Avoid nested loop in query planner execution Sep 22, 2023

Add changie entry

ab3b41f

colin-rogers-dbt approved these changes Sep 25, 2023

View reviewed changes

colin-rogers-dbt added ok to test and removed ok to test labels Sep 25, 2023

colin-rogers-dbt closed this Sep 25, 2023

colin-rogers-dbt reopened this Sep 25, 2023

colin-rogers-dbt merged commit 9385b49 into dbt-labs:main Sep 25, 2023
55 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid nested loop in query planner execution #612

Avoid nested loop in query planner execution #612

slin30 commented Sep 18, 2023 •

edited

Loading

dbeatty10 commented Sep 22, 2023

slin30 commented Sep 22, 2023

slin30 commented Sep 23, 2023

Avoid nested loop in query planner execution #612

Avoid nested loop in query planner execution #612

Conversation

slin30 commented Sep 18, 2023 • edited Loading

Problem

Solution

Checklist

Supplemental

dbeatty10 commented Sep 22, 2023

slin30 commented Sep 22, 2023

slin30 commented Sep 23, 2023

slin30 commented Sep 18, 2023 •

edited

Loading