Improve master_commit_red query performance #6174

clee2000 · 2025-01-15T18:36:17Z

Pre-filter the commits so that the workflow_job and workflow_run tables can be filtered on them later.

This improves speed for all time ranges up to 1 year; I did not check beyond that.
I believe the memory used is about the same, but it scans more rows for some reason.
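
As a rough illustration of the pre-filtering idea (not the actual query), the sketch below uses Python's built-in sqlite3 with made-up tables and rows. The simplified schema (push, workflow_run, workflow_job with sha, ts, run_id columns) is an assumption, and SQLite has no FINAL, so this shows only the shape of the rewrite: pick the commits in the time range first, restrict the runs to those commits, then join the jobs, and check that both forms return the same rows.

```python
import sqlite3

# Toy in-memory schema; names and rows are hypothetical, not the real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE push (sha TEXT, ts INTEGER);
CREATE TABLE workflow_run (id INTEGER, sha TEXT, name TEXT);
CREATE TABLE workflow_job (id INTEGER, run_id INTEGER, conclusion TEXT);
INSERT INTO push VALUES ('a1', 100), ('b2', 200), ('c3', 300);
INSERT INTO workflow_run VALUES (1, 'a1', 'pull'), (2, 'b2', 'trunk'), (3, 'c3', 'pull');
INSERT INTO workflow_job VALUES (10, 1, 'success'), (11, 2, 'failure'), (12, 3, 'success');
""")

# Original shape: join all three tables, then filter on the commit time.
JOIN_THEN_FILTER = """
SELECT p.sha, r.name, j.conclusion
FROM workflow_job j
JOIN workflow_run r ON r.id = j.run_id
JOIN push p ON p.sha = r.sha
WHERE p.ts BETWEEN ? AND ?
ORDER BY p.sha
"""

# Rewritten shape: select the commits first, keep only runs on those commits,
# then join the jobs against the already-reduced set of runs.
PREFILTER_THEN_JOIN = """
WITH commits AS (
    SELECT sha FROM push WHERE ts BETWEEN ? AND ?
),
all_runs AS (
    SELECT r.id, r.name, r.sha
    FROM workflow_run r
    WHERE r.sha IN (SELECT sha FROM commits)
)
SELECT r.sha, r.name, j.conclusion
FROM workflow_job j
JOIN all_runs r ON r.id = j.run_id
ORDER BY r.sha
"""

def run(query, start, end):
    """Execute a query for the given time range and return all rows."""
    return conn.execute(query, (start, end)).fetchall()

old_rows = run(JOIN_THEN_FILTER, 150, 300)
new_rows = run(PREFILTER_THEN_JOIN, 150, 300)
print(old_rows == new_rows, new_rows)
```

Whether the second form is actually faster depends on the engine and the data; the sketch only checks that the two forms agree on this toy data.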

This is the query behind this chart on the metrics page.

Tentative wins, sample size 3

+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+
| Test | Avg Time | Base Time | Time Change | % Time Change |   Avg Mem    |    Base Mem   |   Mem Change   | % Mem Change |
+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+
|  0   |   1163   |   23117   |    -21954   |      -95      | 157241294.6  | 1539051189.75 | -1381809895.15 |     -90      |
|  1   |   1281   |   14509   |    -13228   |      -91      |  350569557   |  1585082143.8 | -1234512586.8  |     -78      |
|  2   |   8704   |   22954   |    -14250   |      -62      | 1774302065.6 |   3929546293  | -2155244227.4  |     -55      |
+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+
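
The change columns can be re-derived from the averages and baselines. A small sketch, assuming change = avg - base and % change = 100 * change / base rounded to the nearest integer (the PR does not state the formulas, but these reproduce the table):

```python
# (test, avg_time, base_time, avg_mem, base_mem), copied from the table above
rows = [
    (0, 1163, 23117, 157241294.6, 1539051189.75),
    (1, 1281, 14509, 350569557, 1585082143.8),
    (2, 8704, 22954, 1774302065.6, 3929546293),
]

def change(avg, base):
    """Return (absolute change, percent change rounded to an integer)."""
    delta = avg - base
    return delta, round(100 * delta / base)

summary = []
for test, avg_t, base_t, avg_m, base_m in rows:
    t_delta, t_pct = change(avg_t, base_t)
    m_delta, m_pct = change(avg_m, base_m)
    summary.append((test, t_delta, t_pct, m_pct))
    print(f"test {test}: time {t_delta} ({t_pct}%), mem {m_delta:.2f} ({m_pct}%)")
```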

huydhn · 2025-01-17T01:15:19Z

torchci/clickhouse_queries/master_commit_red/query.sql

-    workflow_job job FINAL
-    JOIN workflow_run FINAL ON workflow_run.id = workflow_job.run_id
-    JOIN push FINAL ON workflow_run.head_commit.'id' = push.head_commit.'id'
+    default.workflow_job job final join all_runs workflow_run on workflow_run.id = workflow_job.run_id


This reads a bit confusingly IMO. If I read it correctly, the all_runs table is aliased as workflow_run, which is valid syntax. But I always think of workflow_run as the actual workflow_run table rather than an alias, so it feels easier just to call it all_runs.

huydhn

We can only see the improvement on https://hud.pytorch.org/query_execution_metrics after this lands. I wonder if there is a way to get the information for the new query at PR time. Let's chat more on this when you're back

ZainRizvi

Nice discovery that applying the table filters separately before doing the join yields perf improvements.

To show how much of a difference it really made, could you update the PR description with the stats (e.g. old/new duration, memory usage)?

ZainRizvi · 2025-01-22T16:36:31Z

torchci/clickhouse_queries/master_commit_red/query.sql

+    push.head_commit.'timestamp' as time,
+    push.head_commit.'id' as sha
+  from
+    push final


Do pushes ever get updated? Wondering if we really need the perf hit from FINAL.

ZainRizvi · 2025-01-22T16:41:29Z

torchci/clickhouse_queries/master_commit_red/query.sql

+    workflow_run.name as name,
+    commit.time as time
+  from
+    workflow_run final


Another optimization opportunity:

Do we really need to know when a job is pending? If not, instead of FINAL we can filter on a terminal conclusion.

I'm going to leave it as is because I don't want to change the results from the old query.
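
To make the trade-off concrete, here is a toy Python model, not ClickHouse itself. It assumes the table can hold several row versions per job id and that FINAL collapses them to the most recent version per id; the row layout and the set of terminal conclusions are hypothetical. Under those assumptions, filtering on a terminal conclusion drops pending jobs, which is one way the two approaches could return different results.

```python
# Hypothetical row versions: (job_id, version, conclusion); "" means still pending.
rows = [
    (1, 1, ""), (1, 2, "success"),
    (2, 1, ""), (2, 2, "failure"),
    (3, 1, ""),  # job 3 has not finished yet
]

# Assumed set of terminal conclusions, for illustration only.
TERMINAL = {"success", "failure", "cancelled"}

def final(rows):
    """Model of FINAL: keep only the highest-version row per job id."""
    latest = {}
    for job_id, version, conclusion in rows:
        if job_id not in latest or version > latest[job_id][0]:
            latest[job_id] = (version, conclusion)
    return {job_id: conclusion for job_id, (_, conclusion) in latest.items()}

def terminal_only(rows):
    """Alternative: no dedup, keep rows whose conclusion is terminal.

    Assumes at most one terminal row exists per job id.
    """
    return {job_id: c for job_id, _, c in rows if c in TERMINAL}

with_final = final(rows)
with_filter = terminal_only(rows)
print(with_final)   # includes the pending job 3
print(with_filter)  # pending job 3 is missing
```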

clee2000 added 2 commits January 14, 2025 10:22

tc

bcacd0b

tc

ac3bafe

facebook-github-bot added the CLA Signed label Jan 15, 2025

tc

471233d

clee2000 added 2 commits January 15, 2025 11:09

tc

4c289f5

tc

3a1ae76

clee2000 marked this pull request as ready for review January 15, 2025 19:27

clee2000 requested a review from a team January 15, 2025 19:28

huydhn reviewed Jan 17, 2025

huydhn approved these changes Jan 17, 2025

ZainRizvi approved these changes Jan 22, 2025

tc

f8aee57

clee2000 merged commit f3c27c3 into main Jan 22, 2025
6 checks passed

clee2000 deleted the csl/master_commit_red branch January 22, 2025 20:46
