Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve master_commit_red query performance #6174

Merged
merged 6 commits into from
Jan 22, 2025
Merged

Conversation

clee2000
Copy link
Contributor

@clee2000 clee2000 commented Jan 15, 2025

Pre filter the commits so we can filter the workflow job and workflow run tables on it later

This improves speed for all time ranges up to 1 year. I did not check beyond that
I believe the memory used is about the same, but it scans more rows for some reason

This is the query behind this chart
image
on the metrics page

Tentative wins, sample size 3

+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+
| Test | Avg Time | Base Time | Time Change | % Time Change |   Avg Mem    |    Base Mem   |   Mem Change   | % Mem Change |
+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+
|  0   |   1163   |   23117   |    -21954   |      -95      | 157241294.6  | 1539051189.75 | -1381809895.15 |     -90      |
|  1   |   1281   |   14509   |    -13228   |      -91      |  350569557   |  1585082143.8 | -1234512586.8  |     -78      |
|  2   |   8704   |   22954   |    -14250   |      -62      | 1774302065.6 |   3929546293  | -2155244227.4  |     -55      |
+------+----------+-----------+-------------+---------------+--------------+---------------+----------------+--------------+

Copy link

vercel bot commented Jan 15, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview Jan 22, 2025 7:55pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 15, 2025
@clee2000 clee2000 marked this pull request as ready for review January 15, 2025 19:27
@clee2000 clee2000 requested a review from a team January 15, 2025 19:28
workflow_job job FINAL
JOIN workflow_run FINAL ON workflow_run.id = workflow_job.run_id
JOIN push FINAL ON workflow_run.head_commit.'id' = push.head_commit.'id'
default.workflow_job job final join all_runs workflow_run on workflow_run.id = workflow_job.run_id
Copy link
Contributor

@huydhn huydhn Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads a bit confusing IMO. If I read it correctly, the all_runs table has the alias as workflow_run, which has the correct syntax. But I always think of workflow_run as the workflow_run table instead of having it as an alias. So, it feels easier just to call it all_runs

Copy link
Contributor

@huydhn huydhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can only see the improvement on https://hud.pytorch.org/query_execution_metrics after this lands. I wonder if there is a way to get the information for the new query at PR time. Let's chat more on this when you're back

Copy link
Contributor

@ZainRizvi ZainRizvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice discovery that having separating table filters before doing the join yields perf improvements.

To show how much of a difference it really made could you update the PR description with the stats? (e.g. old/new duration, memory usage)

push.head_commit.'timestamp' as time,
push.head_commit.'id' as sha
from
push final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do pushes ever get updated? Wondering if we really need the perf hit from FINAL

workflow_run.name as name,
commit.time as time
from
workflow_run final
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another optimization opportunity:

Do we really need to know when a job is pending? If not, instead of FINAL we can filter on a terminal conclusion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave it as is because I don't want to change the results from the old query

@clee2000 clee2000 merged commit f3c27c3 into main Jan 22, 2025
6 checks passed
@clee2000 clee2000 deleted the csl/master_commit_red branch January 22, 2025 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants