-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve master_commit_red query performance #6174
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
workflow_job job FINAL | ||
JOIN workflow_run FINAL ON workflow_run.id = workflow_job.run_id | ||
JOIN push FINAL ON workflow_run.head_commit.'id' = push.head_commit.'id' | ||
default.workflow_job job final join all_runs workflow_run on workflow_run.id = workflow_job.run_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reads a bit confusing IMO. If I read it correctly, the all_runs
table has the alias as workflow_run
, which has the correct syntax. But I always think of workflow_run
as the workflow_run
table instead of having it as an alias. So, it feels easier just to call it all_runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can only see the improvement on https://hud.pytorch.org/query_execution_metrics after this lands. I wonder if there is a way to get the information for the new query at PR time. Let's chat more on this when you're back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice discovery that having separating table filters before doing the join yields perf improvements.
To show how much of a difference it really made could you update the PR description with the stats? (e.g. old/new duration, memory usage)
push.head_commit.'timestamp' as time, | ||
push.head_commit.'id' as sha | ||
from | ||
push final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do pushes ever get updated? Wondering if we really need the perf hit from FINAL
workflow_run.name as name, | ||
commit.time as time | ||
from | ||
workflow_run final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another optimization opportunity:
Do we really need to know when a job is pending? If not, instead of FINAL we can filter on a terminal conclusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to leave it as is because I don't want to change the results from the old query
Pre filter the commits so we can filter the workflow job and workflow run tables on it later
This improves speed for all time ranges up to 1 year. I did not check beyond that
I believe the memory used is about the same, but it scans more rows for some reason
This is the query behind this chart
![image](https://private-user-images.githubusercontent.com/44682903/403552976-e00c18dc-a87a-4648-99cd-9c35295baabf.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkwNDk5ODQsIm5iZiI6MTczOTA0OTY4NCwicGF0aCI6Ii80NDY4MjkwMy80MDM1NTI5NzYtZTAwYzE4ZGMtYTg3YS00NjQ4LTk5Y2QtOWMzNTI5NWJhYWJmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDglMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA4VDIxMjEyNFomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTllY2MwZjk2YzQ5Yzg3Y2FkN2M5MTE4YmY5MjA3OWJjOTg5ODQ5ODgyN2YwN2ExYjE3YTEwMzFlYmJmYzVjYzYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.LR_idE0sSsVDdfnJavF1W8oa4teTnU9mqG9c-33ctM8)
on the metrics page
Tentative wins, sample size 3