Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add workflow for on-demand benchmarking #4441

Closed
wants to merge 7 commits into from
Closed

Conversation

guangy10
Copy link
Contributor

@guangy10 guangy10 commented Jul 27, 2024

Ability to schedule an on-demand benchmark job from GA UI with params, e.g. models, delegates, devices, etc
Ability to schedule from PR via tagging (doubt it could work with non-default args)

Copy link

pytorch-bot bot commented Jul 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4441

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 0f670eb with merge base 227b49d (image):

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 27, 2024
@guangy10 guangy10 force-pushed the ondemand_benchmark branch from 4a4a9ff to ea23d63 Compare July 27, 2024 00:39
@guangy10
Copy link
Contributor Author

@huydhn how can I test this workflow?

@guangy10 guangy10 requested review from kirklandsign and huydhn July 27, 2024 00:52
.github/pytorch-probot.yml Outdated Show resolved Hide resolved
@guangy10 guangy10 force-pushed the ondemand_benchmark branch 9 times, most recently from 27b8d3c to f0fac5b Compare July 29, 2024 18:56
@guangy10 guangy10 force-pushed the ondemand_benchmark branch from f0fac5b to 4673b32 Compare July 29, 2024 19:15
@guangy10
Copy link
Contributor Author

@huydhn Do I need to get the PR merged in order to test it from the GitHub Action UI? In the WIP PR, I can find this workflow but it seems no way to trigger a run for it

@guangy10 guangy10 marked this pull request as ready for review July 29, 2024 19:17
@huydhn
Copy link
Contributor

huydhn commented Jul 29, 2024

@huydhn Do I need to get the PR merged in order to test it from the GitHub Action UI? In the WIP PR, I can find this workflow but it seems no way to trigger a run for it

For testing, I will just add pull_request into the list of workflow's triggers, then remove it right before committing. Once the PR lands, subsequent ones will be able to use the ciflow tag it creates.

@guangy10 guangy10 force-pushed the ondemand_benchmark branch from 4673b32 to 89123ba Compare July 29, 2024 21:14
@huydhn
Copy link
Contributor

huydhn commented Jul 29, 2024

I just realize that triggering the workflow using pull_request doesn't count as a workflow dispatch, so it couldn't be tested that way. Let me tweak the default values a bit to run the default model when the input is not set

@facebook-github-bot
Copy link
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10 guangy10 force-pushed the ondemand_benchmark branch from 5ec0c8e to 55fe843 Compare July 30, 2024 01:04
Comment on lines +178 to +196
strategy:
matrix:
model: ${{ fromJson(needs.set-models.outputs.models) }}
with:
device-type: android
runner: linux.2xlarge
test-infra-ref: ''
# This is the ARN of ExecuTorch project on AWS
project-arn: arn:aws:devicefarm:us-west-2:308535385114:project:02a2cf0f-6d9b-45ee-ba1a-a086587469e6
# This is the custom Android device pool that only includes Samsung Galaxy S2x
device-pool-arn: arn:aws:devicefarm:us-west-2:308535385114:devicepool:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/e59f866a-30aa-4aa1-87b7-4510e5820dfa
# Uploaded to S3 from the previous job, the name of the app comes from the project itself.
# Unlike models there are limited numbers of build flavor for apps, and the model controls whether it should build with bpe/tiktoken tokenizer.
# It's okay to build all possible apps with all possible flavors in job "build-llm-demo". However, in this job, once a model is given, there is only
# one app+flavor that could load and run the model.
# TODO: Hard code llm_demo_bpe for now in this job.
android-app-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug.apk
android-test-archive: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/llm_demo_bpe/app-debug-androidTest.apk
# The test spec can be downloaded from https://ossci-assets.s3.amazonaws.com/android-llama2-device-farm-test-spec.yml
test-spec: arn:aws:devicefarm:us-west-2:308535385114:upload:02a2cf0f-6d9b-45ee-ba1a-a086587469e6/abd86868-fa63-467e-a5c7-218194665a77
# Uploaded to S3 from the previous job
extra-data: https://gha-artifacts.s3.amazonaws.com/${{ github.repository }}/${{ github.run_id }}/artifact/${{ matrix.model }}/model.zip
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi, I think we should remove tokenizer flavor from the matrix and only add models to it. cc: @huydhn @kirklandsign

@guangy10 guangy10 force-pushed the ondemand_benchmark branch from 55fe843 to 8f406de Compare July 30, 2024 23:37
@guangy10
Copy link
Contributor Author

Testing job benchmark-on-device:. It appears that the 'model' is null and 'benchmark-on-device' ends up not being triggered though all its dependency jobs finished successfully.

@guangy10
Copy link
Contributor Author

Okay. I can see the workflow is scheduled as expected, though the actual benchmarking doesn't make sense due to hard-coded test-spec.

@guangy10 guangy10 force-pushed the ondemand_benchmark branch from 8f406de to 4eaf53a Compare July 31, 2024 00:24
@facebook-github-bot
Copy link
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10 guangy10 force-pushed the ondemand_benchmark branch 2 times, most recently from 04ac523 to cb02b88 Compare July 31, 2024 01:18
@guangy10 guangy10 force-pushed the ondemand_benchmark branch from cb02b88 to 0f670eb Compare July 31, 2024 01:54
@guangy10
Copy link
Contributor Author

Verified the workflow is scheduled as expected: https://github.com/pytorch/executorch/actions/runs/10172346528/job/28135557119?pr=4441

@facebook-github-bot
Copy link
Contributor

@guangy10 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@guangy10
Copy link
Contributor Author

I've seen this permission issue very often on different PRs: https://github.com/pytorch/executorch/actions/runs/10172715749/job/28135704037?pr=4441

CC: @kirklandsign @huydhn @kit1980

@facebook-github-bot
Copy link
Contributor

@guangy10 merged this pull request in f611219.

@huydhn
Copy link
Contributor

huydhn commented Jul 31, 2024

I've seen this permission issue very often on different PRs: https://github.com/pytorch/executorch/actions/runs/10172715749/job/28135704037?pr=4441

CC: @kirklandsign @huydhn @kit1980

Hmm, this comes from pytorch/test-infra#5523, @atalman has removed MacOS from there, but I think any runners picked up by the PR during testing would need to be cleaned up.

@kirklandsign kirklandsign deleted the ondemand_benchmark branch August 2, 2024 23:17
@kirklandsign kirklandsign restored the ondemand_benchmark branch August 2, 2024 23:17
@kirklandsign kirklandsign deleted the ondemand_benchmark branch August 8, 2024 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants