Transformer benchmark forward #3684
I haven't used CUPTI enough to tell. @kevinstephano do you have a clue? @nsarka posted the stack trace in the PR description. To confirm: neither this patch nor the command you used to run the benchmark did anything in particular to trigger CUPTI, correct?
The benchmark suite triggers CUPTI to measure kernel time.
After manually disabling with:
It passes.
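For anyone who wants rough numbers while CUPTI is disabled, a host-side timer can stand in for kernel-time measurement. This is only a sketch: `time_iterations` and its `sync` parameter are hypothetical names, not part of the nvFuser benchmark suite, and host wall-clock time includes launch overhead that CUPTI kernel time does not.

```python
import time
from typing import Callable, List, Optional


def time_iterations(
    fn: Callable[[], object],
    iterations: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Return per-iteration host wall-clock times in microseconds.

    `sync` is a hypothetical hook; on a GPU you would pass something that
    waits for outstanding work (e.g. torch.cuda.synchronize) so the timer
    covers kernel execution, not just the launch.
    """
    times_us = []
    for _ in range(iterations):
        if sync is not None:
            sync()  # drain earlier work before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this iteration's work to finish
        times_us.append((time.perf_counter() - start) * 1e6)
    return times_us


if __name__ == "__main__":
    samples = time_iterations(lambda: sum(range(10_000)), iterations=10)
    print(f"median: {sorted(samples)[len(samples) // 2]:.1f} us")
```

Keeping `iterations` as a parameter also makes it easy to compare a short run against a long one on the same timer.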
I removed the line disabling CUPTI and manually set the number of iterations to 10. With the lowered number of iterations it seems to pass. Here is the output with 4 ranks:
Here is the output with 2 ranks:
@cowanmeg Is the skew in latency between ranks (29398 us vs. 123 us) expected?
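To put a number on that skew, here is a small sketch that computes the ratio of the slowest to the fastest rank. `latency_skew` is a hypothetical helper, and which rank produced which latency is an assumption made only for illustration.

```python
from typing import Dict


def latency_skew(latency_us: Dict[int, float]) -> float:
    """Ratio of the slowest rank's latency to the fastest rank's."""
    fastest = min(latency_us.values())
    slowest = max(latency_us.values())
    if fastest <= 0:
        raise ValueError("latencies must be positive")
    return slowest / fastest


if __name__ == "__main__":
    # The two values quoted above; the rank assignment is an assumption.
    per_rank = {0: 29398.0, 1: 123.0}
    print(f"skew: {latency_skew(per_rank):.0f}x")
```

For the two values quoted, the ratio comes out to roughly 239x, which is large enough that one rank is probably waiting on something other than its own kernels.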
ef8ecb4 to f87c137
!build
🚢
!test
There was a change last year, and !build no longer triggers execution of tests. See https://github.com/NVIDIA/Fuser/wiki/Bot-Commands#test-command
Co-authored-by: Jingyue Wu <[email protected]>
132a3bd to 6e03077
In this PR I added Meghan's transformer test as a benchmark.
It works with one process, but with more than one process there seems to be a hang in cuptiActivityDisable in the profiler. I'm opening the PR now as a draft to ask for comments on the code itself and for ideas on why CUPTI might be causing a hang. Here's the backtrace:
The output with 2 ranks looks like this: