
Updated Doc for Intel XPU Profile #3013

Merged: 2 commits into pytorch:main, Oct 24, 2024

Conversation

louie-tsai
Contributor

@louie-tsai louie-tsai commented Aug 26, 2024

Description

PyTorch Profiling changes for XPU.

Landing pages:
https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html
https://pytorch.org/tutorials/recipes/profile_with_itt.html

Checklist

  • The issue being fixed is referenced in the description (see above, "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request

cc @gujinghui @EikanWang @fengyuan14 @guangyey


pytorch-bot bot commented Aug 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3013

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit a5408b1 with merge base b725032:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

Hi @louie-tsai!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@svekars svekars added the module: xpu XPU related issues label Aug 26, 2024
recipes_source/recipes/profiler_recipe.py (outdated, resolved)
recipes_source/recipes/profiler_recipe.py (outdated, resolved)
@louie-tsai louie-tsai force-pushed the xpu_profile branch 3 times, most recently from 5692f57 to 9eb8734 Compare September 4, 2024 23:50
recipes_source/profile_with_itt.rst (outdated, resolved)
recipes_source/recipes/profiler_recipe.py (outdated, resolved)
Contributor

@malfet malfet left a comment


The changes to profile_with_itt look fine to me, but the changes to profiler_recipe make it really hard to read. Perhaps one should try to find an easier-to-use interface for collecting/analyzing profiler information from accelerators?

@louie-tsai
Contributor Author

The changes to profile_with_itt look fine to me, but the changes to profiler_recipe make it really hard to read. Perhaps one should try to find an easier-to-use interface for collecting/analyzing profiler information from accelerators?

@malfet
The profiler_recipe contents will also be rendered into an HTML file: https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html
Hope that HTML format addresses your "hard to read" concern.

@malfet
Contributor

malfet commented Sep 16, 2024

The profiler_recipe contents will also be rendered into an HTML file: https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html Hope that HTML format addresses your "hard to read" concern.

Sorry, I was not talking about this one, but rather about the following code:

if device == 'cuda':
    activities.append(ProfilerActivity.CUDA)
elif device == 'xpu':
    activities.append(ProfilerActivity.XPU)

As one cannot compile PyTorch with CUDA and XPU at the same time, why introduce a new enum rather than create an alias (i.e., ProfilerActivity.XPU == ProfilerActivity.CUDA), so that users do not have to rewrite their programs when migrating from one accelerator to another?
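The alias idea can be sketched with a plain-Python Enum: in the standard library, a member that reuses an existing value automatically becomes an alias of the first member with that value. This is only an illustration of the suggestion — the enum below mimics, but is not, torch.profiler's actual ProfilerActivity.

```python
from enum import Enum

# Hypothetical sketch of the alias idea (not PyTorch's real enum):
# giving XPU the same value as CUDA makes it an alias, so code written
# against ProfilerActivity.CUDA keeps working unchanged on an XPU build.
class ProfilerActivity(Enum):
    CPU = 0
    CUDA = 1
    XPU = 1  # alias: same value as CUDA, so XPU *is* CUDA

# The alias resolves to the canonical member:
print(ProfilerActivity.XPU is ProfilerActivity.CUDA)  # True
print(ProfilerActivity.XPU.name)                      # CUDA
```

With such an alias, the if/elif branching in the recipe above would collapse to a single `activities.append(ProfilerActivity.CUDA)` line on either backend.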

@jingxu10
Contributor

why introduce a new enum rather than create an alias (i.e., ProfilerActivity.XPU == ProfilerActivity.CUDA), so that users do not have to rewrite their programs when migrating from one accelerator to another?

Hi @malfet, this doesn't seem to be something that can be fixed in this tutorial, since it involves code changes. We will see how to address this concern in 2.6 with a separate PR that changes the profiler code.

@dvrogozh

As one can not compile PyTorch with CUDA and XPU at the same time

@malfet: why? This should be technically possible if someone installs both the CUDA and XPU stacks on the system. And it might be convenient for some people (mostly for debugging, I guess) to build PyTorch with both paths enabled in a single build.

@louie-tsai
Contributor Author

louie-tsai commented Oct 2, 2024

this should be technically possible if someone installs both the CUDA and XPU stacks on the system…

@malfet
Any feedback? For example, some laptops have both an Intel iGPU and an NVIDIA dGPU at the same time, so it might be common to have both XPU and CUDA.
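On a machine with both accelerators, the recipe's device selection could collect every available backend instead of choosing one. A stdlib-only sketch of that idea — the boolean parameters stand in for torch.cuda.is_available() / torch.xpu.is_available(), and the enum mimics torch.profiler.ProfilerActivity:

```python
from enum import Enum

class ProfilerActivity(Enum):  # mimics torch.profiler.ProfilerActivity
    CPU = 0
    CUDA = 1
    XPU = 2

def pick_activities(cuda_available: bool, xpu_available: bool):
    """Collect every activity the current machine can profile.

    The flags stand in for torch.cuda.is_available() and
    torch.xpu.is_available() in real PyTorch code.
    """
    activities = [ProfilerActivity.CPU]
    if cuda_available:
        activities.append(ProfilerActivity.CUDA)
    if xpu_available:  # note: `if`, not `elif` — both may be present
        activities.append(ProfilerActivity.XPU)
    return activities

# A laptop with both an Intel iGPU and an NVIDIA dGPU:
print(pick_activities(True, True))
```

Using independent `if` checks rather than the recipe's `elif` is what lets the dual-GPU case profile both backends in one run.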

Comment on lines +220 to +238
######################################################################
# (Note: the first use of XPU profiling may bring an extra overhead.)

######################################################################
# The resulting table output (omitting some columns):
#
# .. code-block:: sh
#
# ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
# Name Self XPU Self XPU % XPU total XPU time avg # of Calls
# ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
# model_inference 0.000us 0.00% 2.567ms 2.567ms 1
# aten::conv2d 0.000us 0.00% 1.871ms 93.560us 20
# aten::convolution 0.000us 0.00% 1.871ms 93.560us 20
# aten::_convolution 0.000us 0.00% 1.871ms 93.560us 20
# aten::convolution_overrideable 1.871ms 72.89% 1.871ms 93.560us 20
# gen_conv 1.484ms 57.82% 1.484ms 74.216us 20
# aten::batch_norm 0.000us 0.00% 432.640us 21.632us 20
# aten::_batch_norm_impl_index 0.000us 0.00% 432.640us 21.632us 20
# aten::native_batch_norm 432.640us 16.85% 432.640us 21.632us 20
# conv_reorder 386.880us 15.07% 386.880us 6.448us 60
# ------------------------------------------------------- ------------ ------------ ------------ ------------ ------------
# Self CPU time total: 712.486ms
# Self XPU time total: 2.567ms
Contributor


What value does this extra table bring to the user?

Contributor Author


There are indeed just minor changes: some different operators, and XPU instead of GPU in this table. We just want people to understand what the output might look like.
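For readers cross-checking the table, the derived columns are plain arithmetic over the raw timings: "Self XPU %" is self time divided by the "Self XPU time total" footer, and "XPU time avg" is total time divided by the call count. A small sketch using three rows transcribed (in microseconds) from the table above:

```python
# Sketch: how the derived columns in the profiler table relate to the
# raw timings. Values are transcribed from the table above, in microseconds.
rows = [
    # (name, self_xpu_us, xpu_total_us, num_calls)
    ("aten::convolution_overrideable", 1871.0, 1871.0, 20),
    ("aten::native_batch_norm", 432.64, 432.64, 20),
    ("conv_reorder", 386.88, 386.88, 60),
]
SELF_XPU_TOTAL_US = 2567.0  # the "Self XPU time total: 2.567ms" footer

for name, self_us, total_us, calls in rows:
    self_pct = 100.0 * self_us / SELF_XPU_TOTAL_US  # "Self XPU %" column
    avg_us = total_us / calls                       # "XPU time avg" column
    print(f"{name:<32} {self_pct:6.2f}% {avg_us:9.3f}us {calls:>4} calls")
```

Running this reproduces the percentage and average columns shown above (e.g., 1871.0 / 2567.0 gives the 72.89% on the convolution_overrideable row).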

@louie-tsai
Contributor Author

@malfet @jingxu10 @dvrogozh @guangyey
Please help merge it if there are no concerns with the change.

Thanks!

@louie-tsai
Contributor Author

@malfet
Updated accordingly. Hope it works for you.

@svekars
Contributor

svekars commented Oct 16, 2024

To fix spelling, please add these to the en-wordlist.txt:

  • _batch_norm_impl_index
  • convolution_overrideable
  • aten
  • XPU

Update profiler_recipe.py to unify the accelerators python codes
@louie-tsai
Contributor Author

To fix spelling, please add these to the en-wordlist.txt:

  • _batch_norm_impl_index
  • convolution_overrideable
  • aten
  • XPU

Addressed them accordingly. Please also help approve it if everything looks good to you.

Contributor

@malfet malfet left a comment


Feels a bit verbose to me, but looks fine.

@malfet malfet merged commit 2f3f3fa into pytorch:main Oct 24, 2024
20 checks passed
Labels
cla signed module: xpu XPU related issues
7 participants