Traced the per-epoch slowdown in PR #111's classification-model training test (a few seconds slower than PyTorch each run) to this line: tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses
Profiling with `py-spy`
Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use py-spy, a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.
py-spy can be installed via pip:
pip install py-spy
To use py-spy, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles torch.add:
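The script itself did not survive extraction; below is a minimal sketch of the kind of script the wiki describes. The tensor shape and iteration count are my assumptions, and a pure-Python fallback is included so the sketch runs even without PyTorch installed:

```python
import time

# Hypothetical reconstruction: repeat a microsecond-scale operation many
# times so the sampling profiler can collect enough stack samples.
try:
    import torch
    a = torch.ones(100, 100)
    b = torch.ones(100, 100)
    def op():
        return torch.add(a, b)  # the operation under profile
except ImportError:
    def op():
        return sum(range(100))  # stand-in workload when PyTorch is absent

start = time.perf_counter()
for _ in range(100_000):
    op()
print(f"elapsed: {time.perf_counter() - start:.3f} s")
```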
Since the torch.add operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use py-spy with such a script is to generate a flame
graph:
py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
This will output a file named profile.svg containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The --native command-line option tells py-spy to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code it may be necessary to compile PyTorch in debug mode by prefixing
your setup.py develop call with DEBUG=1. Depending on
your operating system it may also be necessary to run py-spy with root
privileges.
py-spy can also run in an htop-like "live profiling" mode, and its stack
sampling rate can be adjusted; see the py-spy readme for more
details.
Preface
py-spy analysis
Stably reproducible code
Upcoming plans
Preface
While working on locating the C++ code behind PyTorch's Python APIs (https://github.com/Oneflow-Inc/OneTeam/issues/147),
I tried py-spy, a performance-profiling tool recommended by the PyTorch wiki.
With it I traced the per-epoch slowdown in PR #111's classification-model training test
(a few seconds slower than PyTorch each run) to this line:
tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses
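The running-mean update on that line is, by itself, trivial arithmetic; the expensive part is loss.item(), which copies the scalar back to the host and forces the backend to synchronize on every iteration. A minimal pure-Python sketch of the update formula itself:

```python
# Running mean of a stream of loss values, as in the profiled line:
#   tloss = (tloss * i + loss.item()) / (i + 1)
losses = [0.9, 0.7, 0.5, 0.3]

tloss = 0.0
for i, loss in enumerate(losses):
    tloss = (tloss * i + loss) / (i + 1)

print(tloss)  # ~0.6, i.e. sum(losses) / len(losses)
```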
Original classification training test results
Original classification training test method: #111 (comment)
py-spy analysis
In a flame graph, the y-axis shows the call stack and the x-axis shows execution time, so the wider a function's frame is along the x-axis, the longer it ran and the more likely it is a performance bottleneck.
The two flame graphs below show that the line
tloss = (tloss * i + loss.item()) / (i + 1)  # update mean losses
does have a measurable performance impact. With the PyTorch backend, the frame for this line is so narrow you need a magnifying glass to find it; with the OneFlow backend, it is clearly visible.
Stably reproducible code
- Machine used: oneflow27-root
- OneFlow build compiled on 2023-03-09
- flow.__version__='0.9.1+cu117.git.a4b7145d01', elapsed 0.7273483276367188 s
- torch.__version__='1.13.0+cu117', elapsed 0.11882472038269043 s
The code below defines a timing Profile class and two functions, test_torch and test_oneflow.
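The original code block did not survive extraction; the following is a hypothetical sketch of what such a timing Profile class could look like. The framework-specific test_torch/test_oneflow bodies are assumptions, replaced here by a placeholder loop so the sketch runs anywhere:

```python
import time

class Profile:
    """Context manager that accumulates elapsed wall-clock time."""
    def __init__(self):
        self.t = 0.0
    def __enter__(self):
        self._start = time.perf_counter()
        return self
    def __exit__(self, *exc):
        self.t += time.perf_counter() - self._start

def run_running_mean(n=1000):
    # Placeholder for the framework-specific loop; in the original issue,
    # test_torch and test_oneflow would run the training-style loop with
    # torch / oneflow tensors and call loss.item() each iteration.
    tloss = 0.0
    for i in range(n):
        tloss = (tloss * i + 0.5) / (i + 1)
    return tloss

p = Profile()
with p:
    run_running_mean()
print(f"elapsed: {p.t:.6f} s")
```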
Output
Upcoming plans