Merge pull request #666 from jiangzhonglian/main
Update translations of some notes
Showing 19 changed files with 17,857 additions and 280 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,53 +1,138 @@ | ||
> Translation task | ||
* This page has not been translated yet; we look forward to your contribution | ||
* Translation rewards: https://github.com/orgs/apachecn/discussions/243 | ||
* Claim a task: https://github.com/apachecn/pytorch-doc-zh/discussions/583 | ||
|
||
Please follow this template when writing content: | ||
|
||
|
||
# PyTorch Some Page | ||
# Broadcasting semantics [¶](#broadcasting-semantics "Permalink to this heading") | ||
|
||
> Translator: [片刻小哥哥](https://github.com/jiangzhonglian) | ||
> | ||
> Project URL: <https://pytorch.apachecn.org/2.0/docs/notes/broadcasting> | ||
> | ||
> Original URL: <https://pytorch.org/docs/stable/notes/broadcasting.html> | ||
Start writing the translation of the original page here | ||
|
||
Many PyTorch operations support NumPy's broadcasting semantics. See <https://numpy.org/doc/stable/user/basics.broadcasting.html> for details. | ||
|
||
|
||
In short, if a PyTorch operation supports broadcasting, its Tensor arguments can be automatically expanded to equal sizes (without copying the data). | ||
|
||
|
||
## General semantics [¶](#general-semantics "Permalink to this heading") | ||
|
||
Notes: | ||
|
||
1. Code reference: | ||
Two tensors are "broadcastable" if the following rules hold: | ||
|
||
|
||
|
||
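* Each tensor has at least one dimension. | ||
* When iterating over the dimension sizes, starting at the trailing dimension, the two sizes must be equal, one of them must be 1, or one of them must not exist. | ||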
|
||
|
||
For example: | ||
|
||
```py | ||
import torch | ||
|
||
x = torch.ones(5) # input tensor | ||
y = torch.zeros(3) # expected output | ||
w = torch.randn(5, 3, requires_grad=True) | ||
b = torch.randn(3, requires_grad=True) | ||
z = torch.matmul(x, w)+b | ||
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y) | ||
``` | ||
>>> x=torch.empty(5,7,3) | ||
>>> y=torch.empty(5,7,3) | ||
# same shapes are always broadcastable (i.e. the above rules always hold) | ||
>>> x=torch.empty((0,)) | ||
>>> y=torch.empty(2,2) | ||
# x and y are not broadcastable, because x does not have at least 1 dimension | ||
# can line up trailing dimensions | ||
>>> x=torch.empty(5,3,4,1) | ||
>>> y=torch.empty( 3,1,1) | ||
# x and y are broadcastable. | ||
# 1st trailing dimension: both have size 1 | ||
# 2nd trailing dimension: y has size 1 | ||
# 3rd trailing dimension: x size == y size | ||
# 4th trailing dimension: y dimension doesn't exist | ||
# but: | ||
>>> x=torch.empty(5,2,4,1) | ||
>>> y=torch.empty( 3,1,1) | ||
# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3 | ||
2. Formula reference: | ||
``` | ||
|
||
|
||
If two tensors `x` and `y` are "broadcastable", the size of the resulting tensor is calculated as follows: | ||
|
||
|
||
|
||
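* If `x` and `y` do not have the same number of dimensions, prepend 1 to the dimensions of the tensor with fewer dimensions so that both have equal length. | ||
* Then, for each dimension, the resulting size is the maximum of the sizes of `x` and `y` along that dimension. | ||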
|
||
|
||
For example: | ||
|
||
|
||
``` | ||
# can line up trailing dimensions to make reading easier | ||
>>> x=torch.empty(5,1,4,1) | ||
>>> y=torch.empty( 3,1,1) | ||
>>> (x+y).size() | ||
torch.Size([5, 3, 4, 1]) | ||
# but not necessary: | ||
>>> x=torch.empty(1) | ||
>>> y=torch.empty(3,1,7) | ||
>>> (x+y).size() | ||
torch.Size([3, 1, 7]) | ||
>>> x=torch.empty(5,2,4,1) | ||
>>> y=torch.empty(3,1,1) | ||
>>> (x+y).size() | ||
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1 | ||
``` | ||
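The two rules above can be sketched in plain Python, without PyTorch. This is an illustrative helper, not PyTorch's implementation: `broadcast_shape` is a hypothetical name, and the sketch follows the rule as stated here (each tensor needs at least one dimension).

```python
from itertools import zip_longest


def broadcast_shape(x_shape, y_shape):
    """Return the broadcast result shape, or raise ValueError if incompatible.

    Hypothetical helper that mirrors the rules in the text above.
    """
    if len(x_shape) == 0 or len(y_shape) == 0:
        raise ValueError("each tensor must have at least one dimension")
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    for a, b in zip_longest(reversed(x_shape), reversed(y_shape), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"sizes {a} and {b} do not match at a non-singleton dimension")
        # The size-1 side stretches to the other side (same as max for non-zero sizes).
        result.append(a if b == 1 else b)
    return tuple(reversed(result))


print(broadcast_shape((5, 1, 4, 1), (3, 1, 1)))  # (5, 3, 4, 1)
print(broadcast_shape((1,), (3, 1, 7)))          # (3, 1, 7)
```

Calling `broadcast_shape((5, 2, 4, 1), (3, 1, 1))` raises `ValueError`, matching the `RuntimeError` case shown above.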
|
||
1) Inline form (no line break): | ||
|
||
$\sqrt{w^T*w}$ | ||
## In-place semantics [¶](#in-place-semantics "Permalink to this heading") | ||
|
||
2) Display form (on separate lines): | ||
|
||
$$ | ||
\sqrt{w^T*w} | ||
$$ | ||
One complication is that in-place operations do not allow the in-place tensor to change shape as a result of the broadcast. | ||
|
||
3. Image reference (just use the image's actual URL): | ||
|
||
<img src='http://data.apachecn.org/img/logo/logo_green.png' width=20% /> | ||
For example: | ||
|
||
|
||
``` | ||
>>> x=torch.empty(5,3,4,1) | ||
>>> y=torch.empty(3,1,1) | ||
>>> (x.add_(y)).size() | ||
torch.Size([5, 3, 4, 1]) | ||
# but: | ||
>>> x=torch.empty(1,3,1) | ||
>>> y=torch.empty(3,1,7) | ||
>>> (x.add_(y)).size() | ||
RuntimeError: The expanded size of the tensor (1) must match the existing size (7) at non-singleton dimension 2. | ||
``` | ||
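The in-place restriction amounts to requiring that the broadcast result has exactly the shape of the tensor being modified. A minimal pure-Python sketch of that check follows; `can_update_in_place` is a hypothetical name, not a PyTorch function.

```python
def can_update_in_place(target_shape, other_shape):
    """Return True if `other` can be broadcast to exactly `target_shape`.

    Hypothetical helper: the in-place tensor (target) must keep its shape,
    so only `other` is allowed to be expanded.
    """
    if len(other_shape) > len(target_shape):
        return False  # broadcasting would add dimensions to the target
    # Compare from the trailing dimension; only `other` may have a stretchable 1.
    for t, o in zip(reversed(target_shape), reversed(other_shape)):
        if o != t and o != 1:
            return False
    return True


print(can_update_in_place((5, 3, 4, 1), (3, 1, 1)))  # True
print(can_update_in_place((1, 3, 1), (3, 1, 7)))     # False
```

The second call corresponds to the failing `x.add_(y)` example above: the broadcast result would be `(3, 3, 7)`, which differs from the shape of `x`.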
|
||
|
||
## Backwards compatibility [¶](#backwards-compatibility "Permalink to this heading") | ||
|
||
|
||
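Earlier versions of PyTorch allowed certain pointwise functions to run on tensors with different shapes, as long as each tensor had the same number of elements. The pointwise operation was then carried out by viewing each tensor as 1-dimensional. PyTorch now supports broadcasting, and the "1-dimensional" pointwise behavior is deprecated: it generates a Python warning when the tensors are not broadcastable but have the same number of elements. | ||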
|
||
|
||
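Note that the introduction of broadcasting can cause backwards-incompatible changes when two tensors do not have the same shape but are broadcastable and have the same number of elements. For example: | ||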
|
||
|
||
``` | ||
>>> torch.add(torch.ones(4,1), torch.randn(4)) | ||
``` | ||
|
||
|
||
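would previously produce a tensor of size torch.Size([4,1]), but now produces a tensor of size torch.Size([4,4]). To help identify places in your code where broadcasting may have introduced backwards incompatibilities, you can set torch.utils.backcompat.broadcast_warning.enabled to True, which generates a Python warning in such cases. | ||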
|
||
|
||
For example: | ||
|
||
|
||
``` | ||
>>> torch.utils.backcompat.broadcast_warning.enabled=True | ||
>>> torch.add(torch.ones(4,1), torch.ones(4)) | ||
__main__:1: UserWarning: self and other do not have the same shape, but are broadcastable, and have the same number of elements. | ||
Changing behavior in a backwards incompatible manner to broadcasting rather than viewing as 1-dimensional. | ||
4. **After translating, just delete all of the template content above** | ||
``` |
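The risky case described in this section (different shapes, equal element counts, and broadcastable) can be detected from the shapes alone. The sketch below is an illustration in plain Python; `changed_by_broadcasting` is a hypothetical name and is not how PyTorch emits its warning.

```python
from math import prod


def changed_by_broadcasting(x_shape, y_shape):
    """Return True for shape pairs whose result changed when broadcasting was introduced.

    Hypothetical helper: the shapes differ, hold the same number of elements,
    and are broadcastable under the trailing-dimension rule.
    """
    if tuple(x_shape) == tuple(y_shape):
        return False
    if prod(x_shape) != prod(y_shape):
        return False
    # Broadcastable check, aligned from the trailing dimension.
    for a, b in zip(reversed(x_shape), reversed(y_shape)):
        if a != b and a != 1 and b != 1:
            return False
    return True


print(changed_by_broadcasting((4, 1), (4,)))  # True: result is now [4, 4], formerly [4, 1]
print(changed_by_broadcasting((2, 3), (3, 2)))  # False: not broadcastable at all
```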
146 changes: 115 additions & 31 deletions
docs/2.0/docs/notes/cpu_threading_torchscript_inference.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,53 +1,137 @@ | ||
> Translation task | ||
* This page has not been translated yet; we look forward to your contribution | ||
* Translation rewards: https://github.com/orgs/apachecn/discussions/243 | ||
* Claim a task: https://github.com/apachecn/pytorch-doc-zh/discussions/583 | ||
|
||
Please follow this template when writing content: | ||
|
||
|
||
# PyTorch Some Page | ||
# CPU threading and TorchScript inference [¶](#cpu-threading-and-torchscript-inference "Permalink to this heading") | ||
|
||
> Translator: [片刻小哥哥](https://github.com/jiangzhonglian) | ||
> | ||
> Project URL: <https://pytorch.apachecn.org/2.0/docs/notes/cpu_threading_torchscript_inference> | ||
> | ||
> Original URL: <https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html> | ||
Start writing the translation of the original page here | ||
|
||
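PyTorch allows multiple CPU threads to be used during TorchScript model inference. The following figure shows the different levels of parallelism found in a typical application: | ||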
|
||
|
||
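[](https://pytorch.org/docs/stable/_images/cpu_threading_torchscript_inference.svg) One or more inference threads execute a model's forward pass on the given inputs. Each inference thread invokes a JIT interpreter that executes the ops of a model inline, one by one. A model can use the `fork` TorchScript primitive to launch an asynchronous task. Forking several operations at once results in tasks that are executed in parallel. The `fork` operator returns a `Future` object that can be used to synchronize later, for example: | ||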
|
||
|
||
``` | ||
@torch.jit.script | ||
def compute_z(x): | ||
    return torch.mm(x, self.w_z) | ||
@torch.jit.script | ||
def forward(x): | ||
    # launch compute_z asynchronously: | ||
    fut = torch.jit._fork(compute_z, x) | ||
    # execute the next operation in parallel to compute_z: | ||
    y = torch.mm(x, self.w_y) | ||
    # wait for the result of compute_z: | ||
    z = torch.jit._wait(fut) | ||
    return y + z | ||
``` | ||
|
||
|
||
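PyTorch uses a single thread pool for inter-op parallelism. This thread pool is shared by all inference tasks that are forked within the application process. | ||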
|
||
|
||
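In addition to inter-op parallelism, PyTorch can also use multiple threads within an op (intra-op parallelism). This is useful in many cases, including element-wise ops on large tensors, convolutions, GEMMs, embedding lookups, and others. | ||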
|
||
|
||
## Build options [¶](#build-options "Permalink to this heading") | ||
|
||
|
||
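PyTorch uses an internal ATen library to implement ops. In addition, PyTorch can be built with support for external libraries, such as [MKL](https://software.intel.com/en-us/mkl) and [MKL-DNN](https://github.com/intel/mkl-dnn), to speed up computations on CPU. | ||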
|
||
|
||
ATen, MKL, and MKL-DNN support intra-op parallelism and depend on the following parallelization libraries to implement it: | ||
|
||
|
||
|
||
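* [OpenMP](https://www.openmp.org/) - a standard (and a library, usually shipped with the compiler) that is widely used in external libraries; | ||
* [TBB](https://github.com/intel/tbb) - a newer parallelization library optimized for task-based parallelism and concurrent environments. | ||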
|
||
|
||
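OpenMP has historically been used by a large number of libraries. It is known for being relatively easy to use and for supporting loop-based parallelism and other primitives. | ||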
|
||
|
||
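TBB is used less in external libraries but is optimized for concurrent environments. PyTorch's TBB backend guarantees a separate, single, per-process intra-op thread pool that is used by all ops running in the application. | ||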
|
||
|
||
Depending on the use case, one parallelization library or the other may be the better choice for a given application. | ||
|
||
Notes: | ||
|
||
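1. Code reference: | ||
PyTorch allows the parallelization backend used by ATen and other libraries to be selected at build time with the following build options: | ||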
|
||
|
||
| Library | Build Option | Values | Notes | | ||
| --- | --- | --- | --- | | ||
| ATen | `ATEN_THREADING` | `OMP` (default), `TBB` | | | ||
| MKL | `MKL_THREADING` | (same) | To enable MKL use `BLAS=MKL` | | ||
| MKL-DNN | `MKLDNN_CPU_RUNTIME` | (same) | To enable MKL-DNN use `USE_MKLDNN=1` | | ||
|
||
|
||
It is recommended not to mix OpenMP and TBB within one build. | ||
|
||
|
||
Any of the `TBB` values above requires the `USE_TBB=1` build setting (default: OFF). OpenMP parallelism requires a separate setting, `USE_OPENMP=1` (default: ON). | ||
|
||
|
||
## Runtime API [¶](#runtime-api "Permalink to this heading") | ||
|
||
|
||
The following API is used to control thread settings: | ||
|
||
|
||
| Type of parallelism | Settings | Notes | | ||
| --- | --- | --- | | ||
| Inter-op parallelism | `at::set_num_interop_threads` , `at::get_num_interop_threads` (C++) `set_num_interop_threads` , `get_num_interop_threads` (Python, [`torch`](../torch.html#module-torch "torch") module) | Default number of threads: number of CPU cores. | | ||
| Intra-op parallelism | `at::set_num_threads`, `at::get_num_threads` (C++) `set_num_threads`, `get_num_threads` (Python, [`torch`](../torch.html#module-torch "torch") module) Environment variables: `OMP_NUM_THREADS` and `MKL_NUM_THREADS` | | | ||
|
||
|
||
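For the intra-op parallelism settings, `at::set_num_threads` and `torch.set_num_threads` always take precedence over the environment variables, and the `MKL_NUM_THREADS` variable takes precedence over `OMP_NUM_THREADS`. | ||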
|
||
|
||
## Tuning the number of threads [¶](#tuning-the-number-of-threads "Permalink to this heading") | ||
|
||
|
||
The following simple script shows how the runtime of a matrix multiplication changes with the number of threads: | ||
|
||
```py | ||
import torch | ||
|
||
x = torch.ones(5) # input tensor | ||
y = torch.zeros(3) # expected output | ||
w = torch.randn(5, 3, requires_grad=True) | ||
b = torch.randn(3, requires_grad=True) | ||
z = torch.matmul(x, w)+b | ||
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y) | ||
``` | ||
import timeit | ||
import torch | ||
runtimes = [] | ||
threads = [1] + [t for t in range(2, 49, 2)] | ||
for t in threads: | ||
    torch.set_num_threads(t) | ||
    r = timeit.timeit(setup = "import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)", stmt="torch.mm(x, y)", number=100) | ||
    runtimes.append(r) | ||
# ... plotting (threads, runtimes) ... | ||
``` | ||
|
||
|
||
Running the script on a system with 24 physical CPU cores (Xeon E5-2680, MKL and OpenMP based build) produces the following runtimes: | ||
|
||
|
||
[](https://pytorch.org/docs/stable/_images/cpu_threading_runtimes.svg) | ||
|
||
The following considerations should be taken into account when tuning the number of intra-op and inter-op threads: | ||
|
||
|
||
|
||
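* When choosing the number of threads, avoid oversubscription (using too many threads, which degrades performance). For example, in an application that uses a large application thread pool or relies heavily on inter-op parallelism, disabling intra-op parallelism may be an option (i.e. by calling `set_num_threads(1)`); | ||
* In a typical application there may be a trade-off between latency (the time spent processing an inference request) and throughput (the amount of work done per unit of time). Tuning the number of threads is a useful way to adjust this trade-off in one direction or the other. For example, in latency-critical applications one might increase the number of intra-op threads to process each request as fast as possible. At the same time, parallel implementations of ops may add overhead that increases the amount of work done per request and thus reduces overall throughput. | ||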
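The oversubscription concern can be made concrete with a little arithmetic. The sketch below assumes the worst case in which every application (or inter-op) thread runs its own intra-op pool at the same time; the function names and the example numbers are illustrative and not measurements.

```python
def worst_case_threads(app_threads: int, intra_op_threads: int) -> int:
    """Upper bound on busy threads if each application thread uses its own intra-op pool."""
    return app_threads * intra_op_threads


def oversubscription_factor(app_threads: int, intra_op_threads: int, physical_cores: int) -> float:
    """Ratio of worst-case busy threads to physical cores; above 1.0 means oversubscribed."""
    return worst_case_threads(app_threads, intra_op_threads) / physical_cores


# Illustrative numbers: 24 physical cores, 8 application threads.
print(oversubscription_factor(8, 24, 24))  # 8.0
print(oversubscription_factor(8, 3, 24))   # 1.0
print(oversubscription_factor(8, 1, 24) < 1.0)  # True
```

Under these assumptions, setting the intra-op thread count to 1 or 3 keeps 8 application threads within 24 cores, while leaving it at 24 asks for eight times as many threads as there are cores.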
|
||
|
||
2. Formula reference: | ||
!!! warning "Warning" | ||
|
||
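1) Inline form (no line break): | ||
OpenMP does not guarantee that a single per-process intra-op thread pool will be used in the application. On the contrary, two different application threads or inter-op threads may use different OpenMP thread pools for intra-op work. This can result in a large number of threads being used by the application. Extra care is needed when tuning the number of threads to avoid oversubscription in multi-threaded applications in the OpenMP case. | ||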
|
||
$\sqrt{w^T*w}$ | ||
|
||
2) Display form (on separate lines): | ||
!!! note "Note" | ||
|
||
$$ | ||
\sqrt{w^T*w} | ||
$$ | ||
Pre-built PyTorch releases are compiled with OpenMP support. | ||
|
||
3. Image reference (just use the image's actual URL): | ||
|
||
<img src='http://data.apachecn.org/img/logo/logo_green.png' width=20% /> | ||
!!! note "Note" | ||
|
||
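4. **After translating, just delete all of the template content above** | ||
The `parallel_info` utility prints information about thread settings and can be used for debugging. Similar output can also be obtained in Python by calling `torch.__config__.parallel_info()`. |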
d8b6e0e
Successfully deployed to the following URLs:
pytorch-doc-zh – ./
pytorch-doc-zh-apachecn.vercel.app
pytorch-doc-zh-git-master-apachecn.vercel.app
pytorch-doc-zh.vercel.app