Merge pull request #666 from jiangzhonglian/main

Update translations of some notes
apachecn · Dec 20, 2023 · d8b6e0e · vercel
2 parents eb7a7ce + 49fceb4
commit d8b6e0e
Showing 19 changed files with 17,857 additions and 280 deletions.
diff --git a/docs/2.0/docs/notes/amp_examples.md b/docs/2.0/docs/notes/amp_examples.md
diff --git a/docs/2.0/docs/notes/autograd.md b/docs/2.0/docs/notes/autograd.md
diff --git a/docs/2.0/docs/notes/broadcasting.md b/docs/2.0/docs/notes/broadcasting.md
@@ -1,53 +1,138 @@
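-> Translation task
-
-* This page has not been translated yet; we look forward to your contribution
-* Translation rewards: https://github.com/orgs/apachecn/discussions/243
-* Task claiming: https://github.com/apachecn/pytorch-doc-zh/discussions/583
-
-Please follow this template when writing the content:
-
-
-# PyTorch Some Page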
+# Broadcasting semantics [¶](#broadcasting-semantics "Permalink to this heading")
 
 > Translator: [片刻小哥哥](https://github.com/jiangzhonglian)
 >
 > Project page: <https://pytorch.apachecn.org/2.0/docs/notes/broadcasting>
 >
 > Original page: <https://pytorch.org/docs/stable/notes/broadcasting.html>
 
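-Start writing the translation of the original page here
+
+ Many PyTorch operations support NumPy's broadcasting semantics. See <https://numpy.org/doc/stable/user/basics.broadcasting.html> for details.
+
+
+ In short, if a PyTorch operation supports broadcasting, its Tensor arguments can be automatically expanded to equal sizes (without copying the data).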
 
 
+## General semantics [¶](#general-semantics "Permalink to this heading")
 
-Notes: 
 
-1. Code reference:
+ Two tensors are "broadcastable" if the following rules hold:
+
+
+
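+* Each tensor has at least one dimension.
+* When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must be equal, one of them must be 1, or one of them must not exist.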
+
+
+ For example:
 
-```py
-import torch
 
-x = torch.ones(5)  # input tensor
-y = torch.zeros(3)  # expected output
-w = torch.randn(5, 3, requires_grad=True)
-b = torch.randn(3, requires_grad=True)
-z = torch.matmul(x, w)+b
-loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
 ```
+>>> x=torch.empty(5,7,3)
+>>> y=torch.empty(5,7,3)
+# same shapes are always broadcastable (i.e. the above rules always hold)
+
+>>> x=torch.empty((0,))
+>>> y=torch.empty(2,2)
+# x and y are not broadcastable: x's only dimension has size 0, which neither equals 2 nor is 1
+
+# can line up trailing dimensions
+>>> x=torch.empty(5,3,4,1)
+>>> y=torch.empty(  3,1,1)
+# x and y are broadcastable.
+# 1st trailing dimension: both have size 1
+# 2nd trailing dimension: y has size 1
+# 3rd trailing dimension: x size == y size
+# 4th trailing dimension: y dimension doesn't exist
+
+# but:
+>>> x=torch.empty(5,2,4,1)
+>>> y=torch.empty(  3,1,1)
+# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3
 
-2. Formula reference:
+```
+
+
+ If two tensors `x` and `y` are "broadcastable", the size of the resulting tensor is calculated as follows:
+
+
+
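+* If `x` and `y` do not have the same number of dimensions, prepend 1 to the dimensions of the tensor with fewer dimensions so that both have equal length.
+* Then, for each dimension, the resulting size is the max of the sizes of `x` and `y` along that dimension.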
+
+
+ For example:
+
+
+```
+# can line up trailing dimensions to make reading easier
+>>> x=torch.empty(5,1,4,1)
+>>> y=torch.empty(  3,1,1)
+>>> (x+y).size()
+torch.Size([5, 3, 4, 1])
+
+# but not necessary:
+>>> x=torch.empty(1)
+>>> y=torch.empty(3,1,7)
+>>> (x+y).size()
+torch.Size([3, 1, 7])
+
+>>> x=torch.empty(5,2,4,1)
+>>> y=torch.empty(3,1,1)
+>>> (x+y).size()
+RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1
+
+```
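
The two size rules above can be sketched in plain Python without PyTorch. This is a minimal illustration, not PyTorch's implementation: `broadcast_shape` is a hypothetical helper name, shapes are passed as tuples, and the "at least one dimension" rule is not checked.

```python
from itertools import zip_longest


def broadcast_shape(x_shape, y_shape):
    """Return the broadcast result shape, or raise ValueError if the shapes clash.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    result = []
    # Walk the dimensions from the trailing end. A dimension that does not
    # exist in the shorter shape is treated as size 1 (the "prepend 1" rule).
    for dx, dy in zip_longest(reversed(x_shape), reversed(y_shape), fillvalue=1):
        if dx != dy and dx != 1 and dy != 1:
            raise ValueError(f"sizes {dx} and {dy} are not broadcastable")
        # A size-1 dimension takes the other size; for the non-zero sizes
        # used here this equals max(dx, dy).
        result.append(dy if dx == 1 else dx)
    return tuple(reversed(result))


shape_a = broadcast_shape((5, 1, 4, 1), (3, 1, 1))
shape_b = broadcast_shape((1,), (3, 1, 7))
print(shape_a, shape_b)
```

Calling `broadcast_shape((5, 2, 4, 1), (3, 1, 1))` raises `ValueError`, mirroring the `RuntimeError` in the last example of the block above.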
 
-1) Inline form (no line break): 
 
-$\sqrt{w^T*w}$
+## In-place semantics [¶](#in-place-semantics "Permalink to this heading")
 
-2) Display form (on its own lines):
 
-$$
-\sqrt{w^T*w}
-$$
+ One complication is that in-place operations do not allow the in-place tensor to change shape as a result of the broadcast.
 
-3. Image reference (just use the image's actual URL):
 
-<img src='http://data.apachecn.org/img/logo/logo_green.png' width=20% />
+ For example:
+
+
+```
+>>> x=torch.empty(5,3,4,1)
+>>> y=torch.empty(3,1,1)
+>>> (x.add_(y)).size()
+torch.Size([5, 3, 4, 1])
+
+# but:
+>>> x=torch.empty(1,3,1)
+>>> y=torch.empty(3,1,7)
+>>> (x.add_(y)).size()
+RuntimeError: The expanded size of the tensor (1) must match the existing size (7) at non-singleton dimension 2.
+
+```
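
Put differently, an in-place operation on `x` works only when `y` can be expanded to exactly the shape `x` already has. A minimal pure-Python sketch of that check follows; `can_apply_inplace` is a hypothetical name used for illustration, not a PyTorch function.

```python
from itertools import zip_longest


def can_apply_inplace(x_shape, y_shape):
    """Return True if y can be expanded to x's shape, leaving x's shape unchanged.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    # If y has more dimensions than x, the broadcast result would have more
    # dimensions than x, so x would have to change shape.
    if len(y_shape) > len(x_shape):
        return False
    # Compare from the trailing dimension; missing leading dimensions of y
    # count as size 1.
    for dx, dy in zip_longest(reversed(x_shape), reversed(y_shape), fillvalue=1):
        # Only y may be stretched: each of its sizes must match x or be 1.
        if dy != dx and dy != 1:
            return False
    return True


ok = can_apply_inplace((5, 3, 4, 1), (3, 1, 1))
bad = can_apply_inplace((1, 3, 1), (3, 1, 7))
print(ok, bad)
```

The second call fails because `x` would need to grow from size 1 to size 7 in its last dimension, which is the situation reported by the `RuntimeError` above.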
+
+
+## Backwards compatibility [¶](#backwards-compatibility "Permalink to this heading")
+
+
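+ Prior versions of PyTorch allowed certain pointwise functions to execute on tensors with different shapes, as long as the number of elements in each tensor was equal. The pointwise operation was then carried out by viewing each tensor as 1-dimensional. PyTorch now supports broadcasting; the "1-dimensional" pointwise behavior is considered deprecated and generates a Python warning when tensors are not broadcastable but have the same number of elements.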
+
+
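+ Note that the introduction of broadcasting can cause backwards-incompatible changes when two tensors do not have the same shape but are broadcastable and have the same number of elements. For example: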
+
+
+```
+>>> torch.add(torch.ones(4,1), torch.randn(4))
+
+```
+
+
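+ would previously produce a Tensor with size torch.Size([4,1]), but now produces a Tensor with size torch.Size([4,4]). To help identify places in your code where broadcasting may have introduced backwards incompatibilities, you can set `torch.utils.backcompat.broadcast_warning.enabled` to `True`, which generates a Python warning in such cases.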
+
+
+ For example:
+
+
+```
+>>> torch.utils.backcompat.broadcast_warning.enabled=True
+>>> torch.add(torch.ones(4,1), torch.ones(4))
+__main__:1: UserWarning: self and other do not have the same shape, but are broadcastable, and have the same number of elements.
+Changing behavior in a backwards incompatible manner to broadcasting rather than viewing as 1-dimensional.
 
-4. **After translating, please delete all the template content above**
+```
diff --git a/docs/2.0/docs/notes/cpu_threading_torchscript_inference.md b/docs/2.0/docs/notes/cpu_threading_torchscript_inference.md
@@ -1,53 +1,137 @@
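-> Translation task
-
-* This page has not been translated yet; we look forward to your contribution
-* Translation rewards: https://github.com/orgs/apachecn/discussions/243
-* Task claiming: https://github.com/apachecn/pytorch-doc-zh/discussions/583
-
-Please follow this template when writing the content:
-
-
-# PyTorch Some Page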
+# CPU threading and TorchScript inference [¶](#cpu-threading-and-torchscript-inference "Permalink to this heading")
 
 > Translator: [片刻小哥哥](https://github.com/jiangzhonglian)
 >
 > Project page: <https://pytorch.apachecn.org/2.0/docs/notes/cpu_threading_torchscript_inference>
 >
 > Original page: <https://pytorch.org/docs/stable/notes/cpu_threading_torchscript_inference.html>
 
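-Start writing the translation of the original page here
+
+ PyTorch allows using multiple CPU threads during TorchScript model inference. The following figure shows the different levels of parallelism one would find in a typical application:
+
+
+[![https://pytorch.org/docs/stable/_images/cpu_threading_torchscript_inference.svg](https://pytorch.org/docs/stable/_images/cpu_threading_torchscript_inference.svg)](https://pytorch.org/docs/stable/_images/cpu_threading_torchscript_inference.svg)
+
+ One or more inference threads execute a model's forward pass on the given inputs. Each inference thread invokes a JIT interpreter that executes the ops of a model inline, one by one. A model can use the `fork` TorchScript primitive to launch an asynchronous task. Forking several operations at once results in tasks that are executed in parallel. The `fork` operator returns a `Future` object that can be used to synchronize later, for example: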
+
+
+```
+@torch.jit.script
+def compute_z(x):
+    return torch.mm(x, self.w_z)
+
+@torch.jit.script
+def forward(x):
+    # launch compute_z asynchronously:
+    fut = torch.jit._fork(compute_z, x)
+    # execute the next operation in parallel to compute_z:
+    y = torch.mm(x, self.w_y)
+    # wait for the result of compute_z:
+    z = torch.jit._wait(fut)
+    return y + z
+
+```
+
+
+ PyTorch uses a single thread pool for inter-op parallelism; this thread pool is shared by all inference tasks forked within the application process.
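
As a rough analogy only, the fork/wait pattern and the shared pool can be mimicked in plain Python with `concurrent.futures`. This sketch does not use TorchScript: `pool.submit` stands in for `fork`, `Future.result()` stands in for `wait`, and the scalar weights `w_y` and `w_z` are made-up stand-ins for the matrices in the example above.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_z(x, w_z):
    # Stand-in for torch.mm(x, w_z): scale every element by w_z.
    return [xi * w_z for xi in x]


def forward(x, w_y, w_z, pool):
    # Launch compute_z asynchronously on the shared pool (analogous to fork).
    fut = pool.submit(compute_z, x, w_z)
    # Do the next piece of work while compute_z may still be running.
    y = [xi * w_y for xi in x]
    # Block until the asynchronous result is ready (analogous to wait).
    z = fut.result()
    return [a + b for a, b in zip(y, z)]


# One pool shared by every forked task, mirroring the single inter-op pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    out = forward([1.0, 2.0], 3.0, 4.0, pool)
print(out)
```

The analogy covers only the control flow; scheduling, the JIT interpreter, and intra-op threading behave differently in PyTorch.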
+
+
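+ In addition to inter-op parallelism, PyTorch can also use multiple threads within the ops (intra-op parallelism). This is useful in many cases, including element-wise ops on large tensors, convolutions, GEMMs, embedding lookups and others.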
+
+
+## Build options [¶](#build-options "Permalink to this heading")
+
+
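+ PyTorch uses an internal ATen library to implement ops. In addition, PyTorch can be built with support for external libraries, such as [MKL](https://software.intel.com/en-us/mkl) and [MKL-DNN](https://github.com/intel/mkl-dnn), to speed up computations on CPU.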
+
+
+ ATen, MKL and MKL-DNN support intra-op parallelism and depend on the following parallelization libraries to implement it:
+
+
+
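+* [OpenMP](https://www.openmp.org/) - a standard (and a library, usually shipped with a compiler), widely used in external libraries;
+* [TBB](https://github.com/intel/tbb) - a newer parallelization library optimized for task-based parallelism and concurrent environments.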
+
+
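+ OpenMP has historically been used by a large number of libraries. It is known for its relative ease of use and its support for loop-based parallelism and other primitives.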
+
+
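+ TBB is used to a lesser extent in external libraries but, at the same time, is optimized for concurrent environments. PyTorch's TBB backend guarantees that there is a separate, single, per-process intra-op thread pool used by all of the ops running in the application.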
 
 
+ Depending on the use case, one or the other parallelization library may be the better choice for an application.
 
-Notes: 
 
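-1. Code reference:
+ PyTorch allows selecting the parallelization backend used by ATen and other libraries at build time with the following build options: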
+
+
+| Library | Build Option | Values | Notes |
+| --- | --- | --- | --- |
+| ATen | `ATEN_THREADING` | `OMP` (default), `TBB` |  |
+| MKL | `MKL_THREADING` | (same) | To enable MKL use `BLAS=MKL` |
+| MKL-DNN | `MKLDNN_CPU_RUNTIME` | (same) | To enable MKL-DNN use `USE_MKLDNN=1` |
+
+
+ It is recommended not to mix OpenMP and TBB within one build.
+
+
+ Any of the `TBB` values above requires the `USE_TBB=1` build setting (default: OFF). OpenMP parallelism requires a separate setting, `USE_OPENMP=1` (default: ON).
+
+
+## Runtime API [¶](#runtime-api "Permalink to this heading")
+
+
+ The following API is used to control thread settings:
+
+
+| Type of parallelism | Settings | Notes |
+| --- | --- | --- |
+| Inter-op parallelism | `at::set_num_interop_threads`, `at::get_num_interop_threads` (C++); `set_num_interop_threads`, `get_num_interop_threads` (Python, [`torch`](../torch.html#module-torch "torch") module) | Default number of threads: number of CPU cores. |
+| Intra-op parallelism | `at::set_num_threads`, `at::get_num_threads` (C++); `set_num_threads`, `get_num_threads` (Python, [`torch`](../torch.html#module-torch "torch") module). Environment variables: `OMP_NUM_THREADS` and `MKL_NUM_THREADS` | Default number of threads: number of CPU cores. |
+
+
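+ For the intra-op parallelism settings, `at::set_num_threads` and `torch.set_num_threads` always take precedence over environment variables, and the `MKL_NUM_THREADS` variable takes precedence over `OMP_NUM_THREADS`.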
+
+
+## Tuning the number of threads [¶](#tuning-the-number-of-threads "Permalink to this heading")
+
+
+ The following simple script shows how the runtime of matrix multiplication changes with the number of threads:
 
-```py
-import torch
 
-x = torch.ones(5)  # input tensor
-y = torch.zeros(3)  # expected output
-w = torch.randn(5, 3, requires_grad=True)
-b = torch.randn(3, requires_grad=True)
-z = torch.matmul(x, w)+b
-loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
 ```
+import timeit
+import torch
+runtimes = []
+threads = [1] + [t for t in range(2, 49, 2)]
+for t in threads:
+    torch.set_num_threads(t)
+    r = timeit.timeit(setup = "import torch; x = torch.randn(1024, 1024); y = torch.randn(1024, 1024)", stmt="torch.mm(x, y)", number=100)
+    runtimes.append(r)
+# ... plotting (threads, runtimes) ...
+
+```
+
+
+ Running the script on a system with 24 physical CPU cores (Xeon E5-2680, MKL and OpenMP based build) produces the following runtimes:
+
+
+[![https://pytorch.org/docs/stable/_images/cpu_threading_runtimes.svg](https://pytorch.org/docs/stable/_images/cpu_threading_runtimes.svg)](https://pytorch.org/docs/stable/_images/cpu_threading_runtimes.svg) 
+
+The following considerations should be taken into account when tuning the number of intra- and inter-op threads:
+
+
+
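+* When choosing the number of threads, one needs to avoid oversubscription (using too many threads, which degrades performance). For example, in an application that uses a large application thread pool or relies heavily on inter-op parallelism, disabling intra-op parallelism may be a reasonable option (i.e. by calling `set_num_threads(1)`);
+* In a typical application one may encounter a trade-off between latency (time spent processing an inference request) and throughput (amount of work done per unit of time). Tuning the number of threads can be a useful tool for adjusting this trade-off in one direction or the other. For example, in latency-critical applications one may want to increase the number of intra-op threads to process each request as fast as possible. At the same time, parallel implementations of ops may add extra overhead that increases the amount of work done per request and thus reduces overall throughput.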
+
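
One simple way to reason about oversubscription is to divide the available cores among the application's inference threads. The rule below is an assumed heuristic for illustration, not a PyTorch recommendation, and `intra_op_threads_per_worker` is a hypothetical helper name.

```python
def intra_op_threads_per_worker(num_cores, num_inference_threads):
    """Assumed heuristic: split the cores evenly, never going below one thread."""
    if num_cores < 1 or num_inference_threads < 1:
        raise ValueError("both arguments must be positive")
    # Integer division keeps (inference threads x intra-op threads) at or
    # below the core count whenever the inference threads fit on the cores.
    return max(1, num_cores // num_inference_threads)


# On a 24-core machine, 4 inference threads leave room for 6 intra-op threads
# each, while 48 inference threads leave room for only 1.
few_workers = intra_op_threads_per_worker(24, 4)
many_workers = intra_op_threads_per_worker(24, 48)
print(few_workers, many_workers)
```

The returned value is the kind of number one might pass to `torch.set_num_threads`; whether an even split is appropriate depends on the workload and should be confirmed by measurement.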
 
-2. Formula reference:
+!!! warning "Warning"
 
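-1) Inline form (no line break): 
+    OpenMP does not guarantee that a single per-process intra-op thread pool will be used in the application. On the contrary, two different application or inter-op threads may use different OpenMP thread pools for intra-op work. This may result in a large number of threads being used by the application. Extra care is needed when tuning the number of threads to avoid oversubscription in multi-threaded applications in the OpenMP case.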
 
-$\sqrt{w^T*w}$
 
-2) Display form (on its own lines):
+!!! note "Note"
 
-$$
-\sqrt{w^T*w}
-$$
+    Pre-built PyTorch releases are compiled with OpenMP support.
 
-3. Image reference (just use the image's actual URL):
 
-<img src='http://data.apachecn.org/img/logo/logo_green.png' width=20% />
+!!! note "Note"
 
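-4. **After translating, please delete all the template content above**
+    The `parallel_info` utility prints information about thread settings and can be used for debugging. Similar output can also be obtained in Python with a `torch.__config__.parallel_info()` call.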