Merge pull request #667 from jiangzhonglian/main
Update the remaining parts of the notes

Showing 11 changed files with 2,199 additions and 341 deletions.

@@ -1,53 +1,152 @@

# Frequently Asked Questions [¶](#frequently-asked-questions "Permalink to this heading")

> Translator: [片刻小哥哥](https://github.com/jiangzhonglian)
>
> Project page: <https://pytorch.apachecn.org/2.0/docs/notes/faq>
>
> Original page: <https://pytorch.org/docs/stable/notes/faq.html>

## My model reports "cuda runtime error(2): out of memory" [¶](#my-model-reports-cuda-runtime-error-2-out-of-memory "Permalink to this heading")

As the error message suggests, you have run out of memory on your GPU. Since we often deal with large amounts of data in PyTorch, small mistakes can rapidly cause your program to use up all of your GPU memory; fortunately, the fixes in these cases are often simple. Here are a few common things to check:

**Don't accumulate history across your training loop.** By default, computations involving variables that require gradients will keep history. This means that you should avoid using such variables in computations that will live beyond your training loops, e.g., when tracking statistics. Instead, you should detach the variable or access its underlying data.

Sometimes it can be non-obvious when differentiable variables occur. Consider the following training loop (abridged from [source](https://discuss.pytorch.org/t/high-memory-usage-while-training/162)):

```
total_loss = 0
for i in range(10000):
    optimizer.zero_grad()
    output = model(input)
    loss = criterion(output)
    loss.backward()
    optimizer.step()
    total_loss += loss
```

Here, `total_loss` is accumulating history across the training loop, since `loss` is a differentiable variable with autograd history. You can fix this by writing `total_loss += float(loss)` instead.

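For illustration, here is the same loop with the fix applied (a minimal sketch assuming the same `model`, `criterion`, and `optimizer` objects as above):

```
total_loss = 0
for i in range(10000):
    optimizer.zero_grad()
    output = model(input)
    loss = criterion(output)
    loss.backward()
    optimizer.step()
    # float(loss) extracts a plain Python number, so no autograd
    # history is kept alive across iterations
    total_loss += float(loss)
```
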
Other instances of this problem: [1](https://discuss.pytorch.org/t/resolved-gpu-out-of-memory-error-with-batch-size-1/3719).

**Don't hold onto tensors and variables you don't need.** If you assign a Tensor or Variable to a local, Python will not deallocate it until the local goes out of scope. You can free this reference with `del x`. Similarly, if you assign a Tensor or Variable to a member variable of an object, it will not deallocate until the object goes out of scope. You will get the best memory usage if you don't hold onto temporaries you don't need.

The scopes of locals can be larger than you expect. For example:

```
for i in range(5):
    intermediate = f(input[i])
    result += g(intermediate)
output = h(result)
return output
```

Here, `intermediate` remains live even while `h` is executing, because its scope extends past the end of the loop. To free it earlier, you should `del intermediate` when you are done with it.

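A minimal sketch of the same loop with the early free applied:

```
for i in range(5):
    intermediate = f(input[i])
    result += g(intermediate)
    del intermediate  # drop the reference before the next iteration
output = h(result)  # `intermediate` is no longer alive here
return output
```
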
**Don't run RNNs on sequences that are too large.** The amount of memory required to backpropagate through an RNN scales linearly with the length of the RNN input; thus, you will run out of memory if you try to feed an RNN a sequence that is too long.

The technical term for this phenomenon is [backpropagation through time](https://en.wikipedia.org/wiki/Backpropagation_through_time), and there are plenty of references on how to implement truncated BPTT, including in the [word language model](https://github.com/pytorch/examples/tree/master/word_language_model) example; truncation is handled by the `repackage` function as described in [this forum post](https://discuss.pytorch.org/t/help-clarifying-repackage-hidden-in-word-language-model/226).

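As a rough sketch of what the truncation looks like (modeled on the `repackage_hidden` function from the linked example; `model`, `hidden`, and `batches` are hypothetical placeholders):

```
import torch

def repackage_hidden(h):
    # detach hidden states from their history so backprop stops here
    if isinstance(h, torch.Tensor):
        return h.detach()
    return tuple(repackage_hidden(v) for v in h)

# `model`, `hidden`, and `batches` are hypothetical placeholders
for batch in batches:
    hidden = repackage_hidden(hidden)  # truncate BPTT at the chunk boundary
    output, hidden = model(batch, hidden)
```
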
**Don't use linear layers that are too large.** A linear layer `nn.Linear(m, n)` uses $O(nm)$ memory: that is to say, the memory requirements of the weights scale quadratically with the number of features. It is very easy to [blow through your memory](https://github.com/pytorch/pytorch/issues/958) this way (and remember that you will need at least twice the size of the weights, since you also need to store the gradients).

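To make the quadratic growth concrete, a back-of-the-envelope sketch (the layer size is just an illustration):

```
import torch.nn as nn

layer = nn.Linear(10000, 10000)
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)  # 100010000: 10000*10000 weights + 10000 biases
# at 4 bytes per float32 parameter that is ~400 MB, and storing the
# gradients during training roughly doubles it
```
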
**Consider checkpointing.** You can trade compute for memory by using [checkpoint](https://pytorch.org/docs/stable/checkpoint.html).

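A minimal sketch of how checkpointing is typically used (the block and shapes are illustrative; `use_reentrant=False` follows current PyTorch guidance):

```
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(100, 100), nn.ReLU(), nn.Linear(100, 100))
x = torch.randn(8, 100, requires_grad=True)

# activations inside `block` are recomputed during backward instead of
# being stored, trading extra compute for lower peak memory
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```
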
## My GPU memory isn't properly freed [¶](#my-gpu-memory-isn-t-freed-properly "Permalink to this heading")

PyTorch uses a caching memory allocator to speed up memory allocations. As a result, the values shown in `nvidia-smi` usually don't reflect the true memory usage. See [Memory management](cuda.html#cuda-memory-management) for more details about GPU memory management.

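To inspect the difference yourself, PyTorch exposes allocator statistics; a small sketch (assumes a CUDA device is available):

```
import torch

x = torch.randn(1024, 1024, device="cuda")
print(torch.cuda.memory_allocated())  # memory occupied by live tensors
print(torch.cuda.memory_reserved())   # memory held by the caching allocator,
                                      # closer to what nvidia-smi reports
```
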
If your GPU memory isn't freed even after Python quits, it is very likely that some Python subprocesses are still alive. You may find them via `ps -elf | grep python` and kill them manually with `kill -9 [pid]`.

## My out of memory exception handler can't allocate memory [¶](#my-out-of-memory-exception-handler-can-t-allocate-memory "Permalink to this heading")

You may have some code that tries to recover from out of memory errors:

```
try:
    run_model(batch_size)
except RuntimeError: # Out of memory
    for _ in range(batch_size):
        run_model(1)
```

But you may find that when you do run out of memory, your recovery code can't allocate either. That's because the Python exception object holds a reference to the stack frame where the error was raised, which prevents the original tensor objects from being freed. The solution is to move your OOM recovery code outside of the `except` clause:

```
oom = False
try:
    run_model(batch_size)
except RuntimeError: # Out of memory
    oom = True

if oom:
    for _ in range(batch_size):
        run_model(1)
```

## My data loader workers return identical random numbers [¶](#my-data-loader-workers-return-identical-random-numbers "Permalink to this heading")

You are probably using other libraries to generate random numbers in the dataset, and worker subprocesses are started via `fork`. See the documentation of [`torch.utils.data.DataLoader`](../data.html#torch.utils.data.DataLoader "torch.utils.data.DataLoader") for how to properly set up random seeds in workers with its `worker_init_fn` option.

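As an illustration, one common pattern is to reseed the third-party generator per worker through `worker_init_fn` (here `dataset` is a hypothetical `Dataset`, and NumPy stands in for the other library):

```
import numpy as np
import torch
from torch.utils.data import DataLoader

def worker_init_fn(worker_id):
    # torch.initial_seed() already differs per worker; reuse it to
    # seed the other library's generator
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=worker_init_fn)
```
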
## My recurrent network doesn't work with data parallelism [¶](#my-recurrent-network-doesn-t-work-with-data-parallelism "Permalink to this heading")

There is a subtlety when using the pack sequence -> recurrent network -> unpack sequence pattern in a [`Module`](../generated/torch.nn.Module.html#torch.nn.Module "torch.nn.Module") with [`DataParallel`](../generated/torch.nn.DataParallel.html#torch.nn.DataParallel "torch.nn.DataParallel") or [`data_parallel()`](../generated/torch.nn.parallel.data_parallel.html#torch.nn.parallel.data_parallel "torch.nn.parallel.data_parallel"). The input to each `forward()` on each device will only be part of the entire input. Because the unpack operation [`torch.nn.utils.rnn.pad_packed_sequence()`](../generated/torch.nn.utils.rnn.pad_packed_sequence.html#torch.nn.utils.rnn.pad_packed_sequence "torch.nn.utils.rnn.pad_packed_sequence") by default only pads up to the longest input it sees, i.e., the longest input on that particular device, size mismatches will happen when the results are gathered together. Therefore, you can instead take advantage of the `total_length` argument of [`pad_packed_sequence()`](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_packed_sequence.html#torch.nn.utils.rnn.pad_packed_sequence "torch.nn.utils.rnn.pad_packed_sequence") to make sure that the `forward()` calls return sequences of the same length. For example, you can write:

```
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class MyModule(nn.Module):
    # ... __init__, other methods, etc.

    # padded_input is of shape [B x T x *] (batch_first mode) and contains
    # the sequences sorted by lengths
    #   B is the batch size
    #   T is max sequence length
    def forward(self, padded_input, input_lengths):
        total_length = padded_input.size(1)  # get the max sequence length
        packed_input = pack_padded_sequence(padded_input, input_lengths,
                                            batch_first=True)
        packed_output, _ = self.my_lstm(packed_input)
        output, _ = pad_packed_sequence(packed_output, batch_first=True,
                                        total_length=total_length)
        return output

m = MyModule().cuda()
dp_m = nn.DataParallel(m)
```

Besides, extra care needs to be taken when the batch dimension is dim `1` (i.e., `batch_first=False`) with data parallelism. In this case, the first argument of `pack_padded_sequence`, `padded_input`, will be of shape `[T x B x *]` and should be scattered along dim `1`, while the second argument `input_lengths` will be of shape `[B]` and should be scattered along dim `0`. Extra code to manipulate the tensor shapes will be needed.

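One possible workaround, sketched under the assumption that transposing at the module boundary is acceptable for your model, is to keep everything batch-first so that both arguments scatter along dim `0`:

```
# hypothetical sketch: dp_m wraps a batch_first=True module as above
padded_input = padded_input.transpose(0, 1)  # [T x B x *] -> [B x T x *]
output = dp_m(padded_input, input_lengths)   # both args scatter along dim 0
output = output.transpose(0, 1)              # back to [T x B x *]
```
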