About deblur training #117

Davidcoach · 2022-07-05T03:10:35Z

Hello, I degraded the images in FFHQ and want to use the deblurring pipeline in MPRNet to restore them.
But when I train the model, I first ran into this problem:
Traceback (most recent call last):
  File "/home/ma-user/work/MPRnet/train.py", line 120, in <module>
    loss_char = np.sum([criterion_char(restored[j],target) for j in range(len(restored))])
  File "<__array_function__ internals>", line 6, in sum
  File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 2260, in sum
    initial=initial, where=where)
  File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  File "/home/ma-user/anaconda3/envs/PyTorch-1.8/lib/python3.7/site-packages/torch/tensor.py", line 621, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
So I changed the loss computation to this:
loss_char = torch.tensor([criterion_char(restored[j],target) for j in range(len(restored))]).sum()
loss_edge = torch.tensor([criterion_edge(restored[j],target) for j in range(len(restored))]).sum()
loss = ((loss_char) + (0.05*loss_edge)).requires_grad_(True)
This seems to work, but after training I found that the PSNR is always 12.4697 and never changes; the model learned nothing from the data.
What should I do?
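
A possible explanation for the constant PSNR, sketched under assumptions: `torch.tensor([...])` copies the loss values into a brand-new tensor that has no autograd history, so `backward()` never reaches the model parameters. The snippet below is not MPRNet code; `weight`, `restored`, and `criterion` are hypothetical stand-ins for the model parameters, the multi-stage outputs, and `criterion_char`, and it runs on CPU.

```python
import torch

# Hypothetical stand-ins (not MPRNet code): one parameter, three "stage" outputs.
weight = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
target = torch.tensor([0.0, 0.0])
restored = [weight * 1.0, weight * 2.0, weight * 3.0]

def criterion(pred, tgt):
    # Simple L1-style loss standing in for criterion_char.
    return (pred - tgt).abs().mean()

# Pattern from the post: rebuild the loss with torch.tensor, then force requires_grad.
rebuilt = torch.tensor([criterion(r, target) for r in restored]).sum()
rebuilt.requires_grad_(True)
rebuilt.backward()

print(rebuilt.grad_fn)  # no graph behind the rebuilt loss
print(weight.grad)      # the parameter received no gradient
```

If this is what happens in train.py, the optimizer step has nothing to apply, which would be consistent with a metric that never changes.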

KKKLeouee · 2022-07-20T05:54:56Z

I've run into the same problem. How did you solve it in the end?

KKKLeouee · 2022-07-20T06:04:45Z

I've run into the same problem. How did you solve it in the end?

Davidcoach · 2022-07-20T07:49:16Z

Hello, I found that the problem was the warmup_scheduler. You can try removing it or increasing the max epoch.


adityac8 · 2022-07-20T09:24:39Z

This might help #91 (comment)

userHLN · 2022-10-23T12:09:03Z

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
How can this be solved? I have the same problem as the original poster, and I'm also using Python 3.7.

Makohhh · 2022-10-31T07:11:35Z

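TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. How can this be solved? I have the same problem as the original poster, and I'm also using Python 3.7.

Have you solved it? I've run into this problem too.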

wpc0086 · 2022-11-25T02:14:43Z

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. How can this be solved? I have the same problem as the original poster, and I'm also using Python 3.7.

loss_char = sum([criterion_char(restored[j],target) for j in range(len(restored))])
Changing np.sum to sum is enough.
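
A minimal sketch of why replacing `np.sum` with the built-in `sum` can work: Python's `sum` just adds the loss tensors together, so the result stays a torch tensor on its original device and keeps its autograd graph. The names `weight`, `restored`, and `criterion` are hypothetical stand-ins, not MPRNet code; `torch.stack(...).sum()` is shown as an equivalent alternative.

```python
import torch

# Hypothetical stand-ins (not MPRNet code): one parameter, three "stage" outputs.
weight = torch.nn.Parameter(torch.tensor([1.0, 2.0]))
target = torch.zeros(2)
restored = [weight * s for s in (1.0, 2.0, 3.0)]

def criterion(pred, tgt):
    # Simple L1-style loss standing in for criterion_char.
    return (pred - tgt).abs().mean()

losses = [criterion(r, target) for r in restored]

loss_builtin = sum(losses)                 # repeated tensor addition, graph preserved
loss_stacked = torch.stack(losses).sum()   # same value, computed entirely in torch

loss_builtin.backward()
print(loss_builtin.grad_fn is not None)    # the loss is still connected to `weight`
print(weight.grad)                         # gradients now reach the parameter
```

Either form avoids the NumPy conversion that raised the TypeError, because no tensor ever leaves torch.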

drifterss · 2023-03-27T07:51:50Z

Has anyone seen the loss suddenly become very large during training? My loss exploded when I reached epoch 20.

Feecuin · 2024-04-16T08:34:05Z

Has anyone seen the loss suddenly become very large during training? My loss exploded when I reached epoch 20.

Did you manage to solve this?

Turing2022 · 2024-12-11T07:49:10Z

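Has anyone seen the loss suddenly become very large during training? My loss exploded when I reached epoch 20.
Same here. I suspected it was because I had set batch_size to 2.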

Turing2022 · 2024-12-11T07:49:31Z

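Has anyone seen the loss suddenly become very large during training? My loss exploded when I reached epoch 20.

Did you manage to solve this?

How is it going now?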

Feecuin · 2024-12-11T08:16:32Z

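Has anyone seen the loss suddenly become very large during training? My loss exploded when I reached epoch 20.

Did you manage to solve this?

How is it going now?

Not solved, hahaha. I'm no longer working on this algorithm.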
