I printed some logs during the forward pass:

print(data["name"])
print(data["left"])
print(data["right"])
with torch.cuda.amp.autocast(enabled=self.cfgs.OPTIMIZATION.AMP):
    model_pred = self.model(data)
    infer_timer = time.time()
    loss, tb_info = loss_func(model_pred, data)
disp_pred = model_pred['disp_pred']
print(disp_pred)
print("loss", loss)

The input data looks fine, but the forward pass outputs NaN, so the loss is NaN as well:

['/IRSDataset/Store/ConvenienceStore_Day/l_566.png']
tensor([[[[ 0.6392, 0.4166, 0.2624, ..., 1.1700, 1.1872, 1.2214],
          [ 1.7865, 1.4098, 0.7762, ..., 1.1700, 1.1700, 1.2214],
          [ 2.0092, 2.0263, 1.8893, ..., 1.1529, 1.1529, 1.2214],
          ...,
          [-2.1179, -2.1179, -2.1179, ..., 2.2489, 2.2489, 2.2489],
          [-2.1179, -2.1179, -2.1179, ..., 2.2489, 2.2489, 2.2489],
          [-2.0665, -2.1179, -2.0665, ..., 2.2489, 2.2489, 2.2489]],
         [[-0.0049, -0.2325, -0.3725, ..., -2.0357, -1.9832, -1.8081],
          [ 1.2206, 0.8004, 0.1352, ..., -2.0357, -1.9132, -1.7731],
          [ 1.4657, 1.4832, 1.3081, ..., -2.0357, -1.9132, -1.6856],
          ...,
          [-2.0357, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111],
          [-2.0357, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111],
          [-1.9832, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111]],
         [[-0.8981, -1.1247, -1.2641, ..., -1.8044, -1.6302, -1.4210],
          [ 0.2696, -0.1138, -0.7587, ..., -1.7522, -1.6302, -1.4036],
          [ 0.4962, 0.5136, 0.3568, ..., -1.6824, -1.6302, -1.4036],
          ...,
          [-1.8044, -1.8044, -1.8044, ..., 2.5877, 2.5877, 2.5877],
          [-1.8044, -1.8044, -1.8044, ..., 2.5877, 2.5877, 2.5877],
          [-1.7522, -1.8044, -1.8044, ..., 2.5877, 2.5877, 2.5877]]]], device='cuda:0')
tensor([[[[ 0.2282, 0.2453, 0.2453, ..., 1.3070, 1.3242, 1.3242],
          [ 0.2453, 0.2453, 0.2453, ..., 1.3070, 1.3070, 1.3584],
          [ 0.2282, 0.2453, 0.2453, ..., 1.3242, 1.3242, 1.3584],
          ...,
          [-2.1179, -2.1179, -2.1179, ..., 2.2489, 2.2489, 2.2489],
          [-2.1179, -2.1179, -2.1179, ..., 2.2489, 2.2489, 2.2489],
          [-2.0665, -2.1179, -2.0665, ..., 2.2489, 2.2489, 2.2489]],
         [[-0.3375, -0.3375, -0.3200, ..., -2.0357, -2.0357, -2.0357],
          [-0.3375, -0.3375, -0.3375, ..., -2.0357, -2.0357, -2.0357],
          [-0.3725, -0.3375, -0.3375, ..., -2.0357, -2.0357, -2.0357],
          ...,
          [-2.0357, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111],
          [-2.0357, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111],
          [-2.0357, -2.0357, -2.0357, ..., 2.4111, 2.4111, 2.4111]],
         [[-0.9678, -0.9330, -0.9330, ..., -1.8044, -1.7522, -1.8044],
          [-0.9504, -0.9504, -0.9504, ..., -1.8044, -1.7522, -1.7522],
          [-1.0027, -0.9678, -0.9504, ..., -1.7522, -1.7522, -1.6824],
          ...,
          [-1.8044, -1.8044, -1.8044, ..., 2.5877, 2.5877, 2.5877],
          [-1.8044, -1.8044, -1.8044, ..., 2.5877, 2.5877, 2.5877],
          [-1.8044, -1.8044, -1.8044, ..., 2.6051, 2.5877, 2.5877]]]], device='cuda:0')
tensor([nan, nan, nan, ..., nan, nan, nan], device='cuda:0',
       grad_fn=)
tensor(nan, device='cuda:0', grad_fn=)
2024-10-03 11:53:39,003 INFO Training Epoch: 9/50 Iter: 947/5661 Loss:nan(nan) LR:6.7625e-04 DataTime:0.12 InferTime:43.17ms Time cost: 06:46/33:37:36
l/OpenStereo/./stereo/utils/common_utils.py:198: RuntimeWarning: invalid value encountered in cast
  pred_tmp = cm(pred_tmp.astype('uint8'))
//OpenStereo/./stereo/utils/common_utils.py:199: RuntimeWarning: invalid value encountered in cast
  error_map_tmp = cm(error_map_tmp.astype('uint8'))

Is this a problem with the data? I don't yet know how to check the data. Could it be that the left and right images are not aligned?
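One way to narrow this down is to check every tensor in the batch for NaN/Inf (the printed `left` and `right` tensors are elided, and the ground-truth disparity is not printed at all), and then find the first layer whose output stops being finite. Below is a minimal sketch of both steps in plain PyTorch. It is not part of OpenStereo: the helper names `non_finite_keys` and `attach_nan_hooks`, the toy `SqrtLayer`, and the `"disp"` key are hypothetical and only there for illustration.

```python
import torch
import torch.nn as nn


def non_finite_keys(data):
    """Return the keys of `data` whose tensors contain NaN or Inf."""
    bad_keys = []
    for key, value in data.items():
        if isinstance(value, torch.Tensor) and not torch.isfinite(value).all():
            bad_keys.append(key)
    return bad_keys


def attach_nan_hooks(model):
    """Register forward hooks that record, in execution order, the names of
    submodules whose tensor output contains NaN or Inf.
    Outputs that are tuples or dicts are ignored in this sketch."""
    offenders = []

    def make_hook(module_name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append(module_name)
        return hook

    handles = [
        module.register_forward_hook(make_hook(name))
        for name, module in model.named_modules()
        if name  # skip the root module, whose name is ""
    ]
    return offenders, handles


class SqrtLayer(nn.Module):
    """Toy layer (hypothetical): sqrt of a negative input yields NaN."""

    def forward(self, x):
        return torch.sqrt(x)


# Toy batch: "left" is clean, the assumed "disp" key contains an Inf.
batch = {"name": ["toy.png"], "left": torch.zeros(2, 3), "disp": torch.tensor([1.0, float("inf")])}
print(non_finite_keys(batch))

# Toy model: layer "1" is the first to produce NaN, layer "2" inherits it.
toy_model = nn.Sequential(nn.Identity(), SqrtLayer(), nn.Linear(2, 1))
offenders, handles = attach_nan_hooks(toy_model)
toy_model(torch.tensor([[-1.0, 4.0]]))
print(offenders)

# Remove the hooks once the offending layer is known.
for handle in handles:
    handle.remove()
```

If the whole batch is finite and `offenders[0]` points at a layer inside the model, the data is probably not the cause. It may then be worth repeating the run with `OPTIMIZATION.AMP` disabled, since float16 overflow under autocast can produce Inf/NaN. For NaNs that first appear in the backward pass, `torch.autograd.set_detect_anomaly(True)` can help, at the cost of slower training.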
Same situation here: the model output is all zeros. How should I handle this?
The cfg will be released soon.
I hit the same problem on my own dataset: the loss became NaN at epoch 9.
As a temporary workaround, you can set LEFT_ATT to false in the config file.
Same situation here, using the IGEV config file with the SceneFlow+Fat dataset.