CUDA OOM #5
Comments
I changed clip_len to 8 and it worked, but at around 47% the OOM appeared again :( @wjn922
processor 0: 47% 14/30 [06:20<06:29, 24.34s/it]
Traceback (most recent call last):
After checking, it was the 'goldfish' video that caused the OOM, so I now skip videos where num_obj is greater than 3.
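For reference, a minimal sketch of that skip-heavy-videos workaround might look like the following; the loop structure and names (video_list, run_inference, the expressions key) are illustrative assumptions, not the actual inference_davis.py code:

```python
# Hypothetical sketch of skipping videos with too many annotated objects.
# All names here are illustrative; only the num_obj > 3 threshold comes from
# the workaround described above.
MAX_OBJ = 3

for video in video_list:
    num_obj = len(video["expressions"])  # assumed: one referring expression per object
    if num_obj > MAX_OBJ:
        print(f"Skipping {video['name']}: {num_obj} objects > {MAX_OBJ}")
        continue
    run_inference(video)  # placeholder for the per-video inference call
```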
Hi, we run the code on a V100 with 32 GB of memory. We find it generally needs around 24 GB, while for some videos containing many objects it can reach 32 GB. To reduce the memory, one way is to use a shorter clip, as you do. Another way is to reduce the video resolution here. But both of these solutions are likely to reduce precision.
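A rough sketch of those two options (shorter clips, lower resolution) is shown below; the exact constants and where the resize transform lives are assumptions, only max_size=640 appears in the args dump further down:

```python
# Sketch of the two memory-saving options; not the repo's exact code.
import torchvision.transforms as T

# Option 1: shorter clips -- process frames in chunks of clip_len to cut
# activation memory (at some cost in temporal context / precision).
clip_len = 8

def iter_clips(frames, clip_len=clip_len):
    for i in range(0, len(frames), clip_len):
        yield frames[i:i + clip_len]

# Option 2: lower resolution -- shrink the short side before inference.
# max_size=640 matches the args below; 300 here is just an example target.
resize = T.Resize(300, max_size=640)
```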
"It views the language as queries and directly attends to the most relevant regions in the video frames..." How is using the language as queries achieved, as shown in the GIF on the homepage? @wjn922
For the Transformer decoder, the decoder embedding is the pooled language feature, and the learnable queries serve as the positional embedding. Please refer here.
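To illustrate, here is a minimal sketch of that setup; the helper name and tensor shapes are assumptions, while hidden_dim=256 and num_queries=5 match the args dump below:

```python
# Sketch of "language as queries": the decoder content embedding is the pooled
# sentence feature, while the learnable queries act as the positional embedding.
import torch
import torch.nn as nn

hidden_dim, num_queries = 256, 5
query_pos = nn.Embedding(num_queries, hidden_dim)  # learnable queries = pos embedding

def build_decoder_inputs(text_features: torch.Tensor, text_mask: torch.Tensor):
    # text_features: (B, L, C) token features; text_mask: (B, L), True for valid tokens
    mask = text_mask.float().unsqueeze(-1)
    pooled = (text_features * mask).sum(1) / mask.sum(1).clamp(min=1)  # (B, C) sentence feature
    tgt = pooled.unsqueeze(1).repeat(1, num_queries, 1)                # (B, N, C) decoder embedding
    pos = query_pos.weight.unsqueeze(0).expand(tgt.size(0), -1, -1)    # (B, N, C) query pos
    return tgt, pos
```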
Hi @wjn922, what about inference_ytvos? Since there is no num_obj variable there, is adjusting the resize resolution the only way to solve the CUDA OOM error?
I tried resizing to 250, but the 48th video still gives a CUDA OOM error. Its number of expressions is 2 and its length is 36 frames, which is not particularly high compared to the previous videos. What causes this to happen? Below is the error output:
processor 0: 24% 48/202 [04:14<18:29, 7.20s/it]
Process Process-2:
Platform: Windows 10, Anaconda; GPU: RTX 2080 (8 GB)
python inference_davis.py --with_box_refine --binary --freeze_text_encoder --output_dir davis_dirs/resnet50 --resume ckpt/ytvos_r50.pth --backbone resnet50 --ngpu 1
Inference only supports for batch size = 1
Namespace(a2d_path='data/a2d_sentences', aux_loss=True, backbone='resnet50', backbone_pretrained=None, batch_size=1, bbox_loss_coef=5, binary=True, cache_mode=False, clip_max_norm=0.1, cls_loss_coef=2, coco_path='data/coco', controller_layers=3, dataset_file='davis', davis_path='data/ref-davis', dec_layers=4, dec_n_points=4, device='cuda', dice_loss_coef=5, dilation=False, dim_feedforward=2048, dist_url='env://', dropout=0.1, dynamic_mask_channels=8, enc_layers=4, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, focal_alpha=0.25, freeze_text_encoder=True, giou_loss_coef=2, hidden_dim=256, jhmdb_path='data/jhmdb_sentences', lr=0.0001, lr_backbone=5e-05, lr_backbone_names=['backbone.0'], lr_drop=[6, 8], lr_linear_proj_mult=1.0, lr_linear_proj_names=['reference_points', 'sampling_offsets'], lr_text_encoder=1e-05, lr_text_encoder_names=['text_encoder'], mask_dim=256, mask_loss_coef=2, masks=True, max_size=640, max_skip=3, ngpu=1, nheads=8, num_feature_levels=4, num_frames=5, num_queries=5, num_workers=4, output_dir='davis_dirs/resnet50', position_embedding='sine', pre_norm=False, pretrained_weights=None, rel_coord=True, remove_difficult=False, resume='ckpt/ytvos_r50.pth', seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_dice=5, set_cost_giou=2, set_cost_mask=2, split='valid', start_epoch=0, threshold=0.5, two_stage=False, use_checkpoint=False, visualize=False, weight_decay=0.0005, with_box_refine=True, world_size=1, ytvos_path='data/ref-youtube-vos')
Start inference
processor 0: 0% 0/30 [00:00<?, ?it/s]Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.dense.bias']
number of params: 51394175
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ..\aten\src\ATen\native\BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
Traceback (most recent call last):
File "inference_davis.py", line 330, in
main(args)
File "inference_davis.py", line 103, in main
p.run()
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\multiprocessing\process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "inference_davis.py", line 224, in sub_processor
outputs = model([imgs], [exp], [target])
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\SourceCodes\Transformers\ReferFormer\models\referformer.py", line 286, in forward
self.transformer(srcs, text_embed, masks, poses, query_embeds)
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 170, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 291, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 261, in forward
src = self.forward_ffn(src)
File "D:\SourceCodes\Transformers\ReferFormer\models\deformable_transformer.py", line 248, in forward_ffn
src2 = self.linear2(self.dropout2(self.activation(self.linear1(src))))
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\modules\linear.py", line 96, in forward
return F.linear(input, self.weight, self.bias)
File "D:\DevelopTools\anaconda3\envs\dlenv\lib\site-packages\torch\nn\functional.py", line 1847, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA out of memory. Tried to allocate 1.43 GiB (GPU 0; 8.00 GiB total capacity; 3.75 GiB already allocated; 691.50 MiB free; 5.43 GiB reserved in total by PyTorch)
processor 0: 0% 0/30 [00:23<?, ?it/s]
At a minimum, how much GPU memory is required to run inference?
Or, which parameters can be modified to reduce the memory overhead?
Thanks!