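I set up mmfewshot and the VOC dataset following the README. When I run base training for the TFA algorithm with the provided config file, the losses become NaN once training passes 100 iterations. What could be causing this?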
2023-05-16 15:59:36,050 - mmfewshot - INFO - Iter [50/18000] lr: 9.810e-03, eta: 1:13:05, time: 0.244, data_time: 0.007, memory: 7041, loss_rpn_cls: 0.2213, loss_rpn_bbox: 0.0396, loss_cls: 0.5162, acc: 92.1133, loss_bbox: 0.0955, loss: 0.8727
2023-05-16 15:59:48,591 - mmfewshot - INFO - Iter [100/18000] lr: 1.980e-02, eta: 1:13:52, time: 0.251, data_time: 0.007, memory: 7041, loss_rpn_cls: 0.1074, loss_rpn_bbox: 0.0503, loss_cls: 0.2786, acc: 96.0000, loss_bbox: 0.1581, loss: 0.5944
2023-05-16 16:00:00,323 - mmfewshot - INFO - Iter [150/18000] lr: 2.000e-02, eta: 1:12:24, time: 0.235, data_time: 0.006, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 79.5828, loss_bbox: nan, loss: nan
2023-05-16 16:00:10,732 - mmfewshot - INFO - Iter [200/18000] lr: 2.000e-02, eta: 1:09:35, time: 0.208, data_time: 0.006, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 2.6863, loss_bbox: nan, loss: nan
2023-05-16 16:00:21,427 - mmfewshot - INFO - Iter [250/18000] lr: 2.000e-02, eta: 1:08:09, time: 0.214, data_time: 0.009, memory: 7041, loss_rpn_cls: nan, loss_rpn_bbox: nan, loss_cls: nan, acc: 1.0000, loss_bbox: nan, loss: nan
We recommend using English, or English and Chinese together, for issues so that we can have a broader discussion.
Try reducing the batch size.
Try lowering the learning rate.
The default learning rate is set for 8 GPUs; lowering it to lr/8 should be enough.
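For reference, the lr/8 suggestion can be sketched as linear scaling of the learning rate with the total batch size. In the log above, the losses turn NaN shortly after the learning rate finishes ramping up to its peak of 2.000e-02, which fits this explanation. The sketch below is only an illustration: `scale_lr` is a hypothetical helper, not part of mmfewshot, and the value of 2 samples per GPU is an assumption, so check the `data` section of your own config.

```python
def scale_lr(base_lr, base_gpus, num_gpus,
             base_samples_per_gpu=2, samples_per_gpu=2):
    """Scale a learning rate linearly with the total batch size.

    Hypothetical helper for illustration. The default of 2 samples per GPU
    is an assumption, not a value taken from the mmfewshot config.
    """
    base_batch = base_gpus * base_samples_per_gpu
    batch = num_gpus * samples_per_gpu
    return base_lr * batch / base_batch


# Peak lr in the log is 2.000e-02; the reply says it was tuned for 8 GPUs.
single_gpu_lr = scale_lr(base_lr=0.02, base_gpus=8, num_gpus=1)

# Config-style fragment: only the lr field is shown, other optimizer
# fields are left as they are in the original config.
optimizer_override = dict(lr=single_gpu_lr)
print(optimizer_override)
```

With one GPU and an unchanged per-GPU batch size this gives 0.02 / 8 = 0.0025. If you also reduce the per-GPU batch size, as the first reply suggests, the same helper scales the learning rate down further.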