training failed to converge #37
Comments
@omerbrandis I am not sure what the problem is. Can you reduce the learning rate and try again?
I'm now trying with solver.IMS_PER_BATCH = 2. It looks better: the loss has broken past the 1.56 barrier. Training is still running, and I'm not sure how far it will get.
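When IMS_PER_BATCH changes, a common heuristic is to rescale the learning rate and the iteration counts together (the linear scaling rule). The sketch below only works through that arithmetic with the values posted in this thread. `rescale_solver` is a hypothetical helper, not part of the project's code, and nothing in this thread establishes that the rule suits a 24-image dataset.

```python
def rescale_solver(base_lr, max_iter, steps, old_batch, new_batch):
    """Linear scaling rule: LR scales with batch size, iteration counts inversely."""
    factor = new_batch / old_batch
    return {
        "IMS_PER_BATCH": new_batch,
        "BASE_LR": base_lr * factor,
        "MAX_ITER": round(max_iter / factor),
        "STEPS": tuple(round(s / factor) for s in steps),
    }


# Values from the posted config (IMS_PER_BATCH 1), rescaled to a batch of 2.
scaled = rescale_solver(base_lr=0.001, max_iter=90000, steps=(60000, 80000),
                        old_batch=1, new_batch=2)
for key, value in scaled.items():
    print(f"{key}: {value}")
```

Under this heuristic, moving from 1 to 2 images per batch doubles BASE_LR and halves MAX_ITER and STEPS. The suggestion above to reduce the learning rate points the other way, so the output is best treated as a baseline to compare against rather than a recommendation.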
With solver.IMS_PER_BATCH = 2, training progressed nicely up to but from there it did not make any significant progress. I have a checkpoint from its test results are: which is a noticeable improvement. :-) Is there a way to visualize inference? Thanks,
Tried "fine-tuning" with a lower LR: changed base_lr in the config file and restarted training. Omer.
Hello,
I've just tried training on my own dataset (4 classes, 24 images),
but after 90k iterations the loss has not progressed past 1.56:
iter: 90000 loss: 1.5658 (1.5819) loss_cls: 0.0000 (0.0114) loss_reg: 0.9772 (0.9775) loss_centerness: 0.5881 (0.5927) loss_mask: 0.0000 (0.0003) time: 0.3531 (0.3745) data: 0.0145 (0.0164) lr: 0.000010 max mem: 2017
Evaluate annotation type bbox
DONE (t=0.94s).
Accumulating evaluation results...
DONE (t=0.03s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
(test dataset matches train dataset)
There seems to have been very little progress since iter 5000:
iter: 5000 loss: 1.5704 (1.8018) loss_cls: 0.0025 (0.2015) loss_reg: 0.9779 (0.9775) loss_centerness: 0.5880 (0.6171) loss_mask: 0.0000 (0.0057) time: 0.3798 (0.3738) data: 0.0161 (0.0164) lr: 0.001000 max mem: 2017
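To see which loss terms are stuck, the two log lines quoted here can be compared field by field. This is a minimal sketch: the regular expression assumes every metric is printed as `name: a (b)`, and `parse_fields` is a hypothetical helper written for this post, not a function of the training code. What the two numbers mean (probably a windowed value followed by a running average) is an assumption as well.

```python
import re

# The two log lines quoted in this post.
LOG_5K = ("iter: 5000 loss: 1.5704 (1.8018) loss_cls: 0.0025 (0.2015) "
          "loss_reg: 0.9779 (0.9775) loss_centerness: 0.5880 (0.6171) "
          "loss_mask: 0.0000 (0.0057) time: 0.3798 (0.3738) "
          "data: 0.0161 (0.0164) lr: 0.001000 max mem: 2017")
LOG_90K = ("iter: 90000 loss: 1.5658 (1.5819) loss_cls: 0.0000 (0.0114) "
           "loss_reg: 0.9772 (0.9775) loss_centerness: 0.5881 (0.5927) "
           "loss_mask: 0.0000 (0.0003) time: 0.3531 (0.3745) "
           "data: 0.0145 (0.0164) lr: 0.000010 max mem: 2017")

# Matches fields of the form "name: 1.2345 (6.7890)".
FIELD = re.compile(r"(\w+): ([0-9.]+) \(([0-9.]+)\)")


def parse_fields(line):
    """Map each 'name: a (b)' field to the pair (a, b)."""
    return {name: (float(a), float(b)) for name, a, b in FIELD.findall(line)}


early, late = parse_fields(LOG_5K), parse_fields(LOG_90K)
components = ["loss_cls", "loss_reg", "loss_centerness", "loss_mask"]

for name in components:
    change = late[name][0] - early[name][0]
    print(f"{name:16s} {early[name][0]:.4f} -> {late[name][0]:.4f} ({change:+.4f})")

component_sum = sum(late[name][0] for name in components)
print(f"sum of components at 90k: {component_sum:.4f}")
```

For these two lines, loss_reg and loss_centerness barely move between iteration 5000 and iteration 90000, while loss_cls and loss_mask read 0.0000, so the total is carried almost entirely by the regression and centerness terms.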
I'm using a modified sipmask_R_50_FPN_1x.yaml configuration:
# FCOS with improvements
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  RPN_ONLY: True
  SIPMASK_ON: True
  BACKBONE:
    CONV_BODY: "R-50-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5
    NUM_CLASSES: 4
  SIPMASK:
    # normalizing the regression targets with FPN strides
    NORM_REG_TARGETS: True
    # positioning centerness on the regress branch.
    # Please refer to tianzhi0549/FCOS#89 (comment)
    CENTERNESS_ON_REG: True
    # using center sampling and GIoU.
    # Please refer to https://github.com/yqyao/FCOS_PLUS
    CENTER_SAMPLING_RADIUS: 1.5
    IOU_LOSS_TYPE: "giou"
    NUM_CLASSES: 4
  ROI_KEYPOINT_HEAD:
    NUM_CLASSES: 4
  ROI_KEYPOINT_HEAD:
    NUM_CLASSES: 4
  FCOS:
    NUM_CLASSES: 4
DATASETS:
  TRAIN: ("ffr2coco",)
  TEST: ("ffr2coco",)
INPUT:
  MIN_SIZE_TRAIN: (720,)
  MAX_SIZE_TRAIN: 1280
  MIN_SIZE_TEST: 720
  MAX_SIZE_TEST: 1280
  PIXEL_MEAN: [103, 103, 103]
  PIXEL_STD: [66.0, 66.0, 66.0]
DATALOADER:
  SIZE_DIVISIBILITY: 32
SOLVER:
  BASE_LR: 0.001
  WEIGHT_DECAY: 0.0001
  STEPS: (60000, 80000)
  MAX_ITER: 90000
  IMS_PER_BATCH: 1
  WARMUP_METHOD: "constant"
  WARMUP_ITERS: 0
  CHECKPOINT_PERIOD: 1000
TEST:
  IMS_PER_BATCH: 1
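As a sanity check on the SOLVER settings, the learning rate at a given iteration and the number of passes over the 24 images can be worked out directly. This is a minimal sketch of a multi-step schedule without warmup (WARMUP_ITERS is 0 here). The decay factor `gamma=0.1` is an assumption, since the posted config does not set it, though it is consistent with the logged values lr: 0.001000 at iter 5000 and lr: 0.000010 at iter 90000. Both function names are hypothetical.

```python
from bisect import bisect_right


def lr_at(iteration, base_lr=0.001, steps=(60000, 80000), gamma=0.1):
    """Multi-step decay without warmup: multiply by gamma for each step reached."""
    return base_lr * gamma ** bisect_right(steps, iteration)


def passes_over_dataset(max_iter=90000, ims_per_batch=1, num_images=24):
    """Average number of times each training image is seen."""
    return max_iter * ims_per_batch / num_images


for it in (5000, 60000, 90000):
    print(f"iter {it}: lr = {lr_at(it):.6f}")
print(f"passes over the 24 images: {passes_over_dataset():.0f}")
```

With MAX_ITER 90000, IMS_PER_BATCH 1, and 24 images, each image is seen 3750 times on average, so the plateau at 1.56 is unlikely to come from too few iterations.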
Any ideas?
Omer.