Problem reproducing ICDAR2015 results #24

Open
HotaekHan opened this issue Jul 28, 2023 · 0 comments
Hello, thanks for your amazing work :)

I tried to reproduce the ICDAR 2015 result from the paper, but I can't get the reported numbers when starting from the pre-trained weights.

I didn't change any code: I downloaded the dataset and the pre-trained weights, then fine-tuned from the pre-trained weights. However, the loss stays around 30 or higher and does not appear to converge.
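
For reference, a quick way to sanity-check the downloaded checkpoint before fine-tuning (a rough sketch only; it assumes the .pth is a detectron2-style checkpoint that stores its weights under a "model" key, and falls back to treating it as a bare state dict otherwise):

    # Rough sanity check of the downloaded pre-trained weights.
    # Assumption: detectron2-style checkpoint with a "model" dict;
    # otherwise the file is treated as a bare state dict.
    import torch

    ckpt = torch.load("weights/TESTR/pretrain_testr_R_50_polygon.pth", map_location="cpu")
    state = ckpt.get("model", ckpt)
    print(len(state), "tensors in checkpoint")
    # Spot-check a few TESTR-specific heads by name and shape.
    for name, tensor in state.items():
        if "ctrl_point" in name or "text_class" in name:
            print(name, tuple(tensor.shape))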

Below is my log:

[07/25 14:21:07] detectron2 INFO: Rank of current process: 0. World size: 8
[07/25 14:21:11] detectron2 INFO: Environment info:


sys.platform linux
Python 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0]
numpy 1.23.4
detectron2 0.6 @/usr/local/lib/python3.8/dist-packages/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.3
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE
PyTorch 1.12.1+cu113 @/usr/local/lib/python3.8/dist-packages/torch
PyTorch debug build False
torch._C._GLIBCXX_USE_CXX11_ABI False
GPU available Yes
GPU 0,1,2,3,4,5,6,7 Tesla T4 (arch=7.5)
Driver version 450.80.02
CUDA_HOME /usr/local/cuda
Pillow 9.2.0
torchvision 0.13.1+cu113 @/usr/local/lib/python3.8/dist-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.1.2


PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[07/25 14:21:11] detectron2 INFO: Command line arguments: Namespace(config_file='configs/TESTR/ICDAR15/TESTR_R_50_Polygon.yaml', dist_url='tcp://127.0.0.1:59588', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
[07/25 14:21:11] detectron2 INFO: Contents of args.config_file=configs/TESTR/ICDAR15/TESTR_R_50_Polygon.yaml:
BASE: "Base-ICDAR15-Polygon.yaml"
MODEL:
WEIGHTS: "weights/TESTR/pretrain_testr_R_50_polygon.pth"
RESNETS:
DEPTH: 50
TRANSFORMER:
NUM_FEATURE_LEVELS: 4
INFERENCE_TH_TEST: 0.3
ENC_LAYERS: 6
DEC_LAYERS: 6
DIM_FEEDFORWARD: 1024
HIDDEN_DIM: 256
DROPOUT: 0.1
NHEADS: 8
NUM_QUERIES: 100
ENC_N_POINTS: 4
DEC_N_POINTS: 4
SOLVER:
IMS_PER_BATCH: 8
BASE_LR: 1e-5
LR_BACKBONE: 1e-6
WARMUP_ITERS: 0

STEPS: (200000,)

MAX_ITER: 200000
CHECKPOINT_PERIOD: 10000
TEST:
EVAL_PERIOD: 10000
OUTPUT_DIR: "output/TESTR/icdar15/TESTR_R_50_Polygon"

[07/25 14:21:11] detectron2 INFO: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - icdar2015_test
  TRAIN:
  - icdar2015_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    CROP_INSTANCE: false
    ENABLED: true
    SIZE:
    - 0.1
    - 0.1
    TYPE: relative_range
  FORMAT: RGB
  HFLIP_TRAIN: false
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 4000
  MAX_SIZE_TRAIN: 2333
  MIN_SIZE_TEST: 1440
  MIN_SIZE_TRAIN:
  - 800
  - 832
  - 864
  - 896
  - 1000
  - 1200
  - 1400
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
      - 64
      - 128
      - 256
      - 512
  BACKBONE:
    ANTI_ALIAS: false
    FREEZE_AT: 2
    NAME: build_resnet_backbone
  BASIS_MODULE:
    ANN_SET: coco
    COMMON_STRIDE: 8
    CONVS_DIM: 128
    IN_FEATURES:
    - p3
    - p4
    - p5
    LOSS_ON: false
    LOSS_WEIGHT: 0.3
    NAME: ProtoNet
    NORM: SyncBN
    NUM_BASES: 4
    NUM_CLASSES: 80
    NUM_CONVS: 3
  BATEXT:
    CANONICAL_SIZE: 96
    CONV_DIM: 256
    CUSTOM_DICT: ''
    IN_FEATURES:
    - p2
    - p3
    - p4
    NUM_CHARS: 25
    NUM_CONV: 2
    POOLER_RESOLUTION:
    - 8
    - 32
    POOLER_SCALES:
    - 0.25
    - 0.125
    - 0.0625
    RECOGNITION_LOSS: ctc
    RECOGNIZER: attn
    SAMPLING_RATIO: 1
    USE_AET: false
    USE_COORDCONV: false
    VOC_SIZE: 96
  BLENDMASK:
    ATTN_SIZE: 14
    BOTTOM_RESOLUTION: 56
    INSTANCE_LOSS_WEIGHT: 1.0
    POOLER_SAMPLING_RATIO: 1
    POOLER_SCALES:
    - 0.25
    POOLER_TYPE: ROIAlignV2
    TOP_INTERP: bilinear
    VISUALIZE: false
  BOXINST:
    BOTTOM_PIXELS_REMOVED: 10
    ENABLED: false
    PAIRWISE:
      COLOR_THRESH: 0.3
      DILATION: 2
      SIZE: 3
      WARMUP_ITERS: 10000
  BiFPN:
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    NUM_REPEATS: 6
    OUT_CHANNELS: 160
  CONDINST:
    BOTTOM_PIXELS_REMOVED: -1
    MASK_BRANCH:
      CHANNELS: 128
      IN_FEATURES:
      - p3
      - p4
      - p5
      NORM: BN
      NUM_CONVS: 4
      OUT_CHANNELS: 8
      SEMANTIC_LOSS_ON: false
    MASK_HEAD:
      CHANNELS: 8
      DISABLE_REL_COORDS: false
      NUM_LAYERS: 3
      USE_FP16: false
    MASK_OUT_STRIDE: 4
    MAX_PROPOSALS: -1
    TOPK_PROPOSALS_PER_IM: -1
  DEVICE: cuda
  DLA:
    CONV_BODY: DLA34
    NORM: FrozenBN
    OUT_FEATURES:
    - stage2
    - stage3
    - stage4
    - stage5
  FCOS:
    BOX_QUALITY: ctrness
    CENTER_SAMPLE: true
    FPN_STRIDES:
    - 8
    - 16
    - 32
    - 64
    - 128
    INFERENCE_TH_TEST: 0.05
    INFERENCE_TH_TRAIN: 0.05
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    LOC_LOSS_TYPE: giou
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    LOSS_NORMALIZER_CLS: fg
    LOSS_WEIGHT_CLS: 1.0
    NMS_TH: 0.6
    NORM: GN
    NUM_BOX_CONVS: 4
    NUM_CLASSES: 80
    NUM_CLS_CONVS: 4
    NUM_SHARE_CONVS: 0
    POST_NMS_TOPK_TEST: 100
    POST_NMS_TOPK_TRAIN: 100
    POS_RADIUS: 1.5
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 1000
    PRIOR_PROB: 0.01
    SIZES_OF_INTEREST:
    - 64
    - 128
    - 256
    - 512
    THRESH_WITH_CTR: false
    TOP_LEVELS: 2
    USE_DEFORMABLE: false
    USE_RELU: true
    USE_SCALE: true
    YIELD_BOX_FEATURES: false
    YIELD_PROPOSAL: false
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES: []
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  MEInst:
    AGNOSTIC: true
    CENTER_SAMPLE: true
    DIM_MASK: 60
    FLAG_PARAMETERS: false
    FPN_STRIDES:
    - 8
    - 16
    - 32
    - 64
    - 128
    GCN_KERNEL_SIZE: 9
    INFERENCE_TH_TEST: 0.05
    INFERENCE_TH_TRAIN: 0.05
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    LAST_DEFORMABLE: false
    LOC_LOSS_TYPE: giou
    LOSS_ALPHA: 0.25
    LOSS_GAMMA: 2.0
    LOSS_ON_MASK: false
    MASK_LOSS_TYPE: mse
    MASK_ON: true
    MASK_SIZE: 28
    NMS_TH: 0.6
    NORM: GN
    NUM_BOX_CONVS: 4
    NUM_CLASSES: 80
    NUM_CLS_CONVS: 4
    NUM_MASK_CONVS: 4
    NUM_SHARE_CONVS: 0
    PATH_COMPONENTS: datasets/coco/components/coco_2017_train_class_agnosticTrue_whitenTrue_sigmoidTrue_60.npz
    POST_NMS_TOPK_TEST: 100
    POST_NMS_TOPK_TRAIN: 100
    POS_RADIUS: 1.5
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 1000
    PRIOR_PROB: 0.01
    SIGMOID: true
    SIZES_OF_INTEREST:
    - 64
    - 128
    - 256
    - 512
    THRESH_WITH_CTR: false
    TOP_LEVELS: 2
    TYPE_DEFORMABLE: DCNv1
    USE_DEFORMABLE: false
    USE_GCN_IN_MASK: false
    USE_RELU: true
    USE_SCALE: true
    WHITEN: true
  META_ARCHITECTURE: TransformerDetector
  MOBILENET: false
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 123.675
  - 116.28
  - 103.53
  PIXEL_STD:
  - 58.395
  - 57.12
  - 57.375
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_INTERVAL: 1
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: false
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id002
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - &id001
      - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    FED_LOSS_FREQ_WEIGHT_POWER: 0.5
    FED_LOSS_NUM_CLASSES: 50
    NAME: ''
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
    USE_FED_LOSS: false
    USE_SIGMOID_CE: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: Res5ROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 0
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id002
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - res4
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 2000
    PRE_NMS_TOPK_TEST: 6000
    PRE_NMS_TOPK_TRAIN: 12000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  SOLOV2:
    FPN_INSTANCE_STRIDES:
    - 8
    - 8
    - 16
    - 32
    - 32
    FPN_SCALE_RANGES:
    - - 1
      - 96
    - - 48
      - 192
    - - 96
      - 384
    - - 192
      - 768
    - - 384
      - 2048
    INSTANCE_CHANNELS: 512
    INSTANCE_IN_CHANNELS: 256
    INSTANCE_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    LOSS:
      DICE_WEIGHT: 3.0
      FOCAL_ALPHA: 0.25
      FOCAL_GAMMA: 2.0
      FOCAL_USE_SIGMOID: true
      FOCAL_WEIGHT: 1.0
    MASK_CHANNELS: 128
    MASK_IN_CHANNELS: 256
    MASK_IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    MASK_THR: 0.5
    MAX_PER_IMG: 100
    NMS_KERNEL: gaussian
    NMS_PRE: 500
    NMS_SIGMA: 2
    NMS_TYPE: matrix
    NORM: GN
    NUM_CLASSES: 80
    NUM_GRIDS:
    - 40
    - 36
    - 24
    - 16
    - 12
    NUM_INSTANCE_CONVS: 4
    NUM_KERNELS: 256
    NUM_MASKS: 256
    PRIOR_PROB: 0.01
    SCORE_THR: 0.1
    SIGMA: 0.2
    TYPE_DCN: DCN
    UPDATE_THR: 0.05
    USE_COORD_CONV: true
    USE_DCN_IN_INSTANCE: false
  TOP_MODULE:
    DIM: 16
    NAME: conv
  TRANSFORMER:
    AUX_LOSS: true
    DEC_LAYERS: 6
    DEC_N_POINTS: 4
    DIM_FEEDFORWARD: 1024
    DROPOUT: 0.1
    ENABLED: true
    ENC_LAYERS: 6
    ENC_N_POINTS: 4
    HIDDEN_DIM: 256
    INFERENCE_TH_TEST: 0.3
    LOSS:
      AUX_LOSS: true
      BOX_CLASS_WEIGHT: 2.0
      BOX_COORD_WEIGHT: 5.0
      BOX_GIOU_WEIGHT: 2.0
      FOCAL_ALPHA: 0.25
      FOCAL_GAMMA: 2.0
      POINT_CLASS_WEIGHT: 2.0
      POINT_COORD_WEIGHT: 5.0
      POINT_TEXT_WEIGHT: 4.0
    NHEADS: 8
    NUM_CHARS: 25
    NUM_CTRL_POINTS: 16
    NUM_FEATURE_LEVELS: 4
    NUM_QUERIES: 100
    POSITION_EMBEDDING_SCALE: 6.283185307179586
    USE_POLYGON: true
    VOC_SIZE: 96
  VOVNET:
    BACKBONE_OUT_CHANNELS: 256
    CONV_BODY: V-39-eSE
    NORM: FrozenBN
    OUT_CHANNELS: 256
    OUT_FEATURES:
    - stage2
    - stage3
    - stage4
    - stage5
  WEIGHTS: weights/TESTR/pretrain_testr_R_50_polygon.pth
OUTPUT_DIR: output/TESTR/icdar15/TESTR_R_50_Polygon
SEED: -1
SOLVER:
  AMP:
    ENABLED: false
  BASE_LR: 1.0e-05
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 10000
  CLIP_GRADIENTS:
    CLIP_TYPE: full_model
    CLIP_VALUE: 0.1
    ENABLED: true
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 8
  LR_BACKBONE: 1.0e-06
  LR_BACKBONE_NAMES:
  - backbone.0
  LR_LINEAR_PROJ_MULT: 0.1
  LR_LINEAR_PROJ_NAMES:
  - reference_points
  - sampling_offsets
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 200000
  MOMENTUM: 0.9
  NESTEROV: false
  NUM_DECAYS: 3
  OPTIMIZER: ADAMW
  REFERENCE_WORLD_SIZE: 0
  RESCALE_INTERVAL: false
  STEPS:
  - 30000
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 0
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 10000
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  LEXICON_TYPE: 3
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
  USE_LEXICON: true
  WEIGHTED_EDIT_DIST: true
VERSION: 2
VIS_PERIOD: 0

[07/25 14:21:11] detectron2 INFO: Full config saved to output/TESTR/icdar15/TESTR_R_50_Polygon/config.yaml
[07/25 14:21:11] d2.utils.env INFO: Using a generated random seed 11819301
[07/25 14:21:13] d2.engine.defaults INFO: Model:
TransformerDetector(
(testr): TESTR(
(backbone): Joiner(
(0): MaskedBackbone(
(backbone): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
)
)
(1): PositionalEncoding2D()
)
(text_pos_embed): PositionalEncoding1D()
(transformer): DeformableTransformer(
(encoder): DeformableTransformerEncoder(
(layers): ModuleList(
(0): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(1): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(2): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(3): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(4): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(5): DeformableTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.1, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(decoder): DeformableCompositeTransformerDecoder(
(layers): ModuleList(
(0): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(1): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(2): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(3): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(4): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(5): DeformableCompositeTransformerDecoderLayer(
(attn_cross): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross): Dropout(p=0.1, inplace=False)
(norm_cross): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra): Dropout(p=0.1, inplace=False)
(norm_intra): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter): Dropout(p=0.1, inplace=False)
(norm_inter): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout3): Dropout(p=0.1, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout4): Dropout(p=0.1, inplace=False)
(norm3): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_intra_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_intra_text): Dropout(p=0.1, inplace=False)
(norm_intra_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_inter_text): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(dropout_inter_text): Dropout(p=0.1, inplace=False)
(norm_inter_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(attn_cross_text): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=256, bias=True)
(attention_weights): Linear(in_features=256, out_features=128, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout_cross_text): Dropout(p=0.1, inplace=False)
(norm_cross_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1_text): Linear(in_features=256, out_features=1024, bias=True)
(dropout3_text): Dropout(p=0.1, inplace=False)
(linear2_text): Linear(in_features=1024, out_features=256, bias=True)
(dropout4_text): Dropout(p=0.1, inplace=False)
(norm3_text): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
(enc_output): Linear(in_features=256, out_features=256, bias=True)
(enc_output_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(pos_trans): Linear(in_features=256, out_features=256, bias=True)
(pos_trans_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(bbox_class_embed): Linear(in_features=256, out_features=1, bias=True)
(bbox_embed): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
)
(ctrl_point_class): ModuleList(
(0): Linear(in_features=256, out_features=1, bias=True)
(1): Linear(in_features=256, out_features=1, bias=True)
(2): Linear(in_features=256, out_features=1, bias=True)
(3): Linear(in_features=256, out_features=1, bias=True)
(4): Linear(in_features=256, out_features=1, bias=True)
(5): Linear(in_features=256, out_features=1, bias=True)
)
(ctrl_point_coord): ModuleList(
(0): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
(1): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
(2): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
(3): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
(4): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
(5): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=2, bias=True)
)
)
)
(bbox_coord): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=4, bias=True)
)
)
(bbox_class): Linear(in_features=256, out_features=1, bias=True)
(text_class): Linear(in_features=256, out_features=97, bias=True)
(ctrl_point_embed): Embedding(16, 256)
(text_embed): Embedding(25, 256)
(input_proj): ModuleList(
(0): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(2): Sequential(
(0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(3): Sequential(
(0): Conv2d(2048, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
)
(criterion): SetCriterion(
(enc_matcher): BoxHungarianMatcher()
(dec_matcher): CtrlPointHungarianMatcher()
)
)
[07/25 14:21:13] d2.data.dataset_mapper INFO: [DatasetMapper] Augmentations used in training: [RandomCrop(crop_type='relative_range', crop_size=[0.1, 0.1]), ResizeShortestEdge(short_edge_length=(800, 832, 864, 896, 1000, 1200, 1400), max_size=2333, sample_style='choice'), RandomFlip()]
[07/25 14:21:13] adet.data.dataset_mapper INFO: Rebuilding the augmentations. The previous augmentations will be overridden.
[07/25 14:21:13] adet.data.detection_utils INFO: Augmentations used in training: [ResizeShortestEdge(short_edge_length=(800, 832, 864, 896, 1000, 1200, 1400), max_size=2333, sample_style='choice')]
[07/25 14:21:13] adet.data.dataset_mapper INFO: Cropping used in training: RandomCropWithInstance(crop_type='relative_range', crop_size=[0.1, 0.1], crop_instance=False)
[07/25 14:21:13] adet.data.datasets.text INFO: Loaded 1000 images in COCO format from datasets/icdar2015/train_poly.json
[07/25 14:21:13] d2.data.build INFO: Removed 21 images with no usable annotations. 979 images left.
[07/25 14:21:13] d2.data.build INFO: Distribution of instances among all 1 categories:
| category | #instances |
|:--------:|:----------:|
|   text   | 4468       |
[07/25 14:21:13] d2.data.build INFO: Using training sampler TrainingSampler
[07/25 14:21:13] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common.TorchSerializedList'>
[07/25 14:21:13] d2.data.common INFO: Serializing 979 elements to byte tensors and concatenating them all ...
[07/25 14:21:13] d2.data.common INFO: Serialized dataset takes 1.64 MiB
[07/25 14:21:13] d2.checkpoint.detection_checkpoint INFO: [DetectionCheckpointer] Loading from weights/TESTR/pretrain_testr_R_50_polygon.pth ...
[07/25 14:21:13] fvcore.common.checkpoint INFO: [Checkpointer] Loading from weights/TESTR/pretrain_testr_R_50_polygon.pth ...
[07/25 14:21:14] adet.trainer INFO: Starting training from iteration 0
[07/25 17:20:06] d2.utils.events INFO: eta: 2 days, 13:01:22 iter: 9359 total_loss: 44.08 loss_ce: 0.783 loss_ctrl_points: 2.31 loss_texts: 3.764 loss_ce_0: 0.8143 loss_ctrl_points_0: 2.423 loss_texts_0: 3.801 loss_ce_1: 0.8142 loss_ctrl_points_1: 2.4 loss_texts_1: 3.759 loss_ce_2: 0.8032 loss_ctrl_points_2: 2.351 loss_texts_2: 3.756 loss_ce_3: 0.7866 loss_ctrl_points_3: 2.334 loss_texts_3: 3.758 loss_ce_4: 0.7786 loss_ctrl_points_4: 2.311 loss_texts_4: 3.77 loss_ce_enc: 0.8066 loss_bbox_enc: 0.3008 loss_giou_enc: 0.7569 time: 1.1431 last_time: 0.8115 data_time: 0.0088 last_data_time: 0.0066 lr: 1e-05 max_mem: 12183M
[07/25 17:20:28] d2.utils.events INFO: eta: 2 days, 13:02:11 iter: 9379 total_loss: 42.63 loss_ce: 0.7653 loss_ctrl_points: 2.407 loss_texts: 3.758 loss_ce_0: 0.8062 loss_ctrl_points_0: 2.635 loss_texts_0: 3.792 loss_ce_1: 0.7863 loss_ctrl_points_1: 2.568 loss_texts_1: 3.736 loss_ce_2: 0.7788 loss_ctrl_points_2: 2.537 loss_texts_2: 3.737 loss_ce_3: 0.77 loss_ctrl_points_3: 2.508 loss_texts_3: 3.748 loss_ce_4: 0.7641 loss_ctrl_points_4: 2.456 loss_texts_4: 3.748 loss_ce_enc: 0.7962 loss_bbox_enc: 0.2918 loss_giou_enc: 0.73 time: 1.1431 last_time: 0.9134 data_time: 0.0084 last_data_time: 0.0075 lr: 1e-05 max_mem: 12183M
[07/25 17:20:51] d2.utils.events INFO: eta: 2 days, 13:05:45 iter: 9399 total_loss: 44.09 loss_ce: 0.7944 loss_ctrl_points: 2.32 loss_texts: 3.633 loss_ce_0: 0.8154 loss_ctrl_points_0: 2.634 loss_texts_0: 3.668 loss_ce_1: 0.802 loss_ctrl_points_1: 2.506 loss_texts_1: 3.633 loss_ce_2: 0.8023 loss_ctrl_points_2: 2.369 loss_texts_2: 3.626 loss_ce_3: 0.7987 loss_ctrl_points_3: 2.281 loss_texts_3: 3.624 loss_ce_4: 0.7966 loss_ctrl_points_4: 2.309 loss_texts_4: 3.62 loss_ce_enc: 0.8003 loss_bbox_enc: 0.2937 loss_giou_enc: 0.7454 time: 1.1431 last_time: 1.1894 data_time: 0.0081 last_data_time: 0.0227 lr: 1e-05 max
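
For completeness, this is a quick way to double-check what the merged config resolves to and whether the pre-trained weights load cleanly into the model (a rough sketch, not the repo's official tooling: it assumes TESTR exposes adet.config.get_cfg and an adet.modeling package like AdelaiDet, and relies on detectron2's checkpointer logging any missing or unexpected keys):

    # Sketch: inspect the merged config, build the model, and load the
    # pre-trained weights so that the checkpointer reports missing or
    # unexpected keys in the log.
    # Assumptions: AdelaiDet-style adet.config.get_cfg, and importing
    # adet.modeling registers the TransformerDetector meta-architecture
    # (the actual module layout in the repo may differ).
    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.modeling import build_model
    from adet.config import get_cfg
    import adet.modeling  # noqa: F401  (registers the meta-architecture)

    cfg = get_cfg()
    cfg.merge_from_file("configs/TESTR/ICDAR15/TESTR_R_50_Polygon.yaml")
    print(cfg.MODEL.WEIGHTS)
    print(cfg.SOLVER.BASE_LR, cfg.SOLVER.STEPS, cfg.SOLVER.MAX_ITER, cfg.SOLVER.IMS_PER_BATCH)

    model = build_model(cfg)
    DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)

If the keys load without mismatches, running the same entry point with --eval-only on the pre-trained weights first should show whether the checkpoint itself reproduces the reported pre-training numbers before any fine-tuning.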
