MM Grounding Dino inference result of DetInferencer() is much worse than tools/test.py #12276

shojint opened this issue Dec 23, 2024 · 0 comments
Describe the issue
I have trained a checkpoint of mm_grounding_dino_swin-b. When I run inference on the test dataset with tools/test.py, the accuracy is satisfactory. However, running DetInferencer() on the same image gives much worse results.

The output of tools/test.py:
[image: detection result on 000045.jpg]

The output of DetInferencer():
[image: detection result on 000045.jpg]
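
For reference, the tools/test.py run that produced the good result was roughly of this form (config and checkpoint paths are the same as in the reproduction below; treat the exact flags as approximate):

python tools/test.py \
    configs/mm_grounding_dino/grounding_dino_swin-b_finetune_taco_map.py \
    /home/vgpu/mmdetection/taco_work_dir/best_coco_bbox_mAP_epoch_14_20241024_151401.pth \
    --show-dir taco_test_vis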

Reproduction
In an environment with mmdetection installed, run the following code.
I'm sorry I can't share my checkpoint; you would need to use your own.

from mmdet.apis import DetInferencer

# Build the inferencer from the fine-tuned config and checkpoint
inferencer = DetInferencer(
    model='/home/vgpu/mmdetection/configs/mm_grounding_dino/grounding_dino_swin-b_finetune_taco_map.py',
    weights='/home/vgpu/mmdetection/taco_work_dir/best_coco_bbox_mAP_epoch_14_20241024_151401.pth')

local_image_path = "/home/vgpu/TACO/data/batch_1/000045.jpg"
TEXT_PROMPT = 'can'  # single-class text prompt
inferencer(local_image_path, show=True, texts=TEXT_PROMPT, custom_entities=True)
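
One thing I am unsure about is the text prompt format. If it matters, a variant that passes the full fine-tuned class list (classes separated by ' . ', which I believe is the custom-entity convention for MM Grounding DINO; please correct me if that is wrong) would look like this:

# Sketch: prompt built from the fine-tuned class list (same tuple as in the
# config below) instead of a single word. The ' . ' separator is my assumption
# about the custom-entity prompt convention.
class_name = ('Aluminium foil', 'Battery', 'Aluminium blister pack', 'Glass bottle',
              'Plastic bottle cap', 'Metal bottle cap', 'Broken glass', 'Food waste',
              'Plastic lid', 'Metal lid', 'Other plastic', 'Plastic film', 'Plastic utensils',
              'Pop tab', 'Rope & strings', 'Squeezable tube', 'Styrofoam piece',
              'Unlabeled litter', 'Cigarette', 'Can', 'Plastic bottle', 'Carton', 'Cup', 'Paper',
              'Wrapper', 'Plastic container', 'Straw')
all_classes_prompt = ' . '.join(class_name) + ' .'
inferencer(local_image_path, show=True, texts=all_classes_prompt, custom_entities=True)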

Config file

_base_ = 'grounding_dino_swin-b_pretrain_obj365_goldg_v3det.py'

data_root = '/home/vgpu/TACO/data/'
class_name = ('Aluminium foil', 'Battery', 'Aluminium blister pack', 'Glass bottle',
              'Plastic bottle cap', 'Metal bottle cap', 'Broken glass', 'Food waste',
              'Plastic lid', 'Metal lid', 'Other plastic', 'Plastic film', 'Plastic utensils',
              'Pop tab', 'Rope & strings', 'Squeezable tube', 'Styrofoam piece',
              'Unlabeled litter', 'Cigarette', 'Can', 'Plastic bottle', 'Carton', 'Cup', 'Paper',
              'Wrapper', 'Plastic container', 'Straw')
num_classes = len(class_name)
metainfo = dict(
    classes=class_name,
    # palette=[(235, 211, 70),
    #          (106, 90, 205),
    #          (160, 32, 240),
    #          (176, 23, 31),
    #          (142, 0, 0),
    #          (230, 0, 0),
    #          (106, 0, 228),
    #          (60, 100, 0)]
)

model = dict(bbox_head=dict(num_classes=num_classes))

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoice',
        transforms=[
            [
                dict(
                    type='RandomChoiceResize',
                    scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                            (736, 1333), (768, 1333), (800, 1333)],
                    keep_ratio=True)
            ],
            [
                dict(
                    type='RandomChoiceResize',
                    # The aspect ratio of all images in the train dataset is < 7,
                    # following the original implementation
                    scales=[(400, 4200), (500, 4200), (600, 4200)],
                    keep_ratio=True),
                dict(
                    type='RandomCrop',
                    crop_type='absolute_range',
                    crop_size=(384, 600),
                    allow_negative_crop=True),
                dict(
                    type='RandomChoiceResize',
                    scales=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                            (736, 1333), (768, 1333), (800, 1333)],
                    keep_ratio=True)
            ]
        ]),
    dict(
        type='PackDetInputs',
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor', 'flip', 'flip_direction', 'text',
                   'custom_entities'))
]

train_dataloader = dict(
    dataset=dict(
        _delete_=True,
        type='CocoDataset',
        data_root=data_root,
        metainfo=metainfo,
        return_classes=True,
        pipeline=train_pipeline,
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        ann_file='mapped_annotations/annotations_0_train.json',
        data_prefix=dict(img='.')))

val_dataloader = dict(
    dataset=dict(
        metainfo=metainfo,
        data_root=data_root,
        ann_file='mapped_annotations/annotations_0_val.json',
        data_prefix=dict(img='.')))

test_dataloader = val_dataloader

val_evaluator = dict(
    ann_file=data_root + 'mapped_annotations/annotations_0_val.json', classwise=True)
test_evaluator = val_evaluator

max_epochs = 100

default_hooks = dict(
    checkpoint=dict(interval=1, max_keep_ckpts=1, save_best='coco/bbox_mAP'),
    logger=dict(type='LoggerHook', interval=5))
train_cfg = dict(max_epochs=max_epochs, val_interval=1)

param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=max_epochs,
        by_epoch=True,
        milestones=[15],
        gamma=0.1)
]

optim_wrapper = dict(
    optimizer=dict(lr=0.0001),
    paramwise_cfg=dict(
        custom_keys={
            'absolute_pos_embed': dict(decay_mult=0.),
            'backbone': dict(lr_mult=0.1), #Should I change this?
            'language_model': dict(lr_mult=0.0)
        }))

load_from = 'https://download.openmmlab.com/mmdetection/v3.0/mm_grounding_dino/grounding_dino_swin-b_pretrain_all/grounding_dino_swin-b_pretrain_all-f9818a7c.pth'  # noqa
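
In case it helps with debugging, this is how I would dump what the config resolves for testing, to compare against whatever DetInferencer builds internally. This is just generic mmengine Config inspection, nothing specific to the inferencer:

from mmengine.config import Config

# Load the fine-tune config (same path as in the reproduction above)
cfg = Config.fromfile(
    '/home/vgpu/mmdetection/configs/mm_grounding_dino/grounding_dino_swin-b_finetune_taco_map.py')

# Test-time pipeline and dataset metainfo that tools/test.py will use
print(cfg.test_dataloader.dataset.pipeline)
print(cfg.test_dataloader.dataset.get('metainfo'))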

Environment
sys.platform: linux
Python: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1: NVIDIA GeForce RTX 3090 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.4, V12.4.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.4.1
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201703
  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.4.2 (Git Hash 1137e04ec0b5251ca2b4400a4fd3c667ce843d67)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX512
  • CUDA Runtime 12.1
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 90.1 (built against CUDA 12.4)
  • Magma 2.6.1
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.4.1, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.19.1
OpenCV: 4.10.0
MMEngine: 0.10.5
MMDetection: 3.3.0+cfd5d3a

Output

/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/optim/optimizer/zero_optimizer.py:11: DeprecationWarning: `TorchScript` support for functional optimizers is deprecated and will be removed in a future PyTorch release. Consider using the `torch.compile` optimizer instead.
  from torch.distributed.optim import \
Loads checkpoint by local backend from path: /home/vgpu/mmdetection/taco_work_dir/best_coco_bbox_mAP_epoch_14_20241024_151401.pth
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/runner/checkpoint.py:347: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  checkpoint = torch.load(filename, map_location=map_location)
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/huggingface_hub/file_download.py:1142: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
12/23 14:36:18 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
/home/vgpu/miniconda3/envs/gdsam/lib/python3.11/site-packages/torch/functional.py:513: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at 
/opt/conda/conda-bld/pytorch_1724789172399/work/aten/src/ATen/native/TensorShape.cpp:3609.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/vgpu/mmcv/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
  warnings.warn(f'position encoding of key is'
Inference ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   