title | booktitle | year | volume | series | month | publisher | url | openreview | abstract | layout | issn | id | tex_title | firstpage | lastpage | page | order | cycles | bibtex_editor | editor | bibtex_author | author | date | address | container-title | genre | issued | extras
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining |
Proceedings of the 16th Asian Conference on Machine Learning |
2025 |
260 |
Proceedings of Machine Learning Research |
0 |
PMLR |
RDa1Uj27U9 |
Small object detection (SOD) aims to precisely localize and accurately classify objects that have limited spatial extent and few discernible features. Despite significant advances in object detection driven by CNN-based and Transformer-based methods, SOD remains challenging. This is primarily because small objects have minimal spatial dimensions and few distinct features, which makes both computational efficiency and effective supervision difficult. In particular, Transformer-based detectors suffer from the high computational cost introduced by a feature pyramid network (FPN) and from sparse supervision of the encoder output due to insufficient positive queries. Current approaches attempt to mitigate these issues through sparse attention mechanisms and auxiliary one-to-many label assignment strategies. However, these approaches often still process multi-scale information inefficiently and fail to generate enough positive queries for small objects. To address these issues, we propose a novel small object detector, MRQM, which integrates Multi-scale Refinement and Query-aided Mining. The scale-aware encoder refines features across multiple scales from a bi-directional feature pyramid network (BiFPN) through iterative updates. This process not only reduces redundant computation but also significantly enhances feature representation at each scale. Furthermore, the IoU-aware head integrates a dynamic anchor mining strategy with one-to-many label assignment to fully mine potential high-quality auxiliary positive queries for small instances and to mitigate the sparse supervision of the encoder. Extensive experiments on the SODA-D and VisDrone datasets consistently demonstrate the superiority and effectiveness of our MRQM method. |
inproceedings |
2640-3498 |
fu25a |
Enhancing DETRs for Small Object Detection via Multi-Scale Refinement and Query-Aided Mining |
936 |
951 |
936-951 |
936 |
false |
Nguyen, Vu and Lin, Hsuan-Tien |
|
Fu, Sisi and Chen, Zhiming and Fang, Xiaocheng and Cai, Jieyi and Liu, Huanyu and Wen, Huosheng and Chen, Bingzhi |
|
2025-01-14 |
Proceedings of the 16th Asian Conference on Machine Learning |
inproceedings |
|