🎮 Awesome Remote Sensing Image-Text Retrieval | Remote Sensing Cross-Modal Retrieval | Remote Sensing Vision-Language Models
A benchmark and awesome collection of papers on Remote Sensing Image-Text Retrieval (RSITR) | Remote Sensing Cross-Modal Retrieval (RSCMR), gathered from the Internet. If anything is missing, please contact me at [email protected]. 🤝 If you want to join the Remote Sensing Vision-Language Models (RSVLMs) community, click Slack Group.
Major news from the RSVLMs community:
- 2023/12/20: SkyScript, a comprehensive vision-language dataset for remote sensing images covering 29K distinct semantic tags (AAAI 2024) [link].
- 2023/11/24: GeoChat: Grounded Large Vision-Language Model for Remote Sensing [link].
- 2023/06/20: RS5M, a remote sensing dataset with 5M+ image-text pairs, released [link].
- 2023/06/19: RemoteCLIP, the first vision-language foundation model for remote sensing, proposed [link].
A collection of the more popular image-text pair datasets for remote sensing. Please get in touch if you know of others that should be added.
Dataset Name | Number of Images | Image Size (pixels) | VLMs |
---|---|---|---|
UCM-Captions | 2,100 | 256 × 256 | - |
Sydney-Captions | 613 | 500 × 500 | - |
RSICD | 10,921 | 224 × 224 | - |
RSITMD | 4,743 | 256 × 256 | - |
NWPU-Captions | 31,500 | 256 × 256 | - |
RS5M | 5 million+ | All Resolutions | GeoRSCLIP |
SkyScript | 5.2 million+ | All Resolutions | SkyCLIP |
Contributions of additional RSITR | RSCMR methods are welcome.
📌 Cross-Modal Retrieval on RSICD:
https://paperswithcode.com/sota/cross-modal-retrieval-on-rsicd
📌 Cross-Modal Retrieval on RSITMD:
https://paperswithcode.com/sota/cross-modal-retrieval-on-rsitmd
Closed-Domain Method: Training and testing on a single dataset.
Open-Domain Method: Using extra datasets for pre-training to gain more inter-domain knowledge.
Hashing Method: Encoding images and text as compact binary codes so that retrieval on large-scale datasets stays efficient.
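
To make the hashing category concrete, here is a minimal sketch of the lookup step only, assuming embeddings have already been produced by some model. The hashing papers listed here learn their codes with deep networks; the sign-based `binarize` helper, the function names, and the toy numbers below are illustrative assumptions and are not taken from any of them.

```python
from typing import List, Sequence, Tuple


def binarize(embedding: Sequence[float]) -> int:
    """Pack the sign pattern of a real-valued embedding into an integer bit code.

    Assumption: simple sign thresholding stands in for a learned hash function.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions where two codes differ."""
    return bin(code_a ^ code_b).count("1")


def retrieve(query_code: int, gallery_codes: List[int], top_k: int = 3) -> List[Tuple[int, int]]:
    """Return (gallery_index, distance) pairs for the top_k closest codes."""
    scored = [(idx, hamming_distance(query_code, code)) for idx, code in enumerate(gallery_codes)]
    scored.sort(key=lambda pair: pair[1])  # stable sort keeps gallery order on ties
    return scored[:top_k]


# Toy 8-dimensional embeddings (made-up numbers, not from any real model).
image_embeddings = [
    [0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8],
    [-0.6, 0.5, -0.1, 0.2, -0.9, 0.4, 0.7, -0.3],
    [0.8, -0.1, 0.5, -0.6, -0.2, 0.3, -0.4, 0.9],
]
text_embedding = [0.7, -0.3, 0.6, -0.8, 0.2, 0.1, -0.6, 0.5]

gallery = [binarize(e) for e in image_embeddings]
query = binarize(text_embedding)
ranking = retrieve(query, gallery, top_k=2)
print(ranking)  # [(0, 0), (2, 1)]
```

Comparing codes takes one XOR and a bit count per gallery item, which is why binary codes scale to large archives more easily than floating-point similarity search.
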
- [AAAI 2024] | SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing | [paper] [github]
    - Zhecheng Wang, Rajanie Prabha, Tianyuan Huang, Jiajun Wu, Ram Rajagopal.
- [ArXiv 2023] | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | [paper] [github]
    - Fan Liu, Delong Chen, Zhan-Rong Guan, Xiaocong Zhou, Jiale Zhu, Jun Zhou.
- [ArXiv 2023] | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | [paper] [github]
    - Zilun Zhang, Tiancheng Zhao, Yulong Guo, Jianwei Yin.
- [ArXiv 2023] | RSGPT: A Remote Sensing Vision Language Model and Benchmark | [paper]
    - Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li.
- [TGRS 2023] | Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | [paper]
    - Yuan Yuan, Yangfan Zhan, Zhitong Xiong.
- [ACMMM 2023] | A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | [paper] [github]
    - Jiancheng Pan, Qing Ma, Cong Bai.
- [ArXiv 2023] | Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | [paper]
    - Jiancheng Pan, Qing Ma, Cong Bai.
- [Sensors 2023] | A Fine-Grained Semantic Alignment Method Specific to Aggregate Multi-Scale Information for Cross-Modal Remote Sensing Image Retrieval | [paper]
    - Fuzhong Zheng, Xu Wang, Luyao Wang, Xiong Zhang, Hongze Zhu, Long Wang, Haisu Zhang.
- [Remote Sensing 2023] | A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text–Image Retrieval in Remote Sensing | [paper]
    - Xiong Zhang, Weipeng Li, Xu Wang, Luyao Wang, Fuzhong Zheng, Long Wang, Haisu Zhang.
- [IGARSS 2023] | A Texture and Saliency Enhanced Image Learning Method For Cross-Modal Remote Sensing Image-Text Retrieval | [paper]
    - Rui Yang, Di Zhang, Yanhe Guo, Shuang Wang.
- [IGARSS 2023] | A Fast and Accurate Method for Remote Sensing Image-Text Retrieval Based On Large Model Knowledge Distillation | [paper]
    - Yu Liao, Rui Yang, Tao Xie, Hantong Xing, Dou Quan, Shuang Wang, B. Hou.
- [TGRS 2023] | Knowledge-Aided Momentum Contrastive Learning for Remote-Sensing Image Text Retrieval | [paper]
    - Zhong Ji, Changxu Meng, Yan Zhang, Yanwei Pang, Xuelong Li.
- [Mathematics 2023] | An End-to-End Framework Based on Vision-Language Fusion for Remote Sensing Cross-Modal Text-Image Retrieval | [paper]
    - Liu He, Shuyan Liu, Ran An, Yudong Zhuo, Jian Tao.
- [TGRS 2023] | Hypersphere-based Remote Sensing Cross-Modal Text-Image Retrieval via Curriculum Learning | [paper]
    - Weihang Zhang, Jihao Li, Shuoke Li, Jialiang Chen, Wenkai Zhang, Xin Gao, Xian Sun.
- [TGRS 2023] | Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval | [paper]
    - Xu Tang, Yijing Wang, Jingjing Ma, Xiangrong Zhang, F. Liu, Licheng Jiao.
- [ICMR 2023] | Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | [paper] [github]
    - Jiancheng Pan, Qing Ma, Cong Bai.
- [CDCEO 2022] | Knowledge-Aware Cross-Modal Text-Image Retrieval for Remote Sensing Images | [paper]
    - Li Mi, Siran Li, Christel Chappuis, D. Tuia.
- [IGARSS 2022] | A transformer-based cross-modal image-text retrieval method using feature decoupling and reconstruction | [paper]
    - Huan Zhang, Yingzhi Sun, Yu Liao, Siyuan Xu, R. Yang, Shuang Wang, B. Hou, Licheng Jiao.
- [INT J APPL EARTH OBS 2022] | MCRN: A Multi-source Cross-modal Retrieval Network for remote sensing | [paper]
    - Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Yongqiang Mao, Ruixue Zhou, Hongqi Wang, K. Fu, Xian Sun.
- [JSTARS 2022] | Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval | [paper]
    - Mohamad Mahmoud Al Rahhal, Y. Bazi, Norah A. Alsharif, Laila Bashmal, N. Alajlan, F. Melgani.
- [Applied Sciences 2022] | Contrasting Dual Transformer Architectures for Multi-Modal Remote Sensing Image Retrieval | [paper]
    - Mohamad Mahmoud Al Rahhal, M. Bencherif, Y. Bazi, Abdullah Alharbi, M. L. Mekhalfi.
- [TGRS 2022] | Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information | [paper] [github]
    - Zhiqiang Yuan, Wenkai Zhang, Changyuan Tian, Xuee Rong, Zhengyuan Zhang, Hongqi Wang, K. Fu, Xian Sun.
- [TGRS 2021] | A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing | [paper]
    - Zhiqiang Yuan, Wenkai Zhang, Xuee Rong, Xuan Li, Jialiang Chen, Hongqi Wang, K. Fu, Xian Sun.
- [TGRS 2021] | Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | [paper] [github]
    - Zhiqiang Yuan, Wenkai Zhang, K. Fu, Xuan Li, Chubo Deng, Hongqi Wang, Xian Sun.
- [JSTARS 2021] | A Deep Semantic Alignment Network for the Cross-Modal Image-Text Retrieval in Remote Sensing | [paper]
    - Qimin Cheng, Yuzhuo Zhou, Peng Fu, Yuan Xu, Liang Zhang.
- [LGRS 2021] | Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval | [paper]
    - Yafei Lv, Wei Xiong, Xiaohan Zhang, Yaqi Cui.
- [Remote Sensing 2020] | TextRS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images | [paper]
    - T. M. Ali, Y. Bazi, Mohamad Mahmoud Al Rahhal, M. L. Mekhalfi, Lalitha Rangarajan, M. Zuair.
- [JSTARS 2022] | Remote Sensing Cross-Modal Retrieval by Deep Image-Voice Hashing | [paper]
    - Yichao Zhang, Xiangtao Zheng, Xiaoqiang Lu.
- [ArXiv 2022] | Deep Unsupervised Contrastive Hashing for Large-Scale Cross-Modal Text-Image Retrieval in Remote Sensing | [paper]
    - Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir.
- [ICIP 2022] | An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing | [paper]
    - Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir.