2025-01-14-wang25d.md

title: Prompting vision-language fusion for Zero-Shot Composed Image Retrieval
booktitle: Proceedings of the 16th Asian Conference on Machine Learning
year: 2025
volume: 260
series: Proceedings of Machine Learning Research
month: 0
publisher: PMLR
pdf:
url:
openreview: 8NM3qmWjkt
abstract: Composed image retrieval (CIR) aims to retrieve a target image given a query that combines a reference image and a textual description. Recently, benefiting from vision-language pretrained (VLP) models and large language models (LLMs), textual inversion and the generation of large-scale datasets have become novel approaches to the zero-shot CIR task (ZS-CIR). However, existing ZS-CIR models overlook the case where the textual description is too brief or inherently inaccurate, which is common and makes it challenging to integrate the reference image into the query effectively when retrieving the target image. To address this problem, we propose a simple yet effective method, prompting vision-language fusion (PVLF), which adapts representations in VLP models to dynamically fuse the vision and language (V&L) representation spaces. In addition, by injecting learnable context prompt tokens into the Transformer fusion encoder, PVLF promotes comprehensive coupling between the V&L modalities, enriching the semantic representation of the query. We evaluate the effectiveness and robustness of our method on various VLP backbones, and the experimental results show that the proposed PVLF outperforms previous methods and achieves state-of-the-art results on two public ZS-CIR benchmarks (CIRR and FashionIQ).
layout: inproceedings
issn: 2640-3498
id: wang25d
tex_title: Prompting vision-language fusion for Zero-Shot Composed Image Retrieval
firstpage: 671
lastpage: 686
page: 671-686
order: 671
cycles: false
bibtex_editor: Nguyen, Vu and Lin, Hsuan-Tien
editor:
- given: Vu
  family: Nguyen
- given: Hsuan-Tien
  family: Lin
bibtex_author: Wang, Peng and Chen, Zining and Zhao, Zhicheng and Su, Fei
author:
- given: Peng
  family: Wang
- given: Zining
  family: Chen
- given: Zhicheng
  family: Zhao
- given: Fei
  family: Su
date: 2025-01-14
address:
container-title: Proceedings of the 16th Asian Conference on Machine Learning
genre: inproceedings
issued:
  date-parts:
  - 2025
  - 1
  - 14
extras: