| field | value |
|---|---|
| title | Adapting the Attention of Cloud-Based Recognition Model to Client-Side Images without Local Re-Training |
| booktitle | Proceedings of the 16th Asian Conference on Machine Learning |
| year | 2025 |
| volume | 260 |
| series | Proceedings of Machine Learning Research |
| month | 0 |
| publisher | PMLR |
| openreview | i5YqAtOGiD |
| abstract | The mainstream workflow of image recognition applications is to first train one global model on the cloud for a wide range of classes and then serve numerous clients. The images uploaded by each client typically come from a small subset of those classes. Given this cloud-client discrepancy in the range of image classes, the recognition model should be strongly adaptive, intuitively by focusing on each client's local, dynamic class subset, while incurring negligible overhead. In this work, we propose to plug a new intra-client and inter-image attention (ICIIA) module into existing backbone recognition models, requiring only one-time cloud-based training to become client-adaptive. In particular, given an image to be recognized from a certain client, ICIIA introduces multi-head self-attention to retrieve relevant images from the client's local images, thereby calibrating the focus and the recognition result. We further identify linear projection as the bottleneck of ICIIA's overhead, propose to group and shuffle the features before the projections, and show that increasing the number of feature groups dramatically improves efficiency without sacrificing much accuracy. We extensively evaluate ICIIA and compare its performance against several baselines, demonstrating its effectiveness and efficiency. Specifically, for a partitioned version of ImageNet-1K with the backbone models of MobileNetV3-L and Swin-B, ICIIA improves the classification accuracy to 83.37% (+8.11%) and 88.86% (+5.28%), while adding only 1.62% and 0.02% of FLOPs, respectively. Source code is available in the supplementary materials. |
| layout | inproceedings |
| issn | 2640-3498 |
| id | tan25a |
| tex_title | Adapting the Attention of Cloud-Based Recognition Model to Client-Side Images without Local Re-Training |
| firstpage | 223 |
| lastpage | 238 |
| page | 223-238 |
| order | 223 |
| cycles | false |
| bibtex_editor | Nguyen, Vu and Lin, Hsuan-Tien |
| bibtex_author | Tan, Yangwenjian and Yan, Yikai and Niu, Chaoyue |
| date | 2025-01-14 |
| container-title | Proceedings of the 16th Asian Conference on Machine Learning |
| genre | inproceedings |
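The abstract's efficiency trick, grouping the features before the linear projections and shuffling channels across groups so that later layers still mix information between groups, can be sketched as below. This is an illustrative reconstruction under assumed shapes, not the paper's actual implementation; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def grouped_shuffled_projection(x, weights):
    """Grouped linear projection with channel shuffle (illustrative sketch).

    x: (d,) feature vector; weights: list of G small (d/G, d/G) matrices.
    A full d x d projection costs d^2 multiplies; splitting into G groups
    costs G * (d/G)^2 = d^2 / G, which is the FLOP saving the abstract
    alludes to when increasing the number of feature groups.
    """
    G = len(weights)
    d = x.shape[0]
    assert d % G == 0
    gd = d // G
    # apply each group's smaller projection to its slice of the features
    parts = [x[g * gd:(g + 1) * gd] @ weights[g] for g in range(G)]
    y = np.stack(parts)        # (G, gd): one row per group
    # channel shuffle: interleave channels across groups before flattening,
    # so a subsequent grouped layer sees features from every group
    return y.T.reshape(-1)     # (d,)

d, G = 8, 4
x = rng.normal(size=d)
W = [rng.normal(size=(d // G, d // G)) for _ in range(G)]
out = grouped_shuffled_projection(x, W)
print(out.shape)  # (8,)
```

The shuffle is the transpose-and-flatten step: output positions 0..G-1 hold the first channel of each group's projection, which keeps the grouped projections from isolating information within a group.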