DM-CLIP

DM-CLIP: Knowledge Distillation from Transformer to Mamba for Efficient CLIP

Abstract

This study addresses the challenge of deploying Contrastive Learning-based CLIP models, which learn the relationship between images and text, in resource-constrained environments due to their high computational complexity and large model size. To overcome this, we propose an approach that enhances the performance of Mamba-based image encoders by applying Knowledge Distillation from Transformer-based ViT models. Experimental results show that the Mamba-based encoder reduces image encoder latency by 49.58% and overall model latency by 40.82%, with only a 0.12% performance loss. Additionally, it demonstrates 6.6% and 19.4% improvements on the SVHN and EuroSAT datasets, respectively, showcasing strengths in sequential pattern processing and high-resolution spatial information learning. This study validates that the lightweight CLIP encoder can be effectively utilized in mobile and edge device environments and suggests future research directions for developing Mamba-based text encoders and enhancing knowledge distillation techniques.
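
As a concrete illustration of the distillation objective described above, the sketch below shows one common way to align a Mamba-based image encoder (student) with a frozen Transformer-based ViT image encoder (teacher) in PyTorch. The function name, loss weighting, and relational term are illustrative assumptions rather than the repository's actual implementation; only the image encoder is replaced here, with the CLIP text encoder kept unchanged, consistent with the abstract's note that a Mamba-based text encoder is left to future work.

import torch
import torch.nn.functional as F

def distillation_loss(teacher_vit, student_mamba, images, temperature=0.07, alpha=0.5):
    # Illustrative embedding-level distillation: teacher_vit and student_mamba are
    # placeholder encoders mapping a batch of images to embeddings of the same dimension.
    with torch.no_grad():  # the ViT teacher stays frozen
        t_emb = F.normalize(teacher_vit(images), dim=-1)
    s_emb = F.normalize(student_mamba(images), dim=-1)

    # (1) direct embedding matching: cosine distance between unit-norm vectors
    feat_loss = (1.0 - (s_emb * t_emb).sum(dim=-1)).mean()

    # (2) relational term: match the teacher's image-to-image similarity structure
    t_sim = F.softmax(t_emb @ t_emb.t() / temperature, dim=-1)
    s_sim = F.log_softmax(s_emb @ s_emb.t() / temperature, dim=-1)
    rel_loss = F.kl_div(s_sim, t_sim, reduction="batchmean")

    return alpha * feat_loss + (1.0 - alpha) * rel_loss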

Setup

# Create a conda environment and install the Python dependencies
conda create -n clipenv python=3.10
conda activate clipenv
pip install -r requirements.txt

# Install MambaVision from source as an editable package
git clone https://github.com/NVlabs/MambaVision.git
cd MambaVision
pip install -e .
cd ..

# Download ImageNet via the provided script
bash download_imagenet.sh
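
Once the editable install finishes, a quick sanity check is to load a pretrained MambaVision backbone and push a dummy image through it. The snippet below is only a sketch: it assumes the transformers package is available and uses the publicly released nvidia/MambaVision-T-1K weights from the Hugging Face Hub at 224x224 resolution, which may differ from the backbone this repository actually trains; the Mamba kernels generally expect a CUDA device.

import torch
from transformers import AutoModel

# Load a pretrained MambaVision backbone from the Hugging Face Hub
# (the model id and 224x224 input resolution are assumptions for this check).
model = AutoModel.from_pretrained("nvidia/MambaVision-T-1K", trust_remote_code=True)
model = model.cuda().eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224, device="cuda")
    _ = model(dummy)  # forward pass only; the output structure depends on the remote model code

print("Parameters:", sum(p.numel() for p in model.parameters()))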

Run

# Launch distillation training (DataCompDR-12M configuration, per the script name)
bash run_datacompdr12m.sh

About

Paper presented in the poster session of the Undergraduate Paper Competition at the 2024 IEIE (Institute of Electronics and Information Engineers) Fall Conference.
