BoostFormer

📰 Contributors

Team CV-16 💡 비전길잡이 (Vision Guide) 💡
NAVER Connect Foundation boostcamp AI Tech 4th

민기	박민지	유영준	장지훈	최동혁

📰 Links

📰 Objective

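SegFormer: make a Transformer-based semantic segmentation model lightweight enough for embedded and mobile devices
Model-driven approach: no advanced training techniques such as hyperparameter tuning; performance gains come from modeling alone
No compression methods such as pruning or quantization: the model is made lighter through structural changes, i.e. redesigned blocks and layers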

📰 Dataset

	Tiny-ImageNet	ADE20K
Purpose	Pre-training	Fine-tuning
Number of classes	200	150
Training set	100,000 images	20,210 images
Validation set	10,000 images	2,000 images

|-- ADEChallengeData2016
|   |-- image
|   |   |-- train
|   |   `-- val
|   `-- mask
|       |-- train
|       `-- val
`-- tiny-imagenet-200
    |-- train
    `-- val
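A small, hypothetical helper for checking that a dataset root matches the layout above before training. The folder names are copied from the tree; the "tiny" check mirrors the pre-training note below that the Tiny-ImageNet path name must contain "tiny". Neither function is part of the repository, and how the real scripts check the path is not stated here.

```python
from pathlib import Path

# Sub-directories expected under the dataset root (copied from the tree above).
EXPECTED_DIRS = [
    "ADEChallengeData2016/image/train",
    "ADEChallengeData2016/image/val",
    "ADEChallengeData2016/mask/train",
    "ADEChallengeData2016/mask/val",
    "tiny-imagenet-200/train",
    "tiny-imagenet-200/val",
]


def missing_dataset_dirs(root):
    """Hypothetical helper: return the expected sub-directories missing under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


def is_tiny_imagenet_path(path):
    """Hypothetical helper: True if "tiny" appears anywhere in the path string."""
    return "tiny" in str(path)
```

For a complete layout, `missing_dataset_dirs("/workspace/dataset")` returns an empty list.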

📰 Base Model

Encoder	Decoder
Overlap Patch Embedding	MLP Layer (upsampling)
SegFormer Block	Concat
Efficient Self-Attention	Linear-Fuse
Mix-FFN	Classifier

📰 BoostFormer(Ours)

Encoder	Decoder
Pooling Patch Embedding	MLP Layer (upsampling)
PoolFormer Block	Weighted Sum
SegFormerV2 Block	Classifier
Custom Efficient Self-Attention	-
Mix-CFN	-

📰 Strategy

Compare SegFormer-B2 with our custom model and measure Params and FLOPs (util/get_flops_params.py)
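To show where per-layer Params/FLOPs numbers come from, here is a minimal pure-Python sketch of the arithmetic for one 2D convolution. `conv2d_cost` is a hypothetical helper, not util/get_flops_params.py. It counts multiply-accumulates (MACs), which many counters report as "FLOPs"; whether the repository's script does so is not stated here.

```python
def conv2d_cost(c_in, c_out, k, h_out, w_out, groups=1, bias=True):
    """Params and MACs of a single k x k 2D convolution.

    Multiply the MACs by 2 if your tool counts a multiply and an add
    as two separate floating-point operations.
    """
    weights = (c_in // groups) * c_out * k * k   # weights per filter * filters
    params = weights + (c_out if bias else 0)
    macs = weights * h_out * w_out               # each output pixel reuses all weights
    return params, macs


# Assumed example: a SegFormer-B2-style stage-1 embedding (3 -> 64 channels)
# on a 512x512 input, i.e. a 128x128 output map.
print(conv2d_cost(3, 64, 7, 128, 128))  # 7x7 conv:  (9472, 154140672)
print(conv2d_cost(3, 64, 1, 128, 128))  # 1x1 conv:  (256, 3145728)
```

Pooling layers have no parameters, so swapping a 7x7 conv for pooling + 1x1 conv removes most of that layer's weights and MACs.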

📰 Method

1. Patch Embedding

Replace the NxN Conv with Pooling + 1x1 Conv
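A minimal NumPy sketch of a pooling-based patch embedding. It assumes average pooling with the same kernel/stride/padding as SegFormer's stage-1 overlapping conv (7/4/3); the pooling type and sizes actually used in `boostformer` are not stated here, and `pooling_patch_embed` is a hypothetical name. Pooling keeps the channel count, so the 1x1 conv alone maps C_in to C_out.

```python
import numpy as np


def avg_pool2d(x, k, stride, pad):
    """Average pooling on a (C, H, W) array; zero padding counts toward the mean."""
    c, h, w = x.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    out = np.empty((c, h_out, w_out), dtype=float)
    for i in range(h_out):
        for j in range(w_out):
            win = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = win.mean(axis=(1, 2))
    return out


def pooling_patch_embed(x, weight, bias, k=7, stride=4, pad=3):
    """Hypothetical Pooling + 1x1 Conv patch embedding.

    x: (C_in, H, W); weight: (C_out, C_in); bias: (C_out,).
    """
    pooled = avg_pool2d(x, k, stride, pad)  # spatial downsampling, no parameters
    # A 1x1 conv is a per-pixel linear map over the channel axis.
    return np.einsum("oc,chw->ohw", weight, pooled) + bias[:, None, None]
```

With the defaults, a (3, 64, 64) input becomes a (C_out, 16, 16) token map, the same 4x downsampling as the 7x7 stride-4 conv it replaces.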

2. Transformer Block

Token mixer: extract features with Pooling instead of MHSA (PoolFormer Block)
- $\hat {F_0}=\mathrm {LayerScale}(\mathrm {Pooling}(F_{in}))+F_{in}$
- $\hat {F_1}=\mathrm {LayerScale}(\mathrm {MixCFN}(\hat {F_0}))+\hat {F_0}$
Remove the original Self Output module (SegFormerV2 Block)
- $\hat {F_0}=\mathrm {CSA}(F_{in})+F_{in}$
- $\hat {F_1}=\mathrm {MixCFN}(\hat {F_0})+\hat {F_0}$
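The pooling-token-mixer equations can be sketched in NumPy as follows. Assumptions: 3x3 average pooling with stride 1 that ignores padded pixels, LayerScale as a per-channel scale vector, and `mix_cfn` passed in as any callable; all names are hypothetical. Normalization layers are omitted because the equations above omit them (PoolFormer's reference token mixer uses Pooling(x) - x; this sketch follows the equation as written).

```python
import numpy as np


def pool3x3(x):
    """3x3 average pooling, stride 1, same output size.

    Border windows average only the valid pixels (no zero-padding bias).
    """
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    ones = np.pad(np.ones((h, w)), 1)
    total = np.zeros_like(x, dtype=float)
    count = np.zeros((h, w))
    for di in range(3):
        for dj in range(3):
            total += xp[:, di:di + h, dj:dj + w]
            count += ones[di:di + h, dj:dj + w]
    return total / count


def layer_scale(x, gamma):
    """Per-channel learnable scaling; gamma has shape (C,)."""
    return gamma[:, None, None] * x


def pool_block(f_in, gamma0, gamma1, mix_cfn):
    """F0 = LayerScale(Pooling(F_in)) + F_in; F1 = LayerScale(MixCFN(F0)) + F0."""
    f0 = layer_scale(pool3x3(f_in), gamma0) + f_in
    return layer_scale(mix_cfn(f0), gamma1) + f0
```

Pooling has no weights, so the token mixer costs only the C LayerScale values per block instead of the Q/K/V projections of MHSA.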

3. Attention Layer

Reduce the spatial size (number of tokens) of K and V with Pooling
- $K, V=\mathrm {Pooling}(F_C)$
Remove the 1x1 Convolution
- $\mathrm {Attention}(Q,K,V)=\mathrm {Softmax}\left({{QK^T}\over {\sqrt {d_{head}}}}\right)V$
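A single-image NumPy sketch of multi-head attention whose K and V come from a pooled feature map, following the formula above. Assumptions: non-overlapping r x r average pooling (the ratio `r` is hypothetical), plain (C, C) projection matrices, and no output projection. With r = 2, K and V have 4x fewer tokens, so the attention matrix shrinks from N x N to N x N/4.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def split_heads(t, num_heads):
    """(tokens, C) -> (heads, tokens, C // heads)."""
    n, c = t.shape
    return t.reshape(n, num_heads, c // num_heads).transpose(1, 0, 2)


def pooled_attention(f, w_q, w_k, w_v, num_heads, r):
    """Multi-head self-attention with K and V taken from an r x r pooled map.

    f: (H, W, C) feature map; w_q, w_k, w_v: (C, C) projection matrices.
    H and W must be divisible by r.
    """
    h, w, c = f.shape
    d_head = c // num_heads
    q = f.reshape(h * w, c) @ w_q                                # (N, C)
    # Non-overlapping r x r average pooling: N tokens -> N / r^2 tokens.
    pooled = f.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))
    kv_in = pooled.reshape(-1, c)                                # (N / r^2, C)
    q = split_heads(q, num_heads)
    k = split_heads(kv_in @ w_k, num_heads)
    v = split_heads(kv_in @ w_v, num_heads)
    # Softmax(Q K^T / sqrt(d_head)) V, computed per head.
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))  # (heads, N, N / r^2)
    out = attn @ v                                               # (heads, N, d_head)
    return out.transpose(1, 0, 2).reshape(h, w, c)
```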

4. FFN

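Replace the original Linear (dense) embedding with a 1x1 Conv
- $\hat {F_C}=\mathrm {Conv}_{1 \times 1}(F_C)$
Instead of a single 3x3 DWConv, split the channels into two groups, apply a 3x3 DWConv to one and a 5x5 DWConv to the other, then Concat (Mix-CFN)
- $\hat {F_C}=\mathrm {Conv}_{1 \times 1}(\mathrm {Concat}(\hat {F_1},\hat {F_2}))$, where $\hat {F_1}$ and $\hat {F_2}$ are the outputs of the 3x3 and 5x5 DWConv branches
Add Batch Normalization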
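A NumPy sketch of the Mix-CFN data flow: 1x1 conv, channel split into two halves, a 3x3 depthwise conv on one half and a 5x5 on the other, concatenation, and a final 1x1 conv. The even split, "same" padding, and function names are assumptions; the activation and Batch Normalization are left out because their exact placement is not described above.

```python
import numpy as np


def depthwise_conv(x, kernels):
    """Depthwise 'same' conv (cross-correlation). x: (C, H, W); kernels: (C, k, k), k odd."""
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c, h, w))
    for di in range(k):
        for dj in range(k):
            # Each channel is filtered only by its own kernel.
            out += kernels[:, di, dj][:, None, None] * xp[:, di:di + h, dj:dj + w]
    return out


def conv1x1(x, weight):
    """1x1 conv as a channel-mixing matmul. weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def mix_cfn(x, w_in, k3, k5, w_out):
    """1x1 conv -> split -> 3x3 / 5x5 DWConv -> concat -> 1x1 conv."""
    hidden = conv1x1(x, w_in)                # (C_hidden, H, W)
    half = hidden.shape[0] // 2
    f1 = depthwise_conv(hidden[:half], k3)   # 3x3 branch, k3: (half, 3, 3)
    f2 = depthwise_conv(hidden[half:], k5)   # 5x5 branch, k5: (C_hidden - half, 5, 5)
    return conv1x1(np.concatenate([f1, f2], axis=0), w_out)
```

Splitting the channels lets half of them see a 5x5 receptive field while the depthwise cost stays at 9 + 25 weights per channel pair, instead of 25 per channel for a full 5x5 DWConv.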

5. Decode Head

Upsample the stage features
Apply a Weighted Sum (in place of the base model's Concat + Linear-Fuse)
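One possible reading of the upsample-then-weighted-sum head, as a NumPy sketch. Assumptions: each stage is first projected to a shared width by a per-pixel linear layer (the "MLP Layer" in the table), upsampled by nearest-neighbour repetition (SegFormer itself uses bilinear interpolation), and fused with one scalar weight per stage normalized by softmax. All names are hypothetical.

```python
import numpy as np


def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of (C, h, w) to (C, H, W); H, W must be multiples of h, w."""
    _, h, w = x.shape
    out_h, out_w = size
    return x.repeat(out_h // h, axis=1).repeat(out_w // w, axis=2)


def weighted_sum_head(stage_feats, proj_weights, stage_logits, cls_w):
    """Project -> upsample to stage-1 size -> softmax-weighted sum -> 1x1 classifier.

    stage_feats: list of (C_i, H_i, W_i); proj_weights: list of (D, C_i);
    stage_logits: (num_stages,) learnable scalars; cls_w: (num_classes, D).
    """
    target = stage_feats[0].shape[1:]
    projected = [
        upsample_nearest(np.einsum("dc,chw->dhw", p, f), target)
        for f, p in zip(stage_feats, proj_weights)
    ]
    alpha = np.exp(stage_logits - stage_logits.max())
    alpha = alpha / alpha.sum()                       # one weight per stage, sums to 1
    fused = sum(a * f for a, f in zip(alpha, projected))
    return np.einsum("kd,dhw->khw", cls_w, fused)     # (num_classes, H_1, W_1)
```

Compared with Concat + Linear-Fuse, which needs a (4D to D) fusion layer for four stages of width D, the weighted sum adds only four scalars.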

📰 Result

Model	Params	FLOPs	Val Acc (%)	Val mIoU (%)
SegFormer-B2	27.462M	58.576G	66.48	29.84
BoostFormer (Ours)	17.575M (-36.00%)	15.826G (-72.98%)	72.28 (+8.72%)	34.29 (+14.91%)

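Compared with SegFormer-B2: 36.00% fewer Params, 72.98% fewer FLOPs, and a 14.91% relative mIoU gain (29.84 → 34.29, i.e. +4.45 points). The percentages in parentheses in the table are relative changes.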
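A quick check that the percentages in the table are relative changes with respect to SegFormer-B2 (values copied from the table):

```python
# Values copied from the result table above.
baseline = {"params_M": 27.462, "flops_G": 58.576, "acc": 66.48, "miou": 29.84}
ours = {"params_M": 17.575, "flops_G": 15.826, "acc": 72.28, "miou": 34.29}


def relative_change(new, old):
    """Relative change in percent, rounded to two decimals."""
    return round((new - old) / old * 100, 2)


changes = {k: relative_change(ours[k], baseline[k]) for k in baseline}
miou_points = round(ours["miou"] - baseline["miou"], 2)
print(changes)      # {'params_M': -36.0, 'flops_G': -72.98, 'acc': 8.72, 'miou': 14.91}
print(miou_points)  # 4.45
```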

📰 Qualitative results on ADE20K

📰 Mobile Inference Time Comparison

📰 NVIDIA Jetson Nano Time Comparison

⚙️ Installation

git clone https://github.com/boostcampaitech4lv23cv3/final-project-level3-cv-16.git

🧰 How to Use

Pretraining (tiny_imagenet)

# {tiny_imagenet path}: the path name must contain "tiny"
# --batch-size is the batch size per GPU (default=128)
bash dist_train.sh {number of GPUs} \
    --data-path {tiny_imagenet path} \
    --output_dir {save dir path} \
    --batch-size {batch size per GPU}

# example
bash dist_train.sh 4 \
    --data-path /workspace/dataset/tiny_imagenet \
    --output_dir result/mod_segformer/ \
    --batch-size 64

ADE20K fine-tuning

# Current directory: /final-project-level3-cv-16
# --device: adjust to your environment
# --pretrain: a .pth file (output of pre-training) or a directory (in the format provided by the Hugging Face Model Hub)
# --batch_size: default=16
python train.py \
    --data_dir {path to ADE20K} \
    --device 0,1,2,3 \
    --save_path {path to the save directory} \
    --pretrain {path to the pretrained model dir or .pth file} \
    --batch_size {batch size}

Run evaluation

# Choose the val or test set via `phase`
# Before running, edit the model-definition code in eval.py
python eval.py \
    --data_dir {path to ADE20K} \
    --pretrain {path to the pretrained model dir}

Check Params and FLOPs

# Before running, edit the model-definition code in get_flops_params.py
python util/get_flops_params.py \
    --data_dir {path to ADE20K}

📰 Directory Structure

|-- 🗂 appendix          : presentation slides and WrapUpReport
|-- 🗂 segformer         : Hugging Face-based SegFormer model code
|-- 🗂 boostformer       : lightweight SegFormer (BoostFormer) model code
|-- 🗂 imagenet_pretrain : code used to pre-train the encoder on Tiny-ImageNet
|-- 🗂 util              : collection of tool scripts
|-- Dockerfile
|-- train.py             : ADE20K fine-tuning code
|-- eval.py              : prints model inference results
|-- requirements.txt
`-- README.md
