Foundation Model for Endoscopy Video Analysis

This repository provides the official PyTorch implementation of the paper Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train by Zhao Wang*, Chang Liu*, Shaoting Zhang†, and Qi Dou†.

Key Features

First foundation model for endoscopy video analysis.
A large-scale endoscopic video dataset with over 33K video clips.
Supports 3 types of downstream tasks: classification, segmentation, and detection.

Links

Details

Foundation models have exhibited remarkable success in various applications, such as disease diagnosis and text report generation. To date, a foundation model for endoscopic video analysis is still lacking. In this paper, we propose Endo-FM, a foundation model specifically developed using massive endoscopic video data. First, we build a video transformer, which captures both local and global long-range dependencies across spatial and temporal dimensions. Second, we pre-train our transformer model using global and local views in a self-supervised manner, aiming to make it robust to spatial-temporal variations and discriminative across different scenes. To develop the foundation model, we construct a large-scale endoscopy video dataset by combining 9 publicly available datasets and a privately collected dataset from Baoshan Branch of Renji Hospital in Shanghai, China. Overall, our dataset consists of over 33K video clips with up to 5 million frames, encompassing various protocols, target organs, and disease types. Our pre-trained Endo-FM can be easily adopted for a given downstream task via fine-tuning by serving as the backbone. With experiments on 3 different types of downstream tasks, including classification, segmentation, and detection, our Endo-FM surpasses the current state-of-the-art self-supervised pre-training and adapter-based transfer learning methods by a significant margin.
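
The idea of global and local views can be illustrated with a small frame-index sampler. This is only a sketch: the number of views, the frames per view, the window lengths, and the names `sample_view` and `sample_views` are assumptions chosen for illustration, not the settings used to train Endo-FM, and spatial cropping is omitted.

```python
import random


def sample_view(num_frames, view_len, span, rng):
    """Pick `view_len` evenly spaced frame indices from a random window of `span` frames."""
    span = min(span, num_frames)
    start = rng.randint(0, num_frames - span)
    if view_len == 1:
        return [start]
    step = (span - 1) / (view_len - 1)
    return [start + round(i * step) for i in range(view_len)]


def sample_views(num_frames, seed=None):
    """Return (global_views, local_views) as lists of frame-index lists.

    Hypothetical settings: 2 global views of 8 frames spanning the whole clip,
    and 4 local views of 4 frames drawn from a quarter-length window.
    """
    rng = random.Random(seed)
    global_views = [sample_view(num_frames, 8, num_frames, rng) for _ in range(2)]
    local_span = max(4, num_frames // 4)
    local_views = [sample_view(num_frames, 4, local_span, rng) for _ in range(4)]
    return global_views, local_views


if __name__ == "__main__":
    global_views, local_views = sample_views(num_frames=64, seed=0)
    print("global:", global_views)
    print("local:", local_views)
```

In a data pipeline, each index list would select frames from a decoded clip before any spatial cropping and resizing is applied.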

Datasets

We use 6 public datasets and 1 private dataset for pre-training, and 3 datasets for the downstream tasks. Except for SUN & SUN-SEG, we provide our preprocessed data for both pre-training and the downstream tasks.

Pre-training Data (6 public + 1 private)

Colonoscopic [original paper] [original dataset] [our preprocessed dataset]
SUN & SUN-SEG [original paper1] [original paper2] [original dataset1] [original dataset2]
LPPolypVideo [original paper] [original dataset] [our preprocessed dataset]
Hyper-Kvasir [original paper] [original dataset] [our preprocessed dataset]
Kvasir-Capsule [original paper] [original dataset] [our preprocessed dataset]
CholecTriplet [original paper] [original dataset] [our preprocessed dataset]
Our Private [our preprocessed dataset]

Downstream Data (3 public)

PolypDiag [original paper] [original dataset] [our preprocessed dataset]
CVC-12k [original paper] [original dataset] [our preprocessed dataset]
KUMC [original paper] [original dataset] [our preprocessed dataset]

For SUN & SUN-SEG, you first need to request the original videos by following this instruction. Then, convert the data into pre-training videos as follows:

cd Endo-FM/data
python sun.py
python sun_seg.py
python trans_videos_pretrain.py

Finally, generate the video list pretrain/train.csv for pre-training as follows:

cd Endo-FM/data
python gencsv.py
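
For a custom video folder, a list file can be produced along the following lines. This is a minimal sketch, not the repository's gencsv.py: the function name `write_video_list`, the `.mp4` extension filter, and the one-row-per-video layout with a placeholder label of `-1` are all assumptions. Check gencsv.py for the layout the training script actually expects.

```python
import csv
from pathlib import Path


def write_video_list(video_dir, csv_path, extensions=(".mp4",)):
    """Write one row per video file found under `video_dir` and return the row count."""
    video_dir = Path(video_dir)
    videos = sorted(
        p for p in video_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        for p in videos:
            # Path relative to the video root plus a placeholder label (assumption).
            writer.writerow([p.relative_to(video_dir).as_posix(), -1])
    return len(videos)
```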

Get Started

Main Requirements

torch==1.8.0
torchvision==0.9.0
pillow==6.2.2
timm==0.4.12
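
A quick way to confirm that an environment matches these pins is to compare them with the installed versions. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+); the helper name `check_pins` is an assumption, and local build tags such as `+cu111` are ignored for the comparison.

```python
from importlib import metadata

PINS = {"torch": "1.8.0", "torchvision": "0.9.0", "pillow": "6.2.2", "timm": "0.4.12"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that does not match its pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop local build tags such as "+cu111" before comparing.
            found = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.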

Installation

We suggest using Anaconda to set up the environment on Linux. If Anaconda is already installed, you can skip this step.

wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh && zsh Anaconda3-2020.11-Linux-x86_64.sh

Then, install the packages using the provided environment.yaml.

cd Endo-FM
conda env create -f environment.yaml
conda activate endofm

Pre-trained Weights

You can directly download our pre-trained Endo-FM via this link and put it under checkpoints/ for downstream fine-tuning.
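
Before loading a downloaded checkpoint into a custom backbone, the parameter names often need light cleanup. The sketch below works on plain dictionaries and rests on assumptions: the wrapper key `teacher` and the `module.` / `backbone.` prefixes are conventions common to DINO-style checkpoints, not something this README confirms for the Endo-FM file, so inspect the checkpoint's keys first. The function name `clean_state_dict` is hypothetical.

```python
def clean_state_dict(checkpoint, wrapper_key="teacher", prefixes=("module.", "backbone.")):
    """Unwrap an optional top-level key and strip known prefixes from parameter names."""
    # Fall back to the checkpoint itself when it is already a flat state dict.
    state = checkpoint.get(wrapper_key, checkpoint)
    cleaned = {}
    for name, value in state.items():
        for prefix in prefixes:
            if name.startswith(prefix):
                name = name[len(prefix):]
        cleaned[name] = value
    return cleaned
```

With PyTorch installed, the input would typically come from `torch.load(path, map_location="cpu")`, and the result would be passed to `model.load_state_dict(cleaned, strict=False)`, after which the reported missing and unexpected keys are worth checking.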

Downstream Fine-tuned Weights

We also provide the fine-tuned weights for the 3 downstream tasks for direct downstream testing.

Dataset	PolypDiag	CVC-12k	KUMC
Our Paper	90.7	73.9	84.1
Released Model	91.5	76.6	84.0
Weights	link	link	link
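
The gap between the released weights and the numbers reported in the paper can be read off the table. The short computation below makes the differences explicit, using only the values shown above.

```python
paper = {"PolypDiag": 90.7, "CVC-12k": 73.9, "KUMC": 84.1}
released = {"PolypDiag": 91.5, "CVC-12k": 76.6, "KUMC": 84.0}

# Released minus paper, rounded to the table's one-decimal precision.
deltas = {name: round(released[name] - paper[name], 1) for name in paper}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}")
```

The released models are slightly ahead on PolypDiag and CVC-12k and 0.1 behind on KUMC, so small deviations from the paper's numbers are expected when testing with these weights.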

Pre-training

cd Endo-FM
wget -P checkpoints/ https://github.com/kahnchana/svt/releases/download/v1.0/kinetics400_vitb_ssl.pth
bash scripts/train_clips32k.sh

Downstream Fine-tuning

# PolypDiag (Classification)
cd Endo-FM
bash scripts/eval_finetune_polypdiag.sh

# CVC (Segmentation)
cd Endo-FM/TransUNet
python train.py

# KUMC (Detection)
cd Endo-FM/STFT
python setup.py build develop
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    tools/train_net.py \
    --master_port=$((RANDOM + 10000)) \
    --config-file configs/STFT/kumc_R_50_STFT.yaml \
    OUTPUT_DIR log_dir/kumc_finetune
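
The `--master_port=$((RANDOM + 10000))` argument picks a pseudo-random port so that concurrent jobs on one machine are unlikely to collide. Because Bash's `RANDOM` yields values from 0 to 32767, the port falls between 10000 and 42767, and it can still happen to be in use. As an alternative sketch (not part of the repository's scripts; `find_free_port` is a hypothetical helper), the operating system can be asked for a currently free port:

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The socket is closed before the launcher binds the port, so a small race window remains; for single-machine runs this is usually acceptable.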

Direct Downstream Testing

# PolypDiag (Classification)
cd Endo-FM
bash scripts/test_finetune_polypdiag.sh

# CVC (Segmentation)
cd Endo-FM/TransUNet
python train.py --test

# KUMC (Detection)
cd Endo-FM/STFT
python setup.py build develop
python -m torch.distributed.launch \
    --nproc_per_node=1 \
    tools/test_net.py \
    --master_port=$((RANDOM + 10000)) \
    --config-file configs/STFT/kumc_R_50_STFT.yaml \
    MODEL.WEIGHT kumc.pth \
    OUTPUT_DIR log_dir/kumc_finetune

🙋‍♀️ Feedback and Contact

For further questions, please feel free to contact Zhao Wang.

🛡️ License

This project is licensed under the Apache License 2.0. See LICENSE for details.

🙏 Acknowledgement

Our code is based on DINO, TimeSformer, SVT, TransUNet, and STFT. We thank the authors for releasing their code.

📝 Citation

If you find this code useful, please cite our paper in your research.

@inproceedings{
    wang2023foundation,
    title={Foundation Model for Endoscopy Video Analysis via Large-scale Self-supervised Pre-train},
    author={Zhao Wang and Chang Liu and Shaoting Zhang and Qi Dou},
    booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
    pages={101--111},
    year={2023},
    organization={Springer}
}
