GluonMM is a library of transformer models for computer vision and multi-modality research. It contains reference implementations of widely adopted baseline models and also research work from Amazon Research.
First, clone the repository locally,
git clone https://github.com/amazon-research/gluonmm.git
Then install dependencies,
conda create -n gluonmm python=3.7
conda activate gluonmm
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install timm tensorboardX yacs tqdm requests pandas decord scikit-image opencv-python
# Install apex for half-precision training (optional)
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
We have extensively tested the usage with PyTorch 1.8.1 and torchvision 0.9.1 with CUDA 10.2.
For detailed usage, please refer to the README file in each model family. For example, the training, evaluation and model zoo information of video transformer VidTr can be found at here.
See CONTRIBUTING for more information.
This project is licensed under the Apache-2.0 License.
Parts of the code are heavily derived from pytorch-image-models, DeiT, Swin-transformer, vit-pytorch and vision_transformer.