Visual Attention Analysis with Spatial Transformer Networks for Handwritten Digit Classification on MNIST
- To clone the repository:
git clone https://github.com/biswassanket/STN_FGC.git
cd STN_FGC
- To create the conda environment:
conda env create -f environment.yml
conda activate stn_fgc
- To run the base STN with standard Conv layers:
$ python main.py --stn
- To run the STN with CoordConv layers (a minimal sketch of the STN/CoordConv idea follows after these commands):
$ python main.py --stncoordconv --localization
- To run the Vision Transformer (ViT) variant:
$ python main.py --vit
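The STN variants above warp the input with a learned affine transform before classification, and the CoordConv variants concatenate normalized coordinate channels to the convolution inputs. Below is a minimal, illustrative PyTorch sketch of both ideas; the module names (`CoordConv2d`, `STN`), the layer sizes, and the exact way `--localization` wires CoordConv into the localization network are assumptions for illustration, not the repo's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordConv2d(nn.Module):
    """Conv2d that appends normalized x/y coordinate channels to its input (illustrative)."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

class STN(nn.Module):
    """Spatial transformer: a localization net predicts an affine warp applied to the input."""
    def __init__(self, coordconv=False):
        super().__init__()
        conv = CoordConv2d if coordconv else nn.Conv2d
        self.loc = nn.Sequential(
            conv(1, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(True),
            conv(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(True),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 6)
        )
        # Initialize the affine parameters to the identity transform ("no warp").
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):  # x: (N, 1, 28, 28) MNIST digits
        theta = self.fc_loc(self.loc(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

The identity initialization of the final localization layer is the usual trick so that, at the start of training, the transformer passes the input through unchanged and the warp is learned gradually.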
Step 6: For a detailed analysis of the visual attention models we experimented with, here is the complete report:
| Model Variant | Accuracy | Best Epoch |
|---|---|---|
| Simple Conv | 0.9879 | 48 |
| Simple STN+Conv | 0.9889 | 44 |
| Simple STN+CoordConv | 0.9850 | 43 |
| Simple STN+CoordConv+localization | 0.9910 | 47 |
| Simple STN+CoordConv+localization+r-channel | 0.9868 | 40 |
| Vision Transformers | 0.9844 | 49 |
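For reference, the accuracy column can be reproduced by evaluating a trained model on the MNIST test set. Here is a minimal evaluation sketch, assuming PyTorch/torchvision, that accuracy means test-set accuracy, and a model whose forward pass returns class logits; this is not the repo's exact evaluation code.

```python
import torch
from torchvision import datasets, transforms

def test_accuracy(model, device="cpu"):
    """Fraction of correctly classified MNIST test digits (as reported in the table above)."""
    loader = torch.utils.data.DataLoader(
        datasets.MNIST("data", train=False, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,)),
                       ])),
        batch_size=1000)
    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == labels.to(device)).sum().item()
    return correct / len(loader.dataset)
```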
I enjoyed playing with these models. Stay tuned: more implementations of visual attention models on fine-grained image classification tasks are coming soon. Thank you, and sorry for the bugs, as usual.