Analyzing Visual Attention Mechanisms for Handwritten Digit Classification



Visual Attention Analysis with Spatial Transformer Networks for Handwritten Digit Classification on MNIST

Getting Started

Step 1: Clone this repository and change directory to the repository root

git clone https://github.com/biswassanket/STN_FGC.git
cd STN_FGC

Step 2: Create a conda environment for this project and install the required dependencies.

  • To create the conda environment: conda env create -f environment.yml

Step 3: Activate the conda environment

conda activate stn_fgc

Step 4: Train STN models on MNIST

  • To run the base STN with standard Conv layers:
$ python main.py --stn
  • To run the STN with CoordConv layers (a minimal sketch of both ideas follows this list):
$ python main.py --stncoordconv --localization
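For context, here is a minimal sketch of what a spatial transformer block typically looks like in PyTorch, following the standard STN recipe: a small localization CNN regresses an affine transform, which is then used to resample the input before classification. The layer sizes are illustrative and may not match main.py exactly; the add_coord_channels helper is a hypothetical name used only to illustrate the CoordConv idea of appending normalized coordinate channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def add_coord_channels(x):
    # CoordConv idea: append normalized x/y coordinate channels so convolutions
    # can condition on absolute position (hypothetical helper, not from this repo).
    b, _, h, w = x.shape
    ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
    xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
    return torch.cat([x, xs, ys], dim=1)

class STN(nn.Module):
    # Spatial transformer: predict an affine warp of the input, then resample it.
    def __init__(self):
        super().__init__()
        # Localization network: a small CNN that regresses 6 affine parameters.
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 6),
        )
        # Initialize the regression head to the identity transform for stable training.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        # x: (B, 1, 28, 28) MNIST images.
        theta = self.fc_loc(self.localization(x).view(-1, 10 * 3 * 3)).view(-1, 2, 3)
        # Build a sampling grid from the predicted affine matrix and warp the input.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

The warped output is then passed to the usual classification CNN; in the CoordConv variants, the convolutions (in the localization network and/or the classifier) would see coordinate channels appended as sketched above.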

Step 5: Train the ViT model on MNIST

$ python main.py --vit
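Below is a minimal sketch of a patch-based Vision Transformer classifier for 28x28 MNIST images, to show what this flag trains in outline. The class name TinyViT and all hyperparameters (patch size 7, embedding width 64, 4 encoder layers) are illustrative assumptions, not necessarily the configuration used in main.py.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    # Patch-embedding Vision Transformer classifier for 28x28 grayscale digits.
    def __init__(self, patch=7, dim=64, depth=4, heads=4, num_classes=10):
        super().__init__()
        num_patches = (28 // patch) ** 2  # 16 patches of 7x7
        # Non-overlapping patch embedding implemented as a strided convolution.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # (B, 1, 28, 28) -> (B, dim, 4, 4) -> (B, 16, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        # Classify from the [CLS] token.
        return self.head(tokens[:, 0])
```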

Step 6: For a detailed analysis of the visual attention models we experimented with, see the complete report

Results

| Model variant | Accuracy | Best epoch |
| --- | --- | --- |
| Simple Conv | 0.9879 | 48 |
| Simple STN+Conv | 0.9889 | 44 |
| Simple STN+CoordConv | 0.9850 | 43 |
| Simple STN+CoordConv+localization | 0.9910 | 47 |
| Simple STN+CoordConv+localization+r-channel | 0.9868 | 40 |
| Vision Transformers | 0.9844 | 49 |

Authors

Conclusion

We enjoyed playing with these models. Stay tuned: more implementations of visual attention models for fine-grained image classification tasks are coming soon. Thank you, and apologies for the bugs, as usual.