Analyzing Visual Attention Mechanisms for Handwritten Digit Classification



Visual Attention Analysis with Spatial Transformer Networks for Handwritten Digit Classification on MNIST

Getting Started

Step 1: Clone this repository and change directory to the repository root

git clone https://github.com/biswassanket/STN_FGC.git
cd STN_FGC

Step 2: Create a conda environment for this project and install the required dependencies.

  • To create the conda environment: conda env create -f environment.yml

Step 3: Activate the conda environment

conda activate stn_fgc

Step 4: Train STN models on MNIST

  • To run the base STN with standard Conv layers:
$ python main.py --stn
  • To run the STN with CoordConv layers (a minimal sketch of both ideas follows this list):
$ python main.py --stncoordconv --localization
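For context, here is a minimal sketch of what a spatial transformer block typically looks like in PyTorch, following the standard STN recipe: a small localization CNN regresses an affine transform, which is then used to resample the input before classification. The layer sizes are illustrative and may not match main.py exactly; the add_coord_channels helper is a hypothetical name used only to illustrate the CoordConv idea of appending normalized coordinate channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def add_coord_channels(x):
    # CoordConv idea: append normalized x/y coordinate channels so convolutions
    # can condition on absolute position (hypothetical helper, not from this repo).
    b, _, h, w = x.shape
    ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
    xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
    return torch.cat([x, xs, ys], dim=1)

class STN(nn.Module):
    # Spatial transformer: predict an affine warp of the input, then resample it.
    def __init__(self):
        super().__init__()
        # Localization network: a small CNN that regresses 6 affine parameters.
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 6),
        )
        # Initialize the regression head to the identity transform for stable training.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        # x: (B, 1, 28, 28) MNIST images.
        theta = self.fc_loc(self.localization(x).view(-1, 10 * 3 * 3)).view(-1, 2, 3)
        # Build a sampling grid from the predicted affine matrix and warp the input.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

The warped output is then passed to the usual classification CNN; in the CoordConv variants, the convolutions (in the localization network and/or the classifier) would see coordinate channels appended as sketched above.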

Step 5: Train the ViT model on MNIST

$ python main.py --vit
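Below is a minimal sketch of a patch-based Vision Transformer classifier for 28x28 MNIST images, to show what this flag trains in outline. The class name TinyViT and all hyperparameters (patch size 7, embedding width 64, 4 encoder layers) are illustrative assumptions, not necessarily the configuration used in main.py.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    # Patch-embedding Vision Transformer classifier for 28x28 grayscale digits.
    def __init__(self, patch=7, dim=64, depth=4, heads=4, num_classes=10):
        super().__init__()
        num_patches = (28 // patch) ** 2  # 16 patches of 7x7
        # Non-overlapping patch embedding implemented as a strided convolution.
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        # (B, 1, 28, 28) -> (B, dim, 4, 4) -> (B, 16, dim)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)
        # Classify from the [CLS] token.
        return self.head(tokens[:, 0])
```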

Step 6: For a detailed analysis of the visual attention models we experimented with, see the complete report

Results

| Model variant | Accuracy | Best epoch |
| --- | --- | --- |
| Simple Conv | 0.9879 | 48 |
| Simple STN+Conv | 0.9889 | 44 |
| Simple STN+CoordConv | 0.9850 | 43 |
| Simple STN+CoordConv+localization | 0.9910 | 47 |
| Simple STN+CoordConv+localization+r-channel | 0.9868 | 40 |
| Vision Transformers | 0.9844 | 49 |

Authors

Conclusion

We enjoyed playing with these models. Stay tuned: more implementations of visual attention models for fine-grained image classification tasks are coming soon. Thank you, and apologies for the bugs, as usual.