Our project involves implementing a network and training it on a dataset of Thai food images. We will then compare its performance with that of three pre-trained models in PyTorch: DenseNet, ResNet50, and Vision Transformer (base-sized model). We will modify the pre-trained models to work with our dataset and visualize some of their layers to understand how they classify Thai dishes.
- Thean Cheat Lim
- Wenlin Fang
- datasets
- transformers
- evaluate
image_preprocessing.py
: Run the main function to calculate the mean and standard deviation of the training images.
custom.py
: Run the main function to train the model and report its accuracy on the validation and test datasets.
resnet_densenet.py
:- Fine-tune ResNet and DenseNet models.
- Usage: python resnet_densenet.py resnet50 model_finetuned_outname 20 /train_data_dir /val_data_dir /test_data_dir
- Usage: python resnet_densenet.py densenet161 model_finetuned_outname 20 /train_data_dir /val_data_dir /test_data_dir
swinv2.py
:- Fine-Tune the SwinV2 (tiny) model using the THFOOD-50 dataset.
- Usage: python swinv2.py 20 /output_dir (here 20 means training for 20 epochs)
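As an illustration of the command line above, the two positional arguments could be parsed as follows. This is a sketch only; the argument names `epochs` and `output_dir` are assumptions and may differ from what swinv2.py actually uses.

```python
import argparse


def parse_args(argv=None):
    """Parse `python swinv2.py <epochs> <output_dir>` (hypothetical argument names)."""
    parser = argparse.ArgumentParser(description="Fine-tune SwinV2 (tiny) on THFOOD-50")
    parser.add_argument("epochs", type=int, help="number of training epochs")
    parser.add_argument("output_dir", help="directory for the fine-tuned model")
    return parser.parse_args(argv)
```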
visualization.py
: Run the main function to visualize the attention maps of the four models.
pytorch_model_utils.py
: Utility functions for creating, training, and testing PyTorch models.
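The per-channel mean and standard deviation mentioned for image_preprocessing.py are typically used to normalize inputs before training. The sketch below shows one way to accumulate them in a single pass over the data. It is not the script's actual code: the function name `channel_stats` is hypothetical, and it assumes each image is supplied as an iterable of (R, G, B) tuples already scaled to [0, 1].

```python
import math


def channel_stats(images):
    """Return per-channel (means, stds) over all pixels of all images.

    Assumption: each image is an iterable of (r, g, b) tuples in [0, 1].
    """
    count = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            count += 1
            for c, value in enumerate(pixel):
                sums[c] += value
                sq_sums[c] += value * value
    if count == 0:
        raise ValueError("no pixels to process")
    means = [s / count for s in sums]
    # Population variance is E[x^2] - E[x]^2; clamp at 0 to absorb rounding error.
    stds = [
        math.sqrt(max(sq / count - m * m, 0.0))
        for sq, m in zip(sq_sums, means)
    ]
    return means, stds
```

Keeping only running sums means the whole training set never has to be held in memory at once; for real image files the same accumulation would normally be done on arrays or tensors rather than Python tuples.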