
Final project for the Machine Learning course @ NTU: A disease detection model.


timchen0618/NTU_FINAL_DEEPQ


Medical Image Classification - Disease Detection

Packages Used:

  1. PyTorch
  2. numpy, pandas
  3. torchvision
  4. PIL
  5. sklearn
  6. torchsummary

Reproduction

We combine multiple models into one ensemble. The models differ mainly in their validation data, dropout, and linear_drop. All models except model_4 and model_5 are validated on the first 1/10 of the data. The parameters of all models are listed below.

To reproduce the training results, the scripts must be modified slightly, because the partition into training and validation sets is hard-coded in the Python scripts. To adjust the partition, change the numbers in genLabels_Partition in load.py. Alternatively, you can run shuffle.py to randomize the data before it is partitioned.
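The README does not show genLabels_Partition itself, so the following is only a minimal sketch of what "validating on a fraction range of the data" means, assuming the split is a contiguous slice of the (shuffled) label rows. The function name `partition` and its arguments are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition(rows: Sequence[T], val_start: float, val_end: float) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: use rows[val_start*n : val_end*n] as validation, the rest as training.

    This only illustrates the idea behind the boundaries set in genLabels_Partition;
    it is not the code from load.py.
    """
    if not 0.0 <= val_start < val_end <= 1.0:
        raise ValueError("expected 0 <= val_start < val_end <= 1")
    n = len(rows)
    lo, hi = round(n * val_start), round(n * val_end)  # round() avoids float truncation surprises
    val = list(rows[lo:hi])
    train = list(rows[:lo]) + list(rows[hi:])
    return train, val
```

Under this assumption, most models would use `partition(rows, 0.0, 0.1)`, model_4 would use `partition(rows, 0.2, 0.3)`, and model_5 would use `partition(rows, 0.3, 0.4)`.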

The parameters of all seven models:

  • model_1: dropout = 0.4, linear_drop = 0.2
  • model_2: dropout = 0.2, linear_drop = 0.2
  • model_3: dropout = 0.5, linear_drop = 0.2
  • model_4: dropout = 0.5, linear_drop = 0.2 (validation data: the 0.2 to 0.3 portion)
  • model_5: dropout = 0.5, linear_drop = 0.2 (validation data: the 0.3 to 0.4 portion)
  • model_6: dropout = 0.5, linear_drop = 0.4
  • model_7: dropout = 0, linear_drop = 0
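The seven configurations above can be kept in one table and turned into train_600.py command lines. This is a convenience sketch, not a script from the repository. The `./label_only.csv` and `./images` defaults are taken from the training example further down, and the `val` tuples encode the validation slices listed above. model_4 and model_5 still require editing genLabels_Partition by hand before their runs.

```python
from typing import Dict, List

# Hyperparameters of the seven ensemble members, copied from the list above.
# "val" is the validation slice as a fraction of the data; (0.0, 0.1) is the
# default "first 1/10" split used by every model except model_4 and model_5.
MODELS: Dict[str, dict] = {
    "model_1": {"dropout": 0.4, "linear_drop": 0.2, "val": (0.0, 0.1)},
    "model_2": {"dropout": 0.2, "linear_drop": 0.2, "val": (0.0, 0.1)},
    "model_3": {"dropout": 0.5, "linear_drop": 0.2, "val": (0.0, 0.1)},
    "model_4": {"dropout": 0.5, "linear_drop": 0.2, "val": (0.2, 0.3)},
    "model_5": {"dropout": 0.5, "linear_drop": 0.2, "val": (0.3, 0.4)},
    "model_6": {"dropout": 0.5, "linear_drop": 0.4, "val": (0.0, 0.1)},
    "model_7": {"dropout": 0.0, "linear_drop": 0.0, "val": (0.0, 0.1)},
}


def train_command(cfg: dict, input_file: str = "./label_only.csv", root_dir: str = "./images") -> List[str]:
    """Build the positional argument list expected by train_600.py."""
    return ["python3", "train_600.py", input_file, root_dir,
            str(cfg["dropout"]), str(cfg["linear_drop"])]


if __name__ == "__main__":
    for name, cfg in MODELS.items():
        note = "" if cfg["val"] == (0.0, 0.1) else f"  # edit genLabels_Partition first: val={cfg['val']}"
        print(f"{name}: " + " ".join(train_command(cfg)) + note)
```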

Run the Code

Preprocessing

Before training, run shuffle.py so that the training and validation sets have a similar data distribution (this prevents similar samples from being grouped together in the validation set). The command is:
python3 shuffle.py [input_name] (e.g. python3 shuffle.py ./train.csv)
The output file label_only.csv will be the input file for training.
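The contents of shuffle.py are not shown here. The following is a minimal sketch of the row-shuffling idea only. It assumes the input CSV has a single header row and that shuffling the data rows is the essential step. Any column filtering implied by the name "label_only" is not reproduced, and the fixed `seed` parameter is an assumption added for reproducibility.

```python
import csv
import random
import sys


def shuffle_csv(input_path: str, output_path: str = "label_only.csv", seed: int = 0) -> int:
    """Shuffle the data rows of input_path (keeping the header first) and write them to output_path.

    Returns the number of data rows written.
    """
    with open(input_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    random.Random(seed).shuffle(rows)  # a fixed seed makes the train/validation split reproducible
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    shuffle_csv(sys.argv[1])
```

Shuffling before slicing means that a contiguous validation slice (for example, the first 1/10) no longer inherits whatever ordering the original CSV had.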

Run Training:

python3 train_600.py [input_file] [root_dir] [dropout] [lineardrop]

  • input_file is the file generated by shuffle.py and should be label_only.csv
  • root_dir is the directory where the images are stored
  • dropout and lineardrop are hyperparameters (dropout: the dropout rate of the dense blocks; linear_drop: the dropout rate of the linear classifier)
    Example input: python3 train_600.py ./label_only.csv ./images 0.5 0.2
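To make the positional interface concrete, here is a hypothetical sketch of reading the four arguments in the order documented above. It is not the actual argument handling in train_600.py, and the [0, 1) range check is an added assumption about sensible dropout values.

```python
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class TrainArgs:
    input_file: str     # label_only.csv produced by shuffle.py
    root_dir: str       # directory holding the images
    dropout: float      # dropout rate inside the dense blocks
    linear_drop: float  # dropout rate before the linear classifier


def parse_train_args(argv: List[str]) -> TrainArgs:
    """Parse argv in the positional order documented for train_600.py (hypothetical helper)."""
    if len(argv) != 5:
        raise SystemExit("usage: python3 train_600.py [input_file] [root_dir] [dropout] [lineardrop]")
    args = TrainArgs(argv[1], argv[2], float(argv[3]), float(argv[4]))
    for name in ("dropout", "linear_drop"):
        if not 0.0 <= getattr(args, name) < 1.0:  # assumed valid range for a dropout probability
            raise SystemExit(f"{name} must be in [0, 1)")
    return args


if __name__ == "__main__":
    print(parse_train_args(sys.argv))
```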

After training, we save every model whose validation score exceeds a certain threshold to the ./result/ directory. These models are later used for testing.
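The README does not state the threshold or the checkpoint file names. The sketch below shows the "save only above a threshold" rule under those unknowns. The `ThresholdCheckpointer` class, the `epochNNN_valX.XXXX.model` naming, and the injected `save_fn` are all hypothetical.

```python
import os
from typing import Callable, List, Optional


class ThresholdCheckpointer:
    """Call save_fn(path) whenever the validation score is strictly above a fixed threshold."""

    def __init__(self, threshold: float, save_fn: Callable[[str], None], result_dir: str = "./result"):
        self.threshold = threshold
        self.save_fn = save_fn        # e.g. lambda p: torch.save(model.state_dict(), p)
        self.result_dir = result_dir  # assumed to exist already; save_fn does the actual writing
        self.saved: List[str] = []

    def maybe_save(self, epoch: int, val_score: float) -> Optional[str]:
        """Return the checkpoint path if one was saved, else None."""
        if val_score <= self.threshold:
            return None
        path = os.path.join(self.result_dir, f"epoch{epoch:03d}_val{val_score:.4f}.model")
        self.save_fn(path)
        self.saved.append(path)
        return path
```

Injecting `save_fn` keeps the selection rule separate from PyTorch. In a training loop you would call `maybe_save(epoch, score)` once per validation pass, and every epoch that clears the threshold leaves a file behind for the ensemble step.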

Run Testing:

One Model

python3 evaluate_600.py [input_file] [model_file] [root_dir_of_image] [output_file]
(e.g. python3 evaluate_600.py ./test.csv ./train_best.model ./images ./result.csv)

The output is the testing result of a single model.

Ensemble

bash final_test.sh [input_file] [output_file] [root_dir_of_image]
(e.g. bash final_test.sh ./test.csv ./result.csv ./ntu_final/images)

The ensemble also requires the trained models in final/result (their paths are written in ./result/model_name).
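The README does not say how final_test.sh combines the individual models. One common choice is to average each model's predicted scores, and the sketch below assumes exactly that. It also assumes that every per-model output CSV has the same header and row order, with an ID in the first column and numeric scores in the rest. `average_predictions` is a hypothetical helper, not part of the repository.

```python
import csv
from typing import List


def average_predictions(paths: List[str], output_path: str) -> None:
    """Average the numeric columns of several per-model prediction CSVs, row by row.

    Assumes all files share one header and row order, with an ID in the first column.
    """
    tables = []
    for p in paths:
        with open(p, newline="") as f:
            tables.append(list(csv.reader(f)))
    header = tables[0][0]
    if any(t[0] != header or len(t) != len(tables[0]) for t in tables):
        raise ValueError("prediction files must share the same header and row count")
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for rows in zip(*(t[1:] for t in tables)):
            ids = {r[0] for r in rows}
            if len(ids) != 1:
                raise ValueError(f"row IDs disagree across files: {sorted(ids)}")
            # Mean of each score column across the models for this row.
            means = [sum(float(r[c]) for r in rows) / len(rows) for c in range(1, len(rows[0]))]
            writer.writerow([rows[0][0]] + [f"{m:.6f}" for m in means])
```

Averaging probabilities is a simple, assumption-light combination. A weighted mean or majority vote would slot into the same loop if the ensemble is built differently.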
