Here we use a classification model to classify small batches extracted from very large whole-slide histopathology images. Since the patches are very small compare to the whole image, we can then use this model for detection of tumor in different area of a whole-slide pathology image.
The model is based on ResNet18 with the last fully connected layer replaced by a 1x1 convolution layer.
All the data used to train and validate this model is from Camelyon-16 Challenge. You can download all the images for "CAMELYON16" data set from various sources listed here.
Location information for training/validation patches (the location on the whole slide image where patches are extracted) are adopted from NCRF/coords. The reformatted coordinations and labels in CSV format for training (training.csv
) can be found here and for validation (validation.csv
) can be found here.
This pipeline expects the training/validation data (whole slide images) reside in cfg["data_root"]/training/images
. By default data_root
is pointing to the code folder ./
; however, you can easily modify it to point to a different directory by passing the following argument in the runtime: --data-root /other/data/root/dir/
.
training_sub.csv
andvalidation_sub.csv
is also provided to check the functionality of the pipeline using only two of the whole slide images:tumor_001
(for training) andtumor_101
(for validation). This dataset should not be used for the real training or any performance evaluation.
Input for the training pipeline is a json file (dataset.json) which includes path to each WSI, the location and the label information for each training patch.
Output of the network is the probability of whether the input patch contains tumor or not.
This is an example, not to be used for diagnostic purposes.