Lung Cancer Detection is a project made as part of the Engineer's Thesis "Applications of artificial intelligence in oncology on computer tomography dataset" by Jakub Owczarek, under the guidance of thesis advisor dr hab. inż. Mariusz Mlynarczuk, prof. AGH.
The goal of this project is to process the LIDC-IDRI dataset and evaluate the performance of deep learning models pre-trained on ImageNet by leveraging transfer learning.
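To illustrate the transfer-learning idea, here is a minimal sketch of an ImageNet-pretrained backbone with a new classification head. The backbone choice (ResNet50), input size, and head layout are assumptions for illustration, not the actual model builder used in `src/model`.

```python
# Hedged sketch of transfer learning for nodule classification: a frozen
# ImageNet-pretrained backbone plus a small trainable head.
# Architecture details here are illustrative assumptions, not the thesis code.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_transfer_model(input_shape=(224, 224, 3), weights="imagenet"):
    # Pre-trained backbone without its ImageNet classification head.
    base = ResNet50(include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # freeze pre-trained features for transfer learning
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(128, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # e.g. benign vs. malignant
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

After initial training of the head, the backbone can optionally be unfrozen for fine-tuning at a lower learning rate.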
This repository contains the following directories:
- `docs` - markdown files with more detailed descriptions of the project components
- `notebooks` - Jupyter notebooks used for experiments, analysis, visualizations, etc.
- `scripts` - the actual workhorse, with two notable subdirectories:
  - `azure` - scripts for the Azure Virtual Machine and Azure Machine Learning
  - `local` - scripts used for local development
- `src` - the main components of the project:
  - `azure` - utilities specific to Azure services
  - `dataset` - the `DatasetLoader` component used to feed data during model training
  - `model` - model builder and director classes
  - `preprocessing` - classes used for LIDC-IDRI dataset preprocessing
  - `config.py` - constants used throughout the project
- `tests` - a few tests for the project components
This project was created with Azure in mind, so the main scripts are meant to be run on Azure.
- First, download the LIDC-IDRI dataset onto an Azure Virtual Machine. The `azure/virtual_machine/download_dataset.sh` script is meant for this task.
- Then preprocess the dataset into a format suitable for supervised deep learning model training. The `azure/virtual_machine/process_dataset.py` script is meant for this task. The same directory also contains `train_test_split.py`, which should be used to split the processed data.
- Finally, upload the preprocessed dataset to Azure Blob Storage with the `upload_dataset_2.sh` script. There is also an `upload_dataset.sh` script, but it does not use the `azcopy` utility and is too slow.
- With the preprocessed dataset on Azure Blob Storage, the Virtual Machine is no longer necessary. From this dataset an Azure Machine Learning data asset can be created and used during model training.
- To run the actual model training, use the `run_training_job.py` script under `scripts/azure/machine_learing`. This script creates a job on AML that builds, compiles, and trains the desired model.
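The train/test split step above can be sketched as a simple file-level partition of the preprocessed samples. The 80/20 ratio, the `.npy` file extension, and the flat directory layout are illustrative assumptions, not the actual `train_test_split.py` logic.

```python
# Hedged sketch of a file-level train/test split for preprocessed samples.
# Ratio, file extension, and layout are assumptions, not the real script.
import random
import shutil
from pathlib import Path

def split_dataset(processed_dir, out_dir, test_fraction=0.2, seed=42):
    """Copy preprocessed files into train/ and test/ subdirectories."""
    files = sorted(Path(processed_dir).glob("*.npy"))
    random.Random(seed).shuffle(files)  # deterministic shuffle for reproducibility
    n_test = int(len(files) * test_fraction)
    splits = {"test": files[:n_test], "train": files[n_test:]}
    for name, subset in splits.items():
        target = Path(out_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        for f in subset:
            shutil.copy2(f, target / f.name)
    return {name: len(subset) for name, subset in splits.items()}
```

Splitting at the file level (rather than per-slice) avoids leaking slices of the same scan into both sets, which is the usual concern with medical imaging data.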
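For orientation, the job-submission step might look roughly like the sketch below, using the Azure ML v2 Python SDK (`azure-ai-ml`). The workspace identifiers, environment name, data asset name, and compute target are all placeholders, and this is not the actual `run_training_job.py` code.

```python
# Hedged sketch of submitting a training job to Azure Machine Learning.
# All resource names below are hypothetical placeholders.
from azure.ai.ml import Input, MLClient, command
from azure.identity import DefaultAzureCredential

def build_training_job():
    # Describe the job: source code, entry command, data input,
    # environment, and compute target (names are illustrative only).
    return command(
        code="./src",
        command="python train.py --data ${{inputs.data}}",
        inputs={"data": Input(type="uri_folder",
                              path="azureml:lidc-preprocessed:1")},
        environment="azureml:lung-cancer-train-env:1",
        compute="gpu-cluster",
        experiment_name="lidc-transfer-learning",
    )

if __name__ == "__main__":
    # Submission requires valid workspace credentials (placeholders below).
    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id="<subscription-id>",
        resource_group_name="<resource-group>",
        workspace_name="<workspace>",
    )
    ml_client.jobs.create_or_update(build_training_job())
```

The data input references the Azure Machine Learning data asset created from the Blob Storage upload in the earlier step, so the compute cluster mounts the preprocessed dataset directly.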
This project is licensed under the MIT License - see the `LICENSE.md` file for details.