JacksonH44/SlideSleuth

SlideSleuth

Description

SlideSleuth is a tool for analyzing large whole slide image (WSI) datasets of lung adenocarcinoma (LUAD) via feature extraction and unsupervised learning. Specifically, SlideSleuth feeds each slide image to a variational autoencoder (VAE) and then analyzes the clusters the VAE produces. Within those clusters, we aim to identify biomarkers and cancer drivers for LUAD.

The tool includes pipelines that prepare WSI datasets for both a supervised classifier and a variational autoencoder.

The tool is still in active development. As of right now, only the data pipeline has been built. Development is done in Python, R, and Bash. Pipelining and development rely on TensorFlow, OpenSlide, and Bioconductor packages for R. Containerization is done with Apptainer (formerly Singularity).

Table of Contents

Installation Instructions

Requirements

Setup Instructions

Compute Canada

Run ./setup.sh, then run source ENV/bin/activate from the same directory. The first command installs all necessary dependencies; the second activates the environment they were installed into.

Use Instructions

The project source code is divided into 4 main sections: features (code to tile images and sort the tiles), data (code to perform model-specific post-processing on the data), models (code to train and test supervised and unsupervised models on the data), and visualization (code to visualize the trained model performance).

Features

Assuming the use case of the private UHN dataset that this project was developed with, run the script src/features/tile_uhn_binary.sh to make tiles from the raw slide images. For the UHN dataset, Jackson has already done this; contacting him about transferring the tiled data may save some time (assuming you have permission to view the data).

Data

Assuming the tiled images are in folder, running the script src/data/cvae_data_pipeline.sh with folder as the DIR_PATH global variable reorganizes the data into a format readable by TensorFlow's data pipeline APIs. As with the tiling step, Jackson has already done this for the UHN dataset (it is somewhat time-consuming).
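
The exact layout that src/data/cvae_data_pipeline.sh produces is not spelled out in this README. As a rough illustration only, the sketch below copies tiles into one subfolder per label, a layout that directory-based image loaders commonly expect. The function names, the .png extension, and the "label_index.png" naming convention are all hypothetical and are not taken from the SlideSleuth scripts.

```python
import shutil
from pathlib import Path


def label_from_prefix(tile_path):
    """Hypothetical convention: 'tumor_12.png' -> 'tumor'."""
    return tile_path.stem.split("_")[0]


def organize_tiles(src_dir, dst_dir, label_of=label_from_prefix):
    """Copy every .png tile under src_dir into dst_dir/<label>/.

    Returns a dict mapping each label to the number of tiles copied.
    Tiles that share a file name within one label overwrite each other,
    so this assumes tile names are unique.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    # sorted() materializes the file list before any copying starts,
    # so freshly copied files are never picked up by the scan.
    for tile in sorted(src.rglob("*.png")):
        label = label_of(tile)
        target_dir = dst / label
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(tile, target_dir / tile.name)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling organize_tiles("tiles", "organized") would leave the source folder untouched and build the per-label folders next to it, which makes the step safe to re-run.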

Models

Once the data is processed, the convolutional variational autoencoder can be trained by running src/models/train_cvae.sh with the DIR_PATH, SAVE_PATH, and FIG_PATH global variables set to the desired paths.
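
For readers new to variational autoencoders, the sketch below shows the two pieces that distinguish a VAE from a plain autoencoder: sampling a latent vector with the reparameterization trick, and the KL term that pulls the latent distribution toward a standard normal. It is a standard-library illustration of the general technique, not the code in src/models (which is built on TensorFlow); the function names are hypothetical.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).

    mu and log_var are equal-length sequences, one entry per latent dimension.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
z = sample_latent([0.0, 1.0], [0.0, 0.0], rng)
penalty = kl_to_standard_normal([0.0, 1.0], [0.0, 0.0])
```

During training, a term like penalty is typically added to the reconstruction loss, so the encoder is rewarded for accurate reconstructions and penalized for latent codes that drift far from the prior.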

Visualization

Once the model is trained, running src/visualization/analyze_cvae.sh has the autoencoder reconstruct a sample of images.
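
One simple way to judge reconstructions numerically, alongside looking at them, is the per-image mean squared error between each original tile and its reconstruction. The sketch below is an illustration under that assumption; it is not what analyze_cvae.sh computes, and the function names are hypothetical. Images are represented as flat sequences of pixel values to keep the example dependency-free.

```python
def reconstruction_mse(original, reconstruction):
    """Mean squared error between two equal-length flat pixel sequences."""
    if len(original) != len(reconstruction):
        raise ValueError("original and reconstruction differ in size")
    if len(original) == 0:
        raise ValueError("cannot score an empty image")
    squared = ((a - b) ** 2 for a, b in zip(original, reconstruction))
    return sum(squared) / len(original)


def rank_by_error(originals, reconstructions):
    """Return image indices ordered from worst to best reconstruction."""
    errors = [reconstruction_mse(o, r)
              for o, r in zip(originals, reconstructions)]
    return sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
```

Sorting from worst to best puts the tiles the model struggles with first, which is usually where visual inspection is most informative.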

Data Availability

The dataset used for the current iteration of this tool is a private dataset from UHN. Please contact the authors for inquiries regarding data availability. Other test datasets were used during development, mainly the TCGA-BRCA and TCGA-PAAD projects from GDC.

Contact

Please contact [email protected] for any questions, concerns, bug fixes, or further clarifications.
