SlideSleuth is a tool for analyzing large whole slide image (WSI) datasets of lung adenocarcinoma (LUAD) via feature extraction and unsupervised learning. Specifically, SlideSleuth feeds each slide image into a variational autoencoder (VAE) and then analyzes the clusters the VAE produces. Within those clusters, we aim to identify biomarkers and cancer drivers for LUAD.
The tool includes pipelines that prepare WSI datasets for both a supervised classifier and a variational autoencoder.
The tool is still in active development; at present, only the data pipeline has been built. The development languages are Python, R, and Bash. Pipelining and development are done with the help of TensorFlow, OpenSlide, and the Bioconductor R packages. Containerization is done with Apptainer (formerly Singularity).
Run `./setup.sh`, followed by `source ENV/bin/activate` in the same directory. This will install all necessary dependencies.
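For example, from the repository root (assuming `setup.sh` creates the `ENV` virtual environment it activates):

```bash
./setup.sh                # install dependencies into a virtual environment
source ENV/bin/activate   # activate that environment for the current shell
```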
The project source code is divided into four main sections (see the layout sketch below):

- `features` - code to tile slide images and sort the tiles into class directories
- `data` - code to perform model-specific post-processing on the data
- `models` - code to train and test supervised and unsupervised models on the data
- `visualization` - code to visualize trained model performance
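A sketch of the layout, inferred from the script paths referenced below:

```
src/
├── features/        # tiling and sorting of raw slide images
├── data/            # model-specific post-processing
├── models/          # model training and testing
└── visualization/   # performance visualization
```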
Assuming the tiled images are in a single folder, running the script `src/data/cvae_data_pipeline.sh` with that folder's path as the `DIR_PATH` global variable will reorganize the data into a format readable by TensorFlow's data pipeline APIs. As with the tiling step below, this has already been done by Jackson for the UHN dataset (it is somewhat time consuming).
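A minimal sketch of this step, assuming `DIR_PATH` is read from the environment (it may instead be a variable edited at the top of the script); the tile directory shown is hypothetical:

```bash
# Point DIR_PATH at the folder of tiled images, then run the pipeline script.
DIR_PATH=/path/to/tiled_images ./src/data/cvae_data_pipeline.sh
```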
Assuming the use case of the private UHN dataset that this project was developed with, run the script `src/features/tile_uhn_binary.sh` to make tiles from the raw slide images. For the UHN dataset this has already been done by Jackson, and it may save some time to contact him about transferring the data (assuming you have permission to view it).
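A minimal sketch, assuming any dataset-specific paths are configured inside the script itself:

```bash
# Run from the repository root; produces tiles from the raw UHN slides.
./src/features/tile_uhn_binary.sh
```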
Once the data is processed, the convolutional variational autoencoder can be trained by running `src/models/train_cvae.sh` with the desired `DIR_PATH`, `SAVE_PATH`, and `FIG_PATH` global variables.
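A minimal sketch of a training run, again assuming the three globals are read from the environment rather than edited in the script; all paths are hypothetical:

```bash
# DIR_PATH: processed tiles; SAVE_PATH: model weights; FIG_PATH: output figures.
DIR_PATH=/path/to/processed_tiles \
SAVE_PATH=/path/to/saved_model \
FIG_PATH=/path/to/figures \
./src/models/train_cvae.sh
```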
Once the model is trained, the autoencoder can reconstruct a sample of images by calling `src/visualization/analyze_cvae.sh`.
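For example, run after training; any required paths (such as the location of the saved model) are assumed to be configured the same way as in the training step:

```bash
# Reconstructs a sample of images with the trained autoencoder.
./src/visualization/analyze_cvae.sh
```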
The dataset used for the current iteration of this tool is a private dataset from UHN (University Health Network). Please contact the authors for inquiries regarding data availability. Other test datasets were used during development, mainly the TCGA-BRCA and TCGA-PAAD projects from the GDC (Genomic Data Commons).
Please contact [email protected] for any questions, concerns, bug fixes, or further clarifications.