HONeYBEE: Harmonized Oncology Biomedical Embedding Encoder

Tutorial Notebooks

Quick-Start Guides for Clinical, Radiology, Pathology & Molecular Workflows

Examples | Documentation | Paper | Issues

🚀 Overview

HoneyBee is a multimodal AI framework for oncology research and clinical applications. It processes and integrates clinical text, radiology images, pathology slides, and molecular data through a unified, modular architecture. It is built to be scalable and extensible, so researchers can develop AI models for cancer diagnosis, prognosis, and treatment planning.

Warning

Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.

✨ Key Features

🏗️ Modular Architecture

3-Layer Design: Clean separation between data loaders, embedding models, and processors
Unified API: Consistent interface across all modalities
Extensible: Easy to add new models and data sources
Production-Ready: Optimized for both research and clinical deployment

📊 Comprehensive Data Support

Medical Imaging

Pathology: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection
Radiology: DICOM, NIFTI processing with 3D support
Preprocessing: Advanced augmentation and normalization pipelines
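As an illustration of the kind of normalization a radiology preprocessing step typically performs, the sketch below applies intensity windowing to CT values in Hounsfield units (HU). It is a minimal, standalone example and not HoneyBee's own implementation: the function name `apply_window` and the sample values are assumptions made for illustration.

```python
from typing import List


def apply_window(hu_values: List[float], center: float, width: float) -> List[float]:
    """Clip Hounsfield units to a display window and rescale them to [0, 1]."""
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2.0
    high = center + width / 2.0
    scaled = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # clamp to the window bounds
        scaled.append((clipped - low) / (high - low))
    return scaled


# Hypothetical sample values; center 40 HU / width 400 HU is a commonly used
# soft-tissue window.
pixels = [-1000.0, -160.0, 40.0, 240.0, 1500.0]
print(apply_window(pixels, center=40.0, width=400.0))
# → [0.0, 0.0, 0.5, 1.0, 1.0]
```

Values below the window map to 0 and values above it map to 1, so air and dense bone no longer dominate the intensity range seen by an embedding model.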

Clinical Text

Document Processing: PDF support with OCR for scanned documents
NLP Pipeline: Cancer entity extraction, temporal parsing, medical ontology integration
Database Integration: Native MINDS format support
Long Document Handling: Multiple tokenization strategies for clinical notes
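One common strategy for notes that exceed a model's context length is a sliding window with overlap. The sketch below shows the idea on whitespace-split tokens; it is an assumption-laden illustration rather than HoneyBee's tokenizer, and `sliding_window_chunks`, `max_len`, and `stride` are hypothetical names.

```python
from typing import List


def sliding_window_chunks(tokens: List[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping chunks of at most max_len tokens."""
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride larger than max_len would skip tokens")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last chunk reached the end of the document
        start += stride
    return chunks


# Whitespace splitting stands in for a real model tokenizer here.
note = "a b c d e f g h i j".split()
chunks = sliding_window_chunks(note, max_len=4, stride=2)
print(len(chunks), chunks[0], chunks[-1])
# → 4 ['a', 'b', 'c', 'd'] ['g', 'h', 'i', 'j']
```

Each chunk can then be embedded separately and the chunk embeddings pooled into one vector per document.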

Molecular Data

Genomics: Support for expression data and mutation profiles
Integration: Seamless combination with imaging and clinical data

🧠 State-of-the-Art Embedding Models

Clinical Text Embeddings

GatorTron: Domain-specific clinical language model
BioBERT: Biomedical text understanding
PubMedBERT: Scientific literature embeddings
Clinical-T5: Text-to-text clinical transformers

Medical Image Embeddings

REMEDIS: Self-supervised medical image representations
RadImageNet: Pre-trained radiological feature extractors
UNI: Universal medical image encoder
Custom Models: Easy integration of proprietary models

🛠️ Advanced Capabilities

Multimodal Integration

Cross-Modal Learning: Unified representations across modalities
Attention Mechanisms: Interpretable fusion strategies
Patient-Level Aggregation: Comprehensive patient profiles
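Patient-level aggregation can be pictured as pooling the embeddings within each modality and then joining the pooled vectors. The sketch below uses mean pooling followed by concatenation, which is only one simple option and is not claimed to be the fusion method HoneyBee uses; `mean_pool`, `patient_profile`, and the toy vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def patient_profile(embeddings: Dict[str, List[Vector]], modality_order: List[str]) -> Vector:
    """Mean-pool each modality, then concatenate in a fixed modality order."""
    profile: Vector = []
    for modality in modality_order:
        profile.extend(mean_pool(embeddings[modality]))
    return profile


# Toy data: two clinical-note embeddings and three pathology-patch embeddings.
patient = {
    "clinical": [[1.0, 3.0], [3.0, 5.0]],
    "pathology": [[0.0, 2.0, 4.0], [2.0, 4.0, 6.0], [4.0, 6.0, 8.0]],
}
print(patient_profile(patient, ["clinical", "pathology"]))
# → [2.0, 4.0, 2.0, 4.0, 6.0]
```

Fixing the modality order keeps every patient's profile vector aligned, which matters when the profiles feed a downstream classifier or survival model.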

Analysis Tools

Survival Analysis: Cox PH, Random Survival Forest, DeepSurv
Classification: Multi-class cancer type prediction
Retrieval: Similar patient identification
Visualization: Interactive t-SNE dashboards
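Similar-patient retrieval over embeddings is often done by ranking a cohort by cosine similarity to a query vector. The following is a minimal sketch of that idea under stated assumptions: the patient IDs, vectors, and the `most_similar` helper are invented for illustration and do not reflect HoneyBee's retrieval API.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: List[float], cohort: Dict[str, List[float]], top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k (patient_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query, emb)) for pid, emb in cohort.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional patient embeddings.
cohort = {
    "patient_a": [1.0, 0.0],
    "patient_b": [0.0, 1.0],
    "patient_c": [1.0, 1.0],
}
top = most_similar([1.0, 0.1], cohort, top_k=2)
print([pid for pid, _ in top])
# → ['patient_a', 'patient_c']
```

For large cohorts, an exhaustive scan like this is usually replaced by an approximate nearest-neighbor index, but the ranking criterion stays the same.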

Clinical Applications

Risk Stratification: Patient outcome prediction
Treatment Planning: Personalized therapy recommendations
Biomarker Discovery: Multi-omic pattern identification

🚀 Quick Start (for the complete framework)

Prerequisites

Python 3.8+
PyTorch 2.0+
CUDA 11.7+ (optional, for GPU acceleration)
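The prerequisites above can be checked from Python before installing anything else. The sketch below is one possible check, not an official HoneyBee script: `check_prerequisites` is a hypothetical helper, and it simply reports what it finds without attempting to compare the CUDA version.

```python
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, object]:
    """Report whether Python and PyTorch meet the listed minimum versions."""
    report: Dict[str, object] = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_version": None,
        "torch_ok": False,
        "cuda_available": False,
        "cuda_version": None,
    }
    try:
        import torch  # optional here: the check still runs if PyTorch is missing
    except ImportError:
        return report
    report["torch_version"] = str(torch.__version__)
    # Versions look like "2.1.0+cu118"; the major number precedes the first dot.
    report["torch_ok"] = int(str(torch.__version__).split(".")[0]) >= 2
    report["cuda_available"] = torch.cuda.is_available()
    report["cuda_version"] = torch.version.cuda  # None for CPU-only builds
    return report


for name, value in check_prerequisites().items():
    print(f"{name}: {value}")
```

Because CUDA is optional, a `cuda_available: False` result only means that processing will run on the CPU.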

🚀 Quick Start (for this Workshop)

Step 1 · Open Google Colab

Log in with your Google account and launch Google Colab.

Launching the Notebooks

The tutorials are located at:

| Tutorial | Path | Purpose |
| --- | --- | --- |
| Clinical Processing | `examples/clinical_processing_tutorial.ipynb` | NLP pipeline → embedding → survival & retrieval demos |
| Radiology Workflow | `examples/radiology_tutorial.ipynb` | 3-D DICOM loading, windowing, REMEDIS embedding & patient-level aggregation |
| Pathology (WSI) | `examples/wsi/wsi.ipynb` | Whole-slide tiling, tissue detection, patch embedding & MIL pooling |

Double-click the notebook in the Jupyter file browser, then execute cells top-to-bottom (⌘/Ctrl + ↵).

Colab: If you prefer a cloud environment, upload the notebook or open it directly via the GitHub URL (File › Open notebook › GitHub). Make sure a GPU runtime is enabled (Runtime › Change runtime type › GPU).

Citation

If these notebooks assist your research, please cite:

@article{tripathi2024honeybee,
  title   = {HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
  author  = {Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
  journal = {arXiv preprint arXiv:2405.07460},
  year    = {2024}
}

Built with 🔬 and 🖤 by the Lab Rasool team

