HoneyBee

A Scalable Modular Framework for Multimodal AI in Oncology

Documentation | Paper | Examples | Demo | Google Colab

🚀 Overview

HoneyBee is a multimodal AI framework for oncology research and clinical applications. It integrates and processes clinical text, radiology images, pathology slides, and molecular data through a unified, modular architecture. It is built for scalability and extensibility, so researchers can develop AI models for cancer diagnosis, prognosis, and treatment planning.

Warning

Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.

✨ Key Features

🏗️ Modular Architecture

3-Layer Design: Clean separation between data loaders, embedding models, and processors
Unified API: Consistent interface across all modalities
Extensible: Easy to add new models and data sources
Deployment-Oriented: Designed with both research and clinical deployment in mind (the framework is currently an alpha release)
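
The separation between loaders, embedding models, and processors can be pictured with a small sketch. Every name below (`TextLoader`, `ToyEmbedder`, `mean_pool`, `run_pipeline`) is hypothetical and chosen for illustration; none of it is HoneyBee's actual API, and the embedder is a toy stand-in for a real model.

```python
from typing import Iterable, List, Protocol


class Loader(Protocol):
    def load(self, source: str) -> List[str]: ...


class Embedder(Protocol):
    def embed(self, items: Iterable[str]) -> List[List[float]]: ...


class TextLoader:
    """Hypothetical loader layer: turn a raw source into a list of items."""

    def load(self, source: str) -> List[str]:
        # The "source" here is the text itself; a real loader would read files.
        return [s.strip() for s in source.split(".") if s.strip()]


class ToyEmbedder:
    """Hypothetical embedding layer: map each item to a fixed-size vector."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim

    def embed(self, items: Iterable[str]) -> List[List[float]]:
        vectors = []
        for item in items:
            vec = [0.0] * self.dim
            # Toy character hashing, only so the sketch runs without a model.
            for i, ch in enumerate(item):
                vec[i % self.dim] += ord(ch) / 1000.0
            vectors.append(vec)
        return vectors


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Hypothetical processor layer: reduce item vectors to one vector."""
    if not vectors:
        return []
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def run_pipeline(source: str, loader: Loader, embedder: Embedder) -> List[float]:
    # Each layer only sees the previous layer's output, so any one of them
    # can be swapped without touching the others.
    return mean_pool(embedder.embed(loader.load(source)))


if __name__ == "__main__":
    print(run_pipeline("Stage II tumor. No metastasis.", TextLoader(), ToyEmbedder()))
```

Because `run_pipeline` depends only on the `Loader` and `Embedder` protocols, a different loader or model can be passed in without changing the rest of the code.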

📊 Comprehensive Data Support

Medical Imaging

Pathology: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection
Radiology: DICOM, NIfTI processing with 3D support
Preprocessing: Advanced augmentation and normalization pipelines

Clinical Text

Document Processing: PDF support with OCR for scanned documents
NLP Pipeline: Cancer entity extraction, temporal parsing, medical ontology integration
Database Integration: Native MINDS format support
Long Document Handling: Multiple tokenization strategies for clinical notes
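
Clinical notes are often longer than a language model's input window. The README does not list which tokenization strategies HoneyBee implements; the sketch below shows one common option, an overlapping sliding window, over whitespace tokens. The function name and parameters are assumptions made for illustration.

```python
from typing import List


def sliding_window_chunks(tokens: List[str], max_len: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most max_len, advancing by stride.

    With stride < max_len, consecutive windows overlap by max_len - stride
    tokens, so context at a window boundary is not lost.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    if stride > max_len:
        raise ValueError("stride larger than max_len would skip tokens")

    chunks: List[List[str]] = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reached the end of the document
        start += stride
    return chunks


if __name__ == "__main__":
    note = "patient presents with stage II invasive ductal carcinoma of the left breast"
    for chunk in sliding_window_chunks(note.split(), max_len=6, stride=4):
        print(" ".join(chunk))
```

A real pipeline would apply the same windowing to model-specific subword tokens rather than whitespace tokens, then pool the per-window embeddings.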

Molecular Data

Genomics: Support for expression data and mutation profiles
Integration: Seamless combination with imaging and clinical data

🧠 State-of-the-Art Embedding Models

Clinical Text Embeddings

GatorTron: Domain-specific clinical language model
BioBERT: Biomedical text understanding
PubMedBERT: Scientific literature embeddings
Clinical-T5: Text-to-text clinical transformers

Medical Image Embeddings

REMEDIS: Self-supervised medical image representations
RadImageNet: Pre-trained radiological feature extractors
UNI: Universal medical image encoder
Custom Models: Easy integration of proprietary models

🛠️ Advanced Capabilities

Multimodal Integration

Cross-Modal Learning: Unified representations across modalities
Attention Mechanisms: Interpretable fusion strategies
Patient-Level Aggregation: Comprehensive patient profiles
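
As a rough illustration of patient-level aggregation, the sketch below mean-pools the embeddings within each modality and concatenates the results into one fixed-length profile. This is a deliberately simple baseline, not the attention-based fusion mentioned above and not HoneyBee's implementation; the function names and the `dims` argument are assumptions.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def aggregate_patient(
    embeddings: Dict[str, List[List[float]]],
    dims: Dict[str, int],
) -> List[float]:
    """Build one profile vector per patient.

    embeddings maps a modality name to that patient's embedding vectors.
    dims maps every expected modality to its embedding size; its key order
    fixes the layout of the output vector.
    """
    profile: List[float] = []
    for modality, dim in dims.items():
        vectors = embeddings.get(modality, [])
        if vectors:
            profile.extend(mean_vector(vectors))
        else:
            # Zero-fill a missing modality so every patient has the same length.
            profile.extend([0.0] * dim)
    return profile


if __name__ == "__main__":
    patient = {
        "clinical": [[1.0, 3.0], [3.0, 5.0]],
        "pathology": [[2.0, 2.0, 2.0]],
    }
    dims = {"clinical": 2, "pathology": 3, "radiology": 2}
    print(aggregate_patient(patient, dims))
```

Zero-filling keeps profiles comparable across patients with incomplete data, at the cost of conflating "missing" with "all zeros"; a mask vector is a common refinement.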

Analysis Tools

Survival Analysis: Cox PH, Random Survival Forest, DeepSurv
Classification: Multi-class cancer type prediction
Retrieval: Similar patient identification
Visualization: Interactive t-SNE dashboards
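
Similar-patient retrieval over embeddings usually comes down to a nearest-neighbour search. The sketch below ranks a small cohort by cosine similarity to a query vector; the function names and the in-memory dictionary are assumptions for illustration, and a real deployment would typically use a vector index instead of a linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float],
    cohort: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k patient IDs whose embeddings are closest to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in cohort.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    cohort = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
    for patient_id, score in most_similar([1.0, 0.1], cohort, k=2):
        print(patient_id, round(score, 3))
```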

Clinical Applications

Risk Stratification: Patient outcome prediction
Treatment Planning: Personalized therapy recommendations
Biomarker Discovery: Multi-omic pattern identification

🚀 Quick Start

Prerequisites

Python 3.8+
PyTorch 2.0+
CUDA 11.7+ (optional, for GPU acceleration)
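
A quick way to check the first two prerequisites before installing is sketched below. The helper name `check_prerequisites` is hypothetical. It only confirms the Python version and whether a `torch` package can be found; it does not verify the PyTorch or CUDA version.

```python
import importlib.util
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report whether the interpreter and PyTorch look usable."""
    return {
        # Compare (major, minor) against the required minimum.
        "python_ok": sys.version_info[:2] >= min_python,
        # find_spec returns None when the package is not installed.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```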

System Dependencies

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr

# macOS
brew install openslide tesseract

# Windows
# Install from official websites:
# - OpenSlide: https://openslide.org/download/
# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki

Installation

# Clone the repository
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee

# Install dependencies
pip install -r requirements.txt

# Download required NLTK data
python -c "import nltk; nltk.download('punkt')"

# Install HoneyBee in development mode
pip install -e .

Environment Setup

Create a .env file in the project root:

# MINDS database credentials (if using MINDS format)
HOST=your_server
PORT=5433
DB_USER=postgres
PASSWORD=your_password
DATABASE=minds

# HuggingFace API (for some models)
HF_API_KEY=your_huggingface_api_key
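
The README does not say how these values are read at runtime. As a minimal sketch, assuming a plain `KEY=value` file like the one above, the variables can be loaded into the process environment with the standard library alone. The helper names `parse_env_text` and `load_env_file` are hypothetical.

```python
import os
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Read a .env file and export its values without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


if __name__ == "__main__":
    sample = "# MINDS database credentials\nHOST=your_server\nPORT=5433\n"
    print(parse_env_text(sample))
```

Using `setdefault` means variables already set in the shell take precedence over the file, which is convenient for overriding credentials per session. Keep the `.env` file out of version control, since it holds passwords and API keys.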

🔬 Research Applications

HoneyBee has been successfully applied to:

Cancer Subtype Classification: Automated identification of cancer subtypes from multimodal data
Survival Prediction: Risk stratification and outcome prediction for treatment planning
Similar Patient Retrieval: Finding patients with similar clinical profiles for precision medicine
Biomarker Discovery: Identifying multimodal patterns associated with treatment response

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Fork and clone your fork
git clone https://github.com/YOUR_USERNAME/HoneyBee.git
cd HoneyBee

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -r requirements.txt
pip install -e .

🐛 Known Issues & Limitations

Alpha Status: Some features are still under development
Memory Requirements: WSI processing requires significant RAM (16GB+ recommended)
GPU Recommended: While CPU fallback exists, GPU acceleration significantly improves performance
Limited Test Coverage: Comprehensive test suite is planned for future releases

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📝 Citation

If you use HoneyBee in your research, please cite our paper:

@article{tripathi2024honeybee,
    title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
    author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
    journal={arXiv preprint arXiv:2405.07460},
    year={2024},
    eprint={2405.07460},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Made with ❤️ by the Lab Rasool team
