
A Scalable Modular Framework for Multimodal AI in Oncology
Documentation | Paper | Examples | Demo | Google Colab
HoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It seamlessly integrates and processes diverse medical data types (clinical text, radiology images, pathology slides, and molecular data) through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee empowers researchers to develop sophisticated AI models for cancer diagnosis, prognosis, and treatment planning.
Warning
Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.
- 3-Layer Design: Clean separation between data loaders, embedding models, and processors
- Unified API: Consistent interface across all modalities
- Extensible: Easy to add new models and data sources
- Production-Ready: Optimized for both research and clinical deployment
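The layering above can be illustrated with a small, self-contained sketch. The class names below are hypothetical stand-ins, not HoneyBee's actual API: a loader yields raw records, an embedding model maps them to vectors, and a processor post-processes the result.

```python
import numpy as np

class TextLoader:
    """Layer 1: data loader -- yields raw records for one modality."""
    def load(self, path):
        # In practice this would read clinical notes from disk or MINDS.
        return ["Patient presents with stage II adenocarcinoma."]

class HashingEmbedder:
    """Layer 2: embedding model -- maps raw data to fixed-size vectors.
    A toy hashing embedder standing in for GatorTron/BioBERT/etc."""
    def __init__(self, dim=8):
        self.dim = dim
    def embed(self, texts):
        vecs = np.zeros((len(texts), self.dim))
        for i, text in enumerate(texts):
            for tok in text.lower().split():
                vecs[i, hash(tok) % self.dim] += 1.0
        return vecs

class Normalizer:
    """Layer 3: processor -- post-processes embeddings (here, L2 normalization)."""
    def process(self, vecs):
        norms = np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs / np.clip(norms, 1e-8, None)

texts = TextLoader().load("notes/")
emb = Normalizer().process(HashingEmbedder().embed(texts))
print(emb.shape)  # (1, 8)
```

The same loader → embedder → processor shape repeats for every modality, which is what keeps the API uniform.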
- Pathology: Whole Slide Images (WSI) in SVS and TIFF formats, with tissue detection
- Radiology: DICOM and NIfTI processing with 3D support
- Preprocessing: Advanced augmentation and normalization pipelines
- Document Processing: PDF support with OCR for scanned documents
- NLP Pipeline: Cancer entity extraction, temporal parsing, medical ontology integration
- Database Integration: Native MINDS format support
- Long Document Handling: Multiple tokenization strategies for clinical notes
- Genomics: Support for expression data and mutation profiles
- Integration: Seamless combination with imaging and clinical data
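As one concrete piece of such a preprocessing pipeline, intensity normalization of a 3D scan can be sketched in plain NumPy. This is a generic percentile-clip plus z-score pass with illustrative defaults, not HoneyBee's actual preprocessing code:

```python
import numpy as np

def zscore_normalize(volume, clip_percentiles=(1.0, 99.0)):
    """Percentile-clip then z-score a 3D scan volume.

    A common normalization for CT/MRI intensities before embedding;
    the clip percentiles here are illustrative, not framework defaults.
    """
    lo, hi = np.percentile(volume, clip_percentiles)
    vol = np.clip(volume, lo, hi).astype(np.float64)
    return (vol - vol.mean()) / (vol.std() + 1e-8)

# A synthetic 3D "scan" stands in for a DICOM/NIfTI volume.
rng = np.random.default_rng(0)
scan = rng.normal(loc=40.0, scale=15.0, size=(8, 64, 64))
norm = zscore_normalize(scan)
print(norm.shape, float(norm.mean()), float(norm.std()))
```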
- GatorTron: Domain-specific clinical language model
- BioBERT: Biomedical text understanding
- PubMedBERT: Scientific literature embeddings
- Clinical-T5: Text-to-text clinical transformers
- REMEDIS: Self-supervised medical image representations
- RadImageNet: Pre-trained radiological feature extractors
- UNI: Universal medical image encoder
- Custom Models: Easy integration of proprietary models
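Encoders like these are typically loaded via Hugging Face `transformers`, and whichever backbone is chosen, its token-level hidden states must be pooled into one vector per document before downstream use. A minimal mean-pooling sketch, with mock hidden states standing in for real model output:

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average token embeddings, ignoring padding positions.

    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(np.float64)  # broadcast over dim
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-8, None)
    return summed / counts

# Mock batch: 2 documents, 4 tokens each, 3-dim hidden states.
h = np.arange(24, dtype=np.float64).reshape(2, 4, 3)
m = np.array([[1, 1, 0, 0], [1, 1, 1, 1]])  # doc 0 has 2 padding tokens
pooled = mean_pool(h, m)
print(pooled)  # [[ 1.5  2.5  3.5], [16.5 17.5 18.5]]
```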
- Cross-Modal Learning: Unified representations across modalities
- Attention Mechanisms: Interpretable fusion strategies
- Patient-Level Aggregation: Comprehensive patient profiles
- Survival Analysis: Cox PH, Random Survival Forest, DeepSurv
- Classification: Multi-class cancer type prediction
- Retrieval: Similar patient identification
- Visualization: Interactive t-SNE dashboards
- Risk Stratification: Patient outcome prediction
- Treatment Planning: Personalized therapy recommendations
- Biomarker Discovery: Multi-omic pattern identification
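Similar-patient retrieval over patient-level embeddings reduces to nearest-neighbor search in the joint space. A minimal cosine-similarity sketch, with random vectors standing in for real patient embeddings:

```python
import numpy as np

def most_similar(query, bank):
    """Index of the bank embedding closest to query by cosine similarity."""
    q = query / np.linalg.norm(query)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    return int(np.argmax(b @ q))

rng = np.random.default_rng(42)
bank = rng.normal(size=(100, 16))             # 100 patient embeddings
query = bank[7] + 0.01 * rng.normal(size=16)  # lightly perturbed copy of patient 7
print(most_similar(query, bank))  # 7
```

At scale, the exhaustive `argmax` would be replaced by an approximate nearest-neighbor index, but the similarity computation stays the same.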
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (optional, for GPU acceleration)
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y openslide-tools tesseract-ocr
# macOS
brew install openslide tesseract
# Windows
# Install from official websites:
# - OpenSlide: https://openslide.org/download/
# - Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
# Clone the repository
git clone https://github.com/lab-rasool/HoneyBee.git
cd HoneyBee
# Install dependencies
pip install -r requirements.txt
# Download required NLTK data
python -c "import nltk; nltk.download('punkt')"
# Install HoneyBee in development mode
pip install -e .
Create a `.env` file in the project root:
# MINDS database credentials (if using MINDS format)
HOST=your_server
PORT=5433
DB_USER=postgres
PASSWORD=your_password
DATABASE=minds
# HuggingFace API (for some models)
HF_API_KEY=your_huggingface_api_key
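Variables like these are conventionally loaded with a package such as `python-dotenv`. For clarity, a stdlib-only sketch of the parsing (it overrides existing variables and skips the quoting and comment handling that real loaders support):

```python
import os
import tempfile

def load_env(path):
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Minimal sketch: skips blank/comment lines; no quoting or `export` support.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()

# Demo with a temporary .env mirroring the snippet above.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# MINDS database credentials\nHOST=your_server\nPORT=5433\n")
    path = fh.name
load_env(path)
print(os.environ["HOST"], os.environ["PORT"])  # your_server 5433
```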
HoneyBee has been successfully applied to:
- Cancer Subtype Classification: Automated identification of cancer subtypes from multimodal data
- Survival Prediction: Risk stratification and outcome prediction for treatment planning
- Similar Patient Retrieval: Finding patients with similar clinical profiles for precision medicine
- Biomarker Discovery: Identifying multimodal patterns associated with treatment response
We welcome contributions! Please see our Contributing Guidelines for details.
# Fork and clone your fork
git clone https://github.com/YOUR_USERNAME/HoneyBee.git
cd HoneyBee
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -r requirements.txt
pip install -e .
- Alpha Status: Some features are still under development
- Memory Requirements: WSI processing requires significant RAM (16GB+ recommended)
- GPU Recommended: While CPU fallback exists, GPU acceleration significantly improves performance
- Limited Test Coverage: Comprehensive test suite is planned for future releases
This project is licensed under the MIT License - see the LICENSE file for details.
If you use HoneyBee in your research, please cite our paper:
@article{tripathi2024honeybee,
  title={HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
  author={Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
  journal={arXiv preprint arXiv:2405.07460},
  year={2024},
  eprint={2405.07460},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}