Quick-Start Guides for Clinical, Radiology, Pathology & Molecular Workflows
Examples | Documentation | Paper | Issues
HoneyBee is a comprehensive multimodal AI framework designed specifically for oncology research and clinical applications. It seamlessly integrates and processes diverse medical data types—clinical text, radiology images, pathology slides, and molecular data—through a unified, modular architecture. Built with scalability and extensibility in mind, HoneyBee empowers researchers to develop sophisticated AI models for cancer diagnosis, prognosis, and treatment planning.
Warning
Alpha Release: This framework is currently in alpha. APIs may change, and some features are still under development.
- 3-Layer Design: Clean separation between data loaders, embedding models, and processors
- Unified API: Consistent interface across all modalities
- Extensible: Easy to add new models and data sources
- Production-Ready: Optimized for both research and clinical deployment
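The 3-layer pattern above (loader → embedder → processor) can be sketched in a few lines. All class and method names below are illustrative stand-ins, not HoneyBee's actual API; the embedder is a toy hashing model used only to show how the layers compose.

```python
import numpy as np

# Layer 1: data loader -- turns a raw record into model-ready input.
class TextLoader:
    def load(self, record):
        return record["note"]

# Layer 2: embedding model -- maps loaded data to a fixed-size vector.
# (Toy stand-in: hashes tokens into a normalized bag-of-words vector.)
class HashEmbedder:
    def __init__(self, dim=64):
        self.dim = dim

    def embed(self, text):
        vec = np.zeros(self.dim)
        for tok in text.lower().split():
            vec[hash(tok) % self.dim] += 1.0
        return vec / max(np.linalg.norm(vec), 1e-8)

# Layer 3: processor -- task logic that sees only embeddings,
# so the embedder can be swapped without touching this layer.
class NearestNeighborProcessor:
    def __init__(self, loader, embedder):
        self.loader = loader
        self.embedder = embedder
        self.index = []

    def add(self, record):
        text = self.loader.load(record)
        self.index.append((record["id"], self.embedder.embed(text)))

    def query(self, text):
        q = self.embedder.embed(text)
        return max(self.index, key=lambda item: q @ item[1])[0]
```

Because each layer talks to the next through a narrow interface (`load`, `embed`), a new data source or foundation model slots in without changes to downstream task code.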
- Pathology: Whole Slide Images (WSI) - SVS, TIFF formats with tissue detection
- Radiology: DICOM, NIFTI processing with 3D support
- Preprocessing: Advanced augmentation and normalization pipelines
- Document Processing: PDF support with OCR for scanned documents
- NLP Pipeline: Cancer entity extraction, temporal parsing, medical ontology integration
- Database Integration: Native MINDS format support
- Long Document Handling: Multiple tokenization strategies for clinical notes
- Genomics: Support for expression data and mutation profiles
- Integration: Seamless combination with imaging and clinical data
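Long clinical notes routinely exceed a transformer's context window, so one common tokenization strategy is sliding-window chunking with overlap, where neighboring chunks share a stride of tokens so no context is cut at a boundary. A minimal sketch of the idea (function name and defaults are illustrative, not HoneyBee's API):

```python
def chunk_tokens(tokens, max_len=512, stride=128):
    """Split a token list into overlapping windows of at most `max_len`.

    Consecutive windows share `stride` tokens, so entities that straddle
    a chunk boundary still appear intact in at least one window.
    """
    if len(tokens) <= max_len:
        return [tokens]
    step = max_len - stride
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return chunks
```

Per-chunk embeddings can then be averaged or max-pooled into a single note-level vector.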
- GatorTron: Domain-specific clinical language model
- BioBERT: Biomedical text understanding
- PubMedBERT: Scientific literature embeddings
- Clinical-T5: Text-to-text clinical transformers
- REMEDIS: Self-supervised medical image representations
- RadImageNet: Pre-trained radiological feature extractors
- UNI: Universal medical image encoder
- Custom Models: Easy integration of proprietary models
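Whichever encoder is used, the per-token (or per-patch) features it emits must be pooled into one fixed-size embedding; masked mean pooling is the usual default so padding positions do not dilute the average. A minimal NumPy sketch (shapes are illustrative):

```python
import numpy as np

def masked_mean_pool(hidden_states, attention_mask):
    """Pool per-token features (batch, seq, dim) into one vector per
    sequence, ignoring positions where the attention mask is 0."""
    mask = attention_mask[..., None].astype(float)   # (batch, seq, 1)
    summed = (hidden_states * mask).sum(axis=1)      # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # avoid divide-by-zero
    return summed / counts
```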
- Cross-Modal Learning: Unified representations across modalities
- Attention Mechanisms: Interpretable fusion strategies
- Patient-Level Aggregation: Comprehensive patient profiles
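Attention-based fusion can be sketched as scoring each modality embedding against a query vector and taking a softmax-weighted sum; the weights double as an interpretability signal showing which modality drove the fused representation. This is a generic sketch of the technique, not HoneyBee's implementation, and the query would normally be learned rather than fixed:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_fuse(modality_embs, query):
    """Fuse per-modality embeddings (each of shape (dim,)) into one
    patient vector, weighting modalities by scaled dot-product
    similarity to `query`. Returns (fused_vector, attention_weights)."""
    E = np.stack(modality_embs)                 # (n_modalities, dim)
    scores = E @ query / np.sqrt(E.shape[1])    # scaled dot products
    weights = softmax(scores)
    return weights @ E, weights
```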
- Survival Analysis: Cox PH, Random Survival Forest, DeepSurv
- Classification: Multi-class cancer type prediction
- Retrieval: Similar patient identification
- Visualization: Interactive t-SNE dashboards
- Risk Stratification: Patient outcome prediction
- Treatment Planning: Personalized therapy recommendations
- Biomarker Discovery: Multi-omic pattern identification
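The retrieval task above reduces to nearest-neighbor search over patient embeddings; cosine similarity is the standard choice. A minimal sketch under that assumption (function name is illustrative, not HoneyBee's API):

```python
import numpy as np

def top_k_similar(query_emb, patient_embs, k=5):
    """Return indices of the k patients whose embeddings are most
    cosine-similar to the query embedding."""
    P = patient_embs / np.linalg.norm(patient_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    sims = P @ q
    return np.argsort(-sims)[:k]
```

For cohort-scale indexes, the same idea is typically backed by an approximate-nearest-neighbor library rather than a brute-force matrix product.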
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.7+ (optional, for GPU acceleration)
Just log in with your Gmail account and launch Google Colab.
The tutorials are located at:
| Tutorial | Path | Purpose |
|---|---|---|
| Clinical Processing | `examples/clinical_processing_tutorial.ipynb` | NLP pipeline → embedding → survival & retrieval demos |
| Radiology Workflow | `examples/radiology_tutorial.ipynb` | 3-D DICOM loading, windowing, REMEDIS embedding & patient-level aggregation |
| Pathology (WSI) | `examples/wsi/wsi.ipynb` | Whole-slide tiling, tissue detection, patch embedding & MIL pooling |
Double-click the notebook in the Jupyter file browser, then execute cells top-to-bottom (⌘/Ctrl + ↵).
Colab: If you prefer a cloud environment, upload the notebook or open it directly from its GitHub URL (File › Open Notebook › GitHub). Ensure a GPU runtime is enabled (Runtime › Change runtime type › GPU).
If these notebooks assist your research, please cite:
@article{tripathi2024honeybee,
title = {HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models},
author = {Aakash Tripathi and Asim Waqas and Yasin Yilmaz and Ghulam Rasool},
journal = {arXiv preprint arXiv:2405.07460},
year = {2024}
}