
🦅 EAGLE

Efficient Alignment of Generalized Latent Embeddings

A State-of-the-Art Multimodal Survival Prediction Framework

Python 3.8+ PyTorch License: MIT Code style: black

Features • Quick Start • Documentation • Citation


🎯 Overview

EAGLE is a multimodal deep learning framework designed for survival prediction in cancer patients. By integrating imaging, clinical, and textual data through attention-based fusion, EAGLE provides a survival predictions with interpretability through attribution analysis.

🔬 Why EAGLE?

  • 🏆 State-of-the-Art Performance: Achieves superior C-index scores across multiple cancer types
  • 🔍 Interpretable AI: Attribution analysis reveals which modalities drive predictions
  • ⚡ Efficient Architecture: 99.96% dimensionality reduction while maintaining competitive performance
  • 🏥 Clinical Ready: Designed with healthcare practitioners in mind, providing actionable insights
  • 📊 Comprehensive Evaluation: Built-in comparison with traditional survival models (RSF, CoxPH, DeepSurv)
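The C-index cited in the list above is the concordance index: the fraction of comparable patient pairs in which the patient who died earlier was also given the higher predicted risk. As a rough illustration, here is a minimal sketch of Harrell's C-index in plain Python. The function name `concordance_index` and the toy data are assumptions made for this example; this is not EAGLE's own evaluation code.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    A pair (i, j) is comparable when patient i has the shorter time and
    an observed event. It is concordant when patient i also has the
    higher predicted risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Patient i must fail first, and that failure must be observed
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: months survived, event flag (1 = death observed), risk score
times = [5.0, 10.0, 12.0, 20.0]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
print(f"C-index: {concordance_index(times, events, risks):.3f}")  # C-index: 0.800
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The quadratic loop is fine for a sketch; survival libraries use faster implementations.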

✨ Key Features

🧬 Multimodal Integration

  • Seamless fusion of imaging embeddings (MRI/CT)
  • Clinical feature processing with standardization
  • Automated text feature extraction from reports
  • Attention-based modality fusion

📈 Advanced Analytics

  • Risk stratification into clinically meaningful groups
  • Kaplan-Meier survival analysis
  • Time-dependent AUC evaluation
  • Comprehensive performance metrics
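For readers unfamiliar with the Kaplan-Meier analysis listed above, the estimator multiplies, at each observed event time, the fraction of at-risk patients who survive that time. The following is a minimal stdlib-only sketch of the estimator. The function name `kaplan_meier` and the toy data are illustrative assumptions; EAGLE's own plotting code may compute the curves differently.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Observed deaths at time t (event flag 1)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        # Everyone whose follow-up ends at t, whether by death or censoring
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort: follow-up in months, event flag (1 = death, 0 = censored)
km_times = [2.0, 3.0, 3.0, 5.0, 8.0]
km_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(km_times, km_events):
    print(f"t={t:.0f}  S(t)={s:.2f}")
```

Censored patients never lower the curve directly; they only shrink the at-risk count used at later event times.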

🔍 Interpretability

  • Patient-level attribution analysis
  • Modality contribution visualization
  • Feature importance rankings
  • Cohort-level insights

🚀 Production Ready

  • Modular, extensible architecture
  • Comprehensive logging and checkpointing
  • Cross-validation support
  • Automatic visualization generation

🚀 Quick Start

📋 Prerequisites

  • Python 3.8+
  • PyTorch 2.0+
  • CUDA-capable GPU (recommended)
  • 16GB+ RAM

🔧 Installation

# Clone the repository
git clone https://github.com/lab-rasool/EAGLE.git
cd EAGLE

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

🎯 Basic Usage

# Run survival analysis on GBM dataset
python main.py --dataset GBM

# Enable attribution analysis for interpretability
python main.py --dataset NSCLC --comprehensive-attribution

# Run with custom configuration
python main.py --dataset IPMN \
               --epochs 150 \
               --batch-size 24 \
               --lr 5e-5 \
               --comprehensive-attribution

🔬 Advanced Usage

# Compare with baseline models
python main.py --mode baseline --dataset GBM

# Run complete analysis (EAGLE + all baselines)
python main.py --mode all --comprehensive-attribution

# Use MedGemma embeddings
python main.py --dataset NSCLC \
               --data-path data/NSCLC/medgemma.parquet

📚 Detailed Examples

🐍 Python API

from eagle import UnifiedPipeline, GBM_CONFIG, ModelConfig

# Configure model
model_config = ModelConfig(
    imaging_encoder_dims=[512, 256, 128],
    clinical_encoder_dims=[128, 64, 32],
    text_encoder_dims=[256, 128],
    fusion_dims=[256, 128, 64],
    dropout=0.35,
    batch_size=32,
    learning_rate=1e-4,
    num_epochs=100
)

# Create pipeline
pipeline = UnifiedPipeline(GBM_CONFIG, model_config)

# Run analysis
results, risk_df, stats = pipeline.run(
    n_folds=5,
    n_risk_groups=3,
    enable_attribution=True
)

# Display results
print(f"Mean C-index: {results['mean_cindex']:.4f}")
print(f"Std C-index: {results['std_cindex']:.4f}")
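The `n_risk_groups=3` argument above suggests that patients are split into three groups by predicted risk. This README does not state how the cut points are chosen, so the sketch below assumes equal-sized groups based on the rank of each risk score. The helper `assign_risk_groups` is hypothetical and not part of the `eagle` package.

```python
from typing import List, Sequence


def assign_risk_groups(risks: Sequence[float], n_groups: int = 3) -> List[int]:
    """Assign each score to a group 0..n_groups-1 by rank (0 = lowest risk)."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    # Patient indices ordered from lowest to highest predicted risk
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    groups = [0] * len(risks)
    for rank, idx in enumerate(order):
        # Integer arithmetic maps ranks onto equal-sized consecutive groups
        groups[idx] = rank * n_groups // len(risks)
    return groups


scores = [0.10, 0.85, 0.40, 0.95, 0.20, 0.55]
print(assign_risk_groups(scores))  # [0, 2, 1, 2, 0, 1]
```

Rank-based groups stay balanced even when the risk scores are skewed, which keeps per-group Kaplan-Meier curves from being estimated on very few patients.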

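📊 Attribution Analysis

from eagle import ModalityAttributionAnalyzer

# Analyze modality contributions
# (`model` and `dataset` are assumed to exist already: a trained EAGLE model and its matching dataset object)
analyzer = ModalityAttributionAnalyzer(model, dataset)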
contributions = analyzer.analyze_cohort()

# Analyze specific patient
patient_attr = analyzer.analyze_patient(patient_idx=42)
print(f"Imaging contribution: {patient_attr['imaging']:.2%}")
print(f"Clinical contribution: {patient_attr['clinical']:.2%}")
print(f"Text contribution: {patient_attr['text']:.2%}")
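The percentages printed above describe how much each modality contributes to one patient's prediction. This README does not say which attribution method `ModalityAttributionAnalyzer` uses, so the following sketch shows one simple possibility, modality ablation: zero out one modality at a time, measure how far the risk score moves, and normalize the changes into shares. The names `ablation_attribution` and `toy_risk` are hypothetical and exist only for this example.

```python
from typing import Callable, Dict, List

Inputs = Dict[str, List[float]]


def ablation_attribution(predict: Callable[[Inputs], float],
                         inputs: Inputs) -> Dict[str, float]:
    """Share of the prediction change caused by zeroing each modality."""
    baseline = predict(inputs)
    deltas = {}
    for name in inputs:
        ablated = dict(inputs)
        ablated[name] = [0.0] * len(inputs[name])  # remove this modality
        deltas[name] = abs(baseline - predict(ablated))
    total = sum(deltas.values())
    if total == 0.0:
        # The model ignores every modality; split evenly by convention
        return {name: 1.0 / len(inputs) for name in inputs}
    return {name: d / total for name, d in deltas.items()}


def toy_risk(x: Inputs) -> float:
    """Hypothetical stand-in for a trained model's risk score."""
    return 0.5 * sum(x["imaging"]) + 0.3 * sum(x["clinical"]) + 0.2 * sum(x["text"])


patient = {"imaging": [1.0, 1.0], "clinical": [2.0], "text": [1.0, 0.5]}
for name, share in ablation_attribution(toy_risk, patient).items():
    print(f"{name} contribution: {share:.2%}")
```

Ablation is easy to reason about but ignores interactions between modalities; gradient-based or Shapley-style methods are common alternatives.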

🎨 Custom Dataset

from eagle import DatasetConfig, UnifiedPipeline

# Define custom dataset configuration
custom_config = DatasetConfig(
    name="MyDataset",
    data_path="path/to/data.parquet",
    imaging_modality="MRI",
    imaging_embedding_dim=1000,
    clinical_features=["age", "gender", "stage", "biomarker"],
    text_columns=["radiology_report", "pathology_report"],
    survival_time_col="survival_months",
    event_col="status",
    patient_col="patient_id"
)

# Run pipeline (reuses the `model_config` defined in the Python API example)
pipeline = UnifiedPipeline(custom_config, model_config)
results, risk_df, stats = pipeline.run()

📁 Project Structure

EAGLE/
│
├── 📂 eagle/                    # Core library
│   ├── __init__.py             # Main API and pipeline
│   ├── data.py                 # Data loading and preprocessing
│   ├── models.py               # Neural network architectures
│   ├── train.py                # Training logic
│   ├── eval.py                 # Evaluation and metrics
│   ├── attribution.py          # Interpretability analysis
│   └── viz.py                  # Visualization utilities
│
├── 📂 data/                     # Dataset directory
│   ├── GBM/                    # Glioblastoma data
│   ├── IPMN/                   # Pancreatic cyst data
│   └── NSCLC/                  # Lung cancer data
│
├── 📂 results/                  # Output directory
│   └── [Dataset]/[Timestamp]/  # Experiment results
│
├── 📄 main.py                   # CLI interface
├── 📄 requirements.txt          # Dependencies
└── 📄 README.md                # This file

🔧 Configuration

Dataset Configuration

EAGLE supports three cancer datasets out of the box:

| Dataset | Cancer Type | Imaging | Key Features |
|---------|-------------|---------|--------------|
| GBM | Glioblastoma | MRI | MGMT status, age, gender |
| IPMN | Pancreatic Cysts | CT | Cyst size, location, morphology |
| NSCLC | Lung Cancer | CT | TNM staging, histology, smoking |

Model Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Imaging   │     │   Clinical   │     │    Text     │
│  Embeddings │     │   Features   │     │ Embeddings  │
└──────┬──────┘     └──────┬───────┘     └─────┬───────┘
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Imaging   │     │   Clinical   │     │    Text     │
│   Encoder   │     │   Encoder    │     │   Encoder   │
└──────┬──────┘     └──────┬───────┘     └─────┬───────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                    ┌──────▼──────┐
                    │  Attention  │
                    │   Fusion    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Survival  │
                    │ Prediction  │
                    └─────────────┘
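To make the "Attention Fusion" step in the diagram more concrete, here is a minimal stdlib-only sketch of attention over modalities: each encoder output gets a score, a softmax turns the scores into weights, and the fused vector is the weighted sum. It assumes all encoders emit vectors of the same size and uses a fixed `query` vector where a real model would use learned parameters. It illustrates the general technique and is not EAGLE's actual fusion module.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(encoded: Dict[str, List[float]],
                   query: List[float]) -> Tuple[List[float], Dict[str, float]]:
    """Fuse same-sized modality vectors with scaled dot-product attention."""
    names = list(encoded)
    dim = len(query)
    # Scaled dot product between the query and each modality vector
    scores = [sum(q * v for q, v in zip(query, encoded[n])) / math.sqrt(dim)
              for n in names]
    weights = softmax(scores)
    # Weighted sum of the modality vectors, one output dimension at a time
    fused = [sum(w * encoded[n][k] for w, n in zip(weights, names))
             for k in range(dim)]
    return fused, dict(zip(names, weights))


# Hypothetical encoder outputs for one patient (2-dimensional for brevity)
encoded = {"imaging": [1.0, 0.0], "clinical": [0.0, 1.0], "text": [0.5, 0.5]}
fused, weights = attention_fuse(encoded, query=[1.0, 1.0])
print("fused:", fused)
print("weights:", weights)
```

The attention weights sum to 1 across modalities, which is one reason attention-based fusion lends itself to reporting per-modality contributions.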

📈 Output Structure

Each experiment generates a comprehensive set of outputs:

results/[Dataset]/[Timestamp]/
├── 📊 figures/                 # Visualizations
│   ├── kaplan_meier_curves.png
│   ├── risk_distribution.png
│   └── risk_vs_survival.png
├── 🧠 models/                  # Trained models
│   ├── best_model_fold1.pth
│   └── ...
├── 📈 results/                 # Metrics and predictions
│   └── risk_scores.csv
├── 🔍 attribution/             # Interpretability
│   ├── modality_contributions.png
│   └── patient_attribution.csv
└── 📝 run_info.txt            # Experiment configuration
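Because each run writes into its own timestamped folder, downstream scripts usually need to find the most recent run first. The sketch below does that with `pathlib`. The `[Timestamp]` format is not specified in this README, so the helper assumes that folder names sort chronologically as strings; `latest_run` is a hypothetical helper, not part of the `eagle` package.

```python
from pathlib import Path
from typing import Optional


def latest_run(output_dir: Path, dataset: str) -> Optional[Path]:
    """Return the newest run directory under output_dir/dataset, if any.

    Assumes timestamp folder names sort chronologically as strings.
    """
    dataset_dir = output_dir / dataset
    if not dataset_dir.is_dir():
        return None
    runs = sorted(p for p in dataset_dir.iterdir() if p.is_dir())
    return runs[-1] if runs else None


run = latest_run(Path("results"), "GBM")
if run is not None:
    # Paths follow the layout shown above
    print(run / "results" / "risk_scores.csv")
    print(run / "attribution" / "patient_attribution.csv")
```

If the timestamp format turns out not to sort lexicographically, sorting by each folder's modification time (`p.stat().st_mtime`) is a reasonable fallback.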

🛠️ Command Line Arguments

| Argument | Description | Default |
|----------|-------------|---------|
| `--dataset` | Dataset to use (GBM, IPMN, NSCLC) | Required |
| `--mode` | Run mode (eagle, baseline, all) | eagle |
| `--epochs` | Number of training epochs | 200 |
| `--batch-size` | Batch size for training | 16 |
| `--lr` | Learning rate | 1e-4 |
| `--comprehensive-attribution` | Enable attribution analysis | False |
| `--top-patients` | Number of patients for detailed analysis | 5 |
| `--output-dir` | Output directory | results/ |
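As a reading aid, the arguments and defaults listed above can be expressed as an `argparse` parser. This is an illustrative reconstruction from the table only, not the actual contents of `main.py`. It follows the table literally and marks `--dataset` as required, whereas the real CLI may accept `--mode all` without a dataset, as one of the Advanced Usage commands suggests.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="EAGLE CLI (illustrative sketch)")
    parser.add_argument("--dataset", choices=["GBM", "IPMN", "NSCLC"], required=True,
                        help="Dataset to use")
    parser.add_argument("--mode", choices=["eagle", "baseline", "all"], default="eagle",
                        help="Run mode")
    parser.add_argument("--epochs", type=int, default=200,
                        help="Number of training epochs")
    parser.add_argument("--batch-size", type=int, default=16,
                        help="Batch size for training")
    parser.add_argument("--lr", type=float, default=1e-4,
                        help="Learning rate")
    parser.add_argument("--comprehensive-attribution", action="store_true",
                        help="Enable attribution analysis")
    parser.add_argument("--top-patients", type=int, default=5,
                        help="Number of patients for detailed analysis")
    parser.add_argument("--output-dir", default="results/",
                        help="Output directory")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["--dataset", "IPMN", "--epochs", "150", "--lr", "5e-5"])
print(args.dataset, args.mode, args.epochs, args.batch_size, args.lr)
```

Note that `argparse` converts dashes to underscores, so `--batch-size` is read as `args.batch_size` and `--comprehensive-attribution` as `args.comprehensive_attribution`.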

🔬 Research Applications

EAGLE has been designed for various research applications:

  • 🏥 Clinical Decision Support: Risk stratification for treatment planning
  • 🧬 Biomarker Discovery: Understanding which features drive outcomes
  • 📊 Comparative Studies: Built-in baseline comparisons
  • 🔍 Interpretable AI Research: Advanced attribution methods
  • 🎯 Precision Medicine: Patient-specific risk assessment

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run code formatting
black eagle/
isort eagle/

# Run linting
flake8 eagle/

# Run tests (when available)
pytest

📚 Documentation

For detailed documentation, please visit our Documentation Site.


📝 Citation

If you use EAGLE in your research, please cite our paper:

... Pending publication ...

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


Built with ❤️ for advancing healthcare through interpretable AI

GitHub • Documentation • Issues • Discussions
