CoCo Pipe is a comprehensive Python framework for processing and analyzing M/EEG and other biosignal data. It integrates traditional machine learning, deep learning, and signal processing techniques into a unified pipeline architecture. Key features include:
- Flexible Data Processing: Support for various data formats (tabular, M/EEG, embeddings) with automated preprocessing and feature extraction
- Advanced ML Capabilities: Integrated classification and regression pipelines with automated feature selection and hyperparameter optimization
- Modular Design: Easy-to-extend architecture for adding custom processing steps, models, and analysis methods
- Experiment Management: Built-in tools for experiment configuration, reproducibility, and results tracking
- Visualization & Reporting: Comprehensive visualization tools and automated report generation for both signal processing and ML results
- Scientific Workflow: End-to-end support for neuroimaging research, from raw data processing to publication-ready results
Whether you're conducting clinical research, developing ML models for brain-computer interfaces, or exploring neural signal patterns, CoCo Pipe provides the tools and flexibility to streamline your workflow.
- Clone the Repository:
git clone https://github.com/BabaSanfour/coco-pipe.git
cd coco-pipe
- (Optional) Create and Activate a Virtual Environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies:
pip install -r requirements.txt
- Install the Package in Editable Mode:
pip install -e .
CoCo Pipe provides two main ways to use the ML module:
You can use the ML module directly in your Python scripts: import from coco_pipe.io for data loading and feature selection, and from coco_pipe.ml for machine learning pipelines:
from coco_pipe.io import load, select_features
from coco_pipe.ml import MLPipeline

# Load your data
X, y = load(
    type="tabular",  # Supports: 'tabular', 'embeddings', 'meeg'
    data_path="data/your_dataset.csv",
)

# Optionally select specific features
X, y = select_features(
    df=X,                                             # Your feature DataFrame
    target_columns=y,                                 # Target variable(s)
    covariates=["age", "sex"],                        # Optional demographic/clinical variables
    spatial_units=["left_frontal", "right_frontal"],  # Brain regions/sensors
    feature_names=["alpha", "beta"],                  # Features to include
)

# Configure and run the ML pipeline
config = {
    "task": "classification",     # or 'regression'
    "analysis_type": "baseline",  # Options: 'baseline', 'feature_selection', 'hp_search', 'hp_search_fs'
    "models": "all",              # or a list of specific models
    "metrics": ["accuracy", "f1-score"],
    "cv_strategy": "stratified",
    "n_splits": 5,
    "n_features": 10,             # For feature selection
    "direction": "forward",       # For feature selection
    "search_type": "grid",        # For hyperparameter search
    "n_iter": 100,                # For random search
    "scoring": "accuracy",
    "n_jobs": -1,
}

pipeline = MLPipeline(X=X, y=y, config=config)
results = pipeline.run()
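To try the scripted workflow end to end, you can first generate a small toy CSV. The column-naming scheme below (one `<region>_<feature>` column per spatial unit/feature pair, plus covariates and a target) is only an assumption for illustration; adapt it to whatever layout your own tabular data uses:

```python
import csv
import random

random.seed(0)

# Hypothetical column layout: one column per (spatial unit, feature) pair,
# plus demographic covariates and a binary target.
columns = [
    "left_frontal_alpha", "left_frontal_beta",
    "right_frontal_alpha", "right_frontal_beta",
    "age", "sex", "target",
]

with open("toy_dataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    for _ in range(20):
        features = [round(random.gauss(0, 1), 3) for _ in range(4)]
        age = random.randint(18, 65)
        sex = random.choice(["male", "female"])
        target = random.randint(0, 1)
        writer.writerow(features + [age, sex, target])
```

Point `data_path` at the generated file (here `toy_dataset.csv`) when calling `load`.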
For batch processing or experiment management, use the CLI tool with a YAML configuration file:
# -----------------------------------------------------------------------------
# Toy config for MLPipeline
# -----------------------------------------------------------------------------

# Global parameters shared across analyses
global_experiment_id: "toy_ml_config"
data_path: "../datasets/toy_dataset.csv"
results_dir: "../results"
results_file: "toy_ml_config"

# Default analysis parameters (can be overridden per analysis)
defaults:
  random_state: 42
  n_jobs: -1
  cv_kwargs:
    strategy: "stratified"
    n_splits: 5
    shuffle: true
    random_state: 42
  covariates: ["age"]
  spatial_units: ["regionX", "regionY"]
  feature_names: ["feat1", "feat2", "feat3"]

# List of analyses to run
analyses:
  - id: "classification_baseline"
    task: "classification"
    analysis_type: "baseline"
    target_columns: ["target_class"]
    row_filter:
      - column: "age"
        values: 13
        operator: ">"
      - column: "sex"
        values: ["male"]
    models:
      - "Logistic Regression"
      - "Random Forest"
    metrics:
      - "accuracy"
      - "roc_auc"

  - id: "regression_hp_search"
    task: "regression"
    analysis_type: "hp_search"
    target_columns: ["target_reg"]
    feature_names: ["feat1"]
    spatial_units: ["regionX"]
    models: "all"
    metrics:
      - "r2"
      - "neg_mse"
    cv_kwargs:
      strategy: "kfold"
      n_splits: 3
    search_type: "grid"
    n_iter: 20
    scoring: "r2"
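The `row_filter` entries above restrict which rows enter an analysis (here: participants older than 13 and of sex "male"). The exact semantics are an assumption, but a plausible reading — membership test when `values` is a list, comparison via `operator` otherwise — can be sketched like this:

```python
import operator

# Comparison operators the sketched row_filter supports (assumed set)
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def apply_row_filter(rows, rules):
    """Keep rows matching every rule: membership test when `values`
    is a list, scalar comparison via `operator` otherwise."""
    for rule in rules:
        col, vals = rule["column"], rule["values"]
        if isinstance(vals, list):
            rows = [r for r in rows if r[col] in vals]
        else:
            op = OPS[rule.get("operator", "==")]
            rows = [r for r in rows if op(r[col], vals)]
    return rows

rows = [
    {"age": 10, "sex": "male"},
    {"age": 15, "sex": "female"},
    {"age": 20, "sex": "male"},
]
rules = [
    {"column": "age", "values": 13, "operator": ">"},
    {"column": "sex", "values": ["male"]},
]
print(apply_row_filter(rows, rules))  # only the age-20 male row survives
```

This is a sketch of the assumed filtering semantics, not the library's actual implementation.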
Run the analysis using:
python scripts/run_ml.py --config configs/your_config.yml
The pipeline will:
- Load and preprocess your data
- Run all specified analyses
- Save results for each model/analysis
- Generate a combined results file
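Conceptually, each entry under `analyses` is overlaid on the shared `defaults` before a pipeline is constructed, with per-analysis keys taking precedence. A simplified sketch of that merge step (assumed behavior, not the actual `run_ml.py` code):

```python
# Parsed form of a minimal config (as PyYAML would produce it)
config = {
    "defaults": {"random_state": 42, "n_jobs": -1},
    "analyses": [
        {"id": "classification_baseline", "task": "classification"},
        {"id": "regression_hp_search", "task": "regression", "n_jobs": 1},
    ],
}

def merged_analyses(cfg):
    """Overlay each analysis onto the shared defaults;
    keys set on the analysis win over the defaults."""
    defaults = cfg.get("defaults", {})
    return [{**defaults, **analysis} for analysis in cfg["analyses"]]

for analysis in merged_analyses(config):
    print(analysis["id"], "-> n_jobs =", analysis["n_jobs"])
# classification_baseline -> n_jobs = -1
# regression_hp_search -> n_jobs = 1
```

Each merged dict then plays the role of the `config` argument passed to `MLPipeline` in the scripted example.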
Full documentation for CoCo Pipe is available at:
https://cocopipe.readthedocs.io/en/latest/index.html
Contributions are welcome! If you have suggestions or find any bugs, please open issues or submit pull requests.
TODO