A machine learning project that automatically classifies Minecraft mods as valid Create mod add-ons, reducing the human effort needed for mod validation and curation.
Goal: Build an ML classifier that can automatically identify which Minecraft mods are legitimate Create mod addons based on mod metadata.
Dataset: ~2,000 mod records with human-verified `isValid` labels
Problem Type: Binary classification (Valid Create add-on: true/false)
Current Status: ✅ Exploration phase complete, 🚧 Production pipeline in development
```
ml-create-addon-classifier/
├── 📊 data/
│   └── addons.json                 # Raw dataset (~2K mod records)
├── 📓 notebooks/
│   └── 01-exploration.ipynb        # ✅ Complete EDA & baseline models
├── 🔧 src/
│   ├── api/
│   │   └── server.py               # 🚧 FastAPI inference endpoint
│   ├── data/
│   │   └── loader.py               # 🚧 Data loading utilities
│   ├── features/
│   │   └── feature_engineering.py  # 🚧 Feature processing pipeline
│   ├── models/                     # 🚧 Model implementations
│   └── evaluation/                 # 🚧 Model evaluation framework
├── 📜 scripts/
│   └── train_model.py              # 🚧 Training pipeline
├── ⚙️ config/                      # 🚧 Configuration management
└── 🧪 tests/                       # 🚧 Unit tests
```

Legend: ✅ Complete | 🚧 In Development | ❌ Not Started
- Comprehensive EDA: Analyzed text patterns, categorical distributions, and numerical features
- Feature Engineering: Created keyword-based features, text metrics, and categorical encodings
- Baseline Models: Implemented Logistic Regression and Random Forest with educational explanations
- Performance: Achieved ~80-90% accuracy with basic features
- Educational Framework: Added detailed markdown cells explaining ML concepts for each code cell
- Strong Signal: Create-specific keywords in mod names are highly predictive
- Text Patterns: Valid Create add-ons follow distinct naming conventions
- Feature Importance: Keyword count, author patterns, and categories are most discriminative
- Data Quality: Clean dataset with consistent labeling and minimal missing values
```bash
# Python 3.8+ required
pip install -r requirements.txt

# 1. Start Jupyter
jupyter lab

# 2. Open and run the exploration notebook: notebooks/01-exploration.ipynb
```
- ✅ Data Loading: Load and preprocess mod dataset from JSON
- ✅ Feature Engineering: Extract text, categorical, and numerical features
- ✅ Baseline Classification: Train and evaluate Logistic Regression & Random Forest
- ✅ Model Comparison: Compare multiple algorithms with ROC curves and confusion matrices
- ✅ Educational Content: Comprehensive explanations of ML concepts for each analysis step
Source: Aggregated Minecraft mod data from CurseForge and Modrinth
Size: ~2,000 mod records with rich metadata
Features:
- `name`: Mod name/title
- `description`: Mod description text
- `author`: Mod creator
- `categories`: List of assigned categories
- `downloads`: Download count
- `sources`: Platform (CurseForge/Modrinth)
- `isValid`: Target variable (human-verified Create add-on status)
Sample Record:

```json
{
  "name": "Create",
  "description": "Aesthetic Technology that empowers the Player",
  "categories": ["decoration", "technology", "utility"],
  "downloads": 116041058,
  "isValid": true
}
```
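For reference, a minimal loading sketch, assuming `data/addons.json` is a flat JSON array of records like the one above (`src/data/loader.py` is slated to formalize this):

```python
# Minimal loading sketch; assumes data/addons.json is a JSON array of records.
import pandas as pd

df = pd.read_json("data/addons.json")

print(df.shape)                      # expect roughly (2000, 7)
print(df["isValid"].value_counts())  # class balance of the binary target
```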
- Logistic Regression: Linear classifier for interpretability and feature importance
- Random Forest: Tree-based ensemble for non-linear patterns and interactions
- Text Features: Create-specific keyword extraction, name/description length, word counts (sketched after the metrics below)
- Categorical Features: Author encoding, category analysis, source platform
- Boolean Features: "Create" in name detection, category count features
- Numerical Features: Download counts, author productivity metrics
- Accuracy: Overall prediction correctness (~85-90% achieved)
- AUC-ROC: Ability to distinguish between valid/invalid mods (~0.85-0.90)
- Precision/Recall: Balance between false positives and false negatives
- Feature Importance: Random Forest reveals most predictive features
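To make the feature and model bullets above concrete, here is a hedged sketch of the feature pipeline; the keyword list and column names are illustrative, not the notebook's exact implementation:

```python
import numpy as np
import pandas as pd

# Illustrative keyword list; the notebook's actual list may differ.
CREATE_KEYWORDS = ["create", "mechanical", "kinetic", "gear", "cog"]

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the text, boolean, categorical, and numerical features above."""
    name = df["name"].fillna("").str.lower()
    desc = df["description"].fillna("").str.lower()

    out = pd.DataFrame(index=df.index)
    # Text features: keyword hits, lengths, word counts
    out["keyword_count"] = sum(
        name.str.count(kw) + desc.str.count(kw) for kw in CREATE_KEYWORDS
    )
    out["name_length"] = name.str.len()
    out["desc_word_count"] = desc.str.split().str.len()
    # Boolean feature: "create" appears in the mod name
    out["create_in_name"] = name.str.contains("create").astype(int)
    # Categorical/numerical features
    out["category_count"] = df["categories"].str.len()
    out["log_downloads"] = np.log1p(df["downloads"])
    return out
```

And a representative baseline run producing the accuracy, AUC, and precision/recall figures above (the notebook's exact split and settings may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

X = engineer_features(df)  # df from the loading sketch above
y = df["isValid"].astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=200, random_state=42)):
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print(type(model).__name__, "AUC:", round(roc_auc_score(y_test, proba), 3))
    print(classification_report(y_test, model.predict(X_test)))
```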
Priority: High | Effort: 2-3 weeks
- Ensemble Models: Implement XGBoost, LightGBM, and CatBoost
- Text Vectorization: Add TF-IDF and word embeddings for full description analysis
- Cross-Validation: Implement proper stratified k-fold validation
- Hyperparameter Tuning: Grid search and Bayesian optimization (a grid-search sketch follows this list)
- Advanced Features: Dependency parsing, version pattern analysis
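A possible starting point for the stratified k-fold and grid-search items above; the search space and split count are assumptions, not decided values:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Illustrative search space; the production grid is still to be decided.
param_grid = {"n_estimators": [200, 500], "max_depth": [None, 10, 20]}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",  # matches the project's AUC target
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X, y)  # X, y as in the baseline sketch
print(search.best_params_, round(search.best_score_, 3))
```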
Priority: High | Effort: 3-4 weeks
- Module Implementation: Complete all `src/` package implementations
- Training Pipeline: Automated model training and validation scripts
- Inference API: FastAPI endpoint for real-time classification (see the sketch after this list)
- Model Persistence: Save/load trained models with versioning
- Logging & Monitoring: Comprehensive logging and performance tracking
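One possible shape for `src/api/server.py`, covering the inference-API and model-persistence items above. Everything here is a sketch: the artifact path, the request schema, and the import of the `engineer_features` sketch are assumptions:

```python
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

from src.features.feature_engineering import engineer_features  # assumed module path

app = FastAPI(title="Create Add-on Classifier")
model = joblib.load("models/classifier.joblib")  # hypothetical persisted artifact

class ModRecord(BaseModel):
    name: str
    description: str = ""
    categories: list[str] = []
    downloads: int = 0

@app.post("/predict")
def predict(record: ModRecord) -> dict:
    # Rebuild the same features used at training time
    features = engineer_features(pd.DataFrame([record.model_dump()]))
    proba = float(model.predict_proba(features)[0, 1])
    return {"isValid": proba >= 0.5, "probability": proba}
```

Served with `uvicorn src.api.server:app` (see the quick reference at the end), a POST to `/predict` with a JSON body matching `ModRecord` returns the predicted label and its probability.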
Priority: Medium | Effort: 2-3 weeks
- Containerization: Docker setup for consistent deployment
- Cloud Deployment: Deploy inference API to cloud platform
- Batch Processing: Handle bulk mod classification efficiently
- Feedback Loop: System for continuous model improvement
- Web Interface: User-friendly interface for manual validation
```bash
# Install test dependencies
pip install pytest pytest-cov

# Run all tests (when implemented)
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```
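An illustrative unit test for the feature sketch above; the import path mirrors the planned `src/` layout and is an assumption:

```python
# tests/test_features.py
import pandas as pd

from src.features.feature_engineering import engineer_features  # assumed path

def test_create_in_name_flag():
    # Hypothetical mod records covering both classes of the boolean feature
    df = pd.DataFrame([
        {"name": "Create Gadgets", "description": "", "categories": [], "downloads": 0},
        {"name": "OptiFine", "description": "", "categories": [], "downloads": 0},
    ])
    features = engineer_features(df)
    assert features.loc[0, "create_in_name"] == 1
    assert features.loc[1, "create_in_name"] == 0
```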
Current Baseline Results (from exploration notebook):
- Best Model: Random Forest
- Accuracy: ~85-90%
- AUC Score: ~0.85-0.90
- Training Time: <30 seconds
- Inference Time: <1ms per prediction
Target Production Goals:
- Accuracy: >92%
- AUC Score: >0.95
- Precision: >90% (minimize false positives)
- Recall: >85% (catch most valid add-ons; a threshold-tuning sketch follows this list)
- Scalability: Handle 1000+ predictions/minute
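Because the precision target is stricter than the recall target, the decision threshold can be tuned on held-out data instead of fixed at 0.5. A sketch, reusing `y_test` and `proba` from the baseline sketch:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

precision, recall, thresholds = precision_recall_curve(y_test, proba)

# First threshold whose precision clears the >90% target;
# then check whether recall still meets the >85% target there.
meets_target = precision[:-1] >= 0.90
if meets_target.any():
    i = int(np.argmax(meets_target))
    print(f"threshold={thresholds[i]:.3f} "
          f"precision={precision[i]:.3f} recall={recall[i]:.3f}")
```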
- Experimentation: Use notebooks (`notebooks/`) for rapid prototyping
- Implementation: Move proven concepts to `src/` modules with proper structure
- Testing: Add comprehensive unit tests with >80% coverage
- Documentation: Update README and add detailed docstrings
- Style: Follow PEP 8 with Black formatter
- Type Hints: Use type annotations for all function signatures
- Documentation: Comprehensive docstrings and inline comments
- Testing: Unit tests for all production code with pytest
The exploration notebook (`01-exploration.ipynb`) includes detailed educational content:
- Machine Learning Concepts: Feature engineering, model evaluation, cross-validation
- Data Science Workflow: EDA, preprocessing, modeling, interpretation
- Domain Knowledge: Minecraft modding ecosystem and Create mod characteristics
- Best Practices: Code organization, reproducibility, visualization techniques
Perfect for: Onboarding new team members, teaching ML concepts, understanding the problem domain
Core ML Stack:
- `pandas>=2.0.0`: Data manipulation and analysis
- `scikit-learn>=1.3.0`: Machine learning algorithms
- `numpy>=1.24.0`: Numerical computing
- `matplotlib>=3.7.0`, `seaborn>=0.12.0`: Visualization

Advanced ML (for future phases):
- `xgboost>=1.7.0`, `lightgbm>=4.0.0`: Gradient boosting
- `transformers>=4.30.0`: Neural language models
- `fastapi>=0.100.0`: API development
Technical Goals:
- ✅ Proof of Concept: Demonstrate feasibility (COMPLETE)
- 🎯 Production Model: >92% accuracy with robust evaluation
- 🎯 Deployment: Real-time API with <100ms response time
- 🎯 Scalability: Handle production traffic loads
Business Impact:
- Primary: Reduce manual validation effort by 80%+
- Secondary: Improve consistency in Create addon curation
- Long-term: Enable automatic mod discovery and recommendation
Status: 🔬 Research Complete → 🚧 Development Phase
Next Milestone: Advanced modeling and production pipeline
Last Updated: June 2025 | Team: Spencer + AI Assistant
```bash
# 1. Environment setup
git clone <repo-url> && cd ml-create-addon-classifier
pip install -r requirements.txt

# 2. Run exploration (educational)
jupyter lab notebooks/01-exploration.ipynb

# 3. Future: Train production model
python scripts/train_model.py --config config/production.yaml

# 4. Future: Start inference API
uvicorn src.api.server:app --reload
```