
feat(mnist): implement adaptive neural network with dynamic architecture #54

Merged
leonvanbokhorst merged 1 commit into main from adaptive-misnt on Nov 15, 2024

Conversation


@leonvanbokhorst leonvanbokhorst commented Nov 15, 2024

This commit introduces a new adaptive MNIST neural network demo that automatically
optimizes its architecture and training parameters. Key features include:

  • Dynamic network complexity adjustment based on training performance
  • Automatic device selection (CPU/MPS) with benchmarking
  • Adaptive batch size optimization
  • Hardware-specific optimizations using PyTorch 2.0+
  • Comprehensive performance visualization
  • Advanced regularization techniques

Technical details:

  • Implements AdaptiveNeuralNet with dynamic layer management
  • Adds AdaptiveLearningSystem for automated optimization
  • Includes performance monitoring and visualization tools
  • Supports automatic device selection and benchmarking
  • Implements adaptive learning rate and regularization

Testing: Manual testing completed with MNIST dataset
Performance: Achieves >98% accuracy with automatic optimization

Summary by Sourcery

Implement an adaptive neural network for MNIST digit classification that dynamically adjusts its architecture and training parameters based on performance. The system includes features such as automatic device selection, adaptive batch size optimization, and hardware-specific optimizations. Performance visualization tools are also integrated to monitor training progress.

New Features:

  • Introduce an adaptive MNIST neural network demo that automatically optimizes its architecture and training parameters.

Enhancements:

  • Implement dynamic network complexity adjustment based on training performance.
  • Add automatic device selection and benchmarking for optimal hardware usage.
  • Incorporate adaptive batch size optimization for improved training efficiency.
  • Utilize hardware-specific optimizations using PyTorch 2.0+ for enhanced performance.
  • Include comprehensive performance visualization tools.

Tests:

  • Conduct manual testing with the MNIST dataset to ensure functionality and performance.

sourcery-ai bot commented Nov 15, 2024

Reviewer's Guide by Sourcery

This PR implements an adaptive neural network system for MNIST digit classification that automatically optimizes its architecture and training parameters. The implementation uses PyTorch and features dynamic network complexity adjustment, hardware-specific optimizations, and comprehensive performance monitoring. The system is built around two main classes: AdaptiveNeuralNet for the neural network architecture and AdaptiveLearningSystem for training optimization.

Sequence diagram for adaptive training process

sequenceDiagram
    actor User
    participant Main
    participant AdaptiveLearningSystem
    participant AdaptiveNeuralNet
    User->>Main: Run adaptive MNIST demo
    Main->>AdaptiveLearningSystem: Initialize with model and data loaders
    AdaptiveLearningSystem->>AdaptiveNeuralNet: Move model to optimal device
    AdaptiveLearningSystem->>AdaptiveNeuralNet: Compile model if supported
    loop Train for each epoch
        AdaptiveLearningSystem->>AdaptiveNeuralNet: Train one epoch
        AdaptiveLearningSystem->>AdaptiveNeuralNet: Evaluate model
        AdaptiveLearningSystem->>AdaptiveNeuralNet: Adapt model if needed
    end
    AdaptiveLearningSystem->>Main: Return training results
    Main->>User: Display training progress and results

Class diagram for AdaptiveNeuralNet and AdaptiveLearningSystem

classDiagram
    class AdaptiveNeuralNet {
        +int input_size
        +ModuleList layers
        +dict training_history
        +float dropout_rate
        +float learning_rate
        +int current_complexity
        +__init__(int input_size, int initial_hidden_size)
        +forward(Tensor x) Tensor
        +add_complexity()
        +add_regularization()
    }
    class AdaptiveLearningSystem {
        +AdaptiveNeuralNet model
        +str device
        +int optimal_batch_size
        +int plateau_threshold
        +float improvement_threshold
        +int max_complexity
        +__init__(AdaptiveNeuralNet model, DataLoader train_loader, DataLoader test_loader)
        +benchmark_devices(Module model, int num_iterations) str
        +find_optimal_batch_size() int
        +update_dataloader(Dataset dataset, bool train) DataLoader
        +train_epoch() (float, float)
        +calculate_loss() float
        +evaluate() float
        +check_plateau(list accuracies, float threshold) bool
        +adapt_model(int epoch)
        +train(int epochs) (list, list, list)
        +plot_training_progress()
    }
    AdaptiveLearningSystem --> AdaptiveNeuralNet
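
Based on the constructor and method signatures in the class diagram, the demo's entry point presumably wires the two classes together roughly like this. A minimal sketch only: the import path, batch sizes, hidden size, and epoch count are illustrative and not taken from the PR.

# Minimal wiring sketch inferred from the class diagram above; the real
# pocs/adaptive_mnist_demo.py may structure this differently.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumed import path; the classes live in the PR's demo script.
from pocs.adaptive_mnist_demo import AdaptiveNeuralNet, AdaptiveLearningSystem

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST mean/std
])
train_ds = datasets.MNIST("data", train=True, download=True, transform=transform)
test_ds = datasets.MNIST("data", train=False, download=True, transform=transform)

train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=256)

model = AdaptiveNeuralNet(input_size=28 * 28, initial_hidden_size=128)
system = AdaptiveLearningSystem(model, train_loader, test_loader)

# Per the diagram, train() returns three lists (losses, accuracies, adaptation points).
losses, accuracies, adaptations = system.train(epochs=20)
system.plot_training_progress()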

File-Level Changes

Change: Implementation of adaptive neural network architecture
Files: pocs/adaptive_mnist_demo.py
Details:
  • Created base neural network with dynamic layer management
  • Added complexity increase mechanism that doubles hidden layer size
  • Implemented adaptive regularization with adjustable dropout rate
  • Added performance history tracking for adaptation decisions

Change: Implementation of adaptive learning system with hardware optimization
Files: pocs/adaptive_mnist_demo.py
Details:
  • Added automatic device selection between CPU and MPS (see the benchmark sketch after this table)
  • Implemented batch size optimization through benchmarking
  • Added PyTorch 2.0+ compilation optimization support
  • Created parallel data loading with pinned memory optimization

Change: Implementation of training and adaptation logic
Files: pocs/adaptive_mnist_demo.py
Details:
  • Added plateau detection for architecture adaptation
  • Implemented dynamic learning rate adjustment
  • Created comprehensive training loop with metrics tracking
  • Added visualization system for training progress

Change: Added data augmentation and optimization configurations
Files: pocs/adaptive_mnist_demo.py
Details:
  • Implemented MNIST dataset loading with transformations
  • Added random affine transformations for training data
  • Configured CPU thread optimization
  • Enabled cuDNN benchmarking
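
The automatic CPU/MPS selection listed above boils down to timing forward passes on each candidate device. A rough sketch of what such a benchmark could look like; the function name follows benchmark_devices in the class diagram, but the timing loop itself is illustrative rather than the PR's actual code.

import time
import torch
import torch.nn as nn

def benchmark_devices(model: nn.Module, num_iterations: int = 50) -> str:
    """Time dummy forward passes on each available device and return the fastest one."""
    candidates = ["cpu"]
    if torch.backends.mps.is_available():
        candidates.append("mps")

    sample = torch.randn(64, 1, 28, 28)  # MNIST-shaped dummy batch
    timings = {}
    with torch.no_grad():
        for device in candidates:
            dev_model = model.to(device)
            dev_input = sample.to(device)
            dev_model(dev_input)  # warm-up so one-time setup costs are excluded
            start = time.perf_counter()
            for _ in range(num_iterations):
                dev_model(dev_input)
            timings[device] = time.perf_counter() - start
    return min(timings, key=timings.get)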

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.


@leonvanbokhorst leonvanbokhorst self-assigned this Nov 15, 2024
@leonvanbokhorst leonvanbokhorst added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Nov 15, 2024
@leonvanbokhorst leonvanbokhorst added this to the Phase 1 milestone Nov 15, 2024
@leonvanbokhorst leonvanbokhorst merged commit ee64284 into main Nov 15, 2024
1 check failed
@leonvanbokhorst leonvanbokhorst deleted the adaptive-misnt branch November 15, 2024 16:28

@sourcery-ai sourcery-ai bot left a comment

Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Please add comprehensive unit tests for the adaptation logic and core functionality; manual testing alone is insufficient for this complexity level (a sketch of one possible test follows below).
  • Include documentation of performance benchmarks and test results to validate the adaptation strategy effectiveness.
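
A hedged illustration of what one such test could look like, assuming the constructor signature from the class diagram above and that add_complexity rebuilds the hidden layers while keeping the 10-class output head; the import path is a guess.

# Hypothetical pytest-style check; assumes AdaptiveNeuralNet flattens 28x28 inputs
# and that add_complexity preserves the 10-class output head.
import torch
from pocs.adaptive_mnist_demo import AdaptiveNeuralNet  # assumed import path

def test_add_complexity_preserves_io_shape():
    model = AdaptiveNeuralNet(input_size=784, initial_hidden_size=64)
    model.eval()  # avoid BatchNorm running-stat updates during the check
    x = torch.randn(8, 1, 28, 28)
    before = model(x)
    model.add_complexity()
    model.eval()
    after = model(x)
    # The adapted network must still map an MNIST batch to 10 class logits.
    assert before.shape == after.shape == (8, 10)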
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good


if self.model.current_complexity < self.max_complexity:
    self.model.add_complexity()
    # Smaller learning rate increase
    self.model.learning_rate *= 1.1

suggestion (performance): Consider using a more sophisticated learning rate adjustment strategy

The current fixed multipliers (1.1 for complexity increase, 0.98 for decay) could lead to unstable training. Consider implementing a learning rate scheduler like ReduceLROnPlateau or CosineAnnealingLR for more stable adaptation.

                    self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, mode='min', factor=0.1, patience=5)
                    self.model.learning_rate = self.optimizer.param_groups[0]['lr']
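
For the scheduler to take effect it also has to be stepped with the monitored metric once per epoch. A minimal sketch using the method names from the class diagram above, not necessarily the PR's exact training loop:

# Illustrative epoch loop; train_epoch and calculate_loss follow the class diagram.
for epoch in range(epochs):
    train_loss, train_acc = self.train_epoch()
    val_loss = self.calculate_loss()
    self.scheduler.step(val_loss)  # ReduceLROnPlateau reacts to the monitored metric
    # Keep the model's bookkeeping copy of the learning rate in sync.
    self.model.learning_rate = self.optimizer.param_groups[0]["lr"]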

raise ValueError("No Linear layer found in network")

current_hidden_size = last_linear.in_features
new_hidden_size = current_hidden_size * 2

issue (performance): Add memory safety checks when increasing network complexity

Doubling the hidden size could cause out-of-memory errors on GPU/MPS devices. Consider adding a try-except block and fallback mechanism when memory allocation fails.
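
One way that guard could look, as a rough sketch: safely_add_complexity, the dummy probe batch, and the MPS cache call are illustrative additions, not code from this PR.

import torch

def safely_add_complexity(system):
    """Grow the network, but skip the growth if the device cannot allocate it.

    Sketch only: a complete version would also roll the architecture back to its
    previous size when the probe fails after add_complexity() has already run.
    """
    try:
        system.model.add_complexity()
        system.model.to(system.device)
        # Probe with one dummy forward pass so an allocation failure surfaces
        # here rather than in the middle of the next training epoch.
        with torch.no_grad():
            system.model(torch.randn(2, 1, 28, 28, device=system.device))
        return True
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        if system.device == "mps":
            torch.mps.empty_cache()  # PyTorch 2.0+: release cached MPS allocations
        print("Skipping complexity increase: not enough device memory")
        return False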


print("\nBenchmarking batch sizes:")
for batch in batch_sizes:
batched_input = sample_input.repeat(batch, 1, 1, 1)

issue: Add error handling for batch size testing

Wrap batch size testing in try-except blocks to gracefully handle out-of-memory errors and skip unsupported batch sizes.
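
A sketch of that guard; batch_sizes, sample_input, self.model, and self.device stand in for the PR's actual benchmarking method, and time/torch are assumed to be imported at module level.

# Drop-in replacement for the quoted loop above (illustrative, not the PR's code).
results = {}
print("\nBenchmarking batch sizes:")
for batch in batch_sizes:
    try:
        batched_input = sample_input.repeat(batch, 1, 1, 1).to(self.device)
        with torch.no_grad():
            start = time.perf_counter()
            self.model(batched_input)
        results[batch] = time.perf_counter() - start
    except RuntimeError as err:
        if "out of memory" not in str(err).lower():
            raise
        print(f"  batch size {batch}: skipped (out of memory)")
        break  # larger batch sizes will only need more memory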

        ]
        return all(abs(imp) < threshold for imp in recent_improvements)

    def adapt_model(self, epoch):

issue (complexity): Consider extracting the adaptation logic into a dedicated strategy class to improve code organization.

The adapt_model method contains complex nested conditionals that make the adaptation logic hard to follow. Consider extracting this into a separate AdaptationStrategy class:

import torch.optim as optim

class AdaptationStrategy:
    def __init__(self, max_complexity=4, accuracy_threshold=98.0):
        self.max_complexity = max_complexity
        self.accuracy_threshold = accuracy_threshold

    def should_adapt(self, current_acc, avg_recent_acc, epoch, complexity):
        return (
            current_acc < self.accuracy_threshold
            and current_acc <= avg_recent_acc
            and epoch > complexity * 4
        )

    def get_adaptation(self, model, optimizer, current_complexity):
        if current_complexity < self.max_complexity:
            return self.increase_complexity(model, optimizer)
        return self.adjust_learning_rate(optimizer)

    def increase_complexity(self, model, optimizer):
        model.add_complexity()
        model.learning_rate *= 1.1
        return optim.Adam(
            model.parameters(),
            lr=model.learning_rate,
            weight_decay=1e-5
        )

    def adjust_learning_rate(self, optimizer):
        for param_group in optimizer.param_groups:
            param_group["lr"] *= 0.98
        return optimizer

This simplifies the adapt_model method to:

def adapt_model(self, epoch):
    if len(self.test_accuracies) < 5:
        return

    recent_accuracies = self.test_accuracies[-5:]
    current_acc = self.test_accuracies[-1]
    avg_recent_acc = sum(recent_accuracies) / len(recent_accuracies)

    if self.strategy.should_adapt(current_acc, avg_recent_acc, 
                                epoch, self.model.current_complexity):
        self.optimizer = self.strategy.get_adaptation(
            self.model, self.optimizer, self.model.current_complexity)
        self.adaptation_points.append((epoch, "Adapted Model"))

This improves maintainability by:

  1. Separating adaptation rules from execution
  2. Making thresholds and strategies configurable
  3. Reducing nesting depth
  4. Making the adaptation logic easier to test
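
If this refactor were adopted, the strategy would presumably be created once when the learning system is initialized; a hypothetical wiring sketch:

# Hypothetical lines inside AdaptiveLearningSystem.__init__ if the refactor is adopted.
self.strategy = AdaptationStrategy(max_complexity=4, accuracy_threshold=98.0)
self.adaptation_points = []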

            x = layer(x)
        return x

    def add_complexity(self):

issue (complexity): Consider using a pre-initialized maximum architecture with neuron activation control instead of dynamically rebuilding layers.

The dynamic layer rebuilding in add_complexity() introduces unnecessary complexity. Consider using a simpler activation-based approach with a pre-initialized maximum architecture:

import torch
import torch.nn as nn

class AdaptiveNeuralNet(nn.Module):
    def __init__(self, input_size=784, max_hidden_size=512):
        super().__init__()
        self.input_size = input_size
        self.flatten = nn.Flatten()

        # Initialize maximum architecture but only activate part initially
        self.hidden_layers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(input_size, max_hidden_size),
                nn.BatchNorm1d(max_hidden_size),
                nn.ReLU()
            )
        ])
        self.output = nn.Linear(max_hidden_size, 10)
        self.active_hidden = max_hidden_size // 8  # Start with smaller size

    def forward(self, x):
        x = self.flatten(x)
        # Only use the active portion of each layer: zero out the inactive
        # neurons instead of slicing, so the feature dimension stays compatible
        # with the fixed-size output layer.
        for layer in self.hidden_layers:
            x = layer(x)
            mask = torch.zeros_like(x)
            mask[:, :self.active_hidden] = 1
            x = x * mask
        return self.output(x)

    def add_complexity(self):
        # Simply activate more neurons
        self.active_hidden = min(
            self.active_hidden * 2,
            self.hidden_layers[0][0].out_features
        )

This approach:

  1. Maintains adaptivity while being more maintainable
  2. Eliminates complex layer rebuilding
  3. Reduces potential for errors
  4. Makes the code more predictable
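
As a usage illustration of the sketch above, with its defaults of max_hidden_size=512 and an initial active width of 512 // 8 = 64:

# The active width doubles on each adaptation and then stays capped at the
# pre-allocated maximum of 512 units.
model = AdaptiveNeuralNet()
print(model.active_hidden)  # 64
model.add_complexity()
print(model.active_hidden)  # 128
model.add_complexity()
model.add_complexity()
model.add_complexity()
print(model.active_hidden)  # 512 (capped)

The trade-off is that parameters for the full 512-unit layer are allocated up front, so capacity grows while memory use stays constant.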
