Implement advanced CartPole and Bayesian MNIST training modules #62

Merged · 14 commits · Nov 24, 2024

Conversation

@leonvanbokhorst (Owner) commented on Nov 24, 2024

Summary by Sourcery

Introduce a new CartPole environment with advanced dynamics and a personality-based agent system. Implement a Bayesian learning rate adapter for adaptive training. Update dependencies to include gymnasium and related packages.

New Features:

  • Introduce a new CartPole environment with advanced wind dynamics, realistic physics, and adaptive control mechanisms.
  • Add a Bayesian learning rate adapter for dynamic adjustment of learning rates based on model performance.
  • Implement a personality-based CartPole agent with different behavioral styles.

Enhancements:

  • Enhance the CartPole environment with detailed performance tracking and sophisticated reward shaping.

Build:

  • Add new dependencies 'gymnasium', 'pygame', and 'gymnasium[other]' to the requirements.

- Added state history for adaptation
- Implemented enhanced feature extraction with dropout
- Added adaptive control branch for improved performance
- Updated position bias and velocity influence calculations
- Tuned temperature scaling for stability
- Improved final logits combination
- Introduced progressive difficulty and adaptive learning
- Implemented dynamic reward shaping based on performance
- Increased gust strength from 0.03 to 0.10
- Increased wind change rate from 0.04 to 0.06
- Doubled gust strength from 0.05 to 0.10
- Increased turbulence from 0.02 to 0.05
- Adjusted wind change rate from 0.995 to 0.99
- Increased random variation from 0.04 to 0.07
- Increased gust strength range from (0.6, 1.2) to (0.8, 1.4)
- Decreased disturbance countdown range from (70, 130) to (50, 100)
sourcery-ai bot (Contributor) commented on Nov 24, 2024

Reviewer's Guide by Sourcery

This PR introduces a sophisticated CartPole environment implementation with advanced physics and AI training capabilities, along with additional utilities for Bayesian learning and personality-based approaches. The implementation significantly extends the classic CartPole problem with complex wind dynamics, realistic physics, and enhanced reward shaping.

Class diagram for CartPoleWithDisturbances

classDiagram
    class CartPoleWithDisturbances {
        -bool recovery_window
        -int recovery_attempts
        -int successful_recoveries
        -float last_wind_force
        -float wind_buildup
        -float gust_strength
        -float wind_change_rate
        -float current_wind
        -int wind_direction
        -float turbulence
        -float max_wind_force
        -int gust_count
        -int direction_changes
        -int strength_changes
        -int steps_beyond_done
        +reset(**kwargs)
        +step(action)
    }
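
For a concrete picture, here is a minimal sketch of what such a disturbance wrapper could look like, assuming gymnasium's CartPole-v1 and reusing the gust, turbulence, and gust-range values from the PR description; everything beyond the attribute names in the diagram is illustrative and is not the code in gym/cart_pole.py.

import numpy as np
import gymnasium as gym


class CartPoleWithDisturbances(gym.Wrapper):
    """CartPole with wind gusts and turbulence nudging the cart velocity (sketch)."""

    def __init__(self, env):
        super().__init__(env)
        self.gust_strength = 0.10     # doubled from 0.05 per the changelog above
        self.turbulence = 0.05        # increased from 0.02
        self.wind_change_rate = 0.06  # chance per step of the wind flipping direction
        self.current_wind = 0.0
        self.wind_direction = 1
        self.gust_count = 0

    def reset(self, **kwargs):
        self.current_wind = 0.0
        self.gust_count = 0
        return super().reset(**kwargs)

    def step(self, action):
        # Occasionally flip direction, then blend a gust and turbulence into the wind.
        if np.random.rand() < self.wind_change_rate:
            self.wind_direction *= -1
        gust = self.wind_direction * np.random.uniform(0.8, 1.4) * self.gust_strength
        self.current_wind = 0.9 * self.current_wind + 0.1 * gust
        self.current_wind += np.random.normal(0.0, self.turbulence)
        self.gust_count += 1

        obs, reward, terminated, truncated, info = self.env.step(action)

        # Push the disturbance into the underlying state so it affects later steps.
        state = np.array(self.env.unwrapped.state, dtype=np.float64)
        state[1] += self.current_wind                 # index 1 = cart velocity
        self.env.unwrapped.state = state
        return np.asarray(state, dtype=np.float32), reward, terminated, truncated, info


# Usage: env = CartPoleWithDisturbances(gym.make("CartPole-v1"))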

Class diagram for PolicyNetwork

classDiagram
    class PolicyNetwork {
        -Sequential features
        -Sequential adaptive_net
        -Sequential value_head
        -Sequential policy_head
        -Sequential velocity_pred
        -float temperature
        +forward(x)
    }
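
A rough sketch of how the branches in the diagram could fit together; only the attribute names (features, adaptive_net, policy_head, value_head, velocity_pred, temperature) come from the PR, while the layer sizes and the residual combination are assumptions.

import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """Multi-branch policy network sketched from the class diagram."""

    def __init__(self, state_dim: int = 4, hidden: int = 64, n_actions: int = 2):
        super().__init__()
        self.features = nn.Sequential(          # feature extraction with dropout
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.adaptive_net = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh())
        self.policy_head = nn.Sequential(nn.Linear(hidden, n_actions))
        self.value_head = nn.Sequential(nn.Linear(hidden, 1))
        self.velocity_pred = nn.Sequential(nn.Linear(hidden, 1))
        self.temperature = 1.0

    def forward(self, x: torch.Tensor):
        h = self.features(x)
        h = h + self.adaptive_net(h)             # adaptive control branch (residual)
        logits = self.policy_head(h) / self.temperature
        return logits, self.value_head(h), self.velocity_pred(h)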

Class diagram for AIStats

classDiagram
    class AIStats {
        -defaultdict decisions
        -list reaction_times
        -list stability_scores
        -int recovery_count
        -list position_history
        -int oscillations
        -int time_in_danger_zone
        -int max_recovery_time
        -list energy_usage
        -last_action
        -list recovery_windows
        +add_decision(action)
        +add_reaction_time(time_ms)
        +add_stability(state)
        +add_position(state)
        +log_recovery(old_state, new_state)
        +add_state(state, action)
        +get_summary()
    }
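
A minimal sketch of a tracker with part of this interface; the bookkeeping shown (for example, using the standard 0.2095 rad CartPole pole-angle limit to score stability) is an assumption rather than the PR's exact logic.

from collections import defaultdict
from statistics import mean


class AIStats:
    """Lightweight episode-statistics tracker sketched from the class diagram."""

    def __init__(self):
        self.decisions = defaultdict(int)
        self.reaction_times = []
        self.stability_scores = []
        self.position_history = []
        self.recovery_count = 0
        self.last_action = None

    def add_decision(self, action):
        self.decisions[action] += 1
        self.last_action = action

    def add_reaction_time(self, time_ms):
        self.reaction_times.append(time_ms)

    def add_position(self, state):
        # state = (cart position, cart velocity, pole angle, pole angular velocity)
        self.position_history.append(abs(state[0]))

    def add_stability(self, state):
        # Smaller pole angle relative to the 12-degree limit -> higher stability.
        self.stability_scores.append(1.0 - min(abs(state[2]) / 0.2095, 1.0))

    def get_summary(self):
        return {
            "decisions": dict(self.decisions),
            "avg_reaction_ms": mean(self.reaction_times) if self.reaction_times else 0,
            "avg_stability": mean(self.stability_scores) if self.stability_scores else 0,
            "max_position": max(self.position_history) if self.position_history else 0,
            "recoveries": self.recovery_count,
        }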

Class diagram for BayesianLearningRateAdapter

classDiagram
    class BayesianLearningRateAdapter {
        -float alpha
        -float beta
        -float base_lr
        -list lr_history
        -list confidence_history
        -float current_lr
        -previous_loss
        +update_from_batch(current_loss)
    }
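
The alpha/beta attributes suggest a Beta-posterior style update; here is a minimal sketch under that assumption, treating each batch as a Bernoulli trial on whether the loss improved. The actual rule in bayes/bayes_minst_trainer.py may differ.

class BayesianLearningRateAdapter:
    """Scale the learning rate by a Beta-distributed confidence in training progress."""

    def __init__(self, base_lr: float = 1e-3, alpha: float = 1.0, beta: float = 1.0):
        self.alpha = alpha
        self.beta = beta
        self.base_lr = base_lr
        self.current_lr = base_lr
        self.previous_loss = None
        self.lr_history = []
        self.confidence_history = []

    def update_from_batch(self, current_loss: float) -> float:
        if self.previous_loss is not None:
            if current_loss < self.previous_loss:
                self.alpha += 1.0       # evidence that training is progressing
            else:
                self.beta += 1.0        # evidence that it is not
        self.previous_loss = current_loss

        confidence = self.alpha / (self.alpha + self.beta)   # Beta posterior mean
        # More confidence -> larger step; kept within a bounded band around base_lr.
        self.current_lr = self.base_lr * (0.5 + confidence)
        self.lr_history.append(self.current_lr)
        self.confidence_history.append(confidence)
        return self.current_lr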

Class diagram for BayesianMNISTTrainer

classDiagram
    class BayesianMNISTTrainer {
        -Sequential model
        -device
        -Adam optimizer
        -BayesianLearningRateAdapter lr_adapter
        -list loss_history
        +train_epoch(dataloader, epoch)
    }
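
A sketch of how the trainer could wire the adapter (as sketched above) into a PyTorch loop; the CNN layout and the placement of the per-batch learning-rate update are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianMNISTTrainer:
    """MNIST trainer driven by the learning-rate adapter (sketch, not the PR's code)."""

    def __init__(self, lr_adapter, device=None):
        self.device = device or torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = nn.Sequential(                      # small CNN; sizes assumed
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
        ).to(self.device)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=lr_adapter.base_lr)
        self.lr_adapter = lr_adapter
        self.loss_history = []

    def train_epoch(self, dataloader, epoch: int) -> float:
        self.model.train()
        for images, labels in dataloader:
            images, labels = images.to(self.device), labels.to(self.device)
            self.optimizer.zero_grad()
            loss = F.cross_entropy(self.model(images), labels)
            loss.backward()
            self.optimizer.step()

            # Feed the batch loss to the adapter and push the new lr into Adam.
            new_lr = self.lr_adapter.update_from_batch(loss.item())
            for group in self.optimizer.param_groups:
                group["lr"] = new_lr
            self.loss_history.append(loss.item())
        return self.loss_history[-1]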

Class diagram for PersonalityNetwork

classDiagram
    class PersonalityNetwork {
        -personality
        -Sequential network
        -float temperature
        +forward(x)
    }
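
A minimal sketch of the temperature idea behind the personality agents; the personality names and temperature values below are invented for illustration and are not taken from gym/cart_pole_personalities.py.

import torch
import torch.nn as nn


class PersonalityNetwork(nn.Module):
    """Policy whose action distribution is sharpened or flattened by a
    personality-specific temperature."""

    TEMPERATURES = {"cautious": 0.5, "balanced": 1.0, "chaotic": 2.0}  # illustrative

    def __init__(self, personality: str = "balanced", state_dim: int = 4, n_actions: int = 2):
        super().__init__()
        self.personality = personality
        self.temperature = self.TEMPERATURES.get(personality, 1.0)
        self.network = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Low temperature -> near-greedy behavior; high temperature -> exploratory.
        return torch.softmax(self.network(x) / self.temperature, dim=-1)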

File-Level Changes

• Added a sophisticated CartPole environment implementation with advanced wind dynamics and physics (gym/cart_pole.py)
    • Implemented complex wind modeling with base wind force, gusts, turbulence, and oscillating patterns
    • Added realistic physics with momentum, damping, and position-based force scaling
    • Implemented recovery mechanics with recovery windows and progressive assistance
    • Added detailed performance tracking and statistics monitoring
• Implemented a deep learning policy network for CartPole control (gym/cart_pole.py)
    • Created a multi-branch neural network architecture with specialized pathways
    • Implemented adaptive control mechanisms with dynamic temperature scaling
    • Added safety mechanisms for emergency responses and risk-based interventions
    • Implemented training utilities with experience replay and reward shaping
• Added a Bayesian learning rate adaptation system for MNIST training (bayes/bayes_minst_trainer.py)
    • Implemented Bayesian learning rate adaptation with confidence tracking
    • Created a CNN-based MNIST trainer with dynamic learning rate adjustment
    • Added visualization utilities for training progress monitoring
• Added personality-based CartPole implementations (gym/cart_pole_personalities.py)
    • Created different personality types for CartPole agents
    • Implemented temperature-based behavior modification
    • Added a personality contest demonstration system
• Updated project dependencies (requirements.txt)
    • Added Gymnasium and PyGame dependencies
    • Added support for additional Gymnasium environments

sourcery-ai bot changed the title from "@sourcery-ai" to "Implement advanced CartPole and Bayesian MNIST training modules" on Nov 24, 2024
leonvanbokhorst self-assigned this on Nov 24, 2024
leonvanbokhorst added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels on Nov 24, 2024
leonvanbokhorst added this to the Phase 1 milestone on Nov 24, 2024
leonvanbokhorst merged commit 5933852 into main on Nov 24, 2024
1 check failed
sourcery-ai bot (Contributor) left a comment:
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding some basic unit tests for the core functionality, particularly for the BayesianLearningRateAdapter and PolicyNetwork classes.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


ax1.scatter(trainer.lr_adapter.confidence_history,
            trainer.lr_adapter.lr_history,
            alpha=0.5, s=1)
ax1.set_xlabel('Confidence')
issue (code-quality): Extract duplicate code into function (extract-duplicate-method)
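
For illustration only (the helper name is hypothetical and the second call site is not shown here), the duplicated scatter setup could be pulled into a small function:

def plot_lr_vs_confidence(ax, adapter, xlabel):
    """Shared scatter setup so both axes do not repeat the same calls."""
    ax.scatter(adapter.confidence_history, adapter.lr_history, alpha=0.5, s=1)
    ax.set_xlabel(xlabel)
    ax.set_ylabel('Learning rate')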

        self.steps_beyond_done = 0
        return super().reset(**kwargs)

    def step(self, action):
issue (code-quality): Low code quality found in CartPoleWithDisturbances.step - 18% (low-code-quality)


Explanation: The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.

  • Reduce the function length by extracting pieces of functionality out into
    their own functions. This is the most important thing you can do - ideally a
    function should be less than 10 lines.
  • Reduce nesting, perhaps by introducing guard clauses to return early.
  • Ensure that variables are tightly scoped, so that code using related concepts
    sits together within the function rather than being scattered.
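
As a hypothetical illustration of that advice (the helper names below are invented, not functions from gym/cart_pole.py), the body could shrink to a guard clause plus a few extracted steps:

def step(env, action):
    """Shape of a refactored step(): guard clause first, helpers after."""
    obs, reward, terminated, truncated, info = env.step(action)

    # Guard clause: return early instead of nesting the rest of the body.
    if terminated or truncated:
        return obs, reward, terminated, truncated, info

    obs = apply_wind(obs)               # wind buildup, gusts, turbulence
    reward = shape_reward(obs, reward)  # reward shaping kept in one place
    return obs, reward, terminated, truncated, info


def apply_wind(obs):
    """Stub standing in for the extracted wind-model code."""
    return obs


def shape_reward(obs, reward):
    """Stub standing in for the extracted reward-shaping code."""
    return reward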

        avg_position = mean(self.position_history) if self.position_history else 0
        max_pos = max(self.position_history) if self.position_history else 0

        summary = {
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
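
A generic before/after of this pattern, using simplified names rather than the actual get_summary fields:

from statistics import mean

# Before: the temporary name is returned immediately and adds nothing.
def get_summary(position_history):
    summary = {"avg_position": mean(position_history), "max_position": max(position_history)}
    return summary

# After: return the dictionary directly.
def get_summary(position_history):
    return {"avg_position": mean(position_history), "max_position": max(position_history)}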
