Implement advanced CartPole and Bayesian MNIST training modules #62
- Added state history for adaptation
- Implemented enhanced feature extraction with dropout
- Added an adaptive control branch for improved performance
- Updated position bias and velocity influence calculations
- Tuned the temperature adjustment for stability
- Improved final logits combination
- Introduced progressive difficulty and adaptive learning
- Implemented dynamic reward shaping based on performance
- Increased gust strength from 0.03 to 0.10
- Increased wind change rate from 0.04 to 0.06
- Doubled gust strength from 0.05 to 0.10
- Increased turbulence from 0.02 to 0.05
- Increased wind change rate from 0.995 to 0.99
- Increased random variation from 0.04 to 0.07
- Increased gust strength range from (0.6, 1.2) to (0.8, 1.4)
- Decreased disturbance countdown range from (70, 130) to (50, 100)
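The wind tuning above can be pictured with a small stand-alone model. This is a sketch under assumptions, not the PR's code. The field names mirror the `CartPoleWithDisturbances` class diagram, and the defaults reuse the tuned values from the commit list. `wind_change_rate` is treated as a per-step decay factor, which is why lowering it from 0.995 to 0.99 would make the wind change faster. The gust range is read as a multiplier on `gust_strength`. The 50% direction-flip chance and the `max_wind_force` clamp of 0.5 are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class WindModel:
    """Hypothetical wind process; not the PR's CartPoleWithDisturbances code."""
    gust_strength: float = 0.10          # base size of a gust
    turbulence: float = 0.05             # std-dev of per-step Gaussian noise
    wind_change_rate: float = 0.99       # assumed decay factor for the slow wind
    random_variation: float = 0.07       # per-step random drift of the slow wind
    gust_range: tuple = (0.8, 1.4)       # assumed multiplier range for each gust
    countdown_range: tuple = (50, 100)   # steps between disturbance events
    max_wind_force: float = 0.5          # hypothetical clamp, not from the PR
    rng: random.Random = field(default_factory=random.Random)
    current_wind: float = 0.0
    wind_direction: int = 1
    gust_count: int = 0
    direction_changes: int = 0

    def __post_init__(self) -> None:
        # Steps remaining until the next gust event.
        self.countdown = self.rng.randint(*self.countdown_range)

    def step(self) -> float:
        """Advance one timestep and return the (clamped) disturbance value."""
        # Slow component: decay the previous wind and add a small random drift.
        drift = self.rng.uniform(-self.random_variation, self.random_variation)
        self.current_wind = self.wind_change_rate * self.current_wind + drift

        # Fast component: zero-mean turbulence on every step.
        force = self.current_wind + self.rng.gauss(0.0, self.turbulence)

        self.countdown -= 1
        if self.countdown <= 0:
            # Disturbance event: maybe flip direction, then add a scaled gust.
            if self.rng.random() < 0.5:
                self.wind_direction *= -1
                self.direction_changes += 1
            multiplier = self.rng.uniform(*self.gust_range)
            force += self.wind_direction * self.gust_strength * multiplier
            self.gust_count += 1
            self.countdown = self.rng.randint(*self.countdown_range)

        # Keep the disturbance within an assumed maximum magnitude.
        return max(-self.max_wind_force, min(self.max_wind_force, force))
```

Seeding `rng`, e.g. `WindModel(rng=random.Random(0))`, makes runs reproducible. The PR does not show whether its value is applied as a force or as a velocity nudge.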
Reviewer's Guide by Sourcery
This PR introduces an extended CartPole environment with AI training support, along with utilities for Bayesian learning-rate adaptation and personality-based agents. It extends the classic CartPole problem with wind dynamics (gusts, turbulence, and direction changes), more detailed physics, and enhanced reward shaping.
Class diagram for CartPoleWithDisturbances
classDiagram
class CartPoleWithDisturbances {
-bool recovery_window
-int recovery_attempts
-int successful_recoveries
-float last_wind_force
-float wind_buildup
-float gust_strength
-float wind_change_rate
-float current_wind
-int wind_direction
-float turbulence
-float max_wind_force
-int gust_count
-int direction_changes
-int strength_changes
-int steps_beyond_done
+reset(**kwargs)
+step(action)
}
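One plausible way for `step()` to apply the wind is to add it to the cart's push force before integrating the CartPole equations of motion. The sketch below uses the constants and the Euler update from Gymnasium's classic `CartPoleEnv`. The function name `step_with_wind` and the additive wind term are assumptions, since the PR's own `step()` body is not shown here.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float]  # (x, x_dot, theta, theta_dot)

# Constants as in Gymnasium's classic CartPole.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half the pole length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                             # seconds per step
X_LIMIT = 2.4
THETA_LIMIT = 12 * 2 * math.pi / 360   # 12 degrees in radians


def step_with_wind(state: State, action: int, wind_force: float) -> Tuple[State, bool]:
    """One Euler step; the wind is (hypothetically) added to the push force."""
    x, x_dot, theta, theta_dot = state
    force = (FORCE_MAG if action == 1 else -FORCE_MAG) + wind_force
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler: positions use the old velocities.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc

    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), terminated
```

Because the wind enters through `force`, it moves both the cart and the pole through the same coupled equations. It is not a separate kick on one state variable.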
Class diagram for PolicyNetwork
classDiagram
class PolicyNetwork {
-Sequential features
-Sequential adaptive_net
-Sequential value_head
-Sequential policy_head
-Sequential velocity_pred
-float temperature
+forward(x)
}
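`PolicyNetwork.temperature` presumably divides the policy logits before the softmax. A higher temperature flattens the action distribution and a lower one sharpens it. Below is a dependency-free sketch of that operation. In PyTorch it would be `torch.softmax(logits / self.temperature, dim=-1)`. The function name is hypothetical, and the PR's exact logit combination is not shown.

```python
import math
from typing import List


def temperature_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtracting the max avoids overflow in exp() without changing the result.
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```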
Class diagram for AIStats
classDiagram
class AIStats {
-defaultdict decisions
-list reaction_times
-list stability_scores
-int recovery_count
-list position_history
-int oscillations
-int time_in_danger_zone
-int max_recovery_time
-list energy_usage
-last_action
-list recovery_windows
+add_decision(action)
+add_reaction_time(time_ms)
+add_stability(state)
+add_position(state)
+log_recovery(old_state, new_state)
+add_state(state, action)
+get_summary()
}
Class diagram for BayesianLearningRateAdapter
classDiagram
class BayesianLearningRateAdapter {
-float alpha
-float beta
-float base_lr
-list lr_history
-list confidence_history
-float current_lr
-previous_loss
+update_from_batch(current_loss)
}
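`BayesianLearningRateAdapter` keeps `alpha` and `beta`, which suggests a Beta posterior. The sketch below treats each batch as a Bernoulli trial, counting a success when the loss went down. The posterior mean alpha / (alpha + beta) serves as the confidence, and the learning rate is scaled into [0.5, 1.5] × `base_lr`. The class name and the scaling rule are hypothetical, not the PR's implementation.

```python
from typing import List, Optional


class BetaLRAdapter:
    """Hypothetical Beta-Bernoulli learning-rate adapter."""

    def __init__(self, base_lr: float = 1e-3, alpha: float = 1.0, beta: float = 1.0) -> None:
        self.alpha = alpha              # pseudo-count of improving batches
        self.beta = beta                # pseudo-count of non-improving batches
        self.base_lr = base_lr
        self.current_lr = base_lr
        self.previous_loss: Optional[float] = None
        self.lr_history: List[float] = []
        self.confidence_history: List[float] = []

    def update_from_batch(self, current_loss: float) -> float:
        if self.previous_loss is not None:
            # Bernoulli observation: did this batch improve on the previous one?
            if current_loss < self.previous_loss:
                self.alpha += 1.0
            else:
                self.beta += 1.0
        self.previous_loss = current_loss

        # Posterior mean of Beta(alpha, beta) used as the confidence.
        confidence = self.alpha / (self.alpha + self.beta)
        # Hypothetical mapping: confidence 0 -> 0.5x, 1 -> 1.5x base_lr.
        self.current_lr = self.base_lr * (0.5 + confidence)

        self.confidence_history.append(confidence)
        self.lr_history.append(self.current_lr)
        return self.current_lr
```

To drive a PyTorch optimizer, the returned value can be written into every entry of `optimizer.param_groups` (`group["lr"] = lr`) after each batch. Because `alpha` and `beta` grow without bound, the confidence becomes slow to react late in training; decaying both counts each step is a common remedy. A class this small is also an easy first target for the unit tests the review asks for.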
Class diagram for BayesianMNISTTrainer
classDiagram
class BayesianMNISTTrainer {
-Sequential model
-device
-Adam optimizer
-BayesianLearningRateAdapter lr_adapter
-list loss_history
+train_epoch(dataloader, epoch)
}
Class diagram for PersonalityNetwork
classDiagram
class PersonalityNetwork {
-personality
-Sequential network
-float temperature
+forward(x)
}
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding some basic unit tests for the core functionality, particularly for the BayesianLearningRateAdapter and PolicyNetwork classes.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
ax1.scatter(trainer.lr_adapter.confidence_history,
            trainer.lr_adapter.lr_history,
            alpha=0.5, s=1)
ax1.set_xlabel('Confidence')
issue (code-quality): Extract duplicate code into function (extract-duplicate-method)
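Following the `extract-duplicate-method` suggestion, the repeated scatter-and-label calls can move into one helper. The name `scatter_with_labels` and its keyword parameters are hypothetical. Its defaults match the `alpha=0.5, s=1` arguments in the snippet above. Only `scatter`, `set_xlabel` and `set_ylabel` are called, all of which exist on a Matplotlib `Axes`.

```python
from typing import Any, Sequence


def scatter_with_labels(ax: Any, x: Sequence[float], y: Sequence[float],
                        xlabel: str, ylabel: str, *,
                        alpha: float = 0.5, size: float = 1) -> Any:
    """Draw a small-marker scatter plot on `ax` and label both axes.

    `ax` is expected to be a matplotlib.axes.Axes; any object with
    scatter/set_xlabel/set_ylabel methods works.
    """
    ax.scatter(x, y, alpha=alpha, s=size)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return ax
```

With Matplotlib, a call such as `scatter_with_labels(ax1, trainer.lr_adapter.confidence_history, trainer.lr_adapter.lr_history, "Confidence", "Learning rate")` would replace the repeated lines, and each further panel becomes a single call.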
        self.steps_beyond_done = 0
        return super().reset(**kwargs)

    def step(self, action):
issue (code-quality): Low code quality found in CartPoleWithDisturbances.step - 18% (low-code-quality)
Explanation
The quality score for this function is below the quality threshold of 25%. This score combines the method length, cognitive complexity, and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality into their own functions. This is the most important thing you can do; ideally a function should be fewer than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
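As an illustration of the two main suggestions, extracting helpers and using guard clauses, the termination and reward part of a CartPole `step()` can be written as small flat functions. The function names and the reward shape are hypothetical. The reward is an even blend of uprightness and centring, with a -1.0 penalty on failure. The limits are the standard CartPole thresholds.

```python
import math

X_LIMIT = 2.4
THETA_LIMIT = 12 * 2 * math.pi / 360  # 12 degrees in radians


def is_terminal(x: float, theta: float) -> bool:
    """True once the cart leaves the track or the pole tips too far."""
    return abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT


def shaped_reward(x: float, theta: float) -> float:
    """Hypothetical shaping: 1.0 when upright and centred, falling off linearly."""
    upright = 1.0 - abs(theta) / THETA_LIMIT
    centred = 1.0 - abs(x) / X_LIMIT
    return 0.5 * upright + 0.5 * centred


def compute_reward(x: float, theta: float, already_done: bool) -> float:
    """Flat reward logic built from guard clauses instead of nested if/else."""
    if already_done:
        return 0.0                      # stepping after termination earns nothing
    if is_terminal(x, theta):
        return -1.0                     # hypothetical failure penalty
    return shaped_reward(x, theta)      # the normal case sits at the top level
```

Each helper can be tested on its own, and `step()` shrinks to a few calls: update the wind, integrate the physics, then call `compute_reward`.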
        avg_position = mean(self.position_history) if self.position_history else 0
        max_pos = max(self.position_history) if self.position_history else 0

        summary = {
issue (code-quality): Inline variable that is immediately returned (inline-immediately-returned-variable)
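The `AIStats` counters and the inline-variable suggestion can be illustrated together. In the hypothetical `AIStatsSketch` below, an oscillation is counted each time the cart position crosses zero. The danger zone is assumed to be |x| > 1.8, since the PR's threshold is not shown. `get_summary()` returns its dict literal directly, which is the change the `inline-immediately-returned-variable` rule asks for.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence

DANGER_ZONE = 1.8  # hypothetical |x| threshold; CartPole terminates at 2.4


class AIStatsSketch:
    """Hypothetical subset of AIStats: decisions, positions, oscillations."""

    def __init__(self) -> None:
        self.decisions: Dict[int, int] = defaultdict(int)
        self.position_history: List[float] = []
        self.oscillations = 0
        self.time_in_danger_zone = 0

    def add_decision(self, action: int) -> None:
        self.decisions[action] += 1

    def add_position(self, state: Sequence[float]) -> None:
        x = state[0]  # cart position is the first state component
        # A sign change relative to the previous position counts as one oscillation.
        if self.position_history and (self.position_history[-1] < 0) != (x < 0):
            self.oscillations += 1
        if abs(x) > DANGER_ZONE:
            self.time_in_danger_zone += 1
        self.position_history.append(x)

    def get_summary(self) -> dict:
        history = self.position_history
        # Return the dict directly instead of binding it to a temporary name.
        return {
            "decisions": dict(self.decisions),
            "avg_position": mean(history) if history else 0.0,
            "max_position": max(history) if history else 0.0,
            "oscillations": self.oscillations,
            "time_in_danger_zone": self.time_in_danger_zone,
        }
```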
Summary by Sourcery
Introduce a new CartPole environment with advanced dynamics and a personality-based agent system. Implement a Bayesian learning rate adapter for adaptive training. Update dependencies to include gymnasium and related packages.