Other Areas for Exploration #23
Variance between Training Rounds and within-Agent Performance: We have yet to really flesh out the right cocktail of training budget and hyperparameter selection that consistently trains an agent to closely approximate the bang-bang solution.
Model-based RL: I have not touched model-based RL at all yet, and from what I have been learning recently, it seems like it should perform well, especially on the deterministic version of the fishing envs. But I have not seen any packages that implement model-based RL algorithms.
Recurrent Policy: I have only briefly attempted to train an agent that uses a recurrent policy, but was not able to make any significant progress. While this is an easy add-on within stable baselines, training recurrent policies is usually a bit more finicky than vanilla feedforward networks -- with an LSTM, for example, you need to tune the number of LSTM cells. It would be interesting to see whether an agent could learn where the final timestep is and fish the stock down to zero at that point (a minimal sketch of a recurrent setup follows below).
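A minimal, untested sketch of what the recurrent setup could look like, using sb3-contrib's RecurrentPPO; the hyperparameter values and the assumption that importing gym_fishing registers the env are mine, not validated settings:

```python
# Rough sketch (untested): recurrent PPO on fishing-v1 via sb3-contrib.
# Assumes gym_fishing registers the fishing-v* environments on import.
import gym
import gym_fishing  # noqa: F401
from sb3_contrib import RecurrentPPO

env = gym.make("fishing-v1")

# lstm_hidden_size / n_lstm_layers are the knobs referred to above;
# these particular values are placeholders, not tuned settings.
model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    policy_kwargs=dict(lstm_hidden_size=64, n_lstm_layers=1),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```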
Thanks @mlap, this is a great list. Another direction of exploration is varying the model parameters. Note that the environments include a range of configurable parameters (see gym_fishing/gym_fishing/envs/fishing_cts_env.py, lines 16 to 24 at 43f0181),
which you can then set when calling `env = gym.make('fishing-v1', r = 0.2, K = 2, sigma = 0.02)`. Currently the environments run with a fixed parameter set. Something we might add to the existing environments is allowing model parameters to be chosen randomly, so the agent can train across simulations with a range of parameter values (a sketch of one way to do this is below). Generalizing further, I've also added an environment that provides an alternative model structure: one with a tipping point instead of a logistic curve. Eventually it might be nice to combine these into a meta-model so one can easily train agents across a spectrum of parametrizations...
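One possible way to randomize parameters across episodes is a small wrapper that re-samples (r, K, sigma) at every reset. This is only a sketch: it assumes the unwrapped env exposes these as plain attributes that are safe to overwrite between episodes, which I have not verified against the gym_fishing internals:

```python
# Hypothetical sketch: domain randomization over the fishing model parameters.
import gym
import gym_fishing  # noqa: F401
import numpy as np


class RandomParamWrapper(gym.Wrapper):
    """Re-sample (r, K, sigma) at the start of every episode."""

    def __init__(self, env, r_range=(0.1, 0.4), k_range=(0.5, 2.0), sigma_range=(0.0, 0.1)):
        super().__init__(env)
        self.r_range, self.k_range, self.sigma_range = r_range, k_range, sigma_range

    def reset(self, **kwargs):
        # Attribute names assumed to match the gym.make() keyword arguments.
        self.env.unwrapped.r = np.random.uniform(*self.r_range)
        self.env.unwrapped.K = np.random.uniform(*self.k_range)
        self.env.unwrapped.sigma = np.random.uniform(*self.sigma_range)
        return self.env.reset(**kwargs)


env = RandomParamWrapper(gym.make("fishing-v1"))
```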
Offline RL: It would be possible to create a dataset from the environment that we could then use with an offline RL algorithm -- i.e. RL without interaction with the environment. This would be a good project to work on as a test case, because we could then work towards using offline RL on actual fishing data. An interesting avenue to explore here is whether offline algorithms can recover an optimal policy from a sub-optimal agent's data (a sketch of building such a dataset is below).
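As a starting point, the dataset could just be saved rollouts of a behaviour policy. A rough sketch (the random behaviour policy and the npz layout are placeholders; it uses the older gym API where step() returns four values):

```python
# Sketch: roll out a (sub-optimal) behaviour policy on fishing-v1 and save
# the transitions in a simple offline-RL-style dataset.
import gym
import gym_fishing  # noqa: F401
import numpy as np

env = gym.make("fishing-v1")
obs_buf, act_buf, rew_buf, next_obs_buf, done_buf = [], [], [], [], []

for _ in range(100):  # number of behaviour episodes is arbitrary here
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()  # stand-in for a sub-optimal agent
        next_obs, reward, done, info = env.step(action)
        obs_buf.append(obs)
        act_buf.append(action)
        rew_buf.append(reward)
        next_obs_buf.append(next_obs)
        done_buf.append(done)
        obs = next_obs

np.savez(
    "fishing_offline_dataset.npz",
    observations=np.array(obs_buf),
    actions=np.array(act_buf),
    rewards=np.array(rew_buf),
    next_observations=np.array(next_obs_buf),
    terminals=np.array(done_buf),
)
```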
In the course of playing around with `fishing-v1`, I've come across a few peculiar things that are worth exploring, or at least some public disclosure.

Action Space Size: In training an agent, I have found that the size of the action space has a definite effect on agent performance. For instance, when I trained an agent on the action space [0, 2K], the agent would harvest the population down to around K/2.5, while the optimal level is K/2. But when I trained the agent on the action space [0, K], the agent would harvest the population down to around K/2. Strangely, when I increased the training budget by 2.5x, even with the same model parameters, I still observed sub-optimal behavior from the agent on the [0, 2K] action space. This all seems to point to an unexpected fragility with respect to action space selection (a sketch of a wrapper for making this comparison is included below).
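A sketch of how the comparison could be set up, using a wrapper that maps a fixed agent-facing action space onto different harvest bounds; the assumption that the underlying env accepts raw harvest quotas as actions is mine and not verified against gym_fishing:

```python
# Hypothetical sketch: expose the harvest bound as a wrapper argument so the
# [0, K] vs [0, 2K] comparison is a one-line change.
import gym
import gym_fishing  # noqa: F401
import numpy as np


class HarvestBounds(gym.ActionWrapper):
    """Map an agent action in [-1, 1] onto a harvest quota in [0, max_harvest]."""

    def __init__(self, env, max_harvest):
        super().__init__(env)
        self.max_harvest = max_harvest
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, action):
        # Linear map [-1, 1] -> [0, max_harvest]
        return (np.asarray(action, dtype=np.float32) + 1.0) / 2.0 * self.max_harvest


K = 1.0
env_full = HarvestBounds(gym.make("fishing-v1"), max_harvest=2 * K)  # [0, 2K]
env_half = HarvestBounds(gym.make("fishing-v1"), max_harvest=K)      # [0, K]
```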
Network Architecture: Also unexpectedly, I have found that deeper networks (5 hidden layers of width 100) perform better than shallower networks (2 hidden layers of width 100). I think this is more a product of my not doing enough hyperparameter tuning on the shallower networks than anything else, but it would be worthwhile to do a thorough network architecture survey to find the smallest network that still performs optimally, as this could cut down on training time (a sketch of such a sweep is below).
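A sketch of such a sweep using stable-baselines3's policy_kwargs (the algorithm choice, training budget, and architectures here are placeholders, not the settings used above):

```python
# Sketch: compare shallow vs deep MLP policies on fishing-v1.
import gym
import gym_fishing  # noqa: F401
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

for net_arch in ([100, 100], [100] * 5):  # 2 vs 5 hidden layers of width 100
    model = PPO(
        "MlpPolicy",
        gym.make("fishing-v1"),
        policy_kwargs=dict(net_arch=net_arch),
        verbose=0,
    )
    model.learn(total_timesteps=200_000)
    mean_reward, _ = evaluate_policy(model, gym.make("fishing-v1"), n_eval_episodes=20)
    print(net_arch, mean_reward)
```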