Other Areas for Exploration #23

Open
mlap opened this issue Sep 4, 2020 · 5 comments

@mlap (Collaborator) commented Sep 4, 2020

In the course of playing around with fishing-v1, I've come across a few peculiar things that are worthy of exploration or at least some public disclosure.

Action Space Size: In training an agent, I have found that the size of the action space has a definite effect on agent performance. For instance, when I trained an agent on the action space [0, 2K], the agent would harvest the population down to around K/2.5, when the optimal level is K/2. But when I trained the agent on the action space [0, K], the agent would harvest the population to around K/2. Strangely, even when I increased the training budget 2.5x with the same model parameters, I still observed sub-optimal behavior from the agent on the [0, 2K] action space. This all points to an unexpected fragility in action space selection.
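
As a first check on this fragility, here is a minimal sketch (assuming gym's RescaleAction wrapper and stable-baselines3's SAC -- not necessarily what was used above) that normalizes whatever action range the environment exposes to [-1, 1] before training:

import gym
import gym_fishing  # registers fishing-v1
from gym.wrappers import RescaleAction
from stable_baselines3 import SAC

env = gym.make("fishing-v1")
# The policy acts in [-1, 1]; the wrapper maps actions back to the
# environment's native harvest range before each step.
env = RescaleAction(env, -1.0, 1.0)

model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)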

Network Architecture: Also unexpectedly, I have found that deeper networks (5 hidden layers of width 100) perform better than shallower networks (2 hidden layers of width 100). I suspect this is a product of not doing enough hyperparameter tuning on the shallower networks more than anything else. Still, it would be beneficial to do a thorough network architecture survey to find the smallest network that still performs optimally, as this could cut down on training time.
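
A sketch of what such a survey could look like (assuming stable-baselines3, where net_arch in policy_kwargs sets the hidden layers of SAC's actor and critic; the depths and training budget below are placeholders):

import gym
import gym_fishing
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("fishing-v1")
for depth in (1, 2, 3, 5):
    model = SAC("MlpPolicy", env,
                policy_kwargs=dict(net_arch=[100] * depth), verbose=0)
    model.learn(total_timesteps=50_000)
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
    print(f"{depth} x 100 hidden layers: {mean_reward:.3f} +/- {std_reward:.3f}")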

@mlap (Collaborator, Author) commented Sep 8, 2020

Variance between Training Rounds and within Agent Performance: We have yet to really pin down the right combination of training budget and hyperparameters that consistently trains an agent to closely approximate the bang-bang solution on fishing-v1 with SAC. One could certainly start exploring this problem with other algorithms. Intuitively, SAC may not be the best algorithm to use on a deterministic instantiation of v1, since SAC uses a stochastic policy.
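
One cheap comparison along these lines is sketched below (assuming stable-baselines3, which ships TD3 alongside SAC; TD3 learns a deterministic policy, so it may be a more natural fit for the sigma = 0 case):

import gym
import gym_fishing
from stable_baselines3 import SAC, TD3
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("fishing-v1", sigma=0.0)   # deterministic instantiation
for Algo in (SAC, TD3):
    model = Algo("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)
    # Evaluate deterministically so SAC's policy noise doesn't inflate the variance.
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10,
                                              deterministic=True)
    print(Algo.__name__, mean_reward, std_reward)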

@mlap (Collaborator, Author) commented Oct 7, 2020

Model-based RL: I have not touched model-based RL at all yet, and from what I have been learning recently, it seems like it should perform well, especially on the deterministic version of the fishing envs. But I have not seen any packages that implement model-based RL algorithms.
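
To make the idea concrete, here is a toy sketch of the model-based loop with a naive random-shooting planner. The dynamics are an assumption on my part -- logistic growth with the default r = 0.3, K = 1.0 from the env constructor -- and may not exactly match fishing-v1's internals:

import numpy as np

r, K, horizon, n_candidates = 0.3, 1.0, 20, 1000
rng = np.random.default_rng(0)

def rollout(x0, harvests):
    """Simulate the assumed logistic-growth dynamics; return total harvest."""
    x, total = x0, 0.0
    for h in harvests:
        h = min(h, x)                    # cannot harvest more fish than exist
        x = x - h
        x = x + r * x * (1.0 - x / K)    # logistic growth step
        total += h
    return total

def plan(x0):
    """Random shooting: sample harvest sequences, return the best first action."""
    candidates = rng.uniform(0.0, K / 2, size=(n_candidates, horizon))
    best = max(candidates, key=lambda c: rollout(x0, c))
    return best[0]

print(plan(0.75))   # planned harvest from the default initial state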

@mlap (Collaborator, Author) commented Oct 12, 2020

Recurrent Policy: I have only briefly attempted to train an agent that uses a recurrent policy, but was not able to make any significant progress. While this is an easy add-on within Stable Baselines, training recurrent policies is usually a bit more finicky than vanilla feedforward networks -- with an LSTM, for example, you need to tune the number of LSTM cells. But it would be interesting to see if you could get an agent to learn where the final timestep is and fish the stock to zero at that point.
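
For reference, a sketch of how that add-on looks (assuming the TF1-based stable-baselines, whose PPO2 supports an LSTM policy out of the box; n_lstm is the knob referred to above, and the values here are placeholders):

import gym
import gym_fishing
from stable_baselines import PPO2

env = gym.make("fishing-v1")
model = PPO2("MlpLstmPolicy", env,
             nminibatches=1,                   # recurrent policies need n_envs % nminibatches == 0
             policy_kwargs=dict(n_lstm=64))    # number of LSTM cells to tune
model.learn(total_timesteps=100_000)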

@cboettig (Member) commented

Thanks @mlap, this is a great list.

Another direction of exploration is in varying the model parameters. Note that the environments include a range of configurable parameters:

def __init__(self,
             K = 1.0,
             r = 0.3,
             price = 1.0,
             sigma = 0.0,
             init_state = 0.75,
             init_harvest = 0.0125,
             Tmax = 100,
             file = None):

which you can then set when calling gym.make. For example, to add process noise, decrease the growth rate r, and increase the carrying capacity K, you might do:

env = gym.make('fishing-v1', r = 0.2, K = 2, sigma = 0.02)

Currently the environments run with a fixed parameter set. Something we might add to the existing environments is allowing model parameters to be chosen randomly, so the agent can train across simulations with a range of r, K, etc. We can then see how well such an agent performs when given different r and K values (possibly including values outside its training range -- which you might consider a trivial example of 'transfer learning'?)
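
A thin wrapper along those lines could look like the following sketch (it assumes the env exposes r and K as plain attributes, which may not match the actual gym_fishing internals):

import gym
import gym_fishing
import numpy as np

class RandomParams(gym.Wrapper):
    """Resample model parameters at every reset so each episode is a different 'world'."""
    def __init__(self, env, r_range=(0.1, 0.5), K_range=(0.5, 2.0)):
        super().__init__(env)
        self.r_range, self.K_range = r_range, K_range

    def reset(self, **kwargs):
        self.env.unwrapped.r = np.random.uniform(*self.r_range)
        self.env.unwrapped.K = np.random.uniform(*self.K_range)
        return self.env.reset(**kwargs)

env = RandomParams(gym.make("fishing-v1"))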

Generalizing further, I've also added an environment that provides an alternative model structure: one with a tipping point instead of a logistic curve. Eventually it might be nice to combine those into a meta-model so one can easily train agents across a spectrum of parametrizations...

@mlap (Collaborator, Author) commented Oct 14, 2020

Offline RL: It would be possible to create a dataset from the environment that we could then use with an offline RL algorithm -- i.e. RL without interaction with the environment. This would be a good project to work on as a test case, because we could then work towards using offline RL on actual fishing data. An interesting avenue to explore here would be whether offline algorithms can recover an optimal policy from data generated by a sub-optimal agent.
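
A first pass at the dataset side could be as simple as logging transitions from a (deliberately sub-optimal) behavior policy -- plain NumPy here, with no particular offline-RL library assumed:

import gym
import gym_fishing
import numpy as np

env = gym.make("fishing-v1")
transitions = []
for _ in range(100):                              # 100 episodes of behavior data
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()        # sub-optimal behavior policy
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs

obs_b, act_b, rew_b, next_b, done_b = map(np.array, zip(*transitions))
np.savez("fishing_offline_dataset.npz", obs=obs_b, action=act_b,
         reward=rew_b, next_obs=next_b, done=done_b)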
