Other Areas for Exploration #23
Variance between Training Rounds and within-Agent Performance: We have yet to really flesh out the right cocktail of training budget and hyperparameter selection that consistently trains an agent to closely approximate the bang-bang solution.
Model-based RL: I have not touched model-based RL at all yet, and from what I have been learning recently, it seems like it should perform well, especially on the deterministic version of the fishing envs. But I have not seen any packages that implement model-based RL algorithms.
Recurrent Policy: I have only briefly attempted to train an agent that uses a recurrent policy, but was not able to make any significant progress. While this is an easy add-on within stable baselines, training recurrent policies is usually a bit more finicky than vanilla feedforward networks -- with an LSTM, for example, you need to tune the number of LSTM cells. It would be interesting to see whether an agent could learn where the final timestep is and fish the stock down to zero at that point (a minimal sketch of a recurrent setup follows below).
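A minimal, untested sketch of what the recurrent setup could look like, using sb3-contrib's RecurrentPPO; the hyperparameter values and the assumption that importing gym_fishing registers the env are mine, not validated settings:

```python
# Rough sketch (untested): recurrent PPO on fishing-v1 via sb3-contrib.
# Assumes gym_fishing registers the fishing-v* environments on import.
import gym
import gym_fishing  # noqa: F401
from sb3_contrib import RecurrentPPO

env = gym.make("fishing-v1")

# lstm_hidden_size / n_lstm_layers are the knobs referred to above;
# these particular values are placeholders, not tuned settings.
model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    policy_kwargs=dict(lstm_hidden_size=64, n_lstm_layers=1),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```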
Thanks @mlap, this is a great list. Another direction of exploration is varying the model parameters. Note that the environments include a range of configurable parameters (see gym_fishing/gym_fishing/envs/fishing_cts_env.py, lines 16 to 24 at 43f0181),
which you can then set when calling `env = gym.make('fishing-v1', r = 0.2, K = 2, sigma = 0.02)`. Currently the environments run with a fixed parameter set. Something we might add to the existing environments is allowing model parameters to be chosen randomly, so the agent can train across simulations with a range of parameter values (a sketch of one way to do this is below). Generalizing further, I've also added an environment that provides an alternative model structure: one with a tipping point instead of a logistic curve. Eventually it might be nice to combine these into a meta-model so one can easily train agents across a spectrum of parametrizations...
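One possible way to randomize parameters across episodes is a small wrapper that re-samples (r, K, sigma) at every reset. This is only a sketch: it assumes the unwrapped env exposes these as plain attributes that are safe to overwrite between episodes, which I have not verified against the gym_fishing internals:

```python
# Hypothetical sketch: domain randomization over the fishing model parameters.
import gym
import gym_fishing  # noqa: F401
import numpy as np


class RandomParamWrapper(gym.Wrapper):
    """Re-sample (r, K, sigma) at the start of every episode."""

    def __init__(self, env, r_range=(0.1, 0.4), k_range=(0.5, 2.0), sigma_range=(0.0, 0.1)):
        super().__init__(env)
        self.r_range, self.k_range, self.sigma_range = r_range, k_range, sigma_range

    def reset(self, **kwargs):
        # Attribute names assumed to match the gym.make() keyword arguments.
        self.env.unwrapped.r = np.random.uniform(*self.r_range)
        self.env.unwrapped.K = np.random.uniform(*self.k_range)
        self.env.unwrapped.sigma = np.random.uniform(*self.sigma_range)
        return self.env.reset(**kwargs)


env = RandomParamWrapper(gym.make("fishing-v1"))
```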
Offline RL: It would be possible to create a dataset from the environment that we could then use with an offline RL algorithm -- i.e. RL without interaction with the environment. This would be a good project to work on as a test case, because we could then work towards using offline RL on actual fishing data. An interesting avenue to explore here is whether offline algorithms can recover an optimal policy from a sub-optimal agent's data (a sketch of building such a dataset is below).
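As a starting point, the dataset could just be saved rollouts of a behaviour policy. A rough sketch (the random behaviour policy and the npz layout are placeholders; it uses the older gym API where step() returns four values):

```python
# Sketch: roll out a (sub-optimal) behaviour policy on fishing-v1 and save
# the transitions in a simple offline-RL-style dataset.
import gym
import gym_fishing  # noqa: F401
import numpy as np

env = gym.make("fishing-v1")
obs_buf, act_buf, rew_buf, next_obs_buf, done_buf = [], [], [], [], []

for _ in range(100):  # number of behaviour episodes is arbitrary here
    obs, done = env.reset(), False
    while not done:
        action = env.action_space.sample()  # stand-in for a sub-optimal agent
        next_obs, reward, done, info = env.step(action)
        obs_buf.append(obs)
        act_buf.append(action)
        rew_buf.append(reward)
        next_obs_buf.append(next_obs)
        done_buf.append(done)
        obs = next_obs

np.savez(
    "fishing_offline_dataset.npz",
    observations=np.array(obs_buf),
    actions=np.array(act_buf),
    rewards=np.array(rew_buf),
    next_observations=np.array(next_obs_buf),
    terminals=np.array(done_buf),
)
```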
In the course of playing around with `fishing-v1`, I've come across a few peculiar things that are worth exploring, or at least some public disclosure.

Action Space Size: In training an agent, I have found that the size of the action space has a definite effect on agent performance. For instance, when I trained an agent on the action space [0, 2K], the agent would harvest the population down to around K/2.5, while the optimal level is K/2. But when I trained the agent on the action space [0, K], the agent would harvest the population down to around K/2. Strangely, when I increased the training budget by 2.5x, even with the same model parameters, I still observed sub-optimal behavior from the agent on the [0, 2K] action space. This all seems to point to an unexpected fragility with respect to action space selection (a sketch of a wrapper for making this comparison is included below).
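A sketch of how the comparison could be set up, using a wrapper that maps a fixed agent-facing action space onto different harvest bounds; the assumption that the underlying env accepts raw harvest quotas as actions is mine and not verified against gym_fishing:

```python
# Hypothetical sketch: expose the harvest bound as a wrapper argument so the
# [0, K] vs [0, 2K] comparison is a one-line change.
import gym
import gym_fishing  # noqa: F401
import numpy as np


class HarvestBounds(gym.ActionWrapper):
    """Map an agent action in [-1, 1] onto a harvest quota in [0, max_harvest]."""

    def __init__(self, env, max_harvest):
        super().__init__(env)
        self.max_harvest = max_harvest
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)

    def action(self, action):
        # Linear map [-1, 1] -> [0, max_harvest]
        return (np.asarray(action, dtype=np.float32) + 1.0) / 2.0 * self.max_harvest


K = 1.0
env_full = HarvestBounds(gym.make("fishing-v1"), max_harvest=2 * K)  # [0, 2K]
env_half = HarvestBounds(gym.make("fishing-v1"), max_harvest=K)      # [0, K]
```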
Network Architecture: Also unexpectedly, I have found that deeper networks (5 hidden layers of width 100) perform better than shallower networks (2 hidden layers of width 100). I think this is more a product of my not doing enough hyperparameter tuning on the shallower networks than anything else, but it would be worthwhile to do a thorough network architecture survey to find the smallest network that still performs optimally, as this could cut down on training time (a sketch of such a sweep is below).
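A sketch of such a sweep using stable-baselines3's policy_kwargs (the algorithm choice, training budget, and architectures here are placeholders, not the settings used above):

```python
# Sketch: compare shallow vs deep MLP policies on fishing-v1.
import gym
import gym_fishing  # noqa: F401
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

for net_arch in ([100, 100], [100] * 5):  # 2 vs 5 hidden layers of width 100
    model = PPO(
        "MlpPolicy",
        gym.make("fishing-v1"),
        policy_kwargs=dict(net_arch=net_arch),
        verbose=0,
    )
    model.learn(total_timesteps=200_000)
    mean_reward, _ = evaluate_policy(model, gym.make("fishing-v1"), n_eval_episodes=20)
    print(net_arch, mean_reward)
```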