Wrappers
nes-py includes a full collection of OpenAI Gym wrappers for standard functionality. These wrappers are located in the nes_py.wrappers module. Functionality covered includes:
- easily converting the 256 action binary space to a smaller space
- clipping rewards in {-1, 0, 1}
- downsampling frames to B&W with a smaller size via linear interpolation
- stacking the previous k frames into a frame tensor
- normalizing rewards in [-1, 1] using the L-infinity norm
- penalizing deaths using the done flag
- caching unmodified rewards for analysis
There is also a single wrap method for combining them in a streamlined fashion.
BinarySpaceToDiscreteSpaceEnv is a wrapper that defines a much smaller action space. Its initializer takes an environment and a list of button lists:
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv
env = BinarySpaceToDiscreteSpaceEnv(env, [
    ['NOP'],
    ['right', 'A'],
    ['left', 'A'],
    ['A'],
])
Each button list should contain at least one of:
'right'
'left'
'down'
'up'
'start'
'select'
'B'
'A'
'NOP'
Custom environments are recommended to use this wrapper by default. If there are multiple mappings, the package should provide each button list and either register a separate environment for each or document them for end users. See gym-super-mario-bros for an example of multiple button lists in its actions.py module.
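Once wrapped as above, the environment exposes a discrete action space indexed by the button lists. Below is a minimal sketch of stepping with integer actions, assuming the classic Gym step API (observation, reward, done, info) used by nes-py:
# A minimal sketch: integer actions index the button lists above
# (4 lists, so the action space becomes Discrete(4)).
state = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # e.g. 1 selects ['right', 'A']
    state, reward, done, info = env.step(action)
env.close()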
ClipRewardEnv is a wrapper that clips rewards based on their sign, i.e.:
reward = 1 if reward > 0
reward = -1 if reward < 0
reward = 0 if reward == 0
from nes_py.wrappers import ClipRewardEnv
env = ClipRewardEnv(env)
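The clipping rule above is equivalent to taking the sign of the reward; a minimal illustration of the rule (not the wrapper's internal implementation):
import numpy as np
# Illustrative only: every reward maps to its sign.
for raw in (-3.7, 0.0, 12.0):
    print(raw, '->', float(np.sign(raw)))  # -1.0, 0.0, 1.0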
DownsampleEnv is a wrapper that converts RGB frames to the Y channel (black & white) and resizes them. It takes an environment and an image size as a tuple of (height, width). By default, the image size is (84, 84).
from nes_py.wrappers import DownsampleEnv
env = DownsampleEnv(env, (100, 120))
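A minimal sketch for checking the effect of the wrapper by inspecting a frame; the exact shape reported (e.g. whether a trailing channel axis is present) depends on the implementation:
# A minimal sketch: reset and inspect the downsampled frame.
frame = env.reset()
# Expect a single-channel frame resized toward (100, 120) per the call above.
print(frame.shape, frame.dtype)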
FrameStackEnv is a wrapper that stacks the previous k frames into an image tensor for each step. It takes an environment and a parameter k, the number of frames to stack.
from nes_py.wrappers import FrameStackEnv
env = FrameStackEnv(env, 4)
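A minimal sketch for confirming that observations now carry the last k frames; the axis along which frames are stacked depends on the implementation:
# A minimal sketch: the wrapped observation should contain the last 4 frames.
state = env.reset()
print(state.shape)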
NormalizeRewardEnv is a wrapper that normalizes rewards using the L-infinity norm of the environment's reward range. This wrapper only works for environments that have a finite reward_range variable (e.g. reward_range = (-15, 15)). It divides each reward by the value in the reward range with the largest absolute value.
from nes_py.wrappers import NormalizeRewardEnv
env = NormalizeRewardEnv(env)
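As a worked example of the normalization: with reward_range = (-15, 15) the divisor is 15 (the largest absolute value in the range), so a raw reward of 5 becomes 1/3. A sketch of the arithmetic only, not the wrapper's code:
# Illustrative arithmetic only.
reward_range = (-15, 15)
divisor = max(abs(reward_range[0]), abs(reward_range[1]))  # 15
print(5 / divisor)  # 0.333...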
PenalizeDeathEnv is a wrapper that penalizes the agent for ending an episode (i.e. it sets the reward to a custom penalty when done is True). It takes an environment and a penalty value in its constructor.
from nes_py.wrappers import PenalizeDeathEnv
env = PenalizeDeathEnv(env, -15)
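A minimal sketch showing the effect: on the step where done becomes True, the reward is replaced by the penalty (assuming the classic Gym step API):
# A minimal sketch: run to the end of an episode and inspect the final reward.
state = env.reset()
done = False
while not done:
    state, reward, done, info = env.step(env.action_space.sample())
print(reward)  # expected to be -15 on the terminal step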
RewardCacheEnv is a wrapper that caches cumulative episode rewards in a list for later analysis. It is useful for keeping track of the unmodified (raw) rewards from the environment. It accumulates the reward over an episode, then caches the total in a list.
from nes_py.wrappers import RewardCacheEnv
env = RewardCacheEnv(env)
The wrapper adds an episode_rewards attribute to the env and appends the cumulative reward at the end of each episode. This wrapper should be applied before any other reward mutators if you wish to observe the raw values.
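A minimal sketch of reading the cached raw episode rewards after a few episodes (assuming the classic Gym step API):
# A minimal sketch: run a few episodes and read back the raw episode totals.
for _ in range(3):
    state = env.reset()
    done = False
    while not done:
        state, reward, done, info = env.step(env.action_space.sample())
print(env.episode_rewards)  # one cumulative (raw) reward per episode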
The wrap method provides a quick way to wrap an environment using any or all of the above wrappers, with the exception of BinarySpaceToDiscreteSpaceEnv, which is special-purpose. It takes the following parameters with default values:
Parameter | Default | Description
---|---|---
cache_rewards | True | whether to use the reward cache wrapper
image_size | (84, 84) | the image size for the downsample wrapper (None will disable it)
death_penalty | -15 | the value for the death penalty wrapper (None will disable it)
clip_rewards | False | whether to use the clip rewards wrapper
normalize_rewards | False | whether to use the normalize rewards wrapper
agent_history_length | 4 | the value for the frame stack wrapper (None will disable it)
from nes_py.wrappers import wrap
env = wrap(env,
    cache_rewards=False,
    image_size=(120, 100),
    death_penalty=None,
    clip_rewards=True,
    normalize_rewards=False,
    agent_history_length=3,
)
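Because wrap does not apply BinarySpaceToDiscreteSpaceEnv, a typical pipeline applies that wrapper separately; a sketch follows (the ordering shown is an assumption, not a requirement of the library):
from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv, wrap
# A sketch of a full pipeline: discretize the action space, then apply
# the standard wrapper stack via wrap with its default parameters.
env = BinarySpaceToDiscreteSpaceEnv(env, [['NOP'], ['right', 'A'], ['A']])
env = wrap(env)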