Creating Environments
nes-py provides an interface for building custom OpenAI Gym environments for individual NES games in pure Python. This page is a reference for that interface, with examples based on Super Mario Bros.
Use this stub as a starting point when defining your own environment. It sticks to constructs that are backward compatible with Python 2, which is highly recommended because nes-py itself is Python 2 compatible.
"""An OpenAI Gym interface to the NES game <TODO: Game Name>"""
from nes_py import NESEnv


class FooGame(NESEnv):
    """An OpenAI Gym interface to the NES game <TODO: Game Name>"""

    def __init__(self):
        """Initialize a new <TODO: Game Name> environment."""
        super(FooGame, self).__init__('TODO: path to ROM for the game')
        # setup any variables to use in the below callbacks here

    def _will_reset(self):
        """Handle any RAM hacking before a reset occurs."""
        # use this method to perform setup before an episode resets.
        # the method returns None
        pass

    def _did_reset(self):
        """Handle any RAM hacking after a reset occurs."""
        # use this method to access the RAM of the emulator
        # and perform setup for each episode.
        # the method returns None
        pass

    def _did_step(self, done):
        """
        Handle any RAM hacking after a step occurs.

        Args:
            done: whether the done flag is set to true

        Returns:
            None

        """
        pass

    def _get_reward(self):
        """Return the reward after a step occurs."""
        return 0

    def _get_done(self):
        """Return True if the episode is over, False otherwise."""
        return False

    def _get_info(self):
        """Return the info after a step occurs."""
        return {}


# explicitly define the outward facing API for the module
__all__ = [FooGame.__name__]
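As an illustration of how the callbacks might be filled in, the sketch below derives a reward from the change in a position value read from RAM. It does not import nes-py: `FakeEnv` is a hypothetical stand-in whose `ram` is a plain `bytearray`, and the address `0x10` is made up for the example. Real addresses depend on the game.

```python
class FakeEnv(object):
    """Hypothetical stand-in for NESEnv; only `ram` is modeled."""

    def __init__(self):
        self.ram = bytearray(2048)  # the NES has 2 KiB of work RAM


class PositionRewardGame(FakeEnv):
    """Reward the agent for increasing the value at a made-up address."""

    X_POSITION = 0x10  # hypothetical address, not taken from any real game

    def __init__(self):
        super(PositionRewardGame, self).__init__()
        self._last_x = 0

    def _did_reset(self):
        # remember the starting position so the first reward is 0
        self._last_x = int(self.ram[self.X_POSITION])

    def _get_reward(self):
        # int() guards against 8-bit wraparound if `ram` holds NumPy uint8 values
        x = int(self.ram[self.X_POSITION])
        reward = x - self._last_x
        self._last_x = x
        return reward

    def _get_info(self):
        return {'x': int(self.ram[self.X_POSITION])}


env = PositionRewardGame()
env._did_reset()
env.ram[PositionRewardGame.X_POSITION] = 5  # pretend the emulator moved the player
first_reward = env._get_reward()   # change since the reset
second_reward = env._get_reward()  # nothing moved in between
```

Keeping the previous value in an instance variable, and refreshing it in `_did_reset`, avoids a spurious reward on the first step of each episode.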
The reset lifecycle executes in order like this pseudocode:
_will_reset()
reset()
_did_reset()
obs = screen
return obs
The step lifecycle executes in order like this pseudocode:
reward = 0
done = False
info = {}
for _ in range(frameskip):
    step()
    reward += _get_reward()
    done = done or _get_done()
    info = _get_info()
_did_step(done)
obs = screen
return obs, reward, done, info
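The two lifecycles above can be traced with a small, self-contained mock. This is only a sketch that mirrors the pseudocode, not nes-py's implementation: `LifecycleTrace`, its `frame` counter, and the `frameskip` constructor argument are hypothetical names used to show the order of the callbacks and how rewards accumulate over skipped frames.

```python
class LifecycleTrace(object):
    """Hypothetical stand-in that mimics the pseudocode, not nes-py itself."""

    def __init__(self, frameskip=1):
        self.frameskip = frameskip
        self.calls = []  # order in which the callbacks run
        self.frame = 0   # stands in for emulator state

    def _will_reset(self):
        self.calls.append('_will_reset')

    def _did_reset(self):
        self.calls.append('_did_reset')

    def _did_step(self, done):
        self.calls.append('_did_step')

    def _get_reward(self):
        return 1  # one point per emulated frame

    def _get_done(self):
        return self.frame >= 6

    def _get_info(self):
        return {'frame': self.frame}

    def reset(self):
        self._will_reset()
        self.frame = 0     # stands in for the emulator reset
        self._did_reset()
        return self.frame  # stands in for the screen

    def step(self, action):
        reward, done, info = 0, False, {}
        for _ in range(self.frameskip):
            self.frame += 1  # stands in for one emulator frame
            reward += self._get_reward()
            done = done or self._get_done()
            info = self._get_info()
        self._did_step(done)
        return self.frame, reward, done, info


env = LifecycleTrace(frameskip=4)
env.reset()
obs, reward, done, info = env.step(0)
obs2, reward2, done2, info2 = env.step(0)
```

Because `done` is combined with `or`, an episode that ends on any skipped frame stays done for the rest of the step, while `info` only reflects the last frame.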
NESEnv features methods to directly interact with the underlying NES emulator.
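The emulator's RAM is exposed as self.ram and behaves like any other NumPy vector: index it to read a byte, and assign to an index to write one.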
self.ram[address]
self.ram[address] = value
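Games often spread a single value across several bytes. A minimal sketch of two common decodings follows, using a plain `bytearray` as a stand-in for `self.ram`. The helper names, addresses, and layouts are hypothetical and need to be looked up for each game.

```python
def read_digits(ram, address, length):
    """Decode `length` bytes holding one decimal digit each, most significant first."""
    value = 0
    for offset in range(length):
        value = value * 10 + int(ram[address + offset])
    return value


def read_u16_le(ram, address):
    """Decode a little-endian unsigned 16-bit value stored in two bytes."""
    # int() guards against 8-bit overflow if `ram` holds NumPy uint8 values
    return int(ram[address]) | (int(ram[address + 1]) << 8)


ram = bytearray(2048)  # stand-in for self.ram
ram[0x20], ram[0x21], ram[0x22] = 1, 2, 3  # digits at a made-up address
ram[0x30] = 0x34  # low byte
ram[0x31] = 0x12  # high byte

score = read_digits(ram, 0x20, 3)
position = read_u16_le(ram, 0x30)
```

Helpers like these are typically called from `_get_reward` or `_get_info` so that the callbacks themselves stay short.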
self._frame_advance(action)
Like step, this method takes an action, but it only advances the emulator by one frame. Use it in the lifecycle callbacks to skip frames that aren't meant for the agent (e.g., loading screens, cutscenes, and animations).
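One way to use this is to keep advancing frames until a RAM flag says the game is playable again. The sketch below runs against a hypothetical stand-in (`FakeEmulatorEnv`) rather than nes-py. The `IS_LOADING` address, the 30-frame loading screen, and the use of action `0` for "no buttons pressed" are all assumptions made for the example.

```python
class FakeEmulatorEnv(object):
    """Hypothetical stand-in for NESEnv with a 30-frame loading screen."""

    IS_LOADING = 0x40  # made-up address: 1 while loading, 0 during play

    def __init__(self):
        self.ram = bytearray(2048)
        self.ram[self.IS_LOADING] = 1
        self.frames = 0

    def _frame_advance(self, action):
        # stand-in for the emulator: loading ends once 30 frames have run
        self.frames += 1
        self.ram[self.IS_LOADING] = 1 if self.frames < 30 else 0


class SkipLoadingGame(FakeEmulatorEnv):
    """Hide loading screens from the agent."""

    def _skip_loading(self, max_frames=600):
        """Advance with no buttons pressed until the loading flag clears."""
        for _ in range(max_frames):  # cap the loop so a wrong address cannot hang
            self._frame_advance(0)
            if not self.ram[self.IS_LOADING]:
                break

    def _did_reset(self):
        self._skip_loading()

    def _did_step(self, done):
        # also skip any loading screen that appears mid-episode
        if self.ram[self.IS_LOADING]:
            self._skip_loading()


env = SkipLoadingGame()
env._did_reset()
frames_skipped = env.frames
```

Bounding the loop with `max_frames` is a defensive choice: if the flag address is wrong, the callback returns instead of spinning forever.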
self._backup()
This method saves a backup of the current emulator state that can be restored later. Use it to capture an initial state after some game-specific setup steps. Once a backup exists, calls to reset restore the backup state as the starting point.
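A common pattern, sketched here under stated assumptions, is to play through the title screen once in the constructor and back up the result so that every later reset starts from gameplay. `FakeBackupEnv` is a hypothetical stand-in that imitates the behaviour described above. The frame counter at address `0` and the 10-frame title screen are invented for the example.

```python
class FakeBackupEnv(object):
    """Hypothetical stand-in that mimics the documented backup behaviour."""

    def __init__(self):
        self.ram = bytearray(2048)
        self._saved = None

    def _frame_advance(self, action):
        # made-up frame counter stored at address 0
        self.ram[0] = min(self.ram[0] + 1, 255)

    def _backup(self):
        self._saved = bytearray(self.ram)  # copy, so later writes don't alter it

    def reset(self):
        if self._saved is not None:
            self.ram[:] = self._saved  # restore the backup
        else:
            self.ram[:] = bytearray(len(self.ram))  # fresh power-on state


class TitleScreenGame(FakeBackupEnv):
    """Start every episode just after the title screen."""

    def __init__(self):
        super(TitleScreenGame, self).__init__()
        self.reset()
        for _ in range(10):  # pretend the title screen lasts 10 frames
            self._frame_advance(0)
        self._backup()  # later resets start here


env = TitleScreenGame()
for _ in range(5):
    env._frame_advance(0)
frames_before_reset = env.ram[0]
env.reset()
frames_after_reset = env.ram[0]
```

Doing the setup once in the constructor means the title screen is not replayed on every reset.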
See gym-super-mario-bros for an example project for Super Mario Bros.