Gridworld environment for navigation tasks. See the Description section for a detailed overview.
To install, run:
git clone https://github.com/bmazoure/pixel_world.git
cd pixel_world
pip install -e .
The repo is structured as follows:
pixel_world/
|- env_utils.py
maps/
|- room1.txt
|- room2.txt
|- ...
The environment can build arbitrarily defined maps. For example, room5_medium.txt is defined as:
###########
#        0#
####   ####
#         #
#S        #
###########
The environment has four actions, {↑,→,↓,←}, corresponding to the integers 0 to 3 and encoded using standard basis vectors, which are added to the agent's position and clipped to the environment's bounds (see the sketch after the action list below).
Four action types are allowed:
- 1d_discrete: up, down, left, right;
- 1d_horizontal: left, right;
- 1d_vertical: up, down;
- 2d_continuous: a 2D real-valued vector with components in [-1,1].
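To make the discrete action encoding concrete, here is a standalone sketch (not code from the repo) of the position update described above; the exact sign convention used by pixel_world may differ:

import numpy as np

# Signed standard basis vectors for the four discrete actions 0..3 (up, right, down, left).
ACTIONS = {
    0: np.array([-1, 0]),  # up: one row up
    1: np.array([0, 1]),   # right: one column right
    2: np.array([1, 0]),   # down: one row down
    3: np.array([0, -1]),  # left: one column left
}

def step_position(pos, action, n_rows, n_cols):
    # Add the action's basis vector to the agent's position and clip to the grid bounds.
    new_pos = np.asarray(pos) + ACTIONS[action]
    return np.clip(new_pos, 0, [n_rows - 1, n_cols - 1])

print(step_position((1, 1), 0, n_rows=6, n_cols=11))  # [0 1]
print(step_position((0, 1), 0, n_rows=6, n_cols=11))  # clipped at the top wall: stays at [0 1]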
The state space of the environment is the Euclidean coordinate tuple (x,y). Alternatively, an observation tensor of shape n x m x 3, where n and m are the dimensions of the gridworld, can be returned by setting the flag state_type=image.
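As an illustration of the image observation, the sketch below renders a parsed map into an n x m x 3 tensor; the tile colors here are made up, the actual ones come from the navigation alphabet in env_utils.py:

import numpy as np

# Hypothetical RGB color per tile symbol; pixel_world defines its own values.
COLORS = {"#": [0, 0, 0], " ": [255, 255, 255], "S": [0, 0, 255], "0": [0, 255, 0]}

def map_to_image(rows):
    # Render a list of map rows (strings) into an n x m x 3 uint8 tensor.
    n, m = len(rows), len(rows[0])
    img = np.zeros((n, m, 3), dtype=np.uint8)
    for i, row in enumerate(rows):
        for j, symbol in enumerate(row):
            img[i, j] = COLORS[symbol]
    return img

rows = ["#####", "#S 0#", "#####"]
print(map_to_image(rows).shape)  # (3, 5, 3)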
Custom state types are allowed. Below are some default state types:
- #: Wall tile;
- S: Initial state;
- 0: Goal state, gives +R reward and ends the episode. Here, R~|N(0,1)|;
- ' ': Empty tile.
Each state is a DiscreteState instance with the following attributes (a construction sketch follows the list):
- coords: {(x,y) | (x,y) ∈ state space}, the Euclidean coordinates of the state;
- color: {[0,0,0],...,[255,255,255]}, the RGB pixel values of the state;
- terminal: {True/False}, whether the state is terminal, in which case the environment returns done=True;
- initial: {True/False}, whether the state is initial, i.e. these are the agent's initial coordinates;
- accessible: {True/False}, whether the agent can access this state. If the state is inaccessible (e.g. a wall), then the agent stays in the previous state;
- stochastic: {True/False}, whether the reward given upon transitioning into this state is re-sampled from a distribution on every visit, or is sampled once at the beginning and remains fixed;
- reward_pdf: a callable which samples rewards from the specified distribution. Example reward densities:
  - lambda: np.abs(np.random.normal(loc=1,scale=0.1,size=1)).item() - |N(1,0.1)|;
  - lambda: 10 - Dirac at 10;
  - lambda: np.random.uniform(0,1,size=1).item() - U(0,1);
- collectible: {True/False}, whether the state's reward is reset to the default value after the first visit during an episode.
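As a rough sketch of how these attributes fit together, the snippet below builds a goal-like state; the import path and the keyword-based constructor are assumptions (they simply mirror the attribute list), so check env_utils.py for the actual signature:

import numpy as np
from pixel_world.env_utils import DiscreteState  # import path is an assumption

# A goal-like tile: terminal, accessible, with a stochastic |N(1,0.1)| reward.
goal = DiscreteState(
    coords=(9, 1),                  # Euclidean coordinates inside the map
    color=[0, 255, 0],              # hypothetical RGB value for the image observation
    terminal=True,                  # transitioning here returns done=True
    initial=False,                  # not the agent's starting tile
    accessible=True,                # the agent may enter this tile
    stochastic=True,                # re-sample the reward on every visit instead of fixing it once
    reward_pdf=lambda: np.abs(np.random.normal(loc=1, scale=0.1, size=1)).item(),
    collectible=False,              # the reward is not reset after the first visit
)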
Adding a new map happens in three steps:
- Add the map text file, following the constraints above, into maps/;
- (Optional) Add a new navigation alphabet to pixel_world/envs/env_utils.py with custom rewards, symbols, etc.;
- In pixel_world/envs/__init__.py, register the new map with a unique name and a navigation alphabet (a hypothetical sketch follows the list).
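The last step might look roughly like the sketch below, assuming the registration in pixel_world/envs/__init__.py follows the standard Gym pattern; the id, entry point and keyword names are assumptions, so mirror the existing entries in that file:

from gym.envs.registration import register

# Hypothetical registration entry -- copy the structure of the existing entries
# in pixel_world/envs/__init__.py for the real entry_point and kwargs.
register(
    id="PixelWorld-MyRoom-v0",                  # unique name for the new map
    entry_point="pixel_world.envs:PixelWorld",  # assumed environment class
    kwargs={"map_name": "my_room.txt",          # the map file added to maps/
            "alphabet": "default"},             # the navigation alphabet to use
)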
For any given map, we might want to move the goal state around to create a shared-dynamics setting. The PixelWorldSampler class takes in a template environment, from which it erases all terminal states. After calling sampler.generate(train,train_split,test_split), it generates all valid configurations of goal states, sorts the resulting maps by the Euclidean distance between the goal and the agent's initial position, and assigns the first train_split% of the tasks to the training set and the last test_split% to the test set.
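A minimal usage sketch, assuming the class is importable as PixelWorldSampler, that the template environment comes from a registered map, and that generate returns the train and test task lists (none of these details are documented above, so they may differ):

import gym
import pixel_world  # assumed to register the environments on import

from pixel_world.env_utils import PixelWorldSampler  # import path is an assumption

template_env = gym.make("PixelWorld-MyRoom-v0")  # hypothetical registered id
sampler = PixelWorldSampler(template_env)        # erases all terminal states from the template
# Argument values and the return value below are assumptions.
train_tasks, test_tasks = sampler.generate(train=True, train_split=0.8, test_split=0.2)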