PixelWorld

Gridworld environment made for navigation tasks. See the Description section for a detailed overview.


Installation

To install, run:

git clone https://github.com/bmazoure/pixel_world.git
cd pixel_world
pip install -e .

Description

The repo is structured as follows:

pixel_world/
 |- env_utils.py
 |- maps/
     |- room1.txt
     |- room2.txt
     |- ...

The environment can build arbitrarily defined maps. For example, room5_medium.txt is defined as

###########
#        0#
####   ####
#         #
#S        #
###########

The environment has four actions, {↑, ↓, ←, →}, corresponding to the integers 0 to 3 and encoded as standard basis vectors, which are added to the agent's position and then clipped to the environment's dimensions.
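
For intuition, here is a minimal numpy sketch of this encoding; the index-to-direction mapping and the clipping bounds are illustrative, not taken from the code:

import numpy as np

# Action index -> standard basis vector (the exact index-to-direction mapping is an assumption)
ACTIONS = {
    0: np.array([-1, 0]),  # up
    1: np.array([1, 0]),   # down
    2: np.array([0, -1]),  # left
    3: np.array([0, 1]),   # right
}

n, m = 6, 11             # grid dimensions of room5_medium (rows, columns)
pos = np.array([4, 1])   # agent position, e.g. the 'S' tile
pos = np.clip(pos + ACTIONS[3], [0, 0], [n - 1, m - 1])  # move right, then clip to the grid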

Four action types are allowed:

  • 1d_discrete: up, down, left, right;
  • 1d_horizontal: left, right;
  • 1d_vertical: up, down;
  • 2d_continuous: 2D real-valued vector with components in [-1,1].

The state space of the environment is the Euclidean coordinate tuple (x,y). Alternatively, an n x m x 3 observation tensor, where n and m are the dimensions of the gridworld, can be returned with the flag state_type=image.
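
A minimal usage sketch, assuming gym-style registration and the classic gym API; the environment ID and the constructor keyword names (state_type, action_type) are assumptions based on this section:

import gym
import pixel_world  # assumed to register the PixelWorld environments with gym

# "PixelWorld-room5_medium-v0" is a hypothetical ID; see pixel_world/envs/__init__.py for the real ones
env = gym.make("PixelWorld-room5_medium-v0", state_type="image", action_type="1d_discrete")

obs = env.reset()   # (x,y) tuple by default, or an n x m x 3 tensor when state_type="image"
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())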

Custom state types are allowed. Below are some default state types:

  • #: Wall tile;
  • S: Initial state;
  • 0: Goal state, gives +R reward and ends the episode. Here, R~|N(0,1)|;
  • ' ': Empty tile.

Each state is a DiscreteState instance with the following attributes:

  • coords: {(x,y)|(x,y)∈ state space}, the Euclidean coordinates of the state;

  • color: {[0,0,0],...,[255,255,255]}, the RGB pixel values of the state;

  • terminal: {True/False}, whether the state is terminal, in which case the environment returns done=True;

  • initial: {True/False}, whether the state is initial, i.e. the agent's initial coordinates;

  • accessible: {True/False}, whether the agent can access this state. If the state is inaccessible (e.g. a wall), then the agent stays in its previous state;

  • stochastic: {True/False}, whether the reward given upon transitioning into this state is sampled from a distribution on every visit, or sampled once at the beginning and kept fixed;

  • reward_pdf: A callable function which samples rewards from the specified distribution. Below are examples of reward densities:

    • lambda: np.abs(np.random.normal(loc=1,scale=0.1,size=1)).item() - |N(1,0.1)|;
    • lambda: 10 - Dirac at 10;
    • lambda: np.random.uniform(0,1,size=1).item() - U(0,1).
  • collectible: {True/False}, whether the reward of the state is reset to the default value after the first visit during an episode (see the sketch below).
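
For illustration, a hedged sketch of how a custom state type might be specified; the dictionary format below simply mirrors the attributes above, while the actual definitions live in pixel_world/envs/env_utils.py and may differ:

import numpy as np

# Reward densities are plain callables, as in the examples above
goal_reward = lambda: np.abs(np.random.normal(loc=1, scale=0.1, size=1)).item()

# Hypothetical attribute set for a custom goal symbol, e.g. '1'
custom_goal = dict(
    coords=None,            # filled in from the map file
    color=[0, 255, 0],      # RGB pixel values rendered for this tile
    terminal=True,          # transitioning here ends the episode
    initial=False,
    accessible=True,
    stochastic=True,        # resample the reward on every visit
    reward_pdf=goal_reward,
    collectible=False,
)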

Adding new maps

This happens in three steps:

  • Add the map text file following the constraints above into maps/;
  • (Optional) Add a new navigation alphabet to pixel_world/envs/env_utils.py with custom rewards, symbols etc;
  • In pixel_world/envs/__init__.py, register the new map with a unique name and a navigation alphabet (a hedged sketch follows below).
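
A hedged sketch of the registration step; the entry point, kwargs, and map name below are assumptions, so mirror an existing entry in pixel_world/envs/__init__.py rather than copying this verbatim:

from gym.envs.registration import register

register(
    id="PixelWorld-room6_custom-v0",            # unique name for the new map (hypothetical)
    entry_point="pixel_world.envs:PixelWorld",  # assumed entry point
    kwargs={
        "dataset": "room6_custom.txt",          # the map file added to maps/ (hypothetical)
        "alphabet": "default",                  # navigation alphabet defined in env_utils.py (assumed)
    },
)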

Experimental: Task sampler

For any given map, we might want to move the goal state around to create a shared-dynamics setting.

The PixelWorldSampler class takes in a template environment, from which it erases all terminal states. Calling sampler.generate(train,train_split,test_split) generates all valid configurations of goal states, sorts the maps by the Euclidean distance between the goal and the agent, and assigns the first train_split% of the tasks to the training set and the last test_split% to the test set.
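
A hedged usage sketch; the class name, import path, environment ID, and the return value of generate are assumptions loosely based on this section:

import gym
import pixel_world
from pixel_world.env_utils import PixelWorldSampler  # assumed import path and class name

template_env = gym.make("PixelWorld-room5_medium-v0")  # hypothetical ID, as above
sampler = PixelWorldSampler(template_env)              # terminal states are erased from the template
tasks = sampler.generate(True, 0.8, 0.2)               # generate(train, train_split, test_split)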