
Commit

Merge pull request #6 from dylan-asmar/refactor_state_space_map
Refactor state space using a graph approach
dylan-asmar authored Dec 11, 2023
2 parents 91e71f7 + 8e92121 commit afe8cba
Showing 30 changed files with 1,100 additions and 570 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -1,4 +1,6 @@
Manifest.toml
.DS_Store
docs/build/
.vscode
.vscode
policy.out
model.pomdpx
12 changes: 8 additions & 4 deletions Project.toml
@@ -1,16 +1,20 @@
name = "TagPOMDPProblem"
uuid = "8a653263-a1cc-4cf9-849f-f530f6ffc800"
version = "0.1.1"
version = "0.2.0"

[deps]
Graphs = "86223c79-3864-5bf0-83f7-82e725a168b6"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
MetaGraphs = "626554b9-1ddb-594c-aa3c-2596fe9399a5"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"

[compat]
julia = "1.6"
Graphs = "1.9"
LinearAlgebra = "1.6"
MetaGraphs = "0.7"
POMDPTools = "0.1"
POMDPs = "0.9"
Plots = "1.23"
POMDPTools = "0.1"
julia = "1.6"
81 changes: 61 additions & 20 deletions README.md
@@ -3,16 +3,12 @@
[![Build Status](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/dylan-asmar/TagPOMDPProblem.jl/actions/workflows/CI.yml)
[![codecov](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl/branch/main/graph/badge.svg?token=UNYWMYUBDL)](https://codecov.io/gh/dylan-asmar/TagPOMDPProblem.jl)
[![](https://img.shields.io/badge/docs-stable-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/stable)
[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://dylan-asmar.github.io/TagPOMDPProblem.jl/dev)


The Tag [1] problem with the [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl) interface.

[1] Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” in *IJCAI* 2003 ([link](https://www.ijcai.org/Proceedings/03/Papers/147.pdf))



![Tag Demo](./gifs/tag_SARSOP.gif)
![Tag Demo](./gifs/default.gif)

## Installation
Use `]` to get to the package manager to add the package.
@@ -21,16 +17,14 @@ julia> ]
pkg> add TagPOMDPProblem
```


## Problem description
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent.
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent.

- **States**: position of the robot and target and whether the target has been tagged or not

- **Actions**: The agent can move in the four cardinal directions or perform the tag action. When performing the `tag` action, the robot does not move. The target moves during `tag` if the robot and target are not at the same location.

- **Transition model**: The movement of the agent is deterministic based on its selected action. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. See the [transitions.jl](https://github.com/dylan-asmar/TagPOMDPProblem.jl/blob/b0100ddb39b27990a70668187d6f1de8acb50f1e/src/transition.jl#L11) for details. The transition function from the original implementation can be used by passing `orig_transition_fcn = true`.

- **Transition model**: The movement of the agent is deterministic based on its selected action. The opponent moves stochastically according to a fixed policy away from the agent: it moves away with probability `move_away_probability` and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper, allowing more movement away from the agent and thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that would result in hitting a wall to other actions that result in moving away. See `transition.jl` for details. The transition function from the original implementation can be used by passing `transition_option=:orig` (see the construction sketch below).

- **Observation model**: The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The number of observations is one more than the number of grid squares (e.g. 30 observations for the default problem).
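The keyword names that appear above can be set when constructing the problem. A minimal sketch (illustrative values, not the package defaults):

```julia
using TagPOMDPProblem

# Illustrative values; `move_away_probability` and `transition_option` are the
# parameters described in the bullets above.
pomdp = TagPOMDP(;
    move_away_probability = 0.8,
    transition_option = :orig  # use the original paper's transition function
)
```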

@@ -46,31 +40,78 @@ using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs

pomdp = TagPOMDP()

solver = SARSOPSolver(; timeout=150)
policy = solve(solver, pomdp)

sim = GifSimulator(filename="test.gif", max_steps=50)
sim = GifSimulator(;
filename="default.gif",
max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Example](./gifs/test.gif)
![Tag Example](./docs/src/gifs/default.gif)


### Larger Grid
### Larger Map
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs
using SARSOP
using POMDPGifs
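# Map string format (inferred from the examples): each character is one grid cell;
# 'o' is a cell the robot and target can occupy, 'x' is not part of the map.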

map_str = """
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
filename="larger.gif",
max_steps=50
)
simulate(sim, pomdp, policy)
```

grid = TagGrid(;bottom_grid=(12, 4), top_grid=(6, 5), top_grid_x_attach_pt=3)
pomdp = TagPOMDP(;tag_grid=grid)
![Tag Larger Map Example](./docs/src/gifs/larger.gif)

### Map with Obstacles
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs
using Random

map_str = """
xxxxxxxxxx
xoooooooox
xoxoxxxxox
xoxoxxxxox
xoxooooxox
xoxoxxoxox
xoxoxxoxox
xoxoxxoxox
xoooooooox
xxxxxxxxxx
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(filename="test_larger.gif", max_steps=50)
sim = GifSimulator(;
filename="boundary.gif",
max_steps=50,
rng=Random.MersenneTwister(1)
)
simulate(sim, pomdp, policy)
```

![Tag Larger Grid Example](./gifs/test_larger.gif)
![Obstacle Map Example](./docs/src/gifs/boundary.gif)
5 changes: 1 addition & 4 deletions docs/Project.toml
@@ -1,6 +1,3 @@
[deps]
BeliefUpdaters = "8bb6e9a1-7d73-552c-a44a-e5dc5634aac4"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
POMDPModelTools = "08074719-1b2a-587c-a292-00f91cc44415"
POMDPs = "a93abf59-7444-517b-a68a-c42f96afdd7d"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
POMDPTools = "7588e00f-9cae-40de-98dc-e0c70c48cdd7"
9 changes: 4 additions & 5 deletions docs/make.jl
@@ -2,18 +2,17 @@ using Documenter

push!(LOAD_PATH, "../src/")

using Documenter, TagPOMDPProblem
using Documenter, TagPOMDPProblem, POMDPTools

makedocs(
sitename = "TagPOMDPProblem.jl",
authors="Dylan Asmar",
modules = [TagPOMDPProblem],
format = Documenter.HTML(),


doctest=false,
checkdocs=:exports
)

deploydocs(
repo = "github.com/dylan-asmar/TagPOMDPProblem.jl.git",
devbranch = "dev",
repo = "github.com/dylan-asmar/TagPOMDPProblem.jl.git"
)
Binary file added docs/src/gifs/boundary.gif
Binary file added docs/src/gifs/default.gif
Binary file added docs/src/gifs/larger.gif
118 changes: 114 additions & 4 deletions docs/src/index.md
@@ -2,12 +2,122 @@

Tag POMDP problem using [POMDPs.jl](https://github.com/JuliaPOMDP/POMDPs.jl). Original problem was presented in Pineau, Joelle et al. “Point-based value iteration: An anytime algorithm for POMDPs.” IJCAI (2003) ([online here](https://www.ijcai.org/Proceedings/03/Papers/147.pdf)).

The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action. The movement of the agent is deterministic based on its selected action. A reward of `step_penalty` is imposed for each motion action and the tag action results in a `tag_reward` for a successful tag and `tag_penalty` otherwise. The agent’s position is fully observable but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy away from the agent. The opponent moves away from the agent `move_away_probability` of the time and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper allowing more movement away from the agent, thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that result in hitting a wall to other actions that result in moving away. The original transition function is available by passing `orig_transition_fcn = true` during creation of the problem.
The goal of the agent is to tag the opponent by performing the tag action while in the same square as the opponent. The agent can move in the four cardinal directions or perform the tag action. The movement of the agent is deterministic based on its selected action. A reward of `step_penalty` is imposed for each motion action, and the tag action results in a `tag_reward` for a successful tag and a `tag_penalty` otherwise. The agent’s position is fully observable, but the opponent’s position is unobserved unless both actors are in the same cell. The opponent moves stochastically according to a fixed policy away from the agent: it moves away with probability `move_away_probability` and stays in the same cell otherwise. The implementation of the opponent’s movement policy varies slightly from the original paper, allowing more movement away from the agent and thus making the scenario slightly more challenging. This implementation redistributes the probabilities of actions that would result in hitting a wall to other actions that result in moving away. The original transition function is available by passing `transition_option=:orig` when creating the problem.
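A minimal construction sketch using the parameters named above (the keyword names follow the prose; the values are illustrative assumptions, not the package defaults):

```julia
using TagPOMDPProblem

# Illustrative values only; see the TagPOMDP docstring for the actual defaults.
pomdp = TagPOMDP(;
    tag_reward = 10.0,
    tag_penalty = -10.0,
    step_penalty = -1.0,
    move_away_probability = 0.8,
    transition_option = :orig
)
```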

## Manual Outline

```@contents
```

## Installation
Use `]` to get to the package manager to add the package.
```julia
julia> ]
pkg> add TagPOMDPProblem
```

## Examples

### Default Problem
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP # load a POMDP Solver
using POMDPGifs # to make gifs

pomdp = TagPOMDP()
solver = SARSOPSolver(; timeout=150)
policy = solve(solver, pomdp)
sim = GifSimulator(;
filename="default.gif",
max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Example](./gifs/default.gif)


### Larger Map
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs

map_str = """
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
xxooooooxxxxxxx
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
ooooooooooooooo
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
filename="larger.gif",
max_steps=50
)
simulate(sim, pomdp, policy)
```

![Tag Larger Map Example](./gifs/larger.gif)

### Map with Obstacles
```julia
using POMDPs
using TagPOMDPProblem
using SARSOP
using POMDPGifs
using Random

map_str = """
xxxxxxxxxx
xoooooooox
xoxoxxxxox
xoxoxxxxox
xoxooooxox
xoxoxxoxox
xoxoxxoxox
xoxoxxoxox
xoooooooox
xxxxxxxxxx
"""
pomdp = TagPOMDP(;map_str=map_str)
solver = SARSOPSolver(; timeout=600)
policy = solve(solver, pomdp)

sim = GifSimulator(;
filename="boundary.gif",
max_steps=50,
rng=Random.MersenneTwister(1)
)
simulate(sim, pomdp, policy)
```

![Obstacle Map Example](./gifs/boundary.gif)


# Exported Functions
```@docs
TagPOMDP()
TagGrid()
TagState
POMDPTools.render(::TagPOMDP, ::Any)
TagPOMDP
TagGrid
TagState
```
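A short usage sketch for the exported `render` method. The step form passed as the second argument and the returned `Plots` object are assumptions based on common POMDPTools rendering conventions, not confirmed by this diff:

```julia
using POMDPs
using POMDPTools
using TagPOMDPProblem
using Plots

pomdp = TagPOMDP()
s = rand(initialstate(pomdp))   # sample an initial TagState
plt = render(pomdp, (s = s,))   # assumed: a NamedTuple step containing the state
savefig(plt, "tag_state.png")
```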

# Internal Functions
```@docs
TagPOMDPProblem.list_actions(::TagPOMDP)
TagPOMDPProblem.create_metagraph_from_map(::String)
TagPOMDPProblem.map_str_from_metagraph(::TagPOMDP)
TagPOMDPProblem.state_from_index(::TagPOMDP, ::Int)
TagPOMDPProblem.modified_transition(::TagPOMDP, ::TagState, ::Int)
TagPOMDPProblem.orig_transition(::TagPOMDP, ::TagState, ::Int)
TagPOMDPProblem.move_direction(::TagPOMDP, ::Int, ::Int)
```
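A sketch of exercising the internal helpers listed above. These functions are not exported, so the return values as used here are assumptions based on their names and signatures:

```julia
using TagPOMDPProblem

pomdp = TagPOMDP()

# Assumed to return the map string stored in the problem's metagraph.
println(TagPOMDPProblem.map_str_from_metagraph(pomdp))

# Assumed to map a linear state index back to a TagState.
s = TagPOMDPProblem.state_from_index(pomdp, 1)
@show s
```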
Binary file removed gifs/tag_SARSOP.gif
Binary file not shown.
Binary file removed gifs/test.gif
Binary file not shown.
Binary file removed gifs/test_larger.gif
Binary file not shown.
40 changes: 40 additions & 0 deletions scripts/check_vs_original.jl
@@ -0,0 +1,40 @@
"""
This script compares the performance of this Tag POMDP implementation against the original
implementation using SARSOP. The original implementation is available at:
https://bigbird.comp.nus.edu.sg/pmwiki/farm/appl/index.php?n=Main.Repository
"""

using Pkg
Pkg.add("SARSOP")
Pkg.add("StatsBase")
Pkg.add("ProgressMeter")

using POMDPs
using POMDPTools
using TagPOMDPProblem
using SARSOP
using StatsBase
using ProgressMeter

sarsop_timeout = 5
num_sims = 5000

pomdp = TagPOMDP(; transition_option=:orig)
solver = SARSOPSolver(; timeout=sarsop_timeout)
policy = solve(solver, pomdp)

sim = RolloutSimulator(; max_steps=50)

rewards = []
@showprogress dt=1 desc="Running simulations..." for ii in 1:num_sims
r = simulate(sim, pomdp, policy)
push!(rewards, r)
end

# Print out the mean and 95% confidence interval
println("Original SARSOP performance: $(-6.13) +/- $(0.12)")
println("Reward (w/ 95% CI): $(mean(rewards)) +/- $(1.96 * std(rewards) / sqrt(length(rewards)))")

Pkg.rm("SARSOP")
Pkg.rm("StatsBase")
Pkg.rm("ProgressMeter")

2 comments on commit afe8cba

@dylan-asmar
Member Author


@JuliaRegistrator


Registration pull request created: JuliaRegistries/General/96863

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.2.0 -m "<description of version>" afe8cba3dc4d0196c052aba95367ca5bc369ffc2
git push origin v0.2.0
