[Bug Report] Collision detection failure in Ant-UMaze #259
Comments
Yup, it looks like we have a flying ant, you gotta love RL :) The first attachment is a video showing an example of the ant jumping on top of the maze. The second attachment is a scatter plot which shows that when the ant is in the same
Edit: It looks like GitHub isn't able to play my video, so here is a YouTube link: https://youtube.com/shorts/pqJv8c8wTuU?feature=share
Accidentally closed the issue; we still need to figure out how to prevent the ant from jumping on top of the obstacles.
rootz should not be above 1 meter; if it is, the environment should terminate. Are you sure you are checking for
Yes, I am pretty sure I am checking for

import numpy as np
import gymnasium_robotics
import gymnasium as gym
from typing import Tuple, Dict


class AntMazeInfoWrapper(gym.Wrapper):
    def __init__(self, env, start_pos=None, goal_pos=None, noise_level=0.0):
        super(AntMazeInfoWrapper, self).__init__(env)
        self.start_pos = start_pos
        self.goal_pos = goal_pos
        self.noise_level = noise_level
        self.unwrapped.position_noise_range = 0.
        if self.goal_pos is not None:
            self.unwrapped.maze._unique_goal_locations = [self.goal_pos]
        if self.start_pos is not None:
            self.unwrapped.maze._unique_reset_locations = [self.start_pos]

    def reset(self) -> Tuple[np.ndarray, Dict]:
        obs, info = self.env.reset()
        return self._get_obs(obs), self._modify_info(info, obs)

    def step(self, action) -> Tuple[np.ndarray, float, bool, Dict]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        done = terminated or truncated
        if terminated:
            print(f'Terminated at {obs["achieved_goal"]} with reward {reward}')
        return self._get_obs(obs), reward, done, self._modify_info(info, obs)

    def _get_obs(self, obs: Dict):
        pos = obs['achieved_goal']
        return np.concatenate([pos, obs['observation']])

    def _modify_info(self, info: Dict, obs: Dict):
        info['start_pos'] = self.start_pos
        info['goal_pos'] = self.goal_pos
        info['pos'] = obs['achieved_goal']
        info['goal'] = obs['desired_goal']
        return info


def environment_builder(
    level_name: str = 'AntMaze_UMaze-v5',
    max_episode_steps: int = 800,
    randomize_start_pos: bool = False,
    randomize_goal_pos: bool = False,
    noise_level: float = 0.0
):
    env = gym.make(
        level_name,
        render_mode='rgb_array',
        max_episode_steps=max_episode_steps,
        continuing_task=False)
    env = AntMazeInfoWrapper(
        env,
        start_pos=np.array([-4., -4.]) if not randomize_start_pos else None,
        goal_pos=np.array([-4., 4.]) if not randomize_goal_pos else None,
        noise_level=noise_level,
    )
    return env

And here is the control loop I used to generate the video I posted in my previous post; as you can see, I am checking for the
Could it be because the AntMaze class is ignoring the terminated flag from the inner mujoco ant env?
It is indeed doing that: see Gymnasium-Robotics/gymnasium_robotics/envs/maze/ant_maze_v5.py, lines 293 to 294 in 3719d9d.
As to why, I don't know; it was never an issue before, as far as I can tell. The simplest solution would be to create new files
@abagaria want to try implementing it?
@Kallinteris-Andreas happy to!
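The behavior discussed in the comments above, where the maze env discards the `terminated` flag returned by the inner ant env, can be illustrated with a small library-free sketch. Everything here is hypothetical and for illustration only: `StubAntEnv`, `MazeSketch`, the `propagate_termination` switch, and the assumed healthy height range of 0.2 to 1.0 are stand-ins, not the actual Gymnasium-Robotics classes or the fix that was eventually implemented.

```python
from typing import Any, Dict, List, Tuple

# ASSUMPTION: healthy torso-height range used by the stub (illustrative values).
HEALTHY_Z_RANGE = (0.2, 1.0)


class StubAntEnv:
    """Hypothetical stand-in for the inner ant env; replays scripted torso heights."""

    def __init__(self, heights: List[float]):
        self.heights = list(heights)
        self.t = 0

    def step(self, action: Any) -> Tuple[Dict, float, bool, bool, Dict]:
        z = self.heights[self.t]
        self.t += 1
        lo, hi = HEALTHY_Z_RANGE
        # The inner env terminates when the torso leaves the healthy range.
        terminated = not (lo <= z <= hi)
        return {"z": z}, 0.0, terminated, False, {}


class MazeSketch:
    """Hypothetical outer env that can either drop or forward the inner flag."""

    def __init__(self, inner: StubAntEnv, propagate_termination: bool):
        self.inner = inner
        self.propagate_termination = propagate_termination

    def step(self, action: Any) -> Tuple[Dict, float, bool, bool, Dict]:
        obs, _, inner_terminated, _, info = self.inner.step(action)
        goal_reached = False  # placeholder: goal logic is omitted in this sketch
        terminated = goal_reached or (
            self.propagate_termination and inner_terminated
        )
        # Expose the inner flag so callers can inspect it either way.
        info = dict(info, inner_terminated=inner_terminated)
        return obs, 0.0, terminated, False, info


def rollout(propagate: bool) -> List[bool]:
    """Return the outer `terminated` flag for three scripted steps."""
    env = MazeSketch(StubAntEnv([0.5, 0.8, 1.4]), propagate)
    return [env.step(None)[2] for _ in range(3)]


if __name__ == "__main__":
    print("flag dropped:  ", rollout(False))
    print("flag forwarded:", rollout(True))
```

With the flag dropped, the episode keeps running even after the scripted height exceeds the assumed limit, which is consistent with the ant ending up on top of the walls without a termination being reported.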
Description of the bug
While using AntMaze_UMaze-v5 alongside a pseudocount exploration algorithm, I noticed that the ant can go through the walls in the maze. Initially, this is a rare occurrence, but since I am training a novelty-based exploration algorithm, the agent is able to recreate the issue with greater reliability over time.
Code example
Here is how I am creating the env:
Not that this should be important, but for context, I am using the TD3 algorithm and CFN for novelty-based intrinsic rewards.
Versioning
gymnasium_robotics: 1.3.1
gymnasium: 1.0.0
python: 3.9.20
Supporting Evidence
In the attached image, I have plotted the (x, y) coordinates of the ant (according to the states saved in the replay buffer). The color of each point denotes the novelty prediction, but that can be ignored for our purposes. The purple lines show where the walls should be (approximately), and the red circle highlights a trajectory that goes through the wall near the start state and exits near the goal state (which is at (-4, 4)).
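A check like the one done visually in the scatter plot can also be sketched in code: map each saved (x, y) position to a maze cell and flag the positions whose cell is a wall. This is a rough diagnostic under stated assumptions, not part of the original report: the U-maze layout, the cell size of 4.0, and the maze being centered at the origin are assumptions chosen to agree with the start (-4, -4) and goal (-4, 4) mentioned above, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# ASSUMPTION: U-maze layout (1 = wall, 0 = free), centered at the origin.
U_MAZE = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
# ASSUMPTION: side length of one maze cell, in the same units as (x, y).
CELL_SIZE = 4.0


def xy_to_cell(x: float, y: float) -> Tuple[int, int]:
    """Map continuous (x, y) to a (row, col) index; row 0 is the top of the maze."""
    n_rows, n_cols = len(U_MAZE), len(U_MAZE[0])
    col = math.floor(x / CELL_SIZE + n_cols / 2)
    row = math.floor(n_rows / 2 - y / CELL_SIZE)
    return row, col


def in_wall(x: float, y: float) -> bool:
    """True if (x, y) lies in a wall cell or outside the maze bounds."""
    row, col = xy_to_cell(x, y)
    if not (0 <= row < len(U_MAZE) and 0 <= col < len(U_MAZE[0])):
        return True  # leaving the maze entirely also counts as a violation
    return U_MAZE[row][col] == 1


def flag_wall_positions(positions: Sequence[Tuple[float, float]]) -> List[int]:
    """Return the indices of positions whose torso center is inside a wall."""
    return [i for i, (x, y) in enumerate(positions) if in_wall(x, y)]


if __name__ == "__main__":
    # Hypothetical positions: start, a point inside the middle wall, the goal.
    replay_xy = [(-4.0, -4.0), (-4.0, 0.0), (-4.0, 4.0)]
    print(flag_wall_positions(replay_xy))
```

Because this only tests the torso center, it misses cases where the ant merely clips a wall edge, but it is enough to pick out trajectories that cross straight through the wall between the start and the goal.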