
[Bug]: Using the start value of Discrete spaces has no effect #2052

Open
JoshuaBluem opened this issue Dec 3, 2024 · 5 comments
Labels
  • documentation (Improvements or additions to documentation)
  • duplicate (This issue or pull request already exists)
  • good first issue (Good for newcomers)
  • help wanted (Help from contributors is welcomed)

Comments

@JoshuaBluem

🐛 Bug

For example, spaces.Discrete(3, start=-1) does not produce actions in {-1, 0, 1} but rather in {0, 1, 2}, in both the step function and the predict function.

To Reproduce

Code sample: environment with a Discrete action space that does not start from zero

from typing import Any, Optional, SupportsFloat
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

'''
Trains an agent to repeat a number.
The expected action is 11 if the current state is 1 and 10 if the state is 0.
The current state is chosen randomly.
Each episode consists of a single choice and terminates immediately afterwards.
'''

class CustomEnv(gym.Env):
    def __init__(self, render_mode='human'):
        super().__init__()
        self.observation_space = spaces.Discrete(2)  # Observation space (0 or 1)
        self.action_space = spaces.Discrete(2, start=10)  # Action space (10 or 11)
        self.current_state = np.random.choice([0, 1])  # Initial state (random 0 or 1)
        self.render_mode = render_mode

    def step(self, action) -> tuple[object, SupportsFloat, bool, bool, dict[str, Any]]:
        # Execute action and compute reward based on repeating numbers
        print(f"received action: {action}")
        reward = 1 if action - 10 == self.current_state else 0  # reward 1 if the shifted action matches the state
        self.current_state = np.random.choice([0, 1])  # draw a new state for the next step

        terminated = True   # each episode ends after a single choice
        truncated = False   # episodes are never cut short (e.g. by a time limit)

        return self.current_state, reward, terminated, truncated, {}

    def reset(self, *,
        seed: Optional[int] = None,
        options: Optional[dict[str, Any]] = None
    ) -> tuple[object, dict[str, Any]]:
        self.current_state = np.random.choice([0, 1]) # Reset to initial state
        return self.current_state, {}

    def render(self) -> str | list[str] | None:
        if self.render_mode == 'human':
            print(f"Current state: {self.current_state}")
        elif self.render_mode == 'ansi':
            return str(self.current_state)
        else:
            print(f"unknown renderer mode: {self.render_mode}")

env = DummyVecEnv([lambda: CustomEnv(render_mode='ansi')]) 
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=1000, reset_num_timesteps=False)

Relevant log output / Error message

The received actions should be 10's and 11's, not 0's and 1's, because the action space was defined as spaces.Discrete(2, start=10).

System Info

  • OS: Windows-10-10.0.19045-SP0 10.0.19045
  • Python: 3.11.9
  • Stable-Baselines3: 2.3.2
  • PyTorch: 2.4.0+cu118
  • GPU Enabled: True
  • Numpy: 1.26.4
  • Cloudpickle: 3.0.0
  • Gymnasium: 0.29.1
  • OpenAI Gym: 0.25.1

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
@JoshuaBluem added the bug label Dec 3, 2024
@araffin added the duplicate and check the checklist labels Dec 3, 2024
@JoshuaBluem
Author

Hey @araffin, this is my first interaction with public git repositories. Could you help me and tell me what I am missing on this issue, given your comment "You have checked the required items in the checklist but you didn't do what's written ..."?
Also, since you flagged this issue as a duplicate, could you link the issue that this one duplicates?
I would really appreciate your help!! I just want to help this project.

@araffin
Member

araffin commented Dec 4, 2024

Hello,

A quick search in the issues will give you #913, and also the related #1295 and #1378.

We currently also have an explicit warning in our env checker:

def _check_non_zero_start(space: spaces.Space, space_type: str = "observation", key: str = "") -> None:
    """
    :param space: Observation or action space
    :param space_type: information about whether it is an observation or action space
        (for the warning message)
    :param key: When the observation space comes from a Dict space, we pass the
        corresponding key to have more precise warning messages. Defaults to "".
    """
    if isinstance(space, (spaces.Discrete, spaces.MultiDiscrete)) and not _starts_at_zero(space):
        maybe_key = f"(key='{key}')" if key else ""
        warnings.warn(
            f"{type(space).__name__} {space_type} space {maybe_key} with a non-zero start (start={space.start}) "
            "is not supported by Stable-Baselines3. "
            f"You can use a wrapper or update your {space_type} space."
        )
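For illustration (a sketch, not part of the original thread): running the env checker on the CustomEnv from the reproduction code above should hit exactly this code path, since its action space is Discrete(2, start=10).

from stable_baselines3.common.env_checker import check_env

# Assumes the CustomEnv class from the reproduction code above is in scope.
# Its action space is Discrete(2, start=10), so _check_non_zero_start warns
# that a non-zero start is not supported and suggests a wrapper or an
# updated action space.
check_env(CustomEnv(), warn=True)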

I would really appreciate your help!! I just want to help this project.

I appreciate your motivation and that's also why we have a contributing guide to get you started: https://github.com/DLR-RM/stable-baselines3/blob/master/CONTRIBUTING.md

Currently, as written in the other issues and in the env checker, there is a simple alternative, which is to use a wrapper (or to update the env to do the offset there).
However, looking at your PR, it seems to require fewer code changes than I expected. Your PR is currently also missing tests and doesn't seem to cover the off-policy algorithms.

@araffin
Member

araffin commented Dec 6, 2024

Currently, as written in the other issues and in the env checker, there is a simple alternative, which is to use a wrapper (or to update the env to do the offset there).

Sorry for not being clear enough. After looking again at the PR, I think it would be better to document what such a wrapper would look like rather than make the codebase more complex.

I guess we should add something like:

class ShiftWrapper(gym.Wrapper):
    """Allow using Discrete() action spaces with start != 0"""
    def __init__(self, env: gym.Env) -> None:
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.Discrete)
        # Expose a zero-based action space to the agent
        self.action_space = gym.spaces.Discrete(env.action_space.n, start=0)

    def step(self, action: int):
        # self.env.action_space still carries the original non-zero start
        return self.env.step(action + self.env.action_space.start)

to our documentation, the same way we document how to use isaac sim with sb3.
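For example (a usage sketch, assuming CustomEnv and the imports from the reproduction code above), the wrapper would be applied before vectorizing the env:

env = DummyVecEnv([lambda: ShiftWrapper(CustomEnv(render_mode='ansi'))])
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=1000)
# PPO samples zero-based actions (0 or 1); ShiftWrapper adds the original
# start (10) before forwarding, so CustomEnv.step receives 10 or 11.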

@araffin added the documentation, good first issue, and help wanted labels and removed the bug and check the checklist labels Dec 6, 2024
@JoshuaBluem
Author

After looking again at the PR, I think it would be better to document what such a wrapper would look like rather than make the codebase more complex.

I am confident that everyone will perceive the current state of discrete action spaces in the project as a bug. If users are required to implement the ShiftWrapper themselves, it will likely be viewed as a workaround. If such a ShiftWrapper is truly required, it should already be implemented in the library rather than added individually by each user.

From my perspective, adding an additional wrapper does not improve readability, as its logic differs from the Box scaling logic. Box spaces already get scaled, and the primary change in my pull request is the "elif isinstance(..., (Discrete, MultiDiscrete))" block. Its logic largely mirrors the initial "if" block for Box spaces, using "scale" and "unscale". Therefore, anyone who understands the first "if" statement should also be able to understand the second. In fact, they might even understand the first one better upon encountering the same logic applied to a different space.

The handling of discrete action spaces in the scale and unscale methods also follows the same logic as the Box scaling approach: transforming between the action bounds and a standard range.
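To make the analogy concrete, here is a hypothetical sketch of the approach being described (not the actual PR code): SB3 policies scale Box actions between the env bounds and [-1, 1], and the same pattern, applied to a Discrete space with a non-zero start, would shift between the env's range and a zero-based one.

import numpy as np
from gymnasium import spaces

def scale_action(action_space, action):
    if isinstance(action_space, spaces.Box):
        # Existing Box logic in SB3: map [low, high] -> [-1, 1]
        low, high = action_space.low, action_space.high
        return 2.0 * ((action - low) / (high - low)) - 1.0
    elif isinstance(action_space, spaces.Discrete):
        # Hypothetical analogue: map {start, ..., start + n - 1} -> {0, ..., n - 1}
        return action - action_space.start
    return action

def unscale_action(action_space, scaled_action):
    if isinstance(action_space, spaces.Box):
        low, high = action_space.low, action_space.high
        return low + 0.5 * (scaled_action + 1.0) * (high - low)
    elif isinstance(action_space, spaces.Discrete):
        return scaled_action + action_space.start
    return scaled_action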

@araffin
Member

araffin commented Dec 11, 2024

I am confident that everyone will perceive the current state of discrete action spaces in the project as a bug.

SB3 intentionally doesn't support all features from Gymnasium.
This is a choice made for simplicity, to keep the project maintainable (see the SB3 blog post).
That's also why we have an env checker (see the docs, also linked in the issue template): to avoid any bad surprises for our users.

If users are required to implement the ShiftWrapper themselves, it will likely be viewed as a workaround.
From my perspective, adding an additional wrapper does not improve readability,

It is a workaround, but a simple one, and it is not the only solution.
A better solution is to have start=0 in your env and simply shift the action inside the step method (a two-line change, vs 10+ lines in SB3).
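A minimal sketch of that alternative, rewriting the CustomEnv from this issue so the offset lives inside the env (same task and imports as the reproduction code; only the action space line and the shift line change):

class CustomEnv(gym.Env):
    def __init__(self, render_mode='human'):
        super().__init__()
        self.observation_space = spaces.Discrete(2)
        self.action_space = spaces.Discrete(2)  # start=0, as SB3 expects
        self.current_state = np.random.choice([0, 1])
        self.render_mode = render_mode

    def step(self, action):
        action = action + 10  # shift into the {10, 11} range inside the env
        reward = 1 if action - 10 == self.current_state else 0
        self.current_state = np.random.choice([0, 1])
        return self.current_state, reward, True, False, {}

    # reset() and render() stay the same as in the original env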

I'm also sorry that you put work into code that might not be used in the end.
This is why we require opening an issue first: to discuss and agree on a solution before implementing anything (see the contributing guide and PR template).
