
observation_space does not match reset() observation, though I confirmed they are identical. #921

Closed
adiyaDalat opened this issue May 26, 2022 · 4 comments
Labels: custom gym env (Issue related to Custom Gym Env), question (Further information is requested)

Comments

adiyaDalat commented May 26, 2022

The console reports "The observation returned by the reset() method does not match the given observation space", even though I printed the shape of the observation from reset() and the shape of observation_space, and they are identical.

Minimal code to reproduce:

import gym
import numpy as np
from gym import spaces

from stable_baselines3 import A2C
from stable_baselines3.common.env_checker import check_env

class CustomEnv(gym.Env):
    
    metadata = {'render.modes': ['human']}
    
    def __init__(self, df, window=1):
        
        self.feature = df.to_numpy()
        self.window = window
        self.feature_shape = (window, np.shape(self.feature)[1])
        
        self.current_tick = self.window
        
        self.action_space = spaces.Discrete(3)
        
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=self.feature_shape, dtype=np.float32)
        
        self.step_num = 0
        self.total_reward = 0
        
    
    def reset(self):
        position = 0
        self.current_tick = self.window
        
        obs = self._get_observation(position)
        return obs
    
    def _get_observation(self, position):
        obs = self.feature[(self.current_tick-self.window):self.current_tick]
        return obs

check_env(CustomEnv(df, window=3))
Traceback (most recent call last):
AssertionError: The observation returned by the `reset()` method does not match the given observation space

### Checklist

  • [x] I have read the documentation (required)
  • [x] I have checked that there is no similar issue in the repo (required)
  • [x] I have checked my env using the env checker (required)
  • [x] I have provided a minimal working example to reproduce the bug (required)

Beyond that, I have two more questions:

  1. How should I integrate the state of agent{0,1,2} into the observation? If I directly add the state as another element of the observation array, will the agent treat it as its own important state rather than as an observation of the surrounding environment? I noticed that the OpenAI gym documentation uses a dict as the observation instead of mixing them.
  2. If I want to use multiple rows of data from an array as the input observation and use an LSTM as the model, is the code above correct? SB3 suggests flattening to a 1D array, though.
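One way to keep the agent's own state apart from the market features is a `spaces.Dict` observation; a minimal sketch, where the key names and shapes are hypothetical:

```python
import numpy as np
from gym import spaces

# Hypothetical layout: the agent's own state lives under "position",
# the windowed market data under "features" (shape = (window, n_features)).
observation_space = spaces.Dict({
    "position": spaces.Discrete(3),  # e.g. one of the agent states {0, 1, 2}
    "features": spaces.Box(low=-np.inf, high=np.inf,
                           shape=(3, 4), dtype=np.float32),
})

# An observation is then a plain dict matching the space.
obs = {
    "position": 1,
    "features": np.zeros((3, 4), dtype=np.float32),
}
print(observation_space.contains(obs))  # True
```

SB3 can consume such Dict observations with its `MultiInputPolicy`, e.g. `A2C("MultiInputPolicy", env)`.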
adiyaDalat added the custom gym env and question labels on May 26, 2022
araffin (Member) commented May 26, 2022

you should check the type (and dtype)...

adiyaDalat (Author) commented May 26, 2022

@araffin Thanks! It worked! The error came from this check in gym's source code:

np.can_cast(x.dtype, self.dtype): np.can_cast(dtype('float64'), dtype('float32')) -> False
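In other words, `df.to_numpy()` yields float64 observations while the Box space was declared float32, and a float64 array cannot be safely cast down. A minimal sketch of the check and the fix (array contents are made up):

```python
import numpy as np

# gym's Box.contains uses np.can_cast with "safe" casting:
# float64 -> float32 loses precision, so the check fails.
assert not np.can_cast(np.float64, np.float32)
assert np.can_cast(np.float32, np.float64)  # the other direction is fine

# Fix: cast the observation slice to the space's dtype before returning it.
feature = np.random.randn(10, 4)                      # float64, as from df.to_numpy()
window, tick = 3, 3
obs = feature[tick - window:tick].astype(np.float32)  # now matches the Box dtype
print(obs.dtype)  # float32
```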

Could you please give some hints on these questions 🙏:

  1. How should I integrate the state of agent{0,1,2} into the observation? If I directly add the state as another element of the observation array, will the agent treat it as its own important state rather than as an observation of the surrounding environment? I noticed that the OpenAI gym documentation uses a dict as the observation rather than simply mixing them. Is there any benefit?

  2. If I want to use multiple rows of data from an array as the input observation and use an LSTM as the model, is the code above correct? SB3 suggests flattening to a 1D array, though.

Thanks a lot araffin!

araffin (Member) commented May 29, 2022

and use LSTM as model

If you want to use a recurrent policy, you don't need to change anything (Stable-Baselines-Team/stable-baselines3-contrib#53), and you should also not need framestacking (although feedforward PPO + framestacking is usually competitive; see the recurrent PPO benchmark linked from SB3-contrib).

suggested to flatten to 1D array though..

this is correct, you should flatten to 1D (mainly to avoid broadcasting issues and misinterpretation of the input).
But what you are doing looks like framestacking, which is already supported by gym and SB3 (no need to re-implement it).
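The framestacking referred to here is available out of the box, e.g. `VecFrameStack` in SB3; the idea itself is just a rolling buffer of recent observations, sketched by hand below (the class name is hypothetical, not an SB3 API):

```python
from collections import deque
import numpy as np

class FrameStacker:
    """Hand-rolled sketch of framestacking: keep the last `n` observations
    and expose them as one flat float32 vector (VecFrameStack does this
    for you on vectorized envs)."""

    def __init__(self, n):
        self.n = n
        self.frames = deque(maxlen=n)

    def reset(self, obs):
        # Fill the buffer with the first observation, as wrappers usually do.
        for _ in range(self.n):
            self.frames.append(np.asarray(obs))
        return self._stacked()

    def push(self, obs):
        self.frames.append(np.asarray(obs))
        return self._stacked()

    def _stacked(self):
        # Flatten to a 1D array, matching SB3's recommendation.
        return np.concatenate(self.frames).astype(np.float32)

stacker = FrameStacker(n=3)
out = stacker.reset(np.ones(4))
print(out.shape)  # (12,)
```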

How should I integrate the state of agent{0,1,2} into the observation?

not sure what you mean, but this sounds more like tech support/consulting, which we don't do (see the links in the issue template for alternatives)

adiyaDalat (Author) commented

Thanks for the advice! 👍
