Q: Why are Classic observation spaces always so sparse (e.g. one-hot, unary, etc.)? #442
Comments
Upon further consideration, I'm thinking this is just a normalization scheme to make all observation data fall in the range [0, 1].
I don't believe any formal literature on this problem exists. The intuition, after talking to people (at DeepMind etc.), is to unambiguously specify every single state without implying unnecessary relationships, in a sane data structure where illegal actions can be masked and everything fits a [0, 1] scheme, and to "let the neural network figure it out" if additional relationships exist. There is actually evidence that alternate schemes may be viable: RL recently achieved superhuman performance at Dou Dizhu (a very popular three-player game in China) for the first time using a wildly different observation space, which broke a ton of stuff for us. The discussion was scattered all over, but I believe you can see the gist here and in the linked papers: datamllab/rlcard#228. However, no study of these schemes exists (and it's not clear to me how that could even be done). @benblack769 feel free to chime in here
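To make the masking idea above concrete, here is a minimal sketch of how illegal actions are typically excluded from a policy's output: send their logits to negative infinity before the softmax so they get exactly zero probability. The function name and shapes are illustrative, not any PettingZoo or RLCard API.

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Zero out illegal actions by setting their logits to -inf
    before the softmax, so they receive zero probability."""
    masked = np.where(legal_mask, logits, -np.inf)
    # Subtract the max over legal actions for numerical stability.
    exp = np.exp(masked - masked[legal_mask].max())
    return exp / exp.sum()

logits = np.array([1.0, 2.0, 0.5, 3.0])
legal_mask = np.array([True, False, True, True])
probs = masked_policy(logits, legal_mask)
# probs[1] is exactly 0.0; the remaining probabilities sum to 1.
```

The key property is that masking happens before normalization, so probability mass is redistributed only among legal actions.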
I'll also add a further point: why are sparse observations inherently so bad? I don't think they are, unless you're worried about embedded systems or the like.
Thanks for the insights. I don't know that they are inherently "bad", but it's also not immediately obvious why they would be inherently "good". However, their widespread use in mature RL environments seems to imply some kind of inherent benefit. As for drawbacks, sparse observations can drastically increase the dimensionality of the observation space. My initial reaction is that higher-dimensional observation spaces are slower to learn than an equivalent lower-dimensional space for non-categorical data (e.g. Discrete(5) with dim=1 would be more sample efficient than OneHot(5) with dim=5); but I don't have any empirical evidence to back up this intuition, nor do I know if it would be generally applicable. I'll leave this open a bit longer in case anyone else wants to chime in, then I'll close the issue.
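For anyone following along, the Discrete(5) vs. OneHot(5) comparison above is just this (helper names are made up for the sketch; `OneHot` is not a standard Gym/Gymnasium space):

```python
import numpy as np

def as_discrete(value):
    """Discrete(5)-style encoding: a single scalar in {0,...,4}, dim = 1."""
    return np.array([value])

def as_one_hot(value, n=5):
    """One-hot encoding: a length-n indicator vector in {0, 1}, dim = n."""
    vec = np.zeros(n)
    vec[value] = 1.0
    return vec

print(as_discrete(3).shape)  # (1,)
print(as_one_hot(3))         # [0. 0. 0. 1. 0.]
```

The one-hot version carries the same information in five dimensions instead of one, but without imposing an ordering on the five categories.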
I'm developing a PettingZoo-based environment that is in the same family as the classic environments. I am trying to decide the most effective way to encode the observation and action spaces, and I noticed that many (all?) of the classic environments use very sparse observation spaces. For example, Chess uses a separate 8x8 channel for each color-piece combination. Hanabi uses unary encoding of cards, deck size, etc.
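For reference, the chess-style layout described above amounts to one binary 8x8 plane per (color, piece-type) pair, something like this (plane ordering and piece placement are illustrative, not the exact PettingZoo or AlphaZero layout):

```python
import numpy as np

COLORS, PIECE_TYPES = 2, 6  # white/black; pawn, knight, bishop, rook, queen, king
obs = np.zeros((COLORS * PIECE_TYPES, 8, 8), dtype=np.float32)

def set_piece(obs, color, piece_type, rank, file):
    """Mark a square as occupied on the plane for this color/piece pair."""
    obs[color * PIECE_TYPES + piece_type, rank, file] = 1.0

set_piece(obs, color=0, piece_type=0, rank=1, file=4)  # a white pawn on e2
print(obs.shape)       # (12, 8, 8)
print(int(obs.sum()))  # 1 — one occupied square so far
```

So a single board position occupies 12x8x8 = 768 binary values, even though at most 32 squares are occupied; that's the sparsity being asked about.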
I think of one-hot encoding as something used for categorical data; this paradigm therefore seems to treat most (all?) observations in classic environments as categorical (which, to be fair, many of them are).
Why is it done this way? Is it simply modeled on AlphaZero, which also seemed to use sparse observation spaces, or is there literature that supports the widespread use of sparse encoding? Is it just to be on the safe side by treating all observations as categorical? Does this not come at the cost of obscuring relational data in observations (e.g. card order in Hanabi, Texas Hold'em, etc.) and expanding the observation space's dimensionality, which can slow learning?
This is part of a larger question on why sparse encoding seems to be so common in mature RL environments, not just PettingZoo. For example, even AlphaStar (for StarCraft II) seemed to use one-hot encoding for almost everything, even clearly non-categorical data like unit health.