
Q: Why are Classic observation spaces always so sparse (e.g. one-hot, unary, etc.)? #442

Closed
rallen10 opened this issue Aug 12, 2021 · 4 comments

Comments

@rallen10
Contributor

I'm developing a PettingZoo-based environment that is in the same family as the classic environments. I am trying to decide on the most effective way to encode the observation and action spaces, and I noticed that many (all?) of the classic environments use very sparse observation spaces. For example, Chess uses a separate 8x8 channel for each color-piece combination, and Hanabi uses unary encoding for cards, deck size, etc.
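
For concreteness, here is a rough sketch of that channel-per-piece idea (my own illustration; the function and piece lists are made up for the example, not PettingZoo's actual code):

```python
import numpy as np

# Hypothetical illustration: encode an 8x8 board with one binary
# channel per (color, piece-type) combination.
PIECE_TYPES = ["pawn", "knight", "bishop", "rook", "queen", "king"]
COLORS = ["white", "black"]

def encode_board(piece_at):
    """piece_at: dict mapping (row, col) -> (color, piece_type)."""
    obs = np.zeros((len(COLORS) * len(PIECE_TYPES), 8, 8), dtype=np.float32)
    for (row, col), (color, piece) in piece_at.items():
        channel = COLORS.index(color) * len(PIECE_TYPES) + PIECE_TYPES.index(piece)
        obs[channel, row, col] = 1.0  # every entry is 0 or 1, hence "sparse"
    return obs

# A board with just two pieces yields a 12x8x8 tensor that is almost all zeros.
obs = encode_board({(0, 4): ("white", "king"), (7, 4): ("black", "king")})
print(obs.shape, obs.sum())  # (12, 8, 8) 2.0
```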

I think of one-hot encoding as a tool for categorical data; this paradigm therefore seems to treat most (all?) observations in classic environments as categorical (which, to be fair, many of them are).

Why is it done this way? Is it simply modeled on AlphaZero, which also seemed to use sparse observation spaces, or is there literature that supports the widespread use of sparse encoding? Is it just to be on the safe side by treating all observations as categorical? Doesn't this come at the cost of obscuring relational data in observations (e.g. card order in Hanabi, Texas Hold'em, etc.) and expanding the observation space dimensions, which can slow learning?

This is part of a larger question on why sparse encoding seems to be so common in mature RL environments, not just PettingZoo. For example, even AlphaStar (for StarCraft II) seemed to use one-hot encoding for almost everything, even clearly non-categorical data like unit health.

@rallen10
Contributor Author

rallen10 commented Aug 12, 2021

Upon further consideration, I'm thinking this is just a normalization scheme to make all observation data in the range [0,1].
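
If that's the motivation, the contrast looks something like this (a toy sketch of my own, with made-up numbers; Hanabi's real encoder is more involved):

```python
import numpy as np

deck_size, max_deck_size = 30, 50

# Scalar normalization: one float in [0, 1], but it bakes in an
# ordinal/linear relationship between deck sizes.
scalar_obs = np.array([deck_size / max_deck_size], dtype=np.float32)

# Unary ("thermometer") encoding in the style Hanabi uses: also entirely
# in [0, 1], but each bit is an independent feature the network can
# weight separately.
unary_obs = np.zeros(max_deck_size, dtype=np.float32)
unary_obs[:deck_size] = 1.0
```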

@jkterry1
Member

I don't believe that any formal literature on this problem exists. The intuition, after talking to people (at DeepMind etc.), is to unambiguously specify every single state without potentially implying unnecessary relationships, in a sane data structure where illegal actions can be masked and everything sits in a [0, 1] scheme, and to "let the neural network figure it out" if additional relationships exist. There is actually evidence that alternate schemes may be viable: recently RL achieved superhuman performance at Dou Dizhu (a super famous three-player game in China) for the first time using a wildly different observation space, which broke a ton of stuff for us. The discussion is scattered all over, but I believe you can see the gist here and in the linked papers: datamllab/rlcard#228. However, no systematic study of these schemes exists (and it's not clear to me how that could even be done).
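
For readers landing here later: the "data structure where illegal actions can be masked" part looks roughly like the observation/action-mask dict the classic environments expose. A hedged sketch (shapes and numbers are made up for illustration):

```python
import numpy as np

# A flat observation with every feature in [0, 1], paired with a mask
# marking which actions are legal in the current state.
obs = {
    "observation": np.zeros(100, dtype=np.float32),  # all features in [0, 1]
    "action_mask": np.array([1, 0, 1, 1, 0], dtype=np.int8),  # 1 = legal
}

def masked_argmax(logits, action_mask):
    """Pick the highest-scoring legal action by pushing illegal logits to -inf."""
    masked = np.where(action_mask.astype(bool), logits, -np.inf)
    return int(np.argmax(masked))

logits = np.array([0.2, 9.0, 0.5, 0.1, 3.0])
print(masked_argmax(logits, obs["action_mask"]))  # 2: action 1 scores highest but is illegal
```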

@benblack769 feel free to chime in here

@jkterry1
Member

I'll also add one more point: why are sparse observations inherently so bad? I don't think they are, unless you're worried about embedded systems or the like.

@rallen10
Contributor Author

rallen10 commented Aug 12, 2021

Thanks for the insights. I don't know that they are inherently "bad", but it's also not immediately obvious why they would be inherently "good". Given their widespread use in mature RL environments, though, there does seem to be some kind of inherent benefit.

As for drawbacks of sparse observations, they can drastically increase the dimensionality of the observation space. My initial reaction is that a higher-dimensional observation space is slower to learn from than an equivalent lower-dimensional one for non-categorical data (e.g. Discrete(5) with dim=1 would be more sample efficient than OneHot(5) with dim=5), but I don't have any empirical evidence to back up this intuition, nor do I know whether it would be generally applicable.
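
To make the comparison concrete, here is a toy sketch of the two encodings (my own illustration; OneHot is not a real Gym space, just shorthand for the vector below):

```python
import numpy as np

value = 3  # an observation taking one of 5 possible values

# Discrete(5)-style: a single scalar. Compact, but implies an ordering
# (3 is "closer" to 4 than to 0), which may or may not be meaningful.
dense_obs = np.array([value / 4.0], dtype=np.float32)  # dim = 1, in [0, 1]

# One-hot: five independent inputs. No implied ordering, but 5x the width.
onehot_obs = np.eye(5, dtype=np.float32)[value]  # dim = 5
```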

I'll leave this open a bit longer in case anyone else wants to chime in, then I'll close the issue.
