thoughts on matchmaking #109
-
Just to corroborate point 3 and give a bit more data: for example, let's look at Haspid and Devious Devin while in the Fall League: This is with both having played just about the same number of games in the Fall League (Haspid 281, Devin 291). This more randomized solution makes sense to me, and I think it would be interesting to try it out in a Global Arena or something similar.
-
I'll also note that I use the term "information gain" very loosely here; it isn't strictly modelled on KL divergence. Rather, we expect to learn more about whether a snake is or isn't strong by having it repeatedly play snakes close to it.
-
Hi! First off, thank you guys for the wonderful Battlesnake platform. I just started playing around with it and have had a blast, despite just missing the winter competition. Since there are currently no contests available, I recently submitted a bot to the Global Duel Arena. Based on my experience so far, I wanted to toss in some thoughts and potentially revive this discussion:
Based on the above and the points mentioned by others, I agree with the creator of this thread that TrueSkill or another Elo-based system should be used instead. This would allow much faster rank convergence, more diverse matchmaking, and a fairer ranking system overall. Though this thread has been stagnant for a while, I'd love to hear feedback from others if possible!
-
Hi folks, I wanted to serialize some thoughts on matchmaking while they were fresh. Feel free to take, investigate, or ignore any of this.
Observationally, matchmaking in the Fall League has a few issues, and I'll do my best to order them from most concrete to least concrete:
Now for the part that is pure and total wild fantasy on my part. Feel absolutely free to ignore the peanut gallery on this one, or come back with validation or testing that says I'm wrong; I threw something at the wall with math and it seemed worth suggesting. I will absolutely defer to y'all (the Battlesnake team) as the experts on points 2 and 3. I spent the afternoon noodling with some math and came up with a system based exclusively on league points, which essentially boils down to roulette wheel selection for matches. See this repo.
This is modelled after a "maximum information gain" system with a long-tailed probability distribution: if we model the relative strength of snakes as samples from a probability distribution, then pairing snakes with similar league scores gives us the most information about their true strength.
Essentially, the probability of pairing two snakes is proportional to:
1 / (Δ^4 + e), where Δ is the difference in their league points
(I tried an exponential basis and other polynomials; ^4 seemed like a good fit.) I found that, roughly speaking, about 70% of matches would be between snakes within 50 league points of each other, and only 18% would be between snakes more than 100 points apart. Additionally, it solves for diversity by throwing in randomness: while the vast majority of matches are between snakes very close to each other in league score, you get an occasional wildcard. I think this has a handful of nice properties:
If a snake comes in and wins a lot of games (say, pretzel starts at zero), this system behaves about the same as a TrueSkill-based system with "division barriers": the snake will quickly score league points and move out of a given division.
It ensures a high diversity of opponents: while most games will be against similarly rated snakes, over hundreds of games you are quite likely to see many different snakes.
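To make the roulette wheel idea concrete, here is a minimal sketch in Python, not the code from the linked repo. It assumes each candidate opponent's slice of the wheel is proportional to 1 / (Δ^4 + e), and, since the post doesn't say what e is, treats it as a tunable positive constant (assumed default 1.0) that keeps the weight finite when two snakes have equal points. The function names `match_weight`, `match_probabilities`, and `pick_opponent` are hypothetical.

```python
import random
from collections.abc import Mapping


def match_weight(points_a: float, points_b: float, e: float = 1.0) -> float:
    """Unnormalized pairing weight 1 / (delta^4 + e).

    `e` is an assumed positive constant so that snakes with identical
    league points still get a finite weight.
    """
    delta = points_a - points_b
    return 1.0 / (delta ** 4 + e)


def match_probabilities(
    snake: str, standings: Mapping[str, float], e: float = 1.0
) -> dict[str, float]:
    """Probability that `snake` is paired with each other snake.

    Roulette wheel selection: each opponent's slice of the wheel is its
    weight divided by the total weight of all candidate opponents.
    """
    weights = {
        name: match_weight(standings[snake], points, e)
        for name, points in standings.items()
        if name != snake  # a snake never plays itself
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def pick_opponent(
    snake: str, standings: Mapping[str, float], rng: random.Random, e: float = 1.0
) -> str:
    """Spin the wheel once and return the chosen opponent's name."""
    probs = match_probabilities(snake, standings, e)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]
```

For example, with hypothetical standings such as `{"alpha": 100, "bravo": 110, "charlie": 150, "delta": 300}`, `pick_opponent("alpha", standings, random.Random(0))` will almost always return the nearest snake, with distant snakes appearing only as occasional wildcards. How often those wildcards show up depends on how snakes are spread across the standings, so this sketch does not attempt to reproduce the 70%/18% figures above.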
I want to be very clear that this system needs a lot more testing before it gets put in front of a competitive league, but I do think there's something here and figured it might be worth kicking off a discussion.
With much love; I really appreciate everything y'all do. Penelope