thoughts on matchmaking #109

fables-tales · 2021-09-26T21:18:47Z

Hi folks, I wanted to serialize some thoughts on matchmaking while they were fresh, feel free to take, investigate, or ignore any of this.

Observationally, matchmaking in fall league has a few issues, and I'll do my best to order these from most concrete to least concrete:

1. If a division has too few snakes, matches aren't made. If a division is smaller than some number (8? 16?), I might suggest matchmaking its snakes with snakes in the division below. This ensures snakes that push up into the next division can still get games until that division is large enough.
2. Anecdotally, matchmaking does not appear to be providing a high diversity of opponents. I noted this when looking at 6 of my own games yesterday: 2/6 had exactly the same snakes in them, and 2/6 had 3 of 4 snakes exactly the same. I'd be interested to see a systemwide analysis.
3. This one is pure speculation, but it appears that the games played so far do not quite represent true snake strengths. I went over this with @coreyja, and we both think our snakes should be in different places from where they are (e.g. I am not confident that demifemme should have been 140 points ahead of pruzze if scores truly represented the relative strength of snakes).
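
To make point 1 concrete, here is a minimal sketch of the "borrow from the division below" idea. The function name `matchmaking_pool`, the list-of-lists layout (highest division first), and the default `min_size=8` are all assumptions for illustration; none of this reflects how Battlesnake actually stores divisions.

```python
def matchmaking_pool(divisions, index, min_size=8):
    """Return the snakes that division `index` may be matched against.

    `divisions` is a list of divisions ordered from highest to lowest,
    where each division is a list of snake names (hypothetical layout).
    """
    pool = list(divisions[index])
    lower = index + 1
    # Borrow whole divisions from below until the pool is large enough
    # or there are no lower divisions left.
    while len(pool) < min_size and lower < len(divisions):
        pool.extend(divisions[lower])
        lower += 1
    return pool
```

With `min_size=4`, a two-snake top division would draw opponents from itself plus the division directly below it, and stop borrowing once that division has grown past the threshold.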

Now for the part that is, like, pure and total wild fantasy on my part. Feel absolutely free to ignore the peanut gallery on this one, or come back with more validation or testing that says I'm wrong; I threw something at the wall with math and it seemed worth suggesting. I will absolutely defer to y'all (the Battlesnake team) as experts for points 2 and 3. I spent the afternoon noodling with some math and came up with a system based exclusively on league points, which essentially boils down to roulette wheel selection for matches. See this repo.

This is modelled after a "maximum information gain" system with a long-tail probability distribution: if we model the relative strength of snakes as sampling from a probability distribution, selecting for snakes that are similar in league score gives us the most information about the true strength of those snakes.

Essentially:

1. Each round of matches, pick a random snake.
2. Assign each other snake a probability of being selected proportional to 1/(delta_in_league_points^4+e) (I tried an exponential basis and other polynomials; ^4 seemed like a good fit).
3. Pick 3 other snakes at random according to those probabilities.
4. Repeat until all snakes are in matches.
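
The steps above can be sketched roughly as follows. This is not the code from the linked repo; it is a small illustration under a few assumptions: `e` is treated as a small positive constant that keeps the weight finite when two snakes have equal points (called `epsilon` here, with an arbitrary default of 1.0), league points live in a plain dict, and snakes left over when fewer than four remain are simply returned unmatched.

```python
import random


def selection_weights(anchor_points, candidate_points, epsilon=1.0):
    """Weight each candidate by 1 / (delta^4 + epsilon), delta = gap in league points."""
    return [1.0 / (abs(anchor_points - p) ** 4 + epsilon) for p in candidate_points]


def make_matches(league_points, match_size=4, epsilon=1.0, rng=None):
    """Group snakes into matches using roulette wheel selection.

    `league_points` maps snake name -> league points (hypothetical layout).
    Returns (matches, leftover), where leftover holds unmatched snakes.
    """
    rng = rng or random.Random()
    remaining = list(league_points)
    matches = []
    while len(remaining) >= match_size:
        # Step 1: pick a random snake to seed the match.
        anchor = remaining.pop(rng.randrange(len(remaining)))
        match = [anchor]
        for _ in range(match_size - 1):
            # Step 2: weight every remaining snake by closeness to the seed.
            weights = selection_weights(
                league_points[anchor],
                [league_points[s] for s in remaining],
                epsilon,
            )
            # Step 3: spin the wheel, removing each pick so it cannot repeat.
            pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
            match.append(remaining.pop(pick))
        matches.append(match)
    return matches, remaining
```

Because the weights fall off with the fourth power of the point gap, a snake 10 points away is weighted roughly 10,000 times lower than a snake with identical points when `epsilon` is 1, so distant opponents are possible but rare.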

I found that by doing this, roughly 70% of matches would be among snakes within 50 league points of each other, and only 18% would involve snakes with more than a 100 point difference between them. Additionally, it solves for diversity by throwing randomness in: while the vast majority of matches are with snakes very close to each other in league score, you get an occasional wildcard. I think this has a handful of nice properties:

If a snake comes in and is winning a lot of games (say, pretzel starts at zero), this system behaves about the same as a TrueSkill-based system with "division barriers": the snake will quickly score league points and move out of a given division.

It ensures a high diversity of opponents: while most games will be against similarly rated snakes, over hundreds of games you are quite likely to see many different snakes.
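
For anyone who wants to check closeness and diversity claims like these against real match data, a sketch of the two measurements might look like this. The function names and data layout are hypothetical, and "within N league points of each other" is interpreted here as the gap between the highest and lowest scoring snake in a match, which is an assumption about how the figures above were computed.

```python
from collections import defaultdict


def fraction_within(matches, league_points, max_spread):
    """Fraction of matches whose top and bottom snakes differ by at most max_spread points."""
    if not matches:
        return 0.0
    close = 0
    for match in matches:
        pts = [league_points[s] for s in match]
        if max(pts) - min(pts) <= max_spread:
            close += 1
    return close / len(matches)


def distinct_opponents(matches):
    """Map each snake to the number of different opponents it has faced."""
    seen = defaultdict(set)
    for match in matches:
        for snake in match:
            seen[snake].update(o for o in match if o != snake)
    return {snake: len(opponents) for snake, opponents in seen.items()}
```

Running both over a few hundred simulated or recorded rounds would give a systemwide view of how tight the matches are and how many different opponents each snake actually meets.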

I want to be very clear that this system needs a lot more testing before it gets put in front of a competitive league, but I do think there's something here and figured it might be worth kicking off a discussion.

With much love, I really appreciate everything y'all do, Penelope

coreyja · 2021-09-26T21:47:42Z

Maintainer Sponsor

Just to corroborate point 3, and give a bit more data: I've been comparing the Global Royale Leaderboard to the Fall League Leaderboard, and snakes that are very close to me (and above me) in the Global Leaderboard are sometimes very far below me in the Fall League.

For example, let's look at Haspid and Devious Devin:
In Global Royale:
Haspid Rank 4 (19,802 Points)
Devin Rank 15 (8,096 Points)

While in Fall League:
Devin Rank 4 (911 Points)
Haspid Rank 39 (239 Points)

This is with both having just about the same number of games played in the Fall League (Haspid 281/Devin 291).

This more randomized solution makes sense to me, and I think it would be interesting to try it out in a Global Arena or something.

fables-tales · 2021-09-26T22:25:32Z

Author

I will also note: I use the term "information gain" very loosely here. It's not fully modelled after KL divergence; rather, we expect to learn more about whether a snake is or isn't strong by having it play snakes close to it repeatedly.

SullivanC19 · 2021-12-23T09:35:06Z

Hi! First off, thank you guys for the wonderful Battlesnake platform. I just started playing around with it and have had a blast despite just missing the winter competition.

Since there are currently no contests available, I recently submitted a bot to the Global Duel Arena. Based on my experience so far, I wanted to toss in some thoughts and potentially revive this discussion:

1. With the current rating system, it takes a very long time to reach a stable rank. Since everyone starts off at rank 0 and, it seems, receives a maximum of +5 per win, it will take 4,000 games (over a month) with a perfect win rate for a brand new snake to reach the expert level.
   1a. Because of this, it can be difficult to make iterative improvements, since it can take more than a week for the pool of snakes one plays against to change significantly.
2. The number of points gained/lost is unequal. This means that snakes whose ranks have already converged are still consistently gaining rank points, and those that joined more recently have to play catch-up to compete on the ladder.
   2a. It also inflates the tier system: pretty much everyone in the Global Duel Arena is now gold or higher, and eventually they'll surpass the platinum threshold as well.
   2b. At the top of the ladder, despite having over 50,000 more points than the snake in second place, Combat Snake always plays that same snake and loses fewer ranking points on a loss than they gain on a win, which seems a bit backwards.

Based on the above and the points mentioned by others, I would agree with the creator of this thread that a TrueSkill or other Elo-based system should be used instead. This would allow for much faster rank convergence, increased diversity in matchmaking, and a fairer ranking system overall.
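
As a point of comparison for the unequal gain/loss issue, here is a minimal sketch of a textbook Elo update for a two-snake duel. It is not Battlesnake's current scoring and not TrueSkill; the 400-point scale and `k=32` are the conventional Elo defaults, used here purely as assumptions.

```python
def expected_score(rating, opponent_rating):
    """Probability that `rating` beats `opponent_rating` under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(winner_rating, loser_rating, k=32.0):
    """Return the new (winner, loser) ratings after one duel.

    The winner gains exactly what the loser drops, so total rating
    in the pool stays constant and tiers cannot inflate over time.
    """
    gain = k * (1.0 - expected_score(winner_rating, loser_rating))
    return winner_rating + gain, loser_rating - gain
```

Two properties fall out of this shape: an expected win against a much weaker snake moves the ratings very little, while an upset moves them a lot, and a strong newcomer climbs quickly because each early win against higher-rated snakes is worth close to the full `k`.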

Though this thread has been stagnant for a while, I’d love to hear feedback from others if possible!
