<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-38514290-2']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Multi-players simulation environment — SMPyBandits 0.9.6 documentation</title>
<script type="text/javascript" src="_static/js/modernizr.min.js"></script>
<script type="text/javascript" id="documentation_options" data-url_root="./" src="_static/documentation_options.js"></script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/language_data.js"></script>
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
<script async="async" type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/latest.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/x-mathjax-config">MathJax.Hub.Config({"tex2jax": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true, "ignoreClass": "document", "processClass": "math|output_area"}})</script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Doubling Trick for Multi-Armed Bandits" href="DoublingTrick.html" />
<link rel="prev" title="Policy aggregation algorithms" href="Aggregation.html" />
</head>
<body class="wy-body-for-nav">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search" >
<a href="index.html" class="icon icon-home"> SMPyBandits
<img src="_static/logo.png" class="logo" alt="Logo"/>
</a>
<div class="version">
0.9
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="README.html"><em>SMPyBandits</em></a></li>
<li class="toctree-l1"><a class="reference internal" href="docs/modules.html">SMPyBandits modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="How_to_run_the_code.html">How to run the code ?</a></li>
<li class="toctree-l1"><a class="reference internal" href="PublicationsWithSMPyBandits.html">List of research publications using Lilian Besson’s SMPyBandits project</a></li>
<li class="toctree-l1"><a class="reference internal" href="Aggregation.html"><strong>Policy aggregation algorithms</strong></a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#"><strong>Multi-players simulation environment</strong></a><ul>
<li class="toctree-l2"><a class="reference internal" href="#collision-models">Collision models</a></li>
<li class="toctree-l2"><a class="reference internal" href="#more-details-on-the-code">More details on the code</a></li>
<li class="toctree-l2"><a class="reference internal" href="#policies-designed-to-be-used-in-the-multi-players-setting">Policies designed to be used in the multi-players setting</a></li>
<li class="toctree-l2"><a class="reference internal" href="#configuration">Configuration:</a></li>
<li class="toctree-l2"><a class="reference internal" href="#how-to-run-the-experiments">How to run the experiments ?</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#some-illustrations-of-multi-players-simulations">Some illustrations of multi-players simulations</a></li>
<li class="toctree-l3"><a class="reference internal" href="#fairness-vs-unfairness">Fairness vs. unfairness</a></li>
<li class="toctree-l3"><a class="reference internal" href="#scroll-license-github-license">📜 License ? </a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="DoublingTrick.html"><strong>Doubling Trick for Multi-Armed Bandits</strong></a></li>
<li class="toctree-l1"><a class="reference internal" href="SparseBandits.html"><strong>Structure and Sparsity of Stochastic Multi-Armed Bandits</strong></a></li>
<li class="toctree-l1"><a class="reference internal" href="NonStationaryBandits.html"><strong>Non-Stationary Stochastic Multi-Armed Bandits</strong></a></li>
<li class="toctree-l1"><a class="reference internal" href="API.html">Short documentation of the API</a></li>
<li class="toctree-l1"><a class="reference internal" href="About_parallel_computations.html">About parallel computations</a></li>
<li class="toctree-l1"><a class="reference internal" href="TODO.html">💥 TODO</a></li>
<li class="toctree-l1"><a class="reference internal" href="plots/README.html">Some illustrations for this project</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/README.html">Jupyter Notebooks 📓 by Naereen @ GitHub</a></li>
<li class="toctree-l1"><a class="reference internal" href="notebooks/list.html">List of notebooks for SMPyBandits</a></li>
<li class="toctree-l1"><a class="reference internal" href="Profiling.html">A note on execution times, speed and profiling</a></li>
<li class="toctree-l1"><a class="reference internal" href="uml_diagrams/README.html">UML diagrams</a></li>
<li class="toctree-l1"><a class="reference internal" href="logs/README.html"><code class="docutils literal notranslate"><span class="pre">logs</span></code> files</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">SMPyBandits</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li><strong>Multi-players simulation environment</strong></li>
<li class="wy-breadcrumbs-aside">
<a href="_sources/MultiPlayers.md.txt" rel="nofollow"> View page source</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<style>
/* CSS overrides for sphinx_rtd_theme */
/* 24px margin */
.nbinput.nblast,
.nboutput.nblast {
margin-bottom: 19px; /* padding has already 5px */
}
/* ... except between code cells! */
.nblast + .nbinput {
margin-top: -19px;
}
.admonition > p:before {
margin-right: 4px; /* make room for the exclamation icon */
}
/* Fix math alignment, see https://github.com/rtfd/sphinx_rtd_theme/pull/686 */
.math {
text-align: unset;
}
</style>
<div class="section" id="multi-players-simulation-environment">
<h1><strong>Multi-players simulation environment</strong><a class="headerlink" href="#multi-players-simulation-environment" title="Permalink to this headline">¶</a></h1>
<blockquote>
<div><p><strong>For more details</strong>, refer to <a class="reference external" href="https://hal.inria.fr/hal-01629733">this article</a>.
Reference: <a class="reference external" href="https://hal.inria.fr/hal-01629733">[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]</a>, presented at the <a class="reference external" href="http://www.cs.cornell.edu/conferences/alt2018/index.html#accepted">International Conference on Algorithmic Learning Theory 2018</a>.</p>
</div></blockquote>
<blockquote>
<div><p>PDF : <a class="reference external" href="https://hal.inria.fr/hal-01629733/document">BK__ALT_2018.pdf</a> | HAL notice : <a class="reference external" href="https://hal.inria.fr/hal-01629733/">BK__ALT_2018</a> | BibTeX : <a class="reference external" href="https://hal.inria.fr/hal-01629733/bibtex">BK__ALT_2018.bib</a> | <a class="reference external" href="MultiPlayers.html">Source code and documentation</a>
<a class="reference external" href="http://www.cs.cornell.edu/conferences/alt2018/index.html#accepted"><img alt="Published" src="https://img.shields.io/badge/Published%3F-accepted-green.svg" /></a> <a class="reference external" href="https://bitbucket.org/lbesson/multi-player-bandits-revisited/commits/"><img alt="Maintenance" src="https://img.shields.io/badge/Maintained%3F-yes-green.svg" /></a> <a class="reference external" href="https://bitbucket.org/lbesson/ama"><img alt="Ask Me Anything !" src="https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg" /></a></p>
</div></blockquote>
<p>There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play <em>against each other</em>, in a multi-player setting.</p>
<p>The basic difference is about <strong>collisions</strong>: at each time <code class="docutils literal notranslate"><span class="pre">$t$</span></code>, if two or more users choose to sense the same channel, there is a <em>collision</em>. Collisions can be handled in different ways, from the base station's point of view and from each player's point of view.</p>
<div class="section" id="collision-models">
<h2>Collision models<a class="headerlink" href="#collision-models" title="Permalink to this headline">¶</a></h2>
<p>For example, I implemented these different collision models, in <a class="reference external" href="https://smpybandits.github.io/docs/Environment.CollisionModels.html"><code class="docutils literal notranslate"><span class="pre">CollisionModels.py</span></code></a>:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">noCollision</span></code> is a limited model <em>where</em> all players can sample an arm even when they collide. It corresponds to the single-player simulation: each player is a policy, and the policies are compared without collisions. It is meant for testing only and is not very interesting.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">onlyUniqUserGetsReward</span></code> is a simple collision model where only the players alone on one arm sample it and receive the reward. This is the default collision model in the literature, see for instance collision model 1 in <a class="reference external" href="https://arxiv.org/abs/0910.2065v3">[Shamir et al., 2015]</a> or <a class="reference external" href="https://arxiv.org/abs/0910.2065v3">[Liu & Zhao, 2009]</a>. <a class="reference external" href="https://hal.inria.fr/hal-01629733">Our article</a> also focuses on this model.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">rewardIsSharedUniformly</span></code> is similar: the players alone on one arm sample it and receive the reward, and when more than one player is on the same arm, only one of them (chosen uniformly at random by the base station) can sample it and receive the reward.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">closerUserGetsReward</span></code> is similar but uses another approach to choose who can transmit. Instead of randomly choosing the lucky player, it uses a given (or random) vector indicating the <em>distance</em> of each player to the base station (it can also indicate the quality of the communication), and when two (or more) players collide, only the one closest to the base station can transmit. It is the most physically plausible model.</p></li>
</ul>
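<p>As a rough illustration of the models listed above, here is a minimal sketch in plain Python. It is not the code of <code class="docutils literal notranslate"><span class="pre">CollisionModels.py</span></code>: the function names, their signatures and the convention of returning one reward per player are assumptions made for this example only.</p>

```python
import random
from collections import defaultdict


def group_players_by_arm(choices):
    """Map each chosen arm to the list of players who chose it."""
    players_on_arm = defaultdict(list)
    for player, arm in enumerate(choices):
        players_on_arm[arm].append(player)
    return players_on_arm


def only_uniq_user_gets_reward(choices, arm_rewards):
    """Only a player who is alone on her arm receives its reward."""
    rewards = [0.0] * len(choices)
    for arm, players in group_players_by_arm(choices).items():
        if len(players) == 1:
            rewards[players[0]] = arm_rewards[arm]
    return rewards


def reward_is_shared_uniformly(choices, arm_rewards, rng=random):
    """On each arm, one player drawn uniformly at random gets the reward."""
    rewards = [0.0] * len(choices)
    for arm, players in group_players_by_arm(choices).items():
        rewards[rng.choice(players)] = arm_rewards[arm]
    return rewards


def closer_user_gets_reward(choices, arm_rewards, distances):
    """On each arm, the player closest to the base station gets the reward."""
    rewards = [0.0] * len(choices)
    for arm, players in group_players_by_arm(choices).items():
        closest = min(players, key=lambda player: distances[player])
        rewards[closest] = arm_rewards[arm]
    return rewards


# Three players: players 0 and 1 collide on arm 2, player 2 is alone on arm 0.
choices = [2, 2, 0]
arm_rewards = [1.0, 0.0, 1.0]   # hypothetical rewards drawn by the arms at this time step
distances = [0.7, 0.2, 0.5]     # hypothetical distances to the base station
print(only_uniq_user_gets_reward(choices, arm_rewards))          # [0.0, 0.0, 1.0]
print(closer_user_gets_reward(choices, arm_rewards, distances))  # [0.0, 1.0, 1.0]
```

<p>In this sketch the three models differ only in how they pick the winner among the players sharing an arm, which is why they can share the same grouping helper.</p>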
</div>
<hr class="docutils" />
<div class="section" id="more-details-on-the-code">
<h2>More details on the code<a class="headerlink" href="#more-details-on-the-code" title="Permalink to this headline">¶</a></h2>
<p>Have a look at:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://smpybandits.github.io/docs/main_multiplayers.html"><code class="docutils literal notranslate"><span class="pre">main_multiplayers.py</span></code></a> and <a class="reference external" href="https://smpybandits.github.io/docs/configuration_multiplayers.html"><code class="docutils literal notranslate"><span class="pre">configuration_multiplayers.py</span></code></a> to run and configure the simulation,</p></li>
<li><p>the <a class="reference external" href="https://smpybandits.github.io/docs/Environment.EvaluatorMultiPlayers.html"><code class="docutils literal notranslate"><span class="pre">EvaluatorMultiPlayers</span></code></a> class that performs the simulation,</p></li>
<li><p>the <a class="reference external" href="https://smpybandits.github.io/docs/Environment.ResultMultiPlayers.html"><code class="docutils literal notranslate"><span class="pre">ResultMultiPlayers</span></code></a> class to store the results,</p></li>
<li><p>and some naive policies, implemented in the <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers/"><code class="docutils literal notranslate"><span class="pre">PoliciesMultiPlayers/</span></code></a> folder. So far, there are the <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.Selfish.html"><code class="docutils literal notranslate"><span class="pre">Selfish</span></code></a>, <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.CentralizedFixed.html"><code class="docutils literal notranslate"><span class="pre">CentralizedFixed</span></code></a>, <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.CentralizedCycling.html"><code class="docutils literal notranslate"><span class="pre">CentralizedCycling</span></code></a>, <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.OracleNotFair.html"><code class="docutils literal notranslate"><span class="pre">OracleNotFair</span></code></a> and <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.OracleFair.html"><code class="docutils literal notranslate"><span class="pre">OracleFair</span></code></a> multi-players policies.</p></li>
</ul>
</div>
<div class="section" id="policies-designed-to-be-used-in-the-multi-players-setting">
<h2>Policies designed to be used in the multi-players setting<a class="headerlink" href="#policies-designed-to-be-used-in-the-multi-players-setting" title="Permalink to this headline">¶</a></h2>
<ul class="simple">
<li><p>The first one I implemented is the <a class="reference external" href="https://arxiv.org/abs/1512.02866">“Musical Chair”</a> policy, from <a class="reference external" href="https://arxiv.org/abs/0910.2065v3">[Shamir et al., 2015]</a>, in <a class="reference external" href="https://smpybandits.github.io/docs/Policies.MusicalChair.html"><code class="docutils literal notranslate"><span class="pre">MusicalChair</span></code></a>.</p></li>
<li><p>Then I implemented the <a class="reference external" href="https://arxiv.org/abs/1404.5421">“MEGA”</a> policy from <a class="reference external" href="https://arxiv.org/abs/1404.5421">[Avner & Mannor, 2014]</a>, in <a class="reference external" href="https://smpybandits.github.io/docs/Policies.MEGA.html"><code class="docutils literal notranslate"><span class="pre">MEGA</span></code></a>. But it has too many parameters, and the question is how to choose them.</p></li>
<li><p>The <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiplayers.rhoRand.html"><code class="docutils literal notranslate"><span class="pre">rhoRand</span></code></a> policy and its variants are from [<a class="reference external" href="http://ieeexplore.ieee.org/document/5462144/">Distributed Algorithms for Learning…, Anandkumar et al., 2010</a>].</p></li>
<li><p>Our algorithms introduced in <a class="reference external" href="https://hal.inria.fr/hal-01629733">[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]</a> are in <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiplayers.RandTopM.html"><code class="docutils literal notranslate"><span class="pre">RandTopM</span></code></a>: <code class="docutils literal notranslate"><span class="pre">RandTopM</span></code> and <code class="docutils literal notranslate"><span class="pre">MCTopM</span></code>.</p></li>
<li><p>We also studied the <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiplayers.Selfish.html"><code class="docutils literal notranslate"><span class="pre">Selfish</span></code></a> policy in depth, without being able to prove that it is as efficient as <code class="docutils literal notranslate"><span class="pre">rhoRand</span></code>, <code class="docutils literal notranslate"><span class="pre">RandTopM</span></code> and <code class="docutils literal notranslate"><span class="pre">MCTopM</span></code>.</p></li>
</ul>
</div>
<hr class="docutils" />
<div class="section" id="configuration">
<h2>Configuration:<a class="headerlink" href="#configuration" title="Permalink to this headline">¶</a></h2>
<p>A simple Python file, <a class="reference external" href="https://smpybandits.github.io/docs/configuration_multiplayers.html"><code class="docutils literal notranslate"><span class="pre">configuration_multiplayers.py</span></code></a>, is used to import the <a class="reference external" href="Arms/">arm classes</a> and the <a class="reference external" href="Policies/">policy classes</a>, and to define the problems and the experiments.
See the explanations given for <a class="reference internal" href="Aggregation.html"><span class="doc">the single-player case</span></a>.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">configuration</span><span class="p">[</span><span class="s2">"successive_players"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">CentralizedMultiplePlay</span><span class="p">(</span><span class="n">NB_PLAYERS</span><span class="p">,</span> <span class="n">klUCB</span><span class="p">,</span> <span class="n">nbArms</span><span class="p">)</span><span class="o">.</span><span class="n">children</span><span class="p">,</span>
<span class="n">RandTopM</span><span class="p">(</span><span class="n">NB_PLAYERS</span><span class="p">,</span> <span class="n">klUCB</span><span class="p">,</span> <span class="n">nbArms</span><span class="p">)</span><span class="o">.</span><span class="n">children</span><span class="p">,</span>
<span class="n">MCTopM</span><span class="p">(</span><span class="n">NB_PLAYERS</span><span class="p">,</span> <span class="n">klUCB</span><span class="p">,</span> <span class="n">nbArms</span><span class="p">)</span><span class="o">.</span><span class="n">children</span><span class="p">,</span>
<span class="n">Selfish</span><span class="p">(</span><span class="n">NB_PLAYERS</span><span class="p">,</span> <span class="n">klUCB</span><span class="p">,</span> <span class="n">nbArms</span><span class="p">)</span><span class="o">.</span><span class="n">children</span><span class="p">,</span>
<span class="n">rhoRand</span><span class="p">(</span><span class="n">NB_PLAYERS</span><span class="p">,</span> <span class="n">klUCB</span><span class="p">,</span> <span class="n">nbArms</span><span class="p">)</span><span class="o">.</span><span class="n">children</span><span class="p">,</span>
<span class="p">]</span>
</pre></div>
</div>
<ul class="simple">
<li><p>The multi-players policies are added by giving a list of their children (e.g., <code class="docutils literal notranslate"><span class="pre">Selfish(*args).children</span></code>), which are instances of the proxy class <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.ChildPointer.html"><code class="docutils literal notranslate"><span class="pre">ChildPointer</span></code></a>. Each method call on a child is simply passed back to the mother class (the multi-players policy, e.g., <code class="docutils literal notranslate"><span class="pre">Selfish</span></code>), which can then handle the calls as it wants (in a centralized way or not).</p></li>
</ul>
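<p>The mother/children pattern described above can be sketched as follows. This is only an illustration of the proxy idea, not the actual <code class="docutils literal notranslate"><span class="pre">ChildPointer</span></code> implementation: the class names <code class="docutils literal notranslate"><span class="pre">ChildPointerSketch</span></code> and <code class="docutils literal notranslate"><span class="pre">CyclingMotherSketch</span></code>, and the method names, are hypothetical.</p>

```python
class ChildPointerSketch:
    """Proxy for one player: forwards every call to the mother policy."""

    def __init__(self, mother, player_id):
        self.mother = mother
        self.player_id = player_id

    def choice(self):
        return self.mother.choice_for(self.player_id)

    def get_reward(self, arm, reward):
        self.mother.get_reward_for(self.player_id, arm, reward)


class CyclingMotherSketch:
    """Toy centralized mother policy: players cycle over the arms with distinct offsets."""

    def __init__(self, nb_players, nb_arms):
        self.nb_arms = nb_arms
        self.time = [0] * nb_players
        self.total_reward = [0.0] * nb_players
        # One proxy child per player; this list is what a simulator would receive.
        self.children = [ChildPointerSketch(self, i) for i in range(nb_players)]

    def choice_for(self, player_id):
        # Distinct offsets keep the players on different arms at each time step
        # (assuming there are at least as many arms as players).
        return (player_id + self.time[player_id]) % self.nb_arms

    def get_reward_for(self, player_id, arm, reward):
        self.time[player_id] += 1
        self.total_reward[player_id] += reward


mother = CyclingMotherSketch(nb_players=3, nb_arms=5)
for step in range(2):
    arms = [child.choice() for child in mother.children]
    print(step, arms)  # step 0: [0, 1, 2], step 1: [1, 2, 3]
    for child, arm in zip(mother.children, arms):
        child.get_reward(arm, 1.0)
```

<p>The simulator only ever talks to the children, so the same evaluation loop works whether the mother coordinates the players (as in this toy example) or lets each child run its own independent policy.</p>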
</div>
<hr class="docutils" />
<div class="section" id="how-to-run-the-experiments">
<h2><a class="reference internal" href="How_to_run_the_code.html"><span class="doc">How to run the experiments ?</span></a><a class="headerlink" href="#how-to-run-the-experiments" title="Permalink to this headline">¶</a></h2>
<p>You should use the provided <a class="reference external" href="Makefile"><code class="docutils literal notranslate"><span class="pre">Makefile</span></code></a> file to do this simply:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="c1"># if not already installed, otherwise update with 'git pull'</span>
git clone https://github.com/SMPyBandits/SMPyBandits/
<span class="nb">cd</span> SMPyBandits
make install <span class="c1"># install the requirements ONLY ONCE</span>
make multiplayers <span class="c1"># run and log the main_multiplayers.py script</span>
make moremultiplayers <span class="c1"># run and log the main_more_multiplayers.py script</span>
</pre></div>
</div>
<hr class="docutils" />
<div class="section" id="some-illustrations-of-multi-players-simulations">
<h3>Some illustrations of multi-players simulations<a class="headerlink" href="#some-illustrations-of-multi-players-simulations" title="Permalink to this headline">¶</a></h3>
<p><img alt="plots/MP__K9_M6_T5000_N500__4_algos__all_RegretCentralized____env1-1_8318947830261751207.png" src="_images/MP__K9_M6_T5000_N500__4_algos__all_RegretCentralized____env1-1_8318947830261751207.png" /></p>
<blockquote>
<div><p>Figure 1 : Regret, <code class="docutils literal notranslate"><span class="pre">$M=6$</span></code> players, <code class="docutils literal notranslate"><span class="pre">$K=9$</span></code> arms, horizon <code class="docutils literal notranslate"><span class="pre">$T=5000$</span></code>, against <code class="docutils literal notranslate"><span class="pre">$500$</span></code> problems <code class="docutils literal notranslate"><span class="pre">$\mu$</span></code> uniformly sampled in <code class="docutils literal notranslate"><span class="pre">$[0,1]^K$</span></code>. rhoRand (top blue curve) is outperformed by the other algorithms (and the gain increases with <code class="docutils literal notranslate"><span class="pre">$M$</span></code>). MCTopM (bottom yellow) outperforms all the other algorithms in most cases.</p>
</div></blockquote>
<p><img alt="plots/MP__K9_M6_T10000_N1000__4_algos__all_RegretCentralized_loglog____env1-1_8200873569864822246.png" src="_images/MP__K9_M6_T10000_N1000__4_algos__all_RegretCentralized_loglog____env1-1_8200873569864822246.png" />
<img alt="plots/MP__K9_M6_T10000_N1000__4_algos__all_HistogramsRegret____env1-1_8200873569864822246.png" src="_images/MP__K9_M6_T10000_N1000__4_algos__all_HistogramsRegret____env1-1_8200873569864822246.png" /></p>
<blockquote>
<div><p>Figure 2 : Regret (in loglog scale), for <code class="docutils literal notranslate"><span class="pre">$M=6$</span></code> players and <code class="docutils literal notranslate"><span class="pre">$K=9$</span></code> arms, horizon <code class="docutils literal notranslate"><span class="pre">$T=5000$</span></code>, for <code class="docutils literal notranslate"><span class="pre">$1000$</span></code> repetitions on problem <code class="docutils literal notranslate"><span class="pre">$\mu=[0.1,\ldots,0.9]$</span></code>. RandTopM (yellow curve) outperforms Selfish (green), and both clearly outperform rhoRand. The regret of MCTopM is logarithmic, empirically with the same slope as the lower bound. The <code class="docutils literal notranslate"><span class="pre">$x$</span></code> axes of the regret histograms have a different scale for each algorithm.</p>
</div></blockquote>
<p><a class="reference external" href="plots/MP__K9_M3_T123456_N100__8_algos__all_RegretCentralized_semilogy____env1-1_7803645526012310577.png">plots/MP__K9_M3_T123456_N100__8_algos__all_RegretCentralized_semilogy____env1-1_7803645526012310577.png</a></p>
<blockquote>
<div><p>Figure 3 : Regret (in log scale on the <code class="docutils literal notranslate"><span class="pre">$y$</span></code> axis) for <code class="docutils literal notranslate"><span class="pre">$M=3$</span></code> players and <code class="docutils literal notranslate"><span class="pre">$K=9$</span></code> arms, horizon <code class="docutils literal notranslate"><span class="pre">$T=123456$</span></code>, for <code class="docutils literal notranslate"><span class="pre">$100$</span></code> repetitions on problem <code class="docutils literal notranslate"><span class="pre">$\mu=[0.1,\ldots,0.9]$</span></code>. With the parameters from their respective articles, MEGA and MusicalChair fail completely, even when MusicalChair is given the horizon.</p>
</div></blockquote>
<blockquote>
<div><p>These illustrations come from my article, <a class="reference external" href="https://hal.inria.fr/hal-01629733">[Multi-Player Bandits Revisited, Lilian Besson and Emilie Kaufmann, 2017]</a>, presented at the <a class="reference external" href="http://www.cs.cornell.edu/conferences/alt2018/index.html#accepted">International Conference on Algorithmic Learning Theory 2018</a>.</p>
</div></blockquote>
</div>
<hr class="docutils" />
<div class="section" id="fairness-vs-unfairness">
<h3>Fairness vs. unfairness<a class="headerlink" href="#fairness-vs-unfairness" title="Permalink to this headline">¶</a></h3>
<p>For a multi-player policy, being fair means that on <em>every</em> simulation with <code class="docutils literal notranslate"><span class="pre">$M$</span></code> players, each player accesses each of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms (about) the same number of times.
It is important to highlight that this has to hold on each run of the MP policy: having this property on average is NOT enough.</p>
<ul class="simple">
<li><p>For instance, the oracle policy <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.OracleNotFair.html"><code class="docutils literal notranslate"><span class="pre">OracleNotFair</span></code></a> assigns each of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> players to one of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms, orthogonally, but once they are assigned they always pull this arm. It’s unfair because one player will be lucky and assigned to the best arm, while the others are unlucky. The centralized regret is optimal (zero, on average), but it is not fair.</p></li>
<li><p>The other oracle policy, <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.OracleFair.html"><code class="docutils literal notranslate"><span class="pre">OracleFair</span></code></a>, assigns an offset to each of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> players corresponding to one of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms, orthogonally, and once they are assigned they cycle among the best <code class="docutils literal notranslate"><span class="pre">$M$</span></code> arms. It’s fair because every player will pull each of the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms an equal number of times. The centralized regret is also optimal (zero, on average).</p></li>
<li><p>Usually, the <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.Selfish.html"><code class="docutils literal notranslate"><span class="pre">Selfish</span></code></a> policy is <em>not</em> fair: as each player is selfish and tries to maximize her personal reward, there is no reason for them to share the time on the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms.</p></li>
<li><p>Similarly, the <a class="reference external" href="https://smpybandits.github.io/docs/Policies.MusicalChair.html"><code class="docutils literal notranslate"><span class="pre">MusicalChair</span></code></a> policy is <em>not</em> fair either, and cannot be: once every player has reached the last step, each of them always chooses the same arm, orthogonally to the others, so they do not share the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms.</p></li>
<li><p>The <a class="reference external" href="https://smpybandits.github.io/docs/Policies.MEGA.html"><code class="docutils literal notranslate"><span class="pre">MEGA</span></code></a> policy is designed to be fair: when players collide, they all have the same chance of leaving or staying on the arm, and they all sample from the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms equally.</p></li>
<li><p>The <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.rhoRand.html"><code class="docutils literal notranslate"><span class="pre">rhoRand</span></code></a> policy is not designed to be fair on every run, but it is fair on average.</p></li>
<li><p>Similarly for our algorithms <code class="docutils literal notranslate"><span class="pre">RandTopM</span></code> and <code class="docutils literal notranslate"><span class="pre">MCTopM</span></code>, defined in <a class="reference external" href="https://smpybandits.github.io/docs/PoliciesMultiPlayers.RandTopM.html"><code class="docutils literal notranslate"><span class="pre">RandTopM</span></code></a>.</p></li>
</ul>
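<p>To make the notion of fairness used above concrete, here is a minimal sketch, independent of the SMPyBandits API, that compares two idealized collision-free allocations of the players on the <code class="docutils literal notranslate"><span class="pre">$M$</span></code> best arms with Jain's fairness index. The arm means, the horizon and the helper <code class="docutils literal notranslate"><span class="pre">jain_index</span></code> are illustrative assumptions, and rewards are replaced by their expectations so that the example is deterministic.</p>

```python
import numpy as np


def jain_index(rewards):
    """Jain's fairness index: 1 if all players earn the same, 1/n if one player earns everything."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards.sum() ** 2 / (rewards.size * (rewards ** 2).sum())


# Hypothetical means of the M = 3 best arms, and a hypothetical horizon T.
means = np.array([0.9, 0.6, 0.3])
M, T = means.size, 3000

# Fixed orthogonal allocation: player j stays on arm j for the whole run
# (the kind of final configuration described for MusicalChair above).
fixed = means * T

# Rotating allocation: at time t, player j plays arm (j + t) mod M,
# so every player spends the same time on each of the M best arms.
rounds = np.arange(T)
rotating = np.array([means[(j + rounds) % M].sum() for j in range(M)])

print("fixed   :", fixed, jain_index(fixed))
print("rotating:", rotating, jain_index(rotating))
```

<p>Both allocations collect the same total expected reward, but only the rotating one shares it equally between the players: the index is about 0.857 for the fixed allocation and 1 for the rotating one.</p>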
</div>
<hr class="docutils" />
<div class="section" id="scroll-license-github-license">
<h3>📜 License ? <a class="reference external" href="https://github.com/SMPyBandits/SMPyBandits/blob/master/LICENSE"><img alt="GitHub license" src="https://img.shields.io/github/license/SMPyBandits/SMPyBandits.svg" /></a><a class="headerlink" href="#scroll-license-github-license" title="Permalink to this headline">¶</a></h3>
<p><a class="reference external" href="https://lbesson.mit-license.org/">MIT Licensed</a> (file <a class="reference external" href="LICENSE">LICENSE</a>).</p>
<p>© 2016-2018 <a class="reference external" href="https://GitHub.com/Naereen">Lilian Besson</a>.</p>
<p><a class="reference external" href="https://github.com/SMPyBandits/SMPyBandits/"><img alt="Open Source? Yes!" src="https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github" /></a>
<a class="reference external" href="https://GitHub.com/SMPyBandits/SMPyBandits/graphs/commit-activity"><img alt="Maintenance" src="https://img.shields.io/badge/Maintained%3F-yes-green.svg" /></a>
<a class="reference external" href="https://GitHub.com/Naereen/ama"><img alt="Ask Me Anything !" src="https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg" /></a>
<a class="reference external" href="https://GitHub.com/SMPyBandits/SMPyBandits/"><img alt="Analytics" src="https://ga-beacon.appspot.com/UA-38514290-17/github.com/SMPyBandits/SMPyBandits/README.md?pixel" /></a>
<a class="reference external" href="https://pypi.org/project/SMPyBandits"><img alt="PyPI version" src="https://img.shields.io/pypi/v/smpybandits.svg" /></a>
<a class="reference external" href="https://pypi.org/project/SMPyBandits"><img alt="PyPI implementation" src="https://img.shields.io/pypi/implementation/smpybandits.svg" /></a>
<a class="reference external" href="https://pypi.org/project/SMPyBandits"><img alt="PyPI pyversions" src="https://img.shields.io/pypi/pyversions/smpybandits.svg?logo=python" /></a>
<a class="reference external" href="https://pypi.org/project/SMPyBandits"><img alt="PyPI download" src="https://img.shields.io/pypi/dm/smpybandits.svg" /></a>
<a class="reference external" href="https://pypi.org/project/SMPyBandits"><img alt="PyPI status" src="https://img.shields.io/pypi/status/smpybandits.svg" /></a>
<a class="reference external" href="https://SMPyBandits.ReadTheDocs.io/en/latest/?badge=latest"><img alt="Documentation Status" src="https://readthedocs.org/projects/smpybandits/badge/?version=latest" /></a>
<a class="reference external" href="https://travis-ci.org/SMPyBandits/SMPyBandits"><img alt="Build Status" src="https://travis-ci.org/SMPyBandits/SMPyBandits.svg?branch=master" /></a>
<a class="reference external" href="https://GitHub.com/SMPyBandits/SMPyBandits/stargazers"><img alt="Stars of https://github.com/SMPyBandits/SMPyBandits/" src="https://badgen.net/github/stars/SMPyBandits/SMPyBandits" /></a>
<a class="reference external" href="https://github.com/SMPyBandits/SMPyBandits/releases"><img alt="Releases of https://github.com/SMPyBandits/SMPyBandits/" src="https://badgen.net/github/release/SMPyBandits/SMPyBandits" /></a></p>
</div>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="DoublingTrick.html" class="btn btn-neutral float-right" title="Doubling Trick for Multi-Armed Bandits" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="Aggregation.html" class="btn btn-neutral float-left" title="Policy aggregation algorithms" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2016-2018, Lilian Besson (Naereen)
<span class="lastupdated">
Last updated on 25 Feb 2020, 14h.
</span>
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/rtfd/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.Navigation.enable(true);
});
</script>
</body>
</html>