<style type="text/css">
table.tableLayout{
margin: auto;
border: 1px solid;
border-collapse: collapse;
border-spacing: 1px;
caption-side: bottom;
}
table.tableLayout > caption.title{
white-space: unset;
max-width: unset;
}
table.tableLayout tr{
border: 1px solid;
border-collapse: collapse;
padding: 5px;
}
table.tableLayout th{
border: 1px solid;
border-collapse: collapse;
padding: 3px;
}
table.tableLayout td{
border: 1px solid;
padding: 5px;
}
</style>
<style>
.visionbox{
border-radius: 15px;
border: 2px solid #3585d4;
background-color: #ebf3fb;
text-align: left;
padding: 10px;
}
</style>
<style>
.visionboxlegend{
border-bottom-style: solid;
border-bottom-color: #3585d4;
border-bottom-width: 0px;
margin-left: -12px;
margin-right: -12px; margin-top: -13px;
padding: 0.01em 1em; color: #ffffff;
background-color: #3585d4;
border-radius: 15px 15px 0px 0px}
</style>
<h1 id="conflict-and-cooperation">8.4 Conflict and Cooperation</h1>
<h2 id="overview">8.4.1 Overview</h2>
<p>In this chapter, we have been exploring the risks that arise from
interactions between multiple agents. So far, we have used game theory
and evolutionary theory to understand how collective behavior can
produce undesirable outcomes. In simple terms, securing morally good
outcomes without cooperation can be extremely difficult, even for
intelligent rational agents. Consequently, the potential for conflict
and the importance of cooperation have emerged as a strong theme in this
chapter. In this section, we examine conflict and cooperation in more
detail.</p>
<h3 id="conflict-overview">Conflict Overview</h3>
<p>We begin this section by exploring the drivers of conflict. Here, we
use the term “conflict” loosely, to describe the decision to defect
rather than cooperate in a competitive situation. This may lead to
violence in some cases, but not necessarily all. Our goal is to uncover
how, despite being costly, conflict can sometimes be a rational choice
nevertheless.<p>
Microorganisms, humans, states, and nations all cooperate and conflict
in different situations. In nature, we can observe cooperation in the
form of social insect behavior, pack hunting, symbiotic relationships,
and much more. In humans, we encounter cooperation in several areas
including coordinated disaster responses, international peace
negotiations, and community service, among many others. By contrast,
conflict in both human and non-human organisms can occur as a
consequence of resource competition, territorial disputes, mating
access, and the maintenance of social dominance hierarchies, among
several other factors. Importantly, the mechanisms and factors that
motivate cooperation and conflict are prevalent in various environments.
Thus, we have good reason to suppose that AI agents will be similarly
influenced by these various mechanisms and factors as they decide to
cooperate or conflict with humans and other AIs.<p>
We will begin our discussion of conflict with concepts in bargaining
theory. We then examine some specific features of competitive situations
that make it harder to reach negotiated agreements or avoid
confrontation. We consider five factors from bargaining theory that
can influence the potential for conflict. These can be divided into the
following two groups: <strong>Commitment problems.</strong> According to
bargaining theory, one reason bargains may fail is that some of the
agents making an agreement may have the ability and incentive to break
it. We explore three examples of commitment problems.</p>
<ul>
<li><p><em>Power shifts</em>: when there are imbalances between agents’
capabilities such that one agent becomes stronger than the other,
conflict is more likely to emerge between them.</p></li>
<li><p><em>First-strike advantages</em>: when one agent possesses the
element of surprise, the ability to choose where conflict takes place,
or the ability to quickly defeat their opponent, the probability of
conflict increases.</p></li>
<li><p><em>Issue indivisibility</em>: agents cannot always divide a good
however they please – some goods are “all or nothing” and this increases
the probability of conflict between agents.</p></li>
</ul>
<p><strong>Information problems.</strong> According to bargaining
theory, the other principal cause of a bargaining failure is that some
of the agents may lack good information. Uncertainty regarding a rival’s
capabilities and intentions can increase the probability of conflict. We
explore two information problems.</p>
<ul>
<li><p><em>Misinformation</em>: in the real world, agents frequently
have incorrect information, which can cause them to miscalculate
suitable bargaining ranges.</p></li>
<li><p><em>Disinformation</em>: agents may sometimes have incentives to
misrepresent the truth intentionally. Even the expectation of
disinformation can make it more difficult to reach a negotiated
settlement.</p></li>
</ul>
<p><strong>Factors outside of bargaining theory.</strong> Bargaining
frameworks do not encompass all possible reasons why agents may decide
to conflict with one another. We end by exploring one example:</p>
<ul>
<li><p><em>Inequality</em>: under conditions of inequality, agents may
fight for access to a larger share of available resources or a desired
social standing.</p></li>
</ul>
<h3 id="cooperation-overview">Cooperation overview</h3>
<p>Next, we turn to cooperation. We observe many forms of cooperation in
biological systems: social insect colonies, pack hunting, symbiotic
relationships, and much more. Humans perform community services,
negotiate international peace agreements, and coordinate aid for
disaster responses. Our very societies are built around
cooperation.<p>
Cooperation between AI stakeholders may be vital for counteracting the
competitive and evolutionary pressures of AI races we have explored in
this chapter. For example, the “merge-and-assist” clause of OpenAI’s
charter outlines their commitment to cease competition with—and provide
assistance to—any “value-aligned, safety-conscious” AI developer who
appears close to producing AGI, in order to reduce the risk of eroding
safety precautions. Cooperation between AI agents is also necessary for
reducing some of the multi-agent risks we have looked at: we want AIs to
cooperate, rather than defect, in Prisoner’s Dilemma scenarios.<p>
However, ensuring that AIs behave cooperatively may not be a total
solution to the collective action problems we have examined in this
chapter. By more closely examining how cooperative relationships can
come about, it is possible to see how they may backfire with disastrous
consequences for AI safety. Instead, we need a more nuanced view of the
potential benefits and risks of promoting cooperation between AI
systems. To do this, we study seven different mechanisms by which
cooperation may arise in multi-agent systems, considering the
ramifications of each for cooperation between and within human agencies
and AI agents:</p>
<ul>
<li><p><em>Direct reciprocity</em>: when individuals are likely to
encounter others in the future, they are more likely to cooperate with
them.</p></li>
<li><p><em>Indirect reciprocity</em>: when it benefits an individual’s
reputation to cooperate with others, they are more likely to do
so.</p></li>
<li><p><em>Group selection</em>: when there is competition between
groups, cooperative groups may outcompete non-cooperative
groups.</p></li>
<li><p><em>Kin selection</em>: when an individual is closely related to
others, they are more likely to cooperate with them.</p></li>
<li><p><em>Individual stakes to common stakes</em>: when individual
interests become aligned with the collective good of the group,
individuals are more likely to behave in ways that benefit the
whole.</p></li>
<li><p><em>Simon’s selection mechanism</em>: when available information
is limited, individuals may be impelled to rely on social channels that
require cooperation.</p></li>
<li><p><em>Institutional mechanisms</em>: when there are externally
imposed incentives (such as laws) that subsidize cooperation and punish
defection, individuals and groups are more likely to cooperate.</p></li>
</ul>
<h2 id="conflict">8.4.2 Conflict</h2>
<h3 id="introduction-to-conflict">Introduction to Conflict</h3>
<p>In this section, we use the term “conflict” to describe the decision
to defect rather than cooperate in competitive situations. This often,
though not always, involves some form of violence, and destroys some
amount of value. Conflict is common in nature. Organisms engage in
conflict to maintain social dominance hierarchies, to hunt, and to
defend territory. People also engage in conflict. Throughout human
history, wars have been common, often arising from power-seeking
behavior, such as attempts at aggressive territorial expansion or
resource acquisition.<p>
We begin this section by discussing bargaining theory, which lays the
groundwork for understanding why it may be rational for agents to engage
in conflict. Next, we turn to the specific factors that may motivate
agents to engage in conflict with one another, even when compromise
might be the better option. We explore why AIs may be similarly affected
by these factors, such that they may view conflict as an instrumentally
rational choice in certain contexts.</p>
<p><strong>Conflict can be rational.</strong> Though humans know
conflict can be enormously costly, we often still pursue or instigate
it, even when compromise might be the better option.<p>
Consider the following example: a customer trips in a store and sues the
owner for negligence. There is a 60% probability the lawsuit is
successful. If they win, the owner has to pay them $40,000, and going to
court will cost each of them $10,000 in legal fees. There are three
options: (1) the customer or the owner concedes, (2) they let the matter
go to court, (3) they reach an out-of-court settlement.<p>
</p>
<ol>
<li><p>If the owner concedes, the owner loses $40,000, and if the
customer concedes, they gain nothing.</p></li>
<li><p>If both go to court, the owner’s expected payoff is the product
of the payment to the customer and the probability that the lawsuit is
successful minus legal fees. In this case, the owner’s expected payoff
would be <span class="math inline">(−40,000×0.6) − 10,000 = −34,000</span> while
the customer’s expected payoff would be <span
class="math inline">(40,000×0.6) − 10,000 = 14,000</span>. As a result, the
owner expects to lose $34,000 and the customer expects to gain
$14,000.</p></li>
<li><p>An out-of-court settlement x where <span
class="math inline">14,000 < <em>x</em> < 34,000</span> would
enable the customer to get a higher payoff and the owner to pay lower
costs. Therefore, a mutual settlement is the best option for both if
<span class="math inline"><em>x</em></span> is in this range.</p></li>
</ol>
<p>Hence, if the proposed out-of-court settlement would be greater than
$34,000, it would make sense for the owner to opt for conflict rather
than bargaining. Similarly, if the proposed settlement were less than
$14,000, it would be rational for the customer to opt for conflict.</p>
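<p>To make the arithmetic above concrete, the short Python sketch below
recomputes the expected payoffs and the resulting bargaining range. It
is purely illustrative: the variable names are ours, and the figures are
just those from the example.</p>
<pre><code># Expected payoffs in the lawsuit example, and the resulting bargaining range.
# Going to court is the "conflict" option; settling is the "bargaining" option.

win_probability = 0.6    # chance the customer's lawsuit succeeds
damages = 40_000         # amount the owner pays if the customer wins
legal_fees = 10_000      # cost to each side of going to court

# Expected payoffs if both sides go to court.
customer_expected = win_probability * damages - legal_fees    # 14,000
owner_expected = -win_probability * damages - legal_fees      # -34,000

# A settlement x is acceptable to the customer if x >= customer_expected,
# and acceptable to the owner if -x >= owner_expected, i.e. x <= 34,000.
low, high = customer_expected, -owner_expected

print(f"Customer's expected payoff from court: {customer_expected:,.0f}")
print(f"Owner's expected payoff from court:    {owner_expected:,.0f}")
print(f"Bargaining range for a settlement x:   {low:,.0f} < x < {high:,.0f}")
</code></pre>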
<h3 id="bargaining-theory">Bargaining Theory</h3>
<p>Here, we begin with a general overview of bargaining theory, to
illustrate how pressures to outcompete rivals or preserve power and
resources may make conflict an instrumentally rational choice. Next, we
turn to the unitary actor assumption, highlighting that when agents view
their rivals as unitary actors, they assume that they will act more
coherently, taking whatever steps necessary to maximize their welfare.
Following this, we discuss the notion of commitment problems, which
occur when agents cannot reliably commit to an agreement or have
incentives to break it. Commitment problems increase the probability of
conflict, and are motivated by specific factors, such as power shifts,
first strike advantages, and issue indivisibility. We then explore how
information problems and inequality can also increase the probability of
conflict.</p>
<p><strong>Bargaining theory.</strong> When agents compete for something
they both value, they may either negotiate to reach an agreement
peacefully, or resort to more forceful alternatives such as violence. We
call the latter outcome “conflict,” and can view this as the decision to
defect rather than cooperate. Unlike peaceful bargaining, conflict is
fundamentally costly for winners and losers alike. However, it may
sometimes be the rational choice. <em>Bargaining theory</em> describes
why rational agents may be unable to reach a peaceful agreement, and
instead end up engaging in violent conflict. Due to pressures to
outcompete rivals or preserve their power and resources, agents
sometimes prefer conflict, especially when they cannot reliably predict
the outcomes of conflict scenarios. When rational agents assume that
potential rivals have the same mindset, the probability of conflict
increases.</p>
<p><strong>The unitary actor assumption.</strong> We tend to assume that
a group is a single entity, and that its leader is only interested in
maximizing the overall welfare of the entity. We call this the
<em>unitary actor assumption</em>, which is another name for the “unity
of purpose” assumption discussed previously in this chapter. A nation in
disarray without coherent leadership is not necessarily a unitary actor.
When we view groups and individuals as unitary actors, we can assume
they will act more coherently, so they can be more easily modeled as
taking steps necessary to maximize their welfare. When parties make this
assumption, they may be less likely to cooperate with others since what
is good for one party’s welfare may not necessarily be good for
another’s.</p>
<p><strong>The bargaining range.</strong> Whether or not agents are
likely to reach a peaceful agreement through negotiation will be
influenced by whether their bargaining ranges overlap. The bargaining
range represents the set of possible outcomes that both agents involved
in a competition find acceptable through negotiation. Recall the lawsuit
example: a bargaining settlement “<span
class="math inline"><em>x</em></span>” is only acceptable if it falls
between $14,000 and $34,000. Any settlement “<span
class="math inline"><em>x</em></span>” below $14,000 will be rejected by
the customer while any settlement “<span
class="math inline"><em>x</em></span>” above $34,000 will be rejected by
the store owner. Thus, the bargaining range is often depicted as a
spectrum with the lowest acceptable outcome for one party at one end and
the highest acceptable outcome for the other party at the opposite end.
Within this range, there is room for negotiation and potential
agreements.</p>
<figure id="fig:overview-barg">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/bargain_range.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure 8.11: The bargaining range is defined by the potential gains from cooperation.</p>
<!--<figcaption>Overview of Bargaining ranges</figcaption>-->
</figure>
<p><strong>Conflict in AIs.</strong> Let us assume that AI agents will
act rationally in the pursuit of their goals (so, at the least, we model
them as unitary actors or as having unity of purpose). In the process of
pursuing and fulfilling their goals, AI agents may encounter potential
conflict scenarios, just as humans do. In certain scenarios, AIs may be
motivated to pursue violent conflict over a peaceful resolution, for the
reasons we now explore.</p>
<h3 id="sec:commitment problems">Commitment problems</h3>
<p>Many conflicts occur over resources, which are key to an agent’s
power. Consider a bargaining failure in which two agents bargain over
resources in an effort to avoid war. If agents were to acquire these
resources, they could invest them into military power. As a result,
neither can credibly commit to use them only for peaceful purposes. This
is one instance of a <em>commitment problem</em> <span class="citation"
data-cites="fearon1995rationalist">[1]</span>, which is when agents
cannot reliably commit to an agreement, or when they may even have
incentives to break an agreement. Commitment problems are usually
motivated by specific factors, such as power shifts, first-strike
advantages, and issue indivisibility, which may make conflict a rational
choice. It is important to note that our discussion of these commitment
problems assumes anarchy: we take for granted that contracts are not
enforceable in the absence of a higher governing authority.</p>
<p><strong>Power shifts overview.</strong> When there are imbalances
between parties’ capabilities such that one party becomes stronger than
the other, <em>power shifts</em> can occur. Such imbalances can arise as
a consequence of several factors including technological and economic
advancements, increases in military capabilities, as well as changes in
governance, political ideology, and demographics. In the context of AIs,
power could shift if AIs become more intelligent or have a change in
resources. Individual parties may initially be able to avoid violent
conflict by arriving at a peaceful and mutually beneficial settlement
with their rivals. However, if their own or their rival’s power increases
after this settlement has been made, the stronger party may end up
benefiting from it more than the weaker party. Thus, we encounter the
following commitment problem: the rising power cannot commit not to
exploit their advantage in the future, incentivizing the declining power
to opt for conflict in the present.</p>
<p><strong>Example: The US vs China.</strong> China has been investing
heavily in its military. This has included the acquisition or expansion
of its capabilities in technologies such as nuclear and supersonic
missiles, as well as drones. The future is uncertain, but if this trend
continues, it increases the risk of conflict. If China were to gain a
military advantage over the US, this would shift the balance of power.
This possibility undermines the stability of bargains struck today
between the US and China, because China’s expected outcome from conflict
may increase in the future if they become more powerful. The US may
expect that agreements made with China about cooperating on AI
regulation could lose enforceability later if there is a significant
power shift.<p>
This situation can be modeled using the concept of “Thucydides’ Trap.”
The ancient Greek historian Thucydides suggested that the contemporary
conflict between Sparta and Athens might have been the result of Athens’
increasing military strength, and Sparta’s fear of the looming power
shift. Though this analysis of the Peloponnesian War is now
much-contested, this concept can nevertheless serve to understand how a
rising power threatening the position of an existing superpower in the
global order can increase the potential for conflict rather than
peaceful bargaining.</p>
<p><strong>Effect on the bargaining range.</strong> Consider two agents,
A and B. A is always weaker than B, and A will be even weaker relative
to B in the future than it is in the present. A’s bargaining position
will therefore keep deteriorating: as B’s power increases, B becomes
less and less likely to accept any settlement that A would find
acceptable. It makes sense for A to prefer conflict now, because if it
waits, B’s bargaining range will shift further and further away,
eliminating any overlap between the two. Therefore, A prefers to gamble
on conflict even if the probability that A wins is lower than B’s; for
A, the costs of war are outweighed by the losses it would suffer under
an increasingly unfavorable settlement. Consider the 1956 Suez Crisis.
Egypt was seen as a rising power in the Middle East, having secured
control over the Suez canal. This threatened the interests of the
British and French governments in the region, who responded by
instigating war. To safeguard their diminishing influence, the British
and French launched a swift and initially successful military
intervention.</p>
<p><strong>Power shifts and AI.</strong> AIs could shift power as they
gain greater intelligence and more access to resources. Recall from an
earlier chapter that an agent’s power is highly related to the
efficiency with which they can exploit resources for their benefit,
which often depends on their level of intelligence. The power of future
AI systems is largely unpredictable; we do not know how intelligent or
useful they will be. This could give rise to substantial uncertainty
regarding how powerful potential adversaries using AI might become. If
this is the case, there might be reason to engage in conflict to prevent
the possibility of adversaries further increasing their power.</p>
<p><strong>First strike advantage overview.</strong> If an agent has a
<em>first-strike advantage</em>, they will do better to launch an attack
than respond to one. This gives rise to the following commitment
problem: an offensive advantage may be short-lived, so it is best to act
on it before the enemy does. Some ways in which an agent may
have a first strike advantage include:</p>
<ol>
<li><p>As explored above, anticipating a future power shift may motivate
an attack on the rising power to prevent it from gaining the upper
hand.</p></li>
<li><p>The costs of conflict might be lower for the attacker than they
are for the defender, so the attacker is better off securing an
offensive advantage while the defender is still in a position of
relative weakness.</p></li>
<li><p>The odds of victory may be higher for whichever agent attacks
first. The attacker might possess the element of surprise, the ability
to choose where conflict takes place, or the potential to quickly defeat
their opponent. For instance, a pre-emptive nuclear strike could be used
to target an enemy’s nuclear arsenal, thus diminishing their ability to
retaliate.</p></li>
</ol>
<p><strong>Examples: IPOs, patent infringement, and Pearl
Harbor.</strong> When a company goes public through an initial public
offering (IPO), members of the general public can purchase company
shares.
However, company insiders, such as executives and early investors, often
have access to valuable information not available to the general public;
this gives insiders a first-strike advantage. Insiders may buy or sell
shares based on this privileged information, leading to potential
regulatory conflicts or disputes with other investors who do not have
access to the same information. Alternatively, when a company develops a
new technology and files a patent application, they gain a first-strike
advantage by ensuring that their product will not be copied or
reproduced by other companies. If a rival company does create a similar
technology and later files a patent application, conflict can emerge
when the original company claims patent infringement.<p>
On the international level, we note similar dynamics, such as in the
case of Pearl Harbor. Though Japan and the US were not at war in 1941,
their peacetime was destabilized by a commitment problem: if one nation
were to attack the other, they would have an advantage in the ensuing
conflict. The US’s Pacific fleet posed a threat to Japan’s military plans
in Southeast Asia. Japan had the ability to launch a surprise long-range
strategic attack. Thus, neither the US nor Japan could credibly commit
not to attack the other. In the end, Japan struck first, bombing the US
battleships at the naval base at Pearl Harbor. The attack was successful
in securing a first-strike advantage for Japan, but it also ensured the
US’s entry into WWII.<p>
</p>
<br>
<table class="tableLayout">
<caption style="overflow: hidden; white-space: nowrap;">Table 8.6: A pay-off matrix for competitors choosing whether to defend or preemptively attack.</caption>
<thead>
<tr class="header">
<th style="text-align: left;"></th>
<th style="text-align: center;">defend</th>
<th style="text-align: center;">preempt</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">defend</td>
<td style="text-align: center;">2,2</td>
<td style="text-align: center;">0,3</td>
</tr>
<tr class="even">
<td style="text-align: left;">preempt</td>
<td style="text-align: center;">3,0</td>
<td style="text-align: center;">1,1</td>
</tr>
</tbody>
</table>
<br>
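<p>The payoff matrix in Table 8.6 has the structure of a Prisoner’s
Dilemma: each side does better by preempting, whatever the other does,
even though mutual defense gives both a higher payoff. The short Python
sketch below (our own illustrative code, using the payoffs from the
table) checks this by enumerating best responses.</p>
<pre><code># Best-response check for the defend/preempt game in Table 8.6.
# Payoffs are (row player, column player).

payoffs = {
    ("defend", "defend"): (2, 2),
    ("defend", "preempt"): (0, 3),
    ("preempt", "defend"): (3, 0),
    ("preempt", "preempt"): (1, 1),
}
strategies = ["defend", "preempt"]

def is_nash(row, col):
    """A profile is a Nash equilibrium if neither player gains by deviating unilaterally."""
    row_payoff, col_payoff = payoffs[(row, col)]
    row_ok = all(payoffs[(r, col)][0] <= row_payoff for r in strategies)
    col_ok = all(payoffs[(row, c)][1] <= col_payoff for c in strategies)
    return row_ok and col_ok

for row in strategies:
    for col in strategies:
        if is_nash(row, col):
            print(f"Nash equilibrium: ({row}, {col}) with payoffs {payoffs[(row, col)]}")

# Only (preempt, preempt) is printed: each side strictly prefers to strike
# first, even though (defend, defend) would leave both better off.
</code></pre>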
<p><strong>Effect on the bargaining range.</strong> When the advantages
of striking first outweigh the costs of conflict, it can shrink or
destroy the bargaining range entirely. For any two parties to reach a
mutual settlement through bargaining, each must be willing to freely
communicate information with the other. However, in doing so, each party
might have to reveal offensive advantages, which would increase their
vulnerability to attack. The incentive to preserve, and therefore
conceal, an offensive advantage from opponents pressures agents to
defect from bargaining.</p>
<p><strong>First Strike Advantage and AIs.</strong> One scenario in
which an AI may be motivated to secure a first strike advantage is
cyberwarfare. An AI might hack servers for a variety of reasons to
secure an offensive advantage. AIs may want to disrupt and degrade an
adversary’s capabilities by attacking and destroying critical
infrastructure. Alternatively, an AI might gather sensitive information
regarding a rival’s capabilities, vulnerabilities, and strategic plans
to leverage potential offensive advantages.<p>
AIs may provide first strike advantages in other ways, too. Sudden and
dramatic progress in AI capabilities could motivate one party to take
offensive action. For example, if a nation very rapidly develops a much
more powerful AI system than its military enemies, this could present a
powerful first strike advantage: by attacking immediately, they may hope
to prevent their rivals from catching up with them, which would lose
them their advantage. Similar incentives were likely at work when the US
was considering a nuclear strike on the USSR to prevent them from
developing nuclear weapons themselves.<p>
Reducing the possibility of first-strike advantages is challenging,
especially with AI. However, we can lower the probability that they
arise by ensuring that there is a balance between the offensive and
defensive capabilities of potential rivals. In other words, defense
dominance can facilitate peace because attempted attacks between rivals
are likely to be unsuccessful or result in mutually assured destruction.
Therefore, we might reduce the probability that AIs are motivated to
pursue a first-strike advantage by ensuring that humans maintain defense
dominance, for instance, by requiring that advanced AIs have a built-in
incorruptible fail-safe mechanism, such as a manual “off-switch.”</p>
<p><strong>Issue indivisibility overview.</strong> Settlements that fall
within the bargaining range will always be preferable to conflict, but
this assumes that whatever issues agents bargain over are divisible. For
instance, two agents can divide a territory in any number of ways,
insofar as the settlement they arrive at falls within the bargaining
range, satisfying both their interests and outweighing the individual
benefits of engaging in conflict. Unfortunately, however, some goods are
indivisible, which gives rise to the following commitment problem:
parties cannot always divide a good however they please – some
goods are “all or nothing.” When parties encounter <em>issue
indivisibility</em> <span class="citation"
data-cites="fearon1995rationalist">[1]</span>, the probability of
conflict increases. Indivisible issues include monarchies, small
territories like islands or holy sites, national religion or pride, and
sovereign entities such as states or human beings, among several
others.</p>
<p><strong>Examples: shopping, organ donation, and
co-parenting.</strong> Imagine two friends that go out for a day of
shopping. For lunch, they stop at their favorite deli and find that it
only has one sandwich left: they decide to share this sandwich between
themselves. After lunch, they go to a clothing store, and both come
across a jacket they love, but of which there is only one left. They
begin arguing over who should get the jacket. Simply put, sandwiches can
be shared and jackets can’t. Issue indivisibility can give rise to
conflict, often leaving all parties involved worse off.<p>
The same can be true in more extreme cases, such as organ donation.
Typically, the available organ supply does not meet the transplant needs
of all patients. Decisions as to who gets priority for transplantation
may favor certain groups or individuals and allocation systems may be
unfair, giving rise to conflict between doctors, patients, and
healthcare administrations. Finally, we can also observe issue
indivisibility in co-parenting contexts. Divorced parents sometimes
fight for full custody rights over their children. This can result in
lengthy and costly legal battles that are detrimental to the family as a
whole.</p>
<p><strong>Effect on the bargaining range.</strong> When agents
encounter issue indivisibilities, they cannot arrive at a reasonable
settlement through bargaining. Sometimes, however, issue indivisibility
can be resolved through side payments. One case in which side payments
were effective was the Spanish-American War of 1898, fought between
Spain and the United States, in part over the territory of the
Philippines. The conflict was resolved when the United States offered to
buy the Philippines from Spain for 20 million dollars. Conversely, the
Munich Agreement at the dawn of WWII represents a major case where side
payments were ineffective. In an attempt to appease Hitler and avoid
war, the British and French governments reached an agreement with
Germany, allowing them to annex certain parts of Czechoslovakia. This
agreement involved side payments in the form of territorial concessions
to Germany, but it ultimately failed, as Hitler’s aggressive
expansionist ambitions were not satisfied, leading to the outbreak of
World War II. Side payments can only resolve issue indivisibility when
the value of the side payments outweighs the value of the good.<p>
</p>
<figure id="fig:first-strike">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/bargain_range_2.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure 8.12: A first strike can create an advantage by changing the bargaining range.</p>
<!--<figcaption>First strike advantage</figcaption>-->
</figure>
<p><strong>Issue indivisibility and AIs.</strong> Imagine there is a
very powerful AI training system, and that whoever has access to this
system will eventually be able to dominate the world. In order to reduce
the chance of being dominated, individual parties may compete with one
another to secure access to this system. If parties were to split the
AI’s compute up between themselves, it would no longer be as powerful as
it was previously, perhaps not more powerful than their existing
training systems. Since such an AI cannot be divided up among many
stakeholders easily, it may be rational for parties to conflict over
access to it, since whoever secures sole access stands to dominate
globally.</p>
<h3 id="information-problems">Information problems</h3>
<p>Misinformation and disinformation both involve the spread of false
information, but they differ in terms of intention. Misinformation is
the dissemination of false information, without the intention to
deceive, due to a lack of knowledge or understanding. Disinformation, on
the other hand, is the deliberate spreading of false or misleading
information with the intent to deceive or manipulate others. Both of
these types of information problem can cause bargains to fail,
generating conflict.</p>
<table class="tableLayout">
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Distinguish</th>
<th style="text-align: center;">Defect</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Distinguish</td>
<td style="text-align: center;"><span
class="math inline"><em>b</em> − <em>c</em></span></td>
<td style="text-align: center;"><span
class="math inline"> − <em>c</em>(1−<em>a</em>)</span></td>
</tr>
<tr class="even">
<td style="text-align: center;">Defect</td>
<td style="text-align: center;"><span
class="math inline"><em>b</em>(1−<em>a</em>)</span></td>
<td style="text-align: center;">0</td>
</tr>
</tbody>
</table>
<br>
<p>The term <span class="math inline"><em>a</em></span> in the table
above is the probability that a player knows its partner’s strategy.
This is relevant for AI: greater transparency between AI systems might
reduce this kind of uncertainty, though incentives to conceal or
misrepresent information, and to compete, would remain.</p>
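<p>To see how the table above behaves, the sketch below (illustrative
code of our own, with made-up values for <em>b</em>, <em>c</em>, and
<em>a</em>) checks when “Distinguish” is a better reply to itself than
“Defect” is. For the payoffs shown, this holds exactly when <span
class="math inline"><em>a</em> > <em>c</em>/<em>b</em></span>.</p>
<pre><code># Row-player payoffs from the table above, as functions of b (benefit),
# c (cost), and a (probability of knowing the partner's strategy).

def payoff(row, col, b, c, a):
    table = {
        ("distinguish", "distinguish"): b - c,
        ("distinguish", "defect"): -c * (1 - a),
        ("defect", "distinguish"): b * (1 - a),
        ("defect", "defect"): 0.0,
    }
    return table[(row, col)]

b, c = 5.0, 1.0  # illustrative values, so the threshold is a > c/b = 0.2
for a in [0.1, 0.2, 0.5, 0.9]:
    coop = payoff("distinguish", "distinguish", b, c, a)
    exploit = payoff("defect", "distinguish", b, c, a)
    print(f"a = {a:.1f}: distinguishing pays {coop:.2f}, "
          f"defecting against it pays {exploit:.2f}, stable: {coop > exploit}")
</code></pre>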
<p><strong>Misinformation overview.</strong> Uncertainty regarding a
rival’s power or intentions can increase the probability of
conflict<span class="citation"
data-cites="fearon1995rationalist">[1]</span>. Bargaining often requires
placing trust in another not to break an agreement. This is harder to
achieve when one agent believes something false about the other’s
preferences, resources, or commitments. This lack of shared, accurate
information can lead to mistrust and a breakdown in negotiations.</p>
<p><strong>Example: Russian invasion of Ukraine.</strong> Incomplete
information may lead overly optimistic parties to make excessive
demands, while rivals that are tougher than expected may reject those
demands and instigate conflict. Examples of misinformation problems
generating conflict may include Russia’s 2022 invasion of Ukraine.
Russian President Putin reportedly miscalculated Ukraine’s willingness
to resist invasion and fight back. With more accurate information
regarding Ukraine’s abilities and determination, Putin may have been
less likely to instigate conflict.</p>
<p><strong>Effect on the bargaining range.</strong> Misinformation can
prevent agents from finding a mutually-agreeable bargaining range, as
shown in Figure 8.13. For example, if each agent believes itself to be
the more powerful party, each may want more than half the value they are
competing for. Thus, each may reject any bargain the other offers, since
they expect a better outcome if they opt for conflict instead.</p>
<figure id="fig:information-problems">
<img src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/bargaining_range_3.png" class="tb-img-full" style="width: 70%"/>
<p class="tb-caption">Figure 8.13: Misinformation can shift players’ perceptions of the bargaining range. If players’
estimates of Green winning are too high, then they will agree on deals more favorable to Green.</p>
<!--<figcaption>Information problems</figcaption>-->
</figure>
<p><strong>Misinformation and AI.</strong> AI technologies may produce
misinformation directly. Examples of this include large language models
hallucinating false facts. Less directly, AI development could also
exacerbate misinformation problems by increasing uncertainty. For
example, military AI applications may make wars more uncertain, and this
may increase the probability of conflict. AI weaponry innovation
presents an opportunity for states to gain power. However, advances in
AI capabilities are often highly uncertain—it may be unclear how
powerful a model trained with an order of magnitude more compute would be,
or how far behind adversaries are in their effort to create powerful
models. As automated warfare technologies become more widespread and
sophisticated, nations may struggle to predict their probability of
victory in any given conflict accurately. This increased potential for
miscalculation may make warfare more likely.<p>
There are other ways in which reducing information problems can reduce
AI risk. If there are substantial existential risks from AIs but this is
not widely agreed upon, improving understanding of these risks could
help different actors (such as the US and China) arrive at better
estimates of the payoff matrix. With a better understanding of AI risk,
they may recognize
that it is in their self-interest to cooperate (slow down AI development
and militarization) instead of defecting (engaging in an AI race).
Similarly, creating information channels such as summits can increase
understanding and coordination; even if countries do not agree on shared
commitments, the discussions on the sidelines can reduce
misunderstandings and the risk of conflict.</p>
<p><strong>Disinformation overview.</strong> Unlike misinformation,
where false information is propagated without deceptive intention,
disinformation is the <em>deliberate</em> spreading of false
information: the intent is to mislead, deceive or manipulate. Here, we
explore why competitive situations may motivate agents to try to mislead
others or misrepresent the truth, and how this can increase the
probability of conflict.<p>
<strong>Examples: employment and the real estate industry.</strong>
Throughout labor markets, employers and job seekers often encounter
disinformation problems. Employers may intentionally withhold
information about the salary range or offer lower wages than what the
market standard suggests in order to secure lower employment costs. On
the other hand, job seekers might exaggerate their qualifications or
professional experience to increase their chances of getting hired. Such
discrepancies can lead to legal conflicts and high turnover rates.
Alternatively, in the real estate market, disinformation problems can
emerge between sellers and buyers. Sellers sometimes withhold critical
information about the property’s condition to increase the probability
that the property gets purchased. Buyers, on the other hand, may be
incentivized to misrepresent their budget or willingness to pay to
pressure sellers to lower their price. Oftentimes, this can result in
legal battles or disputes as well as the breakdown of property
transactions.</p>
<p><strong>Effect on the bargaining range.</strong> Consider two agents:
A, which is stronger, and B, which is weaker. B demands “X” amount for a
bargaining settlement, but A, as the stronger agent, will not offer this
to avoid being exploited by B. In other words, A thinks B is just trying
to get more for themself to “bait” A or “bluff” by implying that the
bargaining range is lower. But B might not be bluffing and A might not
be as strong as they think they are. Consider the Sino-Indian War of
1962 in this respect. At the time, India perceived itself as militarily
superior to China. But the Chinese attack along the Himalayan border
demonstrated China’s superior military capabilities and triggered the
war. Thus, stronger parties may prefer conflict if they believe rivals
are bluffing, whereas weaker parties may prefer conflict if they believe
rivals are not as powerful as they believe themselves to be.<p>
<strong>Disinformation and AI.</strong> AIs themselves may have
incentives to misrepresent the facts. For example, the agent “Cicero,”
developed by Meta, is capable of very high performance in the board
wargame “Diplomacy.” Its success requires it to misrepresent certain
information to the other players in strategic fashion. We have seen many
other examples of AIs producing disinformation for a variety of reasons,
such as large language models successfully persuading users that they
are conversing with a human. The ability for AIs to misrepresent
information successfully is only likely to increase in the future. This
could exacerbate disinformation problems, and thus contribute to greater
risk of conflict by eroding the potential for peaceful negotiation.</p>
<h3 id="factors-outside-of-bargaining-theory">Factors outside of
bargaining theory</h3>
<p><strong>Inequality is another factor that is highly predictive of
conflict.</strong> Crime is a form of conflict. Income and educational
inequality are robust predictors of violent crime <span class="citation"
data-cites="kelly2000inequality">[2]</span>, with an elasticity in
excess of 0.5 (elasticity measures how sensitive one variable is to
changes in another variable) even when controlling for variables such as
race and family composition. Similarly, individuals and families with a
yearly income below $15,000 are three times more likely to be the
victims of violent crime than are individuals and families with a yearly
income over $75,000 <span class="citation"
data-cites="victimrates2011">[3]</span>. Moreover, economists from the
World Bank have also highlighted that the effects of inequality on both
violent and property crime are robust between countries, finding that
when economic growth improves in a country, violent crime rates decrease
substantially <span class="citation"
data-cites="fajnzylber2002inequality">[4]</span>. This is consistent
with evidence at the national level; in the US, for example, the Bureau
of Justice Statistics reports that households below the federal poverty level have
a rate of violent victimization that is more than twice as high as the
rate for households above the federal poverty level. Moreover, these
effects were largely consistent between both rural and urban areas where
poverty was prevalent, further emphasizing the robust relationship
between inequality and conflict.</p>
<p><strong>Inequality and relative deprivation.</strong> Relative
deprivation is the perception or experience of being deprived or
disadvantaged in comparison to others. It is a subjective measure of
social comparison, not an objective measure of deprivation based on
absolute standards. People may feel relatively deprived when they
perceive that others possess more resources, opportunities, or social
status than they do. This can lead to feelings of resentment. For
example, “Strain theory,” proposed by sociologist Robert K. Merton,
suggests that individuals experience strain or pressure when they are
unable to achieve socially approved goals through legitimate means.
Relative deprivation is a form of strain, which may lead individuals to
resort to various coping mechanisms, one of which is criminal behavior.
For example, communities with a high prevalence of relative deprivation
can evolve a subculture of violence <span class="citation"
data-cites="horne2009effect">[5]</span>. Consider the emergence of
gangs, in which violence becomes a way to establish dominance, protect
territory, and retaliate against rival groups, providing an alternative
path for achieving a desired social standing.</p>
<p><strong>AIs and relative deprivation.</strong> Advanced future AIs
and widespread automation may propel humanity into an age of abundance,
where many forms of scarcity have been largely eliminated on the
national, and perhaps even global scale. Under these circumstances, some
might argue that conflict will no longer be an issue; people would have
all of their needs met, and the incentives to resort to aggression would
be greatly diminished. However, as previously discussed, relative
deprivation is a subjective measure of social comparison, and therefore,
it could persist even under conditions of abundance.<p>
Consider the notion of a “hedonic treadmill,” which notes that
regardless of what good or bad things happen to people, they
consistently return to their baseline level of happiness. For instance,
reuniting with a loved one or winning an important competition might
cultivate feelings of joy and excitement. However, as time passes, these
feelings dissipate, and individuals tend to return to the habitual
course of their lives. Even if individuals were to have access to
everything they could possibly need, the satisfaction they gain from
having their needs fulfilled is only temporary.<p>
Abundance reliably gives way to scarcity. Dissatisfied individuals can be
favored by natural selection over highly content and comfortable
individuals. In many circumstances, natural selection could disfavor
individuals who stop caring about acquiring more resources and expanding
their influence; natural selection favors selfish behavior (for more
detail, see the section "Levels of Selection and Selfish Behaviour"
of <em>Evolutionary Pressures</em>). Even under conditions of abundance,
individuals may still compete for resources and influence because they
perceive the situation as a zero-sum game, where resources and power
must be divided among competitors. Individuals that acquire more power
and resources could incur a long-term fitness advantage over those that
are “satisfied” with what they already have. Consequently, even with
many resources, conflict over resources could persist in the evolving
population.<p>
Relatedly, in economics, the law of markets, also known as “Say’s Law,”
proposes that production of goods and services generates demand for
goods and services. In other words, supply creates its own demand.
However, if supply creates demand, the amount of resources required to
sustain supply to meet demand must also increase accordingly. Therefore,
steady increases in demand, even under resource-abundant conditions,
will reliably result in resource scarcity.</p>
<p><strong>Conflict over social standing and relative power may
continue.</strong> There will always be scarcity of social status and
relative power, which people will continue to compete over. Social envy
is a fundamental part of life; it may persist because it tracks
differential fitness. Motivated by social envy, humans establish and
identify advantageous traits, such as the ability to network or climb
the social ladder. Scarcity of social status motivates individuals to
compete for social standing when doing so enables access to larger
shares of available resources. Although AIs may produce many forms of
abundance, there would still be dimensions on which to compete.
Moreover, AI development could itself exacerbate various forms of
inequality to extreme levels. We discuss this possibility in Chapter 9
(Governance), Section 3 (Distribution).</p>
<h3 id="summary">Summary</h3>
<p>Throughout this section, we have discussed some of the major factors
that drive conflict. When any one of these factors is present, agents’
incentives to bargain for a peaceful settlement may shift such that
conflict becomes an instrumentally rational choice. These factors
include power shifts, first strike advantages, issue indivisibility,
information problems and incentives to misrepresent, as well as
inequality.<p>
In our discussion of these factors, we have laid the groundwork for
understanding the conditions under which decisions to instigate conflict
may be considered instrumentally rational. This knowledge base allows us
to better predict the risks and probability of AI-driven conflict
scenarios.<p>
First, we covered how power shifts can incentivize AI agents to pursue
conflict, to maintain strategic advantages or deter potential attacks
from stronger rivals, especially in the context of military AI
use.<p>
Second, we explored how the short-lived nature of offensive advantages
may incentivize AIs to pursue first-strike advantages, to degrade or
identify vulnerabilities in adversaries’ capabilities, as may be the
case in cyberwarfare.<p>
Third, we discussed issue indivisibility, imagining a future scenario in
which individual parties must compete for access to a world-dominating
AI. We reasoned that since dividing this AI between many stakeholders
would reduce its power, parties may find it instrumentally rational to
conflict for access to it.<p>
Fourth, we discussed how AIs may make wars more uncertain, increasing
the probability of conflict. We expect that AI weaponry innovation will
present an opportunity for superpowers to consolidate their dominance,
whereas weaker states may be able to quickly increase their power by
taking advantage of these technologies early on. This dynamic may create
a future in which power shifts are uncertain, which may lead states to
incorrectly expect that there is something to gain from going to
war.<p>
Finally, we explored the relationship between inequality and conflict.
We considered how even under conditions of abundance facilitated by
widespread automation and advanced AI implementation, relative
deprivation, and therefore conflict may persist. We also explored the
possibility that AIs may be motivated by social envy to compete with
other humans or AIs for desired social standing. This may result in a
global landscape in which the majority of humanity’s resources are
controlled by selfish, power-seeking AIs.<p>
Though AIs could evolve cooperative tendencies similarly to humans and
other animals, the possibility that they pursue interests and goals that
promote conflict, no matter how small, could pose catastrophic risks to
humans. It is thus important that we understand the drivers of conflict,
especially in the context of advanced future AIs.<p>
</p>
<h2 id="cooperation">8.4.3 Cooperation</h2>
<h3 id="direct-reciprocity">Direct Reciprocity</h3>
<p><strong>Direct reciprocity overview.</strong> One way agents may
cooperate is through <em>direct reciprocity</em>: when one agent
performs a favor for another because they expect the recipient to return
this favor in the future <span class="citation"
data-cites="trivers1971evolution">[6]</span>. We capture this core idea
in idioms like “quid pro quo,” or “you scratch my back, I’ll scratch
yours.” Direct reciprocity requires repeated interaction between the
agents: the more likely they are to meet again in the future, the
greater the incentive for them to cooperate in the present. We have
already encountered this in the iterated Prisoner’s Dilemma: how an
agent behaves in a present interaction can influence the behavior of
others in future interactions. Game theorists sometimes refer to this
phenomenon as the “shadow of the future.” When individuals know that
future cooperation is valuable, they have increased incentives to behave
in ways that benefit both themselves and others, fostering trust,
reciprocity, and cooperation over time. Cooperation can only evolve as a
consequence of direct reciprocity when the probability, <span
class="math inline"><em>w</em></span>, of subsequent encounters between
the same two individuals is greater than the cost-benefit ratio of the
helpful act. In other words, if agent A decided to help agent B at some
cost to themselves, they will only do so when the expected benefit of
agent B returning the favor outweighs the cost of Agent A’s initially
providing it. Thus, we have the rule <span
class="math inline"><em>w</em> > <em>c</em>/<em>b</em></span>; see
Table 8.7 below.<p>
</p>
<br>
<div id="tab:reciprocity">
<table class="tableLayout">
<caption>Table 8.7: Payoff matrix for direct reciprocity games.</caption>
<thead>
<tr class="header">
<th style="text-align: center;"></th>
<th style="text-align: center;">Cooperate</th>
<th style="text-align: center;">Defect</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">Cooperate</td>
<td style="text-align: center;"><span
class="math inline">(<em>b</em> − <em>c</em>)/(1−<em>w</em>)</span></td>
<td style="text-align: center;"><span
class="math inline"> − <em>c</em></span></td>
</tr>
<tr class="even">
<td style="text-align: center;">Defect</td>
<td style="text-align: center;"><span
class="math inline"><em>b</em></span></td>
<td style="text-align: center;"><span class="math inline">0</span></td>
</tr>
</tbody>
</table>
</div>
<br>
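<p>As a quick numerical check of the <span
class="math inline"><em>w</em> > <em>c</em>/<em>b</em></span> rule,
the sketch below (illustrative code of our own, using the row-player
payoffs from Table 8.7) compares the expected payoff of cooperating
against a cooperator with that of defecting against one, for a few
values of <em>w</em>.</p>
<pre><code># Expected total payoffs for the direct-reciprocity game in Table 8.7.
# w is the probability of another encounter, so the expected number of
# rounds between the same two players is 1 / (1 - w).

def cooperate_vs_cooperator(b, c, w):
    return (b - c) / (1 - w)  # gain (b - c) per round, for 1/(1-w) rounds on average

def defect_vs_cooperator(b, c, w):
    return b                  # exploit once; the cooperator then stops cooperating

b, c = 3.0, 1.0               # illustrative values, so the threshold is w > c/b = 1/3
for w in [0.2, 1 / 3, 0.5, 0.9]:
    coop = cooperate_vs_cooperator(b, c, w)
    defect = defect_vs_cooperator(b, c, w)
    print(f"w = {w:.2f}: cooperate pays {coop:.2f}, defect pays {defect:.2f}, "
          f"cooperation favored: {coop > defect}")
</code></pre>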
<p><strong>Natural examples of direct reciprocity.</strong> Trees and
fungi have evolved symbiotic relationships where they exchange sugars
and nutrients for mutual benefit. Dolphins use cooperative hunting
strategies where one dolphin herds schools of fish while the others form
barriers to encircle them. The dolphins reverse these roles across
hunts, on the expectation that others in the group will reciprocate
during subsequent hunts. Similarly, chimpanzees engage in
reciprocal grooming, where they exchange grooming services with one
another with the expectation that they will be returned during a later
session <span class="citation"
data-cites="schino2007grooming">[7]</span>.<p>
Direct reciprocity in human society. Among humans, one prominent example
of direct reciprocity is commerce. Commerce is a form of direct
reciprocity "which offers positive-sum benefits for both parties and
gives each a selfish stake in the well-being of the other" <span
class="citation" data-cites="pinker2012better">[8]</span>; commerce can
be a win-win scenario for all parties involved. For instance, if Alice
produces wine and Bob produces cheese, but neither Alice nor Bob has the
resources to produce what the other can, both may realize they are
better off trading. Different parties might both need the good the other
has when they can’t produce it themselves, so it is mutually beneficial
for them to trade, especially when they know they will encounter each
other again in the future. If Alice and Bob both rely on each other for
wine and cheese respectively, then they will naturally seek to prevent
harm to one another because it is in their rational best interest. To
this point, commerce can foster <em>complex interdependencies</em>
between economies, which enhances the benefits gained through mutual
exchange while decreasing the probability of conflict or war.</p>
<p><strong>Direct reciprocity and AIs.</strong> The future may contain
multiple AI agents, many of which might interact with one another to
achieve different functions in human society. Such AI agents may
automate parts of our economy and infrastructures, take over mundane and
time-consuming tasks, or provide humans and other AIs with daily
assistance. In a multi-AI agent system, where the probability that
individual AIs would meet again is high, AIs might evolve cooperative
behaviors through direct reciprocity. If one AI in this system has
access to important resources that other AIs need to meet their
objectives, it may decide to share these resources accordingly. However,
since providing this favor would be costly to the given AI, it will do
so only when the probability of meeting the recipient AIs (those that
received the favor) outweighs the cost-benefit ratio of the favor
itself.</p>
<p><strong>Direct reciprocity can backfire: AIs may disfavor cooperation
with humans.</strong> AIs may favor cooperation with other AIs over
humans. As AIs become substantially more capable and efficient than
humans, the benefit of interacting with humans may decrease. It may take
a human several hours to reciprocate a favor provided by an AI, whereas
it may take an AI only seconds to do so. It may therefore become
extremely difficult to formulate exchanges between AIs and humans that
benefit AIs more than exchanges with other AIs would. In other words,
from an AI’s perspective, the cost-benefit ratio for cooperation with
humans is not worth it.</p>
<p><strong>Direct reciprocity may backfire: offers of AI cooperation may
undermine human alliances.</strong> The potential for direct reciprocity
can undermine the stability of other, less straightforward cooperative
arrangements within a larger group, thereby posing a collective action
problem. One example of this involves “bandwagoning.” Earlier in this
chapter, we discussed the idea of “balancing” in international
relations: state action to counteract the influence of a threatening
power, such as by forming alliances with other states against their
common adversary <span class="citation"
data-cites="mearsheimer2007structural">[9]</span>. However, some
scholars argue that states do not always respond to threatening powers
by trying to thwart them. Rather than trying to prevent them from
becoming too strong, states may instead “bandwagon”: joining up with and
supporting the rising power to gain some personal benefit.<p>
For instance, consider military coups. Sometimes, those attempting a
takeover will offer their various enemies incentives to join forces with
them, promising rewards to whoever allies with them first. If one of
those being made this offer believes that the usurpers are ultimately
likely to win, they may consider it to be in their own best interests to
switch sides early enough to be on the “right side of history.” When
others observe their allies switching sides, they may see their chances
of victory declining and so in turn decide to defect. In this way,
bandwagoning can escalate via positive feedback.<p>
Bandwagoning may therefore present the following collective action
problem: people may be motivated to cooperate with powerful and
threatening AI systems via direct reciprocity, even though it would be
in everyone’s collective best interest if none were to do so. Imagine
that a future AI system, acting autonomously, takes actions that cause a
large-scale catastrophe. In the wake of this event, the international
community might agree that it would be in humanity’s best interest to
constrain or roll back all autonomous AIs. Powerful AI systems might
then offer some states rewards if they ally with them (direct
reciprocity). This could mean protecting the AIs by simply allowing them
to intermingle with the people, making it harder for outside forces to
target the AIs without human casualties. Or the state could provide the
AIs with access to valuable resources. Instead of balancing (cooperating
with the international community to counteract this threatening power),
these states may choose to bandwagon, defecting to form alliances with
AIs. Even though the global community would all be better off if all
states were to cooperate and act together to constrain AIs, individual
states may benefit from defecting. As before, each defection would shift
the balance of power, motivating others to defect in turn.</p>
<h3 id="indirect-reciprocity">Indirect Reciprocity</h3>
<p><strong>Indirect reciprocity overview.</strong> When someone judges
whether to provide a favor to someone else, they may consider the
recipient’s reputation. If the recipient is known to be generous, this
would encourage the donor (the one that provides the favor) to offer
their assistance. On the other hand, if the recipient has a stingy or
selfish reputation, this could discourage the donor from offering a
favor. In considering whether to provide a favor, donors may also
consider the favor’s effect on their own reputation. If a donor gains a
“helpful and trustworthy” reputation by providing a favor, this may
motivate others to cooperate with them more often. We call this
reputation-based mechanism of cooperation <em>indirect reciprocity</em>
<span class="citation" data-cites="nowak1998evolution">[10]</span>.
Agents may cooperate to develop and maintain good reputations since
doing so is likely to benefit them in the long-term. Indirect
reciprocity is particularly useful in larger groups, where the