<h1 id="conclusion">7.6 Conclusion</h1>
<p>In this chapter, we considered a variety of multi-agent dynamics in
biological and social systems. Our underlying thesis was that these
dynamics might produce undesirable outcomes with AI, mirroring patterns
observable in nature and society.</p>
<h3 id="game-theory">Game theory</h3>
<p>We began with a simple game, the Prisoner’s Dilemma, observing how
even rational agents may reach equilibrium states that are detrimental
to all. We then proceeded to build upon this. We considered how the
dynamics may change when the game is iterated and involves more than two
agents. We found that uncertainty about the future could foster rational
cooperation, though defection remains the dominant strategy when the
number of rounds of the game is fixed and known.</p>
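<p>To make the role of this "shadow of the future" concrete, the following is a minimal simulation sketch, with illustrative payoffs and strategies rather than code from the chapter. When each round of the game is followed by another with high probability, a reciprocal strategy playing against another reciprocator earns far more than an unconditional defector playing against a reciprocator does.</p>
<pre><code>import random

# Illustrative Prisoner's Dilemma payoffs for the focal player:
# temptation (5), mutual cooperation (3), mutual defection (1), sucker (0).
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}

def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return 'C' if not opponent_history else opponent_history[-1]

def always_defect(own_history, opponent_history):
    return 'D'

def play(strategy_a, strategy_b, continue_prob, rng, max_rounds=1000):
    """Iterated game continuing after each round with probability continue_prob.

    Returns player A's total payoff.
    """
    history_a, history_b, total_a = [], [], 0
    for _ in range(max_rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        total_a += PAYOFF[(move_a, move_b)]
        history_a.append(move_a)
        history_b.append(move_b)
        if rng.random() > continue_prob:
            break
    return total_a

rng = random.Random(0)
trials = 2000

def average_score(strategy_a, strategy_b):
    return sum(play(strategy_a, strategy_b, 0.9, rng) for _ in range(trials)) / trials

# With a 90% chance of another round, reciprocators earn roughly 30 against
# each other, while a defector earns roughly 14 against a reciprocator,
# because defection forfeits the long stream of mutual-cooperation payoffs.
print("tit-for-tat vs tit-for-tat:", round(average_score(tit_for_tat, tit_for_tat), 1))
print("always-defect vs tit-for-tat:", round(average_score(always_defect, tit_for_tat), 1))
</code></pre>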
<p>We used these games to model collective action problems in the real
world, like anthropogenic climate change, public health emergency
responses, and the failures of democracies. The collective endeavors of
multi-agent systems are often vulnerable to exploitation by free riders.
We drew parallels between these natural dynamics and the development,
deployment, and adoption of AI technologies. In particular, we saw how
AI races in corporate and military contexts can exacerbate AI risks,
potentially resulting in catastrophes such as autonomous economies or
flash wars. We ended this section by exploring the emergence of
extortion as a strategy that illustrated a grim possibility for future
AI systems: AI extortion could be a source of monumental disvalue,
particularly if it were to involve morally valuable digital minds.
Moreover, AI extortion might persist stably throughout populations of AI
agents, which could make it difficult to eradicate, especially if AIs
learn to deceive or manipulate humans to obscure their true
intentions.</p>
<h3 id="cooperation">Cooperation</h3>
<p>We then moved to an investigation of cooperation. Drawing from biological systems and human societies, we illustrated an array of mechanisms that may promote cooperation between AIs. For each mechanism, however, we also highlighted some associated risks, including nepotism, in-group favoritism, extortion, and incentives to behave ruthlessly. Thus, we found that merely ensuring that AIs behave cooperatively may not be a complete solution to our collective action problems. Rather, we need a more nuanced view of the potential benefits and risks of promoting cooperative AI via particular mechanisms.</p>
<h3 id="conflict">Conflict</h3>
<p>We next turned to a closer examination of the drivers of conflict in
the real world, exploring how future AI systems may interact with these
forces. Using the framework of bargaining theory, we discussed why,
despite being costly for all involved, rational agents may sometimes opt
for conflict over peaceful bargaining. We illustrated this idea by
looking at how various factors can affect competitive dynamics,
including commitment problems (such as power shifts, first-strike
advantages, and issue indivisibility), information problems, and
inequality. These factors may drive AIs to instigate, promote, or
exacerbate conflicts, with potentially catastrophic effects.</p>
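<p>As a minimal numerical sketch of this bargaining logic, with purely illustrative values rather than figures from the chapter, the snippet below computes the range of peaceful settlements two rational agents would both prefer to a costly conflict, and shows how an anticipated shift in power can remove every mutually acceptable deal.</p>
<pre><code># A toy bargaining model: two agents, A and B, dispute a resource worth 1.
# All numbers below are illustrative assumptions.

def bargaining_range(p, cost_a, cost_b):
    """Return the interval of splits x (A's share) both sides prefer to fighting.

    p is A's probability of winning a conflict; cost_a and cost_b are each
    side's cost of fighting. A rejects any x below p - cost_a, and B rejects
    any x above p + cost_b, so peaceful deals exist whenever costs are positive.
    """
    low, high = p - cost_a, p + cost_b
    if low > high:
        return None
    return (low, high)

# Today, with evenly matched agents, a wide range of deals beats conflict.
print(bargaining_range(p=0.5, cost_a=0.125, cost_b=0.125))    # (0.375, 0.625)

# Commitment problem: suppose B expects A's win probability to rise to 0.875
# next period. Any deal A will then accept gives B at most 1 - 0.75 = 0.25,
# which is worse for B than fighting today (0.5 - 0.125 = 0.375). If A cannot
# credibly commit to a generous future split, B may rationally start a
# conflict now, even though conflict is costly for both sides.
print(bargaining_range(p=0.875, cost_a=0.125, cost_b=0.125))  # (0.75, 1.0)
</code></pre>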
<h3 id="evolutionary-pressures">Evolutionary pressures</h3>
<p>We began this section by examining generalized Darwinism: the idea
that Darwinian mechanisms are a useful way to explain many phenomena
outside of biology. We explored examples of evolution by natural
selection operating in non-biological domains such as culture, academia,
and industry. By formalizing this idea using Lewontin's conditions and
the Price equation, we saw how AIs and their development may be subject
to Darwinian forces.</p>
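<p>For reference, the Price equation mentioned above can be written in its standard form (stated here only as a reminder of the formalism, not as a new result):</p>
<p>\[
\Delta \bar{z} = \frac{\operatorname{Cov}(w_i, z_i)}{\bar{w}} + \frac{\operatorname{E}(w_i \, \Delta z_i)}{\bar{w}},
\]
where \(z_i\) is the value of a trait in entity \(i\), \(w_i\) is that entity's fitness, \(\bar{w}\) is mean fitness, and \(\Delta \bar{z}\) is the change in the population mean of the trait between generations. The first term captures selection, since traits that covary with fitness spread, and the second captures transmission bias, changes that occur during reproduction or copying.</p>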
<p>We then turned to the ramifications of natural selection operating on
AIs. We first looked at which AI traits or strategies natural selection
may tend to favor. Using an information's eye view of evolution by
natural selection, we found that internal conflict can arise where the
interests of the propagating information clash with those of the larger
entity that contains it. Such intrasystem goal conflict could arise in
AI systems, distorting or subverting goals even when human operators
have specified them correctly. Moreover, Darwinian forces strongly favor
selfish traits over altruistic ones. Although individual organisms may
behave altruistically under specific conditions (such as genetic
relatedness), at the level of information, evolution by natural
selection tends to produce selfishness. Thus, we might expect a future
shaped by natural selection to be dominated by selfish behavior.</p>
<h3 id="concluding-remarks">Concluding remarks</h3>
<p>In summary, this chapter explored various kinds of collective action
problems: intelligent agents, despite acting rationally and in
accordance with their own self-interest, can collectively produce
outcomes that none of them wants, even when they could seemingly have
achieved preferable alternative outcomes. Even when we as individuals
share similar goals, system-level dynamics can override our intentions
and create undesirable results.</p>
<p>This insight is of vital importance when envisioning a future with
powerful AI systems. AIs, individual humans, and human agencies will all
conduct their actions in light of how others are behaving and how each
expects others to behave in the future. The total risk of this
multi-agent system is greater than the sum of its individual parts.
Dynamics between multiple human agencies generate races in corporate and
military settings. Dynamics between multiple AIs may generate
evolutionary pressure for immoral behaviors, particularly selfishness,
free-riding, deception, conflict, and extortion. We cannot address all
the risks posed by AI simply by focusing on the outcomes of agents
acting in isolation. The safety of AI systems will not be guaranteed
solely by aligning each AI agent to well-intentioned operators.
Considering these multi-agent dynamics carefully is an essential
component of ensuring our safety and a valuable future. These dynamics
represent a common problem: clashes between individual and collective
interests. We must find innovative, system-level solutions to ensure
that the development and interaction of AI agents lead to beneficial
outcomes for all.</p>