<h1 id="conclusion">4.8 Conclusion</h1>
<h2 id="summary">4.8.1 Summary</h2>
<p>In this chapter, we have explored various methods of analyzing and
managing risks inherent in systems. We began by looking at how we can
break risk down into two components: the probability and severity of an
accident. We then went into greater detail, introducing the factors of
exposure and vulnerability, showing how each affects the level of risk
we calculate. By decomposing risk in this way, we can identify measures
we can take to reduce risks. We also considered the concept of ability
to cope and how it relates to risk of ruin.</p>
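<p>As a rough illustration (a minimal sketch with hypothetical factor
names and numbers, not a formula prescribed by this chapter), the Python
snippet below treats risk as the product of the probability of a
hazardous event, our exposure to it, our vulnerability if exposed, and
the severity of the resulting harm, which makes it easy to see how
reducing any one factor reduces the overall risk.</p>
<pre><code># Illustrative only: a simple multiplicative decomposition of risk.
# Factor names and example values are hypothetical.

def risk(probability, exposure, vulnerability, severity):
    """Expected harm from a hazard, decomposed into four factors.

    probability   -- chance the hazardous event occurs (e.g. per year)
    exposure      -- fraction of assets or time exposed to the hazard
    vulnerability -- fraction of exposed value lost if the hazard strikes
    severity      -- total value at stake
    """
    return probability * exposure * vulnerability * severity

baseline = risk(probability=0.01, exposure=0.5, vulnerability=0.8, severity=1e6)

# Halving vulnerability (e.g. by improving the ability to cope) halves the
# risk without changing the probability of the hazard itself.
mitigated = risk(probability=0.01, exposure=0.5, vulnerability=0.4, severity=1e6)

print(baseline, mitigated)  # roughly 4000 and 2000
</code></pre>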
<p>Next, we described a metric of system reliability called the "nines of
reliability": the number of leading nines when a system’s reliability is
written as a percentage or a decimal. We found that adding
another nine of reliability is equivalent to reducing the probability of
an accident by a factor of 10, and therefore results in a tenfold
increase in expected time before failure. A limitation of the nines of
reliability is that they only contain information about the probability
of an accident, but not its severity, so they cannot be used alone to
calculate risk.</p>
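<p>As a minimal sketch (illustrative only, with the simplifying
assumption that failures are independent across trials), the Python
below converts a reliability level into its number of nines and shows
that each additional nine multiplies the expected number of trials
before a failure by ten.</p>
<pre><code>import math

# Illustrative only. "Nines of reliability" counts the leading nines in a
# system's reliability: 0.99 has two nines, 0.999 has three, and so on.

def nines(reliability):
    """Number of nines in a reliability level, e.g. 0.999 gives 3."""
    failure_probability = 1.0 - reliability
    return int(round(-math.log10(failure_probability)))

for r in [0.9, 0.99, 0.999, 0.9999]:
    failure_probability = 1.0 - r
    # With independent trials, the expected number of trials before a
    # failure is 1 / failure_probability, so each extra nine is a
    # tenfold increase in expected time before failure.
    expected_trials_before_failure = 1.0 / failure_probability
    print(nines(r), round(expected_trials_before_failure))
</code></pre>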
<p>We then listed several safe design principles, which can be incorporated
into a system from the design stage to reduce the risk of accidents. In
particular, we explored redundancy, separation of duties, the principle
of least privilege, fail-safes, antifragility, negative feedback
mechanisms, transparency, and defense in depth.</p>
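<p>As one hedged example of reasoning quantitatively about these
principles, the Python sketch below (hypothetical numbers, and it makes
the strong assumption that component failures are independent) shows how
redundancy drives down the probability that every component fails at
once, and why common-cause failures limit that benefit.</p>
<pre><code># Illustrative only: redundancy under the strong assumption that
# component failures are independent. All numbers are hypothetical.

def all_fail_independent(p_single, n_components):
    """Probability that every one of n independent components fails."""
    return p_single ** n_components

p = 0.01  # hypothetical failure probability of a single component
for n in [1, 2, 3]:
    print(n, all_fail_independent(p, n))  # roughly 1e-2, 1e-4, 1e-6

# If a common cause (say, a shared power supply) can disable every
# component at once with probability 0.001, redundancy cannot push the
# overall failure probability below that floor.
p_common_cause = 0.001
print(p_common_cause + (1 - p_common_cause) * all_fail_independent(p, 3))
</code></pre>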
<p>To develop an understanding of how accidents occur in systems, we next
explored various accident models, which are theories about how accidents
happen and the factors that contribute to them. We reviewed three
component failure accident models (the Swiss cheese model, the bow tie
model, and fault tree analysis) and considered their limitations, which
arise from their chain-of-events style of reasoning. Generally, they do
not capture how accidents can happen due to interactions between
components, even when nothing fails. Component failure models are also
unsuited to modeling how the numerous complex interactions and feedback
loops in a system can make it difficult to identify a root cause, and
how it can be more fruitful to look at diffuse causality and systemic
factors than specific events.</p>
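<p>For a concrete picture of this chain-of-events style of reasoning,
here is a toy fault tree in Python (hypothetical events and
probabilities, assuming the basic events are independent): OR gates
combine alternative ways the top event can occur, and AND gates require
all of their inputs to occur. Its simplicity also illustrates the
limitation described above: the tree only represents combinations of
component failures, not unsafe interactions between components that have
not failed.</p>
<pre><code># Toy fault tree with hypothetical, independent basic events.
# Top event: "braking system fails".

def or_gate(*probs):
    """Probability that at least one independent input event occurs."""
    none_occur = 1.0
    for p in probs:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

def and_gate(*probs):
    """Probability that all independent input events occur."""
    result = 1.0
    for p in probs:
        result *= p
    return result

sensor_fails = 0.001
software_fault_triggered = 0.0005
primary_actuator_fails = 0.002
backup_actuator_fails = 0.01

# Both actuators must fail for actuation to be lost (AND gate); any of
# the three branches is enough to lose braking (OR gate).
actuation_lost = and_gate(primary_actuator_fails, backup_actuator_fails)
top_event = or_gate(sensor_fails, software_fault_triggered, actuation_lost)
print(top_event)  # roughly 0.0015
</code></pre>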
<p>After highlighting the importance of systemic and human factors, we
delved deeper into some examples of them, including regulations,
social pressure, competitive pressures, safety costs, and safety
culture. We then moved on to look at systemic accident models that
attempt to take these factors into consideration. Normal Accident Theory
states that accidents are inevitable in complex and tightly coupled
systems. On the other hand, HRO theory points to certain high
reliability organizations as evidence that it is possible to reliably
avoid accidents by following five key management principles:
preoccupation with failure, reluctance to simplify interpretations,
sensitivity to operations, commitment to resilience, and deference to
expertise. While these features can certainly contribute to a good
safety culture, we also looked at the limitations and the difficulties
in replicating some of them in other systems.</p>
<p>Rounding out our discussion of systemic factors, we outlined three
accident models that are grounded in complex systems theory. Rasmussen’s
Risk Management Framework (RMF) identifies six hierarchical levels
within a system, along with the actors at each level who share
responsibility for safety. The RMF states that a system’s operations
should be kept within defined safety boundaries; if they migrate outside
of these, then the system is in a state where an event at the sharp end
could trigger an accident. However, the factors at the blunt end are
also responsible, not just the sharp-end event.</p>
<p>Similarly, STAMP and the related STPA analysis method view safety as
being an emergent property of an organization, detailing different
levels of organization within a system and defining the safety
constraints that each level should impose on the one below it.
Specifically, STPA builds models of: the organizational safety
structure; the dynamics and pressures that can lead to deterioration of
this structure; the models of the system that operators must have, and
the necessary communication to ensure these models remain accurate over
time; and the broader social and political context the organization
exists within.</p>
<p>Finally, Dekker’s Drift into Failure (DIF) model emphasizes
decrementalism: the way that a system’s processes can deteriorate
through a series of minor changes, potentially causing the system’s
migration to an unsafe state. This model warns that each change may seem
insignificant alone, so organizations might make these changes one at a
time in isolation, creating a state of higher risk once enough changes
have been made.</p>
<p>
As a final note on the implications of complexity for AI safety, we
considered the broader societal context within which AI technologies
will function. We discussed how, in this uncontrolled environment,
different, seemingly lower-level risks could interact to produce
catastrophic threats, while chaotic circumstances may increase the
likelihood of AI-related accidents. For these reasons, it makes sense to
consider a wide range of different threats of different magnitudes in
our approach to mitigating catastrophic risks, and we may find that
broader interventions are more fruitful than narrowly targeted
ones.</p>
<p>In the second half of this chapter, we focused on a particular class
of events called tail events and black swans, and explored what they
mean for risk analysis and management. We began this discussion by
contrasting long-tailed and thin-tailed distributions; long-tailed
distributions are subject to the possibility of rare, highly extreme
events that can dominate the overall impact, while in thin-tailed
distributions, most events are around the same order of magnitude and no
single event can determine the outcome. We also looked at some general
characteristics of different systems, finding that highly connected
systems can give rise to multiplicative phenomena, allowing events to
scale more extremely and become tail events.</p>
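<p>The contrast is easy to see in a quick simulation. The Python sketch
below is purely illustrative, using the absolute value of a normal
distribution as a stand-in for a thin-tailed process and a Pareto
distribution for a long-tailed one, and comparing how much of the total
impact the single largest event accounts for in each case.</p>
<pre><code>import random

# Illustrative comparison of a thin-tailed and a long-tailed distribution.
random.seed(0)
n = 100_000

# Thin-tailed: absolute values of standard normal draws.
thin = [abs(random.gauss(0, 1)) for _ in range(n)]

# Long-tailed: Pareto draws with a heavy tail (shape alpha = 1.2).
long_tailed = [random.paretovariate(1.2) for _ in range(n)]

for name, xs in [("thin-tailed", thin), ("long-tailed", long_tailed)]:
    share_of_total = max(xs) / sum(xs)
    print(name, round(share_of_total, 4))

# The largest thin-tailed event is a negligible share of the total, while
# the largest long-tailed event can account for a sizeable fraction of it.
</code></pre>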
<p>Next, we described black swans as a subset of tail events that are not
only rare and high-impact, but also particularly difficult to predict.
These events seem to happen largely “out of the blue” for most people,
and may indicate that our understanding of a situation is inaccurate or
incomplete. Such events are also referred to as unknown unknowns, in
contrast with known unknowns, which we may not fully understand but are
at least aware of.</p>
<p>
We examined how tail events and black swans can pose particular
challenges for some traditional approaches to evaluating and managing
risk. Certain methods of risk estimation and cost-benefit analysis rely
on historical data and probabilities of different events. However, tail
events and black swans are rare, so we may not have sufficient data to
accurately estimate their likelihood, and even a small change in
likelihood can lead to a big difference in expected outcome.</p>
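<p>A small worked example (hypothetical numbers) makes the point: when
one outcome is catastrophic, the expected cost is dominated by our
estimate of its probability, so an estimate that is off by a fraction of
a percentage point can change the conclusion of a cost-benefit
analysis.</p>
<pre><code># Illustrative only: expected cost is dominated by the tail scenario, so
# small errors in its estimated probability swing the answer widely.

routine_cost = 1_000            # typical loss in the ordinary case
catastrophic_cost = 10_000_000  # loss if the tail event occurs

for p_tail in [0.0001, 0.001, 0.005]:
    expected_cost = (1 - p_tail) * routine_cost + p_tail * catastrophic_cost
    print(p_tail, round(expected_cost))

# Prints roughly 2,000, 11,000, and 51,000: a shift of half a percentage
# point in a probability we cannot estimate well from sparse data changes
# the expected cost by more than an order of magnitude.
</code></pre>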
<p>We also considered the delay fallacy, showing that waiting for more
information before acting might mean waiting until it is too late. We
discussed how an absence of evidence of a risk cannot necessarily be
taken as evidence that the risk is absent. By looking at hypothetical
situations where catastrophes are avoided thanks to safety measures, we
explained how the preparedness paradox can make these measures seem
unnecessary, when in fact they are essential.</p>
<p>
Having explored the importance of taking tail events and black swans
into consideration, we identified some circumstances that indicate we
may be at risk of these events. We concluded that it is reasonable to
believe AI technologies may pose such a risk, due to the complexity of
AI systems and the systems surrounding them, the highly connected nature
of the social systems they are likely to be embedded in, and the fact
that they are relatively new, meaning we may not yet fully understand
all the ways they might interact with their surroundings.</p>
<p>
Finally, we summarized some of the actions that can be taken to reduce
the risk of black swan events transpiring. These include techniques for
putting more black swans on our radar, or turning them into known
unknowns, such as conducting exercises to improve our safety
imagination, engaging in horizon scanning, and employing red teams
tasked with finding ways to sabotage a system. We reiterated the
importance of incorporating safe design principles and of addressing
general systemic factors in order to improve safety.</p>
<p>
We also described how following the precautionary principle can reduce
our exposure to risks, and how improving decision-making processes
around the use of AI and changing researchers’ incentives can improve
prospects.</p>
<h2 id="key-takeaways">4.8.2 Key takeaways</h2>
<p><strong>Tail events and black swans require a different approach to managing risks.</strong> <span
class="citation" data-cites="Marsden2017blackswan">[1]</span> Some decisions require vastly more caution than others: for instance,
paraphrasing Richard Danzig, you should not “need evidence” that a gun is loaded to avoid playing Russian roulette <span class="citation"
data-cites="danzig2018technology">[2]</span>. Instead, you should need evidence of safety. In situations where we are subject to the possibility
of tail events and black swans, this evidence might be impossible to find.</p>
<p>One element of good decision making when dealing with long-tailed scenarios is to exercise more caution than we would otherwise. In the case of new technologies such
as AI systems, this might mean not prematurely deploying them on a large scale. In some situations, we can be extremely wrong and things can still
end up being fine; in others, we can be just slightly wrong but suffer disastrous consequences. We must also be cautious while trying to solve our problems.
For example, while climate change poses a serious threat, many experts believe it would be unwise to attempt to fix it quickly by rushing into geoengineering
solutions like spraying sulfur particles into the atmosphere. There may be an urgent need to solve the problem, but we should take care that we are not pursuing
solutions that could cause many other problems.</p>
<p>Although tail events may be challenging to predict, there are a variety of techniques discussed in this chapter that can help with this, such as expanding our
safety imagination, conducting horizon scanning exercises, and red-teaming.</p>
<p><strong>Incorporating safe design principles can improve general
safety.</strong> Following the safe design principles listed earlier in this chapter can
be a good first step towards reducing systemic risks from AI, with the caveat that we should think carefully
about which defense features are appropriate, and avoid too much
complexity. In particular, focusing on increasing the controllability of
the system might be a good idea. This can be done by introducing loose
coupling into the system, by supporting human operators in noticing
hazards and acting on them early, and by devising negative feedback
mechanisms that will down-regulate processes if control is lost.</p>
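<p>As a hedged sketch of what such a negative feedback mechanism might
look like in code (the thresholds, the anomaly signal, and the function
names are all hypothetical, not a design from this chapter), the Python
below wraps an automated process in a monitor that slows it down as an
anomaly signal rises and halts it entirely, failing safe, when the
signal suggests control is being lost.</p>
<pre><code># Illustrative negative-feedback wrapper. The thresholds, the
# anomaly_score() signal, and the step() process are all hypothetical.

THROTTLE_THRESHOLD = 0.5   # slow the process above this anomaly level
SHUTDOWN_THRESHOLD = 0.9   # fail safe above this anomaly level

def supervise(step, anomaly_score, max_steps=1000):
    """Run step() repeatedly, down-regulating it as anomaly_score rises."""
    rate = 1.0  # fraction of full operating speed
    for _ in range(max_steps):
        score = anomaly_score()
        if score > SHUTDOWN_THRESHOLD:
            return "halted"               # fail-safe: stop rather than push on
        if score > THROTTLE_THRESHOLD:
            rate = max(0.1, rate * 0.5)   # negative feedback: slow down
        else:
            rate = min(1.0, rate * 1.1)   # recover speed when things look normal
        step(rate)
    return "completed"
</code></pre>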
<p>Consider the principle of least privilege in more detail. For one, it tells
us that we should be cautious about giving AIs too much power, to limit
the extent to which we are exposed to their tail risks. We might be
concerned that AIs could become enmeshed within society, with the capacity
to make big changes in the world, when they do not need such access to
perform their assigned duties. Additionally, for particularly powerful
AI systems that have useful capabilities, it might be reasonable to keep
them relatively isolated from wider society, and accessible only to
verified individuals who have demonstrable and specific needs for such
AIs. In general, being conservative about if and how we unleash
technologies can reduce our exposure to black swans.</p>
<p><strong>Targeting systemic factors is an important approach to reducing overall
risk.</strong> As we discussed, tackling systemic safety issues can be
more effective than focusing on details in complex systems. This can
reduce the risk of both foreseeable accidents and black swans.</p>
<p>Raising general awareness of risks associated with technologies can produce
social pressures, and bring organizations operating those technologies
under greater scrutiny. Developing and enforcing industry regulations
can help ensure organizations maintain appropriate safety standards, as
can encouraging best practices that improve safety culture. If there are
ways of reducing the costs of safety measures (e.g., through technical
research), organizations become more likely to adopt those measures, also
improving general safety.</p>
<p>Other systemic factors to pay attention to include competitive
pressures. These can undermine general safety by compelling management
and employees to cut corners, whether to increase rates of production or
to reach a goal before competitors. If there are ways of reducing these
pressures and encouraging organizations to prioritize safety, this could
substantially lessen overall risk.</p>
<p><strong>Improving the incentives of decision-makers and reducing moral hazard can help to address systemic risks.</strong>
We might want to influence the incentives of researchers developing AI. Researchers might
currently be focused on increasing profits and reaching goals before
competitors, pursuing scientific curiosity and a desire for
rapid technological acceleration, or developing the best capabilities in deep
learning models to find out what is possible. As a result, these
researchers might be somewhat disconnected from the risks they could be
creating and the externalities they are imposing on the rest of society,
creating a moral hazard. Encouraging more consideration of the possible
risks, perhaps by making researchers liable for any consequences of the
technologies they develop, could therefore improve general safety.</p>
<p>
Similarly, we might be able to improve decision-making by changing who has a say in
decisions, perhaps by including citizens in decision-making processes,
not only officials and scientists <span class="citation"
data-cites="Marsden2017blackswan">[1]</span>. This reduces moral hazard
by including the stakeholders that have “skin in the game.” It can also
lead to better decisions in general due to the wisdom of crowds, the
phenomenon where a crowd composed of diverse individuals collectively
makes much better decisions than most of its members would, when the
conditions are right.</p>
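<p>A quick simulation (illustrative, with hypothetical numbers) shows
the statistical side of this: when individual estimates are noisy but
diverse, independent, and roughly unbiased, the crowd average lands far
closer to the truth than a typical member does.</p>
<pre><code>import random

# Illustrative wisdom-of-crowds simulation with hypothetical numbers.
random.seed(1)

true_value = 100.0
crowd_size = 500

# Each member's estimate is noisy but roughly unbiased and independent.
estimates = [random.gauss(true_value, 30.0) for _ in range(crowd_size)]

crowd_error = abs(sum(estimates) / len(estimates) - true_value)
typical_individual_error = sum(abs(e - true_value) for e in estimates) / len(estimates)

print(round(crowd_error, 2), round(typical_individual_error, 2))

# The crowd average's error is far smaller than a typical individual's.
# The benefit shrinks if members' errors are correlated, i.e. if the
# crowd is not genuinely diverse or independent.
</code></pre>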
<p>In summary, while AI poses novel challenges, there is much we can learn from existing approaches to safety engineering and risk management in order to reduce the risk of catastrophic outcomes.</p>
<br>
<br>
<h3>References</h3>
<div id="refs" class="references csl-bib-body" data-entry-spacing="0"
role="list">
<div id="ref-Marsden2017blackswan" class="csl-entry" role="listitem">
<div class="csl-left-margin">[1] E.
Marsden, <span>“Black swans: The limits of probabilistic
modelling.”</span> Accessed: Jul. 31, 2017. [Online]. Available: <a
href="https://risk-engineering.org/black-swans/">https://risk-engineering.org/black-swans/</a></div>
</div>
<div id="ref-danzig2018technology" class="csl-entry" role="listitem">
<div class="csl-left-margin">[2] R.
Danzig, <span>“Technology roulette.”</span> Center for a New American
Security, 2018.</div>