-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathaises_4_3
259 lines (248 loc) · 12.2 KB
/
aises_4_3
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
<style type="text/css">
table.tableLayout{
margin: auto;
border: 1px solid;
border-collapse: collapse;
border-spacing: 1px;
caption-side: bottom;
}
table.tableLayout > caption.title{
white-space: unset;
max-width: unset;
}
table.tableLayout tr{
border: 1px solid;
border-collapse: collapse;
padding: 5px;
}
table.tableLayout th{
border: 1px solid;
border-collapse: collapse;
padding: 3px;
}
table.tableLayout td{
border: 1px solid;
padding: 5px;
}
</style>
<style>
.visionbox{
border-radius: 15px;
border: 2px solid #3585d4;
background-color: #ebf3fb;
text-align: left;
padding: 10px;
}
</style>
<style>
.visionboxlegend{
border-bottom-style: solid;
border-bottom-color: #3585d4;
border-bottom-width: 0px;
margin-left: -12px;
margin-right: -12px; margin-top: -13px;
padding: 0.01em 1em; color: #ffffff;
background-color: #3585d4;
border-radius: 15px 15px 0px 0px}
</style>
<h1 id="nines-of-reliability">4.3 Nines of Reliability</h1>
<p>In the above discussion of risk evaluation, we have frequently
referred to the probability of an adverse event occurring. When
evaluating a system, we often instead refer to the inverse of this—the
system’s reliability, or the probability that an adverse event will not
happen, usually presented as a percentage or decimal. We can relate
system reliability to the amount of time that a system is likely to
function before failing. We can also introduce a new measure of
reliability that conveys the expected time before failure more
intuitively.</p>
<p><strong>The more often we use a system, the more likely we are to
encounter a failure.</strong> While a system might have an inherent
level of reliability, the probability of encountering a failure also
depends on how many times it is used. This is why, as discussed above,
increasing exposure to a hazard will increase the associated level of
risk. An autonomous vehicle, for example, is much more likely to make a
mistake during a journey where it has to make 1000 decisions, than
during a journey where it has to make only 10 decisions.<p>
</p>
<table class="tableLayout">
<thead>
<tr class="header">
<th style="text-align: center;">% reliability of system</th>
<th style="text-align: center;">% risk of mistake</th>
<th style="text-align: center;">Nines of Reliability</th>
<th style="text-align: center;">A mistake is likely to occur <br> by decision number...</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">0</td>
<td style="text-align: center;">100</td>
<td style="text-align: center;">0</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">50</td>
<td style="text-align: center;">50</td>
<td style="text-align: center;">0.3</td>
<td style="text-align: center;">2</td>
</tr>
<tr class="odd">
<td style="text-align: center;">75</td>
<td style="text-align: center;">25</td>
<td style="text-align: center;">0.6</td>
<td style="text-align: center;">4</td>
</tr>
<tr class="even">
<td style="text-align: center;">90</td>
<td style="text-align: center;">10</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">10</td>
</tr>
<tr class="odd">
<td style="text-align: center;">99</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">2</td>
<td style="text-align: center;">100</td>
</tr>
<tr class="even">
<td style="text-align: center;">99.9</td>
<td style="text-align: center;">0.1</td>
<td style="text-align: center;">3</td>
<td style="text-align: center;">1000</td>
</tr>
<tr class="odd">
<td style="text-align: center;">99.99</td>
<td style="text-align: center;">0.01</td>
<td style="text-align: center;">4</td>
<td style="text-align: center;">10,000</td>
</tr>
</tbody>
</table>
<caption> Table 4.1: From each level of system reliability, we can infer its probability of mistake, “nine or
reliability,” and expected time before failure. </caption>
<p>
</p>
<p>For a given level of reliability, we can calculate an expected time
before failure. Imagine that we have several autonomous vehicles with
different levels of reliability, as shown in Table Nines of Reliability. Reliability is
the probability that the vehicle will get any given decision correct.
The second column shows the complementary probability: the probability
that the AV will get any given decision wrong. The fourth column shows
the number of decisions within which the AV is expected to make one
mistake. This can be thought of as the AV’s expected time before
failure.</p>
<p>
</p>
<figure id="fig:reliability-vs-time">
<embed src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/nines_v2.png" class="tb-img-full" style="width: 75%"/>
<p class="tb-caption">Figure 4.2: Halving the probability of a mistake doubles the expected time before failure. Therefore,
the relationship between system reliability and expected time before failure is non-linear.</p>
<!--<figcaption>System reliability vs expected time before-->
<!--failure</figcaption>-->
</figure>
<p><strong>Expected time before failure does not scale linearly with
system reliability.</strong> We plot the information from the table in
Figure 4.2. From looking at this
graph, it is clear that the expected time before failure does not scale
linearly with the system’s reliability. A 25% change that increases the
reliability from 50% to 75%, for example, doubles the expected time
before failure. However, a 9% change increasing the reliability from 90%
to 99% causes a ten-fold increase in the expected time before failure,
as does a 0.9% increase from 99% to 99.9%.<p>
The closer we get to 100% reliability, the more valuable any given
increment of improvement will be. However, as we get closer to 100%
reliability, we can generally expect that an increment of improvement
will become increasingly difficult to obtain. This is usually true
because it is hard to perfectly eliminate the possibility of any adverse
event. Additionally, there may be risks that we have not considered.
These are called unknown unknowns and will be discussed extensively
later in this chapter.<p>
<strong>A system with 3 “nines of reliability” is functioning 99.9% of
the time.</strong> As we get close to 100% reliability, it gets
inconvenient to use long decimals to express how reliable a system is.
The third column in table “Nines of Reliability” gives us
information about a different metric: the nines of reliability <span
class="citation" data-cites="Tao2021">[1]</span>. Informally, a system
has nines of reliability equal to the number of nines at the beginning
of its decimal or percentage reliability. One nine of reliability means
a reliability of 90% in percentage terms or 0.9 in decimal terms. Two
nines of reliability mean 99%, or 0.99. We can denote a system’s nines
of reliability with the letter <span
class="math inline"><em>k</em></span>; if a system is 90% reliable, it
has one nine of reliability and so <span
class="math inline"><em>k</em> = 1</span>; if it is 99% reliable, it has
two nines of reliability, and so <span
class="math inline"><em>k</em> = 2</span>. Formally, if <span
class="math inline"><em>p</em></span> is the system’s reliability
expressed as a decimal, we can define <span
class="math inline"><em>k</em></span>, the nines of reliability a system
possesses, as:<p>
<span
class="math display"><em>k</em> = − log<sub>10</sub>(1−<em>p</em>).</span></p>
<figure>
<embed src="https://raw.githubusercontent.com/WilliamHodgkins/AISES/main/images/log_nines_v2.png"
style="width:75%" class="tb-img-full"/>
<p class="tb-caption">Nines of reliability vs expected time before failure </p>
<!--<figcaption>Nines of reliability vs<p>-->
<!--expected time before failure</figcaption>-->
</figure>
<p><strong>Adding a ‘9’ to reliability gives a tenfold increase in
expected time before failure.</strong> If it is 99% reliable, it has a
1% probability of failure. If it is 99.9% reliable, it has a 0.1%
probability of failure. Therefore, adding another consecutive 9 to the
reliability corresponds to a tenfold reduction in risk, and therefore a
tenfold increase in the expected time before failure, as we can see in
the graph. This means that the relationship between system reliability
and expected time before failure is not linear. However, the
relationship between nines of reliability and the logarithm of expected
time before failure is linear. Note that if we have a system where a
failure would mean permanent societal-scale destruction, then the
expected time before failure is essentially the <em>expected
lifespan</em> of human civilization. Increasing such a system’s
reliability by one nine would cause a ten-fold increase in the expected
lifespan of human civilization.</p>
<p><strong>The nines of reliability metric can provide a relatively
intuitive sense of the difference between various levels of
reliability.</strong> Looking at the table, we can think of this metric
as a series of levels, where going up one level means a tenfold increase
in expected time before failure. For example, if a system has four nines
of reliability, we can expect it to last 100 times longer before failing
than if it has two. This is an advantage of using the nines of
reliability: thinking logarithmically can give us a better understanding
of the actual value of an improvement than if we say we’re improving the
reliability from 0.99 to 0.9999. In the latter case, the numbers begin
to look the same, but, in terms of expected time before failure, this
improvement is actually more meaningful than going from 0.4 to 0.8.</p>
<p><strong>The nines of reliability are only a measure of probability,
not risk.</strong> In the framing of the classic risk equation, the
nines of reliability only contain information about the probability of a
failure, not about what its severity would be. This metric is therefore
incomplete for evaluating risk. If an AI has three nines of reliability,
for example, we know that it is expected to make 999 out of 1000
decisions correctly. However, three nines of reliability tells us
nothing about how much damage the agent will do if it makes an incorrect
decision, so we cannot calculate the risk involved in using the system.
A game playing AI will present a lot less risk than an autonomous
vehicle even if both systems have three nines of reliability.</p>
<p><strong>Summary.</strong> A system’s nines of reliability indicate
the number of consecutive nines at the beginning of its percentage or
decimal reliability. An additional nine of reliability represents a
reduction of probability of failure by a factor of 10, and thus a
tenfold increase of expected time before failure. Nines of reliability
tell us about the probability of an accident, but do not contain any
information about the severity of an accident, and can therefore not be
used alone to calculate risk. In the case of a risk of ruin, an
additional nine of reliability means a tenfold increase in expected
lifespan.</p>
<br>
<br>
<h3>References</h3>
<div id="refs" class="references csl-bib-body" data-entry-spacing="0"
role="list">
<div id="ref-Tao2021" class="csl-entry" role="listitem">
<div class="csl-left-margin">[1] T.
Tao, <span>“Nines of safety: A proposed unit of measurement of
risk.”</span> [Online]. Available: <a
href="https://terrytao.wordpress.com/2021/10/03/nines-of-safety-a-proposed-unit-of-measurement-of-risk/">https://terrytao.wordpress.com/2021/10/03/nines-of-safety-a-proposed-unit-of-measurement-of-risk/</a></div>
</div>
</div>