-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathcs189-convexity.tex
289 lines (246 loc) · 12.7 KB
/
cs189-convexity.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
\term{Convexity} is a term that pertains to both sets and functions.
For functions, there are different degrees of convexity, and how convex a function is tells us a lot about its minima: do they exist, are they unique, how quickly can we find them using optimization algorithms, etc.
In this section, we present basic results regarding convexity, strict convexity, and strong convexity.
\subsubsection{Convex sets}
\begin{figure}
\centering
\begin{subfigure}[b]{0.45\linewidth}
\includegraphics[width=\linewidth]{convex-set}
\caption{A convex set}
\end{subfigure}
\begin{subfigure}[b]{0.45\linewidth}
\includegraphics[width=\linewidth]{nonconvex-set}
\caption{A non-convex set}
\end{subfigure}
\caption{What convex sets look like}
\label{fig:convexset}
\end{figure}
A set $\calX \subseteq \R^d$ is \term{convex} if
\[t\x + (1-t)\y \in \calX\]
for all $\vec{x}, \vec{y} \in \calX$ and all $t \in [0,1]$.
Geometrically, this means that all the points on the line segment between any two points in $\calX$ are also in $\calX$.
See Figure \ref{fig:convexset} for a visual.
Why do we care whether or not a set is convex?
We will see later that the nature of minima can depend greatly on whether or not the feasible set is convex.
Undesirable pathological results can occur when we allow the feasible set to be arbitrary, so for proofs we will need to assume that it is convex.
Fortunately, we often want to minimize over all of $\R^d$, which is easily seen to be a convex set.
\subsubsection{Basics of convex functions}
In the remainder of this section, assume $f : \R^d \to \R$ unless otherwise noted. We'll start with the definitions and then give some results.
A function $f$ is \term{convex} if
\[f(t\vec{x} + (1-t)\vec{y}) \leq t f(\vec{x}) + (1-t)f(\vec{y})\]
for all $\vec{x}, \vec{y} \in \dom f$ and all $t \in [0,1]$.
If the inequality holds strictly (i.e. $<$ rather than $\leq$) for all $t \in (0,1)$ and $\x \neq \y$, then we say that $f$ is \term{strictly convex}.
A function $f$ is \term{strongly convex with parameter $m$} (or \term{$m$-strongly convex}) if the function
\[\x \mapsto f(\x) - \frac{m}{2}\|\x\|_2^2\]
is convex.
These conditions are given in increasing order of strength; strong convexity implies strict convexity which implies convexity.
%\begin{proposition}
%If $f$ is strictly convex, then $f$ is convex.
%\end{proposition}
%\begin{proof}
%Suppose $\x, \y \in \dom f$ and $t \in [0,1]$. We break it down by cases:
%\begin{enumerate}
%\item $\x \neq \y$ and $t \in (0,1)$: the convexity condition is a clear consequence of the strict convexity condition.
%\item $\x = \y$ and $t \in [0,1]$: we have
%\[f(t\x + (1-t)\y) = f(t\x + (1-t)\x) = f(\x) = f(\x) + t f(\x) - t f(\y) = t f(\x) + (1-t)f(\y)\]
%so the condition holds.
%\item $t = 0$: then
%\[f(t\x + (1-t)\y) = f(\y) = t f(\x) + (1-t)f(\y)\]
%\item $t = 1$: similar to the $t = 0$ case.
%\end{enumerate}
%Hence $f$ is convex.
%\end{proof}
%
%\begin{proposition}
%If $f$ is $m$-strongly convex, then $f$ is strictly convex.
%\end{proposition}
%\begin{proof}
%To-do.
%\end{proof}
\begin{figure}
\centering
\includegraphics[width=\linewidth]{convex-function}
\caption{What convex functions look like}
\label{fig:convexfunction}
\end{figure}
Geometrically, convexity means that the line segment between two points on the graph of $f$ lies on or above the graph itself.
See Figure \ref{fig:convexfunction} for a visual.
Strict convexity means that the graph of $f$ lies strictly above the line segment, except at the segment endpoints.
(So actually the function in the figure appears to be strictly convex.)
\subsubsection{Consequences of convexity}
Why do we care if a function is (strictly/strongly) convex?
Basically, our various notions of convexity have implications about the nature of minima.
It should not be surprising that the stronger conditions tell us more about the minima.
\begin{proposition}
Let $\calX$ be a convex set.
If $f$ is convex, then any local minimum of $f$ in $\calX$ is also a global minimum.
\end{proposition}
\begin{proof}
Suppose $f$ is convex, and let $\x^*$ be a local minimum of $f$ in $\calX$.
Then for some neighborhood $N \subseteq \calX$ about $\x^*$, we have $f(\x) \geq f(\x^*)$ for all $\x \in N$.
Suppose towards a contradiction that there exists $\xye \in \calX$ such that $f(\xye) < f(\x^*)$.
Consider the line segment $\x(t) = t\x^* + (1-t)\xye, ~ t \in [0,1]$, noting that $\x(t) \in \calX$ by the convexity of $\calX$.
Then by the convexity of $f$,
\[f(\x(t)) \leq tf(\x^*) + (1-t)f(\xye) < tf(\x^*) + (1-t)f(\x^*) = f(\x^*)\]
for all $t \in (0,1)$.
We can pick $t$ to be sufficiently close to $1$ that $\x(t) \in N$; then $f(\x(t)) \geq f(\x^*)$ by the definition of $N$, but $f(\x(t)) < f(\x^*)$ by the above inequality, a contradiction.
It follows that $f(\x^*) \leq f(\x)$ for all $\x \in \calX$, so $\x^*$ is a global minimum of $f$ in $\calX$.
\end{proof}
\begin{proposition}
Let $\calX$ be a convex set.
If $f$ is strictly convex, then there exists at most one local minimum of $f$ in $\calX$.
Consequently, if it exists it is the unique global minimum of $f$ in $\calX$.
\end{proposition}
\begin{proof}
The second sentence follows from the first, so all we must show is that if a local minimum exists in $\calX$ then it is unique.
Suppose $\x^*$ is a local minimum of $f$ in $\calX$, and suppose towards a contradiction that there exists a local minimum $\xye \in \calX$ such that $\xye \neq \x^*$.
Since $f$ is strictly convex, it is convex, so $\x^*$ and $\xye$ are both global minima of $f$ in $\calX$ by the previous result.
Hence $f(\x^*) = f(\xye)$.
Consider the line segment $\x(t) = t\x^* + (1-t)\xye, ~ t \in [0,1]$, which again must lie entirely in $\calX$.
By the strict convexity of $f$,
\[f(\x(t)) < tf(\x^*) + (1-t)f(\xye) = tf(\x^*) + (1-t)f(\x^*) = f(\x^*)\]
for all $t \in (0,1)$.
But this contradicts the fact that $\x^*$ is a global minimum.
Therefore if $\xye$ is a local minimum of $f$ in $\calX$, then $\xye = \x^*$, so $\x^*$ is the unique minimum in $\calX$.
\end{proof}
It is worthwhile to examine how the feasible set affects the optimization problem.
We will see why the assumption that $\calX$ is convex is needed in the results above.
Consider the function $f(x) = x^2$, which is a strictly convex function.
The unique global minimum of this function in $\R$ is $x = 0$.
But let's see what happens when we change the feasible set $\calX$.
\begin{enumerate}[(i)]
\item $\calX = \{1\}$: This set is actually convex, so we still have a unique global minimum.
But it is not the same as the unconstrained minimum!
\item $\calX = \R \setminus \{0\}$: This set is non-convex, and we can see that $f$ has no minima in $\calX$.
For any point $x \in \calX$, one can find another point $y \in \calX$ such that $f(y) < f(x)$.
\item $\calX = (-\infty,-1] \cup [0,\infty)$: This set is non-convex, and we can see that there is a local minimum ($x = -1$) which is distinct from the global minimum ($x = 0$).
\item $\calX = (-\infty,-1] \cup [1,\infty)$: This set is non-convex, and we can see that there are two global minima ($x = \pm 1$).
\end{enumerate}
\subsubsection{Showing that a function is convex}
Hopefully the previous section has convinced the reader that convexity is an important property.
Next we turn to the issue of showing that a function is (strictly/strongly) convex.
It is of course possible (in principle) to directly show that the condition in the definition holds, but this is usually not the easiest way.
\begin{proposition}
Norms are convex.
\end{proposition}
\begin{proof}
Let $\|\cdot\|$ be a norm on a vector space $V$. Then for all $\x, \y \in V$ and $t \in [0,1]$,
\[\|t\x + (1-t)\y\| \leq \|t\x\| + \|(1-t)\y\| = |t|\|\x\| + |1-t|\|\y\| = t\|\x\| + (1-t)\|\y\|\]
where we have used respectively the triangle inequality, the homogeneity of norms, and the fact that $t$ and $1-t$ are nonnegative.
Hence $\|\cdot\|$ is convex.
\end{proof}
\begin{proposition}
Suppose $f$ is differentiable. Then $f$ is convex if and only if
\[f(\y) \geq f(\x) + \angle{\nabla f(\x), \y - \x}\]
for all $\x, \y \in \dom f$.
\end{proposition}
\begin{proof}
To-do.
\end{proof}
\begin{proposition}
Suppose $f$ is twice differentiable.
Then
\begin{enumerate}[(i)]
\item $f$ is convex if and only if $\nabla^2 f(\x) \succeq 0$ for all $\x \in \dom f$.
\item If $\nabla^2 f(\x) \succ 0$ for all $\x \in \dom f$, then $f$ is strictly convex.
\item $f$ is $m$-strongly convex if and only if $\nabla^2 f(\x) \succeq mI$ for all $\x \in \dom f$.
\end{enumerate}
\end{proposition}
\begin{proof}
Omitted.
\end{proof}
\begin{proposition}
If $f$ is convex and $\alpha \geq 0$, then $\alpha f$ is convex.
\end{proposition}
\begin{proof}
Suppose $f$ is convex and $\alpha \geq 0$. Then for all $\x, \y \in \dom(\alpha f) = \dom f$,
\begin{align*}
(\alpha f)(t\x + (1-t)\y) &= \alpha f(t\x + (1-t)\y) \\
&\leq \alpha\left(tf(\x) + (1-t)f(\y)\right) \\
&= t(\alpha f(\x)) + (1-t)(\alpha f(\y)) \\
&= t(\alpha f)(\x) + (1-t)(\alpha f)(\y)
\end{align*}
so $\alpha f$ is convex.
\end{proof}
\begin{proposition}
If $f$ and $g$ are convex, then $f+g$ is convex.
Furthermore, if $g$ is strictly convex, then $f+g$ is strictly convex, and if $g$ is $m$-strongly convex, then $f+g$ is $m$-strongly convex.
\end{proposition}
\begin{proof}
Suppose $f$ and $g$ are convex. Then for all $\x, \y \in \dom (f+g) = \dom f \cap \dom g$,
\begin{align*}
(f+g)(t\x + (1-t)\y) &= f(t\x + (1-t)\y) + g(t\x + (1-t)\y) \\
&\leq tf(\x) + (1-t)f(\y) + g(t\x + (1-t)\y) & \text{convexity of $f$} \\
&\leq tf(\x) + (1-t)f(\y) + tg(\x) + (1-t)g(\y) & \text{convexity of $g$} \\
&= t(f(\x) + g(\x)) + (1-t)(f(\y) + g(\y)) \\
&= t(f+g)(\x) + (1-t)(f+g)(\y)
\end{align*}
so $f + g$ is convex.
If $g$ is strictly convex, the second inequality above holds strictly for $\x \neq \y$ and $t \in (0,1)$, so $f+g$ is strictly convex.
If $g$ is $m$-strongly convex, then the function $h(\x) \equiv g(\x) - \frac{m}{2}\|\x\|_2^2$ is convex, so $f+h$ is convex.
But
\[(f+h)(\x) \equiv f(\x) + h(\x) \equiv f(\x) + g(\x) - \frac{m}{2}\|\x\|_2^2 \equiv (f+g)(\x) - \frac{m}{2}\|\x\|_2^2\]
so $f+g$ is $m$-strongly convex.
\end{proof}
\begin{proposition}
If $f_1, \dots, f_n$ are convex and $\alpha_1, \dots, \alpha_n \geq 0$, then
\[\sum_{i=1}^n \alpha_i f_i\]
is convex.
\end{proposition}
\begin{proof}
Follows from the previous two propositions by induction.
\end{proof}
\begin{proposition}
If $f$ is convex, then $g(\vec{x}) \equiv f(\A\x + \vec{b})$ is convex for any appropriately-sized $\A$ and $\b$.
\end{proposition}
\begin{proof}
Suppose $f$ is convex and $g$ is defined like so. Then for all $\x, \y \in \dom g$,
\begin{align*}
g(t\x + (1-t)\y) &= f(\A(t\x + (1-t)\y) + \b) \\
&= f(t\A\x + (1-t)\A\y + \b) \\
&= f(t\A\x + (1-t)\A\y + t\b + (1-t)\b) \\
&= f(t(\A\x + \b) + (1-t)(\A\y + \b)) \\
&\leq tf(\A\x + \b) + (1-t)f(\A\y + \b) & \text{convexity of $f$} \\
&= tg(\x) + (1-t)g(\y)
\end{align*}
Thus $g$ is convex.
\end{proof}
\begin{proposition}
If $f$ and $g$ are convex, then $h(\vec{x}) \equiv \max\{f(\vec{x}), g(\vec{x})\}$ is convex.
\end{proposition}
\begin{proof}
Suppose $f$ and $g$ are convex and $h$ is defined like so. Then for all $\x, \y \in \dom h$,
\begin{align*}
h(t\x + (1-t)\y) &= \max\{f(t\x + (1-t)\y), g(t\x + (1-t)\y)\} \\
&\leq \max\{tf(\x) + (1-t)f(\y), tg(\x) + (1-t)g(\y)\} \\
&\leq \max\{tf(\x), tg(\x)\} + \max\{(1-t)f(\y), (1-t)g(\y)\} \\
&= t\max\{f(\x), g(\x)\} + (1-t)\max\{f(\y), g(\y)\} \\
&= th(\x) + (1-t)h(\y)
\end{align*}
Note that in the first inequality we have used convexity of $f$ and $g$ plus the fact that $a \leq c, b \leq d$ implies $\max\{a,b\} \leq \max\{c,d\}$.
In the second inequality we have used the fact that $\max\{a+b, c+d\} \leq \max\{a,c\} + \max\{b,d\}$.
Thus $h$ is convex.
\end{proof}
\subsubsection{Examples}
A good way to gain intuition about the distinction between convex, strictly convex, and strongly convex functions is to consider examples where the stronger property fails to hold.
Functions that are convex but not strictly convex:
\begin{enumerate}[(i)]
\item $f(\x) = \w\tran\x + \alpha$ for any $\w \in \R^d, \alpha \in \R$.
Such a function is called an \term{affine function}, and it is both convex and concave.
(In fact, a function is affine if and only if it is both convex and concave.)
Note that linear functions and constant functions are special cases of affine functions.
\item $f(\x) = \|\x\|_1$
\end{enumerate}
Functions that are strictly but not strongly convex:
\begin{enumerate}[(i)]
\item $f(x) = x^4$.
This example is interesting because it is strictly convex but you cannot show this fact via a second-order argument (since $f''(0) = 0$).
\item $f(x) = \exp(x)$.
This example is interesting because it's bounded below but has no local minimum.
\item $f(x) = -\log x$.
This example is interesting because it's strictly convex but not bounded below.
\end{enumerate}
Functions that are strongly convex:
\begin{enumerate}[(i)]
\item $f(\x) = \|\x\|_2^2$
\end{enumerate}