E1
251
Linear
and
Nonlinear
Op2miza2on
Chapter
4:
Convex
and
quadra2c
func2ons
1
4.1
Defini2on:
Convex
func2on
A funtion f (x) is called a convex function
if, for every x and y in n , and for every θ in [0,1],
we have f (θ x + (1 − θ )y) ≤ θ f (x) + (1 − θ ) f (y)
(1) (2)
A funtion f (x) is convex
(2) if and only if for every vector
d, the 1D function
gd (α ) = f (x + α d) is convex.
(1)
x θ x + (1 − θ )y y 2
Func2ons
taxonomy
Convex
Non-‐Convex
Linear
Affine
Convex
quadra2c
Others
T
b x b x+c
T f (x) = xT Qx
− 2bT x + c
Q : Positive semidefinite
3
Local
minimum
is
global
minimum
Theorem 4.1A
For a convex function f (x), if x* is a local minimum
in ! n , it is also a global minimum in ! n .
Proof:
1) if x* is not a global minimum, then there exists a y, such that
f (y) < f (x* ).
2) For any point in the line {α y + (1 − α )x*; 0 < α < 1}, we have
by convexity of f (x) that
f (α y + (1 − α )x* ) ≤ α f (y) + (1− α ) f (x* )
!## #"### $ !###"###$
T1 T2
4
3) Since f (y) < f (x* ) by assumption, T2 < f (x* ). This implies
T1 = f (α y + (1 − α )x* ) < f (x* ).
4) For α aribitrarily small, this contradicts the the fact that
x* is a local minimum.
5) Hence, for a convex function f (x) having a local minimum
at x* , there is no such y satisfying f (y) < f (x* ).
Hence, a local minimum is also a global minimum.
5
4.2
Deriva2ve
based
characteriza2on
Theorem
4.2A
A funtion f (x) is convex in ! n if and only if for
every pair of vectors x and y in ! n , we have
f (y) ≥ f (x) + (y − x) ∇f (x)
T
• First order Taylor's approximation is a global under
estimator for all points
• ∇f (x ) = 0 gives global minimizer
*
6
f (x)
(y, f (y))
f (y) + (x − y)T f ′(y)
f (x) ≥ f (y) + (x − y)T f ′(y)
7
Theorem
4.2B
A function f (x) is convex if and only if its Hessian F(x) is positive
semi-definite for all x.
Implica2ons
of
Theorems
4.2A
and
4.2B
in
Unconstrained
op2miza2on
1) The point satisfying ∇f (x) = 0 is a global minimizer
2) If two point satisfy ∇f (x) = 0, then the function value
at the points should be equal
3) If FONC-Min is satisfied, SONC-Min is also satisfied
8
4.3
Examples
of
convex
func2ons
Single
variable
func2ons
f1 (x) = x 2
f2 (x) = x 4
⎧ 0, for x ≤ a
⎪⎪
f3 (x) = ⎨ (x − a)2 , for x > a
⎪
⎪⎩ (x + a) 2
, for x < −a
f3 (x) = eax
f4 (x) = x a
f5 (x) = log(x), for x < 0
f6 (x) = x log(x)
9
Mul2variable
func2ons
f1 (x1 , x2 ) = (x1 + x2 )2 + (x1 − x2 )3
f2 (x1 , x2 ) = (x1 + x2 )2 + (x1 − x2 )4
f3 (x1 , x2 ) = x / x2
2
1
10
Convexity
of
norms
A function f (x) is a norm if it satisfies the following
(1) Positivity: f (x) ≥ 0, f (x) = 0 if and only if x = 0.
(2) Homogeneity: f (rx) = r f(x), r ∈
(3) Triangle inequality: f (x + y) ≤ f (x) + f (y)
Proof of convexity:
f (θ x + (1− θ )y) ≤ f (θ x) + f ((1− θ )y) (Triangle inequality)
≤ θ f (x) + (1− θ ) f (y) (Homogeneity)
11
Convexity
of
Max
func2on
f (x) =max(x1 ,...., xn )
f (θ x + (1 − θ )y) = max (θ x1 + (1 − θ )y1 ,....,θ xn + (1 − θ )yn )
≤ max (θ x1 ,....,θ xn ) +
max ((1 − θ )y1 ,....,(1 − θ )yn )
≤ θ max (x1 ,...., xn ) +
(1-θ )max (y1 ,...., yn )
12
4.4
The
quadra2c
func2on
in
standard
form
4.4.1
Defini2on
f (x) = 0.5xT Qx − bT x + c
x : n × 1 vector of optimization variable
b : n × 1 constant vector
Q : n × n symmetric matrix
c: scalar constant
If Q is not symmetric let Q′ = ( Q + QT ) , and let
1
2
f ′(x) = 0.5xT Q′x − bT x + c
f (x) and f ′(x) are identical.
13
4.4.2
Proper2es
of
symmetric
matrices
(1)
Eigen
values
are
real
(2)
Eigen
vectors
can
be
assumed
to
be
real
(3)
Eigen
vectors
are
orthogonal
(4)
Are
diagonalizable
5) If Q is symmetric, then Q can be expressed
as Q = UΛU T , where U is an orthogonal
matrix.
14
6)
Posi2ve
definiteness
(a) An n × n symmetric matrix, Q, is called positive
definite if v T Qv > 0, for every v ∈! n
(b) Since Q is symmetric, we have Q = UΛU T .
Also any v can be expressed as v = Us, for some s.
(a) and (b) imply
An n × n symmetric matrix, Q, is called positive
definite if and only if s T Λs > 0, for every s ∈! n
which in turn implies
an n × n symmetric matrix, Q, is called positive definite
if and only if all eigen values are postive.
15
4.5
Gradient
and
Hessian
of
the
quadra2c
form
Exercise
Compute
the
gradient
and
Hessian
of
the
following
func2ons:
T
⎡ 4 2 ⎤ ⎡ 1 ⎤
f (x) = x ⎢ T
⎥x − ⎢ ⎥ x+8
⎣ 2 3 ⎦ ⎣ 2 ⎦
T
⎡ 4 1 ⎤ ⎡ 1 ⎤
f (x) = x ⎢ T
⎥x − ⎢ ⎥ x+8
⎣ 3 3 ⎦ ⎣ 2 ⎦
16
4.5.1
Gradient
of
the
general
quadra2c
form
Direct
approach
∂ T
1) show that
∂xl
( x Qx ) = ql T x + ql x
∂ ∂
2) Then
∂xl
( 0.5x T
Qx ) −
∂xl
( b T
x ) = 0.5 ( q l x + q l x ) − bi
T
3) Hence ∇ ( 0.5xT Qx ) − ∇ ( b T x ) = 0.5 ( QT x + Qx ) − b
4) Because of the symmetry of Q, we have
∇f (x) = Qx − b
17
Alterna2ve
approach
f (x) = 0.5x Qx − b x + c
T T
g[x,d ] (α ) = f (x + α d)
=0.5xT Qx + 0.5α 2 dT Qd + α dT Qx
− bT x − α bT d + c
We know that
′ ] (α ) = dT ∇f (x + α d)
g[x,d
′ ] (0) = dT ∇f (x)..........................(E1)
g[x,d
18
′ ] (0) directly:
We can also compute g[x,d
g[x,d ] (α )=0.5xT Qx + 0.5α 2 dT Qd + α dT Qx
− bT x − α bT d + c
′ ] (α )=α dT Qd + dT Qx − bT d
g[x,d
′ ]
g[x,d (0)=d T
Qx − b T
d.....................(E2)
Comparing (E1) and (E2) we get
∇f (x) = Qx − b
19
4.5.2
Hessian
of
the
quadra2c
form
Direct
approach
∇ ( f (x)) = Qx − b ................. (E1)
{F(x)}i, j =
∂2
∂xi ∂x j
f (x) =
∂
∂xi
({∇f (x)} )j
n
From (E1) we get {∇f (x)} j = ∑ Q j,k xk − b j
k=1
Hence {F(x)}i, j =
∂2
∂xi
({∇f (x)} ) = Qj j,i
⇒ F(x) = Q 20
Alterna2ve
approach
f (x) = 0.5xT Qx − bT x + c
g[x,d ] (α ) = f (x + α d)
=0.5x Qx + 0.5α d Qd + α d Qx
T 2 T T
− bT x − α bT d + c
′′ ] (α ) = dT Qd.......................(A)
g[x,d
We know from chapter 2 that
′′ ]
g[x,d (α ) = d T
H(x + α d)d...........(B)
(A) and (B) imply
H(x) = Q.
21
Summary:
Gradient
and
Hessian
of
quadra2c
form
f (x) = 0.5x Qx − b x + c
T T
Gradient:
∇ ( f (x)) = Qx − b
Hessian:
F(x) = Q
22
4.6
Minimiza2on
of
quadra2c
forms
(1) FONC
∇ ( f (x* )) = Qx* − b = 0 is solvable.
⇒ Qx* = b is solvable.
(2a) SONC
the Hessian, F(x) = Q is postive semi-definite.
(2b) SOSC-SMin
Q is postive definite.
23
Write
the
following
func2on
in
quadra2c
form
and
comment
on
the
existence
and
uniqueness
of
minima.
f1 (x1 , x2 ) = x + x 1
2 2
2
f2 (x1 , x2 ) = x + x + x1 x2
2
1
2
2
f3 (x1 , x2 ) = x − x 2
1
2
2
f4 (x1 , x2 ) = x1 x2
f5 (x1 , x2 ) = x + x + 2x1 x22
1
2
2
f6 (x1 , x2 ) = x + x + 2x1 x2 + x1 + x2
2
1
2
2
f7 (x1 , x2 ) = x + x + 2x1 x2 + x1 − x2
2
1
2
2 24
n n
f1 (x) = ∑ ∑xx j k
j=1 k= j+1
n n
f2 (x) = ∑ ∑ x j xk
j=1 k=1
n n n
f3 (x) = ∑ ∑ x j xk + ∑ xk
j=1 k=1 k=1
n n n
f4 (x) = ∑ ∑ x j xk + ∑ (−1) xk n
j=1 k=1 k=1
25
Least squares solution of Ax = b
The least square solution is defined as the
minimizer of
2
f (x) = Ax − b
= ( Ax − b ) ( Ax − b )
T
=xT A T Ax − 2bT Ax + bT b
The minimum is given by ∇f (x* ) = 2A T Ax* − 2A T b = 0
⇒ A T Ax* = A T b, for which the solution always
exists (why ?)
The Hessian F(x) = A T A, which is always postive
semi-definite (why ?)
26
Suppose A has full column rank. Then A T A is invertible.
Then x* = (A T A)−1 A T b.
! = A(A T A)−1 A T b is the least squares approximation
Then b
of b in the column space of A.
! leaves it unchanged.
(1) Applying this approximation on b
i.e., P2 = P, where P = A(A T A)−1 A T .
2) P is called the projection matrix, and least squares
approximation is also called least squares projection (onto the
column space of A).
27
3) The projection error b! = b − b̂ = (I − P)b is orthogonal to
b̂ as a consequence of properties P2 = P and PT = P (verify).
4) Hence least squares projection is synonymous to orthogonal
projection.
5) The converse is also true: any matrix P sastifying P2 = P
and PT = P corresponds to a least squares projection onto a
subspace.
6) R(P) = R(A) and N(P) = N(A T ). Hence R(P)⊥ = N(P).
28
7) Suppose B is a matrix such that its columns span the
the null space of A T . Then P′ = B(BT B)−1 BT is an
orthogonal projection onto the column space of B.
8) P′ = I − P (verify)
9) R(P′ ) = R(B) and N(P′ ) = N(BT ). Hence R(P′ )⊥ = N(P′ ).
10) Combine the relations in (6), (8) and (9) by using the
relationship between B and A.
29
Least squares solution of Ax = b
The regularized least square solution is defined as the
2
minimizer of f (x) = Ax − b + λ x
2
= ( Ax − b ) ( Ax − b ) + λ x
T 2
=x T ( A T A + λ I ) x − 2bT Ax + bT b
The minimum is given by
∇f (x ) = 2 ( A A + λ I ) x − 2A b = 0
* T * T
⇒ ( A T A + λ I ) x* = A T b, for which the solution always
exists.
The Hessian F(x) = ( A A + λ I ) , which is always postive
T
30
definite.
Exercise:
Polynomial
approxima2on
We have samples of a function g(x) at locations
{si ; i = 1,, N }
given by { g(si ) = gi ; i = 1,, N } . Find an
approximating function
n
of the form h(x) = ∑ ak x k−1 , such that
k=1
N
f (a1 ,,an ) = ∑ ( h(si ) − gi )
2
i=1
is minimized.
31
Let a = [a1 an ]T h = [h(s1 )h(sN )]T g = [g(s1 )g(sN )]T
Then
h = Sa,
where {S}i, j = (si ) j−1
Then f (a1 ,,an ) = h − g = Sa − g = ( Sa − g ) ( Sa − g ) .
2 2 T
=aT (ST S)a − 2aT ST g + gT g
32
Exercise:
Polynomial
approxima2on
II
We have samples of a function g(x) at locations
{si ; i = 1,, N }
given by { g(si ) = gi ; i = 1,, N } . Find an
approximating function
n
of the form h(x) = ∑ ak x k−1 , such that
k=1
N n
f (a1 ,,an ) = ∑ ( h(si ) − gi ) + λ ∑ ai2
2
i=1 i=1
is minimized.
Then f (a1 ,,an ) = aT (ST S + λ I)a − 2aT ST g + gT g
33
Exercise:
Polynomial
approxima2on
III
We have samples of a function g(x) at locations
{si ; i = 1,, N } given by { g(si ) = gi ; i = 1,, N } .
Find an approximating function of the form
n
h(x) = ∑ ak x k−1 , such that
k=1
N
f (a1 ,,an ) = ∑ ( h(si ) − gi ) +λ ∫ h 2 (x) dx
2 1
is minimized.
0
i=1
f (a1 ,,an ) =aT (ST S + λ P)a − 2aT ST g + gT g
1
{P}i, j = ∫ (x)i+ j−2 dx
0
34
Exercise:
Polynomial
approxima2on
IV
We have samples of a function g(x) at locations
{si ; i = 1,, N } given by { g(si ) = gi ; i = 1,, N } .
Find an approximating function of the form
n
h(x) = ∑ ak x k−1 , such that
k=1
2
N
⎛ d
2
⎞
f (a1 ,,an ) = ∑ ( h(si ) − gi )
1
+λ ∫ ⎜ 2 h(x)⎟ dx
2
0 ⎝ dx ⎠
i=1
is minimized.
35