0% found this document useful (0 votes)
10 views6 pages

Convex Optimization Concepts Explained

This document provides a comprehensive overview of convex optimization, detailing definitions, theorems, and properties related to convex functions and sets. Key concepts include convexity, strict convexity, subgradients, and the relationship between convex functions and their epigraphs. Theorems presented establish conditions for minimization, continuity, and the behavior of convex combinations within convex sets.

Uploaded by

Moqiu Liang
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views6 pages

Convex Optimization Concepts Explained

This document provides a comprehensive overview of convex optimization, detailing definitions, theorems, and properties related to convex functions and sets. Key concepts include convexity, strict convexity, subgradients, and the relationship between convex functions and their epigraphs. Theorems presented establish conditions for minimization, continuity, and the behavior of convex combinations within convex sets.

Uploaded by

Moqiu Liang
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

CONVEX OPTIMIZATION SUMMARY 2.

Convexity
Definition. f : Rn → R is convex if and only if
ANDREW TULLOCH f ((1 − τ )x + τ y) ≤ (1 − τ )f (x) + τ f (y) (2.1)
for all x, y ∈Rn , τ ∈ (0, 1).
A set C ⊆ Rn is convex if and only if δC is convex if and only if
(1 − τ )x + τ y ∈ C for all x, y ∈ C, τ ∈ (0, 1).
f : Rn → R is strictly convex if and only if f is convex and the in-
1. Existence
equality is strict for all x 6= y and τ ∈ (0, 1).
Definition. For C ⊆ Rn , define δC as Pm
( Definition. Let x0 , . . .P, xm ∈ Rn and λ0 , . . . , λm ≥ 0, i=0 λi = 1.
m
0 x∈C The linear combination i=0 λi xi is a convex combination of the points
δC (x) = (1.1)
∞ x∈/C x0 , . . . , xm .

Note x0 minimizes f over C if and only if x0 minimizes f + δC over Theorem. (i) f : Rn → R is convex if and only if
Rn . Xm m
X
f( λ i xi ) ≤ λi f (xi ) (2.2)
Definition. (i) dom f = {x ∈ Rn |f (x) < ∞}, i=0 i=0
(ii) for all m ≥ 0, xi ∈ Rn , λ 0, m
P
( i ≥ λi = 1.
i=0
∅ f ≡∞ (ii) C ⊆ Rn is convex if and only if C contains all convex combinations
arg min f = (1.2)
{x ∈ Rn |f (x) = inf f } f <∞ of its elements.
(iii) f is proper if and only if dom f 6= ∅ and f (x) > −∞ for all Theorem. f : Rn → R is convex implies dom f is convex.
x ∈ Rn .
Theorem. (i) f : Rn → R is convex if and only if epi f is convex in
Definition. For f : Rn → R, denote Rn × R.
(ii) f : Rn → R is strictly convex if and only if {(x, α) ∈ Rn ×R|f (x) <
epi f = {(x, α) ∈ Rn × R|f (x) ≤ α} (1.3)
α} is convex in Rn × R.
Theorem. A set C is an epigraph if and only if for every x there is an (iii) f : Rn → R is convex implies lev≤α f is convex for all α ∈ R.
α ∈ R such that C ∩ (x × R) = [α, ∞] - so all vertical one-dimensional
Theorem. Assume f : Rn → R is convex. Then
sections must be closed upper half-lines. If f is proper then epi f is not
empty and does not include a complete vertical line. (i) arg min f is convex.
(ii) x is a local minimizer of f implies x is a global minimize of f .
Definition. For f : Rn → R, define (iii) f is strictly convex and proper implies f has at most one global
lim inf f (x) = lim inf f (x) = lim inf f (x) (1.4) minimizer.
x→x0 δ↓0 kx−x0 k2 ≤δ k→∞ kx−x0 k2 ≤ 1
k
Theorem. Let I be an arbitrary index set. Then
f is lower semicontinuous at x0 if and only if f (x0 ) ≤ lim inf x→x0 f (x). (i) fi , i ∈ I convex implies supi∈I fi (x) is convex.
f is lower semicontinuous if f is lower semicontinuous at every x0 ∈ Rn . (ii) fi , i ∈ I strictly convex, I finite implies supi∈I fi (x) is strictly
Theorem. convex.
(iii) Ci , i ∈ I is convex implies ∩i∈I Ci is convex.
lim inf f (x) = min{α ∈ R|∃(xk ) → x0 : f (xk ) → α} (1.5) (iv) fk , k ∈ N is convex implies lim supk→∞ fk (x) is convex.
x→x0

In particular, f is lower semi-continuous at x0 if and only if f (x0 ) ≤ Theorem. Assume C ⊆ Rn is open and convex, and f : C → R is
lim inf k→∞ f (xk ) for all convergence sequences xk → x0 . differentiable. Then the following are equivalent:
(i) f is [strictly] convex
Theorem. Let f : Rn → R. Then the following are equivalent: (ii) hy − x, ∇ f (y) − ∇ f (x)i ≥ 0 for all x, y ∈ C [and > 0 if x 6= y]
(i) f is lsc on Rn (iii) f (x) + hy − x, ∇ f (x)i ≤ f (y) for all x, y ∈ C [and < f (y) if x 6= y]
(ii) epi f is closed in Rn × R (iv) If f is additionally twice differentiable, then ∇2 f (x) is positive
(iii) The sub level sets lev≤α f = {x ∈ Rn |f (x) ≤ α} are closed in Rn semidefinite for all x ∈ C.
for all α ∈ R.
Proof. Reduce to one-dimensional sections g(t) = f (x + t(x − y)). Take
g(y)−g(x)
Proof. epi f can only be not closed along vertical lines.  g(x + h), use convexity to show g 0 (x) ≤ y−x
.
Definition. f : Rn → R is level bounded if and only if lev≤α f is bounded Use g(x) = supy∈dom g g(y) + (y − x)g 0 (y) to show the supremum over
for all α ∈ R. convex functions is convex. 

Theorem. f : Rn → R is level-bounded if and only if f (xk ) → ∞ for all Theorem. (i) Assume f1 , . . . , fm : Rn → R are convex, λ1 , . . . , λm ≥
0. Then f = m
P
sequences (xk ) satisfying kxk k2 → ∞. i=1 i fi is convex. If at least one of the fi with
λ
λi > 0 is strictly convex, then f is strictly convex.
Proof. K(α) such that xk ∈ / lev≤α f for k ≥ K(α) by boundedness. In (ii) Assume fi : Rni → R are convex. Then
reverse, find α with lev≤α f unbounded and choose xk in this set with m
X
f (xk ) ≤ α.  f : Rn1 × . . . Rnm → Rf (x1 , . . . , xm ) = fi (xi ) (2.3)
i=1
Theorem. Assume f : Rn
→ R is lsc, level-bounded, and proper. Then
is convex. If all fi are strictly convex, then f is strictly convex.
inf f (x) ∈ (−∞, +∞) and arg min f is nonempty and compact.
(iii) If f : Rm → R is convex, A ∈ Rm×n , b ∈ Rm . Then g(x) =
Proof. Consider ∩α∈R,a>∈f = arg min f , then this is countable intersec- f (Ax + b) is convex.
tion of nonempty, compact sets, so intersection is nonempty. Theorem. (i) C1 , . . . , Cm convex implies C1 × · · · × Cm is convex.
Finiteness of inf is shown as for any x ∈ arg min f 6= ∅, must have (ii) C ⊆ Rn convex, A ∈ Rm×n , b ∈ Rm , L(x) = Ax + b implies L(C)
f (x) = −∞ contradicting properness.  is convex.
Theorem. (i) f, g lsc, proper implies f + g is lsc. (iii) C ⊆ Rm convex, A ∈ Rm×n , b ∈ Rm , L(x) = Ax + b implies
(ii) f lsc, λ ≥ 0 implies λf is lsc L−1 (C) is convex.
(iii) f : Rn → R is lsc and g : Rm → Rn is continuous implies f ◦ g is (iv) C1 , C2 is convex implies C1 + C2 is convex.
lsc. (v) C convex, λ ∈ R implies λC is convex.
1
CONVEX OPTIMIZATION SUMMARY 2

Definition. For a set S ⊆ Rn and x ∈ Rn , define the projection of x 4. Subgradients


onto S as
Definition. For any f : Rn → R and x ∈ Rn ,
Π(y) = arg min kx − yk2 . (2.4)
S x∈S ∂f (x) = {v ∈ R |f (x) + hv, y − xi ≤ f (y)∀y ∈ Rn }
n
(4.1)

Theorem. Assume C ⊆ Rn is convex, closed, and C 6= ∅. Then ΠC is the set of subgradients of f at x.


is single-valued - that is, the projection of x onto C is unique for every Theorem. Assume f, g : Rn → R are convex. Then
x ∈ Rn . (i) If f is differentiable at x, then ∂f (x) = {∇ f (x)}
(ii) If f is differentiable at x and g(x) ∈ R, then ∂(f +g)(x) = ∂g(x)+
Proof. f is lsc, level-bounded, and proper, so arg min f 6= ∅. f is strictly
∇ f (x).
convex so has at most one minimizer. 
Theorem. Assume f : Rn → R is proper. Then x ∈ arg min f ⇐⇒ 0 ∈
Definition. For an arbitrary set S ⊆ Rn ,
∂f (x).
\
con S = C (2.5) Proof.
C convex, S ⊆ C 0 ∈ ∂f (x) ⇐⇒ f (x) ≤ f (y)∀y ⇐⇒ x ∈ arg min f (4.2)
is the convex hull of S. It is the smallest convex set that contains S. where properness was required since arg min +∞ = ∅ by definition. 
Definition. For a convex set C ⊆ Rn and x ∈ C, the normal cone NC (x)
Theorem. con S = { pi=0 λi xi |xi ∈ S, λi ≥ 0, pi=0 λi = 1, p ≥ 0} = D
P P
at x is
Proof. D ⊆ con S as a convex set contains all convex combinations of NC (x = {v ∈ Rn |hv, y − xi ≤ 0∀y ∈ C}) (4.3)
points in S, thus D ⊆ con S. NC (x) = ∅ for x ∈
/ C.
con S ⊆ D as taking linear combination of x, y ∈ D, we have a convex Note that NC (x) is a cone if x ∈ C.
combination of elements in D, which is in D, so con S ⊆ D.
Theorem. Assume C ⊆ Rn is convex with C 6= ∅. Then ∂δC (x) = NC (x).

Proof. For x ∈ C, this follows easily, and for x ∈
/ C, choosing y ∈ C shows
Theorem. We have that ∂δC (x) = ∅. 
(i) cl C = {x ∈ Rn |∀ open neighborhoods N of x, N ∩ C 6= ∅}
Theorem. Assume C ⊆ Rn is closed and convex with C 6= ∅ and x ∈ Rn .
(ii) int C = {x ∈ RRn |∃ open neighborhood N of x such that N ⊆ C}
Then y ∈ ΠC (x) ⇐⇒ x − y ∈ NC (y).
(iii) bnd C = cl C\ C.
1
T Proof. Follows from y being the unique minimizer of f (y 0 ) = 2
ky 0 − xk22 +
Theorem. cl C = S closed, C ⊆ S S. δC (y 0 ). 
Theorem. Assume f : Rn
→ R is proper and convex. Then
3. Cones and Generalized Inequalities (
∅ x∈/ dom f
Definition. K ⊆ Rn is a cone if and only if 0 ∈ K and λx ∈ K for all ∂f (x) = (4.4)
{v ∈ Rn |(v, −1) ∈ Nepi f (x, f (x))} x ∈ dom f
x ∈ K, λ ≥ 0.
A cone K is pointed if and only if m
P If x ∈ dom f then Ndom f (x) = {v ∈ Rn |(v, 0) ∈ Nepi f (x, f (x))}.
0=1 xi = 0, xi ∈ K implies xi = 0
for all i. Proof. x ∈ / dom f : choose f (y) < ∞, then ∂f (x) = ∅.
x ∈ dom f : v T (y − x) + (−1)(−f (x)) ≤ 0∀(y, α) ∈ epi f ⇐⇒ (v, −1) ∈
Theorem. Let K ⊆ Rn be arbitrary. Then the following are equivalent:
Nepi f (x, f (x)). 
(i) K is a convex cone.
Definition. For any set C ⊆ Rn , define the affine hull and relative inte-
PmK + K ⊆ K.
(ii) K is a cone and
rior by
(iii) K 6= ∅ and i=0 αi xi ∈ K for all xi ∈ K and αi ≥ 0 (not
necessarily summing to 1). aff C = ∩A affine, C ⊆ A A (4.5)

Proof. x = m
P Pm
Pai x ∈ K, which is a rint C = {x ∈ Rn |∃ open neighborhood N of x with N ∩ aff C ⊆ C}.
i=0 αi xi ∈ K, ai ≥ 0 ⇐⇒ i=0 α j
j (4.6)
convex combination. 
Theorem. (i) Assume f : Rn → R is convex. Then
Theorem. Assume K is a convex cone. Then K is pointed if and only if
K ∩ −K = {0}. g(x) = f (x + y) ⇒ ∂g(x) = ∂f (x + y) (4.7)
g(x) = f (λx) ⇒ ∂g(x) = λ∂f (λx), λ 6= 0 (4.8)
Proof. x1 +x2 +· · ·+xm ∈ K, x1 6= 0, x2 +· · ·+xm = −x1 , x2 +· · ·+xm ∈
K, so x1 ∈ K ∩ −K.  g(x) = λf (x) ⇒ ∂g(x) = λ∂f (x), λ > 0 (4.9)
(4.10)
Theorem. For a closed convex cone K ⊆ Rn we define the generalized
inequality (ii) Assume f : Rn → R is proper and convex, and A ∈ Rn×m is such
that
x ≤K y ⇐⇒ x − y ∈ K (3.1)
{Ay|y ∈ Rm } ∩ rint dom f (4.11)
Then If x ∈ dom(f ◦ A) = {y ∈ Rm |Ay ∈ dom f }, then ∂(f ◦ A)(x) =
(i) x ≤K x AT ∂f (Ax).
(ii) x ≤K y, y ≤K z ⇒ x ≤K z (iii) Assume f0 , . . . , fm : Rn → R are proper and convex,
Pand rint dom f0 ∩
(iii) x ≤K y ⇒ −y ≤K −x ·P· · ∩ rint dom fm 6= ∅. If x ∈ dom f , then ∂( m i=0 fi )(x) =
m
(iv) x ≤K y, λ ≥ 0 ⇒ λx ≥K λy i=0 ∂fi (x).
(v) x ≥K y, x0 ≥K y 0 ⇒ x + x0 ≥K y + y 0
5. Conjugate Functions
(vi) xk → x, y k → y with xk ≥K y K for all k ∈ N, then x ≥K y.
(vii) x ≥K y, y ≥K ⇒ x = y (antisymmetry) holds if and only if K is Definition. For f : Rn → R, con f (x) = supg ≤ f , g convex is the convex
pointed. hull of f . con f is the greatest convex function majorized by f .

Definition (Kn LP ). For any pointed, closed, convex cone K ⊆ Rm , a Definition. For f : Rn → R, the lower closure cl f is defined as (cl f )(x) =
matrix A ∈ R m×n , and vectors c ∈ Rn , b ∈ Rm , define the conic problem lim inf y→x f (y). Alternatively, (cl f )(x) = supg ≤ f , g lsc g(x).
inf x cT [Link] ≥K b.
Theorem. For f : Rn → R, we have epi(cl f ) = cl(epi f ). Moreover, if f
SOCP ).
Definition (Kn is convex then cl f is convex.
q Proof. (x, α) ∈ cl(epi f ) ⇐⇒ ∃xk → x, αk → α, f (xk ≤ αk ) ⇐⇒
SOCP
Kn = {x ∈ Rn }|xn ≥ x21 + · · · + x2n−1 . (3.2) lim inf y→x f (y) ≤ α ⇐⇒ (x, α) ∈ epi(cl f ). 
CONVEX OPTIMIZATION SUMMARY 3

Theorem. Assume C ⊆ Rn is closed and convex. Then For cones, show g is positively homogeneous lsc convex proper indicator
\ function ⇐⇒ g = δK for Ka convex closed cone.
C= Hb,β (5.1) ? is an indicator function,
Taking any such indicator function δK , then δK
(b,β),C⊆Hb,β ? ?
with δK (v) < ∞ ⇐⇒ v ∈ K . 
where Bb,β = {x ∈ Rn |hx, bi − β ≤ 0}
Definition. For any set S ⊆ Rn define the support function suppS (v) =
Proof. Separating hyperplane theorem - x ∈ C then x is the intersection. ? )(v)
supx∈S hv, xi = (δS
If x ∈
/ C consider the separating hyperplane of C and {x}. 
Definition. A function f : Rn → R is said to be positively homogeneous
Theorem. Assume f : Rn → R is proper, lsc, and convex. Then f (x) = if and only if 0 ∈ dom f and f (λx) = λf (x) for all x ∈ Rn and λ > 0.
supg affine, f ≤ f g(x).
Theorem. The set of positively homogeneous proper lsc convex functions
Proof. f lsc implies epi f is closed, f is convex implies epi f is convex, so and the set of closed convex nonempty sets are in one-to-one correspon-
epi f is the intersection of all half spaces containing it. Show g affine ⇐⇒ dence through the Legendre-Fenchel transform:
there exists (b, c), β with c < −, epi g = H(b,c),β .
δC ↔? supp (5.11)
Then as epi(supg≤f g(x)) = ∩g≤f epi g where g is affine and g ≤ C
f ⇐⇒ epi f ⊆ epi g, we need only show I1 = ∩(b,c),β∈S H(b,c),β = x ∈ ∂ supp(v) ⇐⇒ x ∈ C (5.12)
∩(b,c),β∈S,c<0 H(b,c),β = I2 .  C
supp(v) = hv, xi ⇐⇒ v ∈ NC (x). (5.13)
Definition. Let f : Rn → R, then C

f ? : Rn → R (5.2) In particular, the set of closed convex cones is in one-to-one correspon-


? dence with itself - for any cone K define the polar cone or dual cone as
f (v) = sup hv, xi − f (x) (5.3)
x∈Rn K ? = {v ∈ Rd |hv, xi ≤ 0∀x ∈ K}. Then
is the conjugate to f . The mapping f 7→ f ? is the Legendre-Fenchel δK ↔? δK ? (5.14)
transform. x ∈ NK ? (v) ⇐⇒ v ∈ NK (x) (5.15)
Theorem. Assume f : Rn → R. Then f ? = (con f )? = (cl f )? = TODO: fill next condition/implication in.
(cl con f )? and f ?? = (f ? )? ≤ f . If con f is proper, then f ? and f ??
are proper, lsc, and convex, and f ?? = cl con f . If f : Rn → R is proper, 6. Duality in Optimization
lsc, and convex, then f ?? = f .
Definition. Assume f : Rn × Rm → R is proper, lsc, and convex. Define
Proof. (v, β) ∈ epi f ? ⇐⇒ (v, x) − β ≤ f (x)∀x so (v, β) ∈ epi f ? define the primal problem as
all affine functions majorized by f . Thus for every affine function h, h ≤ inf φ(x), φ(x) = f (x, 0), (6.1)
f ⇐⇒ h ≤ cl f ⇐⇒ h ≤ con f ⇐⇒ h ≤ cl con f . Which gives our x∈Rn
result. the dual problem as
For the inequality, expand f ?? (y) = supv hv, yi + inf x (f (x) − hv, xi) ≤
sup ψ(y), ψ(y) = −f ? (0, y) (6.2)
f (y) at y = x. y∈Rn
As con f is proper, then cl con f is proper, lsc, and convex, so cl con f = and the inf-projections
supg≤cl con f g(x) where g is affine,which equals f ?? by definition.
Finally, if f is convex, then con f = f , so con f is proper, and con f = f p(u) = inf f (x, u) (6.3)
x
is lsc, so f ?? = cl con f = cl f = f .  q(v) = inf f ? (v, y) = − sup(−f ? (v, y)) (6.4)
y y
Theorem. Assume f : Rn → R. Then
f is sometimes called a perturbation function for ψ, and p the associated
(i) con f is not proper implies f ? ≡ +∞ or f ? ≡ −∞.
marginal function.
(ii) In particular, f ? proper implies con f is proper.
Theorem. Assume f : Rn × Rm → R is proper, lsc, and convex. Then
Proof. If con f = ∞ , then f ? (v) = −∞. If con f (x0 ) = −∞, then
f ? (v) = (con f )? (v) ≥ +∞.  (i) φ and −ψ are lsc and convex.
(ii) p, q are convex.
Theorem. Assume f : Rn → R is proper, lsc, and convex. Then ∂f ? = (iii) p(0) and p?? (0) are the optimal values of the primal and dual
(∂f )−1 , specifically, problems -
v ∈ ∂f (x) ⇐⇒ f (x) + f ? (v) = hv, xi ⇐⇒ x ∈ ∂f ? (v). (5.4) p(0) = inf φ(x), p?? (0) = sup ψ(y). (6.5)
x y
Moreover,
(iv) The primal and dual problems are feasible if and only if their
∂f (x) = arg max v 0 , x − f ? (v 0 )∂f ? (x) = arg max v, x0 − f (x) (5.5) associated marginal function contains 0:
v0 x0

Proof. f (x) + f ? (v) = hv, xi iff x ∈ arg maxx0 hv, x0 i − f (x0 ) ⇐⇒ v ∈ inf φ(x) < ∞ ⇐⇒ 0 ∈ dom p (6.6)
x
∂f (x).  sup ψ(y) > −∞ ⇐⇒ 0 ∈ dom q (6.7)
y
Theorem. For a proper, lsc, convex f : Rn → R, we have
Proof. f proper lsc convex implies f ? is proper lsc convex implies π, ψ lsc
(f (·) − ha, ·i)? = f ? (· + a) (5.6) convex.
(f (· + b))? = f ? (·) − h·, bi, (5.7) p, q are convex from the strict epigraph set, with E = {(u, a) ∈ Rm ×
? ?
(f (·) + c) = f (·) − c (5.8) R|p(u) = inf x∈Rn f (x, u) < α} = A(E 0 ), where A is the linear projection
· mapping A(x, u, α) = (u, α), and E 0 is the strict epigraph of f and thus
(λf (·))? = λf star ( ), λ > 0 (5.9) convex, so A(E 0 ) is convex.
λ
· p? (y) = −ψ(y), so p?? (0) = supy ψ(y) .
(λf ( ))? = λf ? (·), λ > 0 (5.10) 0 ∈ dom p ⇐⇒ p(0) < ∞ ⇐⇒ inf x f (x, 0) < ∞ ⇐⇒ inf ψ < ∞. 
λ
Theorem. Let fi : Rni → R, i = 0, . .P . , m be proper and f (x0 , . . . , xm ) = Theorem. Assume f : Rn × Rm → R is proper, lsc, and convex. Then
Pm ? m ?
i=0 fi (xi ). Then f (v1 , . . . , vm ) = i=0 fi (vi ).
weak duality always holds,

Proof. For support functions, f ? (x) = σC (x), so C nonempty closed con- inf φ(x) ≥ sup ψ(y), (6.8)
x y
vex implies f ? is proper lsc convex with f ? (0) = 0 and f ? (λv) = λf ? (v),
so f ? = σC is positively homogeneous. and under certain conditions the infimum and supremum are equal and
If g is positively homogeneous lower semicontinuous, g ? (x) ∈ {0, ∞}, finite - strong duality
so g ? is an indicator function, which must be convex and nonempty by p(0) ∈ R, p lsc in 0 ⇐⇒ inf φ(x) = sup ψ(y) ∈ R. (6.9)
x y
properness lsc convexity of g ? .
One to one follows from δC ?? = δ .
C The difference inf φ − sup ψ is the duality gap.
CONVEX OPTIMIZATION SUMMARY 4

Proof. inf x φ(x) = p(0) ≥ p?? (0) = supy ψ(y) so must show p(0) ∈ R and Proof. Consider g(x, y, u) = f (x, u) − hy, ui, which is proper lsc convex.
p lsc in 0 if and only if p(0) = p?? (0) ∈ R. Then l(·, y) = inf u g(·, y, u) is [Link] fx (y) = f (x, y), then −l(x, ·) =
(⇒) follows as p?? (0) ≤ cl p(0) ≤ p(0), so lim inf y→0 p(y) = cl p(0) = fx? (·) so fx is either +∞ or proper lsc convex, and −l(x, ·) is either −∞ or
p(0) ∈ R, so p is lsc in 0. proper lsc convex, but always lsc convex.
(⇐) follows from the claim that if it holds then cl p is proper lsc con- By subgradient definition, (v, y) ∈ ∂f (x, u) ⇐⇒ inf u0 f (x0 , u0 ) −
vex. Convexity, lsc is clear, and an improper convex lsc function is always hy, u0 i ≥ f (x, u) − hy, ui + hv, x0 − xi∀x0 , which evaluated at x0 = x implies
constant ∞ or −∞, contradiction p(0) ∈ R. So (p? )? (0) = ((cl p)? )? (0) = inf u0 f (x, u0 ) − hy, u0 i = f (x, u) − hy, ui.
cl p(0) = p(0).  Continuing, we obtain (v, y) ∈ ∂f (x, u) ⇐⇒ y ∈ ∂u f (x, u), ∈ ∂x l(x, y),
and the first condition is equivalent to u ∈ ∂fx? (y), or u ∈ ∂(−l(x, ·))?? =
Theorem. Assume f : Rn × Rm → R is proper, lsc, and convex. Then ∂y (−l)(x, y) as required. 
we have the primal-dual optimality conditions,
Definition. For any function l : Rn × Rm → R we say that (x0 , y 0 ) is a
(0, y 0 ) ∈ ∂f (x0 , 0) (6.10) saddle point of l if l(x, y 0 ) ≥ l(x0 , y 0 ) ≥ l(x0 , y) for all x, y. The set of all
⇐⇒ {x0 ∈ arg min φ(x), y 0 ∈ arg max ψ(y), inf ψ(x) = sup ψ(y)} saddle points is denoted by sp l.
x y x y Evquivalently, (x0 , y 0 ) ∈ sp l if inf x l(x, y 0 ) = l(x0 , y 0 ) = supy l(x0 , y).
(6.11)
Theorem. Assume f is proper, lsc, and convex with associated Lagrangian
⇐⇒ (x0 , 0) ∈ ∂f ? (0, y 0 ). (6.12) l. Then φ(x) = supy l(x, y), and ψ(y) = inf x l(x, y), and the primal prob-
The set of primal-dual optimal points (x0 , y 0 ) satisfying this equa- lem is inf x φ(x) = inf x supy l(x, y), and the dual problem is supy ψ(y) =
tion is either empty or equal to (arg min φ) × (arg max ψ). supy inf x l(x, y). Moreover, the optimality condition
{x0 ∈ arg min φ(x), y 0 ∈ arg max ψ(y), inf φ(x) = sup ψ(y)} (6.20)
Proof. Follows from invertibility of subgradient in terms of conjugate func- x y x y
tions, showing f (x0 , 0) = −f ? (0, y 0 ) ⇐⇒ φ(x0 ) = ψ(y 0 ) ∈ R, and equality ⇐⇒ (x0 , y 0 ) ∈ sp l (6.21)
with infinite value is explicitly excluded by definition of arg min.  0 0 0 0
⇐⇒ {0 ∈ ∂x l(x , y ), 0 ∈ ∂y (−l)(x , y )} (6.22)
Theorem. Assume f : Rn × Rm → R is proper, lsc, and convex. Then
Theorem. Assume X ⊆ Rn , Y ⊆ Rm are nonempty, closed, convex, and
(i) 0 ∈ int dom p or 0 ∈ int dom q implies inf x φ(x) = supy ψ(y). (S 0 ) L : X × Y → R is a continuous function with L(·, y) is convex for every y
R
(ii) 0 ∈ int dom p and 0 int dom q implies inf x φ(x) = supy ψ(y) ∈ R. and −L(x, ·) convex for every x. Then l(x, y) = L(x, y) + δX (x) − δY (y)
(S) R with the convention +∞ − ∞ = +∞ on the right, is the Lagrangian to
(iii) 0 ∈ dom p and inf x φ(x) ∈ R if and only if arg maxy ψ(y) is f (x, u) = supy l(x, y) + hu, yi = (−l(x, ·))? (u).
nonempty
R and bounded. (P ) f is proper, lsc, and convex, so the previous result applies with primal
(iv) 0 ∈ dom q and supy ψ(y) ∈ R if and only if arg minx φ(x) is and dual problems φ(x) = δX (x) + supy∈Y L(x, y), ψ(y) = −δY (y) +
R
nonempty and bounded. (D)
x∈X L(x, y). Moreover, if X and Y are bounded, then sp l is nonempty
In particular, if any of S, P, D hold, then strong duality holds - inf φ = and bounded.
sup ψ ∈ R. If S, or (P and D) hold, then there exists x0 , y 0 satisfying the Proof. For a fixed y, inf x l(x, y) = −f ? (0, y) = ψ(y), and for a fixed x,
primal-dual optimality conditions. Also, P implies ∂p(0) = arg maxy ψ(y), supy l(x, y) = f (x, 0) = fx?? (0) = f (x, 0) = φ(x) as fx is either proper lsc
and D implies ∂q(0) = arg minx φ(x). convex or +∞.
The optimality condition is equivalent to (0, y 0 ) ∈ f (x0 , 0) ⇐⇒ 0 ∈
Proof. (i): If p(0) = −∞, then p?? (0) ≤ p(0) = −∞ Use cl p = p on
R ∂x l(x0 , y 0 ), 0 ∈ ∂y (−l)(x0 , y 0 ), which is the saddle point condition. 
dom p, so sup ψ = inf ψ = −∞. Otherwise, p(0) ∈ R so as cl p = p on
dom p, we have p is lsc in 0 and so p?? (0) = p(0) ∈ R.
R
7. Numerical Optimality
This followsRsymmetrically 0 ?
R on f (x, y) = f (y, x).
If both 0 ∈ dom p, 0 ∈ dom q, then +∞ > p(0) ≥ p?? (0) = sup ψ = Definition. For φ : Rn → R, a point x is an -optimal solution if φ(x) −
−q(0) > −∞, which is finite. inf φ ≤ .
R
Nonemptyness and boundedness follows from 0 ∈ dom p if and only Definition. Assume (xk , y k ) is a primal-dual feasible pair - so xk ∈
if ψ is proper lsc convex and level bounded. dom φ and y k ∈ dom ψ. Then φ(xk ) ≥ ψ(y k ), and 0 ≤ φ(xk ) − inf φ ≤
Subdifferential: if (iii) then p(0) ∈ R, so cl p(0) = p(0) ∈ R, cl p is then k
φ(xk ) − ψ y = γ(xk , y k ) := γ. γ is the numerical primal-dual gap. If
proper and lsc convex, so ∂(cl p)(0) = arg max ψ, but ∂p(0) = ∂(cl p)(0).
γ <  then xk is an -optimal solution with optimality certificate y k .
 φ(xk )−ψ(y k )
The normalized gap is γ = γ(xk , y k ) = ψ(y k )
.
Theorem. Assume k : Rn → R and h : Rm → R are both proper, lsc, P np
convex, and A ∈ Rm×n , b ∈ Rm , c ∈ Rn . For f (x, u) = hc, xi + k(x) + Definition. Assume φ(x) = φ0 (x) =
P nd i=1 δgi (x)≤0 , ψ(y) = ψ0 (y) −
δ where dom φ = dom ψ = n and g : Rn → R, h : Rm →
h(Ax − b + u), the primal and dual problems are of the form i=1 hi (x)≤0 0 0 R i i
R are suitable continuous real-valued convex functions, so the primal and
inf φ(x), φ(x) = hc, xi + k(x) + h(Ax − b) (6.13) dual constraints are of the form gi (x) ≤ 0, hi (y) ≤ 0. Then the primal
? ?
sup ψ(y), ψ(y) = −hb, yi − h (y) − k (−A y − c) T
(6.14) and dual infeasibilities are defined as ηp = max{0, g1 (xk ), . . . , gnp (xk )}
y
and ηd = max{0, h1 (y k ), . . . , hnd (y k )}.
with
Z Z 8. First-Order Methods
dom p = (dom h − A dom k) + b (6.15) Definition. For f : Rn → R, we define
Z Z (i) The forward step, Fτk f (xk ) = (I − τk ∂f )xk
dom q = (dom k? − (−AT ) dom h? ) + c (6.16) (ii) The backward step, Bτk f (xk ) = (I + τk ∂f )−1 xk .

and optimality conditions Theorem. If f : Rn → R is proper lsc convex with τ > 0, then the
backward step is Bτ f (x) = arg miny { 21 ky − xk22 + τ f (y)} and is therefore
{−AT y 0 − c ∈ ∂k(x0 ), y 0 ∈ ∂h(Ax0 − b)} (6.17) unique.
⇐⇒ {x0 ∈ arg min φ(x), y 0 ∈ arg max ψ(y), inf φ(x) = sup ψ(y)} (6.18) Proof. y ∈ Bτ f (x) ⇐⇒ 0 ∈ y − x + τ ∂f (y) ⇐⇒ y ∈ arg miny0 { 12 ky 0 −
x y x y
xk22 + τ f (y 0 )} 
⇐⇒ {Ax0 − b ∈ ∂h? (y 0 ), x0 ∈ ∂k? (−AT y 0 − c)} (6.19)
Theorem. Assume f is proper lsc convex and arg min f 6= ∅. The for-
Theorem. Assume f : Rn ×Rm → R is proper, lsc, convex. Define the as- ward step is xk+1 ∈ Fτk f (xk ). The sequence is not unique, can get stuck
sociated Lagrangian as l(x, y) = −f (x, ·)? (y), so l(x, y) = inf u (f (x, u) − if xk is infeasible.
hy, ui). Then l(·, y) is convex for every y, −l(x, ·) is lsc and convex for The backward step is xk+1 = Bτk f (xk ) - which is a unique sequence,
every x, y and f (x, ·) = (−l(x, ·))? , and (v, y) ∈ ∂f (x, u) ⇐⇒ v ∈ and cannot get stuck. Sub-steps are as hard as the original problem (but
∂x l(x, y) and u ∈ ∂y (−l)(x, y). strictly convex).
CONVEX OPTIMIZATION SUMMARY 5

(i) Forward stepping: xk+1 ∈ Fτk f (xk ): Proof. If we linearize ∇ F (Axk+1 −b) = ∇ F (Axk −b+A∆x) ≈ ∇ F (Axk −
(ii) Backward stepping: xk+1 = Bτk f (xk ). b)+∇2 F (Axk −b)A∆x, and solve the constraints AT (y) = c, ty+∇ F (Ax−
If f = g + h, ∂f = ∂g + ∂h with f, g, h proper lsc convex, arg min f 6= ∅, b) = 0. 
we can do: Theorem. Assume 0 < ρ ≤ κ < 10 1
, tk > 0 fixed, and z k = (xk , y k )
(i) Backward-Backward stepping: xk+1 = Bτk h Bτk g (xk ). strictly feasible, so Ax −b ∈ dom F , y ∈ dom f , such that dist(z k , z(tk )) <
k k

(ii) Forward-Backward stepping: xk+1 ∈ Bτk h Fτk g (xk ). If f (x) = κ.


g(x) + δC (x), g differentiable, C 6= ∅ closed and convex, then If we apply a full Newton step with τk = 1 and tk+1 = (1 + √ρ )tk to
θF
xk+1 ∈ arg minx { 12 ky − (xk − τk ∆g(xk ))k22 + δC (x)} = ΠC (xk −
generate z k+1 , then xk+1 , y k+1 are strictly primal and dual feasible, and
τk ∆g(xk )), which is a gradient projection.
dist(z k+1 , z(tk+1 )) > κ as well.
9. Interior-Point Methods 10. Support Vector Machines
Definition. For a cone K we define the canonical barriers F = FK and Definition. The primal formulation of an SVM is
associated parameters θF .
1
inf kwk22
LP = {x ∈ Rn |x , . . . , x ≥ 0}, F (x) =
Pn
(i) K = Kn 1 n i=1 − log xi ,
(10.1)
w,b 2
θF = n.
such that 1 ≤ y i ( w, xi + b) for all 1 ≤ i ≤ n.
q
SOCP = {x ∈ Rn |x x21 + · · · + x2n−1 }, F (x) =
(ii) K = Kn n ≥
The dual formulation is
− log(x2n − x21 − · · · − x2n−1 ), θF = 2. n
SDP = {X ∈ Rn×n |X symmetric positive semidefinite}, 1 X i i 2
(iii) K = Kn infn k y x zi k2 + eT z (10.2)
F (x) = − log det X, θF = n. z∈R 2 i=1
(iv) K = K 1 × K 2 , then FK (x1 , x2 ) = FK 1 (x1 ) + FK 2 (x2 ), with such that zi ≤ 0,
Pn
y i zi = 0.
i=1
θF = θF 1 + θF 2 .
11. Total Variation and Applications
Theorem. If F is a canonical barrier for K, then F is smooth on dom F =
Definition. For u ∈ L1 (Ω, Rm ), the total variation of u is defined as
R
K and strictly convex, F (tx) = F (x) − θF log t for all x ∈ dom F , and
for x ∈ dom F , we have Z
T V (u) = sup hu, div vidx (11.1)
(i) − ∇ F (x) ∈ dom F v∈Cc1 (Ω,Rm×n ),kvk∞ ≤1 Ω
(ii) h∇ F (x), xi = −θF ,
(iii) − ∇ F (− ∇ F (x)) = x, Theorem. Assume A ⊆ Ω is a set so that its boundary is C 1 and satisfies
(iv) − ∇ F (tx) = − 1t ∇ F (x) Hn−1 (Ω ∩ ∂A) < ∞. Define
(
Proof. Differentiate with respect to t and let t = 1.  1 x∈A
1A (x) = (11.2)
0 x∈ /A
Theorem. Consider the problem inf hc, xi s.t. Ax − b ≥K 0. The dual
problem is sup h−b, yi s.t. −AT y = c, y ≥K ? 0. Replacing y by −y and then T V (1A ) = Hn−1 (Ω ∩ ∂A).
assuming K is self-dual K ? = K we obtain the dual as sup hb, yi such that
Proof. The lower bound follows from
AT y = c, y ≥K 0. Z
The primal central path is the mapping T V (1A ) = sup hv, nids (11.3)
v∈Cc1 (Ω,Rn ),kvk∞ ≤1 ∂A
t 7→ x(t) = arg min{−thb, yi + F (y) + δAT y=c } (9.1)
by Gauss’s theorem. 
The dual central path is the mapping t 7→ y(t) = arg min{−thb, yi +
F (y) + δAT y=c } Theorem (Coarea formula). If u ∈ BV (Ω), then T V (1u(x)>t ) < ∞ for
L1 -a.e. t ∈ R, and T V (u) = R T V (1u>t )dt.
R
The primal-dual central path is the mapping t 7→ z(t) = (x(t), y(t))
for some t > 0, if and only if
Definition. For Ω ⊆ Rd and k ≥ 1, define the space BV k (Ω) as BV k =
Ax − b ∈ dom F (9.2) {u ∈ W k−1,1 | ∇k−1 u ∈ BV (Ω, Rd
k−1
)} and the higher-order total varia-
AT y = c (9.3) tion as
Z
ty + ∇ F (Ax − b) = 0 (9.4) T V k (u) = sup u divk vdx = T V (∇k−1 u)
v∈Cck (Ω,Symk (Rd )),kvk∞ ≤1 Ω
Proof. y = − y1 ∆f (Ax − b) ∈ dom F as Ax − b ∈ dom F and dom F is a (11.4)
cone. We also have that if x is on the primal path, then AT y = c, so y is
feasible. 12. Relaxation
For dual optimality, we need 0 ∈ −tb + ∇ F (y) + NAT y=c . As −tb + Definition. Consider the Chan-Vese model
∇ F (y) = −tAx ∈ range A, y is the unique dual solution. Multiplying the Z Z
dual optimality result by AT gives the optimality condition for the primal fCV (C, c1 , c2 ) = (g − c21 dx + = (g − c2 )2 dx + λHd−1 (C)). (12.1)
C Ω\C
central point. 
Theorem. Let c1 , c2 be fixed, and consider inf u:Ω→[0,1],u∈BV (Ω) f (u), f (u) =
Theorem. For feasible x, y (so Ax − b ∈ K, AT y = c, y ∈ K), the duality hu, siL1 + λT V (u). Then if u is a minimizer of f , and u(x) ∈ {0, 1}a.e.,
gap is φ(x) − ψ(y) = hy, Ax − bi. Moreover, for points (x(t), y(t)) on the then C is a minimizer of fCV (·,c1 ,c2 ) .
central path, the duality gap is φ(x(t)) − ψ(y(t)) = θtF .
Proof. Follows by definitions - must have u = 1C . 
Proof.
Definition. Let C = BV (Ω, [0, 1]) = {u ∈ BV (Ω)|u(x) ∈ [0, 1]a.e.}. Then
φ(x) − ψ(y) = hc, xi − hb, yi (9.5) for u ∈ C, α ∈ [0, 1]. Define uα = 1{u>α} . Then f : C → R satisfies the
= hy, Ax − bi (9.6) generalized coarea condition if and only if
  Z 1
1
= − ∇ F (Ax(t) − b), Ax(t) − b (9.7) f (u) = f (uα )dα (12.2)
t 0
θF for all u ∈ C.
= . (9.8)
t Theorem. Let f ∗u = T V (u), the condition is the cooarea formula. As the

condition is additive, we need only show Ω s(x)u(x) = 01 Ω s(x)1{u(x)>α} dα,
R R R
1 ∞
Theorem. We define kvk?x = (v T ∇2 F (Ax − b)−1 v) , z = (x, y), so z(t)
2 where we use Fubini due to s ∈ L (Ω).
is the primal-dual central path, and dist(z, z(t)) = kty + ∇ F (Ax − b)k?x .
Theorem. Assume f : C → R satisfies the generalized coarea condition,
Then for Ax − b ∈ dom F , y ∈ dom F , AT y = c, we have dist(z, z(t)) ≤ 1
and u? satisfies u? ∈ arg minu∈C f (u). Then for almost every α ∈ [0, 1],
implies φ(x) − ψ(y) ≤ 2(φ(x(t)) − ψ(y(t))) = 2 θtF . the thresholded function satisfies u?α ∈ arg minu∈BV (Ω,{0,1}) f (u).
CONVEX OPTIMIZATION SUMMARY 6

Proof. Follows by considering the set S = {α ∈ [0, 1]|f (u? ) ≤ f (u?α ) − }


for some  > 0, and showing that this implies f (u? ≤ 01 f (u?α ))dα −
R

L1 (S ) which contradicts the generalized coarea formula. 

References

You might also like