10-725: Convex Optimization Spring 2023

Lecture 2: January 19
Lecturer: Siva Balakrishnan

2.1 Convex Functions

There are three characterizations of convexity that you should be familiar with:

1. No Assumptions (Zeroth-Order): This is the definition we discussed last time,

i.e. f is convex if its domain is a convex set and, for any x, y ∈ dom(f ) and any θ ∈ [0, 1],

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y).

2. Differentiable (First-Order): Suppose our function f has a derivative (at all points
in its domain) then, f is convex if its domain is a convex set and, for any x, y ∈ dom(f ),

$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle.$$

3. Twice Differentiable (Second-Order): A function f is convex, if its domain is a

convex set and, for any x ∈ dom(f ),

$$\nabla^2 f(x) \succeq 0.$$

In your HW you will explore some connections between these definitions (in particular,
showing that $(3) \Rightarrow (2) \Rightarrow (1)$). You can find the proof that $(2) \Rightarrow (1)$ in the BV
textbook (but it will still likely be a part of your HW).
It is also worth noting that there is a definition analogous to (2) above in the case when the
function is not differentiable everywhere.

2’ Non-Smooth: A function f is convex if its domain is a convex set, and if at every
point $x \in \mathrm{dom}(f)$ there exists a vector $g_x$ such that, for any $y \in \mathrm{dom}(f)$,

$$f(y) \ge f(x) + \langle g_x,\, y - x \rangle.$$

It is worth noting that if f is differentiable at x, then there is only one vector which will
satisfy the above definition and it will coincide with the usual gradient, i.e. $g_x = \nabla f(x)$.
Any $g_x$ which satisfies the above property is called a subgradient of f at x. The set
of all subgradients at a point x is called the subdifferential of f at x and it will be
denoted as $\partial f(x)$.


Except for some very pathological functions (and only at the boundary of their domain),
subgradients of convex functions always exist. Formally, one can for instance show that a
subgradient $g_x$ of a convex function f at x exists whenever x is in the interior of its domain.
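
The subgradient definition can be probed numerically. The following is a minimal sketch, assuming NumPy is available; the helper `is_subgradient` and the grid of test points are illustrative choices, not part of the lecture. It checks which candidate slopes satisfy the subgradient inequality for f(x) = |x| at x = 0 (where f is not differentiable) and at x = 2 (where it is).

```python
import numpy as np

def is_subgradient(f, g, x, ys, tol=1e-12):
    """Return True if f(y) >= f(x) + g*(y - x) holds at every test point y."""
    ys = np.asarray(ys, dtype=float)
    return bool(np.all(f(ys) >= f(x) + g * (ys - x) - tol))

# Finite grid of test points; this is a numerical check, not a proof.
ys = np.linspace(-5.0, 5.0, 2001)
candidates = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]

# At the kink x = 0, every slope in [-1, 1] should pass.
valid_at_zero = [g for g in candidates if is_subgradient(np.abs, g, 0.0, ys)]

# At x = 2 the function is differentiable, so only the gradient g = 1 should pass.
valid_at_two = [g for g in candidates if is_subgradient(np.abs, g, 2.0, ys)]
```

On this grid the slopes that survive at x = 0 are those in [−1, 1], while at x = 2 only the ordinary derivative survives, matching the remark that the subgradient is unique wherever f is differentiable.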

Notational Note: I will often stop adding the qualifiers “for x, y ∈ dom(f )”. One way to
make this precise (I, and most textbooks, do this implicitly) is to allow f to be what's called
an extended-value function, and define it to be ∞ outside its (effective) domain. This won’t change
any of its convexity properties, and things like the first and zeroth-order characterizations
will now make sense for any $x, y \in \mathbb{R}^d$.

2.1.1 An example

Let us consider the quadratic function $f(x) = \frac{1}{2} x^T Q x + a^T x + b$ where $Q \succeq 0$.

Applying definition (3) is easiest, since $\nabla^2 f(x) = Q$ and this is PSD.
Now, let us try to apply definition (2). It is a differentiable function, with gradient
$\nabla f(x) = Qx + a$. So we need to verify if,

$$\frac{1}{2} y^T Q y + a^T y + b \overset{?}{\ge} \frac{1}{2} x^T Q x + a^T x + b + \langle Qx + a,\, y - x \rangle.$$

Re-arranging we obtain that we need to check if,

$$\frac{1}{2} (y - x)^T Q (y - x) \ge 0,$$

which is certainly the case since $Q \succeq 0$.
Finally, let us try to apply definition (1). We see (after cancelling some terms) that we need
to verify if for $0 \le \theta \le 1$,

$$\frac{1}{2} (\theta x + (1 - \theta) y)^T Q (\theta x + (1 - \theta) y) \overset{?}{\le} \frac{\theta}{2} x^T Q x + \frac{1 - \theta}{2} y^T Q y.$$

Now, use the fact (you should see how you might prove this fact) that
$x^T Q y \le \frac{1}{2}\left(x^T Q x + y^T Q y\right)$
for PSD Q (this is the matrix analogue of the simple fact that $ab \le (a^2 + b^2)/2$), to verify
that the desired inequality holds.
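
The three checks above can also be run numerically. This is a minimal sketch, assuming NumPy; the random PSD matrix, the dimension and the tolerances are arbitrary illustrative choices. It tests each characterization on random points and confirms that the first-order gap equals the quadratic form derived above.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d))
Q = B.T @ B                      # symmetric PSD by construction
a = rng.standard_normal(d)
b = 0.7

def f(x):
    return 0.5 * x @ Q @ x + a @ x + b

def grad_f(x):
    return Q @ x + a

# (3) second-order: the Hessian Q has no negative eigenvalues (up to round-off)
second_order_ok = np.linalg.eigvalsh(Q).min() >= -1e-10

first_order_ok = True
zeroth_order_ok = True
gap_matches = True
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    theta = rng.uniform()
    gap = f(y) - f(x) - grad_f(x) @ (y - x)
    # (2) first-order: f lies above its tangent plane at x
    first_order_ok &= gap >= -1e-10
    # the gap should equal (1/2)(y - x)^T Q (y - x)
    gap_matches &= np.isclose(gap, 0.5 * (y - x) @ Q @ (y - x))
    # (1) zeroth-order: the chord lies above the function
    lhs = f(theta * x + (1 - theta) * y)
    zeroth_order_ok &= lhs <= theta * f(x) + (1 - theta) * f(y) + 1e-10
```

Random sampling can only fail to find a violation; it does not replace the algebraic argument.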

2.2 More Examples of Convex Functions

Here are a few examples of convex functions:

1. exp(ax) is convex for any a over R.


2. log x is concave on $\mathbb{R}_{++}$.

3. $a^T x + b$ is convex (and concave).

4. The least squares loss $\|Ax - b\|^2$ is convex (for any A, b).

5. Any norm is convex, i.e. $\|x\|$ is a convex function.

6. The spectral norm, and the trace norm of a matrix are convex, i.e. $\|X\|_{\mathrm{op}} = \sigma_1(X)$,
$\|X\|_{\mathrm{tr}} = \sum_{i=1}^{d} \sigma_i(X)$, where $\sigma_i(X)$ denotes the i-th singular value of X.

7. Convex Indicators: If C is a convex set, then the indicator function (which is
defined on the extended reals):

$$I_C(x) = \begin{cases} 0 & x \in C \\ \infty & x \notin C \end{cases}$$

is convex.
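
As an illustration of example 6, here is a minimal sketch, assuming NumPy, that computes both matrix norms from the singular values and looks for violations of the zeroth-order inequality on random matrices. The helper names and matrix sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def spectral_norm(X):
    # largest singular value sigma_1(X); svd returns them in descending order
    return np.linalg.svd(X, compute_uv=False)[0]

def trace_norm(X):
    # sum of all singular values
    return np.linalg.svd(X, compute_uv=False).sum()

def violates_convexity(f, X, Y, theta, tol=1e-10):
    lhs = f(theta * X + (1 - theta) * Y)
    rhs = theta * f(X) + (1 - theta) * f(Y)
    return lhs > rhs + tol

violations = 0
for _ in range(500):
    X, Y = rng.standard_normal((3, 5)), rng.standard_normal((3, 5))
    theta = rng.uniform()
    violations += violates_convexity(spectral_norm, X, Y, theta)
    violations += violates_convexity(trace_norm, X, Y, theta)
```

The two helpers should agree with `np.linalg.norm(X, 2)` and `np.linalg.norm(X, 'nuc')` respectively.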

2.3 Convexity and Monotonicity

One nice property of convex functions is that their gradients are monotone.
In 1D this is simple to interpret: a monotone function is order preserving. A function
which is monotone increasing has the property that if x ≥ y then f (x) ≥ f (y). One way to
write this mathematically is to say that for any x, y, (x − y) × (f (x) − f (y)) ≥ 0.
The (sub)gradient of a convex function satisfies a multivariate analogue of this property.
In particular, for any x, y ∈ dom(f ), if f is convex we have that for any $g_x \in \partial f(x)$ and
$g_y \in \partial f(y)$,

$$(x - y)^T (g_x - g_y) \ge 0.$$

To see this we observe that by the first-order characterization:

$$f(y) \ge f(x) + g_x^T (y - x),$$

$$f(x) \ge f(y) + g_y^T (x - y),$$

and summing these inequalities gives our desired result.

It turns out that there is a converse to the above characterization. If you have a differentiable
function whose gradient is monotone, then it must be convex. This idea will likely be useful
in your HW for verifying some of the equivalences.
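
A minimal sketch of gradient monotonicity, assuming NumPy. It uses the least squares loss from Section 2.2 as the convex function and, for contrast, the concave function −‖x‖² (an illustrative choice not taken from the lecture), whose gradient fails the monotonicity inequality.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

def grad_least_squares(x):
    # gradient of the convex function f(x) = ||Ax - b||^2
    return 2.0 * A.T @ (A @ x - b)

def grad_concave(x):
    # gradient of the concave function f(x) = -||x||^2, for contrast
    return -2.0 * x

def monotonicity_gap(grad, x, y):
    # (x - y)^T (grad(x) - grad(y)); non-negative for a monotone gradient
    return (x - y) @ (grad(x) - grad(y))

pairs = [(rng.standard_normal(3), rng.standard_normal(3)) for _ in range(200)]
convex_gaps = np.array([monotonicity_gap(grad_least_squares, x, y) for x, y in pairs])
concave_gaps = np.array([monotonicity_gap(grad_concave, x, y) for x, y in pairs])
```

For the least squares loss the gap works out to 2‖A(x − y)‖², which is non-negative, while for −‖x‖² it is −2‖x − y‖².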

2.4 Properties of Convex Functions

Here are a few properties of convex functions that will be useful:

1. A function is convex iff the univariate functions g(t) = f (x + tv) are convex for any
$v \in \mathbb{R}^d$, and for any x ∈ dom(f ).

2. A function is convex iff its epigraph,

epi(f ) = {(x, t) ∈ dom(f ) × R : f (x) ≤ t}

is a convex set.

3. Convex functions satisfy Jensen’s inequality. If f is convex, then for any random
variable X supported on dom(f ), $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$.
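
Jensen's inequality is easy to observe on an empirical distribution. The sketch below assumes NumPy; the uniform sample and the choice of exp and log (both from the examples in Section 2.2) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
samples = rng.uniform(0.5, 3.0, size=10_000)   # X supported on (0, inf)

# Convex f = exp: f(E[X]) <= E[f(X)]
jensen_exp = np.exp(samples.mean()) <= np.exp(samples).mean()

# Concave f = log: the inequality flips, f(E[X]) >= E[f(X)]
jensen_log = np.log(samples.mean()) >= np.log(samples).mean()
```

Because the empirical distribution of the sample is itself a probability distribution, the inequality holds for the sample averages themselves and not only in the limit.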

2.5 Operations which Preserve Convexity

1. Non-negative Linear Combination: Suppose $f_1, \ldots, f_m$ are convex, then so is
$\sum_{i=1}^{m} a_i f_i$ for any $a_1, \ldots, a_m \ge 0$.

2. Pointwise Max: If the collection of functions $f_s$ for $s \in S$ are convex, then so is
$g(x) = \sup_{s \in S} f_s(x)$.

3. Partial Minimization: If g(x, y) is a convex function, and C is a convex set, then
$f(x) = \min_{y \in C} g(x, y)$ is a convex function.

An Example:

1. Suppose C is an arbitrary set, consider $f(x) = \max_{y \in C} \|x - y\|$. f is convex. To see
this, we can view f as a maximum of convex functions $f_y(x) = \|x - y\|$.

2. Let C be a convex set, then $f(x) = \min_{y \in C} \|x - y\|$ is a convex function. We can
view this as a partial minimization of the function $g(x, y) = \|x - y\|$ which is a convex
function (in (x, y)).
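
Both examples can be sketched numerically, assuming NumPy. Here the arbitrary set is a random finite point cloud and the convex set is the box [−1, 1]^d; both are illustrative choices. For the box the minimizing y is the coordinate-wise clip of x, so the distance has a closed form.

```python
import numpy as np

rng = np.random.default_rng(4)
lo, hi = -1.0, 1.0                      # C = [-1, 1]^d, a convex box
points = rng.standard_normal((20, 3))   # an arbitrary (non-convex) finite set

def dist_to_box(x):
    # min over y in the box of ||x - y||; the minimiser is the clipped point
    return np.linalg.norm(x - np.clip(x, lo, hi))

def max_dist_to_points(x):
    # max over y in the finite set of ||x - y||
    return np.linalg.norm(points - x, axis=1).max()

def convexity_gap(f, x, y, theta):
    # non-negative whenever the zeroth-order inequality holds
    return theta * f(x) + (1 - theta) * f(y) - f(theta * x + (1 - theta) * y)

gaps = []
for _ in range(500):
    x, y = 3 * rng.standard_normal(3), 3 * rng.standard_normal(3)
    theta = rng.uniform()
    gaps.append(convexity_gap(dist_to_box, x, y, theta))
    gaps.append(convexity_gap(max_dist_to_points, x, y, theta))
min_gap = min(gaps)
```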

Function compositions:

1. Affine Composition: If f is convex then so is g(x) = f (Ax + b).


2. General Composition: Suppose that $f = h \circ g$, where $g : \mathbb{R}^d \to \mathbb{R}$, $h : \mathbb{R} \to \mathbb{R}$,
$f : \mathbb{R}^d \to \mathbb{R}$. Then one can ask when f is convex. There are many cases to cover
(see BV) but we’ll simply study one, and try to understand where it comes from: f is
convex if h is convex and nondecreasing, and g is convex.
To see this: imagine d = 1 and everything was twice differentiable, then by the chain rule

$$f''(x) = h''(g(x)) \, (g'(x))^2 + h'(g(x)) \, g''(x).$$

When h is convex and nondecreasing, $h''$ and $h'$ are non-negative, and when g is convex,
$g''$ is non-negative, so $f''$ is non-negative.
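
As a concrete one-dimensional instance of the composition rule (an illustrative choice, not taken from the lecture), take h(t) = exp(t), which is convex and nondecreasing, and g(x) = x², which is convex. The chain-rule formula then gives:

```latex
f(x) = e^{x^2}, \qquad h(t) = e^{t}, \qquad g(x) = x^2,
\\[4pt]
f''(x) = \underbrace{e^{x^2}}_{h''(g(x))} \, \underbrace{(2x)^2}_{(g'(x))^2}
       + \underbrace{e^{x^2}}_{h'(g(x))} \, \underbrace{2}_{g''(x)}
       = (4x^2 + 2)\, e^{x^2} \;\ge\; 0 .
```

The monotonicity of h is needed here. With h(t) = t² (convex but not nondecreasing on all of R) and g(x) = x² − 1 (convex), the composition f(x) = (x² − 1)² has f''(x) = 12x² − 4, which is negative at x = 0, so f is not convex.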

2.6 Smooth, Strongly Convex and Strictly Convex Functions

For this section, we will switch back to thinking about differentiable convex functions.

2.6.1 Smoothness

In optimization smoothness has a very particular meaning (it has a slightly different meaning
in stats, and other areas of math). A function f is β-smooth, if its gradient is Lipschitz
continuous with parameter β, i.e. for any x, y ∈ dom(f ),

$$\|\nabla f(x) - \nabla f(y)\| \le \beta \|x - y\|.$$

There are several useful implications of smoothness that you will show in your HW, but we
will briefly discuss now:

1. If f is β-smooth then the function $\frac{\beta}{2}\|x\|^2 - f(x)$ is convex. Typically, we would not
expect −f (x) to be convex (except when f is affine).

2. Another implication of smoothness is that it implies a quadratic upper bound on the
function, i.e. if f is β-smooth then,

$$f(y) \le f(x) + \nabla f(x)^T (y - x) + \frac{\beta}{2} \|y - x\|^2.$$

To interpret this fix a point x. Convex functions always lie above their tangent lines.
Smooth convex functions always lie below a parabola which passes through the point
(x, f (x)) (defined by the RHS above).

3. Finally, if f is twice differentiable, then β-smoothness is equivalent to the condition
that,

$$\nabla^2 f(x) \preceq \beta I_d.$$

Examples: It is worth briefly considering two examples (canonical examples of non-smooth

and smooth convex functions):

1. Absolute value: Here we consider f (x) = |x|, and observe that at x = 0, it’s
impossible to seat a parabola at the origin which is always above the function. Roughly,
a parabola must have close to zero derivative near its minimum, but the absolute value
function has constant derivative near its minimum.

2. Quadratic function: Suppose we consider $f(x) = x^T Q x + a^T x + b$ where $Q \succeq 0$.
It's now easy to see that this function has Hessian 2Q, and consequently it satisfies
smoothness for any $\beta \ge 2\lambda_{\max}(Q)$ (i.e. twice the largest eigenvalue of Q).
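
The quadratic example can be checked numerically. This is a minimal sketch, assuming NumPy; the random PSD matrix and tolerances are illustrative. It takes β = 2λ_max(Q) and tests both the Lipschitz-gradient definition and the quadratic upper bound on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
B = rng.standard_normal((d, d))
Q = B.T @ B                                  # symmetric PSD
a = rng.standard_normal(d)
b = -0.3

def f(x):
    return x @ Q @ x + a @ x + b

def grad_f(x):
    return 2.0 * Q @ x + a                   # the Hessian is 2Q

beta = 2.0 * np.linalg.eigvalsh(Q).max()     # smallest valid smoothness constant

lipschitz_ok = True
upper_bound_ok = True
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    step = np.linalg.norm(y - x)
    # definition: the gradient is beta-Lipschitz
    lipschitz_ok &= np.linalg.norm(grad_f(x) - grad_f(y)) <= beta * step + 1e-9
    # implication 2: f lies below the parabola through (x, f(x))
    bound = f(x) + grad_f(x) @ (y - x) + 0.5 * beta * step**2
    upper_bound_ok &= f(y) <= bound + 1e-9
```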

2.6.2 Strong Convexity

The twin assumption to smoothness is strong convexity. A function f is α-strongly convex, if
the function $g(x) = f(x) - \frac{\alpha}{2}\|x\|^2$ is convex. As with smoothness there are several important
implications of strong convexity that you will explore in your HW.

1. If f is differentiable, an equivalent definition of α-strong convexity is that f satisfies
the following inequality for any x, y ∈ dom(f ),

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{\alpha}{2} \|y - x\|^2.$$

Again to interpret this, fix a point x, and observe that this expression tells us that a
strongly-convex function is above a parabola which passes through the point (x, f (x)).

2. If f is twice differentiable, an equivalent characterization is that,

$$\nabla^2 f(x) \succeq \alpha I_d.$$

Examples:

1. Absolute value: Consider the same function as before. It is not strongly convex.
For instance, if we consider x = 1, y = 2, then $f(y) - \left(f(x) + \nabla f(x)^T (y - x)\right)$ is 0, so
the definition can only hold with α = 0.

2. Quadratic function: Once again using the second-order characterization of strong
convexity we see that the quadratic function satisfies the definition of strong convexity
for any $\alpha \le 2\lambda_{\min}(Q)$.
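
A matching sketch for strong convexity, assuming NumPy. The matrix is made positive definite by adding a small multiple of the identity (an illustrative choice) so that λ_min(Q) > 0. It checks that f(x) − (α/2)‖x‖² has a PSD Hessian and that the quadratic lower bound holds on random pairs.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 4
B = rng.standard_normal((d, d))
Q = B.T @ B + 0.1 * np.eye(d)                # positive definite, so lambda_min > 0
a = rng.standard_normal(d)

def f(x):
    return x @ Q @ x + a @ x

def grad_f(x):
    return 2.0 * Q @ x + a

alpha = 2.0 * np.linalg.eigvalsh(Q).min()    # largest valid strong-convexity constant

# g(x) = f(x) - (alpha/2)||x||^2 has Hessian 2Q - alpha*I, which should be PSD
hessian_g_min_eig = np.linalg.eigvalsh(2.0 * Q - alpha * np.eye(d)).min()

lower_bound_ok = True
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    # f lies above the parabola through (x, f(x)) with curvature alpha
    lower = f(x) + grad_f(x) @ (y - x) + 0.5 * alpha * np.linalg.norm(y - x) ** 2
    lower_bound_ok &= f(y) >= lower - 1e-9
```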

It is possible to have strongly convex functions which are not smooth and vice versa, and it
is worth trying to “draw” some examples to convince yourself of this.

2.6.3 Strict Convexity

Strict convexity is a “weakening” of strong convexity (we won’t use it so much in this course
but it’s a useful concept to be aware of). A function f is strictly convex if either:

1. $f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$ for any $x \ne y$ and $0 < \theta < 1$.

2. $f(y) > f(x) + \nabla f(x)^T (y - x)$, for any $x \ne y$.

It is worth noting the second-order characterization doesn’t work in the expected way, i.e.
you can have twice-differentiable, strictly convex functions which don’t satisfy the condition
that $\nabla^2 f(x) \succ 0$. (As an example, think about the function $x^4$ at x = 0.)

2.7 Optimality Conditions

Here we will revisit some things we discussed briefly in the previous lecture. Here is the
basic question. We are interested in solving a problem:

$$\min_{x \in C} f(x),$$

where f is a convex function, and C is a convex set. What can I say about a solution x∗ to
this problem?

1. Unconstrained Case: Suppose first that $C = \mathbb{R}^d$, and that $\mathrm{dom}(f) = \mathbb{R}^d$; then our
characterization should be familiar to us from usual calculus classes.

Theorem 2.1 $x^*$ is optimal if (and only if) $0 \in \partial f(x^*)$.

Proof: If $0 \in \partial f(x^*)$, then from the first-order condition we know that, for all y,

$$f(y) \ge f(x^*) + 0^T (y - x^*) = f(x^*).$$


Conversely, if $x^*$ is optimal, then $f(y) \ge f(x^*)$ for all y, which is exactly the subgradient
inequality $f(y) \ge f(x^*) + g_{x^*}^T (y - x^*)$ with $g_{x^*} = 0$, and so 0 is a valid subgradient at $x^*$.
Notice an interesting aspect of this result: it does not require convexity, i.e. for any
function f the condition that $0 \in \partial f(x^*)$ is necessary and sufficient for $x^*$ to be a
minimizer.

2. Constrained, Differentiable Case: A feasible point $x^*$ is optimal if and only if
$\nabla f(x^*)^T (y - x^*) \ge 0$ for all $y \in C$.
We will only verify one direction of this (the other direction requires a bit of analysis
to check). Suppose that $\nabla f(x^*)^T (y - x^*) \ge 0$ for all $y \in C$, then from the first-order
condition we have that,

$$f(y) \ge f(x^*) + \nabla f(x^*)^T (y - x^*) \ge f(x^*),$$

so $x^*$ is optimal. If you recall the definition of the normal cone from last lecture, then
you will see that this condition says that,

$$-\nabla f(x^*) \in N_C(x^*).$$

3. General, Constrained Case: A feasible point $x^*$ is optimal if and only if
$0 \in \partial f(x^*) + N_C(x^*)$. Here we are adding two sets, i.e. $C + D = \{y : y = u + v,\ u \in C,\ v \in D\}$.
Again it’s only easy to verify one direction of this, i.e. suppose that $0 \in \partial f(x^*) + N_C(x^*)$;
this means that there are two vectors $u \in \partial f(x^*)$ and $v \in N_C(x^*)$ such that,

$$u + v = 0.$$

Now, we know that for any y which is feasible,

$$f(y) \ge f(x^*) + u^T (y - x^*) = f(x^*) - v^T (y - x^*).$$

Since $v \in N_C(x^*)$ we know that $v^T (y - x^*) \le 0$ for every feasible y, and so we conclude
that $f(y) \ge f(x^*)$.
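
The first two cases can be illustrated with problems whose solutions are known in closed form. This is a minimal sketch, assuming NumPy. The unconstrained problem is the least squares loss from Section 2.2; the constrained problem (minimizing a linear function over the unit ball) and the helper `random_point_in_ball` are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(7)

# Unconstrained case: f(x) = ||Ax - b||^2 is differentiable, so the
# condition 0 in the subdifferential reduces to grad f(x*) = 0.
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
grad_at_opt = 2.0 * A.T @ (A @ x_star - b)

# Constrained case: minimise f(x) = c^T x over the unit ball C.
# The minimiser is x* = -c / ||c||, and grad f(x*) = c.
c = rng.standard_normal(3)
x_ball = -c / np.linalg.norm(c)

def random_point_in_ball(dim):
    # random direction scaled to a random radius in [0, 1)
    z = rng.standard_normal(dim)
    return rng.uniform() * z / np.linalg.norm(z)

# First-order condition: grad f(x*)^T (y - x*) >= 0 for every feasible y.
min_slack = min(c @ (random_point_in_ball(3) - x_ball) for _ in range(2000))
```

In the constrained example −∇f(x*) = −c is a non-negative multiple of x*, which is the outward normal direction of the ball at the boundary point x*, in line with the normal cone form of the condition.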

2.7.1 Optimality Conditions for Projection

Here is a very basic/important problem. It arises in signal processing and statistics as a
basic denoising scheme. For some convex set K, and observation y we would like to solve
the constrained minimization problem,

$$\min_{x \in K} \frac{1}{2} \|y - x\|^2.$$

This finds the closest point in K to y, and is called the projection of y onto K. We will
denote the solution $x^*$ to the above program by $P_K(y)$.
Let us first write out the optimality conditions, and then use them to show a nice property
of this projection operation. Since the objective $f(x) = \frac{1}{2}\|y - x\|^2$ is differentiable with
$\nabla f(x^*) = x^* - y$, we have that,

$$0 \in x^* - y + N_K(x^*).$$

Equivalently, this means that $(y - x^*)^T (a - x^*) \le 0$ for all $a \in K$. This can be easily
understood with a picture.
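
In place of a picture, the condition can be checked numerically for a set whose projection has a closed form. This is a minimal sketch, assuming NumPy, with K taken to be the Euclidean unit ball (an illustrative choice); `project_ball` and `random_point_in_ball` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(8)

def project_ball(y, radius=1.0):
    """Euclidean projection onto K = {x : ||x|| <= radius}."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else radius * y / norm

def random_point_in_ball(dim, radius=1.0):
    z = rng.standard_normal(dim)
    return radius * rng.uniform() * z / np.linalg.norm(z)

y = np.array([3.0, -4.0, 0.0])          # ||y|| = 5, so y lies outside the unit ball
x_star = project_ball(y)                # rescales y to unit length

# Optimality condition: (y - x*)^T (a - x*) <= 0 for all a in K.
worst = max((y - x_star) @ (random_point_in_ball(3) - x_star) for _ in range(2000))
```

Geometrically, the vector from the projection to y makes an obtuse (or right) angle with every direction pointing from the projection into K.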

Theorem 2.2 Projection onto a convex set is a contraction (more precisely, it is non-expansive), i.e. for any pair of points a, b,

$$\|P_K(a) - P_K(b)\| \le \|a - b\|.$$

Proof: From the optimality conditions we have that for any $x \in K$,

$$(a - P_K(a))^T (x - P_K(a)) \le 0,$$
$$(b - P_K(b))^T (x - P_K(b)) \le 0.$$

Taking $x = P_K(b)$ in the first inequality and $x = P_K(a)$ in the second, we can see that,

$$(a - P_K(a))^T (P_K(b) - P_K(a)) \le 0,$$
$$(b - P_K(b))^T (P_K(a) - P_K(b)) \le 0.$$

Adding these inequalities we obtain that,

$$(b - a + (P_K(a) - P_K(b)))^T (P_K(a) - P_K(b)) \le 0.$$

Now, re-arranging and applying the Cauchy-Schwarz inequality, we see that,

$$\|P_K(a) - P_K(b)\|^2 \le (a - b)^T (P_K(a) - P_K(b)) \le \|a - b\| \, \|P_K(a) - P_K(b)\|.$$

Dividing through by $\|P_K(a) - P_K(b)\|$ (the claim is trivial when this is zero) gives our desired conclusion.
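
The theorem can be observed numerically. This is a minimal sketch, assuming NumPy, with K taken to be the box [−1, 1]^d (an illustrative choice), for which the Euclidean projection is a coordinate-wise clip.

```python
import numpy as np

rng = np.random.default_rng(9)

def project_box(y, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box K = [lo, hi]^d (coordinate-wise clip)."""
    return np.clip(y, lo, hi)

ratios = []
for _ in range(2000):
    a, b = 2 * rng.standard_normal(5), 2 * rng.standard_normal(5)
    # ratio of projected distance to original distance; should never exceed 1
    shrink = np.linalg.norm(project_box(a) - project_box(b)) / np.linalg.norm(a - b)
    ratios.append(shrink)
max_ratio = max(ratios)
```

The ratio can equal 1 (for instance when both points already lie in K), which is why the bound is an inequality with constant exactly 1 rather than a strict shrinkage.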
