0% found this document useful (0 votes)

9 views6 pages

Convex Functions in Optimization Theory

The lecture covers classical optimization algorithms and essential theory related to optimality conditions and convexity. It discusses equivalent characterizations of convex functions, including the first-order Taylor approximator and the positive semidefiniteness of the Hessian. The document also outlines key theorems and conditions for identifying convex functions and introduces iterative algorithms for optimization.

Uploaded by

harrynguyen788

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

9 views6 pages

Convex Functions in Optimization Theory

Uploaded by

harrynguyen788

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

ISyE 3013: Optimization for Machine Learning

Lecture #04 2025-09-11

Instructor: Swati Padmanabhan, Teaching Assistant: Zhenyi Zhang

We are now on the cusp of learning some classical optimization algorithms that, to this day,
power processes around us and remain objects of study. Before getting there, we need to finish our
current module on some essential theory underpinning these algorithms: optimality conditions (in
unconstrained minimization) and convexity. To this end, we continue our study of convex functions.

4.1 Equivalent Characterizations of Convex Functions

Recall that a function f : K → R (where K ⊆ Rd is a convex set) is convex if and only if for all
x, y ∈ K,
f (λy + (1 − λ)x) ≤ λf (y) + (1 − λ)f (x), for all λ ∈ [0, 1]. (4.1.1)
Two important points to note:

1. We say λa + (1 − λ)b for λ ∈ [0, 1] is a “convex combination” of a and b. Thus, a convex

combination is a weighted average with nonnegative weights that add to one.

2. Any point x ∈ [a, b] (i.e., x in the line segment between a and b) can be equivalently represented
as x = λa + (1 − λ)b for some λ ∈ [0, 1]. As we increase λ from zero to one, we move from b
to a. (You will need to use this representation in your very first homework problem.)

In the following two subsections, we cover two more equivalent definitions of convex functions when
we have additional assumptions on higher-order derivatives of f .

4.1.1 First-Order Taylor Approximator as Global Underestimator

Theorem 4.T1 (Repeating Theorem 3.T2). Let f : K → R be a continuously differentiable function
on convex set K ⊆ Rd . Then

f is convex on K if and only if f (y) ≥ f (x) + ∇f (x)⊤ (y − x) for any x, y ∈ K.

Proof. We showed one direction in the previous lecture, so in this lecture, we cover only the other
direction: Assume that f is convex. Then, by rearranging Inequality 4.1.1, we have:

f (λy + (1 − λ)x) − f (x) ≤ λ(f (y) − f (x)), for all λ ∈ [0, 1].

Next, for λ ∈ (0, 1], the above can safely be divided throughout by λ (and without the inequality
being flipped); since the inequality holds for all λ ∈ (0, 1], it is valid even in the limit as λ → 0:

f (x + λ(y − x)) − f (x)

lim ≤ f (y) − f (x).
λ→0 λ

The left-hand side is exactly ∇f (x)⊤ (y − x), which when plugged into the above inequality then
finishes the proof. We explain this last step in more detail in Theorem 4.T2.
These notes have not been subjected to the usual scrutiny reserved for formal peer-reviewed publications. Thank
you for reporting any typos! These notes closely follow the textbooks and lecture notes acknowledged at the end.

1
What Theorem 4.T1 is saying is that given a convex function, no matter where you draw a tangent
to it, that tangent lies under the function everywhere (and conversely, if for a function on a convex
set, all tangents lies below the function everywhere, then the function is convex). In other words,
if there exists even a single point such that the tangent to the function at that point lies above the
function somewhere, then the function most certainly is not convex. See Example 4.E1.

Example 4.E1. Satisfaction and violations of the global underestimator property:

Note, f (x) = .6x2 − 1.2x + 1.6 is convex and
has tangents under it everywhere (of course, it
is not physically possible to check this at every
point, but we know this to be true and illustrate
this only for three tangents). For the function x
4 3 2
f (x) = x8 − x6 − x2 , we see at least one tan-
gent that lies above the function at some points, f (x) = x4 − x3 − x2 f (x) = .6x2 − 1.2x + 1.6
8 6 2
which is evidence to show that it is non-convex.

We now provide1 a proof for the claim

f (x + t · p) − f (x)
lim = ∇f (x)⊤ p,
t→0 t
critically used in the last step when proving Theorem 4.T1. This is a crucial statement, which
shows up whenever simplifying an expression about function values to one in terms of the gradient.

Theorem 4.T2. Let f : Rd → R. Fix x ∈ Rd and a direction p = (p1 , . . . , pd ). If f is C 1 in

∂f
a neighborhood of x (i.e., all partials ∂xi
exist and are continuous near x), then the directional
derivative of f at x along p exists and
d
′ f (x + t · p) − f (x) ⊤
X ∂f
f (x; p) := lim = ∇f (x) p = pi (x).
t→0 t ∂xi
i=1

As a warm-up, note that if d = 1 (i.e., we are operating only on scalars), then the above reduces
to the chain rule. We see that the proof essentially performs such a chain rule in every coordinate.

Proof. We will generate a sequence of iterates by “walking” from x to x + t · p, moving along one
coordinate axis at each step. We denote the vector we are at at the ith iterate by x(i) , i.e.,
i
X
x(0) := x, x(i) := x + t pk ek (i = 1, . . . , d), (4.1.2)
k=1

so that x(d) = x + t · p. Then, the difference between the initial (at x) and final (at x + t · p)
function values may be decomposed into the sum of differences between these successive iterates:
d
X
f (x(i) ) − f (x(i−1) ) .

f (x + t · p) − f (x) =
i=1
1
Thank you very much for asking this question in class!

2
We now write x(i) = x(i−1) + tpi ei and divide throughout by t ̸= 0 and later take the limit as t → 0,
which will make the left-hand side identical to that of the statement of the theorem:
d
f (x + t · p) − f (x) X f x(i−1) + tpi ei − f (x(i−1) )

= . (4.1.3)
t t
i=1

For each index i, if pi = 0, the ith summand in the right-hand side above is 0. If pi ̸= 0, we have

f x(i−1) + tpi ei − f (x(i−1) ) f x(i−1) + tpi ei − f (x(i−1) )

= pi · . (4.1.4)
t tpi

By the one–variable Mean Value Theorem applied to s 7→ f (x(i−1) + sei ) on the interval from 0 to
tpi , there exists a scalar θi (t) ∈ (0, tpi ) such that

f x(i−1) + tpi ei − f (x(i−1) )

∂f (i−1)
= x + θi (t)ei . (4.1.5)
tpi ∂xi

Hence, for all t with pi ̸= 0, plugging the right-hand side of Equation (4.1.5) into that of Equa-
tion (4.1.4) and Equation (4.1.4) into each term of the right-hand side of Equation (4.1.3) yields:
d
f (x + t · p) − f (x) X ∂f (i−1)
= pi x + θi (t)ei . (4.1.6)
t ∂xi
i=1

Now taking the limit t → 0, we have x(i−1) → x (by definition of x(i) in Equation (4.1.2)) and
∂f
θi (t) → 0 (since θi (t) ∈ (0, tpi )). Consequently, x(i−1) + θi (t)ei → x. By continuity of ∂xi
at x,

∂f (i−1) ∂f
x + θi (t)ei −→ (x).
∂xi ∂xi

Therefore, plugging this into Equation (4.1.6) yields the claim.

4.1.2 Positive Semidefiniteness of Hessian

We now circle back to a second-order characterization from the second lecture, which showed global
minimality under global positive semidefiniteness of the Hessian. As we remarked then, for twice
continuously differentiable functions, this completely defines convex functions2 .

Theorem 4.T3. Let K ⊆ Rd be an open convex set, and let f : K → R be twice continuously
differentiable. Then f is convex if and only if ∇2 f is positive semidefinite at all x ∈ K.

Proof. Consider two points x, y ∈ K. Then, by the second-order Taylor approximation of f , there
exists some z ∈ [x, y] such that:
1
f (y) = f (x) + ∇f (x)⊤ (y − x) + (y − x)⊤ ∇2 f (z)(y − x). (4.1.7)
2
2
Note, this definition holds only for convex functions that are also twice continuously differentiable; the function
f (x) = |x| is convex, but not differentiable at x = 0.

3
Suppose ∇2 f (u) ⪰ 0 for all u ∈ K. Since ∇2 f (z) ⪰ 0 (since z ∈ K too, by convexity of K), the
third term on the right-hand side above is non-negative, which implies the linear underestimator
inequality from Theorem 4.T1 for all x, y ∈ K, thus implying convexity of f .
We now show the other direction. Suppose instead that f is convex on K. Fix x ∈ K and a
direction h ∈ Rd . Because K is open, there exists r > 0 such that x + th ∈ K for all t ∈ (−r, r).
By Theorem 4.T1, for all |t| < r,

f (x + th) ≥ f (x) + t ∇f (x)⊤ h. (4.1.8)

By the second–order Taylor theorem (Lemma 1.L1), there exists zt ∈ [x, x + th] ⊆ K such that

f (x + th) = f (x) + t ∇f (x)⊤ h + 21 t2 · h⊤ ∇2 f (zt )h. (4.1.9)

Comparing Inequality 4.1.8 and Equation (4.1.9) and cancelling the common terms gives, for t ̸= 0,
1 2
2t · h⊤ ∇2 f (zt )h ≥ 0.

Note that because t2 /2 > 0, the above inequality is the same as:

h⊤ ∇2 f (zt )h ≥ 0. (4.1.10)

Since zt ∈ [x, x + th], if we let t → 0, we have zt → x. We therefore have:

h⊤ ∇2 f (x)h = lim h⊤ ∇2 f (zt )h ≥ 0, .

t→0

where the first step is by continuity of ∇2 f (since we assume f is twice continuously differentiable),
and the second step is by Equation (4.1.10). The above inequality, as of now, holds only for the
vector h that we fixed at the start of this proof. However, note that the proof does not make any
special assumptions on h, which means the above inequality holds for any arbitrary h. In other
words, we have that
h⊤ ∇2 f (x)h ≥ 0, for all h.
This is precisely the definition of ∇2 f (x) ⪰ 0, which finishes the proof of the claim.

To conclude this section, we have three equivalent tests for whether a twice continuously differ-
entiable function is convex or not (one that involves only the function values, one involving the first
derivative, and this one involving the second derivative). Depending on the function at hand, one
test might be significantly easier to perform than the others. For example, it is almost immediate
to see the convexity of f (x) = ex and f (x) = 5x2 via the second-derivative test; however, these
are not all that trivial using Inequality 4.1.1. As we will see in the next few lectures, each of these
definitions turns out to be useful also when designing fast algorithms.

Problem 4.P1. Based on Theorem 4.T3, prove the convexity of:

1. f (x) = eax ,

2. f (x) = x⊤ Px when P ⪰ 0,

3. f (x) = − log(x), x > 0,

4. f (x) = x · log(x) on x > 0,

4
4.1.3 A Brief Summary of Our Current Toolkit

Assumptions Implication Description in plain English

⋆ ⋆
x local minimum ∇f (x ) = 0 First-Order Necessary Condition
x⋆ local minimum, f ∈ C 2 ∇2 f (x⋆ ) ⪰ 0 Second-Order Necessary Condition
f ∈ C 2 , ∇f (x⋆ ) = 0, ∇2 f (x⋆ ) ≻ 0 x⋆ is a strict local minimum Second-Order Sufficient Condition
f ∈ C 2 , ∇f (x⋆ ) = 0, ∇2 f (x) ⪰ 0 ∀ x x⋆ is a global minimum Convexity
f is convex f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) Secant inequality
f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) f is convex Secant inequality
f ∈ C 1 , f convex f (y) ≥ f (x) + ∇f (x)⊤ (y − x) Global linear underestimator
f ∈ C 1 , f (y) ≥ f (x) + ∇f (x)⊤ (y − x) f convex Global linear underestimator

4.2 Iterative Algorithms for Optimization

The algorithms we study in this class are all iterative. Suppose the problem is minx f (x). We denote
the problem variable at the k th iteration by xk . The key design principle is to find a direction pk
and a step size αk such that moving in that direction improves the function value:
f (xk + αk · pk ) < f (xk ). (Descent Property)
One method to achieve this is to find the (unit) direction pk that minimizes the first-order Taylor
approximation of f at xk :
pk := arg min f (xk ) + p⊤ ∇f (xk ). (Descent Candidate)
p:∥p∥≤1

Question 4.Q1. What norm should we choose in (Descent Candidate), and what step size should
we choose to enable (Descent Property)?
We will examine these questions in the next few lectures. For now, we touch upon two norm choices.

4.2.1 Linear Minimization Over the ℓ2 -Ball y

(0, 1)
We consider the problem maxp:∥p∥2 ≤1 p⊤ g. Note, this is a constrained
optimization over the unit ℓ2 -norm ball. The projection of p onto g is g
maximized when we choose p to be a vector that directly aligns with g
hitting the boundary of the ball. (Try to draw any other vector in the x
adjacent blue circle and see its projection onto g.) This can be seen (0, 0) (1, 0)
g
formally via the Cauchy-Schwarz inequality. Thus p⋆ = ∥g∥ 2
(when g ̸=
0). This gives an optimal value of ∥g∥2 for the problem in consideration.

4.2.2 Linear Minimization Over the ℓ1 -Ball y

The unit ℓ1 -ball has quite a different geometry than the unit ℓ2 -ball. If (0, 1)
we therefore instead consider the problem maxp:∥p∥1 ≤1 p⊤ g, then the g
vector p⋆ that attains this largest projection is, up to a sign factor, the
indicator vector corresponding to the component of g with the largest x
absolute value. Thus, imposing a constraint on the ℓ1 -norm promotes (0, 0) (1, 0)
sparsity in solutions, a fact which also nicely generalizes to the matrix
setting [Faz] and finds use in recommendation systems for personalized
user experiences across, e.g., e-commerce and entertainment.

5
Readings
The material in these notes is based on the following excellent sources: [BV04, Chapter 2, Chapter
3, Section 9.4, Appendix A.4] and [NW06, Chapter 2].

References
[BV04] Stephen P Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university
press, 2004 (cit. on p. 6).
[NW06] Jorge Nocedal and Stephen J Wright. Numerical optimization. Springer, 2006 (cit. on
p. 6).
[Faz] Maryam Fazel. “Matrix rank minimization with applications”. PhD thesis (cit. on p. 5).

2301MC11
No ratings yet
2301MC11
8 pages
Convex Functions and Inequalities Explained
No ratings yet
Convex Functions and Inequalities Explained
11 pages
Convex Function Optimization Examples
No ratings yet
Convex Function Optimization Examples
4 pages
Differentiability and Optimality Conditions
No ratings yet
Differentiability and Optimality Conditions
10 pages
Taylor Theorem
No ratings yet
Taylor Theorem
18 pages
Understanding Convexity in Functions
No ratings yet
Understanding Convexity in Functions
5 pages
Understanding Convex Functions and Subgradients
No ratings yet
Understanding Convex Functions and Subgradients
5 pages
Understanding Convex Functions in Optimization
No ratings yet
Understanding Convex Functions in Optimization
11 pages
Understanding Convex Functions
No ratings yet
Understanding Convex Functions
38 pages
Understanding Convex Functions
No ratings yet
Understanding Convex Functions
9 pages
Lagrange Multipliers in Nonlinear Programming
No ratings yet
Lagrange Multipliers in Nonlinear Programming
14 pages
Introduction to Convex Optimization
No ratings yet
Introduction to Convex Optimization
48 pages
Strictly vs Strongly Convex Functions
No ratings yet
Strictly vs Strongly Convex Functions
14 pages
Understanding Convex Functions
No ratings yet
Understanding Convex Functions
14 pages
Mclas Tema1 v2.1
No ratings yet
Mclas Tema1 v2.1
76 pages
Mathematical Methods: Variational Calculus
No ratings yet
Mathematical Methods: Variational Calculus
74 pages
Twice Continuously Differentiable Functions
No ratings yet
Twice Continuously Differentiable Functions
13 pages
Machine Learning Optimization Techniques
No ratings yet
Machine Learning Optimization Techniques
181 pages
Fast Algorithms via Convex Optimization
No ratings yet
Fast Algorithms via Convex Optimization
114 pages
Convex Functions and Their Conditions
No ratings yet
Convex Functions and Their Conditions
3 pages
Review of S-Convexity Results and Issues
No ratings yet
Review of S-Convexity Results and Issues
10 pages
Convex Functions and Optimization Methods
No ratings yet
Convex Functions and Optimization Methods
143 pages
Lipschitz Continuous Gradient Overview
No ratings yet
Lipschitz Continuous Gradient Overview
17 pages
Convex Nonsmooth Optimization Techniques
No ratings yet
Convex Nonsmooth Optimization Techniques
14 pages
Definitions of Smoothness and Lipschitzness
No ratings yet
Definitions of Smoothness and Lipschitzness
5 pages
Building and Composing Convex Functions
No ratings yet
Building and Composing Convex Functions
4 pages
Understanding Convex Functions and Properties
No ratings yet
Understanding Convex Functions and Properties
4 pages
Convex Function Practice Problems
No ratings yet
Convex Function Practice Problems
4 pages
09 Subgradient Notes Cvxopt f21
No ratings yet
09 Subgradient Notes Cvxopt f21
22 pages
Fundamentals of Convex Analysis
No ratings yet
Fundamentals of Convex Analysis
12 pages
Optimization Preliminaries2
No ratings yet
Optimization Preliminaries2
53 pages
Curvature Analysis of Functions in R2
No ratings yet
Curvature Analysis of Functions in R2
8 pages
03 Convex Functions Notes Cvxopt f21
No ratings yet
03 Convex Functions Notes Cvxopt f21
20 pages
Convex Functions and Sets in Optimization
No ratings yet
Convex Functions and Sets in Optimization
5 pages
Convex and Concave Functions Explained
No ratings yet
Convex and Concave Functions Explained
5 pages
Equivalent Proofs of Convex Functions
No ratings yet
Equivalent Proofs of Convex Functions
4 pages
Continuous Method for General Convex Problems
No ratings yet
Continuous Method for General Convex Problems
12 pages
Solution Concepts in Optimization
No ratings yet
Solution Concepts in Optimization
4 pages
Tutorial 1 Fundamentals: Review of Derivatives, Gradients and Hessians
No ratings yet
Tutorial 1 Fundamentals: Review of Derivatives, Gradients and Hessians
25 pages
Consequences of the Mean-Value Theorem
No ratings yet
Consequences of the Mean-Value Theorem
4 pages
Concave vs Quasiconcave Functions
No ratings yet
Concave vs Quasiconcave Functions
9 pages
Understanding Matrix and Vector Norms
No ratings yet
Understanding Matrix and Vector Norms
12 pages
Applied Microeconomics: Mathematical Concepts
No ratings yet
Applied Microeconomics: Mathematical Concepts
25 pages
Problem Set 1
No ratings yet
Problem Set 1
4 pages
Convexity of Pointwise Maximums
No ratings yet
Convexity of Pointwise Maximums
32 pages
Smooth Optimizations in Microeconomics
No ratings yet
Smooth Optimizations in Microeconomics
18 pages
Foundations of Optimization Algorithms
No ratings yet
Foundations of Optimization Algorithms
8 pages
Conevx Sets 12 Feb
No ratings yet
Conevx Sets 12 Feb
109 pages
Unconstrained Optimization Optimality Conditions
No ratings yet
Unconstrained Optimization Optimality Conditions
10 pages
Chapter 7 3
No ratings yet
Chapter 7 3
4 pages
GD and Subgradients in Convex Optimization
No ratings yet
GD and Subgradients in Convex Optimization
6 pages
Smooth Convex Optimization Lecture
No ratings yet
Smooth Convex Optimization Lecture
3 pages
Concave and Convex Functions Explained
No ratings yet
Concave and Convex Functions Explained
14 pages
Concavity and Convexity of Functions
No ratings yet
Concavity and Convexity of Functions
14 pages
Continuity and Convexity in Optimization
No ratings yet
Continuity and Convexity in Optimization
7 pages
Convex Functions and Optimization Concepts
No ratings yet
Convex Functions and Optimization Concepts
35 pages
Understanding Convex Functions and Properties
No ratings yet
Understanding Convex Functions and Properties
43 pages
Gr11 IeBT Maths Exemplar MG Explanations 2018
No ratings yet
Gr11 IeBT Maths Exemplar MG Explanations 2018
24 pages
Basic Pixel Relationships in Image Processing
No ratings yet
Basic Pixel Relationships in Image Processing
15 pages
Linear Transformations in R2 and R4
No ratings yet
Linear Transformations in R2 and R4
10 pages
Bipartite 5-Regular Graph on 10 Vertices
No ratings yet
Bipartite 5-Regular Graph on 10 Vertices
24 pages
Graphical Solutions of Simultaneous Equations
No ratings yet
Graphical Solutions of Simultaneous Equations
4 pages
Understanding Coordinate Geometry Concepts
No ratings yet
Understanding Coordinate Geometry Concepts
20 pages
Properties of Equality Explained
No ratings yet
Properties of Equality Explained
15 pages
6th Grade Math Revision Paper
No ratings yet
6th Grade Math Revision Paper
3 pages
Square Root and Quadratic Solutions
No ratings yet
Square Root and Quadratic Solutions
7 pages
Math 331 Lab 8: Group Theory and Rings
No ratings yet
Math 331 Lab 8: Group Theory and Rings
4 pages
MAS201 Sylabus
No ratings yet
MAS201 Sylabus
2 pages
Fourier Series in Signals and Systems
No ratings yet
Fourier Series in Signals and Systems
12 pages
Higher Order Linear Differential Equations
No ratings yet
Higher Order Linear Differential Equations
15 pages
HCF and LCM Calculations in Class X Math
No ratings yet
HCF and LCM Calculations in Class X Math
3 pages
Spinors in Einstein-Cartan Cosmology
No ratings yet
Spinors in Einstein-Cartan Cosmology
80 pages
Grade X Mathematics Question Bank
No ratings yet
Grade X Mathematics Question Bank
153 pages
Std 11 Mathematics MCQ Practice Solutions
No ratings yet
Std 11 Mathematics MCQ Practice Solutions
25 pages
Variation of Parameters for Differential Equations
No ratings yet
Variation of Parameters for Differential Equations
16 pages
Introduction to Theory of Computing
No ratings yet
Introduction to Theory of Computing
27 pages
Patch Test in Advanced FEM Techniques
No ratings yet
Patch Test in Advanced FEM Techniques
11 pages
Understanding Artificial Neural Networks
No ratings yet
Understanding Artificial Neural Networks
34 pages
Class 12 Mean Value Theorem Solutions
No ratings yet
Class 12 Mean Value Theorem Solutions
31 pages
Group and Ring Theory Solutions
No ratings yet
Group and Ring Theory Solutions
3 pages
Least Squares Fitting in Fortran
0% (1)
Least Squares Fitting in Fortran
2 pages
Quadratic Equations in Business Math
100% (1)
Quadratic Equations in Business Math
6 pages
Understanding Bernoulli Differential Equations
No ratings yet
Understanding Bernoulli Differential Equations
3 pages
Additional Mathematics: Paper 4037/11 Paper 11
No ratings yet
Additional Mathematics: Paper 4037/11 Paper 11
19 pages
Class XI Mathematics Quarterly Exam 2024
No ratings yet
Class XI Mathematics Quarterly Exam 2024
6 pages
Class X Mathematics Holiday Homework
No ratings yet
Class X Mathematics Holiday Homework
2 pages
College Algebra for STEM Students
100% (1)
College Algebra for STEM Students
367 pages

Convex Functions in Optimization Theory

Uploaded by

Convex Functions in Optimization Theory

Uploaded by

ISyE 3013: Optimization for Machine Learning

Lecture #04 2025-09-11

4.1 Equivalent Characterizations of Convex Functions

1. We say λa + (1 − λ)b for λ ∈ [0, 1] is a “convex combination” of a and b. Thus, a convex

4.1.1 First-Order Taylor Approximator as Global Underestimator

f is convex on K if and only if f (y) ≥ f (x) + ∇f (x)⊤ (y − x) for any x, y ∈ K.

f (x + λ(y − x)) − f (x)

Example 4.E1. Satisfaction and violations of the global underestimator property:

We now provide1 a proof for the claim

Theorem 4.T2. Let f : Rd → R. Fix x ∈ Rd and a direction p = (p1 , . . . , pd ). If f is C 1 in

f x(i−1) + tpi ei − f (x(i−1) ) f x(i−1) + tpi ei − f (x(i−1) )

f x(i−1) + tpi ei − f (x(i−1) )

Therefore, plugging this into Equation (4.1.6) yields the claim.

4.1.2 Positive Semidefiniteness of Hessian

f (x + th) ≥ f (x) + t ∇f (x)⊤ h. (4.1.8)

f (x + th) = f (x) + t ∇f (x)⊤ h + 21 t2 · h⊤ ∇2 f (zt )h. (4.1.9)

Since zt ∈ [x, x + th], if we let t → 0, we have zt → x. We therefore have:

h⊤ ∇2 f (x)h = lim h⊤ ∇2 f (zt )h ≥ 0, .

Problem 4.P1. Based on Theorem 4.T3, prove the convexity of:

3. f (x) = − log(x), x > 0,

4. f (x) = x · log(x) on x > 0,

Assumptions Implication Description in plain English

4.2 Iterative Algorithms for Optimization

4.2.1 Linear Minimization Over the ℓ2 -Ball y

4.2.2 Linear Minimization Over the ℓ1 -Ball y

You might also like