Decomposition Methods
• separable problems, complicating variables
• primal decomposition
• dual decomposition
• complicating constraints
• general decomposition structures
EE364b, Stanford University
Separable problem
minimize f1(x1) + f2(x2)
subject to x1 ∈ C1, x2 ∈ C2
• we can solve for x1 and x2 separately (in parallel)
• even if they are solved sequentially, this gives advantage if
computational effort is superlinear in problem size
• called separable or trivially parallelizable
• generalizes to any objective of form Ψ(f1, f2) with Ψ nondecreasing
(e.g., max)
EE364b, Stanford University 1
Complicating variable
consider unconstrained problem,
minimize f (x) = f1(x1, y) + f2(x2, y)
x = (x1, x2, y)
• y is the complicating variable or coupling variable; when it is fixed
the problem is separable in x1 and x2
• x1, x2 are private or local variables; y is a public or interface or
boundary variable between the two subproblems
EE364b, Stanford University 2
Primal decomposition
fix y and define
subproblem 1: minimizex1 f1(x1, y)
subproblem 2: minimizex2 f2(x2, y)
with optimal values φ1(y) and φ2(y)
original problem is equivalent to master problem
minimizey φ1(y) + φ2(y)
with variable y
called primal decomposition since master problem manipulates primal
(complicating) variables
EE364b, Stanford University 3
• if original problem is convex, so is master problem
• can solve master problem using
– bisection (if y is scalar)
– gradient or Newton method (if φi differentiable)
– subgradient, cutting-plane, or ellipsoid method
• each iteration of master problem requires solving the two subproblems
(in parallel)
• if master algorithm converges fast enough and subproblems are
sufficiently easier to solve than original problem, we get savings
EE364b, Stanford University 4
Primal decomposition algorithm
(using subgradient algorithm for master)
repeat
1. Solve the subproblems (in parallel).
Find x1 that minimizes f1(x1, y), and a subgradient g1 ∈ ∂φ1(y).
Find x2 that minimizes f2(x2, y), and a subgradient g2 ∈ ∂φ2(y).
2. Update complicating variable.
y := y − αk (g1 + g2).
step length αk can be chosen in any of the standard ways
EE364b, Stanford University 5
Example
• x1, x2 ∈ R20, y ∈ R
• fi are PWL (max of 100 affine functions each); f ⋆ ≈ 1.71
3.5
φ1 (y)
φ2 (y)
3 φ1 (y) + φ2 (y)
2.5
1.5
0.5
−1 −0.5 0 0.5 1
y
EE364b, Stanford University 6
primal decomposition, using bisection on y
0
10
−1
10
f (k) − f ⋆
−2
10
−3
10
−4
10
1 2 3 4 5 6 7 8
k
EE364b, Stanford University 7
Dual decomposition
Step 1: introduce new variables y1, y2
minimize f (x) = f1(x1, y1) + f2(x2, y2)
subject to y1 = y2
• y1, y2 are local versions of complicating variable y
• y1 = y2 is consensus constraint
EE364b, Stanford University 8
Step 2: form dual problem
L(x1, y1, x2, y2) = f1(x1, y1) + f2(x2, y2) + ν T (y1 − y2)
separable; can minimize over (x1, y1) and (x2, y2) separately
inf f1(x1, y1) + ν y1 = −f1∗(0, −ν)
T
g1(ν) =
x1 ,y1
g2(ν) = inf f2(x2, y2) − ν y2 = −f2∗(0, ν)
T
x2 ,y2
dual problem is: maximize g(ν) = g1(ν) + g2(ν)
• computing gi(ν) are the dual subproblems
• can be done in parallel
• a subgradient of −g is y2 − y1 (from solutions of subproblems)
EE364b, Stanford University 9
Dual decomposition algorithm
(using subgradient algorithm for master)
repeat
1. Solve the dual subproblems (in parallel).
Find x1, y1 that minimize f1(x1, y1) + ν T y1.
Find x2, y2 that minimize f2(x2, y2) − ν T y2.
2. Update dual variables (prices).
ν := ν − αk (y2 − y1).
• step length αk can be chosen in standard ways
• at each step we have a lower bound g(ν) on p⋆
• iterates are generally infeasible, i.e., y1 6= y2
EE364b, Stanford University 10
Finding feasible iterates
• reasonable guess of feasible point from (x1, y1), (x2, y2):
(x1, ȳ), (x2, ȳ), ȳ = (y1 + y2)/2
– projection onto feasible set y1 = y2
– gives upper bound p⋆ ≤ f1(x1, ȳ) + f2(x2, ȳ)
• a better feasible point: replace y1, y2 with ȳ and solve primal
subproblems minimizex1 f1(x1, ȳ), minimizex2 f2(x2, ȳ)
– gives (better) upper bound p⋆ ≤ φ1(ȳ) + φ2(ȳ)
EE364b, Stanford University 11
(Same) example
2.5
g1 (ν)
g2 (ν)
g1 (ν) + g2 (ν)
2
1.5
0.5
0
−1 −0.5 0 0.5 1
ν
EE364b, Stanford University 12
dual decomposition convergence (using bisection on ν)
2.6
better bound
2.5 worse bound
2.4 g(ν)
2.3
2.2
2.1
1.9
1.8
1.7
1.6
0 5 10 15
k
EE364b, Stanford University 13
Interpretation
• y1 is resources consumed by first unit, y2 is resources generated by
second unit
• y1 = y2 is consistency condition: supply equals demand
• ν is a set of resource prices
• master algorithm adjusts prices at each step, rather than allocating
resources directly (primal decomposition)
EE364b, Stanford University 14
Recovering the primal solution from the dual
• iterates in dual decomposition:
(k) (k) (k) (k)
ν (k), (x1 , y1 ), (x2 , y2 )
(k) (k)
– x1 , y1 is minimizer of f1(x1, y1) + ν (k)T y1 found in subproblem 1
(k) (k)
– x2 , y2 is minimizer of f2(x2, y2) − ν (k)T y2 found in subproblem 2
• ν (k) → ν ⋆ (i.e., we have price convergence)
(k) (k)
• subtlety: we need not have y1 − y2 →0
(k) (k)
• the hammer: if fi strictly convex, we have y1 − y2 →0
• can fix allocation (i.e., compute φi), or add regularization terms ǫkyik2
EE364b, Stanford University 15
Decomposition with constraints
can also have complicating constraints, as in
minimize f1(x1) + f2(x2)
subject to x1 ∈ C1, x2 ∈ C2
h1(x1) + h2(x2) 0
• fi, hi, Ci convex
• h1(x1) + h2(x2) 0 is a set of p complicating or coupling constraints,
involving both x1 and x2
• can interpret coupling constraints as limits on resources shared between
two subproblems
EE364b, Stanford University 16
Primal decomposition
fix t ∈ Rp and define
minimize f1(x1)
subproblem 1:
subject to x1 ∈ C1, h1(x1) t
minimize f2(x2)
subproblem 2:
subject to x2 ∈ C2, h2(x2) −t
• t is the quantity of resources allocated to first subproblem
(−t is allocated to second subproblem)
• master problem: minimize φ1(t) + φ2(t) (optimal values of
subproblems) over t
• subproblems can be solved separately (in parallel) when t is fixed
EE364b, Stanford University 17
Primal decomposition algorithm
repeat
1. Solve the subproblems (in parallel).
Solve subproblem 1, finding x1 and λ1.
Solve subproblem 2, finding x2 and λ2.
2. Update resource allocation.
t := t − αk (λ2 − λ1).
• λi is an optimal Lagrange multiplier associated with resource constraint
in subproblem i
• λ2 − λ1 ∈ ∂(φ1 + φ2)(t)
• αk is an appropriate step size
• all iterates are feasible (when subproblems are feasible)
EE364b, Stanford University 18
Example
• x1, x2 ∈ R20, t ∈ R2; fi are quadratic, hi are affine, Ci are polyhedra
defined by 100 inequalities; p⋆ ≈ −1.33; αk = 0.5/k
0
10
−1
10
f (k) − p⋆
−2
10
−3
10
−4
10
0 20 40 60 80 100
k
EE364b, Stanford University 19
resource allocation t to first subsystem (second subsystem gets −t)
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
−0.1
−0.2
0 20 40 60 80 100
k
EE364b, Stanford University 20
Dual decomposition
form (separable) partial Lagrangian
L(x1, x2, λ) = f1(x1) + f2(x2) + λT (h1(x1) + h2(x2))
T T
= f1(x1) + λ h1(x1) + f2(x2) + λ h2(x2)
fix dual variable λ and define
minimize f1(x1) + λT h1(x1)
subproblem 1:
subject to x1 ∈ C1
minimize f2(x2) + λT h2(x2)
subproblem 2:
subject to x2 ∈ C2
with optimal values g1(λ), g2(λ)
EE364b, Stanford University 21
• −hi(x̄i) ∈ ∂(−gi)(λ), where x̄i is any solution to subproblem i
• −h1(x̄1) − h2(x̄2) ∈ ∂(−g)(λ)
• the master algorithm updates λ using this subgradient
EE364b, Stanford University 22
Dual decomposition algorithm
(using projected subgradient method)
repeat
1. Solve the subproblems (in parallel).
Solve subproblem 1, finding an optimal x̄1.
Solve subproblem 2, finding an optimal x̄2.
2. Update dual variables (prices).
λ := (λ + αk (h1(x̄1) + h2(x̄2)))+.
• αk is an appropriate step size
• iterates need not be feasible
• can again construct feasible primal variables using projection
EE364b, Stanford University 23
Interpretation
• λ gives prices of resources
• subproblems are solved separately, taking income/expense from resource
usage into account
• master algorithm adjusts prices
• prices on over-subscribed resources are increased; prices on
undersubscribed resources are reduced, but never made negative
EE364b, Stanford University 24
(Same) example
subgradient method for master; resource prices λ
0.5
0.45
0.4
0.35
0.3
0.25
0.2
0.15
0.1
0.05
0
0 5 10 15 20 25 30
k
EE364b, Stanford University 25
dual decomposition convergence; fˆ is objective of projected feasible
allocation
−0.4
g
−0.6 fˆ
−0.8
−1
−1.2
−1.4
−1.6
−1.8
0 5 10 15 20 25 30
k
EE364b, Stanford University 26
1
10
p⋆ − g(λ)
0 fˆ − g(λ)
10
−1
10
−2
10
−3
10
−4
10
0 5 10 15 20 25 30
k
EE364b, Stanford University 27
General decomposition structures
• multiple subsystems
• (variable and/or constraint) coupling constraints between subsets of
subsystems
• represent as hypergraph with subsystems as vertices, coupling as
hyperedges or nets
• without loss of generality, can assume all coupling is via consistency
constraints
EE364b, Stanford University 28
Simple example
1 2 3
• 3 subsystems, with private variables x1, x2, x3, and public variables y1,
(y2, y3), and y4
• 2 (simple) edges
minimize f1(x1, y1) + f2(x2, y2, y3) + f3(x3, y4)
subject to (x1, y1) ∈ C1, (x2, y2, y3) ∈ C2, (x3, y4) ∈ C3
y1 = y2 , y3 = y4
EE364b, Stanford University 29
A more complex example
c1
1 2
c2
c3 c4
3 4 5
EE364b, Stanford University 30
General form
PK
minimize i=1 fi (xi , yi )
subject to (xi, yi) ∈ Ci, i = 1, . . . , K
yi = Eiz, i = 1, . . . , K
• private variables xi, public variables yi
• net (hyperedge) variables z ∈ RN ; zi is common value of public
variables in net i
• matrices Ei give netlist or hypergraph
row k is ep, where kth entry of yi is in net p
EE364b, Stanford University 31
Primal decomposition
φi(yi) is optimal value of subproblem
minimize fi(xi, yi)
subject to (xi, yi) ∈ Ci
repeat
1. Distribute net variables to subsystems.
yi := Eiz, i = 1, . . . , K.
2. Optimize subsystems (separately).
Solve subproblems to find optimal xi, gi ∈ ∂φi(yi), i = 1, . . . , K.
3. Collect and sum subgradients for each net.
PK
g := i=1 EiT gi.
4. Update net variables.
z := z − αk g.
EE364b, Stanford University 32
Dual decomposition
gi(νi) is optimal value of subproblem
minimize fi(xi, yi) + νiT yi
subject to (xi, yi) ∈ Ci
given initial price vector ν that satisfies E T ν = 0 (e.g., ν = 0).
repeat
1. Optimize subsystems (separately).
Solve subproblems to obtain xi, yi.
2. Compute average value of public variables over each net.
ẑ := (E T E)−1E T y.
3. Update prices on public variables.
ν := ν + αk (y − E ẑ).
EE364b, Stanford University 33
A more complex example
subsystems: quadratic plus PWL objective with 10 private variables;
9 public variables and 4 nets; p⋆ ≈ 11.1; α = 0.5
8
g(ν)
7 f (x̂, ŷ)
f (x, ŷ)
6
0
0 2 4 6 8 10
k
EE364b, Stanford University 34
consistency constraint residual ky − E ẑk versus iteration number
1
10
0
10
ky − E ẑk
−1
10
−2
10
−3
10
−4
10
0 2 4 6 8 10
k
EE364b, Stanford University 35