Retiming and Re-synthesis
Outline:
Retiming
Retiming and Resynthesis (RnR)
Resynthesis of Pipelines
Optimizing Sequential Circuits
by Retiming Netlist of Gates
Netlist of gates and registers:
Inputs
Various Goals:
Outputs
Reduce clock cycle time
Reduce area
Reduce number of latches
Retiming
Problem
Pure combinational optimization can be myopic since
relations across register boundaries are disregarded
Solutions
Retiming: Move register(s) so that
clock cycle decreases, or number of registers decreases and
input-output behavior is preserved
RnR: Combine retiming with combinational
optimization techniques
Move latches out of the way temporarily
optimize larger blocks of combinational
Circuit Represetation
[Leiserson, Rose and Saxe (1983)]
Circuit representation: G(V,E,d,w)
V set of gates
E set of wires
d(v) = delay of gate/vertex v, (d(v)0)
w(e) = number of registers on edge e, (w(e)0)
Circuit Representation
Example: Correlator
+
Host
(x, y) = 1 if x=y
0 otherwise
7
0
0
2
Graph (Directed)
Circuit
Every cycle in Graph has at least one register
i.e. no combinational loops.
Operation delay
Preliminaries
For a path p :
e0
ek 1
e1
v0 v1 L vk 1 vk
d ( p ) d (vi )
(includes endpoints)
i 0
k 1
w( p) w(ei )
Clock cycle
i 0
c max {d ( p )}
p: w ( p ) 0
0
0
Path with
0
0
2
w(p)=0
3
For correlator c = 13
6
Basic Operation
Movement of registers from input to output of a gate or
vice versa
Retime by -1
Retime by 1
Does not affect gate functionalities
A mathematical definition: retardation
r: V Z, an integer vertex labeling
wr(e) = w(e) + r(v) - r(u) for edge e = (u,v)
Basic Operation
Thus in the example, r(u) = -1, r(v) = -1 results
in
0
0
0
2
0
0
1
1
u
0
For a path p: st, wr(p) = w(p) + r(t) - r(s)
Retardation
r: VZ, an integer vertex labeling
wr(e) =w(e) + r(v) - r(u) for edge e= (u,v)
A retiming r is legal if wr(e) 0, eE
Retiming for minimum clock cycle
Problem Statement: (minimum cycle time)
Given G (V, E, d, w), find a legal retiming r so that
c max {d ( p )}
p: wr ( p ) 0
is minimized
Retiming: 2 important matrices
Register weight matrix
W (u , v) min{w( p) : u
p
v}
Delay matrix
D(u, v) max{d ( p ) : u
p
v, w( p )
w(u, v)}
9
Retiming for minimum clock cycle
V0 V1 V2 V3
V0 V1 V2 V3
W = register path
weight matrix
(minimum # latches
on all paths between
u and v)
D = path delay matrix
(maximum delay on
all paths between
u and v)
0222
0000
0200
0220
0
13
10
7
V0
V1
V2
V3
v0 0
0
2
V1
V0
V1
V2
V3
0
3
V2
3
3
13
10
6
6
3
13
13
13
10
7
c p, if d(p) then w(p) 1
10
Conditions for Retiming
Assume that we are asked to check if a retiming exists for a clock
cycle
Legal retiming: wr(e) 0 for all e. Hence
wr(e) = w(e) = r(v) - r(u) 0 or
r (u) - r (v) w (e)
For all paths p: u v such that d(p) , we require wr(p) 1
Thus
k 1
1 wr ( p) wr (ei )
i 0
k 1
[ w(ei ) r (vi 1 ) r (vi )]
i 0
w( p ) r (vk ) r (v0 )
w( p ) r (v) r (u )
Or take the least w(p) (tightest constraint) r(u)-r(v) W(u,v)-1
Note: this is independent of the path from u to v, so we just need to apply it
to u, v such that D(u,v)
11
Solving the constraints
All constraints in difference-of-2-variable form
Related to shortest path problem
W
Correlator: = 7
V0 V1 V2 V3
V0 V1 V2 V3
D>7:
Legal: r(u)-r(v)w(e) r(u)-r(v)W(u,v)-1
r (v0 ) r (v1 ) 2
r (v1 ) r (v2 ) 0
r (v1 ) r (v3 ) 0
r (v2 ) r (v3 ) 0
r (v3 ) r (v0 ) 0
r (v0 ) r (v3 ) 1
r (v1 ) r (v0 ) 1
r (v1 ) r (v3 ) 1
r (v2 ) r (v0 ) 1
r (v2 ) r (v1 ) 1
r (v2 ) r (v3 ) 1
r (v3 ) r (v1 ) 1
r (v3 ) r (v2 ) 1
V0
V1
V2
V3
0222
0000
0200
0220
v0 0
0
13
10
7
3 6 13
3 6 13
13 3 10
10 13 7
7
0
0
2
v1
V2
12
V0
V1
V2
V3
Solving the constraints
Do shortest path on constraint graph: (O(|V| 3 )).
A solution exists if and only if there exists no negative
weighted cycle.
D>7:
Legal: r(u)-r(v)w(e) r(u)-r(v)W(u,v)-1
r (v0 ) r (v1 ) 2
r (v1 ) r (v2 ) 0
r (v1 ) r (v3 ) 0
r (v2 ) r (v3 ) 0
r (v3 ) r (v0 ) 0
r (v0 ) r (v3 ) 1
r (v1 ) r (v0 ) 1
r (v1 ) r (v3 ) 1
r (v2 ) r (v0 ) 1
r (v2 ) r (v1 ) 1
r (v2 ) r (v3 ) 1
r (v3 ) r (v1 ) 1
r (v3 ) r (v2 ) 1
A solution is r(v0) = r(v3) = 0, r(v1) = r(v2) = -1
r(0)
-1
-1
2
0
-1
r(2)
-1
1
1
0,-1
1
r(1)
0,-1
r(3)
0
Constraint graph
13
Retiming
To find the minimum cycle time, do a binary search among the
0
v0 0
entries of the D matrix (0( V
v1
V0 V1 V2 V3
V0
V1
V2
V3
3
V2
Retime
Host
0222
0000
0200
0220
Retimed correlator:
Clock cycle
= 3+3+7=13
log V ))
0
2
V0 V1 V2 V3
0
13
10
7
3
3
13
10
6
6
3
13
13
13
10
7
Host
Clock cycle = 7
14
V0
V1
V2
V3
Retiming: 2 more algorithms
1. Relaxation based:
Repeatedly find critical path;
retime vertex at end of path by +1 (O(
(O( V
E log
log V ))
2. Also, Mixed Integer Linear Program formulation
+1
Critical path
15
Retiming for minimum area
(minimum # latches)
Goal: minimize number of registers used
min N r wr (e)
eE
( w(e) r (v) r (u ))
e:u v
w(e)
eE
(r (v) r (u ))
e:u v
N (r (v) r (u ))
u v
N r (v)(# fanin(v ) # fanout (v)
vV
N aV r (v )
vV
where av is a constant.
16
Minimum registers formulation
Minimize:
a r (v )
vV
Subject to: wr(e) =w(e) + r(v) - r(u) 0
Reducible to a flow problem
17
Retiming and resynthesis:
motivation
Goal: incorporate combinational optimization into sequential optimization
Nave approach: carve out combinational regions, do optimization on each region. Only local
gains made.
Can we do any better?
RnR: a new approach
Sentovich, Malik, Brayton andSangiovanni-Vincentelli (89)
3 step approach
1.
2.
3.
Move registers to boundary of circuit
Optimize network
Move registers back in an optimal way
18
RnR: circuit representation
Circuit representation: communication graph
internal/peripheral edges
edge-weight = register count
c
1
0
2
1
i
internal
peripheral
19
Extended Retiming
Move register to the periphery
Negative edge-weights permitted
-1
negative
latch
A negative latch has the interpretation that it
advances its output by 1 clock cycle instead of
delaying it
20
Peripheral Retiming
A retiming is called a peripheral retiming if it results in
all internal edges having zero weight
Peripheral edges can have negative weight
1
All internal edge
weights are 0
1
k k
peripheral
weights
21
Peripheral Retiming
A circuit that undergoes peripheral retiming followed by
a legal retiming, i.e. one that results in all weights 0
is functionally equivalent to the original circuit
Functional equivalence: equivalence of finite state
automata:
But we have to be careful about the initial conditions and
initializing sequences. The resulting circuit may only exhibit
equivalence after an appropriate delay. See [Singhal [Link]
ICCAD 1995].
This holds even for regular retiming
22
Peripheral Retiming - an example
c
1
c
0
Peripheral retiming
Not possible for all circuits:
0
i
j
1
j
a 0
b
0
0
c 0
d
o1
o2
23
Path Weight Matrix (PWM)
Matrix W
inputs
rows:
columns: outputs
1
i
Wij
o2
o1
i j
w(e)
o1
I 1
j 0
k *
no path i j
same weight for all paths
if paths with different weights
o2
*
~
0
24
PWM and Peripheral Retiming
Satisfiable path weight matrix:
1. Wij ~, i, j and
2. i, j, such that Wij= I + j , Wij *
Peripheral retiming possible matrix is satisfiable
i, j specify registers on peripheral edge
some i, j can be negative
Complexity: linear in size of communication graph
25
Path Weight Matrix - generation
From inputs to outputs:
generate output columns of W.
Example:
[ 1 0 * ] o1
o2
[*~0]
Paths from inputs
to node
1
i
[0**]
o2
*
~
0
( [ * 0 * ]+1) & [ * 0 0 ]
=[*1*]&[*00]
=[*~0]
[10*]
([ 0 * * ]+1) & ([ * 0 * ]+0)
=[1**]&[*0*]
=[10*]
o1
I 1
j 0
k *
[*0*]
([ * 0 * ]+0) & ([ * * 0 ]+0)
=[*0*]&[**0]
=[*00]
[**0]
26
Computing ,
Each constraint of the form I + j = Wij
Procedure
1.
2.
3.
4.
set 1 = 0
use first row to generate s
determine s from s
check for consistency
o
0/2
Example:
c
1/0
2/0
o
1/1 i 2
j 3
1/0
i
1 = 0
2 = 1
1 = 2
27
Computing ,
Example:
i
j
a 0
b
o1 o2
i 0 0
j 0 1
0
0
c 0
d
1+ 1 = 0
o1
o2
2+ 1 = 0
1+ 2 = 0 2+ 2 = 1
-----------------------------1- 2 = 0
1- 2 = -1
contradiction
28
Optimizing Acyclic Sequential
Circuits
Acyclic circuits with satisfiable W
Do peripheral retiming putting i,j registers at the I/O
Resynthesize interior
Do a legal retiming (move registers in)
May not always be possible
Acyclic circuits with unsatisfiable W
Identify maximal sub-circuits with satisfiable W
Cut connections
Repeat previous procedure
29
Acyclic Sequential Circuits
cut
circuits
Note that both of the cut circuits can be peripherally
retimed, unlike the original.
30
Optimizing Cyclic Sequential
Circuits
Cyclic circuits
1. Make circuit acyclic by breaking cycles
Different feedback-cuts give different Ws
2. Find sub-circuits with satisfiable Ws
3. Repeat procedure
31
FSM Optimization
X
Y
OUT
A
Break feedback
Y
OUT
B
32
FSM Optimization
Peripheral retiming
combinational
Y
OUT
B
X
Resynthesize
Y
OUT
B
33
FSM Optimization
Retime
X
Y
OUT
B
Reconnect
OUT
B
34
FSM Optimization
Original circuit
OUT
A
Resynthesized circuit
OUT
A
35
Resynthesis of Pipelines
Goal: Performance optimization of pipeline
circuits
Example: Pipeline circuit C
In
Ii
I1
Cn
Ci
C1
On
Oi
O1
In
Peripheral
Retiming
Cn
-(n -1)
Ii
Ci
-(i-1)
I1
C1
On
n -1
Oi
i-1
Combinational
circuit
O1
36
Resynthesis of Pipelines
Parameters:
A
R
c
C
vector of arrival times for inputs
vector of required times for outputs
target clock cycle
circuit
Pipeline performance problem:
PP(CP,cP,AP,RP)
Combinational performance problem:
PC(CC,cC,AC,RC)
37
Resynthesis of Pipelines
Problem Transformation:
Relax PP(CP,cP,AP,RP) to P P(CP,cp+,AP,RP)
where = largest possible single gate delay
Convert: pipeline P P to combinational problem P C
C
P
peripheral retime
CC
stagei
C
(i 1) c R
stagei
C
(i 1) c A
stagei
P
stagei
P
38
Perfomance Synthesis of Pipeline
Arrival and Required Times for P C :
In
Ii
I1
Cn
Ci
On
Oi
In
Peripheral
Retiming
Cn
On
(i-1)c+Ri
Ii
Ci
Oi
Combinational
circuit
(i-1)c+Ai
C1
O1
I1
C1
O1
Theoretical Contribution :
If P C(cp+) has a solution then retiming yields a solution to P P (cp+)
If there is a solution to PP (cp) then peripheral retiming yields a solution
to P C (cp+)
39