Lecture 1
Design Flow
[Link] Abdel-Majeed
Associate Professor
University of Jordan
Coping with Complexity
• How to design a System-on-Chip?
• Many millions (even billions!) of transistors
• Tens to hundreds of engineers
• Structured Design
• Design Partitioning
Structured Design
• Hierarchy: Divide and Conquer
• Recursively divide system into modules
• Regularity:
• Reuse modules wherever possible
• Ex: Standard cell library
• Modularity: well-formed interfaces
• Allows modules to be treated as black boxes
• Locality:
• Physical and temporal
Structural Decomposition of 4-b Adder
CMOS Digital Integrated Circuits – 4th Edition 4
• Easier to handle
Structural Hierarchy of 16-b Adder
Concepts of Regularity
• Regularity
• Decomposition into similar blocks
• Example: parallel multiplication array
Concepts of Modularity and Locality
• Modularity
• Functional blocks have well-defined functions and interfaces
• Each block can be designed independently and combined easily
• Design process parallelized
• Locality
• Ensures connections are mostly between neighboring modules
• Delay minimized by avoiding long interconnect
Design Partitioning
• Architecture: User’s perspective, what does it do?
• Instruction set, registers
• MIPS, x86, Alpha, PIC, ARM, …
• Microarchitecture
• Single cycle, multicycle, pipelined, superscalar?
• Logic: how are functional blocks constructed
• Ripple carry, carry lookahead, carry select adders
• Circuit: how are transistors used
• Complementary CMOS, pass transistors, domino
• Physical: chip layout
• Datapaths, memories, random logic
Flow of Circuit Design Procedures
More Simplified VLSI Design Flow
• Simplified design flow
• Verification plays an important role in every step
• Top-down and bottom-up approaches combined in the design process
Example 1.1 (1)
• Problem: Design of 1-bit full-adder circuit using __ nm, twin-well CMOS technology
• Specifications:
• Propagation delay of sum and carry_out < _____ __
• Transition delay of sum and carry_out < ____ __
• Circuit area < 10 µm²
• Dynamic power dissipation (@ VDD = 1.1 V and fmax = 500 MHz) < 20 µW
Example 1.1 (2)
• Boolean Description
• Boolean Functions:
• A , B = Two inputs
• C = Carry in
• sum_out = ABC + AB’C’ + A’B’C + A’BC’
• carry_out = AB + AC + BC
• Alternatively, sum_out = ABC + (A+B+C)(carry_out)’
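The equivalence of the two sum_out forms can be checked exhaustively over the eight input combinations. The following is a minimal sketch, not taken from the textbook; the module name `fulladder_bool_check` and the wire names are illustrative assumptions.

```verilog
// Exhaustive check of the Boolean functions of the 1-bit full adder.
// All names in this module are illustrative (not from the source).
module fulladder_bool_check;
  reg a, b, c;

  // carry_out = AB + AC + BC
  wire carry_out  = (a & b) | (a & c) | (b & c);
  // sum_out as a sum of four minterms
  wire sum_sop    = (a & b & c) | (a & ~b & ~c) | (~a & ~b & c) | (~a & b & ~c);
  // Alternative form that reuses carry_out
  wire sum_shared = (a & b & c) | ((a | b | c) & ~carry_out);

  integer i;
  integer errors;

  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i[2:0];
      #1; // let the continuous assignments settle
      // Both forms must match each other and the arithmetic sum a + b + c.
      if (sum_shared !== sum_sop ||
          {carry_out, sum_sop} !== (2'b00 + a + b + c)) begin
        errors = errors + 1;
        $display("MISMATCH a=%b b=%b c=%b", a, b, c);
      end
    end
    if (errors == 0) $display("all 8 input combinations agree");
    $finish;
  end
endmodule
```

The alternative form is attractive at the transistor level because the carry_out network is shared between both outputs.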
Example 1.1 (3)
• Logic circuit
Example 1.1 (4)
• Transistor-level circuit
• AND = Series-connected nMOS
• OR = Parallel-connected nMOS
• pMOS network = dual of nMOS network
Example 1.1 (6)
• Initial sizes
• nMOS, (W/L) = 90nm/50nm
• pMOS, (W/L) = 90nm/50nm
• May need to be changed depending on performance
[Plot: Minimum Size Full Adder. Traces Carry In, A, B, Carry Out, SUM (0 to 1.2 V) vs. Time [ns], 0 to 10 ns]
Figure: Simulated input and output waveforms of the full-adder circuit.
Example 1.1 (7)
• Timing constraint violation
• sum_out and carry_out violate timing constraints
• Worst-case delay 250 ps (> 220 ps)
• Modification necessary
[Plot: Worst-Case Delay, Minimum-Size Transistors. Traces INPUT, CARRY, SUM; Amplitude [V] vs. Time [ns], 4.6 to 6.0 ns]
Figure 1.11. Simulated output waveforms of the full adder circuit with minimum transistor dimensions, showing the signal propagation delay during one of the worst-case transitions.
Example 1.1 (8)
• Resizing transistors to improve design is an iterative process
• To meet timing specifications, (W/L) of (n/p)MOS is increased
[Plot: Worst-Case Delay, Optimum-Size Transistors. Traces INPUT, CARRY, SUM; Amplitude [V] vs. Time [ns], 4.6 to 6.0 ns]
Figure: Simulated output waveforms of the full-adder circuit with optimized transistor dimensions, showing the signal propagation delay during the same worst-case transition.
Example 1.1 (8)
• Layout Design
• Design rule checker (DRC) tool used to check violation of design rules
• Parasitic capacitances and resistances extracted
• Design Verification
• Extracted parasitics used to create SPICE input file
• Simulation is run
• Simulation Results
• Not all specifications met
Example 1.1 (9)
• New and compact layout for 1-bit full adder (optimized)
• Now, all the design specifications are satisfied
• Propagation and transition (rise/fall) delay within 220 ps
• Dynamic power dissipation = 4.9 µW (< 20 µW)
• Area = (2.04 µm × 3.01 µm) = 6.14 µm² (< 10 µm²)
[Layout figure: pins A, B, C, CO, SUM, VDD, GND; layer legend ACTIVE, PIMPLANT, NIMPLANT, POLY, VTG, METAL1, METAL2, NWELL, PWELL]
Figure 1.12. Layout of the full-adder circuit, with optimized transistor dimensions.
8-bit Binary Adder (1)
• Obtained by cascading 8 full adders – called “carry ripple adder”
• Speed limited by the delay of carry bits
[Figure: cascaded full-adder stages with inputs A0 B0 to A3 B3, carry-in C0, sum outputs S0 to S3, carry-out C4, and VDD/GND rails]
8-bit Binary Adder (2)
• Simulation results
• Sum bit of last adder stage is generated last
• Overall delay as long as 0.7 ns
[Plot: 8 Bit Full Adder. Input vectors A (8bit) = 00 64 9B 25 A5 FD 7F and B (8bit) = 00 55 64 B9 12 22 80; traces CarryIN, SUM(LSB), SUM(1) to SUM(7), SUM(MSB); detail panels "Carry IN of Last Adder Stage", "SUM(7): Sum of Last Adder Stage", "SUM(MSB): Carry OUT of Last Adder Stage" (0 to 1.2 V) vs. Time [ns], 0 to 14 ns]
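The cascade of full adders can be sketched structurally in Verilog, with each stage taking its carry from the previous one. This is a minimal sketch under stated assumptions: the module names `fa`, `ripple_adder`, and `ripple_adder_tb` are illustrative, the carry-in is held at 0 (the slide does not state its value), and the testbench reuses the A and B hex vectors listed with the simulation plot. Results are checked against Verilog's built-in `+`, not against values from the slides.

```verilog
// 1-bit full adder using the Boolean functions of Example 1.1.
module fa(input a, b, c, output s, cout);
  assign s    = a ^ b ^ c;
  assign cout = (a & b) | (a & c) | (b & c);
endmodule

// N-bit carry ripple adder: stage i receives its carry from stage i-1,
// so the carry chain sets the worst-case delay.
module ripple_adder #(parameter N = 8)
                    (input  [N-1:0] a, b,
                     input          cin,
                     output [N-1:0] s,
                     output         cout);
  wire [N:0] carry;
  assign carry[0] = cin;
  assign cout     = carry[N];

  genvar i;
  generate
    for (i = 0; i < N; i = i + 1) begin : stage
      fa u_fa(a[i], b[i], carry[i], s[i], carry[i+1]);
    end
  endgenerate
endmodule

// Self-checking testbench (illustrative).
module ripple_adder_tb;
  reg  [7:0] a, b;
  reg        cin;
  wire [7:0] s;
  wire       cout;

  ripple_adder #(.N(8)) dut(a, b, cin, s, cout);

  // Apply one vector pair and compare with the built-in addition.
  task check(input [7:0] x, input [7:0] y);
    begin
      a = x; b = y;
      #1;
      if ({cout, s} !== x + y + cin)
        $display("MISMATCH: %h + %h gave cout=%b s=%h", x, y, cout, s);
    end
  endtask

  initial begin
    cin = 1'b0; // ASSUMPTION: carry-in held low
    check(8'h00, 8'h00);
    check(8'h64, 8'h55);
    check(8'h9B, 8'h64);
    check(8'h25, 8'hB9);
    check(8'hA5, 8'h12);
    check(8'hFD, 8'h22);
    check(8'h7F, 8'h80);
    $display("ripple adder check finished");
    $finish;
  end
endmodule
```

This functional model has no gate delays, so it confirms only the logic; the 0.7 ns figure comes from circuit-level simulation of the extracted layout.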
Y-Chart
VLSI Design Styles
• Field Programmable Gate Array (FPGA)
• Consists of
• I/O buffers
• Array of configurable logic blocks (CLBs)
• Programmable interconnect structure
• Contains thousands of logic gates
• Routing between CLBs and I/O blocks done by setting the configurable switch matrices
• Proper choice of design style is essential to delivering the product in time with low cost
• Full-custom
• Semi-custom
Field Programmable Gate Array (1)
Verilog Example
module fulladder(input a, b, c,
                 output s, cout);
  sum   s1(a, b, c, s);
  carry c1(a, b, c, cout);
endmodule

module carry(input a, b, c,
             output cout);
  assign cout = (a&b) | (a&c) | (b&c);
endmodule
[Block diagram: fulladder with inputs a, b, c and outputs cout, s, containing sub-blocks sum and carry]
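The fulladder example instantiates a `sum` module that the slide does not show. A minimal sketch follows, assuming `sum` computes the three-input XOR implied by the Boolean functions of Example 1.1; the body of `sum` and the testbench `fulladder_tb` are assumptions, not part of the original slides.

```verilog
// Hypothetical body for the `sum` module instantiated by fulladder.
module sum(input a, b, c,
           output s);
  assign s = a ^ b ^ c;
endmodule

// carry as given on the slide.
module carry(input a, b, c,
             output cout);
  assign cout = (a&b) | (a&c) | (b&c);
endmodule

// Structural full adder as given on the slide.
module fulladder(input a, b, c,
                 output s, cout);
  sum   s1(a, b, c, s);
  carry c1(a, b, c, cout);
endmodule

// Exhaustive self-checking testbench (illustrative).
module fulladder_tb;
  reg a, b, c;
  wire s, cout;
  integer i;

  fulladder dut(a, b, c, s, cout);

  initial begin
    for (i = 0; i < 8; i = i + 1) begin
      {a, b, c} = i[2:0];
      #1; // allow outputs to settle
      // {cout, s} must equal the 2-bit arithmetic sum of the inputs.
      if ({cout, s} !== (2'b00 + a + b + c))
        $display("MISMATCH a=%b b=%b c=%b -> cout=%b s=%b", a, b, c, cout, s);
    end
    $display("fulladder check finished");
    $finish;
  end
endmodule
```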
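Gate-level Netlist (Synthesis)
module carry(input a, b, c,
             output cout);
  wire x, y, z;
  and g1(x, a, b);
  and g2(y, a, c);
  and g3(z, b, c);
  or  g4(cout, x, y, z);
endmodule
[Schematic: AND gates g1 (inputs a, b; output x), g2 (inputs a, c; output y), g3 (inputs b, c; output z) feeding OR gate g4 (output cout)]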
Place and route
[Figure: gates g1 to g4 of the carry netlist with inputs a, b, c, internal nets x, y, z, and output cout]
Circuit Design
• How should logic be implemented?
• NANDs and NORs vs. ANDs and ORs?
• Fan-in and fan-out?
• How wide should transistors be?
• These choices affect speed, area, power
• Logic synthesis makes these choices for you
• Good enough for many applications
• Hand-crafted circuits are still better
Standard-Cell Based Design (1)
• One of the most prevalent full custom design styles
• Commonly used logic cells are optimized and developed
• Several versions are stored in a standard cell library
• Each cell is characterized by
• Delay time vs. load capacitance
• Circuit simulation model
• Timing simulation model
• Fault simulation model
• Cell data for place-and-route
• Mask data
Standard-Cell Based Design (2)
• Each cell layout is designed with fixed height so that
• Cells can be placed side-by-side
• Routing of intercell connection is easy
• Floorplan for a standard-cell based design contains I/O frame, cell rows, and channels between rows
• Channels may be reduced or removed if over-the-cell routing is done
Standard-Cell Based Design (3)
• Common bus may be incorporated if cells must share same input and/or output signals
Structured ASIC (____________________)
FPGA vs. Standard Cell ASIC
FPGA | Standard Cell ASIC
Easy to Design | Difficult to Design
Short Development Time | Long Development Time
Low NRE Costs | High NRE Costs
Design Size Limited | Support Large Designs
Design Complexity Limited | Support Complex Designs
Performance Limited | High Performance
High Power Consumption | Low Power Consumption
High Per-Unit Cost | Low Per-Unit Cost (at high volume)
Structured ASICs Combine the Best of Both Worlds
• Generally speaking
• 100:1 ratio between the number of gates in a given area for _______,
_______
• ___:___ ratio for performance (based on clock frequency)
• ___:___ ratio for power
Full Custom Design (1)
• Design is done from scratch
• Geometry, orientation and placement of every transistor done by designer
• Development cost and time very high
• “Design Reuse” becoming popular to reduce cost and time
• Example of a true full custom design – design of memory cell (static or dynamic)
Full Custom Design (2)
• Full custom design rarely used due to high labor cost
• Rather, a combination of different design styles is used to develop a chip
Design Quality
• Important metrics for measuring the quality of design
• Testability
• Yield and manufacturability
• Reliability
• Technology updateability
Testability
• Fabricated chips should be fully testable which requires
• Generation of good test vectors
• Availability of reliable test fixture at speed
• Design of testable chip
Yield and Manufacturability
• Yield may be defined in two ways
• (1) No. of good tested chips divided by the total no. of tested chips
• (2) No. of good tested chips divided by the total no. of chip sites available at the start of wafer processing – strictest definition
• Chip yield can be further divided into
• Functional yield – obtained by testing the functionality of the chip at a speed lower than required
• Weeds out problems of short, open and leakage
• Can detect logic and circuit design faults
• Parametric yield – performed at the required speed on chips that passed functional test
• Delay testing done in this phase
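The two yield definitions can be written as ratios. The symbols below are introduced here for illustration only: N_good is the number of good tested chips, N_tested the number of tested chips, and N_sites the number of chip sites at the start of wafer processing.

```latex
Y_1 = \frac{N_{\text{good}}}{N_{\text{tested}}}, \qquad
Y_2 = \frac{N_{\text{good}}}{N_{\text{sites}}}, \qquad
N_{\text{tested}} \le N_{\text{sites}} \;\Rightarrow\; Y_2 \le Y_1
```

With hypothetical counts (not from the source) of 500 chip sites, 450 tested chips, and 360 good chips, Y_1 = 360/450 = 0.80 and Y_2 = 360/500 = 0.72, which shows why the second definition is the stricter one.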
Reliability
• Reliability depends on design and process conditions
• Major causes of chip reliability problem are
• Electrostatic discharge (ESD) and electrical overstress (EOS) and electromigration
• Latch-up in CMOS I/O and internal circuits
• Hot carrier induced aging
• Oxide breakdown and single event upset
• Power and ground bouncing
• On-chip noise and crosstalk
• Measures taken to ensure reliability
• Metal wire widened to avoid over-etching
• Rise time of signals applied to nMOS gate reduced to avoid aging
Technology Updateability
• Process technology advancing at a high pace
• Design styles should be chosen such that chips are technology updateable
• “Silicon Compilation” – where physical layout is done automatically – is used
References
Backup
MIPS Architecture
• Example: subset of MIPS processor architecture
• Drawn from Patterson & Hennessy
• MIPS is a 32-bit architecture with 32 registers
• Consider 8-bit subset using 8-bit datapath
• Only implement 8 registers ($0 - $7)
• $0 hardwired to 00000000
• 8-bit program counter
• You’ll build this processor in the labs
• Illustrate the key concepts in VLSI design
Instruction Set
Instruction Encoding
• 32-bit instruction encoding
• Requires four cycles to fetch on 8-bit datapath
format | example | encoding (field widths in bits)
R | add $rd, $ra, $rb | 0 (6) | ra (5) | rb (5) | rd (5) | 0 (5) | funct (6)
I | beq $ra, $rb, imm | op (6) | ra (5) | rb (5) | imm (16)
J | j dest | op (6) | dest (26)
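Because the instruction is 32 bits wide and the datapath is 8 bits wide, the instruction register is filled one byte per cycle and then split into fields. The following is a minimal sketch of that idea, not the lab design. Assumptions: the module name `instr_reg_decode` is illustrative, `irwrite[k]` loads byte k (the slides do not state the byte order), and a single clock is used in place of the two-phase clocks of the actual processor.

```verilog
// Collects a 32-bit instruction over four byte-wide fetches, then exposes
// the R/I/J fields from the encoding table.
// ASSUMPTION: irwrite[k] loads instr[8k+7:8k]; byte order is not given
// in the source. ASSUMPTION: single-phase clock.
module instr_reg_decode(input         clk,
                        input  [3:0]  irwrite,
                        input  [7:0]  memdata,
                        output [5:0]  op,
                        output [4:0]  ra, rb, rd,
                        output [5:0]  funct,
                        output [15:0] imm,
                        output [25:0] dest);
  reg [31:0] instr;

  // One byte of the instruction is captured per fetch cycle.
  always @(posedge clk) begin
    if (irwrite[0]) instr[7:0]   <= memdata;
    if (irwrite[1]) instr[15:8]  <= memdata;
    if (irwrite[2]) instr[23:16] <= memdata;
    if (irwrite[3]) instr[31:24] <= memdata;
  end

  // Field positions follow the widths 6/5/5/5/5/6, 6/5/5/16, and 6/26.
  assign op    = instr[31:26];
  assign ra    = instr[25:21];
  assign rb    = instr[20:16];
  assign rd    = instr[15:11];
  assign funct = instr[5:0];
  assign imm   = instr[15:0];
  assign dest  = instr[25:0];
endmodule
```

All three formats share the 6-bit op field in the top bits, so the controller can decode the opcode before it knows which of the remaining fields are meaningful.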
MIPS Microarchitecture
• Multicycle microarchitecture ([Paterson04], [Harris07])
Multicycle Controller
[FSM diagram, listed by state with the control signals asserted in each]
State 0 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite3, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 1 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite2, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 2 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite1, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 3 (Instruction fetch): MemRead, ALUSrcA = 0, IorD = 0, IRWrite0, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
State 4 (Instruction decode/register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
State 5 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
State 6 (Memory access): MemRead, IorD = 1
State 7 (Write-back step): RegDst = 0, RegWrite, MemtoReg = 1
State 8 (Memory access): MemWrite, IorD = 1
State 9 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
State 10 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0
State 11 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
State 12 (Jump completion): PCWrite, PCSource = 10
Transitions: Reset enters state 0. From state 4: (Op = 'LB') or (Op = 'SB') to state 5, (Op = R-type) to state 9, (Op = 'BEQ') to state 11, (Op = 'J') to state 12. From state 5: (Op = 'LB') to state 6, (Op = 'SB') to state 8.
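The state sequencing of the controller can be sketched as a next-state function. This is a hedged sketch rather than the lab implementation. Assumptions: the module name `controller_fsm` is illustrative; the fetch states run 0 through 4 in order, state 6 leads to state 7, state 9 leads to state 10, and every completing state returns to state 0, as in the usual Patterson and Hennessy multicycle design (these unlabeled arrows are not recoverable from the slide text); a single clock with synchronous reset stands in for the two-phase clocking. Opcode values are the standard MIPS encodings.

```verilog
// Next-state logic for the 13-state multicycle controller (sketch).
// ASSUMPTION: unlabeled transitions follow the standard multicycle FSM.
// ASSUMPTION: single clock, synchronous reset.
module controller_fsm(input            clk, reset,
                      input      [5:0] op,
                      output reg [3:0] state);
  // Standard MIPS opcodes for the supported instructions.
  localparam LB    = 6'b100000,
             SB    = 6'b101000,
             RTYPE = 6'b000000,
             BEQ   = 6'b000100,
             J     = 6'b000010;

  reg [3:0] nextstate;

  // State register.
  always @(posedge clk)
    if (reset) state <= 4'd0;
    else       state <= nextstate;

  // Next-state function.
  always @(*)
    case (state)
      // Four byte fetches, then decode.
      4'd0, 4'd1, 4'd2, 4'd3: nextstate = state + 4'd1;
      // Decode: dispatch on the opcode.
      4'd4: case (op)
              LB, SB:  nextstate = 4'd5;
              RTYPE:   nextstate = 4'd9;
              BEQ:     nextstate = 4'd11;
              J:       nextstate = 4'd12;
              default: nextstate = 4'd0; // unsupported opcode: refetch (assumption)
            endcase
      // Address computed: load reads memory, store writes it.
      4'd5: nextstate = (op == LB) ? 4'd6 : 4'd8;
      4'd6: nextstate = 4'd7;   // memory read, then write-back
      4'd9: nextstate = 4'd10;  // execution, then R-type completion
      // States 7, 8, 10, 11, 12 complete the instruction.
      default: nextstate = 4'd0;
    endcase
endmodule
```

The control outputs listed per state would be produced by a second combinational block keyed on `state`; it is omitted here to keep the sketch short.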
Logic Design
• Start at top level
• Hierarchically decompose MIPS into units
• Top-level interface
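[Block diagram: a crystal oscillator drives a 2-phase clock generator, which supplies ph1 and ph2 to the MIPS processor; reset is an input to the processor; the processor connects to external memory through memread, memwrite, adr (8 bits), writedata (8 bits), and memdata (8 bits)]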
Block Diagram
Hierarchical Design
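[Hierarchy diagram, listed by level from top to bottom]
Level 1: mips
Level 2: controller, alucontrol, datapath
Level 3: standard cell library, bitslice, zipper
Level 4: alu, inv4x, flop, ramslice
Level 5: fulladder, or2, and2, mux4
Level 6: nor2, inv, nand2, mux2
Level 7: tri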