Power Dissipation in CMOS Circuits
Static CMOS circuits are power-efficient because they dissipate nearly zero
power while idle.
However, as transistor counts and clock frequencies have increased and
transistor dimensions have decreased, power consumption has skyrocketed
and is now a primary design constraint.
Power dissipation in CMOS circuits comes from:
Static dissipation due to
- subthreshold conduction through OFF transistors
- tunneling current through the gate oxide (both ON and OFF transistors)
- leakage through reverse-biased diodes
Dynamic dissipation due to
- charging and discharging of the load capacitance
- short-circuit current while both NMOS and PMOS are ON
- contention current in ratioed circuits
Advanced VLSI EEE 6405 Slide1
ABM HARUN-UR RASHID
Charging a Capacitor
When the gate output rises
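Energy stored in the capacitor is
$$E_C = \frac{1}{2} C_L V_{DD}^2$$
But energy drawn from the supply is
$$E_{VDD} = \int_0^{\infty} I(t)\,V_{DD}\,dt = \int_0^{\infty} C_L \frac{dV}{dt} V_{DD}\,dt = C_L V_{DD} \int_0^{V_{DD}} dV = C_L V_{DD}^2$$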
Half the energy drawn from VDD is dissipated as heat in the pMOS transistor;
the other half is stored in the capacitor
When the gate output falls
Energy in capacitor is dumped to GND
Dissipated as heat in the nMOS transistor
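The 50/50 split can be checked numerically. The sketch below is a minimal illustration that assumes the pMOS pull-up behaves like a fixed resistor `r_on` (a simplification; a real transistor is nonlinear) and integrates the charging current with small forward-Euler steps. The function name and component values are illustrative, not from the slides.

```python
import math

def charge_energies(c_load: float, vdd: float, r_on: float, steps: int = 200_000):
    """Charge c_load from 0 V to vdd through a fixed resistor r_on (a stand-in for
    the pMOS pull-up). Returns (energy drawn from the supply, energy dissipated
    in r_on, energy stored on the capacitor), all in joules."""
    dt = 20 * r_on * c_load / steps   # simulate 20 RC time constants
    v = 0.0
    e_supply = e_heat = 0.0
    for _ in range(steps):
        i = (vdd - v) / r_on          # current through the pull-up
        e_supply += vdd * i * dt      # integral of I(t) * VDD dt
        e_heat += i * i * r_on * dt   # integral of I^2 * R dt
        v += i * dt / c_load          # forward-Euler update of the output voltage
    return e_supply, e_heat, 0.5 * c_load * v * v

# Illustrative values: CL = 150 fF, VDD = 1.0 V, two different pull-up strengths.
for r_on in (1e3, 10e3):
    e_s, e_h, e_c = charge_energies(150e-15, 1.0, r_on)
    print(f"R = {r_on:.0e} ohm: E_VDD = {e_s:.3e} J, heat = {e_h:.3e} J, "
          f"stored = {e_c:.3e} J, equal split: {math.isclose(e_h, e_c, rel_tol=1e-3)}")
```

Changing `r_on` changes how fast the output rises but not the energy split, which is why the result holds regardless of pull-up strength.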
Switching Waveforms
Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz
Activity Factor
Suppose the system clock frequency = $f$
Let $f_{sw} = \alpha f$, where $\alpha$ = activity factor
If the signal is a clock, $\alpha = 1$
If the signal switches once per cycle, $\alpha = \frac{1}{2}$
Dynamic power:
$$P_{switching} = \alpha C V_{DD}^2 f$$
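A small sketch of these definitions, assuming a data node's logic value is sampled once per clock cycle (the helper names are hypothetical). Because α counts charging (0 → 1) events per cycle, a node that switches every cycle gives α = ½; the last line applies the switching-waveform example (VDD = 1.0 V, CL = 150 fF, f = 1 GHz) to a clock node, where α = 1.

```python
def activity_factor(samples: list[int]) -> float:
    """Estimate alpha for a data node sampled once per clock cycle.

    alpha counts charging (0 -> 1) events per cycle, so a node that switches
    once per cycle gives 1/2. (A clock has alpha = 1 by definition; it toggles
    twice per cycle and cannot be sampled this way.)
    """
    rises = sum(1 for prev, cur in zip(samples, samples[1:]) if prev == 0 and cur == 1)
    return rises / (len(samples) - 1)

def switching_power(alpha: float, c: float, vdd: float, f: float) -> float:
    """Dynamic power P_switching = alpha * C * VDD^2 * f, in watts."""
    return alpha * c * vdd ** 2 * f

once_per_cycle = [0, 1] * 8 + [0]   # 16 cycles, switches every cycle
print(activity_factor(once_per_cycle))          # 0.5
print(switching_power(1.0, 150e-15, 1.0, 1e9))  # about 1.5e-4 W (150 uW)
```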
Review: Energy & Power Equations
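In time period T, the total energy consumed is
$$E = C_L V_{DD}^2 + t_{sc} V_{DD} I_{peak} + V_{DD} I_{leakage} T$$
and the average power, with $f = 1/T$, is
$$P = C_L V_{DD}^2 f + \frac{t_{sc}}{T} V_{DD} I_{peak} + V_{DD} I_{leakage}$$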
Dynamic power (~90% today and decreasing relatively)
Short-circuit power (~8% today and decreasing absolutely)
Leakage power (~2% today and increasing)
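The two equations can be evaluated term by term. This sketch assumes one output transition per period T, as the equations do; all device values are hypothetical and only exercise the arithmetic, they are not meant to reproduce the percentages quoted above.

```python
def period_energy(c_load, vdd, t_sc, i_peak, i_leak, period):
    """Energy (J) in one period T, split into the three terms of E:
    C_L*VDD^2 + t_sc*VDD*I_peak + VDD*I_leakage*T (one transition per period assumed)."""
    return {
        "dynamic": c_load * vdd ** 2,
        "short_circuit": t_sc * vdd * i_peak,
        "leakage": vdd * i_leak * period,
    }

def average_power(c_load, vdd, t_sc, i_peak, i_leak, period):
    """P = E / T = C_L*VDD^2*f + (t_sc/T)*VDD*I_peak + VDD*I_leakage, with f = 1/T."""
    return sum(period_energy(c_load, vdd, t_sc, i_peak, i_leak, period).values()) / period

# Hypothetical values, for illustration only.
params = dict(c_load=150e-15, vdd=1.0, t_sc=20e-12, i_peak=100e-6, i_leak=1e-6, period=1e-9)
terms = period_energy(**params)
total = sum(terms.values())
for name, energy in terms.items():
    print(f"{name:13s} {energy:.2e} J ({energy / total:.1%})")
print(f"average power {average_power(**params):.3e} W")
```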
Power and Energy Design Space
Energy  | Design Time (constant throughput/latency)    | Non-active Modules (constant throughput/latency) | Run Time (variable throughput/latency)
Active  | Logic Design, Reduced Vdd, Sizing, Multi-Vdd | Clock Gating                                     | DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage | + Multi-VT                                   | Sleep Transistors, Multi-Vdd, Variable VT        | + Variable VT
Bus Multiplexing
Buses are a significant source of power dissipation due to
high switching activities and large capacitive loading
15% of total power in Alpha 21064
30% of total power in Intel 80386
Share long data buses with time multiplexing (S1 uses even
cycles, S2 odd)
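[Figure: sources S1 and S2 driving destinations D1 and D2 over two separate buses, versus one shared, time-multiplexed bus]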
But what if data samples are correlated (e.g., sign bits)?
Correlated Data Streams
[Figure: bit switching probabilities versus bit position, from LSB to MSB]
For a shared (multiplexed) bus, the advantages of data
correlation are lost (the bus carries samples from two
uncorrelated data streams)
Bus sharing should not be used for positively
correlated data streams
Bus sharing may prove advantageous for a
negatively correlated data stream (where successive
samples switch sign bits): interleaving a second stream
makes the switching more random, so the sign bits no
longer toggle on nearly every transfer
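The loss of correlation can be seen by counting bus-line toggles. This sketch assumes two hypothetical 8-bit two's-complement streams built from slowly varying sinusoids (positively correlated from sample to sample) and compares two dedicated buses against one time-multiplexed bus (stream 1 on even cycles, stream 2 on odd).

```python
import math

BITS = 8  # hypothetical bus width

def to_bus(value: int) -> int:
    """Encode a signed sample as a BITS-wide two's-complement bus word."""
    return value & ((1 << BITS) - 1)

def toggles(words: list[int]) -> int:
    """Total number of bus lines that change between consecutive words."""
    return sum(bin(prev ^ cur).count("1") for prev, cur in zip(words, words[1:]))

# Two slowly varying streams: successive samples of each are positively correlated.
N = 256
s1 = [to_bus(round(100 * math.sin(2 * math.pi * k / 64))) for k in range(N)]
s2 = [to_bus(round(100 * math.cos(2 * math.pi * k / 64))) for k in range(N)]

separate = toggles(s1) + toggles(s2)                    # one dedicated bus per stream
shared_bus = [w for pair in zip(s1, s2) for w in pair]  # S1 on even cycles, S2 on odd
shared = toggles(shared_bus)
print(f"separate buses: {separate} toggles, shared bus: {shared} toggles")
```

With these streams the shared bus toggles more lines in total, because consecutive words on it come from different streams and the sample-to-sample correlation no longer helps.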
Glitch Reduction by Pipelining
Glitches depend on the logic depth of the circuit: gates
deeper in the logic network are more prone to glitching because
- the arrival times of their inputs are more spread out due to delay
imbalances
- they are usually affected more by primary input switching
Reduce logic depth by adding pipeline registers
- but additional energy is used by the clock and pipeline registers
[Figure: pipelined datapath (PC, I$, Instruction Fetch, Decode, Execute, Memory with MAR, MDR and D$, WriteBack) with clocked pipeline-stage isolation registers between the stages]
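The depth dependence can be reproduced with an idealized unit-delay simulation (an assumption: every gate has exactly one time unit of delay and passes every input change). The hypothetical circuit is a ripple chain of XOR gates whose inputs all switch from 0 to 1 at the same instant.

```python
def xor_chain_transitions(depth: int) -> list[int]:
    """Unit-delay simulation of a ripple XOR chain y[i] = y[i-1] XOR x[i].

    All inputs switch from 0 to 1 at t = 0, starting from the settled
    all-zero state. Returns the number of output transitions of each stage
    (index 1..depth; index 0 is the input itself and is not counted).
    """
    x = [1] * (depth + 1)        # input values after t = 0
    y = [0] * (depth + 1)        # settled outputs for the old all-zero inputs
    y[0] = x[0]                  # stage 0 is driven directly by input x[0]
    counts = [0] * (depth + 1)
    for _ in range(depth + 1):   # enough unit-delay steps for the chain to settle
        nxt = y[:]
        for i in range(1, depth + 1):
            nxt[i] = y[i - 1] ^ x[i]   # each gate sees the previous step's values
            if nxt[i] != y[i]:
                counts[i] += 1
        y = nxt
    return counts

counts = xor_chain_transitions(8)
for stage in range(1, 9):
    print(f"stage {stage}: {counts[stage]} transitions")
```

In this model stage k makes k − 1 transitions although it needs at most one to reach its final value, so the number of glitches grows with logic depth. A pipeline register partway along the chain would roughly bound the depth any gate sees, at the cost of the clock and register energy noted above.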
Clock Gating
Most popular method for power reduction of clock signals
and functional units
Gate off the clock to idle functional
units
- e.g., floating-point units
Need logic to generate the disable signal
- increases complexity of control logic
- consumes power
- timing critical to avoid clock glitches at the OR gate output
Additional gate delay on the clock signal
- the gating OR gate can replace a buffer
in the clock distribution tree
[Figure: the clock and a disable signal feed an OR gate that clocks the register (Reg) in front of the functional unit]
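The timing hazard at the OR gate can be sketched with sampled waveforms (four samples per clock period; signal names and timings are hypothetical). The OR output can only rise on its own when `disable` rises while `clk` is low, so the check below flags any rising edge of the gated clock that does not coincide with a rising edge of `clk`.

```python
def rising_edges(wave: list[int]) -> list[int]:
    """Sample indices at which a 0/1 waveform goes from 0 to 1."""
    return [t for t in range(1, len(wave)) if wave[t - 1] == 0 and wave[t] == 1]

def or_gated_clock(clk: list[int], disable: list[int]) -> list[int]:
    """Gated clock = clk OR disable: held high (no rising edges) while disabled."""
    return [c | d for c, d in zip(clk, disable)]

def spurious_edges(clk: list[int], disable: list[int]) -> list[int]:
    """Rising edges of the gated clock that do not line up with a clk rising edge."""
    legit = set(rising_edges(clk))
    return [t for t in rising_edges(or_gated_clock(clk, disable)) if t not in legit]

clk = [0, 0, 1, 1] * 4              # four samples per period, low phase first
safe = [0] * 7 + [1] * 9            # disable rises at t = 7, while clk is high
unsafe = [0] * 5 + [1] * 11         # disable rises at t = 5, while clk is low
print(spurious_edges(clk, safe))    # []
print(spurious_edges(clk, unsafe))  # [5]
```

In this model, changing `disable` only while `clk` is high avoids the spurious edge.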
Clock Gating
The best way to reduce the activity is to turn off the clock
to registers in unused blocks
Saves clock activity ($\alpha = 1$)
Eliminates all switching activity in the block
Requires determining if block will be used
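The savings can be estimated with the switching-power equation. This sketch treats the clock load as a node with α = 1, assumes the block's logic stops switching when it is idle (its register inputs are held), and ignores the power of the gating logic and leakage; the capacitances and idle fraction are hypothetical.

```python
def block_power(c_clock: float, c_logic: float, alpha_logic: float, vdd: float,
                f: float, idle_fraction: float, gated: bool) -> float:
    """Average switching power (W) of a block: clock load (alpha = 1) plus logic.

    Assumptions: the logic stops switching while the block is idle, and ideal
    clock gating also stops the clock load; gating overhead and leakage are ignored.
    """
    p_clock = 1.0 * c_clock * vdd ** 2 * f
    p_logic = alpha_logic * c_logic * vdd ** 2 * f
    busy = 1.0 - idle_fraction
    if gated:
        return busy * (p_clock + p_logic)
    return p_clock + busy * p_logic   # an ungated clock keeps toggling while idle

# Hypothetical block: 10 pF clock load, 50 pF logic at alpha = 0.1, idle 60% of the time.
args = (10e-12, 50e-12, 0.1, 1.0, 1e9, 0.6)
print(f"ungated: {block_power(*args, gated=False) * 1e3:.1f} mW")
print(f"gated:   {block_power(*args, gated=True) * 1e3:.1f} mW")
```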
Clock Gating in a Pipelined Datapath
Gate the clock to idle units (e.g., floating-point units in the Execute stage, or the
WriteBack stage for instructions with no write-back operation)
[Figure: pipelined datapath (PC, I$, Instruction Fetch, Decode, Execute, Memory with MAR, MDR and D$, WriteBack) with the clock gated by "No FP" for the floating-point unit and by "No WB" for the WriteBack stage]
Voltage / Frequency
Run each block at the lowest possible voltage and
frequency that meets performance requirements
Voltage Domains
Provide separate supplies to different blocks
Level converters required when crossing
from low to high VDD domains
Level Converter
The standard method to handle voltage domain crossings is a level
converter. When A = 0, N1 is OFF and N2 is ON. N2 pulls Y down to
0, which turns on P1, pulling X up to VDDH and ensuring that P2
turns OFF. When A = 1, N1 is ON and N2 is OFF. N1 pulls X down to
0, which turns on P2, pulling Y up to VDDH. In either case, the level
converter behaves as a buffer and properly drives Y between 0 and
VDDH without risk of transistors remaining partially ON.
Unfortunately, the level converter costs delay (about 2 FO4) and
power at each domain crossing.
Voltage / Frequency
Many systems have time varying performance requirements. For
example, a video decoder requires more computation for rapidly moving
scenes than for static scenes. Such systems can save large amounts of
energy by reducing the clock frequency to the minimum sufficient to
complete the task on schedule, then reducing the supply voltage to the
minimum necessary to operate at that frequency. This is called dynamic
voltage scaling (DVS) or dynamic voltage/frequency scaling (DVFS).
The DVS controller takes information from the system about the
workload and/or the die temperature. It determines the supply voltage
and clock frequency sufficient to complete the workload on schedule or
to maximize performance without overheating.
A switching voltage regulator efficiently steps
down Vin from a high value to the necessary
VDD. The core logic contains a phase-locked
loop or other clock synthesizer to generate
the specified clock frequency. Dynamic
voltage scaling adjusts VDD and f according to
the workload.
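The DVS policy can be sketched as: compute the frequency needed to finish on schedule, pick the slowest operating point that meets it, and compare dynamic energy, which scales with VDD² per cycle. The voltage/frequency table, workload, and function names are hypothetical, and leakage and regulator losses are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    vdd: float

# Hypothetical voltage/frequency table, for illustration only.
TABLE = [
    OperatingPoint(200e6, 1.1),
    OperatingPoint(400e6, 1.3),
    OperatingPoint(700e6, 1.6),
]

def choose_point(cycles: float, deadline_s: float) -> OperatingPoint:
    """Pick the slowest (lowest-VDD) point that still finishes on schedule."""
    required_hz = cycles / deadline_s
    feasible = [p for p in TABLE if p.freq_hz >= required_hz]
    if not feasible:
        raise ValueError("deadline cannot be met at any operating point")
    return min(feasible, key=lambda p: p.freq_hz)

def task_energy(cycles: float, c_eff: float, vdd: float) -> float:
    """Dynamic energy of the task: cycles * C_eff * VDD^2 (C_eff = switched capacitance per cycle)."""
    return cycles * c_eff * vdd ** 2

cycles, deadline, c_eff = 1.5e8, 1.0, 1e-9   # hypothetical workload
p = choose_point(cycles, deadline)
saving = 1 - task_energy(cycles, c_eff, p.vdd) / task_energy(cycles, c_eff, TABLE[-1].vdd)
print(f"run at {p.freq_hz / 1e6:.0f} MHz, {p.vdd} V: {saving:.0%} less dynamic energy than at full speed")
```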
Review: Dynamic Power as a Function of VDD
Decreasing the VDD decreases dynamic
energy consumption (quadratically)
but increases gate delay (decreases
performance)
[Figure: normalized propagation delay tp versus VDD (V)]
Determine the critical path(s) at design time and use high
VDD for the transistors on those paths for speed. Use a
lower VDD on the other logic to reduce dynamic energy
consumption.
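The trade-off can be made concrete with a simple delay model. The sketch assumes the alpha-power law, tp ∝ VDD / (VDD − VT)^a (a commonly used model, not one given on the slide), with assumed values VT = 0.3 V and a = 1.3 (the velocity-saturation exponent, not the activity factor); energy per operation scales as VDD².

```python
def normalized_energy(vdd: float, vref: float = 1.0) -> float:
    """Dynamic energy per operation relative to vref (scales as VDD^2)."""
    return (vdd / vref) ** 2

def normalized_delay(vdd: float, vt: float = 0.3, a: float = 1.3, vref: float = 1.0) -> float:
    """Gate delay relative to vref from the alpha-power law tp ~ VDD / (VDD - VT)**a.

    vt and a (the velocity-saturation exponent) are assumed values.
    """
    if vdd <= vt:
        raise ValueError("the model is only used here for VDD > VT")
    def tp(v: float) -> float:
        return v / (v - vt) ** a
    return tp(vdd) / tp(vref)

for v in (1.0, 0.8, 0.6, 0.5):
    print(f"VDD = {v:.1f} V: energy x{normalized_energy(v):.2f}, delay x{normalized_delay(v):.2f}")
```

Lowering VDD only on non-critical logic, as suggested above, collects part of the quadratic energy saving without slowing the critical path.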
Dynamic Frequency and Voltage Scaling
Intel's SpeedStep
Hardware that steps down the clock frequency (dynamic frequency
scaling, DFS) when the user unplugs from AC power
- PLL from 650 MHz to 500 MHz
CPU stalls during SpeedStep adjustment
Transmeta LongRun
Hardware that applies both DFS and DVS (dynamic supply
voltage scaling)
- 32 levels of VDD from 1.1 V to 1.6 V
- PLL from 200 MHz to 700 MHz in increments of 33 MHz
Triggered when a CPU load change is detected by software
- heavier load → ramp up VDD; when stable, speed up the clock
- lighter load → slow down the clock; when the PLL locks onto the new rate,
ramp down VDD
CPU stalls only during PLL relock (< 20 microsec)
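The LongRun ordering keeps VDD high enough for the current clock frequency at every step. This sketch checks that property against a hypothetical minimum-VDD table (the 1.1 V and 1.6 V endpoints follow the range above; the table itself, its 400 MHz entry, and the function names are illustrative assumptions).

```python
# Hypothetical minimum VDD (volts) needed at each clock frequency (MHz).
VMIN = {200: 1.1, 400: 1.3, 700: 1.6}

def dvfs_steps(current: tuple[int, float], target_mhz: int) -> list[tuple[int, float]]:
    """(MHz, VDD) states visited when moving to target_mhz, LongRun-style."""
    f0, v0 = current
    v1 = VMIN[target_mhz]
    if target_mhz > f0:
        # heavier load: ramp up VDD first, speed up the clock once VDD is stable
        return [(f0, v0), (f0, v1), (target_mhz, v1)]
    if target_mhz < f0:
        # lighter load: slow the clock first, ramp down VDD after the PLL locks
        return [(f0, v0), (target_mhz, v0), (target_mhz, v1)]
    return [(f0, v0)]

def always_safe(states: list[tuple[int, float]]) -> bool:
    """True if every visited state supplies at least the minimum VDD for its frequency."""
    return all(v >= VMIN[f] for f, v in states)

print(always_safe(dvfs_steps((200, 1.1), 700)))           # True
print(always_safe(dvfs_steps((700, 1.6), 200)))           # True
print(always_safe([(200, 1.1), (700, 1.1), (700, 1.6)]))  # False: clock raised before VDD
```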
Speculated Power of a 15 mm µP
[Figure: power (W) versus temperature (°C), split into active and leakage power, for a 15 mm die in four technologies: 0.25 µm at 2 V, 0.18 µm at 1.4 V, 0.13 µm at 1 V, and 0.1 µm at 0.7 V. Over the plotted temperature range, the leakage share of total power grows from 0% to 3% (0.25 µm), 0% to 9% (0.18 µm), 1% to 26% (0.13 µm), and 6% to 56% (0.1 µm).]
Review: Leakage as a Function of Design Time VT
Reducing the VT
increases the subthreshold leakage
current (exponentially)
But, reducing VT
decreases gate delay
(increases performance)
Determine the critical path(s) at design time and use low
VT devices on the transistors on those paths for speed.
Use a high VT on the other logic for leakage control.
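Subthreshold current scales roughly as 10^(−VT/S), where S is the subthreshold swing in volts per decade. This sketch assumes S = 100 mV/decade and equal per-gate leakage at a given VT (hypothetical simplifications) to compare a design that uses low-VT devices on only a fraction of its gates with an all-low-VT design.

```python
def leakage_ratio(delta_vt: float, swing: float = 0.1) -> float:
    """Factor by which subthreshold leakage grows when VT is lowered by delta_vt (V).

    Assumes I_sub ~ 10**(-VT / S) with subthreshold swing S = swing V/decade.
    """
    return 10 ** (delta_vt / swing)

def dual_vt_leakage(low_vt_fraction: float, delta_vt: float, swing: float = 0.1) -> float:
    """Leakage relative to an all-high-VT design when only low_vt_fraction of the
    gates (e.g. those on critical paths) use the lower VT; equal per-gate leakage assumed."""
    return low_vt_fraction * leakage_ratio(delta_vt, swing) + (1.0 - low_vt_fraction)

print(leakage_ratio(0.1))         # 10.0: 100 mV lower VT -> 10x leakage
print(dual_vt_leakage(0.1, 0.1))  # low VT on 10% of the gates
print(dual_vt_leakage(1.0, 0.1))  # low VT everywhere
```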
Review: Variable VT (ABB) at Run Time
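$$V_T = V_{T0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$
where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$,
$V_{SB}$ is the source-bulk (substrate) voltage,
$\gamma$ is the body-effect coefficient, and $\phi_F$ is the Fermi potential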
For an n-channel device,
the substrate is normally
tied to ground
A negative body bias (positive VSB) causes VT
to increase from 0.45 V to
0.85 V
Adjusting the substrate
bias at run time is called
adaptive body-biasing
(ABB)
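Evaluating the body-effect equation shows the range quoted above. Apart from the slide's VT0 = 0.45 V, the parameter values are assumptions (γ = 0.4 V^½ and −2φF = 0.6 V, typical textbook-style numbers for an NMOS device).

```python
import math

def threshold_voltage(vsb: float, vt0: float = 0.45, gamma: float = 0.4,
                      two_phi_f: float = 0.6) -> float:
    """Body effect: VT = VT0 + gamma*(sqrt(|-2*phi_F + VSB|) - sqrt(|-2*phi_F|)).

    For an NMOS device phi_F is negative, so -2*phi_F = two_phi_f > 0.
    gamma (V**0.5) and two_phi_f (V) are assumed values.
    """
    return vt0 + gamma * (math.sqrt(abs(two_phi_f + vsb)) - math.sqrt(two_phi_f))

for vsb in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"VSB = {vsb:.1f} V -> VT = {threshold_voltage(vsb):.3f} V")
```

With these assumed values, VSB = 2.5 V gives VT ≈ 0.84 V, close to the 0.85 V on the slide.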
[Figure: VT (V) versus VSB (V)]
Multi-Threshold CMOS (MTCMOS) Sleep Transistor
Adds high-Vth sleep transistors between the
pull-up network and Vdd and between the
pull-down network and ground
The logic circuit uses low-Vth transistors for
speed
The sleep signal turns the sleep transistors off
when the logic circuit is not in use.
The additional sleep transistors increase
area and delay. Furthermore, the pull-up and
pull-down networks are left floating and thus
lose state during sleep mode.
Stacked transistor
The stack effect results in a substantial subthreshold
leakage current reduction when two or more stacked
transistors are turned off together.
[Figure: series OFF transistors demonstrating the stack effect]
The stack effect reduces subthreshold leakage by a
factor of about 10.
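The stack effect can be estimated by solving for the voltage of the node between two OFF transistors. The sketch uses a subthreshold model with drain-induced barrier lowering (DIBL), I ∝ 10^((VGS − VT + η·VDS)/S)·(1 − e^(−VDS/vT)), ignores body effect, and assumes illustrative values S = 100 mV/decade, η = 0.1, and VDD = 1.0 V. The intermediate node settles at a small positive voltage, which makes the top device's VGS negative and reduces its DIBL.

```python
import math

# Assumed, illustrative device parameters (not from the slides).
S = 0.1        # subthreshold swing, V/decade
ETA = 0.1      # DIBL coefficient
V_TH = 0.026   # thermal voltage kT/q near room temperature, V
VDD = 1.0

def i_sub(vgs: float, vds: float) -> float:
    """Subthreshold current normalized by I0 * 10**(-VT/S) (same VT for all devices):
    10**((VGS + ETA*VDS)/S) * (1 - exp(-VDS/V_TH)). Body effect is ignored."""
    return 10 ** ((vgs + ETA * vds) / S) * (1 - math.exp(-vds / V_TH))

def stacked_leakage(vdd: float = VDD) -> tuple[float, float]:
    """Two series OFF nMOS transistors with gates at 0 V. Bisect for the
    intermediate node voltage vx at which both devices carry equal current."""
    lo, hi = 0.0, vdd
    for _ in range(60):
        vx = (lo + hi) / 2
        top = i_sub(-vx, vdd - vx)   # source of the top device sits at vx
        bottom = i_sub(0.0, vx)
        if top > bottom:
            lo = vx                  # more current in than out: node charges higher
        else:
            hi = vx
    return vx, i_sub(0.0, vx)

single = i_sub(0.0, VDD)             # one OFF transistor sees the full VDD
vx, stacked = stacked_leakage()
print(f"Vx = {vx * 1000:.0f} mV, leakage reduction = {single / stacked:.1f}x")
```

With these assumed parameters the reduction comes out on the order of the factor of 10 stated above.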
Forced Stack
Transistor stacking exploits the stack effect,
which results in a substantial subthreshold
leakage current reduction when two or
more stacked transistors are turned off
together.
Stacking increases delay.
Sleepy stack
The sleepy stack technique divides each
existing transistor into two transistors,
each typically with the same width W1,
half the original single transistor's
width W0 (i.e., W1 = W0/2).
During active mode, all sleep transistors
are turned on.
High-Vth transistors can be used for the
sleep transistors and for the transistors
in parallel with the sleep transistors without
incurring a large delay.
Sleepy Keeper
A PMOS transistor is placed in parallel with
one sleep transistor and an NMOS transistor is
placed in parallel with the other (the two sleep
transistors are driven by Sleep and its complement).
During sleep mode, the sleep transistors are
turned OFF, and one of the transistors in
parallel with them keeps the connection to the
appropriate power rail to maintain a value of 1
in sleep mode, given that the 1 has already
been calculated.
Similarly, a value of 0 is maintained in
sleep mode, given that the 0 value has
already been calculated.