Chapter 4: Circuit-Level Optimization at Design Time

Optimizing Power @ Design Time
Circuits
Jan M. Rabaey
Dejan Marković
Borivoje Nikolić
Low Power Design Essentials ©2008
Chapter 4
Chapter Outline
• Optimization framework for energy-delay trade-off
• Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
• Static power optimization
– Multiple thresholds
– Transistor stacking
4.2
Energy/Power Optimization Strategy
• For given function and activity, an optimal operation point can be derived in the energy-performance space
• Time of optimization depends upon activity profile
• Different optimizations apply to active and static power
[Table: rows Active and Static; columns Fixed Activity (design time), Variable Activity (run time), No Activity - Standby (sleep).]
4.3
Energy-Delay Optimization and Trade-off
[Figure: Energy/op versus Delay. The trade-off space spans Dmin to Dmax and Emin to Emax; an unoptimized design is marked.]
Maximize throughput for given energy or
Minimize energy for given throughput
Other important metrics: Area, Reliability, Reusability
4.4
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their
abstraction layer
[Figure: the design abstraction stack, with example choices at each layer]
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit: sizing, supply, thresholds
Device: bulk versus SOI
This Chapter: Circuits
4.5
Optimization Can/Must Span Multiple Levels
Architecture
Micro-Architecture
Circuit (Logic & FFs)
Design optimization combines top-down and bottom-up:
“meet-in-the-middle”
4.6
Energy-Delay Optimization
[Figure: Energy/op versus Delay curves for topology A and topology B, and the envelope of the two.]
Globally optimal energy-delay curve for a given function
4.7
Some Optimization Observations
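Energy-Delay Sensitivities
$$S_A = \left.\frac{\partial E / \partial A}{\partial D / \partial A}\right|_{A=A_0}$$
[Figure: Energy versus Delay. The curves f(A, B0) and f(A0, B) pass through the design point (A0, B0) at delay D0; SA and SB are their slopes at that point.]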
[Ref: V. Stojanovic, ESSCIRC’02]
4.8
Finding the Optimal Energy-Delay Curve
Pareto-optimal:
the best that can be achieved without disadvantaging at least one metric.
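$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$
[Figure: Energy versus Delay, with curves f(A, B0), f(A0, B) and f(A1, B) around the design point (A0, B0) at delay D0; ∆D marks the delay step.]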
On the optimal curve, all sensitivities must be equal
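A minimal numeric sketch of the equal-sensitivity argument, assuming two tuning variables A and B whose sensitivities are negative numbers (energy falls as delay grows). The function name and the values are illustrative and not taken from the slides.

```python
def energy_change(s_a, s_b, delta_d):
    """Energy change at constant total delay.

    Variable A is used to speed the design up by delta_d, variable B to slow
    it down by the same amount: dE = S_A * (-delta_d) + S_B * (+delta_d).
    """
    return s_a * (-delta_d) + s_b * delta_d


# Unequal sensitivities: trading delay between A and B still saves energy.
print(energy_change(-2.0, -5.0, 0.1))   # negative, so energy drops

# Equal sensitivities: no exchange helps, the point is on the optimal curve.
print(energy_change(-3.0, -3.0, 0.1))   # zero
```

As long as the result can be made negative by choosing which variable speeds up and which slows down, the design is not yet on the optimal curve.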
4.9
Reducing Active Energy @ Design Time
$$E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$$
$$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f$$
• Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock speed
– Lowering the logic swing (Vswing)
• Reducing transistor sizes (CL)
– Slows down logic
• Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
4.10
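The active energy and power proportionalities from the slide on reducing active energy can be evaluated directly. The sketch below treats them as equalities and uses made-up values (activity 0.1, 1 pF, 500 MHz); the clock frequency is held fixed only to isolate the effect of the supply.

```python
def active_energy(alpha, c_load, v_swing, v_dd):
    """Energy per cycle in joules: alpha * C_L * V_swing * V_DD."""
    return alpha * c_load * v_swing * v_dd


def active_power(alpha, c_load, v_swing, v_dd, freq):
    """Power in watts: energy per cycle times clock frequency."""
    return active_energy(alpha, c_load, v_swing, v_dd) * freq


# Hypothetical block with full-swing logic (V_swing = V_DD).
nominal = active_power(0.1, 1e-12, 1.2, 1.2, 500e6)
scaled = active_power(0.1, 1e-12, 0.9, 0.9, 500e6)
print(scaled / nominal)   # about (0.9 / 1.2) ** 2 = 0.5625
```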
Observation
• Downsizing and/or lowering the supply on the critical path lowers the operating frequency
• Downsizing non-critical paths reduces energy for free, but
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness
[Figure: two histograms of # of paths versus tp (path), each with the target delay marked.]
4.11
Circuit Optimization Framework
minimize Energy (VDD, VTH, W)
subject to Delay (VDD, VTH, W) ≤ Dcon
Constraints
VDDmin < VDD < VDDmax
VTHmin < VTH < VTHmax
Wmin < W
• Reference case
– Dmin sizing @ VDDmax, VTHref
[Figure: Energy/op versus Delay for topology A and topology B.]
[Ref: V. Stojanovic, ESSCIRC’02]
4.12
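The circuit optimization framework (minimize energy subject to a delay constraint and parameter bounds) can be illustrated with a brute-force search. Everything below is a toy: the delay and energy models, the constants, the bounds and the grid steps are assumptions chosen for illustration, not the models used in the source.

```python
import itertools

ALPHA = 1.5  # hypothetical velocity-saturation exponent


def delay(vdd, vth, w, load=8.0):
    """Toy delay model: alpha-power drive, load shared with the gate width."""
    return vdd / (vdd - vth) ** ALPHA * (1.0 + load / w)


def energy(vdd, vth, w, load=8.0):
    """Toy energy model: switching term plus an exponential leakage term."""
    switching = (w + load) * vdd ** 2
    leakage = 50.0 * w * vdd * 10.0 ** (-vth / 0.1)
    return switching + leakage


def grid(lo, hi, step):
    """Inclusive list of values from lo to hi."""
    count = int(round((hi - lo) / step))
    return [lo + i * step for i in range(count + 1)]


def minimize_energy(d_con):
    """Return (energy, vdd, vth, w) of the cheapest grid point meeting d_con."""
    best = None
    points = itertools.product(grid(0.6, 1.2, 0.05),   # VDDmin..VDDmax
                               grid(0.2, 0.4, 0.02),   # VTHmin..VTHmax
                               grid(1.0, 16.0, 0.5))   # Wmin..largest width tried
    for vdd, vth, w in points:
        if delay(vdd, vth, w) <= d_con:
            e = energy(vdd, vth, w)
            if best is None or e < best[0]:
                best = (e, vdd, vth, w)
    return best


# Reference point: largest width at VDDmax and a reference threshold of 0.3 V.
d_ref = delay(1.2, 0.3, 16.0)
print(minimize_energy(1.1 * d_ref))   # allow a 10% delay increase
```

A gradient-based solver would replace the grid in practice; the exhaustive search is used here only because it makes the constraint handling explicit.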
Optimization Framework: Generic Network
[Figure: stage i (input capacitance Ci, self-loading γCi, supply VDD,i) drives the wire capacitance Cw and the input capacitance Ci+1 of stage i+1 (supply VDD,i+1).]
Gate in stage i loaded by fanout (stage i+1)
4.13
Alpha-power based Delay Model
$$t_p = \frac{K_d V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \frac{\gamma C_i + C_w + C_{i+1}}{\gamma C_i} = \tau_{nom}\left(1 + \frac{C_w + C_{i+1}}{\gamma C_i}\right)$$
Fit parameters: Von, αd, Kd, γ
[Figure: normalized FO4 delay versus VDD / VDDref (0.5 to 1), simulation and model; Von = 0.37 V, αd = 1.53.]
[Figure: delay tp (ps) versus fanout (Ci+1/Ci, 0 to 10), simulation and model; τnom = 6 ps, γ = 1.35.]
VDDref = 1.2V, technology 90 nm
4.14
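The alpha-power delay model can be evaluated with the fit values quoted for the 90 nm technology. This is a sketch: it assumes τnom is the value at VDDref and scales it with the supply-dependent factor of the model, and the function names are made up.

```python
V_ON = 0.37      # V, fit value quoted on the slide
ALPHA_D = 1.53   # fit value quoted on the slide
TAU_NOM = 6.0    # ps, assumed to hold at VDD_REF
GAMMA = 1.35     # fit value quoted on the slide
VDD_REF = 1.2    # V


def voltage_factor(vdd):
    """Delay at vdd relative to VDD_REF: ratio of V / (V - Von)^alpha_d."""
    def shape(v):
        return v / (v - V_ON) ** ALPHA_D
    return shape(vdd) / shape(VDD_REF)


def stage_delay_ps(fanout, vdd=VDD_REF):
    """tp = tau_nom * (1 + fanout / gamma), with fanout = (Cw + C_next) / C_i."""
    return TAU_NOM * voltage_factor(vdd) * (1.0 + fanout / GAMMA)


print(stage_delay_ps(4.0))    # fanout-of-4 delay at the reference supply
print(voltage_factor(0.6))    # slowdown at half the reference supply
```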
Combined with Logical Effort Formulation
For Complex Gates
$$t_p = \tau_{nom}\left(p_i + \frac{f_i g_i}{\gamma}\right)$$
• Parasitic delay pi – depends upon gate topology
• Electrical effort fi ≈ Si+1/Si
• Logical effort gi – depends upon gate topology
• Effective fanout hi = figi
[Ref: I. Sutherland, Morgan-Kaufmann’99]
4.15
Dynamic Energy
$$E_{dyn} = (\gamma C_i + C_w + C_{i+1}) V_{DD,i}^2 = C_i (\gamma + f_i) V_{DD,i}^2$$
$$f_i = (C_w + C_{i+1}) / C_i \approx S_{i+1} / S_i, \qquad C_i = K_e S_i$$
$$E_i = K_e S_i \left(V_{DD,i-1}^2 + \gamma V_{DD,i}^2\right)$$
= energy consumed by logic gate i
[Figure: stage i (Ci, γCi, supply VDD,i) driving Cw and Ci+1 of stage i+1 (supply VDD,i+1).]
4.16
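The per-gate dynamic energy expression can be summed along a path. The sketch below assumes Ke, γ and the sizes are given; the helper names and the supply feeding the first gate (`vdd_in`) are illustrative. The energy of the final load is not included.

```python
def gate_energy(k_e, size, vdd_prev, vdd_own, gamma):
    """E_i = K_e * S_i * (VDD_{i-1}^2 + gamma * VDD_i^2).

    The input capacitance of gate i is charged from the previous stage's
    supply, its self-loading from its own supply.
    """
    return k_e * size * (vdd_prev ** 2 + gamma * vdd_own ** 2)


def path_energy(k_e, sizes, supplies, gamma, vdd_in):
    """Sum of E_i over a path; supplies[i] is the supply of gate i."""
    total = 0.0
    previous = vdd_in
    for size, vdd in zip(sizes, supplies):
        total += gate_energy(k_e, size, previous, vdd, gamma)
        previous = vdd
    return total


# Three gates of sizes 1, 4, 16 on a single 1.0 V supply, gamma = 1.35.
print(path_energy(1.0, [1.0, 4.0, 16.0], [1.0, 1.0, 1.0], 1.35, 1.0))
```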
Optimizing Return on Investment (ROI)
Depends on Sensitivity (∂E/∂D)
• Gate Sizing
$$-\frac{\partial E / \partial S_i}{\partial D / \partial S_i} = \frac{E_i}{\tau_{nom}(h_i - h_{i-1})}$$
∞ for equal h (Dmin)
• Supply Voltage
$$-\frac{\partial E / \partial V_{DD}}{\partial D / \partial V_{DD}} = \frac{E}{D} \cdot \frac{2\,(1 - V_{on}/V_{DD})}{\alpha_d - 1 + V_{on}/V_{DD}}$$
max at VDD(max) (Dmin)
4.17
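The supply-voltage sensitivity can be checked numerically. The sketch assumes E ∝ VDD² and D ∝ VDD/(VDD − Von)^αd (constant factors cancel in the check) and compares the closed form with a central finite difference; Von and αd are the fit values quoted for the delay model.

```python
V_ON = 0.37
ALPHA_D = 1.53


def delay(vdd):
    """Alpha-power delay, arbitrary units."""
    return vdd / (vdd - V_ON) ** ALPHA_D


def energy(vdd):
    """Switching energy, arbitrary units."""
    return vdd ** 2


def vdd_sensitivity(vdd):
    """Closed form: (E / D) * 2 (1 - x) / (alpha_d - 1 + x), with x = Von / VDD."""
    x = V_ON / vdd
    return energy(vdd) / delay(vdd) * 2.0 * (1.0 - x) / (ALPHA_D - 1.0 + x)


def numeric_sensitivity(vdd, step=1e-6):
    """-(dE/dVDD) / (dD/dVDD) from central differences."""
    d_energy = energy(vdd + step) - energy(vdd - step)
    d_delay = delay(vdd + step) - delay(vdd - step)
    return -d_energy / d_delay


for v in (0.7, 1.0, 1.2):
    print(v, vdd_sensitivity(v), numeric_sensitivity(v))
```

The values grow with VDD, consistent with the sensitivity being largest at the maximum supply.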
Example: Inverter Chain
• Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN driving the load CL.]
• Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay tradeoff
4.18
Inverter Chain: Gate Sizing
[Figure: effective fanout h (0 to 25) versus stage (1 to 7) for delay increments dinc = 0%, 1%, 10%, 30%, 50%.]
$$S_i = \sqrt{\frac{S_{i-1} S_{i+1}}{1 + \mu S_{i-1}}}, \qquad \mu = \frac{2 K_e V_{DD}^2}{\tau_{nom} F_S}, \qquad F_S \propto \frac{E_i}{h_i - h_{i-1}}$$
[Ref: Ma, JSSC’94]
• Variable taper achieves minimum energy
• Reduce number of stages at large dinc
4.19
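The sizing recursion for the inverter chain can be solved by simple relaxation. This is a sketch under stated assumptions: the load is expressed as an equivalent gate size, μ is treated as a given non-negative constant rather than derived from the sensitivity, and the function names are made up. With μ = 0 the recursion reduces to geometric sizing.

```python
def size_chain(n_stages, load, mu, sweeps=500):
    """Relax S_i = sqrt(S_{i-1} * S_{i+1} / (1 + mu * S_{i-1})).

    sizes[0] = 1 is the first inverter and sizes[n_stages] = load; both stay
    fixed while the sizes in between are updated in place.
    """
    ratio = load ** (1.0 / n_stages)
    sizes = [ratio ** i for i in range(n_stages + 1)]   # geometric start
    sizes[0], sizes[-1] = 1.0, load
    for _ in range(sweeps):
        for i in range(1, n_stages):
            sizes[i] = (sizes[i - 1] * sizes[i + 1]
                        / (1.0 + mu * sizes[i - 1])) ** 0.5
    return sizes


def fanouts(sizes):
    """Effective fanout of each stage: S_{i+1} / S_i."""
    return [nxt / cur for cur, nxt in zip(sizes, sizes[1:])]


print(fanouts(size_chain(3, 64.0, 0.0)))    # equal fanout: minimum delay
print(fanouts(size_chain(3, 64.0, 0.05)))   # fanout grows toward the load
```

A positive μ shrinks the intermediate stages and produces the variable taper: fanout increases from input to output.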
Inverter Chain: VDD Optimization
[Figure: per-stage VDD / VDDnom (0 to 1.0) versus stage (1 to 7) for dinc = 0%, 1%, 10%, 30%, 50%.]
• VDD reduces energy of the final load first
• Variable taper achieved by voltage scaling
4.20
Inverter Chain: Optimization Results
[Figure: normalized sensitivity and energy reduction (%) versus dinc (0 to 50%) for S, gVDD, 2VDD and cVDD.]
• Parameter with the largest sensitivity has the largest potential for energy reduction
• Two discrete supplies mimic per-stage VDD
4.21
Example: Kogge-Stone Tree Adder
• Tree adder
– Long wires
– Re-convergent paths
– Multiple active outputs
[Figure: Kogge-Stone tree adder with inputs (A0, B0) to (A15, B15), carry-in Cin and outputs S0 to S15.]
[Ref: P. Kogge, Trans. Comp’73]
4.22
Tree Adder: Sizing vs. Dual-VDD Optimization
• Reference design: all paths are critical
[Figure: three panels: reference (D = Dmin); sizing: E (-54%) at dinc = 10%; 2Vdd: E (-27%) at dinc = 10%.]
• Internal energy ⇒ S more effective than VDD
– S: E(-54%), 2Vdd: E(-27%) at dinc = 10%
4.23
Tree Adder: Multi-dimensional Search
[Figure: energy-delay curves plotted against Delay / Dmin (0.4 to 2) for the reference design and for optimization over (VDD, VTH), (S, VDD), (S, VTH) and (S, VDD, VTH).]
• Can get pretty close to optimum with only 2 variables
• Getting the minimum delay (maximum speed) is very expensive
4.24
Multiple Supply Voltages
• Block-level supply assignment
– Higher throughput/lower latency functions are implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called “voltage islands” with separate supply grids
– Level conversion performed at block boundaries
• Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging
4.25
Using Three VDD’s
[Figure: power reduction ratio versus the lower supply voltages V2 and V3, shown as a surface and as a contour plot with the optimum marked (+). V1 = 1.5V, VTH = 0.3V. © IEEE 2002]
[Ref: T. Kuroda, ICCAD’02]
4.26
Optimum Number of VDD’s
[Figure: three panels for the supply sets { V1, V2 }, { V1, V2, V3 } and { V1, V2, V3, V4 }. Each plots the optimum VDD ratios (V2/V1, V3/V1, V4/V1) and the resulting power ratios (P2/P1, P3/P1, P4/P1) versus V1 from 0.5 to 1.5 V. © IEEE 2001]
• The more VDD’s the less power, but the effect saturates
• Power reduction effect decreases with scaling of VDD
• Optimum V2/V1 is around 0.7
[Ref: M. Hamada, CICC’01]
4.27
Lessons: Multiple Supply Voltages
• Two supply voltages per block are optimal
• Optimal ratio between the supply voltages is 0.7
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
• An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
4.28
Distributing Multiple Supply Voltages
[Figure: two ways of placing a VDDH circuit next to a VDDL circuit (inputs i1, i2; outputs o1, o2; rails VDDH, VDDL, VSS): conventional, and shared N-well.]
4.29
Conventional
[Figure: N-well isolation between the VDDH circuit and the VDDL circuit (rails VDDH, VDDL, VSS). Floor-plan options: (a) dedicated row, with alternating VDDL and VDDH rows; (b) dedicated region, with a VDDH region and a VDDL region.]
4.30
Shared N-Well
[Figure: VDDL and VDDH circuits placed in a shared N-well, with VDDH, VDDL and VSS rails; (a) floor plan image.]
[Shimazaki et al, ISSCC’03]
4.31
Example: Multiple Supplies in a Block
Conventional Design versus CVS Structure
[Figure: logic bounded by flip-flops (FF) with the critical path marked, shown for the conventional design and for the CVS structure, which uses level-shifting F/Fs. © IEEE 1998]
Lower VDD portion is shared
“Clustered voltage scaling”
[Ref: M. Takahashi, ISSCC’98]
4.32
Level Converting Flip-Flops (LCFFs)
[Figure: schematics of a master-slave LCFF and a pulsed half-latch LCFF, each with its level-conversion stage marked. © IEEE 2003]
Pulsed Half-Latch versus Master-Slave LCFFs
• Smaller # of MOSFETs / clock loading
• Faster level conversion using half-latch structure
• Shorter D-Q path from pulsed circuit
[Ref: F. Ishihara, ISLPED’03]
4.33
Dynamic Realization of Pulsed LCFF
• Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path
[Figure: schematic of the pulsed precharge latch, with the level-conversion stage marked. © IEEE 2003]
[Ref: F. Ishihara, ISLPED’03]
4.34
Case Study: ALU for 64-bit Processor
[Figure: ALU block diagram: clock generator, 9:1, 5:1 and 2:1 input multiplexers (ain, bin), gp generation, carry generation, partial sum, sum selection and logical unit, with inverters INV1 and INV2 and the long loop-back bus sumb (0.5pF). VDDH and VDDL circuits are marked separately. © IEEE 2003]
[Ref: Y. Shimazaki, ISSCC’03]
4.35
Low-Swing Bus and Level Converter
[Figure: schematic of the low-swing bus (sum, sumb, INV1, INV2, ain0) and the domino level converter (9:1 MUX) with precharge (pc), keeper and select (sel) signals, showing the VDDH and VDDL supplies. © IEEE 2003]
• INV2 is placed near 9:1 MUX to increase noise immunity
• Level conversion is done by a domino 9:1 MUX
[Ref: Y. Shimazaki, ISSCC’03]
4.36
Measured Results: Energy and Delay
[Figure: measured energy (pJ, 200 to 800) versus cycle time TCYCLE (ns, 0.6 to 1.6) at room temperature, for the single-supply design and the shared-well design (VDDH = 1.8V). Annotations at 1.16GHz: VDDL = 1.4V gives energy -25.3%, delay +2.8%; VDDL = 1.2V gives energy -33.3%, delay +8.3%. © IEEE 2003]
[Ref: Y. Shimazaki, ISSCC’03]
4.37
Practical Transistor Sizing
• Continuous sizing of transistors only an option in custom design
• In ASIC design flows, options set by available library
• Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping
4.38
Technology Mapping
[Figure: logic network with inputs a, b, c, d and output f; one path is annotated slack=1.]
Larger gates reduce capacitance, but are slower
4.39
Technology Mapping
Example: 4-input AND
• (a) Implemented using 4-input NAND + INV
• (b) Implemented using 2-input NAND + 2-input NOR

Gate type   Area          Input cap.   Library 1: High-Speed   Library 2: Low-Power
            (cell unit)   (fF)         Average delay (ps)      Average delay (ps)
INV         3             1.8          7.0 + 3.8 CL            12.0 + 6.0 CL
NAND2       4             2.0          10.3 + 5.3 CL           16.3 + 8.8 CL
NAND4       5             2.0          13.6 + 5.8 CL           22.7 + 10.2 CL
NOR2        3             2.2          10.7 + 5.4 CL           16.7 + 8.9 CL

(delay formula: CL in fF)
(numbers calibrated for 90 nm)
4.40
Technology Mapping – Example
4-input AND       (a) NAND4 + INV    (b) NAND2 + NOR2
Area              8                  11
HS: Delay (ps)    31.0 + 3.8 CL      32.7 + 5.4 CL
LP: Delay (ps)    53.1 + 6.0 CL      52.4 + 8.9 CL
Sw Energy (fF)    0.1 + 0.06 CL      0.83 + 0.06 CL

• Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
• Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
• Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
4.41
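The area and delay entries of the 4-input AND comparison follow from the library table: the first stage drives the input capacitance of the second stage, and the second stage drives CL. A short sketch of that composition, with made-up data-structure and function names:

```python
# name: (area, input cap in fF, high-speed (ps, ps/fF), low-power (ps, ps/fF))
LIBRARY = {
    "INV":   (3, 1.8, (7.0, 3.8),  (12.0, 6.0)),
    "NAND2": (4, 2.0, (10.3, 5.3), (16.3, 8.8)),
    "NAND4": (5, 2.0, (13.6, 5.8), (22.7, 10.2)),
    "NOR2":  (3, 2.2, (10.7, 5.4), (16.7, 8.9)),
}
HS, LP = 2, 3   # tuple positions of the two delay models


def two_stage_delay(first, second, lib):
    """Return (constant in ps, slope in ps/fF) for first -> second -> CL."""
    d_first, k_first = LIBRARY[first][lib]
    d_second, k_second = LIBRARY[second][lib]
    c_in_second = LIBRARY[second][1]
    return d_first + k_first * c_in_second + d_second, k_second


def area(cells):
    """Total area of a list of cell names, in cell units."""
    return sum(LIBRARY[cell][0] for cell in cells)


print(area(["NAND4", "INV"]), two_stage_delay("NAND4", "INV", HS))
print(area(["NAND2", "NAND2", "NOR2"]), two_stage_delay("NAND2", "NOR2", HS))
```

The constants come out as 31.04 and 32.66 ps for the high-speed library, which the comparison table rounds to 31.0 and 32.7.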
Gate-Level Tradeoffs for Power
• Technology mapping
• Gate selection
• Sizing
• Pin assignment
• Logical Optimizations
• Factoring
• Restructuring
• Buffer insertion/deletion
• Don’t care optimization
4.42
Logic Restructuring
[Figure: example network with annotated signal values.]
Logic restructuring to minimize spurious transitions
[Figure: example network with annotated gate delays.]
Buffer insertion for path balancing
4.43
Algebraic Transformations
Idea: Modify network to reduce capacitance
[Figure: f = ab + ac (AND outputs p1 = 0.05 and p2 = 0.05, OR output p3 = 0.075) transformed into f = a(b + c) (OR output p4 = 0.75, AND output p5 = 0.075).]
pa = 0.1; pb = 0.5; pc = 0.5
Caveat: This may increase activity!
4.44
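The signal probabilities in the algebraic-transformation example, and the caveat about activity, can be reproduced by enumeration. The sketch assumes independent inputs and temporally uncorrelated signals, so that the 0→1 transition probability of a node is p(1 − p); that activity model is an assumption added here, not stated on the slide.

```python
from itertools import product

P_IN = {"a": 0.1, "b": 0.5, "c": 0.5}


def probability(fn):
    """P(fn = 1) over all input combinations, inputs assumed independent."""
    total = 0.0
    for bits in product((0, 1), repeat=3):
        weight = 1.0
        for name, bit in zip("abc", bits):
            weight *= P_IN[name] if bit else 1.0 - P_IN[name]
        if fn(*bits):
            total += weight
    return total


def activity(p):
    """0->1 transition probability of a temporally uncorrelated signal."""
    return p * (1.0 - p)


before = {  # f = ab + ac
    "p1": lambda a, b, c: a and b,
    "p2": lambda a, b, c: a and c,
    "p3": lambda a, b, c: (a and b) or (a and c),
}
after = {   # f = a(b + c)
    "p4": lambda a, b, c: b or c,
    "p5": lambda a, b, c: a and (b or c),
}

for nodes in (before, after):
    probs = {name: probability(fn) for name, fn in nodes.items()}
    print(probs, sum(activity(p) for p in probs.values()))
```

The transformed network has one gate fewer, but its internal node p4 toggles far more often, so the summed activity goes up.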
Lessons from Circuit Optimization
• Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
• Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
• Two supply voltages in general sufficient; 3 or more supply voltages only offer small advantage
• Choice between sizing and supply voltage parameters depends upon circuit topology
• But … leakage not considered so far
4.45
Considering Leakage @ Design Time
• Considering leakage as well as dynamic power is essential in sub-100 nm technologies
• Leakage is not necessarily a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …
4.46
Leakage – Not Necessarily a Bad Thing
[Figure: normalized energy Enorm (0 to 1) versus Estatic/Edynamic (10^-2 to 10^1) for two designs: Version 1 (VTHref - 180mV, 0.81 VDDmax) and Version 2 (VTHref - 140mV, 0.52 VDDmax). © IEEE 2004]
$$\left(\frac{E_{Lk}}{E_{Sw}}\right)_{opt} = \frac{2}{\ln\left(\dfrac{L_d}{\alpha_{avg}}\right) - K}$$

Topology        Inv   Add   Dec
(ELk/ESw)opt    0.8   0.5   0.2

Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations
[Ref: D. Markovic, JSSC’04]
4.47
Refining the Optimization Model
• Switching energy
$$E_{dyn} = \alpha_{0 \rightarrow 1} K_e S (\gamma + f) V_{DD}^2$$
• Leakage energy
$$E_{stat} = S \, I_0(\Psi) \, e^{-\frac{V_{TH} - \lambda_d V_{DD}}{kT/q}} \, V_{DD} \, T_{cycle}$$
with:
I0(Ψ): normalized leakage current with inputs in state Ψ
4.48
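The switching and leakage energy expressions of the refined optimization model translate directly into code. The sketch below uses the exponent as written (no subthreshold slope factor) and kT/q ≈ 25.9 mV at room temperature; all numeric arguments in the usage lines are hypothetical.

```python
import math

KT_Q = 0.0259   # V, thermal voltage near 300 K


def switching_energy(alpha, k_e, size, gamma, fanout, vdd):
    """E_dyn = alpha * K_e * S * (gamma + f) * VDD^2."""
    return alpha * k_e * size * (gamma + fanout) * vdd ** 2


def leakage_energy(size, i0, vth, lambda_d, vdd, t_cycle):
    """E_stat = S * I0 * exp(-(VTH - lambda_d * VDD) / (kT/q)) * VDD * T_cycle."""
    exponent = -(vth - lambda_d * vdd) / KT_Q
    return size * i0 * math.exp(exponent) * vdd * t_cycle


# Hypothetical gate: lowering VDD cuts both terms, raising VTH cuts leakage only.
e_sw = switching_energy(0.1, 1e-15, 4.0, 1.35, 4.0, 1.0)
e_lk = leakage_energy(4.0, 1e-6, 0.3, 0.1, 1.0, 1e-9)
print(e_lk / e_sw)
```

Scanning VDD and VTH with these two functions under a delay constraint is one way to explore the leakage-to-switching energy ratio discussed in the preceding slide.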
Reducing Leakage @ Design Time
• Using longer transistors
– Limited benefit
– Increase in active current
• Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
• Reducing the voltage!!
4.49
Longer Channels
[Figure: normalized switching energy and normalized leakage power versus transistor length (100 to 200 nm), 90 nm CMOS.]
• 10% longer gates reduce leakage by 50%
• Increases switching power by 18% with W/L = const.
• Doubling L reduces leakage by 5x
• Impacts performance
– Attractive when W does not have to be increased (e.g. memory)
4.50
Using Multiple Thresholds
• There is no need for level conversion
• Dual thresholds can be added to standard design flows
– High-VTh and Low-VTh libraries are a standard in sub-0.18 µm processes
– For example: synthesize using only high-VTh cells, then swap in low-VTh cells in place to improve timing
– Second VTh insertion can be combined with resizing
• Only two thresholds are needed per block
– Using more than two yields small improvements
4.51
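The flow in which a design is first built from high-VTh cells and low-VTh cells are then swapped in to improve timing can be sketched as a greedy loop. This is an illustrative heuristic, not the algorithm of any particular tool; the netlist format, cell delays and function names are hypothetical.

```python
def arrival_times(gates, order, low_vth):
    """Longest-path arrival time at each gate output (order is topological)."""
    arrival = {}
    for name in order:
        gate = gates[name]
        own = gate["d_low"] if name in low_vth else gate["d_high"]
        arrival[name] = own + max((arrival[f] for f in gate["fanin"]), default=0.0)
    return arrival


def critical_path(gates, arrival):
    """Walk back from the latest output along the latest-arriving fan-ins."""
    name = max(arrival, key=arrival.get)
    path = [name]
    while gates[name]["fanin"]:
        name = max(gates[name]["fanin"], key=arrival.get)
        path.append(name)
    return path


def assign_low_vth(gates, order, target):
    """Start all-high-VTH; swap critical-path cells to low VTH until target is met."""
    low_vth = set()
    while True:
        arrival = arrival_times(gates, order, low_vth)
        if max(arrival.values()) <= target:
            return low_vth
        candidates = [n for n in critical_path(gates, arrival) if n not in low_vth]
        if not candidates:
            raise ValueError("target delay not reachable with low-VTH cells")
        # Swap the cell that gains the most delay.
        low_vth.add(max(candidates,
                        key=lambda n: gates[n]["d_high"] - gates[n]["d_low"]))


GATES = {   # delays in ps, made up for the example
    "g1": {"fanin": [], "d_high": 30.0, "d_low": 22.0},
    "g2": {"fanin": [], "d_high": 20.0, "d_low": 15.0},
    "g3": {"fanin": ["g1", "g2"], "d_high": 30.0, "d_low": 22.0},
    "g4": {"fanin": ["g2"], "d_high": 20.0, "d_low": 15.0},
}
ORDER = ["g1", "g2", "g3", "g4"]
print(sorted(assign_low_vth(GATES, ORDER, 50.0)))   # only critical-path cells are swapped
```

Cells off the critical path keep the high threshold and therefore the low leakage; a production flow would also weigh the leakage cost of each swap and combine it with resizing.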
Three VTH’s
[Figure: leakage reduction ratio versus the higher thresholds VTH.2 and VTH.3, shown as a surface and as a contour plot with the optimum marked (+). VDD = 1.5V, VTH.1 = 0.3V. © IEEE 2002]
Impact of third threshold very limited
[Ref: T. Kuroda, ICCAD’02]
4.52
Using Multiple Thresholds
• Cell-by-cell VTH assignment (not at block level)
• Achieves all-low-VTH performance with a substantial reduction in leakage
[Figure: logic between flip-flops (FF), with each cell marked as high VTH or low VTH.]
[Ref: S. Date, SLPE’94]
4.53
Dual-VT Domino
Low-threshold transistors used only in critical paths
[Figure: dual-VT domino stages with clocks Clkn and Clkn+1, data Dn and Dn+1, transistor P1 and inverters Inv1, Inv2 and Inv3.]
Shaded transistors are low threshold
4.54
Multiple Thresholds and Design Methodology
• Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple VDD’s)
• Impact: Can reduce leakage power substantially
4.55
Dual-VTH Design for High-Performance Design
                 High-VTH Only   Low-VTH Only   Dual VTH
Total Slack      -53 psec        0 psec         0 psec
Dynamic Power    3.2 mW          3.3 mW         3.2 mW
Static Power     914 nW          3873 nW        1519 nW

All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
4.56
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW, 0 to 8000) for selected combinational tests (i10, des, C7552, seq, pair, AVER) in 130 nm CMOS, comparing the LVth, LVth+HVth, HVth and HVth+LVth libraries.]
[Courtesy: Synopsys 2004]
4.57
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion and Ioff (nA) versus VDD (0 to 1 V), with and without stacking, 90 nm technology.]
• Ion and Ioff of single NMOS versus stack of 10 NMOS transistors
• Transistors in stack are sized up to give similar drive
4.58
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion/Ioff ratio (× 10^5) versus VDD (0 to 1 V) for the stack and the no-stack case, 90 nm technology; the gap is annotated “Factor 10!”.]
Stacking transistors suppresses submicron effects
• Reduced velocity saturation
• Reduced DIBL effect
• Allows for operation at lower thresholds
4.59
Complex Gates Increase Ion/Ioff Ratio
• Example: 4-input NAND, Fan-in (4) versus Fan-in (2)
With transistors sized for similar performance:
Leakage of Fan-in(2) = Leakage of Fan-in(4) x 3
(Averaged over all possible input patterns)
[Figure: leakage current (nA, 0 to 14) versus input pattern (1 to 16) for the Fan-in (2) and Fan-in (4) implementations.]
4.60
Example: 32 bit Kogge-Stone Adder
[Figure: % of input vectors versus standby leakage current (A), with a spread annotated “factor 18”. © Springer 2001]
Reducing the threshold by 150 mV increases leakage of single NMOS transistor by factor 60
[Ref: S.Narendra, ISLPED’01]
4.61
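The quoted factor of 60 for a 150 mV threshold reduction implies a subthreshold slope, if leakage is assumed to depend exponentially on VTH. The sketch below performs that conversion; the helper names are illustrative.

```python
import math


def implied_slope_mv_per_decade(delta_vth_mv, leakage_factor):
    """Threshold change needed for a 10x change in leakage."""
    return delta_vth_mv / math.log10(leakage_factor)


def leakage_multiplier(delta_vth_mv, slope_mv_per_decade):
    """Leakage increase for a threshold reduction of delta_vth_mv."""
    return 10.0 ** (delta_vth_mv / slope_mv_per_decade)


slope = implied_slope_mv_per_decade(150.0, 60.0)
print(slope)                              # a little over 84 mV per decade
print(leakage_multiplier(150.0, slope))   # recovers the factor of 60
```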
Summary
• Circuit optimization can lead to substantial energy reduction at limited performance loss
• Energy-delay plots are the perfect mechanism for analyzing energy-delay trade-offs
• Well-defined optimization problem over W, VDD and VTH parameters
• Increasingly better support by today’s CAD flows
• Observe: leakage is not necessarily bad – if appropriately managed
4.62
References
Books:
• A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems, Kluwer Academic Publishers, 1st Ed, 1995.
• D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
• D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
• J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed, Prentice Hall 2003.
• I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1st Ed, 1999.
Articles:
• R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
• S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5µm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
• M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
• F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
• P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug 1973.
• T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
4.63
References
Articles (cont.):
• H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
• S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
• D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE Journal of Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
• MathWorks, http://www.mathworks.com
• S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001.
• T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
• Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” Int. Conf. Solid-State Circuits, (ISSCC), pp. 104-105, Feb. 2003.
• V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
• M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998.
4.64
