### Chapter 4: Circuit-Level Optimization at Design Time

```
Optimizing Power @ Design Time
Circuits
Jan M. Rabaey
Dejan Marković
Borivoje Nikolić
Chapter 4
Chapter Outline
• Optimization framework for energy-delay trade-off
• Dynamic power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
• Static power optimization
– Multiple thresholds
– Transistor stacking
4.2
Energy/Power Optimization Strategy
• For a given function and activity, an optimal operating point can be derived in the energy-performance space
• Time of optimization depends upon the activity profile
• Different optimizations apply to active and static power
[Table: when to optimize, for both active and static power. Fixed activity: design time. Variable activity: run time. No activity (standby): sleep.]
4.3
[Figure: energy/op versus delay; labels: unoptimized design, Emax, Emin, Dmin, Dmax]
Maximize throughput for given energy or
Minimize energy for given throughput
Other important metrics: Area, Reliability, Reusability
4.4
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their
abstraction layer
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application specific
Logic/RT: logic family, standard cell versus custom
Circuit (this chapter): sizing, supply, thresholds
Device: bulk versus SOI
4.5
Optimization Can/Must Span Multiple Levels
Architecture
Micro-Architecture
Circuit (Logic & FFs)
Design optimization combines top-down and bottom-up:
“meet-in-the-middle”
4.6
Energy-Delay Optimization
[Figure: two panels of energy/op versus delay, each with curves for topology A and topology B]
Globally optimal energy-delay curve for a given function
4.7
Some Optimization Observations
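Energy-Delay Sensitivities
S_A = \left. \frac{\partial E / \partial A}{\partial D / \partial A} \right|_{A = A_0}
[Figure: energy versus delay; curves f(A, B0) and f(A0, B) meet at the point (A0, B0) at delay D0, with slopes SA and SB]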
[Ref: V. Stojanovic, ESSCIRC’02]
4.8
Finding the Optimal Energy-Delay Curve
Pareto-optimal:
the best that can be achieved without disadvantaging at least one metric.
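\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D
(speed up through A by ∆D and slow down through B by ∆D, so the total delay stays at D0)
[Figure: energy versus delay; curves f(A, B0), f(A0, B) and f(A1, B) around the point (A0, B0) at delay D0, with delay step ∆D]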
On the optimal curve, all sensitivities must be equal
4.9
Reducing Active Energy @ Design Time
E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}
P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f
• Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock speed
– Lowering the logic swing (Vswing)
• Reducing transistor sizes (CL)
– Slows down logic
• Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
4.10
Observation
• Downsizing and/or lowering the supply on the critical path lowers the operating frequency
[Figure: two histograms of number of paths versus path delay tp, each marked with the target delay]
– Narrows down the path delay distribution
– Increases impact of variations, impacts robustness
4.11
Circuit Optimization Framework
minimize Energy (VDD, VTH, W)
subject to Delay (VDD, VTH, W) ≤ Dcon
Constraints:
VDDmin < VDD < VDDmax
VTHmin < VTH < VTHmax
Wmin < W
• Reference case
– Dmin sizing @ VDDmax, VTHref
[Figure: energy/op versus delay for topology A and topology B]
[Ref: V. Stojanovic, ESSCIRC’02]
4.12
Optimization Framework: Generic Network
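[Figure: stage i (supply VDD,i, input capacitance Ci, parasitic capacitance γCi) driving wire capacitance Cw and stage i+1 (supply VDD,i+1, input capacitance Ci+1)]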
Gate in stage i loaded by fanout (stage i+1)
4.13
Alpha-power based Delay Model
t_p = \frac{K_d V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \frac{\gamma C_i + C_w + C_{i+1}}{\gamma C_i} = \tau_{nom} \left( 1 + \frac{C_w + C_{i+1}}{\gamma C_i} \right)
Fit parameters: Von, αd, Kd, γ
[Figure, left: normalized FO4 delay versus VDD/VDDref, simulation and model; fit: Von = 0.37 V, αd = 1.53]
[Figure, right: delay tp (ps) versus fanout (Ci+1/Ci), simulation and model; fit: τnom = 6 ps, γ = 1.35]
VDDref = 1.2 V, 90 nm technology
4.14
Combined with Logical Effort Formulation
For Complex Gates
t_p = \tau_{nom} \left( p_i + \frac{f_i g_i}{\gamma} \right)
• Parasitic delay pi – depends upon gate topology
• Electrical effort fi ≈ Si+1/Si
• Logical effort gi – depends upon gate topology
• Effective fanout hi = fi gi
[Ref: I. Sutherland, Morgan Kaufmann’99]
4.15
Dynamic Energy
E_{dyn} = (\gamma C_i + C_w + C_{i+1}) V_{DD,i}^2 = C_i (\gamma + f_i) V_{DD,i}^2
f_i = (C_w + C_{i+1}) / C_i = S'_{i+1} / S_i
C_i = K_e S_i
[Figure: stage i (VDD,i, Ci, γCi) driving Cw and stage i+1 (VDD,i+1, Ci+1)]
E_i = K_e S_i (V_{DD,i-1}^2 + \gamma V_{DD,i}^2) = energy consumed by logic gate i
4.16
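Optimizing Return on Investment (ROI)
Depends on sensitivity (∂E/∂D)
• Gate sizing
-\frac{\partial E / \partial S_i}{\partial D / \partial S_i} = \frac{E_i}{\tau_{nom} (h_i - h_{i-1})}
∞ for equal h (Dmin)
• Supply voltage
-\frac{\partial E / \partial V_{DD}}{\partial D / \partial V_{DD}} = \frac{E}{D} \cdot \frac{2 (1 - V_{on} / V_{DD})}{\alpha_d - 1 + V_{on} / V_{DD}}
max at VDD(max) (Dmin)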
4.17
Example: Inverter Chain
• Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
[Figure: chain of N inverters with sizes S1 = 1, S2, S3, …, SN driving load CL]
• Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy-delay trade-off
4.18
Inverter Chain: Gate Sizing
[Figure: effective fanout h versus stage (1 to 7) for delay increments dinc = 0% (nominal optimum), 1%, 10%, 30%, 50%]
S_i^2 = \frac{S_{i-1} \cdot S_{i+1}}{1 + \mu S_{i-1}}, \quad \mu = \frac{2 K_e V_{DD}^2}{\tau_{nom} F_S}, \quad F_S \propto \frac{E_i}{h_i - h_{i-1}}
[Ref: Ma, JSSC’94]
• Variable taper achieves minimum energy
• Reduce number of stages at large dinc
4.19
Inverter Chain: VDD Optimization
[Figure: per-stage VDD / VDDnom versus stage (1 to 7) for delay increments dinc = 0% (nominal optimum), 1%, 10%, 30%, 50%]
• VDD reduces energy of the final load first
• Variable taper achieved by voltage scaling
4.20
Inverter Chain: Optimization Results
[Figure, left: normalized sensitivity versus dinc (0% to 50%) for S, gVDD, 2VDD and cVDD]
[Figure, right: energy reduction (%) versus dinc (0% to 50%) for the same parameters]
• Parameter with the largest sensitivity has the largest potential for energy reduction
• Two discrete supplies mimic per-stage VDD
4.21
[Figure: tree adder with inputs (A0, B0) through (A15, B15) and Cin, outputs S0 through S15]
– Long wires
– Re-convergent paths
– Multiple active outputs
[Ref: P. Kogge, Trans. Comp’73]
4.22
Tree Adder: Sizing vs. Dual-VDD Optimization
• Reference design: all paths are critical
[Figure: three panels: reference (D = Dmin); sizing at dinc = 10%: E (-54%); 2Vdd at dinc = 10%: E (-27%)]
• Internal energy ⇒ S more effective than VDD
– S: E(-54%), 2Vdd: E(-27%) at dinc = 10%
4.23
[Figure: energy-delay curves versus Delay/Dmin for the reference design and for optimization over (VDD, VTH), (S, VDD), (S, VTH) and (S, VDD, VTH)]
• Can get pretty close to the optimum with only 2 variables
• Reaching the maximum speed (minimum delay) is very expensive
4.24
Multiple Supply Voltages
• Block-level supply assignment
– Higher throughput/lower latency functions are implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called “voltage islands” with separate supply grids
– Level conversion performed at block boundaries
• Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging
4.25
Using Three VDD’s
[Figure: power reduction ratio as a function of V2 (V) and V3 (V), shown as a surface plot and as a contour plot with the optimum marked by +]
V1 = 1.5 V, VTH = 0.3 V
4.26
Optimum Number of VDD’s
[Figure: three panels for the supply sets {V1, V2}, {V1, V2, V3} and {V1, V2, V3, V4}; each plots the optimal VDD ratios (V2/V1, V3/V1, V4/V1) and the power ratio (P2/P1, P3/P1, P4/P1) versus V1 from 0.5 V to 1.5 V]
• The more VDD’s the less power, but the effect saturates
• Power reduction effect decreases with scaling of VDD
• Optimum V2/V1 is around 0.7
4.27
Lessons: Multiple Supply Voltages
• Two supply voltages per block are optimal
• Optimal ratio between the supply voltages is about 0.7
• Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
• An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
4.28
Distributing Multiple Supply Voltages
[Figure: cell layout of a VDDH circuit and a VDDL circuit (inputs i1, i2; outputs o1, o2; rails VDDH, VDDL, VSS), conventional versus shared N-well]
4.29
Conventional
[Figure: N-well isolation between a VDDH circuit and a VDDL circuit (rails VDDH, VDDL, VSS)]
(a) Dedicated row: alternating VDDH and VDDL rows
(b) Dedicated region: separate VDDH and VDDL regions
4.30
Shared N-Well
[Figure: shared N-well layout with VDDL and VDDH circuits placed side by side (rails VDDH, VDDL, VSS); (a) floor plan image]
[Shimazaki et al, ISSCC’03]
4.31
Example: Multiple Supplies in a Block
[Figure: conventional design versus CVS structure; logic between flip-flops (FF) with the critical path marked in each, and level-shifting F/Fs in the CVS structure]
Lower VDD portion is shared
“Clustered voltage scaling”
[Ref: M. Takahashi, ISSCC’98]
4.32
Level Converting Flip-Flops (LCFFs)
[Figure: master-slave LCFF and pulsed half-latch LCFF schematics, each with its level-conversion stage marked]
Pulsed Half-Latch versus Master-Slave LCFFs
• Faster level conversion using half-latch structure
• Shorter D-Q path from pulsed circuit
[Ref: F. Ishihara, ISLPED’03]
4.33
Dynamic Realization of Pulsed LCFF
• Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D-Q path
[Figure: pulsed precharge latch schematic with its level-conversion stage (signals d, db, clk, ckd1, x, xb, q, qb; devices MN1, MN2, MP1, IV1; supply VDDH)]
[Ref: F. Ishihara, ISLPED’03]
4.34
Case Study: ALU for 64-bit Processor
[Figure: ALU block diagram with VDDH and VDDL circuits distinguished: clock generator, 9:1, 5:1 and 2:1 MUXes, carry generation, gp generation, partial sum, sum select, logical unit, inverters INV1 and INV2; labels include ain, ain0, bin, s0/s1, sum, 0.5 pF, and sumb (long loop-back bus)]
[Ref: Y. Shimazaki, ISSCC’03]
4.35
Low-Swing Bus and Level Converter
[Figure: low-swing bus (sum, INV1, sumb at VDDL) feeding the domino level converter (9:1 MUX at VDDH, with pc, keeper, sel, ain0 and INV2)]
• INV2 is placed near 9:1 MUX to increase noise immunity
• Level conversion is done by a domino 9:1 MUX
[Ref: Y. Shimazaki, ISSCC’03]
4.36
Measured Results: Energy and Delay
[Figure: measured energy (pJ) versus cycle time TCYCLE (ns) at room temperature, single-supply versus shared-well (VDDH = 1.8 V); operating point marked at 1.16 GHz]
VDDL = 1.4 V: energy -25.3%, delay +2.8%
VDDL = 1.2 V: energy -33.3%, delay +8.3%
[Ref: Y. Shimazaki, ISSCC’03]
4.37
Practical Transistor Sizing
• Continuous sizing of transistors is only an option in custom design
• In ASIC design flows, options are set by the available library
• Discrete sizing options are made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping
4.38
Technology Mapping
[Figure: logic network with inputs a, b, c, d and output f; slack = 1]
Larger gates reduce capacitance, but are slower
4.39
Technology Mapping
Example: 4-input AND
• (a) Implemented using 4-input NAND + INV
• (b) Implemented using 2-input NAND + 2-input NOR

Gate type | Area (cell unit) | Input cap. (fF) | Library 1: High-Speed, average delay (ps) | Library 2: Low-Power, average delay (ps)
INV       | 3                | 1.8             | 7.0 + 3.8 CL                              | 12.0 + 6.0 CL
NAND2     | 4                | 2.0             | 10.3 + 5.3 CL                             | 16.3 + 8.8 CL
NAND4     | 5                | 2.0             | 13.6 + 5.8 CL                             | 22.7 + 10.2 CL
NOR2      | 3                | 2.2             | 10.7 + 5.4 CL                             | 16.7 + 8.9 CL

(delay formula: CL in fF)
(numbers calibrated for 90 nm)
4.40
Technology Mapping – Example
4-input AND    | (a) NAND4 + INV | (b) NAND2 + NOR2
Area           | 8               | 11
HS: Delay (ps) | 31.0 + 3.8 CL   | 32.7 + 5.4 CL
LP: Delay (ps) | 53.1 + 6.0 CL   | 52.4 + 8.9 CL
Sw Energy (fF) | 0.1 + 0.06 CL   | 0.83 + 0.06 CL

• Area
– 4-input more compact than 2-input (2 gates vs. 3 gates)
• Timing
– both implementations are 2-stage realizations
– 2nd stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
• Energy
– Internal switching increases energy in the 2-input case
– Low-power library has worse delay, but lower leakage (see later)
4.41
• Technology mapping
– Gate selection
– Sizing
– Pin assignment
• Logical optimizations
– Factoring
– Restructuring
– Buffer insertion/deletion
– Don’t care optimization
4.42
Logic Restructuring
[Figure: logic restructuring to minimize spurious transitions]
[Figure: buffer insertion for path balancing]
4.43
Algebraic Transformations
Idea: Modify network to reduce capacitance
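[Figure: two equivalent networks computing f from inputs a, b, c, annotated with signal probabilities: p1 = 0.05, p2 = 0.05, p3 = 0.075 in the first; p4 = 0.75, p5 = 0.075 in the second]
pa = 0.1; pb = 0.5; pc = 0.5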
Caveat: This may increase activity!
4.44
Lessons from Circuit Optimization
• Joint optimization over multiple design parameters is possible using a sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
• Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
• Two supply voltages are in general sufficient; 3 or more supply voltages only offer a small advantage
• Choice between sizing and supply voltage parameters depends upon circuit topology
• But … leakage not considered so far
4.45
Considering Leakage @ Design Time
• Considering leakage as well as dynamic power is essential in sub-100 nm technologies
• Leakage is not necessarily a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …
4.46
Leakage – Not Necessarily a Bad Thing
[Figure: normalized energy versus Estatic/Edynamic (log scale, 10^-2 to 10^1) for two designs: Version 1 at VTH = VTHref - 180 mV and VDD = 0.81 VDDmax; Version 2 at VTH = VTHref - 140 mV and VDD = 0.52 VDDmax]
\left( \frac{E_{Lk}}{E_{Sw}} \right)_{opt} = \frac{2}{\ln \left( \frac{L_d}{\alpha_{avg}} \right) - K}
Optimal designs have high leakage (ELk/ESw ≈ 0.5)
Must adapt to process and activity variations
[Ref: D. Markovic, JSSC’04]
4.47
Refining the Optimization Model
• Switching energy
E_{dyn} = \alpha_{0 \to 1} K_e S (\gamma + f) V_{DD}^2
• Leakage energy
E_{stat} = S \, I_0(\Psi) \, e^{\frac{-V_{TH} + \lambda_d V_{DD}}{kT/q}} \, V_{DD} \, T_{cycle}
with:
I0(Ψ): normalized leakage current with inputs in state Ψ
4.48
Reducing Leakage @ Design Time
• Using longer transistors
– Limited benefit
– Increase in active current
• Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
• Reducing the voltage!!
4.49
Longer Channels
[Figure: normalized switching energy and normalized leakage power versus transistor length (100 nm to 200 nm), 90 nm CMOS]
• 10% longer gates reduce leakage by 50%
• Increases switching power by 18% with W/L = const.
• Doubling L reduces leakage by 5x
• Impacts performance
– Attractive when W does not have to be increased (e.g. memory)
4.50
Using Multiple Thresholds
• There is no need for level conversion
• Dual thresholds can be added to standard design flows
– High-VTH and low-VTH libraries are standard in sub-0.18 µm processes
– For example: synthesize using only high-VTH cells, then swap in low-VTH cells in place to improve timing
– Second VTH insertion can be combined with resizing
• Only two thresholds are needed per block
– Using more than two yields small improvements
4.51
Three VTH’s
[Figure: leakage reduction ratio as a function of VTH.2 (V) and VTH.3 (V), shown as a surface plot and as a contour plot with the optimum marked by +]
VDD = 1.5 V, VTH.1 = 0.3 V
Impact of third threshold very limited
4.52
Using Multiple Thresholds
• Cell-by-cell VTH assignment (not at block level)
• Achieves all-low-VTH performance with a substantial reduction in leakage
[Figure: logic between flip-flops (FF) with high-VTH and low-VTH cells distinguished]
[Ref: S. Date, SLPE’94]
4.53
Dual-VT Domino
Low-threshold transistors used only in critical paths
[Figure: domino stage with clocks Clkn and Clkn+1, data Dn and Dn+1, device P1 and inverters Inv1, Inv2, Inv3; low-threshold devices are marked]
4.54
Multiple Thresholds and Design Methodology
• Easily introduced in standard cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple VDD’s)
• Impact: Can reduce leakage power substantially
4.55
Dual-VTH Design for High-Performance Design
              | High-VTH Only | Low-VTH Only | Dual VTH
Total Slack   | -53 psec      | 0 psec       | 0 psec
Dynamic Power | 3.2 mW        | 3.3 mW       | 3.2 mW
Static Power  | 914 nW        | 3873 nW      | 1519 nW
All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
4.56
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW, 0 to 8000) for selected combinational tests (i10, des, C7552, seq, pair, AVER) in 130 nm CMOS; bars for LVth, LVth+HVth, HVth and HVth+LVth]
[Courtesy: Synopsys 2004]
4.57
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion and Ioff (nA) versus VDD (0 to 1 V), stack versus no stack, 90 nm technology]
• Ion and Ioff of a single NMOS versus a stack of 10 NMOS transistors
• Transistors in the stack are sized up to give similar drive
4.58
Complex Gates Increase Ion/Ioff Ratio
[Figure: Ion/Ioff ratio (x 10^5) versus VDD (0 to 1 V), stack versus no stack, 90 nm technology; annotated: factor 10!]
Stacking transistors suppresses submicron effects
• Reduced velocity saturation
• Reduced DIBL effect
• Allows for operation at lower thresholds
4.59
Complex Gates Increase Ion/Ioff Ratio
• Example: 4-input NAND, fan-in (4) versus fan-in (2)
With transistors sized for similar performance:
leakage of fan-in (2) = 3 x leakage of fan-in (4)
(averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern (up to 16) for fan-in (2) and fan-in (4)]
4.60
[Figure: distribution of standby leakage current over input vectors (% of input vectors); annotated: factor 18]
Reducing the threshold by 150 mV increases leakage of
single NMOS transistor by factor 60
[Ref: S.Narendra, ISLPED’01]
4.61
Summary
• Circuit optimization can lead to substantial energy reduction at limited performance loss
• Energy-delay plots are the ideal tool for exploring this trade-off
• Well-defined optimization problem over W, VDD and VTH parameters
• Increasingly better support by today’s CAD flows
• Observe: leakage is not necessarily bad, if appropriately managed
4.62
References
Books:
• A. Bellaouar, M.I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems, Kluwer.
• D. Chinnery, K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
• D. Chinnery, K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
• J. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.
• I. Sutherland, B. Sproull, D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1st ed., 1999.
Articles:
• R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic, V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35-42, Nov. 2002.
• S. Date, N. Shibata, S. Mutoh, and J. Yamada, “1V 30MHz Memory-Macrocell-Circuit Technology with a 0.5 µm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90-91, Oct. 1994.
• M. Hamada, Y. Ootaguro, T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf., (CICC), pp. 89-92, Sept. 2001.
• F. Ishihara, F. Sheikh, B. Nikolic, “Level conversion for dual-supply systems,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 164-167, Aug. 2003.
• P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., vol. C-22, no. 8, pp. 786-793, Aug. 1973.
• T. Kuroda, “Optimization and control of VDD and VTH for low-power, high-speed CMOS design,” Proceedings ICCAD 2002, pp. , San Jose, Nov. 2002.
4.63
References
Articles (cont.):
• H.C. Lin and L.W. Linholm, “An Optimized Output Stage for MOS Integrated Circuits,” IEEE J. Solid-State Circuits, vol. SC-10, no. 2, pp. 106-109, Apr. 1975.
• S. Ma and P. Franzon, “Energy Control and Accurate Delay Estimation in the Design of CMOS Buffers,” IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1150-1153, Sept. 1994.
• D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Methods for True Energy-Performance Optimization,” IEEE J. Solid-State Circuits, vol. 39, no. 8, pp. 1282-1293, Aug. 2004.
• MathWorks, http://www.mathworks.com
• S. Narendra, S. Borkar, V. De, D. Antoniadis, A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design, (ISLPED), pp. 195-200, Aug. 2001.
• T. Sakurai and R. Newton, “Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas,” IEEE J. Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Apr. 1990.
• Y. Shimazaki, R. Zlatanovici, B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 104-105, Feb. 2003.
• V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz, R.W. Brodersen, “Energy-Delay Tradeoffs in Combinational Logic using Gate Sizing and Supply Voltage Optimization,” European Solid-State Circuits Conf., (ESSCIRC), pp. 211-214, Sept. 2002.
• M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf., (ISSCC), pp. 36-37, Feb. 1998.
4.64
```
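
The area and delay entries of the 4-input AND technology-mapping example (slides 4.40 and 4.41) can be recomputed from the library table. The sketch below is illustrative only: the names `LIBRARY`, `MAPPINGS`, `total_area`, `two_stage_delay` and `summarize` are hypothetical, and it assumes that the first stage is loaded only by the input capacitance of one second-stage input (wire capacitance ignored). Under that assumption the constants appear to match the slide values after rounding to one decimal.

```python
"""Recompute the 4-input AND mapping comparison (illustrative sketch)."""

# Library data copied from the slide table (numbers calibrated for 90 nm).
# Each delay entry is (intrinsic_ps, slope_ps_per_fF): delay = intrinsic + slope * CL.
LIBRARY = {
    "INV":   {"area": 3, "cin_fF": 1.8, "HS": (7.0, 3.8),  "LP": (12.0, 6.0)},
    "NAND2": {"area": 4, "cin_fF": 2.0, "HS": (10.3, 5.3), "LP": (16.3, 8.8)},
    "NAND4": {"area": 5, "cin_fF": 2.0, "HS": (13.6, 5.8), "LP": (22.7, 10.2)},
    "NOR2":  {"area": 3, "cin_fF": 2.2, "HS": (10.7, 5.4), "LP": (16.7, 8.9)},
}

# Candidate mappings: all gates used (for area) and the
# two-stage input-to-output path (for delay).
MAPPINGS = {
    "(a) NAND4 + INV": {"gates": ["NAND4", "INV"], "path": ("NAND4", "INV")},
    "(b) NAND2 + NOR2": {"gates": ["NAND2", "NAND2", "NOR2"], "path": ("NAND2", "NOR2")},
}


def total_area(gates):
    """Sum the cell areas (in cell units) of every gate in a mapping."""
    return sum(LIBRARY[g]["area"] for g in gates)


def two_stage_delay(first, second, speed):
    """Return (constant_ps, slope_ps_per_fF) of a two-stage path.

    Assumption: the first stage sees only the input capacitance of the
    second stage; the second stage drives the external load CL.
    """
    a1, b1 = LIBRARY[first][speed]
    a2, b2 = LIBRARY[second][speed]
    constant = a1 + b1 * LIBRARY[second]["cin_fF"] + a2
    return constant, b2


def summarize():
    """Build the comparison table as a dict keyed by mapping name."""
    rows = {}
    for name, mapping in MAPPINGS.items():
        row = {"area": total_area(mapping["gates"])}
        for speed in ("HS", "LP"):
            constant, slope = two_stage_delay(*mapping["path"], speed)
            row[speed] = (round(constant, 1), slope)
        rows[name] = row
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        hs, lp = row["HS"], row["LP"]
        print(f"{name}: area={row['area']}, "
              f"HS={hs[0]} + {hs[1]} CL ps, LP={lp[0]} + {lp[1]} CL ps")
```

Because every delay is kept as a (constant, slope) pair, the two mappings can be compared at any load CL without re-running the stage-by-stage sum. The switching-energy row of the slide is not reproduced here, since the source gives no per-gate energy data.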