### BuffOpt

```
Incorporating Driver Sizing Into Buffer Insertion Via a Delay Penalty Technique

Chuck Alpert, IBM
Chris Chu, Iowa State
Milos Hrkic, UIC
Jiang Hu, IBM
Stephen Quay, IBM
Gopal Gandham, IBM
Chandramouli Kashyap, IBM

Which One is Not Like the Others?
Buffer insertion
Driver sizing
BIWS
Wire sizing
Steiner tree

Why Simultaneous Optimization?
Electrically-challenged net
Driver sizing alone
Buffer insertion alone
Simultaneous optimization

Integrating Driver Sizing
Three choices

Driver Sizing Affects Multiple Nets
slow stage

Upstream Capacitance Effects
[Plot axes: Slack (ns), # Nets Optimized]

The Driver Sizing Penalty
[Figure: decoupling buffers, with capacitance labels C1 and C2]
Penalty is delay through fastest decoupling buffer/inverter chain

Delay Penalty Algorithm
Continuous buffer library not realizable
Assume
  a set of discrete buffers B1, ..., Bn such that C_B1 < C_B2 < ... < C_Bn
  a monotone function delay(Bi, C)
Apply dynamic programming

Example
Given optimal chains for B1, B2, B3, B4
Find optimal chain for B5 (load C_B5)

Dynamic Programming Recurrence
To drive capacitance C_Bi, combine the optimal chain driving C_Bj with buffer Bj
D(C_B1) = 0
D(C_Bi) = min over 0 < j < i of { D(C_Bj) + Delay(Bj, C_Bi) }

O(n^2) complexity
Compute once as a lookup table
Can handle inverters and slew
Virtually no CPU cost
Applicable for many approaches

Experiments
Five unoptimized circuits (73 – 303K cells)
Three approaches
  VG (no driver sizing)
  Max (driver sizing with no delay penalty)
  DP (driver sizing with delay penalty)
Run on thousands of nets

Total Buffers Inserted
[Bar chart: total buffers inserted by VG, Max, and DP on ckt1 through ckt5; axis range 0 to 10000]

Number of Upsized drivers
[Bar chart: number of upsized drivers for VG, Max, and DP on ckt1 through ckt5; axis range 0 to 20000]

Total Area Percentage Increase
[Bar chart: total area percentage increase for VG, Max, and DP on ckt1 through ckt5; axis range 0 to 35]

Worst Slack
[Bar chart: worst slack for VG, Max, and DP on ckt1 through ckt5; axis range -8000 to 0]

And So . . .
Simple to combine buffer insertion with driver sizing
Virtually no CPU impact
Extends to many buffer insertion approaches
No timing graph queries
Works well
```
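
The "Dynamic Programming Recurrence" slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It reads the recurrence as D(C_Bi) = min over j < i of D(C_Bj) + Delay(Bj, C_Bi), with D(C_B1) = 0. The function name `delay_penalty_table`, the linear delay model, and every numeric value below are assumptions made up for the example. Index 0 in the lists plays the role of B1.

```python
from typing import Callable, List, Sequence


def delay_penalty_table(
    input_caps: Sequence[float],
    delay: Callable[[int, float], float],
) -> List[float]:
    """Return D, where D[i] is the least chain delay needed to drive input_caps[i].

    input_caps must be strictly increasing (C_B1 < C_B2 < ... < C_Bn).
    delay(j, c) is the delay of buffer j driving load c.
    """
    if any(a >= b for a, b in zip(input_caps, input_caps[1:])):
        raise ValueError("input capacitances must be strictly increasing")
    n = len(input_caps)
    penalties = [0.0] * n  # base case: D(C_B1) = 0
    for i in range(1, n):
        # Try every smaller buffer j as the last stage that drives C_Bi.
        penalties[i] = min(
            penalties[j] + delay(j, input_caps[i]) for j in range(i)
        )
    return penalties


# Hypothetical four-buffer library; all values are illustrative only.
CAPS = [1.0, 2.0, 5.0, 10.0]         # input capacitance of each buffer
RESISTANCES = [10.0, 5.0, 2.0, 1.0]  # assumed output resistance of each buffer
INTRINSIC = 1.0                      # assumed intrinsic delay, same for all


def linear_delay(j: int, load: float) -> float:
    """Assumed linear model: intrinsic delay plus resistance times load."""
    return INTRINSIC + RESISTANCES[j] * load


TABLE = delay_penalty_table(CAPS, linear_delay)
print(TABLE)  # [0.0, 21.0, 47.0, 68.0]
```

The double loop gives the O(n^2) cost quoted on the slides, and the table only needs to be built once per buffer library. Under this reading, a buffer insertion pass that considers a driver with input capacitance `CAPS[i]` would charge `TABLE[i]` as its delay penalty instead of querying the timing graph.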