### Low-power Clock Trees for CPUs

Low-power Clock Trees for CPUs
Dong-Jin Lee, Myung-Chul Kim and Igor L.
Markov
EECS Department, University of Michigan
Outline




Introduction
Preliminaries
Terminologies
Algorithms




Initial tree construction and buffer insertion.
Robustness improvements.
Skew optimizations.
Experimental results and Conclusions
Introduction



tight skew limitations for its PVT
(process, voltage, temperature)
variation.
Clock network and sequential element
consume up to 70% of CPU power.
Clock tree is more susceptible to
variations than mesh while more
power efficient.
Introduction (cont.)



Previous clock tree designing method
rely on symmetric and regular tree
topologies.
These methods are hard to dealing
with obstacles, non-uniform sink
distributions, and varied sink
capacitances.
Commercial tools handle the
capacitance well, but the skew doesn’t.
Contributions





A tree solution with better result than those
mesh or mixed solutions.
The notion of local-skew slack.
A tabular technique to estimate the impact of
variations on skew between two sinks.
A path-based technique to enhance the
robustness of a clock tree to PVT variations.
A time-budgeting algorithm and some tuning
techniques for tuning the clock tree.
Preliminaries

Clock network in microprocessors
Preliminaries (cont.)

Clock tree construction.




H-tree, spine
DME + (MMM or GMA)
Others
Some terms and definitions.
Definition of global skew

Let Ψ be the clock tree, Si be a sink of tree

Skew between two sinks:

Global skew:

where λ(Si) is the clock latency of Si.
Nominal value: the value neglecting
variations.

Definition of local skew

Limit the distance of sink pairs.


Two sinks within a given distance bound
Local skew:
The worst local skew with
variation

Let △>0 be the local skew distance
bound, νbe the variation model, y be
the target yield rate, and f(t) be the
CDF of local skew with variations.
Local skew slack

Global skew slack of a sink


Local skew slack of a sink


The minimum amount of additional delay
for a sink to ensure the clock tree
satisfies the local skew constraint
Local skew slack of a tree edge

The minimum one of its downstream
sinks
An example
Considering variations

Tabular method considering wirelength,
# buffers, largest type of buffer.


Once it is built, it is reusable for another
tree
The local skew considering variations
Impact of variations on local
skew
Flow
Algorithms



Initial tree construction and buffer
insertion.
Robustness improvements.
Skew optimizations.


Wire snaking
Delay buffer insertion
Initial tree construction and buffer
insertion

DME + fast variant of van Ginneken’s
algorithm.


Since total # buffer is not known, estimate
it by the total length of wire, then
compute the initial buffer type
Do buffer insertion and get the exact #
buffers so as the buffer type is more
accurate
Robustness improvements



Perform after initial buffer insertion.
Nominal local skew should be estimated.
From the above theorem, the minimum size
of buffer type is selected to reduce
capacitance and thus all buffers between
these two sinks will be replaced by the
selected one.
Wire snaking





Ttarget – the amount of time should be delayed
to achieve legal local skew.
Lsn – snaking length determined by snaking
algorithm.
Tactual – actual delay resulted from Lsn.
Lideal – the length satisfies Ttarget = Tactual.
Construct a look-up table for Lsn(e)
estimation, then takes iterations to make it
legal.
Wire snaking (cont.)


Wire snaking at buffer output nodes is much
more accurate than at branch.
If the worst slew rate is more than 70% of
the given slew limit, the target node is
excluded from wore snaking.
Delay buffer insertion


Highly unbalanced sink capacitances
or layout obstacles in a sink clusters
can result in significant local skew - an
alternative technique is needed
because wire snaking is inapplicable.
Do delay buffer insertion when
Delay buffer insertion (cont.)

For each path from the buffer to the
sinks, inserting at most one buffer is
sufficient since the wire snaking
algorithm can be invoked again.
Experimental results




Cantango 2.0 in C++.
Slew limitation – 100ps
Target yield – 95%
12hr runtime limit.
Contest benchmarks
Comparison in results



About 4x less capacitances in average.
Fit the local skew constraint in all cases.
Fit the target yield rate in all cases.
PDF for worst local skew comparison
and variational skew

Tight local skew leads to usage of large
buffer, and hence increasing the
capacitance saving.
Conclusions




Proposes a tree solution for CPU clock
routing that improves power consumption
under tight skew constraint with variations.
Local skew slack.
A model for variational skew, a path-based
technique to enhance robustness, a new
time-budgeting algorithm for clock-tree
tuning and accurate optimizations that
satisfy budgets.
Very attractive results.