Low-power Clock Trees for CPUs Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov EECS Department, University of Michigan ICCAD 2010 Outline Introduction Preliminaries Terminologies Algorithms Initial tree construction and buffer insertion. Robustness improvements. Skew optimizations. Experimental results and Conclusions Introduction Advanced technology nodes result in tight skew limitations for its PVT (process, voltage, temperature) variation. Clock network and sequential element consume up to 70% of CPU power. Clock tree is more susceptible to variations than mesh while more power efficient. Introduction (cont.) Previous clock tree designing method rely on symmetric and regular tree topologies. These methods are hard to dealing with obstacles, non-uniform sink distributions, and varied sink capacitances. Commercial tools handle the capacitance well, but the skew doesn’t. Contributions A tree solution with better result than those mesh or mixed solutions. The notion of local-skew slack. A tabular technique to estimate the impact of variations on skew between two sinks. A path-based technique to enhance the robustness of a clock tree to PVT variations. A time-budgeting algorithm and some tuning techniques for tuning the clock tree. Preliminaries Clock network in microprocessors Preliminaries (cont.) Clock tree construction. H-tree, spine DME + (MMM or GMA) Others Some terms and definitions. Definition of global skew Let Ψ be the clock tree, Si be a sink of tree Skew between two sinks: Global skew: where λ(Si) is the clock latency of Si. Nominal value: the value neglecting variations. Definition of local skew Limit the distance of sink pairs. Two sinks within a given distance bound Local skew: The worst local skew with variation Let △>0 be the local skew distance bound, νbe the variation model, y be the target yield rate, and f(t) be the CDF of local skew with variations. Local skew slack Global skew slack of a sink Local skew slack of a sink The minimum amount of additional delay for a sink to ensure the clock tree satisfies the local skew constraint With additional distance bound Local skew slack of a tree edge The minimum one of its downstream sinks An example Considering variations Tabular method considering wirelength, # buffers, largest type of buffer. Once it is built, it is reusable for another tree The local skew considering variations Impact of variations on local skew Flow Algorithms Initial tree construction and buffer insertion. Robustness improvements. Skew optimizations. Wire snaking Delay buffer insertion Initial tree construction and buffer insertion DME + fast variant of van Ginneken’s algorithm. Since total # buffer is not known, estimate it by the total length of wire, then compute the initial buffer type Do buffer insertion and get the exact # buffers so as the buffer type is more accurate Robustness improvements Perform after initial buffer insertion. Nominal local skew should be estimated. From the above theorem, the minimum size of buffer type is selected to reduce capacitance and thus all buffers between these two sinks will be replaced by the selected one. Wire snaking Ttarget – the amount of time should be delayed to achieve legal local skew. Lsn – snaking length determined by snaking algorithm. Tactual – actual delay resulted from Lsn. Lideal – the length satisfies Ttarget = Tactual. Construct a look-up table for Lsn(e) estimation, then takes iterations to make it legal. Wire snaking (cont.) Wire snaking at buffer output nodes is much more accurate than at branch. If the worst slew rate is more than 70% of the given slew limit, the target node is excluded from wore snaking. Delay buffer insertion Highly unbalanced sink capacitances or layout obstacles in a sink clusters can result in significant local skew - an alternative technique is needed because wire snaking is inapplicable. Do delay buffer insertion when Delay buffer insertion (cont.) For each path from the buffer to the sinks, inserting at most one buffer is sufficient since the wire snaking algorithm can be invoked again. Experimental results Cantango 2.0 in C++. Slew limitation – 100ps Target yield – 95% 12hr runtime limit. Contest benchmarks Comparison in results About 4x less capacitances in average. Fit the local skew constraint in all cases. Fit the target yield rate in all cases. PDF for worst local skew comparison Trade-off between capacitance and variational skew Tight local skew leads to usage of large buffer, and hence increasing the capacitance. Loose one leads to capacitance saving. Conclusions Proposes a tree solution for CPU clock routing that improves power consumption under tight skew constraint with variations. Local skew slack. A model for variational skew, a path-based technique to enhance robustness, a new time-budgeting algorithm for clock-tree tuning and accurate optimizations that satisfy budgets. Very attractive results.