Observability Conditions and Automatic Operand Isolation in High-Throughput Asynchronous Pipelines
Arash Saifhashemi
Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne
Asynchronous Circuit Design - Today
Applications
• 3D networks-on-chip (STMicroelectronics)
• Ethernet switches (Intel SRD)
• Ultra-high-speed FPGAs (Achronix)
• Process variation
• Low-power chip design (Encryption – Tiempo, …)
Basic challenges: Automation
Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starts from a high-level specification written in SVC (SystemVerilogCSP)
[Figures: STMicroelectronics WIOMING 3D-IC (July 2012); Achronix FPGA, 1.7 M LUTs, 2.1 Gbps IO; Tiempo TAM16 clockless 16-bit microcontroller; Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G), 1.2 B transistors, 90% asynchronous, 13% Proteus]
The Proteus Flow
Key Features and Goals
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance
Tool Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip
The Problem
• Limited and manual power optimization
[Figure: Proteus flow diagram. Labels: SystemVerilog design, SVC2RTL, Verilog synthesizable RTL, synthesis, image netlist, clock gating, ClockFree, clock tree synthesis, async netlist, physical design, final layout; side inputs: Proteus/sync library, constraints]
Conditional Communication in Proteus
[Figure: conditional communication example with enable values 0 and 1; labels: "Dummy value", "Not received", "Not sent"]
Example: ALU
SVC Description
No conditionality in high-level description
Reconverging fanouts
Unnecessary calculation
Adding Isolation Cells
• All inputs/outputs are unconditional
• Operand isolation
• AND-based isolation cells
• Generated by the synchronous RTL synthesizer
• Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits
Three-valued logic
• Formal justification of conditioning
• Three-valued logic image model
• Each iteration is modeled by a clock cycle
• Each variable can be 0, 1, or N (no token)
One iteration
Status of each channel
3VL Unconditional Functions
Unconditional functions
• Can be represented using only the three-valued AND, OR, and NOT operators
• Example: functions implemented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
Lemma 1: the output is N iff at least one of the inputs is N.
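A minimal sketch of unconditional three-valued functions, assuming each Boolean gate is lifted so that any N input forces an N output. The names `lift`, `AND3`, `aoi21`, and `satisfies_lemma1` are hypothetical and not from the slides; the check enumerates all input combinations against Lemma 1.

```python
from itertools import product

N = "N"  # no token on the channel in this iteration


def lift(op):
    """Lift a Boolean gate to 3VL: the output is N whenever any input is N."""
    def f(*xs):
        return N if N in xs else op(*xs)
    return f


AND3 = lift(lambda a, b: a & b)
OR3 = lift(lambda a, b: a | b)
NOT3 = lift(lambda a: 1 - a)
XOR3 = lift(lambda a, b: a ^ b)


def aoi21(a, b, c):
    """AND-OR-INVERT gate composed only of the lifted operators."""
    return NOT3(OR3(AND3(a, b), c))


def satisfies_lemma1(f, arity):
    """True if f outputs N exactly when at least one input is N."""
    return all((f(*xs) == N) == (N in xs)
               for xs in product((0, 1, N), repeat=arity))
```

Because `aoi21` is a composition of lifted operators, it inherits the property without being lifted itself.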
SEND/RECEIVE Operators
• Conditional Communication
• RECEIVE and SEND are modeled as Ⓡ and Ⓢ operators
• Behave like buffers when E=1
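A minimal sketch of the two operators, under assumptions that go beyond what the slide states: with the enable at 0, SEND is assumed to emit no token (N) and RECEIVE is assumed to leave its input unconsumed and produce a dummy value of 0; an enable of N is assumed to yield N. The function names are hypothetical.

```python
N = "N"  # no token


def send(x, e):
    """x Ⓢ e: behaves like a buffer when e = 1, otherwise nothing is sent."""
    return x if e == 1 else N


def receive(x, e, dummy=0):
    """x Ⓡ e: behaves like a buffer when e = 1.

    Assumption: when e = 0 the token is not received and a dummy value
    is used instead; when e itself carries no token, the output is N.
    """
    if e == N:
        return N
    if e == 0:
        return dummy
    return x
```

Under these assumptions, `receive(send(x, e), e)` passes `x` through when `e` is 1 and yields the dummy value when `e` is 0, with no token travelling between the two cells.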
SEND Reconditioning
Assuming y=f(x) is unconditional and e ∉ TFO(y)
Lemma 2:
Application: SEND cells can be moved through logic
• Similar to retiming in synchronous circuits
Fewer SENDs
Less switching when e=0
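The formula of Lemma 2 is not reproduced in this text. As an illustration only, the sketch below assumes it has the retiming-like form "one SEND on the output of an unconditional f equals a SEND on every input of f" and checks that form exhaustively for a NAND gate. `send`, `nand3`, and the two placement helpers are hypothetical names.

```python
from itertools import product

N = "N"  # no token
VALUES = (0, 1, N)


def send(x, e):
    # Assumed SEND semantics: forward x when e = 1, otherwise no token.
    return x if e == 1 else N


def nand3(a, b):
    # Unconditional NAND: N on any input gives N on the output.
    return N if N in (a, b) else 1 - (a & b)


def send_on_output(f, xs, e):
    """A single SEND placed after f."""
    return send(f(*xs), e)


def send_on_inputs(f, xs, e):
    """One SEND per input, placed before f."""
    return f(*(send(x, e) for x in xs))


equivalent = all(
    send_on_output(nand3, xs, e) == send_on_inputs(nand3, xs, e)
    for xs in product(VALUES, repeat=2)
    for e in VALUES
)
```

The equivalence relies on f being unconditional: when e = 0 every input becomes N, so the output is N as well, matching the single SEND on the output.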
Observability in 3V Networks
Local Observability Partial Care (LOPC)
• LOPC(f,C,xj) of input xj of a node representing a function f is the condition under which f’s output is not affected as xj changes in C ⊆ {0,1,N}
Global Observability Partial Care (GOPC)
• GOPC(C,x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}
[Equation: implication between observability conditions; symbols not recoverable]
• Example: changes of i1 in {0,1} are not observable when i2 = 0 or i2 = 1
[Figure: example network with inputs i1 and i2, s = 1]
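A minimal sketch of computing a local observability partial care by brute-force enumeration, assuming the condition is represented as the list of assignments to the other inputs under which the node output stays constant while input j ranges over C. The function `lopc` and the AND example are hypothetical illustrations, not the authors' implementation.

```python
from itertools import product

N = "N"  # no token
VALUES = (0, 1, N)


def lopc(f, arity, j, C):
    """Assignments to the other inputs for which f ignores changes of input j within C."""
    result = []
    for others in product(VALUES, repeat=arity - 1):
        outputs = set()
        for v in C:
            # Re-insert the varying input at position j.
            xs = others[:j] + (v,) + others[j:]
            outputs.add(f(*xs))
        if len(outputs) == 1:  # output unaffected by the change
            result.append(others)
    return result


def and3(a, b):
    # Unconditional three-valued AND.
    return N if N in (a, b) else a & b
```

For `and3`, changes of the first input within {0,1} are masked when the second input is 0 or N; once N is included in C, only a second input of N masks the change.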
GOPC Conditioning
When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning
Lemma 3: if e = 0 → GOPC({0,1}, xj), then xj can be replaced by (xj Ⓢ e) Ⓡ e without changing any primary output
[Figure: Conditioning followed by SEND Reconditioning; labels: N (no token), "0 or 1", "No Activity"]
Inserting Isolating Nodes and Recognizing Enable Domains
Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths
Node u is in e’s Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an isolating node controlled by e
• Detected using depth-first search (DFS)
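One way to detect an enable domain with a DFS, sketched under assumptions: the netlist is a fanout dictionary, and the set of isolating nodes controlled by e is known. The search starts at the primary inputs and never passes through an isolating node; every node it reaches has an unisolated path, so the domain is whatever remains. `enable_domain` and the node names are hypothetical, and this is not necessarily how the authors' tool does it.

```python
def enable_domain(fanout, primary_inputs, isolating):
    """Return the nodes whose every path from a primary input crosses `isolating`.

    fanout: dict mapping each node to a list of its successor nodes.
    isolating: set of isolating nodes controlled by the enable e.
    """
    unisolated = set()
    stack = [p for p in primary_inputs if p not in isolating]
    while stack:  # iterative DFS that stops at isolating nodes
        u = stack.pop()
        if u in unisolated:
            continue
        unisolated.add(u)
        for v in fanout.get(u, ()):
            if v not in isolating and v not in unisolated:
                stack.append(v)
    nodes = set(fanout) | {v for vs in fanout.values() for v in vs}
    return nodes - unisolated


# Hypothetical ALU fragment: MUL is fed through isolating nodes, ADD is not.
fanout = {
    "a": ["iso_a", "add"], "b": ["iso_b", "add"],
    "iso_a": ["mul"], "iso_b": ["mul"],
    "mul": ["mux"], "add": ["mux"], "mux": [],
}
domain = enable_domain(fanout, ["a", "b"], {"iso_a", "iso_b"})
```

In this sketch the isolating nodes themselves count as part of the domain, and `mux` is excluded because it is still reachable through `add`.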
Pre-layout Analysis
• Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
• K: power of conditional nodes
• rf: activity factor
[Equations: domain power after isolation (n inputs); benefit of isolating each domain; total power; power of each domain]
Post-layout Experimental Results
• Case study: 32-bit ALU placed and routed
• Back-annotated switching activity using a VCD file
• Results:
• Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
• 53% power reduction when only isolating MUL (rf=0.25)
• Area cost of isolating MUL is about 4% and no performance penalty
Conclusions and Future Work
Conditional communication in async. circuits is not free
• Creates area and performance overheads
• Requires manual or automatic optimization
Asynchronous circuits can/should leverage sync. tools
• This paper is the first to use three-valued logic and observability don’t-cares for power optimization of asynchronous circuits
Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (Reconditioning)