LMS: A New Logic Synthesis Method Based on Pre

Lazy Man’s Logic Synthesis
Wenlong Yang
Lingli Wang
State Key Lab of ASIC and System
Fudan University, Shanghai, China
Alan Mishchenko
Department of EECS
University of California, Berkeley
1





Introduction
Previous Work
Lazy Man’s Logic Synthesis (LMS)
Experimental Results
Conclusion & Future Work
2

Goal of logic synthesis: deriving a circuit or improving an available circuit

We propose a “lazy” approach that reuses optimal structures derived by other synthesis tools and stored in a precomputed library.
[Figure: a function with N variables, other tools, the resulting AIG, LMS, and the precomputed library]
3





Introduction
Previous Work
Lazy Man’s Logic Synthesis (LMS)
Experimental Results
Conclusion
4

Logic synthesis based on a precomputed library has been proposed in several papers, but each approach differs from LMS:
Previous work | LMS
Precomputes structures in terms of LUTs [Kennings, IWLS, 2010] | Precomputes structures in terms of AIGs
Does not use preexisting benchmarks or tools [Bjesse, ICCAD, 2004] | Uses public benchmarks and existing tools
Looks only at 4-5 input functions [Li, IWLS, 2011] | Looks at 6-16 input functions
Only computes multiple structure choices [Chatterjee, TCAD, 2006] | Stores many equivalent structures
5
SOP balancing:
• For each node
• Compute several k-input cuts
• Perform delay-optimal tree balancing of the SOP
• Select the best one to replace the current structure
Example: an AIG subgraph found in benchmark s27.blif where SOP balancing loses to the proposed approach
F = !c*!b + !c*a
F’ = !c*!(b*!a)
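As a quick check of the example, the Python sketch below packs both expressions into 3-input truth tables and confirms that F and F’ denote the same Boolean function, so they differ only in structure. The bit ordering (a is input 0, the least significant bit of the minterm index) is an assumption of this sketch.

```python
def F(a: bool, b: bool, c: bool) -> bool:
    # SOP form from the slide: !c*!b + !c*a
    return (not c and not b) or (not c and a)


def F_prime(a: bool, b: bool, c: bool) -> bool:
    # Alternative structure from the slide: !c*!(b*!a)
    return (not c) and not (b and not a)


def truth_table(f, n: int = 3) -> int:
    """Pack f into an integer: bit m holds f on minterm m (input i = bit i of m)."""
    tt = 0
    for m in range(1 << n):
        args = [bool((m >> i) & 1) for i in range(n)]
        if f(*args):
            tt |= 1 << m
    return tt


print(bin(truth_table(F)))                     # 0b1011
print(truth_table(F) == truth_table(F_prime))  # True
```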
6



Introduction
Previous Work
Lazy Man’s Logic Synthesis (LMS)
 Equivalence Classes
 Library Representation/Construction
 Implementation


Experimental Results
Conclusion
7

LMS is based on collecting, storing, and re-using circuit structures
of Boolean functions with 6-16 input variables.

The total number of completely-specified Boolean functions of N
variables is 2^(2^N).
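To make this growth concrete, a small Python sketch evaluates 2^(2^N) for small N; already at N = 6 there are 2^64 ≈ 1.8 × 10^19 functions, and at N = 16 the count has almost twenty thousand decimal digits.

```python
def num_functions(n: int) -> int:
    """Number of completely specified Boolean functions of n variables: 2^(2^n)."""
    return 2 ** (2 ** n)


for n in range(2, 7):
    print(n, num_functions(n))

# For n = 16 the count is far too large to enumerate; report only its size.
print(len(str(num_functions(16))), "decimal digits for n = 16")
```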

Experiments show that even the number of practical functions can be very large.
To reduce this number, and the memory needed to store the functions in a library, a canonical form is used to group them into equivalence classes.
8

Two functions are NPN-equivalent if one of them
can be obtained from the other by negation and/or
permutation of the inputs and outputs.
Drawbacks of NPN computation:
• Time-consuming
• Complicated
A complete NPN canonical form is too expensive for LMS
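To illustrate why exact NPN canonicization is time-consuming, the following brute-force Python sketch (a textbook enumeration, not the method used in LMS or ABC) applies every input permutation, input negation, and output negation and keeps the smallest truth table as the class representative. It needs n!·2^n·2 transforms per function, i.e. 92,160 for n = 6, which is what motivates a cheaper semi-canonical form.

```python
from itertools import permutations
from math import factorial


def apply_npn(tt: int, n: int, perm, in_neg: int, out_neg: int) -> int:
    """Truth table of g(x) = out_neg XOR f(y), where y[perm[j]] = x[j] XOR bit j of in_neg."""
    out = 0
    for m in range(1 << n):
        old = 0
        for j in range(n):
            bit = ((m >> j) & 1) ^ ((in_neg >> j) & 1)
            old |= bit << perm[j]
        if ((tt >> old) & 1) ^ out_neg:
            out |= 1 << m
    return out


def exact_npn_canonical(tt: int, n: int):
    """Return (smallest NPN-equivalent truth table, number of transforms tried)."""
    best, tried = None, 0
    for perm in permutations(range(n)):
        for in_neg in range(1 << n):
            for out_neg in (0, 1):
                tried += 1
                cand = apply_npn(tt, n, perm, in_neg, out_neg)
                if best is None or cand < best:
                    best = cand
    return best, tried


print(exact_npn_canonical(0b1000, 2))  # AND2
print(exact_npn_canonical(0b1110, 2))  # OR2, same NPN class as AND2
print(factorial(6) * 2**6 * 2)         # transforms per 6-input function
```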
9

The idea is to order the input variables and the polarities of inputs/outputs
using the number of positive minterms and cofactors w.r.t. each variable.
Input: truth table F
1. Determine the polarity of F by the number of 1s in its truth table
2. Determine the polarity of each variable by the number of 1s in the negative cofactor w.r.t. that variable
3. Sort the input variables by the number of 1s in their negative cofactors and permute the inputs accordingly
Output: canonicized truth table F
A reasonable trade-off between accuracy and speed
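The three steps can be sketched in Python over truth tables packed into integers (bit m holds F on minterm m; input i is bit i of m). The slides fix the criteria but not the tie-breaking directions, so this sketch assumes: complement F if more than half of its minterms are 1; complement input i if its negative cofactor has more 1s than its positive cofactor; sort inputs by ascending negative-cofactor count, breaking ties by original index. Because ties are broken arbitrarily, NPN-equivalent functions usually, but not always, reach the same form.

```python
def count_ones(tt: int) -> int:
    return bin(tt).count("1")


def neg_cofactor_ones(tt: int, n: int, i: int) -> int:
    """Number of 1s of f restricted to x_i = 0."""
    return sum((tt >> m) & 1 for m in range(1 << n) if not (m >> i) & 1)


def flip_input(tt: int, n: int, i: int) -> int:
    """Truth table of f with input x_i complemented."""
    out = 0
    for m in range(1 << n):
        if (tt >> (m ^ (1 << i))) & 1:
            out |= 1 << m
    return out


def permute_inputs(tt: int, n: int, perm) -> int:
    """New input j plays the role of old input perm[j]."""
    out = 0
    for m in range(1 << n):
        old = 0
        for j in range(n):
            old |= ((m >> j) & 1) << perm[j]
        if (tt >> old) & 1:
            out |= 1 << m
    return out


def semi_canonical(tt: int, n: int) -> int:
    size = 1 << n
    # Step 1 (assumed convention): keep at most half of the minterms equal to 1.
    if count_ones(tt) > size // 2:
        tt = ~tt & ((1 << size) - 1)
    # Step 2 (assumed convention): negative cofactor gets no more 1s than the positive one.
    # Flipping x_i does not change the cofactor counts of the other inputs.
    total = count_ones(tt)
    for i in range(n):
        neg = neg_cofactor_ones(tt, n, i)
        if neg > total - neg:
            tt = flip_input(tt, n, i)
    # Step 3: sort inputs by ascending negative-cofactor count (stable sort on index).
    counts = [neg_cofactor_ones(tt, n, i) for i in range(n)]
    perm = sorted(range(n), key=lambda i: counts[i])
    return permute_inputs(tt, n, perm)


# x0 & (x1 | x2) and x2 & (x0 | x1) differ only by an input permutation.
print(semi_canonical(0b10101000, 3), semi_canonical(0b11100000, 3))
```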
10

An N-input library contains functions up to N variables.

Structures of all functions are represented as a shared AIG

Each output of the AIG is the root node of one logic structure.

When a library is loaded, the following actions are performed:
 A hash table is created to hash the outputs by their semi-canonical forms.
 For each structure, the area and pin-to-output delays are computed and stored.
11
Example of using pin-to-output delays to compute structure delay:
Arrival times (inputs a, b, c, d, e, f, g): {3, 3, 3, 5, 5, 4, 1}
+ Pin-to-output delays: {3, 2, 4, 5, 2, 3, 1}
= {6, 5, 7, 10, 7, 7, 2}, so the structure delay is the maximum, 10.
If one structure’s pin-to-output delay is worse than another structure’s with
respect to every input, the first structure is dominated.
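The delay computation and the dominance test can be sketched in Python as follows. The dominance check follows the slide's wording literally (strictly worse for every input); a non-strict variant would be an equally plausible reading and is an assumption either way.

```python
from typing import List, Sequence


def output_arrivals(arrival: Sequence[int], pin_delay: Sequence[int]) -> List[int]:
    """Arrival time at the structure output through each input pin."""
    return [a + d for a, d in zip(arrival, pin_delay)]


def structure_delay(arrival: Sequence[int], pin_delay: Sequence[int]) -> int:
    """Delay of the structure output: the latest of the per-pin arrivals."""
    return max(output_arrivals(arrival, pin_delay))


def is_dominated(cand: Sequence[int], other: Sequence[int]) -> bool:
    """Literal reading of the slide: cand is worse than other for every input."""
    return all(c > o for c, o in zip(cand, other))


arrival = [3, 3, 3, 5, 5, 4, 1]    # inputs a..g
pin_delay = [3, 2, 4, 5, 2, 3, 1]
print(output_arrivals(arrival, pin_delay))  # [6, 5, 7, 10, 7, 7, 2]
print(structure_delay(arrival, pin_delay))  # 10
```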
12

The LUT mapper if in ABC is used as a structural cut browser to generate K-input cuts, whose logic structures are added to the library.
Input: cut C
1. If cut C does not meet the requirements, return
2. Compute the Boolean function F of cut C as a truth table
3. Compute the semi-canonical form of F
4. Rebuild the structure of the cut in the library
5. If (the structure already exists or is dominated) return
6. Add a new primary output to store the structure in the hash table
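Steps 5-6 can be sketched with an in-memory dictionary. In LMS the structures live as outputs of a shared AIG; here the hypothetical `Structure` class, its literal-pair node encoding, and the `add_structure` helper are illustrative stand-ins, and the semi-canonical key is assumed to be computed beforehand (step 3).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Structure:
    """Hypothetical stand-in for one stored AIG structure."""
    nodes: Tuple[Tuple[int, int], ...]  # e.g. AND nodes as (fanin0, fanin1) literal pairs
    pin_delays: Tuple[int, ...]         # pin-to-output delay for each cut input
    area: int                           # e.g. number of AND nodes


Library = Dict[int, List[Structure]]    # semi-canonical truth table -> structures


def is_dominated(cand: Tuple[int, ...], other: Tuple[int, ...]) -> bool:
    """cand is worse than other with respect to every input."""
    return all(c > o for c, o in zip(cand, other))


def add_structure(lib: Library, key: int, s: Structure) -> bool:
    """Reject duplicates and dominated structures; otherwise store s under key."""
    entries = lib.setdefault(key, [])
    for e in entries:
        if e.nodes == s.nodes or is_dominated(s.pin_delays, e.pin_delays):
            return False
    entries.append(s)
    return True


lib: Library = {}
s1 = Structure(nodes=((2, 4),), pin_delays=(1, 1), area=1)
print(add_structure(lib, 0b1000, s1))  # True: first structure for this class
print(add_structure(lib, 0b1000, s1))  # False: the structure already exists
```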
13
Input: And-Inverter Graph
 For each node, in a topological order
 Compute several K-input cuts
 For each cut
▪ Compute truth table
▪ Look up in the library
▪ If there is no structure for this function
 Mark the cut to ensure it is not selected as best cut
▪ Else if the best structure found leads to a smaller AIG level
 Save the cut as the best cut
 If there is an improvement in level, update the AIG
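The per-node cut selection can be sketched in Python as follows. The sketch assumes each stored pin-delay tuple is already ordered like the cut leaves (in practice the permutation found during semi-canonicization must be undone), and the names `best_cut`, `levels`, and `library` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Cut = Tuple[Tuple[str, ...], int]  # (leaf names, semi-canonical key)


def best_cut(cuts: Sequence[Cut], levels: Dict[str, int],
             library: Dict[int, List[Tuple[int, ...]]]) -> Optional[Tuple[int, Tuple[str, ...]]]:
    """Return (level, leaves) of the cut whose best stored structure gives the lowest level."""
    best: Optional[Tuple[int, Tuple[str, ...]]] = None
    for leaves, key in cuts:
        structures = library.get(key)
        if not structures:
            continue  # no structure for this function: never select this cut
        for pin_delays in structures:
            level = max(levels[leaf] + d for leaf, d in zip(leaves, pin_delays))
            if best is None or level < best[0]:
                best = (level, leaves)
    return best


levels = {"a": 0, "b": 2, "c": 1}
cuts = [(("a", "b"), 1), (("a", "c"), 2), (("b", "c"), 3)]
library = {1: [(1, 1)], 2: [(2, 1), (1, 2)]}
current_level = 3
result = best_cut(cuts, levels, library)
if result is not None and result[0] < current_level:
    print("update AIG: new level", result[0], "via cut", result[1])
```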
14

The LMS algorithm is implemented in ABC. The LUT mapper if in ABC is used as:
 (a) A cut browser for computing the libraries
 (b) A mapper in the case study on AIG level minimization
Commands related to library construction:
rec_start: Starts the LMS recorder.
rec_add: Adds structures from benchmarks.
rec_filter: Removes structures with low frequency.
rec_merge: Merges two previously computed libraries.
rec_ps: Prints statistics for the currently loaded library.
rec_use: Transforms the internal library into the current network in ABC.
rec_stop: Deletes the current library.
Commands used to perform LMS mapping:
if -y -K <num> -C <num>
• -y enables level optimization by LMS
• -K <num> is the cut size
• -C <num> is the number of cuts used at each node
15




Introduction
Previous Work
Lazy Man’s Logic Synthesis (LMS)
Experimental Results
 Library Coverage
 6-input Library
 Optimize Delay After LUT Mapping

Conclusion
16

This experiment was performed to show that LMS has practical memory
requirements for functions up to 12 inputs.

Semi-canonical classes of all functions appearing in the cuts of the benchmark circuits (without synthesis) were collected, and the frequency of their appearance was recorded.
[Figure: number of functions (Function #) vs. occurrence frequency]
• ~2M classes in total
• ~740K classes for 90% of the functions
• ~400 MB for truth tables
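As a rough sanity check on memory, assume each class is stored as one uncompressed 12-input truth table of 2^12 bits (512 bytes); the slides do not say how the ~400 MB figure was obtained. Under that assumption, ~740K classes need about 380 MB, close to the quoted figure, while all ~2M classes would need about 1 GB.

```python
def truth_table_bytes(num_inputs: int) -> int:
    """Size of one uncompressed truth table: 2^n bits, i.e. 2^n / 8 bytes (n >= 3)."""
    return (1 << num_inputs) // 8


per_class = truth_table_bytes(12)  # 512 bytes per 12-input truth table
for classes in (740_000, 2_000_000):
    total = classes * per_class
    print(f"{classes:>9,} classes -> {total / 1e6:,.0f} MB")
```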
17


The goal of this experiment is
to derive a 6-input library used
in the following case study of
AIG level minimization.
The following ABC scripts are
used to collect structures:
• read file; st; rec_add;
• dc2; rec_add;
• if -K 8; bidec; st; rec_add;
• if -K 8; mfs; st; rec_add;
• if -K 8; bidec; st; rec_add;
• if -g -K 6; st; rec_add;
• if -g -K 6; st; rec_add;
Statistics of the precomputed 6-input library
Inputs | Classes # | Structures # | Ratio
2 | 3 | 3 | 1.00
3 | 32 | 88 | 2.75
4 | 2,430 | 12,673 | 5.22
5 | 98,208 | 471,973 | 4.81
6 | 1,148,556 | 5,202,924 | 4.53
Total | 1,249,229 | 5,687,661 | 4.55
• ~77MB AIGER file
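The Ratio column is the average number of stored structures per class (Structures # divided by Classes #). The short Python check below recomputes the ratios and the totals from the per-input rows of the table.

```python
# Per-input statistics from the table: inputs -> (classes, structures).
STATS = {
    2: (3, 3),
    3: (32, 88),
    4: (2_430, 12_673),
    5: (98_208, 471_973),
    6: (1_148_556, 5_202_924),
}

total_classes = sum(c for c, _ in STATS.values())
total_structures = sum(s for _, s in STATS.values())
for n, (c, s) in STATS.items():
    print(n, c, s, f"{s / c:.2f}")
print("Total", total_classes, total_structures, f"{total_structures / total_classes:.2f}")
```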
18




Two sets of benchmarks are used in this paper: 20 MCNC benchmarks
and 10 large Altera benchmarks.
LUT mapping was performed by the following scripts:
 Map: st; resyn2; if -K 4 or 6
 MapC: st; resyn2; dch -f; if -K 4 or 6
 SOPBC: st; if -gm -K 6; st; resyn2; dch -f; if -K 4 or 6
 LMSC: st; if -ym -K 6; st; resyn2; dch -f; if -K 4 or 6
Benchmarks were run on a workstation with an Intel Xeon quad-core CPU
and 256 GBytes RAM (~4GB used for the experiment)
The resulting networks were verified by command cec in ABC.
19
Altera benchmarks, 4-LUT mapping
Design | 4-LUT levels (Map / MapC / SOPBC / LMSC) | 4-LUT count (Map / MapC / SOPBC / LMSC)
carpat.blif | 68 / 68 / 53 / 40 | 38856 / 39842 / 42092 / 42371
fp_operators.blif | 119 / 116 / 88 / 76 | 17902 / 17401 / 18538 / 18800
oc_video_compression_systems_dct_opt.blif | 19 / 19 / 19 / 14 | 8995 / 9114 / 12221 / 11158
oc_video_compression_systems_jpeg_opt.blif | 20 / 19 / 17 / 13 | 10967 / 10940 / 14590 / 14321
radar20_opt.blif | 39 / 38 / 23 / 16 | 16834 / 17216 / 17717 / 20663
screen_saver_cyclone.blif | 18 / 18 / 16 / 17 | 35627 / 35183 / 35614 / 35900
sudoku_check.blif | 11 / 11 / 10 / 10 | 20998 / 20774 / 21094 / 21286
top_rs_decode.blif | 43 / 43 / 31 / 24 | 31381 / 30729 / 30798 / 30926
umass_weather.blif | 38 / 38 / 25 / 17 | 15821 / 15734 / 18250 / 18292
uoft_raytracer.blif | 70 / 69 / 58 / 30 | 33294 / 33852 / 37118 / 40147
Ratio | 1.00 / 0.99 / 0.80 / 0.63 | 1.00 / 1.00 / 1.11 / 1.13
LMSC reduced delay by 37% with an area increase of 13%
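The Ratio rows appear to be the arithmetic mean, over all designs, of each result normalized to the Map result. This is an inference (the slides do not state it), but recomputing it from the 4-LUT levels of the ten Altera designs reproduces 1.00 / 0.99 / 0.80 / 0.63, and 1 − 0.63 gives the quoted 37% delay reduction.

```python
# 4-LUT levels per design, in the column order Map, MapC, SOPBC, LMSC.
LEVELS_4LUT = {
    "carpat.blif": (68, 68, 53, 40),
    "fp_operators.blif": (119, 116, 88, 76),
    "oc_video_compression_systems_dct_opt.blif": (19, 19, 19, 14),
    "oc_video_compression_systems_jpeg_opt.blif": (20, 19, 17, 13),
    "radar20_opt.blif": (39, 38, 23, 16),
    "screen_saver_cyclone.blif": (18, 18, 16, 17),
    "sudoku_check.blif": (11, 11, 10, 10),
    "top_rs_decode.blif": (43, 43, 31, 24),
    "umass_weather.blif": (38, 38, 25, 17),
    "uoft_raytracer.blif": (70, 69, 58, 30),
}


def ratio_row(rows):
    """Average over designs of each column divided by the Map (first) column."""
    values = list(rows.values())
    width = len(values[0])
    return [sum(v[k] / v[0] for v in values) / len(values) for k in range(width)]


print(" ".join(f"{r:.2f}" for r in ratio_row(LEVELS_4LUT)))
```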
20
Altera benchmarks, 6-LUT mapping
Design | 6-LUT levels (Map / MapC / SOPBC / LMSC) | 6-LUT count (Map / MapC / SOPBC / LMSC)
carpat.blif | 35 / 35 / 35 / 27 | 29826 / 31098 / 32243 / 33321
fp_operators.blif | 67 / 66 / 57 / 50 | 10541 / 11118 / 12005 / 11982
oc_video_compression_systems_dct_opt.blif | 10 / 10 / 12 / 9 | 7349 / 7566 / 8816 / 8606
oc_video_compression_systems_jpeg_opt.blif | 10 / 10 / 12 / 9 | 7796 / 7822 / 8365 / 9537
radar20_opt.blif | 20 / 20 / 13 / 10 | 12351 / 12705 / 12871 / 14964
screen_saver_cyclone.blif | 13 / 12 / 12 / 12 | 27129 / 27113 / 27503 / 27373
sudoku_check.blif | 7 / 7 / 7 / 7 | 14542 / 14355 / 14707 / 15501
top_rs_decode.blif | 24 / 24 / 20 / 16 | 21271 / 21324 / 21668 / 21615
umass_weather.blif | 24 / 24 / 16 / 10 | 12196 / 11990 / 13287 / 14123
uoft_raytracer.blif | 36 / 35 / 31 / 19 | 26128 / 26666 / 29802 / 31356
Ratio | 1.00 / 0.99 / 0.92 / 0.74 | 1.00 / 1.02 / 1.08 / 1.13
LMSC reduced delay by 26% with an area increase of 13%
21
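Results on the 20 MCNC benchmarks
Design | 4-LUT level (Map / MapC / SOPBC / LMSC) | 4-LUT count (Map / MapC / SOPBC / LMSC) | 6-LUT level (Map / MapC / SOPBC / LMSC) | 6-LUT count (Map / MapC / SOPBC / LMSC)
alu4 | 7 / 7 / 7 / 7 | 694 / 701 / 702 / 714 | 5 / 5 / 5 / 5 | 503 / 525 / 520 / 532
apex2 | 8 / 8 / 8 / 8 | 871 / 867 / 874 / 890 | 6 / 6 / 6 / 6 | 691 / 683 / 728 / 711
b14 | 21 / 20 / 17 / 17 | 1761 / 1771 / 1913 / 1849 | 13 / 13 / 10 / 11 | 1275 / 1263 / 1517 / 1442
b15 | 22 / 22 / 21 / 21 | 3147 / 3103 / 3186 / 3233 | 15 / 15 / 14 / 13 | 2119 / 2211 / 2255 / 2419
b17 | 31 / 31 / 27 / 26 | 9676 / 9507 / 9527 / 9570 | 21 / 21 / 16 / 16 | 6510 / 6356 / 6667 / 6670
b20 | 23 / 22 / 19 / 19 | 3692 / 3587 / 3886 / 3829 | 15 / 15 / 12 / 12 | 2679 / 2619 / 3070 / 3044
b21 | 23 / 22 / 20 / 19 | 3768 / 3612 / 3847 / 3908 | 15 / 15 / 11 / 12 | 2701 / 2577 / 3114 / 3115
b22 | 23 / 23 / 19 / 19 | 5423 / 5280 / 5693 / 5729 | 15 / 15 / 12 / 11 | 3985 / 3847 / 4638 / 4677
clma | 13 / 13 / 12 / 12 | 4016 / 4008 / 4189 / 4150 | 9 / 9 / 8 / 8 | 2975 / 2894 / 3145 / 3246
des | 6 / 6 / 6 / 6 | 1228 / 1257 / 1249 / 1273 | 5 / 5 / 5 / 4 | 824 / 862 / 866 / 953
elliptic | 8 / 8 / 8 / 8 | 431 / 432 / 442 / 443 | 6 / 6 / 6 / 6 | 317 / 317 / 327 / 333
ex5p | 6 / 6 / 6 / 6 | 471 / 462 / 472 / 481 | 5 / 4 / 5 / 4 | 351 / 382 / 378 / 408
frisc | 20 / 20 / 19 / 16 | 2279 / 2261 / 2332 / 2279 | 13 / 12 / 11 / 9 | 1807 / 1811 / 1883 / 1948
i10 | 14 / 14 / 13 / 12 | 746 / 741 / 743 / 741 | 9 / 9 / 9 / 9 | 598 / 608 / 575 / 583
pdc | 9 / 8 / 8 / 8 | 1926 / 2047 / 1925 / 2075 | 7 / 7 / 6 / 7 | 1428 / 1350 / 1619 / 1416
s38584 | 9 / 9 / 8 / 8 | 4021 / 3978 / 3985 / 3980 | 6 / 6 / 6 / 6 | 2720 / 2802 / 2816 / 2831
s5378 | 6 / 6 / 5 / 5 | 459 / 451 / 470 / 468 | 4 / 4 / 4 / 4 | 356 / 355 / 369 / 358
seq | 6 / 6 / 6 / 6 | 946 / 935 / 948 / 941 | 5 / 5 / 5 / 5 | 685 / 668 / 707 / 696
spla | 9 / 9 / 9 / 8 | 1899 / 1803 / 1860 / 1928 | 7 / 7 / 6 / 6 | 1414 / 1361 / 1445 / 1455
tseng | 13 / 13 / 12 / 10 | 756 / 800 / 743 / 809 | 8 / 8 / 6 / 6 | 648 / 694 / 689 / 731
Ratio | 1.00 / 0.99 / 0.92 / 0.90 | 1.00 / 1.00 / 1.02 / 1.03 | 1.00 / 0.99 / 0.90 / 0.88 | 1.00 / 1.00 / 1.07 / 1.08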
4-LUTs: LMSC reduced delay by 10% with an area increase of 3%
6-LUTs: LMSC reduced delay by 12% with an area increase of 8%
22

A new method to harvest and re-use circuit structures
produced by different tools on benchmark circuits

The “lazy” approach is made practical by
 A semi-canonical form to reduce the number of equivalence classes
 Using AIGs to store precomputed libraries in memory and on disk
 Using truth tables to manipulate Boolean functions


As a case study, the proposed approach was applied
to improve delay after FPGA mapping
For industrial benchmarks, compared to SOP balancing,
 the delay was reduced by 17% (18%) for LUT4 (LUT6)
 the area penalty was 2% (5%)
23

Improving implementation
 Reducing memory by using a low-memory AIG
 Building libraries in terms of multi-input gates
 Filtering libraries based on their performance
 Giving the user control over the area increase

Continuing experiments
 Performing case studies with larger functions
 Evaluating delay improvements after P&R
24
Deriving a circuit for a Boolean function or improving an
available circuit are typical tasks solved by logic synthesis.
Numerous algorithms in this area have been proposed and
implemented over the last 50 years. This paper presents a
“lazy” approach to logic synthesis based on the following
observations: (a) optimal or near-optimal circuits for many
practical functions are already derived by the tools, making it
unnecessary to implement new algorithms or even run the
old ones repeatedly; (b) larger circuits are composed of
smaller ones, which are often isomorphic up to a
permutation/negation of inputs/outputs. Experiments
confirm these observations. Moreover, a case-study shows
that logic level minimization using lazy man’s synthesis
improves delay after LUT mapping into 4- and 6-input LUTs,
compared to earlier work on high-effort delay optimization.
