On Abstraction Refinement for Program Analyses in Datalog

Report
On Abstraction Refinement for
Program Analyses in Datalog
Xin Zhang, Ravi Mangal, Mayur Naik
Georgia Tech
Radu Grigore, Hongseok Yang
Oxford University
Datalog for program analysis
Datalog
2
Programming Language Design and Implementation, 2014
6/10/2014
What is Datalog?
Datalog
3
Programming Language Design and Implementation, 2014
6/10/2014
What is Datalog?
Input relations: edge(i, j).
Output relations: path(i, j).
Rules: (1) path(i, i).
(2) path(i, k) :- path(i, j), edge(j, k).
Datalog
4
Least fixpoint computation:
Input: edge(0, 1), edge(1, 2).
path(0, 0).
path(1, 1).
path(2, 2).
path(0, 1) :- path(0, 0), edge(0, 1).
path(0, 2) :- path(0, 1), edge(1, 2).
Programming Language Design and Implementation, 2014
6/10/2014
Why Datalog?
If there exists a path from a to b, and
there is an edge from b to c, then there
exists a path from a to c:
path(a, c) :- path(a, b), edge(b, c).
5
Datalog
Programming Language Design and Implementation, 2014
6/10/2014
Why Datalog?
k-object-sensitivity,
k = 2, ~100KLOC
6
Programming Language Design and Implementation, 2014
6/10/2014
Limitation
k-object-sensitivity,
k = 2, ~100KLOC
7
k-object-sensitivity,
k = 10, ~500KLOC
Programming Language Design and Implementation, 2014
6/10/2014
Program abstraction
Abstraction
Precision
8
Scalability
Programming Language Design and Implementation, 2014
6/10/2014
Parametric program abstraction
1 1 1 1 1
Abstraction
Precision
9
Scalability
Programming Language Design and Implementation, 2014
6/10/2014
Parametric program abstraction
1 0 1 1 0
Abstraction
Precision
10
Scalability
Programming Language Design and Implementation, 2014
6/10/2014
Parametric program abstraction: Example 1
Cloning depth K for each
call site and allocation site
1 0 1 1 0
Pointer Analysis
11
Programming Language Design and Implementation, 2014
6/10/2014
Parametric program abstraction: Example 2
Predicates to use as
abstraction predicates
1 0 1 1 0
Shape Analysis
12
Programming Language Design and Implementation, 2014
6/10/2014
Program abstraction
1 0 1 1 0
Datalog
Program
Datalog
Program
alias(p, q)?
13
0 1 1 0 0
alias(m, n)?
Programming Language Design and Implementation, 2014
6/10/2014
Program abstraction
1 0 1 1 0
Datalog
Program
0 1 1 0 0
Counterexample guided
refinement (CEGAR) via
MAXSAT
alias(p, q)?
14
Datalog
Program
alias(m, n)?
Programming Language Design and Implementation, 2014
6/10/2014
Pointer analysis example
f(){
v1 = new ...;
v2 = id1(v1);
v3 = id2(v2);
q2:assert(v3!= v1);
}
g(){
v4 = new ...;
v5 = id1(v4);
v6 = id2(v5);
q1:assert(v6!= v1);
}
id1(v){return v;}
id2(v){return v;}
15
Programming Language Design and Implementation, 2014
6/10/2014
Pointer analysis as graph reachability
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
a0
b0
c0
d0
7’
4
b1
d1
7
c1
16
6’’
2
c0
7’’
d0
5
d1
Programming Language Design and Implementation, 2014
6/10/2014
Graph reachability in Datalog
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
6’’
a0
b0
c0
d0
7’
4
c1
2
Query Tuple
c0
Output relations:
path(i, j)
b1
d1
7
7’’
d0
5
Input relations:
edge(i, j, n), abs(n)
d1
Original Query
q1: path(0, 5) assert(v6!= v1)
q2: path(0, 2) assert(v3!= v1)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
Input tuples:
edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),
…
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
16 possible abstractions in total
17
Programming Language Design and Implementation, 2014
6/10/2014
Desired result
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
6’’
a0
b0
c0
d0
7’
4
c1
2
c0
d1
7’’
d0
5
d1
Query
Answer
q1: path(0, 5)
a1b0c1d0
q2: path(0, 2)
Impossibility
18
Output relations:
path(i, j)
b1
7
Input relations:
edge(i, j, n), abs(n)
Rules:
(1) path(i, i).
(2) path(i, j) :- path(i, k), edge(k, j, n), abs(n).
Input tuples:
edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0),
…
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
6’’
a0
b0
c0
d0
7’
4
b1
d1
7
c1
2
Query
q1: path(0, 5)
q2: path(0, 2)
19
c0
7’’
d0
5
d1
path(0, 0).
path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0).
path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0).
path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0).
path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0).
path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0).
path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0).
path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0).
…
Eliminated Abstractions
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
6’’
a0
b0
c0
d0
7’
4
b1
d1
7
c1
2
Query
q1: path(0, 5)
q2: path(0, 2)
20
c0
7’’
d0
5
d1
Eliminated Abstractions
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
path(0,0)
edge(0,6,a0)
abs(a0)
abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0)
abs(c0) edge(1,7,c0) path(0,1)
abs(c0)
edge(7,2,c0)
path(0,2)
21
abs(b0)
path(0,4) edge(4,7,d0) abs(d0)
path(0,7)
edge(7,5,d0)
abs(d0)
path(0,5)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
path(0,0)
edge(0,6,a0)
abs(a0)
abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0)
abs(c0) edge(1,7,c0) path(0,1)
abs(c0)
edge(7,2,c0)
path(0,2)
abs(b0)
path(0,4) edge(4,7,d0) abs(d0)
path(0,7)
edge(7,5,d0)
abs(d0)
path(0,5)
a0∗c0∗
22
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
path(0,0)
edge(0,6,a0)
abs(a0)
abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0)
abs(c0) edge(1,7,c0) path(0,1)
abs(c0)
edge(7,2,c0)
path(0,2)
abs(b0)
path(0,4) edge(4,7,d0) abs(d0)
path(0,7)
edge(7,5,d0)
abs(d0)
path(0,5)
a0∗c0d0
23
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
path(0,0)
edge(0,6,a0)
abs(a0)
abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0)
abs(c0) edge(1,7,c0) path(0,1)
abs(c0)
edge(7,2,c0)
path(0,2)
abs(b0)
path(0,4) edge(4,7,d0) abs(d0)
path(0,7)
edge(7,5,d0)
abs(d0)
path(0,5)
a0b0∗d0
24
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 1 - derivation graph
a1
0
a0
6’
b0
3
b1
6
a1
c1
1
6’’
a0
b0
c0
d0
7’
4
b1
d1
7
c1
2
c0
7’’
d0
5
d1
Query
Eliminated Abstractions
q1: path(0, 5)
a0∗c0d0, a0b0∗d0 (4/16)
q2: path(0, 2)
25
a0∗c0∗
(4/16)
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
Programming Language Design and Implementation, 2014
6/10/2014
Encoded as MAXSAT
Hard Constraints
Soft Constraints
MAXSAT( ,  ,  , … , ( ,  )):
Find
Maximize
Subject to
26
solution  that
  ⊨  , 1 ≤  ≤ }
 ⊨ 0
Programming Language Design and Implementation, 2014
6/10/2014
Encoded as MAXSAT
Avoid all the
counterexamples
Minimize the
abstraction cost
Hard constraints:
ℎ(0, 0) ∧
(ℎ(0, 6) ∨ ¬ℎ(0, 0) ∨ ¬(0 )) ∧
(ℎ(0, 1) ∨ ¬ℎ(0, 6) ∨ ¬(0 )) ∧
(ℎ(0, 7) ∨ ¬ℎ(0, 1) ∨ ¬(0 )) ∧
(ℎ(0, 4) ∨ ¬ℎ(0, 6) ∨ ¬(0 )) ∧
…
Soft constraints:
abs(a0)⨁abs(a1), abs(b0)⨁abs(b1),
abs(c0)⨁abs(c1), abs(d0)⨁abs(d1).
27
Programming Language Design and Implementation, 2014
( 0
( 0
( 0
( 0
(¬ℎ 0, 2
(¬ℎ 0, 5
 ) ∧
 ) ∧
 ) ∧
 ) ∧
 ) ∧
 ) ∧
6/10/2014
Encoded as MAXSAT
Solution:
Hard constraints:
ℎ 0, 0 = true, ℎ 0, 6 = false,
ℎ 0, 1 = false, ℎ 0, 4 = false,
ℎ 0, 7 = false, ℎ 0, 2 = false,
ℎ 0, 5 = false, , ℎ 0, 6 = 0,
 0 = false,  0 = true,
 0 = true,  0 = true.
Soft constraints:
a1b0c0d0
Query
q1: path(0, 5)
q2: path(0, 2)
28
ℎ(0, 0) ∧
(ℎ(0, 6) ∨ ¬ℎ(0, 0) ∨ ¬(0 )) ∧
(ℎ(0, 1) ∨ ¬ℎ(0, 6) ∨ ¬(0 )) ∧
(ℎ(0, 7) ∨ ¬ℎ(0, 1) ∨ ¬(0 )) ∧
(ℎ(0, 4) ∨ ¬ℎ(0, 6) ∨ ¬(0 )) ∧
…
Eliminated Abstractions
a0∗c0d0, a0b0∗d0 (4/16)
a0∗c0∗
(4/16)
Programming Language Design and Implementation, 2014
( 0
( 0
( 0
( 0
(¬ℎ 0, 2
(¬ℎ 0, 5
 ) ∧
 ) ∧
 ) ∧
 ) ∧
 ) ∧
 ) ∧
6/10/2014
Iteration 2 and beyond
Iteration 1
Derivation 
Constraints
Datalog
solver
MAXSAT
solver
 ∧

a1b0c0d0
Query
29
Answer
Eliminated Abstractions
q1: path(0, 5)
a0∗c0d0, a0b0∗d0, a1∗ c0d0
(4/16)
q2: path(0, 2)
a0∗c0∗, a1∗c0∗
(4/16)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 2 and beyond
Iteration 2
Derivation 
Constraints
Datalog
solver
MAXSAT
solver
 ∧

a1b0c0d0
Query
30
Answer
Eliminated Abstractions
q1: path(0, 5)
a0∗c0d0, a0b0∗d0, a1∗ c0d0
(4/16)
q2: path(0, 2)
a0∗c0∗, a1∗c0∗
(4/16)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 2 and beyond
Iteration 2
Derivation 
Constraints
Datalog
solver
MAXSAT
solver
 ∧
 ∧

a1b0c0d0
Query
31
Answer
Eliminated Abstractions
q1: path(0, 5)
a0∗c0d0, a0b0∗d0, a1∗ c0d0
(4/16)
q2: path(0, 2)
a0∗c0∗
(4/16)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 2 and beyond
Iteration 2
Derivation 
Constraints
Datalog
solver
MAXSAT
solver
 ∧
 ∧

a1b0c1d0
Query
32
Answer
Eliminated Abstractions
q1: path(0, 5)
a0∗c0d0, a0b0∗d0, a1∗c0d0
(6/16)
q2: path(0, 2)
a0∗c0∗, a1∗c0∗
(8/16)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 2 and beyond
Iteration 3
Derivation 
Constraints
Datalog
solver
MAXSAT
solver
 ∧
 ∧

q1 is proven.
Query
Answer
q1: path(0, 5)
a1b0c1d0
q2: path(0, 2)
33
a1b0c1d0
Eliminated Abstractions
a0∗c0d0, a0b0∗d0, a1∗c0d0
(6/16)
a0∗c0∗, a1∗c0∗
(8/16)
Programming Language Design and Implementation, 2014
6/10/2014
Iteration 2 and beyond
Iteration 3
Derivation 
Constraints
Datalog
solver
q1 is proven.
 ∧
 ∧

 ∧
a1b0c1d0
Query
Answer
q1: path(0, 5)
a1b0c1d0
Impossibility
q2: path(0, 2)
34
MAXSAT
solver
q2 is impossible
to prove.
Eliminated Abstractions
a0∗c0d0, a0b0∗d0, a1∗c0d0
(6/16)
a0∗c0∗, a1∗c0∗, a1∗c1∗, a0∗c1∗ (16/16)
Programming Language Design and Implementation, 2014
6/10/2014
Mixing counterexamples
Iteration 1
Eliminated
Abstractions:
35
a0∗c0∗
Iteration 3
a1∗c1∗
Programming Language Design and Implementation, 2014
6/10/2014
Mixing counterexamples
Iteration 1
Eliminated
Abstractions:
36
a0∗c0∗
Mixed!
a0∗c1∗
Iteration 3
a1∗c1∗
Programming Language Design and Implementation, 2014
6/10/2014
Experimental setup

Implemented in JChord using off-the-shelf solvers:



Datalog: bddbddb
MAXSAT: MiFuMaX
Applied to two analyses that are challenging to scale:

k-object-sensitivity pointer analysis:


typestate analysis:


flow-insensitive, weak updates, cloning-based
flow-sensitive, strong updates, summary-based
Evaluated on 8 Java programs from DaCapo and Ashes.
37
Programming Language Design and Implementation, 2014
6/10/2014
Benchmark characteristics
classes
methods
bytecode(KB)
KLOC
toba-s
1K
6K
423
258
javasrc-p
1K
6.5K
434
265
weblech
1.2K
8K
504
326
hedc
1K
7K
442
283
antlr
1.1K
7.7K
532
303
luindex
1.3K
7.9K
508
295
lusearch
1.2K
8K
511
314
schroeder-m
1.9k
12K
708
460
38
Programming Language Design and Implementation, 2014
6/10/2014
Results: pointer analysis
4-object-sensitivity
abstraction
< 50%
size
resolved
queries
total
iterations
current baseline
final
max
7
0
< 3% of max
46
0
170
18K
10
470
18K
13
toba-s
7
javasrc-p
46
weblech
5
5
2
140
31K
10
hedc
47
47
6
730
29K
18
antlr
143
143
5
970
29K
15
luindex
138
138
67
1K
40K
26
lusearch
322
322
29
1K
39K
17
schroeder-m
51
51
25
450
58K
15
39
Programming Language Design and Implementation, 2014
6/10/2014
Performance of Datalog: pointer analysis
k = 4, 3h28m
Baseline
k = 3, 590s
k = 2, 214s
k = 1, 153s
40
lusearch
Programming Language Design and Implementation, 2014
6/10/2014
Performance of MAXSAT: pointer analysis
lusearch
41
Programming Language Design and Implementation, 2014
6/10/2014
Statistics of MAXSAT formulae
pointer analysis
42
variables
clauses
toba-s
0.7M
1.5M
javasrc-p
0.5M
0.9M
weblech
1.6M
3.3M
hedc
1.2M
2.7M
antlr
3.6M
6.9M
luindex
2.4M
5.6M
lusearch
2.1M
5M
schroeder-m
6.7M
23.7M
Programming Language Design and Implementation, 2014
6/10/2014
Conclusion
Datalog
MAXSAT
Abstraction
43
Programming Language Design and Implementation, 2014
6/10/2014
Conclusion
Datalog
MAXSAT
A(x, y):- B(x, z), C(z, y)
Soundness
Hard Constraints
 ,  ∨ ¬ ,  ∨ ¬(, )
Tradeoffs
Scalability vs. Precision
Sound vs. Complete
…
44
1
0
1
1
0
Soft Constraints
 0  
Programming Language Design and Implementation, 2014
6/10/2014

similar documents