Dynamic Purity Analysis for Java Programs

Dynamic Purity Analysis for Java
Haiying Xu, Christopher J.F. Pickett, Clark Verbrugge
School of Computer Science, McGill University
PASTE ’07 Conference, San Diego, CA
Presented by
Derek White
CSE 6329
Approach and Contributions
Design: Static Purity Analysis
Kinds of Dynamic Purity
Design: Dynamic Purity Analysis
Experimental Evaluation
Introduction
• Functional programming emphasizes application of
functions and avoids mutable data (side effects)
• Popular functional languages include Scheme, Haskell,
F#, OCaml, Scala, etc
• But you can program in a functional style using other languages
• “Pure” methods are methods that have functional (side
effect free) behavior
– Several definitions for purity, either no externally visible
side effects or the extent of side effects is limited
– Constraints may also be placed on level of dependency on
previously available state
Introduction (2)
• Why do we care if a method is pure?
• Helpful in program understanding, allows us
to isolate side effect free parts
• Verification in model checking
• Can be used to guide compiler optimization
– Better method purity info allows for less
conservative assumptions
– Caching (memoization) of function calls
Introduction (3)
• Static analysis has classified large numbers of methods as
pure, but the precise definitions of purity used vary
• Static analysis is conservative with respect to
runtime behavior
• It is unclear if some classes of pure methods have
any practical value
• So, the authors present a detailed examination of
method purity for Java
– Considering several definitions of purity
– Investigating both static and dynamic properties
Approach and Contributions
• Extends previous work on static analysis,
showing that different forms of purity occur at
different frequencies in a dynamic environment
• Design and implementation of dynamic purity
analysis, online and offline
– Scalable, handles SPECjvm98 at size 100 “with
acceptable overhead”
• Support for multiple purity definitions, both to
compare against static purity analysis and to identify
forms of purity that are only observable dynamically
Approach and Contributions (2)
• Three metrics for the evaluation of extent of
dynamic purity
– Method, invocation, bytecode
– These are applied to a static analysis as well as
dynamic purity definitions
• Implementation of memoization on JVM, a
traditional consumer of purity information
– Doesn’t achieve any speedup, just a functional
test module
Design: Static Analysis
• Previous work has found that a large number of methods have
weak purity properties; stronger purity properties result in fewer
pure methods
• Static work done here considers strong purity
– Method is “strongly pure” iff it doesn’t depend on OR change initial
state beyond primitive input values
– Must always return the same result for the same input
• Specifically, the method may not:
– Read/write heap or static data
– Allocate objects
– Invoke native methods
– Throw exceptions
– Invoke any non-pure methods
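The criteria above can be illustrated with a few small Java methods. This is a minimal sketch; the class and method names are hypothetical and do not come from the paper. Only `gcd` would satisfy the strong purity definition as stated on this slide.

```java
public class StrongPurityExamples {
    // Static state used only by the impure example below (hypothetical).
    static int counter = 0;

    // Strongly pure under the criteria above: primitive inputs only, no heap
    // or static access, no allocation, no native calls, no exceptions thrown
    // (the remainder is only computed when b is non-zero).
    public static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Not strongly pure: reads and writes static data, so the result
    // differs between calls.
    public static int nextId() {
        counter = counter + 1;
        return counter;
    }

    // Not strongly pure: allocates an array and accesses the heap, even
    // though the result depends only on n.
    public static int sumOfSquares(int n) {
        int[] squares = new int[n];
        for (int i = 0; i < n; i++) {
            squares[i] = i * i;
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += squares[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(gcd(12, 18));
        System.out.println(sumOfSquares(3));
        System.out.println(nextId());
    }
}
```

`sumOfSquares` is the kind of method that the weaker dynamic definitions later in the deck are intended to admit.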
Design: Static Analysis (2)
• Java class files used as input
• Flow-insensitive analysis done using Soot
Figure 1. Static analysis framework (components shown: class files, static analysis, class files + attributes, attribute parser, dynamic metrics)
Design: Static Analysis (3)
• Instructions within a method are scanned, any instructions
found to be impure mark the method as impure
• Interprocedural analysis is done next, propagating impurity
up from leaves of a CHA-based call graph
• Assumption is made that exceptions do not propagate up
the call stack unchecked
Impure instruction categories:
– Native code exec: native INVOKE*
– Heap access
– Static access
– synchronized INVOKE*, synchronized *RETURN,
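The interprocedural step described above can be sketched as a worklist over reversed call-graph edges. This is only an illustration under stated assumptions: the call graph is given as a plain map (its CHA-based construction is not shown), the method names are hypothetical, and the code is not the Soot-based implementation from the paper.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ImpurityPropagation {
    // Hypothetical call graph: main calls a and b, a calls c.
    public static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> g = new HashMap<>();
        g.put("main", Arrays.asList("a", "b"));
        g.put("a", Arrays.asList("c"));
        g.put("b", Collections.<String>emptyList());
        g.put("c", Collections.<String>emptyList());
        return g;
    }

    // Returns every method that is impure either locally (it contains an
    // impure instruction) or transitively (it may call an impure method).
    public static Set<String> propagate(Map<String, List<String>> callees,
                                        Set<String> locallyImpure) {
        // Reverse the edges so impurity can flow from callee to caller.
        Map<String, List<String>> callers = new HashMap<>();
        for (Map.Entry<String, List<String>> e : callees.entrySet()) {
            for (String callee : e.getValue()) {
                callers.computeIfAbsent(callee, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        Set<String> impure = new HashSet<>(locallyImpure);
        Deque<String> worklist = new ArrayDeque<>(locallyImpure);
        while (!worklist.isEmpty()) {
            String m = worklist.pop();
            for (String caller : callers.getOrDefault(m, Collections.<String>emptyList())) {
                // add() returns true only the first time, so each method is
                // processed at most once.
                if (impure.add(caller)) {
                    worklist.push(caller);
                }
            }
        }
        return impure;
    }

    public static void main(String[] args) {
        Set<String> impure = propagate(exampleGraph(), Collections.singleton("c"));
        System.out.println(impure);
    }
}
```

In the example graph, marking `c` as locally impure makes `a` and `main` impure as well, while `b` stays pure.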
Design: Static Analysis (4)
• Easily extended for dynamic evaluation of strong
static purity analysis
• Soot writes purity information to class file
• SableVM reads attributes and records:
– Pure methods reached at runtime
– Frequency of pure method invocations
– Percentage of bytecode executed by pure methods
• Provides indications about how static results
correlate with dynamic runtime behavior
Design: Dynamic Analysis
• Under the static analysis, a method is determined
to be pure for all possible executions or is impure
otherwise – may be too conservative
• Methods flagged impure by static analysis
may execute only pure control flow at runtime
• Goal of dynamic analysis is to identify pure
methods based on runtime behavior, increasing
number of pure methods found
Design: Dynamic Analysis (2)
Figure 2. Dynamic purity analysis framework
Design: Dynamic Analysis (3)
• Class files read into SableVM, instruction stream is
examined for purity
• Purity analysis module uses an online escape analysis
tracking writes to locally allocated objects
• Purity information can be used immediately by the VM
or written to a file as offline analysis for a later execution
• Offline analysis removes the execution overhead
• Clients of analysis are memoization and metrics used in
static analysis
• Four kinds of purity: strong, moderate, weak, once-impure
Kinds of Dynamic Purity: Strong
• Same criteria as strong static purity
• Only executed instructions are considered
• All methods start with unknown status
• Impure method information propagates up
the call stack
• As with static, once a method is identified as
impure it is conservatively always considered impure
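The bookkeeping on this slide can be sketched as a small event-driven tracker. This is a simplified illustration, not SableVM code: the event methods `enter`, `impureInstruction`, and `exit` are hypothetical hooks that a VM would call, and methods are identified by plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class DynamicStrongPurity {
    public enum Status { UNKNOWN, PURE, IMPURE }

    private final Map<String, Status> status = new HashMap<>();
    private final Deque<String> callStack = new ArrayDeque<>();

    // A method is entered: it starts as UNKNOWN the first time it is seen.
    public void enter(String method) {
        status.putIfAbsent(method, Status.UNKNOWN);
        callStack.push(method);
    }

    // The currently executing method ran an impure instruction: the
    // impurity propagates to every method on the call stack.
    public void impureInstruction() {
        for (String m : callStack) {
            status.put(m, Status.IMPURE);
        }
    }

    // A method returns: if nothing impure was observed so far, it is pure.
    // IMPURE is never overwritten, so impurity is sticky.
    public void exit() {
        String m = callStack.pop();
        if (status.get(m) == Status.UNKNOWN) {
            status.put(m, Status.PURE);
        }
    }

    public Status statusOf(String method) {
        return status.getOrDefault(method, Status.UNKNOWN);
    }
}
```

Because only executed instructions trigger `impureInstruction`, a method whose impure branch is never taken at runtime ends up classified as pure here.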
Kinds of Dynamic Purity: Moderate
• Objects can be created and altered as long as the objects do not escape
the method execution context
• A method may call an impure method as long as the impurity is contained
• Behavior must not depend on heap or global state; it is determined
completely by primitive input arguments
• Methods still cannot:
– Invoke native methods
– Read/write existing heap or static objects
– Perform monitor operations
– Throw exceptions
– Call moderately impure methods, unless modified data belongs to and is
contained in the caller
• Native System.arraycopy() and Object.clone() treated as heap access and
allocation instructions
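As a hedged illustration of the moderate definition, the first method below allocates and mutates an array that never leaves the method, while the second lets the same array escape through a static field. The class and method names are hypothetical.

```java
public class ModeratePurityExample {
    // Static field used only to demonstrate an escaping object (hypothetical).
    static int[] lastDigits;

    // Allocates and writes a local array, but the array never escapes and
    // the result depends only on the primitive argument. Not strongly pure
    // (it allocates), but it fits the moderate definition described above.
    public static int digitSum(int n) {
        int v = n < 0 ? -n : n;
        int[] digits = new int[10]; // an int has at most 10 decimal digits
        int count = 0;
        while (v != 0) {
            digits[count++] = v % 10;
            v /= 10;
        }
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += digits[i];
        }
        return sum;
    }

    // Same computation, but the array escapes via a static write, so the
    // method is no longer moderately pure.
    public static int digitSumLeaky(int n) {
        int v = n < 0 ? -n : n;
        int[] digits = new int[10];
        int count = 0;
        while (v != 0) {
            digits[count++] = v % 10;
            v /= 10;
        }
        lastDigits = digits; // escape: write to static data
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += digits[i];
        }
        return sum;
    }
}
```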
Kinds of Dynamic Purity: Moderate (2)
• Analysis needs to take a closer look at *NEW*,
• *NEW* instructions used to determine object locality
– Objects of a method are local if they do not escape the method,
or if they escape from a callee
– Frames in the call stack have an object table storing all currently
local objects
• PUTFIELD can allow objects local to the callee to escape
to the caller (requires an update to the object table)
classified depending on a frame’s object table
• Moderately pure methods can only use object parameters
for reference comparisons
Kinds of Dynamic Purity: Weak
• Allows heap reads so a method can inspect
object parameters
• Maintains the property that the method is a
function of its input
• GETFIELD is always safe
• PUTFIELD still is considered in the context of
the escape analysis
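A small sketch of the distinction, using a hypothetical `Point` class: reading the fields of an object parameter is allowed under weak purity, whereas writing to a field of an object that is not local to the method is not.

```java
public class WeakPurityExample {
    // Hypothetical data class used for illustration.
    public static final class Point {
        public int x;
        public int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Only reads fields of its parameters (GETFIELD), so the result is a
    // function of the input objects' contents. Assumes non-null arguments.
    public static int manhattan(Point p, Point q) {
        int dx = p.x - q.x;
        if (dx < 0) dx = -dx;
        int dy = p.y - q.y;
        if (dy < 0) dy = -dy;
        return dx + dy;
    }

    // Writes a field of an object that was not allocated locally (PUTFIELD
    // on a parameter), which the escape analysis would flag as impure.
    public static void moveRight(Point p) {
        p.x += 1;
    }
}
```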
Kinds of Dynamic Purity: Once-Impure
• Observed that some impure methods became
weakly pure after a first invocation
• Once-Impure is a weakly pure method that
was impure during its first execution
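One way to picture the once-impure category is as a small per-method state machine fed with the outcome of each invocation. This is an assumption about how such bookkeeping could look, not the paper's implementation; the `CANDIDATE` state and the `record` method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class OnceImpureClassifier {
    public enum Kind { UNSEEN, WEAKLY_PURE, CANDIDATE, ONCE_IMPURE, IMPURE }

    private final Map<String, Kind> kinds = new HashMap<>();

    // Record one finished invocation. 'wasImpure' would be supplied by the
    // weak purity analysis for that invocation.
    public void record(String method, boolean wasImpure) {
        Kind prev = kindOf(method);
        Kind next;
        if (prev == Kind.UNSEEN) {
            // First execution: an impure run makes the method a candidate.
            next = wasImpure ? Kind.CANDIDATE : Kind.WEAKLY_PURE;
        } else if (wasImpure) {
            // Impure on any later execution: plainly impure from now on.
            next = Kind.IMPURE;
        } else if (prev == Kind.CANDIDATE) {
            // Impure first, weakly pure afterwards.
            next = Kind.ONCE_IMPURE;
        } else {
            next = prev;
        }
        kinds.put(method, next);
    }

    public Kind kindOf(String method) {
        return kinds.getOrDefault(method, Kind.UNSEEN);
    }
}
```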
Memoization: Optimization with Purity
• All forms of purity mentioned previously ensure
that there is a unique result for any given input
• All are candidates for memoization
• Memoization caches the mapping from arguments to return
values, allowing the VM to bypass repeated
execution of a method with the same arguments
• Benefit from jumping past execution must
outweigh cost of looking up the return value in the memo table
Memoization (2)
• Method must be long enough to be worth optimizing
• After the first invocation, arguments are hashed together,
looked up in a hash table, and the stored return value is
substituted for invocation
• Primitive args stored directly, reference args are flattened
(gathering type and primitive fields)
– Done so that garbage collection doesn’t invalidate memo tables
• Direct object reference comparisons cannot be safely
memoized, so ACMP_* bytecodes must be considered
• Upper bounds on memory consumption limit the number
of method invocations that can be cached
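The lookup described above can be sketched at the library level. This is a minimal illustration, not the VM-level implementation from the paper: it assumes one table per memoized method, int return values, and a hypothetical `Point` class whose flattening is hard-coded; the `maxEntries` bound stands in for the memory limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

public class MemoTableSketch {
    // Hypothetical reference type used to show flattening.
    public static final class Point {
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    private final Map<List<Object>, Integer> table = new HashMap<>();
    private final int maxEntries;
    public int hits = 0;

    public MemoTableSketch(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Builds the lookup key: boxed primitives are stored directly, while a
    // Point is flattened to its type name and primitive fields so that the
    // key does not hold on to the object itself.
    static List<Object> key(Object... args) {
        List<Object> k = new ArrayList<>();
        for (Object a : args) {
            if (a instanceof Point) {
                Point p = (Point) a;
                k.add(Point.class.getName());
                k.add(p.x);
                k.add(p.y);
            } else {
                k.add(a);
            }
        }
        return k;
    }

    // Returns the cached result if the flattened arguments were seen
    // before; otherwise runs the method and caches the result if the
    // table still has room.
    public int call(ToIntFunction<Object[]> method, Object... args) {
        List<Object> k = key(args);
        Integer cached = table.get(k);
        if (cached != null) {
            hits++;
            return cached;
        }
        int result = method.applyAsInt(args);
        if (table.size() < maxEntries) {
            table.put(k, result);
        }
        return result;
    }
}
```

Two distinct `Point` objects with equal fields produce the same key, which is why a method that compares references directly cannot be memoized this way.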
Experimental Evaluation
• Experiments conducted using programs from the
SPECjvm98 benchmark suite
• Metrics
– Static method purity - percentage of all methods in
the call graph that are pure
– Dynamic method purity - percentage of methods
reached at runtime that are pure
– Dynamic invocation purity – percentage of method
invocations that are pure
– Dynamic bytecode purity – percentage of executed
bytecode stream belonging to pure methods
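The three dynamic metrics can be computed from per-method runtime counts. The sketch below assumes a hypothetical `MethodStats` record per reached method; the field names and the sample numbers in the usage note are illustrative, not measurements from the paper.

```java
import java.util.List;

public class PurityMetrics {
    // Hypothetical per-method record collected at runtime.
    public static final class MethodStats {
        final boolean pure;
        final long invocations;
        final long bytecodes; // bytecodes executed inside this method

        public MethodStats(boolean pure, long invocations, long bytecodes) {
            this.pure = pure;
            this.invocations = invocations;
            this.bytecodes = bytecodes;
        }
    }

    // Percentage of reached methods that are pure.
    public static double methodPurity(List<MethodStats> reached) {
        long pure = 0;
        for (MethodStats m : reached) {
            if (m.pure) pure++;
        }
        return percent(pure, reached.size());
    }

    // Percentage of method invocations that execute a pure method.
    public static double invocationPurity(List<MethodStats> reached) {
        long pure = 0, total = 0;
        for (MethodStats m : reached) {
            total += m.invocations;
            if (m.pure) pure += m.invocations;
        }
        return percent(pure, total);
    }

    // Percentage of the executed bytecode stream belonging to pure methods.
    public static double bytecodePurity(List<MethodStats> reached) {
        long pure = 0, total = 0;
        for (MethodStats m : reached) {
            total += m.bytecodes;
            if (m.pure) pure += m.bytecodes;
        }
        return percent(pure, total);
    }

    private static double percent(long part, long total) {
        return total == 0 ? 0.0 : 100.0 * part / total;
    }
}
```

With one small pure method invoked 90 times for 100 bytecodes and one impure method invoked 10 times for 900 bytecodes, the three metrics diverge sharply (50% of methods, 90% of invocations, 10% of bytecodes), which is why the deck reports all three.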
Experimental Evaluation: Static
• Experimental analysis includes both application and class library
code used
• On average, 13% of methods are found to be strongly pure
• Not all methods are invoked at runtime; dynamically,
5-6% of reached methods are statically identified as pure
• Many of these methods are small (20 inst or less) or are executed
Table 2. Strong Static Purity: Static methods row shows percentage of all methods in the
call graph identified as statically pure. Dynamic methods row shows percentage
of all dynamic method invocations that execute a statically pure method. Bytecode
row shows the percentage of the bytecode stream that is executed by a statically
pure method
Experimental Evaluation: Dynamic
• Strong dynamic purity is weaker than strong static purity
• The first rows of Tables 3, 4, and 5 show an improvement
over the runtime use of strong static purity in
rows 2-4 of Table 2
• Table 3 shows up to 4% more pure methods
reached with strong dynamic purity
• Some methods are invoked with significant
frequency; Table 4 shows 13% more pure
invocations for db
Experimental Evaluation: Dynamic (2)
Table 3. Dynamic method purity: All reached methods
Table 4. Dynamic invocation purity: Invoked methods that
are pure for dynamic purity definitions
Table 5. Dynamic bytecode purity: Bytecode instruction
streams that are pure for dynamic purity definitions
Experimental Evaluation: Dynamic (3)
• Reasons for impurity
Table 8. Reasons for dynamic impurity
Experimental Evaluation: Memoization
• Once-impure dynamic purity analysis is used; a
method is always invoked once prior to memoization
• Only applied to methods meeting cost effective
Table 11. Memoized/memoizable methods: Minimum method
size setting shown in far left column
Experimental Evaluation: Execution
Figure 3. Execution times: Minimum method size for memoization is set to 50
• Dynamic purity analyses identify considerable
amounts of purity
• Actual program behavior is not predictable
based only on static observations
• Little variation in purity over the benchmark
• May be the case that memoization is of
limited use for non-functional languages
