-
Notifications
You must be signed in to change notification settings - Fork 225
Pointer Analysis
WALA provides a framework for flow-insensitive Andersen-style pointer analysis. You can customize the pointer analysis context-sensitivity policy in various ways. We often find that for a particular client application, it is useful to customize the pointer analysis policy to focus analysis effort where it matters.
Currently, all pointer analysis implementations perform on-the-fly call graph construction.
A pointer analysis context-sensitivity policy can vary in two dimensions:
-
HeapModel
: how does the pointer analysis model abstract pointers and heap locations? -
ContextSelector
: how does the call graph construction clone methods based on context?
You can define your own pointer analysis policy by customizing a heap model and context selector. Also, WALA provides a number of built-in policies.
A HeapModel
tells the WALA pointer analysis how to abstract pointers
and heap locations. The key classes are
-
PointerKey
: a representation of an abstract pointer -
InstanceKey
: a representation of an abstract heap location.
For example, a PointerKey
may represent a local variable or a static field, or an instance field of objects from a particular allocation site, or other variants. More technically, a PointerKey
is the name for an equivalence class of pointers from the concrete program, which are collected into a single representative in the abstraction.
An InstanceKey
may represent all objects of a particular type or all objects from a particular allocation site, or all objects from a particular allocation site in a particular context, or other variants.
A HeapModel
provides call-backs for the pointer analysis to create
PointerKeys
and InstanceKeys
during analysis. You can customize the
policy by providing your own HeapModel
.
A HeapGraph
provides a convenient way to navigate the results of a
pointer analysis. The nodes in a HeapGraph
are PointerKeys
and
InstanceKeys
. There is an edge from a PointerKey
P to an
InstanceKey
I if and only if the pointer analysis indicates that P
may point to I. There is an edge from an InstanceKey
I to a
PointerKey
P if and only if P represents a field of an object
instance modeled by I, or P represents the array contents of array
instance I.
For example, given a HeapGraph
h, suppose you want to know all the
PointerKey
s that may alias a particular PointerKey
p. You first
use h.getSuccNodes(p)
to find all InstanceKeys
p may point to, and
for each such InstanceKey
i, use h.getPredNodes(i)
to find other
PointerKeys
that may alias p.
Recall that each call graph node represents a method in a particular
Context
. The
ContextSelector
object controls the policy by which the call graph builds contexts.
The simplest Context is
Everywhere.EVERYWHERE
,
which represents a single, global context for a method.
Other context-policies can represent call-string contexts, contexts naming the receiver object to implement object-sensitivity or other variants.
You can customize a context-sensitivity policy by providing a custom
ContextSelector
object.
Suppose we have a public entry point with the following signature:
public static void foo(Object x, Object y)
What will pointer analysis do with the parameters to this entry point? In
particular, what InstanceKey
abstractions will the pointer analysis
instantiate as the initial contents of the actual parameters x
and
y
?
WALA's pointer analysis uses the Entrypoint
interface to direct this
policy. The crucial method, in the Entrypoint
interface, is:
public abstract TypeReference[] getParameterTypes(int i);
getParameterTypes(i)
gives a set of types which will be instantiated
for the i
th parameter to the entry point.
For example, suppose we build an Entrypoint
implementation for foo
where
getParameterTypes(0) = { java.lang.Object, java.lang.String }
getParameterTypes(1) = { java.lang.Integer }
Then logically, the pointer analysis models the invocation of foo
as
follows:
Object o1 = new java.lang.Object();
Object o2 = new java.lang.String();
Object o3 = non-determinstic-choice ? o1 : o2;
Object o4 = new java.lang.Integer();
foo(o3,o4);
WALA generates synthetic IR instructions representing the above code in
a "fake root method" that invokes all the entrypoints; see FakeRootMethod
.
Given that IR, the pointer analysis will create InstanceKey
abstractions
according to the governing HeapModel
, as if the program encounters
allocations of the specified types in the FakeRootMethod
.
By default, most WALA pointer analysis implementation examples use the
DefaultEntrypoint
implementation of Entrypoint
. In this
implementation, getParameterTypes(i)
returns an array with a single
element, which is the declared type of the ith parameter. Most
production-level clients of WALA provide custom implementations of
Entrypoint
, specialized to the types of frameworks and clients being
analyzed.
Let's look at a few of WALA's built-in pointer analysis policies.
The ZeroCFA policy provides a simple, cheap, context-insensitive pointer
analysis. You can get a call graph builder using the ZeroCFA policy from
Util.makeZeroCFABuilder()
- The
HeapModel
introduces a singleInstanceKey
for every concrete type. That is, all objects of a particular type are represented by a single abstract object. - The
ContextSelector
uses a single global context for each method, except for some special cases dealing with reflection, described below.
The ZeroOneCFA policy provides an approximation of "standard" Andersen's pointer analysis, using allocation sites to name abstract objects. By default,
- The
HeapModel
introduces a singleInstanceKey
for every allocation site.
However, there are variants of ZeroOneCFA, depending on some
optimizations to the heap model provided by the
ZeroXInstanceKeys
class. See the source for more details. Briefly:
- VanillaZeroOneCFA (see
Util.makeVanillaZeroOneCFA
) turns off all optimizations; each allocation is handled separately. - ZeroOneCFA (see
Util.makeZeroOneCFABuilder
) turns on the following optimizations-
SMUSH_STRINGS:
individual
String
orStringBuffer
allocation sites are not disambiguated. A singleInstanceKey
represents allString
objects, and a singleInstanceKey
represents allStringBuffer
objects. - SMUSH_THROWABLES: individual exception objects are disambiguated by type, and not by allocation site.
-
SMUSH_PRIMITIVE_HOLDERS:
if a class has no reference fields, then all objects of that
type are represented by a single
InstanceKey
-
SMUSH_MANY:
if there are more than k (current k=25) allocation sites of
a particular type in a single method, then all these sites are
represented by a single
InstanceKey
. For example, a library class initializer method might allocate 10,000Font
objects; this optimization will not disambiguate the 10,000 separate allocation sites.
-
SMUSH_STRINGS:
individual
You can customize the policies in similar ways, depending on your client's needs.
The ZeroOneContainerCFA policy (see
Util.makeZeroOneContainerCFABuilder
)
extends the ZeroOneCFA policy with unlimited object-sensitivity for
collection objects. For any allocation sites in collection objects, the
allocation site is named by a tuple of allocation sites extending to the
outermost enclosing collection allocation. This policy can be relatively
expensive, but can be effective in disambiguating contents of the standard
collection classes. For this to be effective, note that ZeroOneContainer
modifies the ContextSelector
to clone methods based on the receiver
object's object-sensitive name. The
ContainerContextSelector
class manages this logic and the method
ContainerUtil.isContainer()
determines which classes are considered containers.
ZeroOneContainer also uses one level of call-string sensitivity for
certain methods that are either factories or are known to require such
precision, e.g., System.arraycopy()
.
ContainerContextSelector.isWellKnownStaticFactory()
determines which methods get this treatment.
For call-site sensitivity, The context of each CGNode consists of the last k call sites of the call chains. Wala uses a call CallString to record these call sites. You can use Util.makeNCFABuilder() to create a call graph builder based on call-site sensitivity. For the heap abstraction, Wala uses the CallStringContext of CGNode as the heap context to qualify it, and uses allocation sites (AllocationSiteInNode) to represent allocations.
Object-Sensitivity is regarded as arguably the best for pointer analysis in object-oriented languages (like Java). Contrary to K-CFA, a K-Obj analysis uses k object allocation sites as context elements. Wala uses AllocationString to record these allocation sites. Use Util.makeNObjBuilder() or Util.makeVanillaNObjBuilder() to create a call graph builder based on the obj sensitivity. The relevant Context information is as follows:
- For virtual method calls, the context is an AllocationStringContext, which contains an AllocationString inside.
- For static method calls, a few well-known static factory methods from the standard libraries use CallerSiteContext. Otherwise, directly copy the context of the last non-static method as the context of the called method.
- For an object(fixed at allocation) : The heap context is the same as the context of CGNode, has the same number of allocation sites.
To deal with reflection, WALA uses context-sensitivity policies, even if using a context-insensitive base policy like ZeroCFA.
Currently, all built-in pointer analysis policies treat calls to
Object.clone()
context sensitively, using the concrete type of the
receiver class as the context (see
CloneContextSelector
).
Also, clone()
has a different IR in each context, to properly model the semantics of the method for the corresponding receiver type (see CloneInterpreter
).
There is also significant support for other reflective constructs like
Class.forName()
, Class.newInstance()
, Method.invoke()
, etc. See
ReflectionContextSelector
and
ReflectionContextInterpreter
for the code that manages this reflection handling, and
AnalysisOptions.setReflectionOptions()
for tuning the handling.
Reflection usage and the size of modern libraries/frameworks make it very difficult to scale flow-insensitive points-to analysis to modern Java programs. For example, with default settings, WALA's pointer analyses cannot handle any program linked against the Java 6 standard libraries, due to extensive reflection in the libraries. To improve scalability (at the cost of some soundness in some cases) try the following:
- Use
AnalysisScope
exclusions to exclude classes / packages you know are irrelevant to the application. Exclusions are specified in an exclusions file passed to the appropriate method ofAnalysisScopeReader
. For an example, seeJava60RegressionExclusions.txt
. - Analyze an older version of the standard libraries, e.g., from Java 1.4.
- Reduce the context sensitivity of the pointer analysis (see above).
- Reduce WALA's reflection handling by calling
setReflectionOptions()
on theAnalysisOptions
an object used for pointer analysis. You may have to tinker to see what setting enables scalability for your application.
As of version 1.0.04, WALA also includes a demand-driven pointer analysis. See the Demand Pointer Analysis page for details.