Skip to content

Latest commit

 

History

History
402 lines (324 loc) · 14.7 KB

doop-101.md

File metadata and controls

402 lines (324 loc) · 14.7 KB

Doop 101

Our running example can be found in Example.java.

We will run a lightweight naive analysis (-a micro option) on the generated jar file (-i Example.jar option). This analysis has only a few basic rules but it's a good representative skeleton of actual analyses. Since Doop performs a whole program analysis, the library will be analyzed along with application code. We can specify the desired version for the library code of the platform (--platform java_8 or --platform android_24 option). If no platform is provided by the user, it is set to a default value (java_8).

#!bash
$ ./doop -a micro -i docs/doop-101-examples/Example.jar --stats none

After the analysis has run, we can gather results by examining the database. For example, we can view the table generated for the call graph relation by viewing the following file:

#!bash
$ less last-analysis/CallGraphEdge.csv

Analysis structure

Input facts are auto-generated by the framework (using Soot and then imported to the database so our rules can process them.

The input schema and the rules for the micro analysis can be found in file souffle-logic/analyses/micro/analysis.dl.

For example, the following rule states that when we have a heap allocation of a ?heap object that is assigned to variable ?var inside a method deemed reachable by our analysis, then we can infer that the variable may point to this heap object.

#!java
VarPointsTo(?var, ?heap) :-
  AssignHeapAllocation(?heap, ?var, ?inMethod),
  Reachable(?inMethod).

Furthermore, when we have some kind of assignment (direct or indirect) from one variable to another, and we know that the source variable may point to some heap object, then the target variable may point to the same heap object.

#!java
VarPointsTo(?to, ?heap) :-
  Assign(?to, ?from),
  VarPointsTo(?from, ?heap).

Notice here that this rule is recursive; previously known facts about the VarPointTo relation may lead to the inference of additional facts. Doop analysis rules are mutually recursive in complex ways.

Accessing the database

After the end of an analysis, a symbolic link for the resulting database can be found under the results directory. Also, for convenience, a second symbolic link is created at top level called last-analysis, each time pointing to the latest successful analysis.

Analysis schema and rules

Note that micro used here attempts to be self-contained, so it contains all core login in the same file. To examine the other analyses of the Doop framework, follow this structure:

  • Their input schema can be found in souffle-logic/facts/flow-insensitivite-schema.dl.
  • The rules for an analysis A, can be found in souffle-logic/analyses/$A/analysis.dl.

Running custom logic

There are two ways to process the analysis results: we can run custom logic alongside the analysis logic (while the analysis runs), or let the analysis compute its results and post-process them.

To illustrate these two approaches, assume we want to find all variable-points-to information for method Example.test(), when doing a context-insensitive analysis.

Running custom logic inside the analysis

Put the following logic in file extra.dl:

#!java
.decl Temp(v: Var, h: Value)

Temp(v, h) :-
  mainAnalysis.VarPointsTo(_, h, _, v),
  Var_DeclaringMethod(v, "<Example: void test(int)>").

.output Temp

Note that some relations belong to the "mainAnalysis" component and thus must be qualified.

Then, run the analysis using the custom logic:

#!bash
$ ./doop -a context-insensitive -i docs/doop-101-examples/Example.jar --stats none --extra-logic extra.dl

The new relation is now included in the results:

#!bash
$ cat last-analysis/Temp.csv
<Example: void test(int)>/@this <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l0#_0 <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l3#_32        <Example: void test(int)>/new Cat/1
<Example: void test(int)>/l4#_33        <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack5       <Example: void test(int)>/new Dog/0
<Example: void test(int)>/$stack6       <Example: void test(int)>/new Cat/1
<Example: void test(int)>/$stack7       <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack8       <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2#_26        <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_1#_28  <Example: void test(int)>/new Dog/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Dog/0

The advantage of running custom logic this way is that it has access to all analysis relations (not just those written on disk). The disadvantage is that changes to this custom logic require running the analysis again (and waiting for the analysis logic to be recompiled).

Running custom logic after the analysis

In a file temp.dl put the code:

#!java
.decl Var_DeclaringMethod(v: symbol, m: symbol)
.input Var_DeclaringMethod(IO="file", filename="Var-DeclaringMethod.facts", delimiter="\t")

.decl VarPointsTo(c1: symbol, h: symbol, c2: symbol, v: symbol)
.input VarPointsTo(IO="file", filename="VarPointsTo.csv", delimiter="\t")

.decl Temp(v: symbol, h: symbol)
Temp(v, h) :-
  VarPointsTo(_, h, _, v),
  Var_DeclaringMethod(v, "<Example: void test(int)>").

.output Temp

Copy the Var-DeclaringMethod.facts so that they are in the same directory as the output relation VarPointsTo (replace $id with your analysis id):

#!bash
$ cp out/$id/facts/Var-DeclaringMethod.facts out/$id/database/

Run the query and view its results:

#!bash
$ souffle -F out/$id/database/ temp.dl
$ cat Temp.csv
<Example: void test(int)>/@this <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l0#_0 <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l3#_32        <Example: void test(int)>/new Cat/1
<Example: void test(int)>/l4#_33        <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack5       <Example: void test(int)>/new Dog/0
<Example: void test(int)>/$stack6       <Example: void test(int)>/new Cat/1
<Example: void test(int)>/$stack7       <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack8       <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2#_26        <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_1#_28  <Example: void test(int)>/new Dog/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_2#_29  <Example: void test(int)>/new Dog/0

Doop & the DaCapo benchmarks suite

We frequently analyze various programs from the DaCapo Benchmarks suite using a variety of advanced analyses. E.g., let's analyze the antlr benchmark using a 2 type-sensitive analysis.

#!bash
$ ./doop -a 2-type-sensitive+heap -i benchmarks/dacapo-2006/antlr.jar --dacapo --platform java_8

Towards the end of execution, Doop will report a set of metrics gathered from the analyzed program. Those metrics are computed through the use of various queries on the resulting database. Those can be found under souffle-logic/addons/statistics.


Auxiliary tools

mkjar

You can use bin/mkjar to easily generate a jar file from a single java file. The generated jar file will contain local variable debugging information (e.g., variable names).

Example:

#!bash
$ ./bin/mkjar Example.java 1.8
added manifest
adding: Dog.class(in = 292) (out= 210)(deflated 28%)
adding: Animal.class(in = 434) (out= 280)(deflated 35%)
adding: Cat.class(in = 505) (out= 296)(deflated 41%)
adding: Example.class(in = 1055) (out= 653)(deflated 38%)

bytecode2jimple

You can use bytecode2jimple to easily generate Jimple (or Shimple--the ssa variant) from a jar file. For more information, invoke bytecode2jimple without arguments.

Example:

#!bash
$ ./bin/bytecode2jimple -lsystem -d jimple-dir Example.jar
$ ls jimple-dir
Animal.jimple  Cat.jimple  Dog.jimple  Example.jimple

LogicBlox-only functionality (legacy)

This section is about interacting with the legacy LogicBlox analysis logic. This analysis logic is no longer maintained but may be useful for experimenting with published artifacts or other past work.

Get a predicate's entries

As we already saw, the easiest way to interact with the database is to simply print all the entries of a certain predicate.

#!bash
$  bloxbatch -db last-analysis -print FieldPointsTo
predicate: FieldPointsTo(HeapAllocation, FieldSignature, HeapAllocation)
...
/--- start of FieldPointsTo facts ---\
  [113914]Example.test/new Cat/1, [11417]<Cat: Cat parent>, [113915]Example.test/new Cat/2
\---- end of FieldPointsTo facts ----/

Doop represents (and abstracts away) objects by using their allocation point in the program. Example.test/new Cat/1 refers to the second (zero-based indexing) Cat object allocated inside method Example.test().

As expected, the parent field of the second Cat object may point to the third Cat object.

Query 1

Our first query is to ask for VarPointTo entries of variables declared in Example.morePlay().

#!bash
$ bloxbatch -db last-analysis -query \
'_(?var, ?heap) <- VarPointsTo(?var, ?heap), Var:DeclaringMethod(?var, "<Example: void morePlay(Cat)>").'
  Example.morePlay/@this, Example.main/new Example/0
  Example.morePlay/r0, Example.main/new Example/0
  Example.morePlay/r3, Example.test/new Cat/1
  Example.morePlay/r4, Example.test/new Cat/1
  Example.morePlay/r1, Example.test/new Cat/1
  Example.morePlay/@param0, Example.test/new Cat/1
  Example.morePlay/r2, Example.test/new Cat/2
  Example.morePlay/r3, Example.test/new Cat/2
  Example.morePlay/r4, Example.test/new Cat/2

The string provided to the -query option can be a set of left and right arrow Datalog rules. Newly defined predicates have to start with _ since they will only exist for the duration of the query evaluation. Refmode values can be used directly, and the engine will automatically substitute them with their internal IDs. E.g., the following part

#!java
Var:DeclaringMethod(?var, "<Example: void morePlay(Cat)>")

is equivalent to

#!java
Var:DeclaringMethod(?var, ?method), MethodSignature:Value(?method:"<Example: void morePlay(Cat)>")

Note here that Doop analyzes Java bytecode. Input facts are generated using Soot, which transforms Java bytecode to Jimple, a language based on three address code. As a result new temporary variables are introduced and also original variable names might be lost (they can be retained through specific flags in javac and Soot).

Query 2

A more advanced query can be found in query2.logic. Essentially, we compute a transitive closure on the CallGraphEdge predicate. The logic used in a query can be as complicated as in any "normal" Datalog program.

#!bash
_path(?fromMethod, ?toMethod) <-
  CallGraphEdge(?invocation, ?toMethod),
  Instruction:Method[?invocation] = ?fromMethod.

_path(?fromMethod, ?toMethod) <-
  _path(?fromMethod, ?toMethodMid),
  CallGraphEdge(?invocation, ?toMethod),
  Instruction:Method[?invocation] = ?toMethodMid.
#!bash
$ bloxbatch -db last-analysis -query -file docs/doop-101-examples/query2.logic
  <Example: void main(java.lang.String[])>, <Example: void test(int)>
  <Example: void test(int)>, <Example: void morePlay(Cat)>
  <Example: void main(java.lang.String[])>, <Example: void morePlay(Cat)>
  <Example: void test(int)>, <Cat: void setParent(Cat)>
  <Example: void main(java.lang.String[])>, <Cat: void setParent(Cat)>
  <Example: void test(int)>, <Cat: Cat getParent()>
  <Example: void morePlay(Cat)>, <Cat: Cat getParent()>
  <Example: void main(java.lang.String[])>, <Cat: Cat getParent()>
  <Example: void test(int)>, <Cat: void play()>
  <Example: void morePlay(Cat)>, <Cat: void play()>
  <Animal: Animal playWith(Animal)>, <Cat: void play()>
  <Example: void main(java.lang.String[])>, <Cat: void play()>
  <Example: void test(int)>, <Dog: void play()>
  <Example: void main(java.lang.String[])>, <Dog: void play()>
  <Example: void test(int)>, <Animal: Animal playWith(Animal)>
  <Example: void morePlay(Cat)>, <Animal: Animal playWith(Animal)>
  <Example: void main(java.lang.String[])>, <Animal: Animal playWith(Animal)

The line Instruction:Method[?invocation] = ?fromMethod found in the previous query uses a special form of predicate known as a functional predicate. Those are similar to normal ones, but they act like a map. Values found between the square brackets are mapped to only on value on the right.

Query 3

For example, one metric is the computation of casts that potentially may fail. It joins input facts as well as facts computed during execution to infer casts where the related variable may point to an object that is incompatible with the type of the cast.

#!java
_Stats:Simple:PotentiallyFailingCast(?type, ?from, ?to) <-
    _Stats:Simple:ReachableCast(_, ?type, ?to, ?from),
    Stats:Simple:InsensVarPointsTo(?heap, ?from),
    HeapAllocation:Type[?heap] = ?heaptype,
    ! AssignCompatible(?type, ?heaptype).

The use of _ as a predicate parameter denotes that we don't care for a specific value. It represent a parameter that is not bound.

The above query can be found isolated in query3.logic.

#!bash
$ bloxbatch -db last-analysis -query -file docs/doop-101-examples/query3.logic

Aggregate functions

Datalog supports the use of aggregation functions. One such function is count. E.g., let's to compute the total number of VarPointsTo entries.

#!bash
$ bloxbatch -db last-analysis -query \
'_[] = ?n <- agg<<?n = count()>> VarPointsTo(_, _, _, _).'
  4569312

Log Analyzer

Using additional runtime flags, you can emit extra debug information related to the evaluation process of the Datalog engine. Explaining what kind of information is collected, is out of the scope of this tutorial, but this can be fed to bin/log-analyzer.py which will then emit profile information for each Datalog rule.

Example:

#!bash
$ ./doop -a context-insensitive -i Example.jar -- -logLevel debugDetail@factbus > log.txt
$ ./bin/log-analyzer.py log.txt 
 ImpossibleExceptionHandler(?handler :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction) <-
   PossibleExceptionHandler(?handler :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction),
   ExceptionHandler:Before(?previous :: ExceptionHandler, ?handler :: ExceptionHandler),
   PossibleExceptionHandler(?previous :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction).
 => 7.27862s
...