Skip to content

Writing analyses using McSAF

Sameer Jagdale edited this page May 7, 2015 · 5 revisions

This page gives a tutorial on how to write analysis using McSAF. We will start with a simple Name Collector, which collects the names of the variables on the LHS. Next we will implement a reaching definitions analysis. Note that we assume knowledge of Java and theoretical knowledge of compiler analyses. ###Name Collector Analysis The analysis for collecting the names is fairly straightforward. We do not require a fixed point to implement it. Hence, we will be using the AbstractDepthFirstAnalysis[1]. We start by extending the AbstractDepthFirstAnalysis class.

public class NameCollector extends AbstractDepthFirstAnalysis<Set<Name>> 

We call our analysis class NameCollector. AbstractDepthFirstAnalysis uses generics to define the type of the data structures used to hold the information collected from the analysis. In this case, we use a set of Name objects. The Name class is a node of the McSAF AST.

Next we declare the class members that we need for the analysis.

protected Set<Name> fullSet;
protected boolean inLHS = false;

We define constructors for the class.

public NameCollector(ASTNode<?> tree)
    {
        super(tree);
        fullSet = newInitialFlow();
    }

The method newInitialFlow() is an abstract method from the AbstractDepthFirstAnalysis class which we are required to implement.

 @Override
    public Set<Name> newInitialFlow()
    {
        return new HashSet<>();
    }

We then define public methods to allow other classes to access the collected names.

public Set<Name> getAllNames()
    {
        return fullSet;
    }
    public Set<Name> getNames( AssignStmt node )
    {
      return flowSets.get(node);
    }

AbstractDepthFirstAnalysis provides a method for every node in the McSAF AST and defines the default behaviour for each of them. We will override the methods that we require to collect the names, namely, caseAssignStmt, caseParameterizedExpr, caseCelIndexExpr, caseDotExpr, caseFunction, caseGlobalStmt.

Let us look at the implementation of each method.

Assignment Statement
 @Override
    public void caseAssignStmt( AssignStmt node )
    {
        inLHS = true;
        currentSet = newInitialFlow();
        analyze( node.getLHS() );
        flowSets.put(node,currentSet);
        fullSet.addAll( currentSet );
        inLHS = false;
    }

In the above method, we set the class member variable inLHS to true. inLhs is used to ensure that only names that are on the LHS of the statements are collected. Members flowSets and currentSet are inherited through the AbstractDepthFirstAnalysis class. flowSets is a map which maps a ASTNode to its equivalent set of names. Note that the the type of the value in the map can differ depending on the generic type that is provided. currentSet is used to hold the set of names for the current node. fullSet is a member defined in this class and is a set of all the names on the LHS in the AST.

Index Operations
    @Override
    public void caseParameterizedExpr( ParameterizedExpr node )
    {
        analyze(node.getTarget());
    }

    @Override
    public void caseCellIndexExpr(CellIndexExpr node) {
      analyze(node.getTarget());
    }

    @Override
    public void caseDotExpr(DotExpr node) {
      analyze(node.getTarget());
    }

The above methods all traverse down to the child node referring to the index variable. For example, in case of the parametrised expression a(i), we traverse down the node referring to a.

Functions
    @Override
    public void caseFunction(Function f) {
      for (Name name : f.getInputParams()) {
        currentSet.add(name);
      }
      fullSet.addAll(currentSet);
      analyze(f.getStmts());
      analyze(f.getNestedFunctions());
    }

In case of functions, we analyse the input parameters, function body and the nested functions.

Global Statement
    @Override
    public void caseGlobalStmt(GlobalStmt node) {
      for (Name name : node.getNames()) {
        currentSet.add(name);
      }
      fullSet.addAll(currentSet);
    }

For the global statement, we analyse use the getNames method to fetch all the names and add them to the currentSet and the fullSet.

You can find the full code for this analysis here

###Definitely Assigned Analysis Next let us look at the definitely assigned analysis. You can find more information on how the analysis works in theory here. The analysis is a forward analysis and hence we will be extending the ForwardAnalysis class to implement the analysis. Similar to the AbstractDepthFirstAnalysis class, ForwardAnalysis is also generic and hence a type needs to be specified.

public class DefinitelyAssignedAnalysis extends ForwardAnalysis<Set<String>> 

We then define the constructor similar to the previous name collector analysis.

public DefinitelyAssignedAnalysis(ASTNode tree) {
		super(tree);
	}

Additionally ForwardAnalysis also provides the currentOutSet, currentInSet, inflowSets and outflowSets data structures for holding the intermediate and final results, similar to the name collector analysis. We have to override and implement the newInitialFlow method.

@Override
	public Set<String> newInitialFlow() {
		return new HashSet<>();
	}

Additionally, we also have to define the methods merge and copy. The merge method defines the operation to be carried out at a merge point and copy defines what needs to be done for a copy.

@Override
	public Set<String> copy(Set<String> source) {
		return new HashSet<>(source);
	}

	@Override
	public Set<String> merge(Set<String> in1, Set<String> in2) {
		Set<String> out = new HashSet<>(in1);
		out.retainAll(in2);
		return out;
	}

In our case, the copy operation simply returns a new copy of the input. The merge operation performs a set intersection. We then override the methods of handling assignment statements, global statements , scripts and functions. We also override the method for the statement class which is the parent class for all statements.

@Override
	public void caseScript(Script node) {
		currentInSet = newInitialFlow();
		currentOutSet = new HashSet<>(currentInSet);
		node.getStmts().analyze(this);
		outFlowSets.put(node, currentOutSet);
	}

	@Override
	public void caseFunction(Function node) {
		currentInSet = newInitialFlow();
		currentOutSet = new HashSet<>(currentInSet);

		for (Name n : node.getInputParams()) {
			currentInSet.add(n.getID());
		}

		node.getStmts().analyze(this);
		outFlowSets.put(node, currentOutSet);

		node.getNestedFunctions().analyze(this);
	}

	@Override
	public void caseAssignStmt(AssignStmt node) {
		inFlowSets.put(node, currentInSet);
		currentOutSet = new HashSet<>(currentInSet);
		currentOutSet.addAll(node.getLValues());
		outFlowSets.put(node, currentOutSet);
	}

	@Override
	public void caseGlobalStmt(GlobalStmt node) {
		inFlowSets.put(node, currentInSet);
		currentOutSet = new HashSet<>(currentInSet);
		for (Name name : node.getNames()) {
			currentOutSet.add(name.getID());
		}
		outFlowSets.put(node, currentOutSet);
	}

	@Override
	public void caseStmt(Stmt node) {
		inFlowSets.put(node, currentInSet);
		currentOutSet = currentInSet;
		outFlowSets.put(node, currentOutSet);
	}

For functions, the input parameters are considered to be already defined and hence are added to the currentInSet. The body and the nested functions are then analysed. Since a script does not have any input parameters, the currentInSet is initialised and copied over to the currentOutSet. For the global statement, all the names are copied over to the currentOutSet. For assignment statements, we add all the values on the LHS to the outset apart from copying over all the values from inset. The implementation for the rest of the statements is handled by the caseStmt method. The inset is directly copied over to the outset. The complete analysis can be found here