Hi! I've been implementing a static analysis (somewhat similar to taint analysis, but with a more complex propagated state). Along the way I made improvements to several of the components involved, which vastly improved my results. I wanted to share them in case they are relevant to you 🙂
I wrote them in Kotlin in our internal code, but I'd be happy to make a PR if the changes would be useful to you.
Note that some of these changes are based on the assumption that the heap is not tracked.
1. Replacing virtual calls with calls to implementations
This greatly improved the results because I have a lot of cases like this:
The main method calls takeTheArgument on an Interface with a tainted argument, but the call is ignored, since the call is to an interface, and not the implementation.
What I did instead: when building the CFA, for every invokevirtual I also add calls to the implementations of the method in each subclass that implements it. In the resulting CFA, the single invokevirtual then has multiple outgoing calls instead of one, which seems to be handled fine by the existing algorithms, and taint analysis will also look at the implementations of the methods.
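To make the idea concrete, here is a minimal sketch of the target expansion on a toy class hierarchy. It does not use the ProGuardCORE API: `VirtualCallResolver`, its two maps, and the `ImplA`/`ImplB`/`ImplA2` classes are hypothetical names made up for illustration. It keeps the originally declared target and adds one target per subtype that provides its own body for the method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a class hierarchy; not the ProGuardCORE API.
final class VirtualCallResolver {
    // class name -> direct subtypes (subclasses or implementors)
    private final Map<String, List<String>> directSubtypes;
    // class name -> methods for which the class itself provides a body
    private final Map<String, Set<String>> implementedMethods;

    VirtualCallResolver(Map<String, List<String>> directSubtypes,
                        Map<String, Set<String>> implementedMethods) {
        this.directSubtypes = directSubtypes;
        this.implementedMethods = implementedMethods;
    }

    // Returns the declared target first, then every subtype implementation.
    List<String> resolveTargets(String declaredClass, String method) {
        List<String> targets = new ArrayList<>();
        targets.add(declaredClass + "." + method); // keep the original call edge
        for (String sub : directSubtypes.getOrDefault(declaredClass, List.of())) {
            collect(sub, method, targets);
        }
        return targets;
    }

    // Depth-first walk over the subtypes, adding each class that has a body.
    private void collect(String clazz, String method, List<String> targets) {
        String target = clazz + "." + method;
        if (implementedMethods.getOrDefault(clazz, Set.of()).contains(method)
                && !targets.contains(target)) {
            targets.add(target);
        }
        for (String sub : directSubtypes.getOrDefault(clazz, List.of())) {
            collect(sub, method, targets);
        }
    }
}

final class VirtualCallDemo {
    public static void main(String[] args) {
        VirtualCallResolver resolver = new VirtualCallResolver(
                Map.of("Interface", List.of("ImplA", "ImplB"),
                       "ImplA", List.of("ImplA2")),
                Map.of("ImplA", Set.of("takeTheArgument"),
                       "ImplB", Set.of("takeTheArgument")));
        // ImplA2 inherits the method without overriding it, so it adds no target.
        System.out.println(resolver.resolveTargets("Interface", "takeTheArgument"));
    }
}
```

In the real change, each resolved target would become an extra outgoing call edge on the same call node in the CFA.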
2. Starting from taint sources instead of the main entrypoint
Instead of directly using a BamCpaRun, I implemented a top-level CpaRun based on a custom transfer relation, which does the following:
- If the current position is an exit node, it backtracks to the method entry, uses the reduce operator to create the return state, and then uses all the callers as the successors.
- If the current position is a call, it uses a BamCpaRun to analyze that call. The BamCpaRun will have the called method as the main method.
- Otherwise, it just uses the regular JvmModelTrackingTransferRelation.
Because the analysis sometimes starts in the middle of a method, I modified the stack to return an empty state, instead of throwing, when something tries to pop an item from it while it is empty.
When, after backtracking, the return value of a function is not tainted, the algorithm stops, since the only thing that could cause the state to be tainted again is another call to a taint source, and that is analyzed separately.
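The stack change could look roughly like the sketch below. `LenientStack` and its `emptyState` supplier are hypothetical names, and the real stack lives inside the JVM abstract state rather than in a standalone class, so treat this only as an illustration of the "return an empty state instead of throwing" behavior.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical stand-in for the analysis stack; not the ProGuardCORE class.
final class LenientStack<S> {
    private final Deque<S> frames = new ArrayDeque<>();
    private final Supplier<S> emptyState; // produces the "empty" abstract state

    LenientStack(Supplier<S> emptyState) {
        this.emptyState = emptyState;
    }

    void push(S state) {
        frames.push(state);
    }

    // Popping an empty stack yields the empty state instead of throwing,
    // which is what allows the analysis to start mid-method.
    S pop() {
        return frames.isEmpty() ? emptyState.get() : frames.pop();
    }
}
```

The trade-off is that a genuine stack underflow bug is no longer detected, so this only seems appropriate when starting mid-method is intended.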
This allows me to drastically reduce the amount of code that is analyzed, since, instead of starting from a main method, I can start analysis from inside my source method. It also circumvented some issues I had, where some pieces of code were not reached from the main method. The program I'm analyzing is fairly huge, so it's kind of expected that coverage would not be perfect.
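As a rough illustration of the backtracking and the stop rule only (not of the nested BamCpaRun or the regular transfer relation), the sketch below walks a toy call graph upwards from a taint source. All names are hypothetical: `callers` maps a method to the methods that call it, and `returnsTaint` stands in for "the reduced return state of this method is tainted".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical call-graph-level sketch; the real analysis works on CFA nodes.
final class SourceFirstWalk {
    // Returns the methods that get analyzed, in visiting order.
    static List<String> methodsToAnalyze(String source,
                                         Map<String, List<String>> callers,
                                         Set<String> returnsTaint) {
        List<String> visited = new ArrayList<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(source);
        while (!worklist.isEmpty()) {
            String method = worklist.poll();
            if (visited.contains(method)) {
                continue; // already analyzed
            }
            visited.add(method);
            if (!returnsTaint.contains(method)) {
                continue; // return value not tainted: stop backtracking here
            }
            // Return value is tainted: continue in every caller.
            worklist.addAll(callers.getOrDefault(method, List.of()));
        }
        return visited;
    }
}

final class SourceFirstDemo {
    public static void main(String[] args) {
        Map<String, List<String>> callers = Map.of(
                "readSecret", List.of("loadConfig", "logStartup"),
                "loadConfig", List.of("main"),
                "logStartup", List.of("bootstrap"));
        Set<String> returnsTaint = Set.of("readSecret", "loadConfig");
        // bootstrap is never visited because logStartup does not return taint.
        System.out.println(
                SourceFirstWalk.methodsToAnalyze("readSecret", callers, returnsTaint));
    }
}
```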
3. Nested Call Filter
This one was again really useful for reducing the amount of code analyzed. It is a simple predicate that lets the implementation decide whether to actually analyze a function call. The predicate is called in an if statement in BamTransferRelation. If the method should not be entered, it is treated like an unknown method.
Given that:
- The run starts directly from the sources as entry points (see above)
- The heap is not being tracked (i.e. a forgetful heap model is being used)

a call where none of the operands is tainted can be filtered out, as there is no way for any code in that call to be tainted. If that code calls a taint source again, that source will also be an entry point, and it will be analyzed separately.
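A minimal sketch of such a filter follows. The interface name `NestedCallFilter`, the method `shouldEnter`, and the use of `Boolean` as the abstract state (true meaning tainted) are all assumptions made for illustration and are not existing ProGuardCORE types.

```java
import java.util.List;

// Hypothetical predicate consulted before a nested call is analyzed.
interface NestedCallFilter<S> {
    boolean shouldEnter(String calleeSignature, List<S> operands);
}

// Enters a call only if at least one operand is tainted.
// Here the abstract state is simplified to a Boolean: true means tainted.
final class TaintedOperandFilter implements NestedCallFilter<Boolean> {
    @Override
    public boolean shouldEnter(String calleeSignature, List<Boolean> operands) {
        return operands.contains(Boolean.TRUE);
    }
}

final class CallHandling {
    // Mirrors the described behavior: a filtered-out call is not entered
    // and is handled like a call to an unknown method.
    static String handleCall(NestedCallFilter<Boolean> filter,
                             String calleeSignature, List<Boolean> operands) {
        return filter.shouldEnter(calleeSignature, operands)
                ? "analyze " + calleeSignature
                : "treat " + calleeSignature + " as unknown";
    }
}
```

Note that this particular filter is only sound under the two conditions listed above; with a tracked heap, a callee could become tainted through a field even when none of its operands is.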
4. Taint analysis from field access
This one is fairly straightforward and probably an intended functionality, but it seems that the taint analysis algorithm doesn't implement it currently:
By subclassing JvmForgetfulHeapAbstractState and returning a tainted state from getFieldOrDefault, depending on the passed fqn, it's possible to use a field as a taint source, even without requiring one of the more complex heap models.
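The shape of that override might look like the sketch below. `ForgetfulHeap` is a hypothetical stand-in for JvmForgetfulHeapAbstractState, and the parameter list of `getFieldOrDefault` here is an assumption rather than the library's confirmed signature; the abstract state is again simplified to a Boolean, and the field name in the usage note is made up.

```java
import java.util.Set;

// Hypothetical stand-in for a forgetful heap: it remembers nothing,
// so every field read yields the supplied default.
class ForgetfulHeap<S> {
    S getFieldOrDefault(Object object, String fqn, S defaultValue) {
        return defaultValue;
    }
}

// Treats reads of selected fields as taint sources.
// The state is simplified to a Boolean: true means tainted.
final class FieldSourceHeap extends ForgetfulHeap<Boolean> {
    private final Set<String> taintedFields; // fully qualified field names

    FieldSourceHeap(Set<String> taintedFields) {
        this.taintedFields = taintedFields;
    }

    @Override
    Boolean getFieldOrDefault(Object object, String fqn, Boolean defaultValue) {
        if (taintedFields.contains(fqn)) {
            return Boolean.TRUE; // reading this field introduces taint
        }
        return super.getFieldOrDefault(object, fqn, defaultValue);
    }
}
```

For example, with `new FieldSourceHeap(Set.of("com/example/Config.secret"))`, a read of that field returns a tainted state while every other field read still falls back to the default.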