Description
In old incarnations of rematch, we used a feature extractor that proved occasionally useful.
The feature extractor was, put simply, a histogram of basic-block sizes.
This could easily be augmented by adding other types of information to the vector, but it was useful enough on its own, so it seems like a good start.
Additional data piece: External cross-references (cross-references to objects outside the function, usually other functions through calls or global data access).
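To make the "external" distinction concrete, here is a minimal sketch in plain Python. It assumes the cross-references leaving a function have already been collected as `(from_ea, to_ea, kind)` tuples; that tuple layout, the `kind` labels and the function name are hypothetical, and in IDA the tuples would presumably be built from something like `idautils.XrefsFrom`.

```python
from collections import Counter

def count_external_xrefs(func_start, func_end, xrefs):
    """Count cross-references whose target lies outside [func_start, func_end).

    `xrefs` is an iterable of (from_ea, to_ea, kind) tuples; this layout is an
    assumption made for illustration only.
    """
    counts = Counter()
    for _from_ea, to_ea, kind in xrefs:
        # A target inside the function's address range is internal; skip it.
        if func_start <= to_ea < func_end:
            continue
        counts[kind] += 1
    return counts
```

The resulting per-kind counts could then fill their own portion of the feature vector, separate from the basic-block histogram.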
Tasks:
- Add Vector class
- Add distance matcher (based on HistMatch or a brand new one)
- Include external cross references.
- Include types of cross-references in a different vector portion.
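One possible shape for the first two tasks, as a rough sketch rather than a proposed design: a vector class that keeps the block-size histogram and the cross-reference counts as separate portions, plus a nearest-candidate matcher. The names `FeatureVector` and `closest`, the per-portion normalization and the choice of Euclidean distance are all assumptions here; nothing in this sketch reflects how HistMatch actually works.

```python
import math

class FeatureVector:
    """Hypothetical container with two portions: histogram and xref counts."""

    def __init__(self, histogram, xref_counts):
        self.histogram = list(histogram)
        self.xref_counts = list(xref_counts)

    @staticmethod
    def _normalize(part):
        # Scale a portion to sum to 1 so function size alone does not dominate.
        total = sum(part)
        return [value / total for value in part] if total else list(part)

    def as_list(self):
        # Each portion is normalized on its own, then the two are concatenated.
        return self._normalize(self.histogram) + self._normalize(self.xref_counts)

    def distance(self, other):
        # Euclidean distance between the concatenated vectors (assumed metric).
        pairs = zip(self.as_list(), other.as_list())
        return math.sqrt(sum((a - b) ** 2 for a, b in pairs))


def closest(source, candidates):
    """Return the index of the candidate nearest to `source`."""
    return min(range(len(candidates)), key=lambda i: source.distance(candidates[i]))
```

Normalizing each portion separately keeps a function with many cross-references from drowning out the histogram; whether that is desirable for matching would need to be checked against real data.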
A simple example:
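import numpy as np
from idautils import FuncItems
from idc import ItemSize

def block_size_histogram(ea):
    # InternalXrefsToItem / InternalXrefsFromItem stand for helpers (not yet
    # written) that report cross-references staying inside the function.
    vector = np.zeros((4096,))
    last = len(vector) - 1  # oversized blocks share the final bucket
    size = 0
    for item_ea in FuncItems(ea):
        # An item referenced from inside the function starts a new block.
        if InternalXrefsToItem(item_ea) and size:
            vector[min(size, last)] += 1
            size = 0
        size += ItemSize(item_ea)
        # An item referencing another item in the function ends the block.
        if InternalXrefsFromItem(item_ea):
            vector[min(size, last)] += 1
            size = 0
    if size:  # count the trailing block
        vector[min(size, last)] += 1
    # Skip functions with too few blocks to give a meaningful histogram.
    return vector if np.sum(vector) > 8 else None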