new_architecture
TODO : each architecture requires a C file for the analyzer.
Architecture-specific files are in the folder plasma/lib/arch/<NEW_ARCH>.
Four files are mandatory to add a new architecture:
- utils.py: defines functions to detect jump/return/call/compare instructions and how instruction symbols must be printed (for example, add on x86 is printed as "+=").
- output.py: the implementation of the abstract class OutputAbs from plasma.lib.output.
- process_ast.py: here you can define functions that modify the AST after a decompilation.
- __init__.py: contains the list of all functions defined in process_ast.py.
Define two global variables :
OP_IMM = <ARCH>_OP_IMM
OP_MEM = <ARCH>_OP_MEM
Define a list of known function prologs. Due to a limitation in plasma.lib.analyzer.has_analyzer, a single instruction cannot be longer than 4 bytes.
PROLOGS = [
    [b"\x12\x34\x56"], # inst1, inst2, ...
    ...
]
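The 4-byte limit can be checked mechanically before a new PROLOGS list is committed. The sketch below is only an illustration: check_prologs is a hypothetical helper that is not part of plasma, and the byte sequences are example values (0x55 and 0x89 0xe5 encode push ebp and mov ebp, esp on 32-bit x86).

```python
# Example prolog table: each entry is a list of instructions,
# each instruction being the raw bytes of one opcode.
PROLOGS = [
    [b"\x55", b"\x89\xe5"],  # push ebp; mov ebp, esp (x86, 32 bits)
    [b"\x12\x34\x56"],       # placeholder bytes, as in the template above
]


def check_prologs(prologs, max_len=4):
    """Hypothetical helper: return every instruction longer than max_len bytes."""
    return [inst for seq in prologs for inst in seq if len(inst) > max_len]


too_long = check_prologs(PROLOGS)
print("instructions over the limit:", too_long)
```

An empty result means every instruction in the table respects the limit.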
Define a list that pairs each condition id with its opposite.
OPPOSITES = [
    [X86_INS_JE, X86_INS_JNE],
    ...
]
OPPOSITES = dict(OPPOSITES + [i[::-1] for i in OPPOSITES])
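The last line turns the list of pairs into a dictionary that can be queried in both directions. A small self-contained sketch, using made-up integer ids in place of the capstone constants (the INS_* names and values below are assumptions for illustration only):

```python
# Hypothetical instruction ids; real code imports them from capstone.<ARCH>.
INS_JE, INS_JNE, INS_JG, INS_JLE = 1, 2, 3, 4

OPPOSITES = [
    [INS_JE, INS_JNE],
    [INS_JG, INS_JLE],
]

# Append every pair reversed, then build the dict: each id maps to its
# opposite regardless of the order in which the pair was written.
OPPOSITES = dict(OPPOSITES + [pair[::-1] for pair in OPPOSITES])

print(OPPOSITES[INS_JE] == INS_JNE)   # forward lookup
print(OPPOSITES[INS_JNE] == INS_JE)   # reverse lookup
```

Each pair therefore only needs to be written once in the list.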
Define a dictionary containing a string for each instruction you want to print differently.
INST_SYMB = {
    X86_INS_JE: "==",
    X86_INS_JNE: "!=",
    ...
    X86_INS_XOR: "^=",
    X86_INS_OR: "|=",
    ...
}
Then implement all these functions:
def is_cmp(i):
    return i.id == <COMPARE_ID_INSTRUCTION>

def is_jump(i):
    return i.group(CS_GRP_JUMP)

def is_cond_jump(i):
    return i.group(CS_GRP_JUMP) and i.id != <UNCONDITIONAL_JUMP>

def is_uncond_jump(i):
    return i.id == <UNCONDITIONAL_JUMP>

def is_ret(i):
    return i.group(CS_GRP_RET)

def is_call(i):
    return i.group(CS_GRP_CALL)

def cond_symbol(ty):
    return INST_SYMB.get(ty, "UNKNOWN")

def inst_symbol(i):
    return INST_SYMB.get(i.id, "UNKNOWN")
Generally the condition is the same as the instruction id. On ARM, however, a condition can be set on each instruction; in that case use i.cc.
def invert_cond(i):
    return OPPOSITES.get(i.id, -1)

def get_cond(i):
    return i.id
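For an architecture like ARM, where the condition is carried by i.cc rather than by the instruction id, the two functions might look like the sketch below. FakeInst and the CC_* values are stand-ins invented for this example (real code would receive capstone instructions and use the architecture's own condition-code constants), and OPPOSITES is assumed to be keyed by condition code.

```python
from collections import namedtuple

# Stand-in for a capstone instruction: only the fields used here.
FakeInst = namedtuple("FakeInst", ["id", "cc"])

# Hypothetical condition codes, keyed the same way as OPPOSITES above.
CC_EQ, CC_NE = 1, 2
OPPOSITES = {CC_EQ: CC_NE, CC_NE: CC_EQ}


def get_cond(i):
    # The condition comes from the condition code, not the instruction id.
    return i.cc


def invert_cond(i):
    # -1 signals that no opposite is known, as in the generic version.
    return OPPOSITES.get(i.cc, -1)


inst = FakeInst(id=10, cc=CC_EQ)
print(get_cond(inst), invert_cond(inst))
```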
Two functions from plasma.lib.output may be useful: _imm and _add. The first prints an immediate value and the second prints a string. For RISC architectures you can get the operand size with self.gctx.dis.mode & CS_MODE_32.
COND_ADD_ZERO is a list of condition ids. A 0 must be appended after every condition whose id is in this list. Example for MIPS: beqz $t1, label -> if == 0.
ASSIGNMENT_OPS is a list of instruction ids indicating which instructions can be fused with a conditional instruction. Such an instruction must be an assignment, not a comparison (for example add, and, ...). The fusion must be implemented in <NEW_ARCH>.process_ast; otherwise you can leave this list empty.
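The effect of COND_ADD_ZERO on an unfused condition can be sketched as follows. The COND_* ids and the render_cond helper are hypothetical and exist only for this illustration; in plasma the equivalent logic belongs to _if_cond in output.py.

```python
# Hypothetical condition ids for a MIPS-like architecture.
COND_BEQZ, COND_BNEZ, COND_BEQ = 1, 2, 3

INST_SYMB = {COND_BEQZ: "==", COND_BNEZ: "!=", COND_BEQ: "=="}

# Conditions that implicitly compare against zero.
COND_ADD_ZERO = [COND_BEQZ, COND_BNEZ]


def render_cond(cond):
    """Return the condition text, appending ' 0' when the id requires it."""
    text = INST_SYMB.get(cond, "UNKNOWN")
    if cond in COND_ADD_ZERO:
        text += " 0"
    return text


print(render_cond(COND_BEQZ))  # implicit comparison against zero
print(render_cond(COND_BEQ))   # two-operand comparison, nothing appended
```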
from capstone import CS_MODE_32
from capstone.<ARCH> import ...
from plasma.lib.output import OutputAbs
from plasma.lib.arch.<NEW_ARCH>.utils import (inst_symbol, is_call, is_jump, is_ret, is_uncond_jump, cond_symbol)
COND_ADD_ZERO = [ ... ]
ASSIGNMENT_OPS = [ ... ]
class Output(OutputAbs):
In the function _sub_asm_inst you can define a specific display for each instruction. ret/call/jumps are printed later in the function _asm_inst, so you can't rewrite them here.
def _sub_asm_inst(self, i, tab=0):
    modified = False
    if self.gctx.capstone_string == 0:
        if i.id == <INSTRUCTION_ID>:
            # do something ...
            modified = True
        ...
    if not modified:
        if len(i.operands) > 0:
            self._add("%s " % i.mnemonic)
            self._operand(i, 0)
            k = 1
            while k < len(i.operands):
                self._add(", ")
                self._operand(i, k)
                k += 1
        else:
            self._add(i.mnemonic)
For a first test you can leave the function _operand empty and just do:
def _sub_asm_inst(self, i, tab=0):
    self._add(self.get_inst_str(i))
The function _operand is called on each operand of each instruction.
- i: capstone instruction
- num_op: index of the operand to print from i.operands
- hexa: whether an immediate operand must be printed in hexadecimal
- show_deref: used with memory accesses, it indicates whether *() should be printed. For example, the lea instruction on x86 sets show_deref to False.
- force_dont_print_data: if False and the operand is a pointer (immediate) to a string, the string is printed next to it. It is set to True for calls and jumps, so that a string is never printed.
def _operand(self, i, num_op, hexa=False, show_deref=True, force_dont_print_data=False):
    def inv(n):
        return n == CS_OP_INVALID
    op = i.operands[num_op]
    if op.type == CS_OP_IMM:
        # op_size is not defined in this template: compute it for your
        # architecture (for example from self.gctx.dis.mode & CS_MODE_32).
        self._imm(op.value.imm, op_size, hexa, force_dont_print_data=force_dont_print_data)
    elif op.type == CS_OP_REG:
        self._add(i.reg_name(op.value.reg))
    elif op.type == <ARCH>_OP_MEM:
        mm = op.mem
        printed = False
        # Does the access contain a register with a known value?
        # example : for x86 we can compute any access [eip + DISP]
        # We should call `self.deref_if_offset` for any known address.
        # This code is more or less generic, you just need to adapt it to the
        # architecture (a memory access can have a base, segment, index, disp,
        # shift (for arm), ...).
        if show_deref:
            self._add("*(")
        if not inv(mm.base):
            self._add("%s" % i.reg_name(mm.base))
            printed = True
        if mm.disp != 0:
            section = self._binary.get_section(mm.disp)
            if self.is_label(mm.disp) or section is not None:
                if printed:
                    self._add(" + ")
                self._imm(mm.disp, 0, True, section=section, print_data=False,
                          force_dont_print_data=force_dont_print_data)
            else:
                if printed:
                    if mm.disp < 0:
                        self._add(" - %d" % (-mm.disp))
                    else:
                        self._add(" + %d" % mm.disp)
                else:
                    self._add("%d" % mm.disp)
        if show_deref:
            self._add(")")
    # Are there other op.type values in this architecture?
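To see what the memory branch produces, the formatting of a base register plus displacement can be isolated in a standalone function. This is a simplified sketch: format_mem is a hypothetical helper, the base register is passed as an already-resolved name (or None), and the label/section lookup of the real code is left out.

```python
def format_mem(base, disp, show_deref=True):
    """Return the text for a memory access made of an optional base
    register name and an integer displacement."""
    text = ""
    if base is not None:
        text = base
    if disp != 0:
        if text:
            # A base was printed: show the displacement as an offset from it.
            if disp < 0:
                text += " - %d" % (-disp)
            else:
                text += " + %d" % disp
        else:
            # No base: the displacement is the whole address.
            text += "%d" % disp
    if show_deref:
        return "*(%s)" % text
    return text


print(format_mem("sp", 8))
print(format_mem("sp", -4))
print(format_mem(None, 16, show_deref=False))
```

With show_deref set to False only the address expression is returned, which is the behaviour described for lea on x86.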
The function _if_cond is used to print the statement if (...) in decompilation mode. cond is the condition id (returned by <NEW_ARCH>.utils.get_cond). fused_inst is the instruction fused with the jump (for example cmp with jne); it is None if no fusion was done. If fusion is not implemented, you can ignore this parameter.
This function must be reimplemented for the moment because x86 has a special case with the test instruction: only test reg1, reg1 is handled.
def _if_cond(self, cond, fused_inst):
    if fused_inst is None:
        self._add(cond_symbol(cond))
        if cond in COND_ADD_ZERO:
            self._add(" 0")
        return
    assignment = fused_inst.id in ASSIGNMENT_OPS
    if assignment:
        self._add("(")
    self._add("(")
    self._operand(fused_inst, 0)
    self._add(" ")
    if assignment:
        self._add(inst_symbol(fused_inst))
        self._add(" ")
        self._operand(fused_inst, 1)
        self._add(") ")
        self._add(cond_symbol(cond))
    else:
        self._add(cond_symbol(cond))
        self._add(" ")
        self._operand(fused_inst, 1)
    if (fused_inst.id != <CMP_INSTRUCTION> and
            (cond in COND_ADD_ZERO or assignment)):
        self._add(" 0")
    self._add(")")
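The following standalone sketch mirrors the logic of _if_cond with stand-in objects, to show the text it generates for a fused comparison and for a fused assignment. FakeInst, SketchOutput, render and all ids are assumptions made up for this example; operands are pre-formatted strings instead of capstone operands.

```python
# Hypothetical ids standing in for capstone constants.
INS_CMP, INS_SUB, COND_JNE, COND_JE = 1, 2, 3, 4
INST_SYMB = {COND_JNE: "!=", COND_JE: "==", INS_SUB: "-="}
COND_ADD_ZERO = []
ASSIGNMENT_OPS = [INS_SUB]


class FakeInst:
    """Minimal stand-in for a capstone instruction."""
    def __init__(self, id, operands):
        self.id = id
        self.operands = operands  # operand texts, already formatted


class SketchOutput:
    def __init__(self):
        self.tokens = []

    def _add(self, s):
        self.tokens.append(s)

    def _operand(self, i, num_op):
        self._add(i.operands[num_op])

    def _if_cond(self, cond, fused_inst):
        if fused_inst is None:
            self._add(INST_SYMB.get(cond, "UNKNOWN"))
            if cond in COND_ADD_ZERO:
                self._add(" 0")
            return
        assignment = fused_inst.id in ASSIGNMENT_OPS
        if assignment:
            self._add("(")
        self._add("(")
        self._operand(fused_inst, 0)
        self._add(" ")
        if assignment:
            # (op0 <assign-symbol> op1) <cond-symbol>
            self._add(INST_SYMB.get(fused_inst.id, "UNKNOWN"))
            self._add(" ")
            self._operand(fused_inst, 1)
            self._add(") ")
            self._add(INST_SYMB.get(cond, "UNKNOWN"))
        else:
            # op0 <cond-symbol> op1
            self._add(INST_SYMB.get(cond, "UNKNOWN"))
            self._add(" ")
            self._operand(fused_inst, 1)
        if fused_inst.id != INS_CMP and (cond in COND_ADD_ZERO or assignment):
            self._add(" 0")
        self._add(")")


def render(cond, fused_inst):
    out = SketchOutput()
    out._if_cond(cond, fused_inst)
    return "".join(out.tokens)


print(render(COND_JNE, FakeInst(INS_CMP, ["eax", "ebx"])))
print(render(COND_JNE, FakeInst(INS_SUB, ["eax", "ebx"])))
```

A fused assignment is wrapped in its own parentheses and compared against 0, while a fused cmp is printed as a plain two-operand comparison.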
Define all the functions that process the AST after a decompilation. You can fuse instructions here.
import plasma.lib.arch.<NEW_ARCH>.output
import plasma.lib.arch.<NEW_ARCH>.utils
import plasma.lib.arch.<NEW_ARCH>.process_ast
registered = [
    process_ast.function_1,
    ...
]
The following existing files must also be updated:
- lib.disassembler: update the function load_arch_module.
- lib.fileformat.[elf, raw]: update the variables arch_lookup and arch_mode_lookup. Also check that the functions load_static_sym and load_dyn_sym in elf are correct.
- lib.ui.console: update the function __exec_info.
- lib.analyzer: update the function set. Search for the word is_x86 to see where the code is arch-dependent.
- lib.__init__.py: update the help (think about --raw).