Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved enum type propagation #6709

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kkots
Copy link

@kkots kkots commented Jul 10, 2024

Ghidra decompiler is having problems decompiling usage of bit-fields or flag-fields that I've defined in Ghidra as enums. For example, I would have my data types set up like this:

enum MyEnum {
  ENUM_MEMBER_1 = 1,
  ENUM_MEMBER_2 = 2
};

struct MyStruct {
  MyEnum enumField;
};

, and Ghidra would decompile it sometimes like this:

if (*(byte*)&myStructWithAnEnum->enumField & 1) == 0) {

, if the original program only wanted to check the lower 8 bits. This is fine, except that the 1 is not being recognized as an enum.

Another issue is even when this doesn't happen, the constants would not always be parsed as enum elements. For example, if the enum elements are defined like this:

enum MyEnum {
  ENUM_MEMBER_1 = 1,
  ENUM_MEMBER_2 = 2,
  ENUM_MEMBER_3 = 4,
  ENUM_MEMBER_4 = 8,
  ENUM_MEMBER_1_and_2 = 3
};

, Ghidra would always fail to parse constants that contain bit No.0 or bit No.1 + any other bits and possibly other constant values. You could solve this by removing the ENUM_MEMBER_1_and_2, but if you load enums from a C source or just don't know about this trick it becomes a major inconvenience. Besides, some extra enum elements sometimes contain a large amount of flags and serve as shortcuts.

Another issue is with enums of size 8 that are used in a 32-bit x86 program. It would actually break the field into two 4-byte sized fields, and when it tries to do read/write the upper 4 bytes of the enum the decompiled code would look like this:

if ((*(byte *)((int)&myStructWithAnEnum->enumField + 4) & 1) != 0) {

This is fine except that the 1 here doesn't get parsed as an enum element (or a |-combination of them) and it's actually supposed to be a (((MyEnum)(1<<32))>>32) (this is not how it is in the source code, but for decompilation purposes this representation would suffice).

This PR makes the following changes to the decompiler:

  1. Better matching algorithm of numbers to enum element values, breaking a number into a (possibly ~'ed) |-combination of elements even when the enum does not only have elements that are power-of-2.
  2. Removed distinction between enums that only have elements that are power-of-2 and all the other enums.
  3. Added new rule that allows varnodes that are constants become enum types through &, |, ==, !=, CPUI_STORE using a limited algorithm that covers a very select set of situations. The rule could even allow enums of size 8 that are read at offset 4 (upper 4 bytes only) to propagate in this way to constants.
  4. Constants of type enum of size 8 that had been read from offset 4 (upper 4 bytes only) would be rendered with an added >>0x20 at the end if represented successfully using enum element names.
  5. Limited ability of &, | and ^ to propagate types in direction from constants of enum type towards a non-constant.

EDIT: Now also allows consts to becomes enum types when they're used in situations like this:

return X >> 0x12 & 1;

Example bytecode (corresponds to the above output):

FUN_00501330
        00501330 8b 81 28        MOV        EAX,dword ptr [ECX + 0x128]
                 01 00 00
        00501336 c1 e8 12        SHR        EAX,0x12
        00501339 83 e0 01        AND        EAX,0x1
        0050133c c3              RET

ECX is the 'this' of the __thiscall function FUN_00501330. ECX+0x128 points to a member of type Enum. The function returns flag 0x40000 from the enum as bool (0 or 1).

My fix turns it into:

return X >> 0x12 & ENUM_MEMBER_0x40000 >> 0x12;

VarnodeToScan& newVnToScan = varnodes.back();
newVnToScan.varnode = def->getOut();
newVnToScan.slot = -1;
newVnToScan.operationSize = def->getOut()->getSize();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can crash here if def->getOut() == 0:

Program received signal SIGSEGV, Segmentation fault.
ghidra::Varnode::getSize (this=0x0) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/include/ghidra/varnode.hh:185
185       int4 getSize(void) const { return size; } ///< Get the number of bytes this Varnode stores
(gdb) bt full
#0  ghidra::Varnode::getSize (this=0x0) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/include/ghidra/varnode.hh:185
No locals.
#1  0x000055aaa79e18a9 in ghidra::ActionPropagateEnums::apply (this=0x55aaca1268c0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ruleaction.cc:11132
        newVnToScan = @0x55aaca129b80: {varnode = 0x0, operationSize = 0, subpieceUpper4Bytes = false, slot = -1}
        code = ghidra::CPUI_CALL
        constant = 0x55aaca136520
        def = 0x55aaca129ff0
        varnodes = std::vector of length 1, capacity 1 = {{varnode = 0x0, operationSize = 0, subpieceUpper4Bytes = false, slot = -1}}
        bb = 0x55aaca12e040
        j = 2
        iter = 0x55aaca139f90
        op = 0x55aaca139f90
        basicblocks = @0x55aaca157a28: {<ghidra::FlowBlock> = {_vptr$FlowBlock = 0x55aaa79f8b28 <vtable for ghidra::BlockGraph+16>, flags = 0, parent = 0x0, immed_dom = 0x0, 
            copymap = 0x6574756269, index = 0, visitcount = 0, numdesc = 8, intothis = std::vector of length 0, capacity 0, outofthis = std::vector of length 0, capacity 0}, 
          list = std::vector of length 4, capacity 4 = {0x55aaca12d9c0, 0x55aaca12dde0, 0x55aaca12e040, 0x55aaca12e220}}
#2  0x000055aaa76b08ad in ghidra::Action::perform (this=0x55aaca1268c0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/action.cc:319
        res = 32766
#3  0x000055aaa76b153e in ghidra::ActionGroup::apply (this=0x55aaca116bf0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/action.cc:512
        res = 0
#4  0x000055aaa76b178c in ghidra::ActionRestartGroup::apply (this=0x55aaca116bf0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/action.cc:558
        iter = 0x7ffe423356f0
        res = 32766
#5  0x000055aaa76b08ad in ghidra::Action::perform (this=0x55aaca116bf0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/action.cc:319
        res = 32766
#6  0x000055aaa77c9098 in ghidra::DecompileAt::rawAction (this=0x55aaca0fbe00) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc:317
        fd = 0x55aaca157740
#7  0x000055aaa77c81ba in ghidra::GhidraCommand::doit (this=0x55aaca0fbe00) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc:144
        type = 3
        err = <error reading variable: Cannot access memory at address 0x0>
        err = @0x7fda79360dec: {<ghidra::LowlevelError> = {explain = <incomplete type>}, type = <incomplete type>}
        err = @0x7ffe42335b40: {<ghidra::LowlevelError> = {explain = <incomplete type>}, <No data fields>}
        err = @0x55aaa76b8ba1: {explain = <incomplete type>}
#8  0x000055aaa77ca0e1 in ghidra::GhidraCapability::readCommand (sin=..., out=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc:491
        function = <incomplete type>
        type = 2
        iter = {first = <incomplete type>, second = 0x55aaca0fbe00}
#9  0x000055aaa77caacf in main (argc=1, argv=0x7ffe42335df8) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ghidra_process.cc:533
        status = 0
(gdb) frame 1
#1  0x000055aaa79e18a9 in ghidra::ActionPropagateEnums::apply (this=0x55aaca1268c0, data=...) at /tmp/ghidra/Ghidra/Features/Decompiler/src/decompile/cpp/ruleaction.cc:11132
11132           newVnToScan.operationSize = def->getOut()->getSize();
(gdb) print def->getOut()
$1 = (ghidra::Varnode *) 0x0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jobermayr Thank you! I've made a commit that adds checks for 0 on this.

@jobermayr
Copy link
Contributor

Can you please adapt it to commit 520dc99?

@kkots
Copy link
Author

kkots commented Aug 23, 2024

@jobermayr Updated!

@DualTachyon
Copy link

Dear @kkots,

Thanks for working on this pull request and the one at #6722. I believe I am affected by the same issues and think your PRs would fix my problems.

I have tried to apply both PRs in my local build however, one of the hunks doesn't apply correctly for me. There is a conflict with AddTreeState::apply(void) @ ruleaction.cc, which is being modified by both the PRs. The block "if (multiple..." is not compatible with the changes that #6722 applies in the same area.

Best regards.

@kkots
Copy link
Author

kkots commented Oct 1, 2024

@DualTachyon I have tried to resolve this by updating my branches with Ghidra's latest changes, but there is still a conflict when applying both merges. However, this conflict can be ignored because the two PRs are not actually making changes in the same places. Please let me know if there are still any problems.

@DualTachyon
Copy link

Oops. I made a mistake into which issue I replied to. I had all 3 of yours open (#6709, #6718, #6722). The 2 PRs that don't merge together are 6718 and 6722.

@kkots
Copy link
Author

kkots commented Oct 2, 2024

@DualTachyon Now I see. I made changes to both PRs - fix_indexing_into_arrays (PR 6722) and fix_shifted_structure_offset (PR 6718) so that the latter includes changes from the former and it's a bit more obvious how to merge them. In case there's still a conflict, override everything with changes from PR 6718, because it includes all the fixes from PR 6722.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Triage Information is being gathered
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants