
Feature: C++ symbol name mangling type extraction #3

Open
arizvisa opened this issue Sep 23, 2018 · 6 comments

Comments

@arizvisa
Owner

Mangled C++ names contain a load of type information. Unfortunately, all of the C++ demanglers (and the idc.Demangle function in IDA) only provide an API for converting a string between its mangled and unmangled forms. The internal.declaration module was actually split out into its own module in order to support these different mangling formats. So, instead of demangling or mangling a symbol, the idea was to have the capability to extract the different components from a mangled symbol name.

This way, if a Borland, GNU, or Microsoft Visual C++ mangled symbol is known, one would be able to extract type information from it such as what type of method it is, its arguments and their types, its calling convention, and its return type. This can then be exposed to the user or used by the function module to eventually group different symbols together.

As mentioned before, the internal.declaration module was originally intended for this, but (due to laziness on the part of the author) it was hacked together with regexes. This, of course, is an inelegant and incorrect solution to this problem. To complete this, a proper parser will need to be implemented for the different mangling formats, which can then be integrated into internal.declaration.
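To illustrate what a proper parser buys over regexes, here is a minimal recursive-descent sketch in Python. It is not the internal.declaration code: `Parser` and `FunctionSymbol` are hypothetical names, and it assumes a small subset of the Microsoft Visual C++ scheme only (non-member functions marked `Y`, three calling conventions, and single-letter builtin types). Pointers, back-references, templates, member functions, and variadic parameter lists are not handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Single-letter MSVC builtin type codes (a small subset of the real table).
BUILTIN = {
    "C": "signed char", "D": "char", "E": "unsigned char",
    "F": "short", "G": "unsigned short", "H": "int",
    "I": "unsigned int", "J": "long", "K": "unsigned long",
    "M": "float", "N": "double", "X": "void",
}

# Calling-convention codes that follow the 'Y' (non-member function) marker.
CALLCONV = {"A": "__cdecl", "G": "__stdcall", "I": "__fastcall"}


@dataclass
class FunctionSymbol:
    name: str
    scopes: List[str]        # enclosing scopes, innermost first (MSVC order)
    convention: str
    result: str
    parameters: List[str] = field(default_factory=list)


class Parser:
    """Hypothetical parser for ?name@scope...@@Y<cc><ret><params>Z."""

    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def peek(self) -> str:
        return self.text[self.pos:self.pos + 1]

    def take(self, expected: Optional[str] = None) -> str:
        ch = self.peek()
        if not ch or (expected is not None and ch != expected):
            raise ValueError(f"unexpected {ch!r} at offset {self.pos}")
        self.pos += 1
        return ch

    def identifier(self) -> str:
        # The name and each scope are terminated by '@'.
        end = self.text.index("@", self.pos)
        ident, self.pos = self.text[self.pos:end], end + 1
        return ident

    def builtin_type(self) -> str:
        code = self.take()
        if code not in BUILTIN:
            raise ValueError(f"unsupported type code {code!r}")
        return BUILTIN[code]

    def function(self) -> FunctionSymbol:
        self.take("?")
        name = self.identifier()
        scopes = []
        while self.peek() != "@":          # an extra '@' ends the scope list
            scopes.append(self.identifier())
        self.take("@")
        self.take("Y")                     # non-member function
        cc = self.take()
        if cc not in CALLCONV:
            raise ValueError(f"unsupported calling convention {cc!r}")
        result = self.builtin_type()
        parameters = []
        if self.peek() == "X":             # 'X' alone means (void)
            self.take("X")
        else:
            while self.peek() != "@":      # '@' terminates the parameter list
                parameters.append(self.builtin_type())
            self.take("@")
        self.take("Z")                     # no exception specification
        if self.pos != len(self.text):
            raise ValueError("trailing characters")
        return FunctionSymbol(name, scopes, CALLCONV[cc], result, parameters)
```

For example, `Parser("?add@math@@YAHHH@Z").function()` yields name `add`, scopes `['math']`, convention `__cdecl`, result `int`, and parameters `['int', 'int']`, i.e. `int __cdecl math::add(int, int)`. Since each grammar rule maps onto one method, supporting something like pointer types means adding a method rather than another regex.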

@arizvisa arizvisa changed the title Feature: C++ symbol mangling type extraction Feature: C++ symbol name mangling type extraction Sep 23, 2018
@arizvisa
Owner Author

Found some good implementations of this:

@arizvisa
Owner Author

arizvisa commented Nov 28, 2021

Since someone mentioned it on Discord, here are a few more references:
https://github.com/gimli-rs/cpp_demangle
https://mearie.org/documents/mscmangle/

This Python implementation seems to convert things to an AST, but it's from 2017 and is missing a number of token types.
https://github.com/AVGTechnologies/cppmangle/tree/master/cppmangle

@rui314

rui314 commented Nov 29, 2021

Hi, rui314 here. My demangler implementation is not complete and I cannot recommend using it. If you are looking for a more comprehensive, production-quality demangler, please take a look at LLVM's source code.

@arizvisa
Owner Author

Hey @rui314, no worries. I wasn't planning on lifting it, although I truly appreciate the implementation.

I need an actual AST to be produced so that I can extract the tokens I need for reverse-engineering, which is why I'm just collecting notes on other implementations in this 3-year-old issue.

@arizvisa
Owner Author

arizvisa commented May 3, 2023

Just started working towards this, about 2 weekends ago iirc. It's not the exact solution, but it will end up being easier for a single human to maintain. I have the core parts that I need already written, residing within the persistence-refactor branch (master...persistence-refactor). I just need to apply it and get a variety of samples from somewhere to test it against names more complicated than the random things I have IDBs for.

@arizvisa
Owner Author

So far, the work on this seems to have been effective. The only thing missing is wrappers that enable the core logic to process more than just function types. There are a few edge cases with some of the backtick-single-quote names that the disassembler demangles, but personally I've only found it useful to just strip those parts out of the final name.
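A rough sketch of stripping those parts, assuming the backtick/single-quote pairs are well nested as in `` `anonymous namespace' ``. Names such as `` `dynamic initializer for 'x'' `` that reuse plain quotes inside are not handled. The `strip_quoted` function is hypothetical; it also drops the `::` separator adjacent to a removed component.

```python
def strip_quoted(name: str) -> str:
    """Hypothetical helper: remove `...' components from a demangled name."""
    out, depth, i = [], 0, 0
    while i < len(name):
        ch = name[i]
        if ch == "`":                      # opens a quoted component
            depth += 1
        elif ch == "'" and depth > 0:      # closes the innermost quoted component
            depth -= 1
            # Skip the scope separator that followed the removed component.
            if depth == 0 and name.startswith("::", i + 1):
                i += 2
        elif depth == 0:                   # keep characters outside any quotes
            out.append(ch)
        i += 1
    result = "".join(out)
    # A component removed from the end leaves a dangling separator behind.
    return result[:-2] if result.endswith("::") else result
```

With this, `` `anonymous namespace'::Foo::bar `` becomes `Foo::bar`, and `` Foo::`vftable' `` becomes `Foo`.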

Essentially, an object was introduced to the internal.declaration module that lets you process the mangled name associated with an address so that you can separate the parameters, qualifiers, method name, and namespaces from it. With the way it currently works, however, another object would need to be created for non-function types. Presently, my only intention is to use this for fixing the names that are included when rendering an idaapi.tinfo_t to a tag. I believe the next thing I need to do is write a wrapper around the mangled name that cleans it up and strips the unparseable characters generated by the disassembler.
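A sketch of that separation step, assuming the input is an already-demangled function name with the return type and calling convention removed. `Declaration`, `split_top_level`, and `split_declaration` are hypothetical names, not the object that was added to internal.declaration. It only tracks `<...>` and `(...)` nesting, so operator names such as `operator<` or `operator()` would confuse it.

```python
from typing import List, NamedTuple


class Declaration(NamedTuple):
    scopes: List[str]       # namespaces and enclosing classes, outermost first
    name: str               # the function or method name
    parameters: List[str]   # parameter types as written in the demangled text
    qualifiers: List[str]   # trailing qualifiers such as "const"


def split_top_level(text: str, sep: str) -> List[str]:
    """Split `text` on `sep`, ignoring separators nested inside <...> or (...)."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        elif depth == 0 and text.startswith(sep, i):
            parts.append(text[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return parts


def split_declaration(demangled: str) -> Declaration:
    # Locate the '(' that opens the parameter list (first one outside templates).
    depth, open_at = 0, -1
    for index, ch in enumerate(demangled):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "(" and depth == 0:
            open_at = index
            break
    if open_at < 0:
        raise ValueError("no parameter list found")

    # Locate the ')' that matches it.
    depth, close_at = 0, -1
    for index in range(open_at, len(demangled)):
        if demangled[index] == "(":
            depth += 1
        elif demangled[index] == ")":
            depth -= 1
            if depth == 0:
                close_at = index
                break
    if close_at < 0:
        raise ValueError("unbalanced parentheses")

    *scopes, name = split_top_level(demangled[:open_at].strip(), "::")
    inner = demangled[open_at + 1:close_at].strip()
    parameters = [p.strip() for p in split_top_level(inner, ",")] if inner else []
    qualifiers = demangled[close_at + 1:].split()
    return Declaration(scopes, name, parameters, qualifiers)
```

For `ns::Foo<int,char>::bar(int, std::pair<int,int> const &) const`, this gives scopes `['ns', 'Foo<int,char>']`, name `bar`, parameters `['int', 'std::pair<int,int> const &']`, and qualifiers `['const']`. The commas and `::` inside the template arguments are left alone because they sit at a nonzero nesting depth.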
