Feature: C++ symbol name mangling type extraction #3
Found some good implementations of this:
As someone mentioned on Discord, here are a few more references: This Python implementation seems to convert things to an AST, but it's from 2017 and misses a number of token types.
Hi, rui314 here. My demangler implementation is not complete and I cannot recommend using it. If you are looking for a more comprehensive, production-quality demangler, please take a look at LLVM's source code.
Hey @rui314, no worries. I wasn't planning on lifting it, although I truly appreciate the implementation. I need an actual AST to be produced so that I can extract the tokens I need for reverse-engineering, which is why I'm just collecting notes on other implementations in this 3-year-old issue.
Just started working towards this... like 2 weekends ago, iirc. It's not the exact solution, but it will end up being easier for a single human to maintain. I have the core parts that I need already written, and they reside in the persistence-refactor branch (master...persistence-refactor). I just need to apply it and get a variety of samples from somewhere to test it against more complicated names than the random things I have IDBs for.
So far, the work on this seems to have been effective. The only things missing are wrappers that enable the core logic to process more than just function types. There are a few edge cases with some of the backtick-single-quote names that the disassembler demangles, but personally I've only found it useful to strip those parts out of the final name. Essentially, there's an object that was introduced to the
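Stripping the backtick-single-quote parts out of a demangled name could look roughly like the following minimal sketch, written in Python to match the `idc` API mentioned in this issue. The helper name `strip_quoted_parts` is hypothetical and not part of the project. It assumes each part opens with a backtick and closes with a single quote, possibly nested. Names that quote an inner identifier with apostrophes on both sides are not handled.

```python
def strip_quoted_parts(name: str) -> str:
    """Hypothetical helper: drop every `...' part from a demangled name.

    Assumes a part opens with a backtick and closes with a single quote,
    and that parts may nest. Scope separators left empty by the removal
    are collapsed afterwards.
    """
    kept = []
    depth = 0
    for ch in name:
        if ch == "`":
            depth += 1          # entering a quoted part
        elif ch == "'" and depth > 0:
            depth -= 1          # leaving the innermost quoted part
        elif depth == 0:
            kept.append(ch)     # keep only characters outside any quoted part
    result = "".join(kept)
    # "ns::`anonymous namespace'::bar" leaves "ns::::bar"; collapse to "ns::bar".
    while "::::" in result:
        result = result.replace("::::", "::")
    # Remove whitespace and any dangling leading/trailing "::".
    return result.strip().strip(":")
```

For example, ``strip_quoted_parts("ns::`anonymous namespace'::bar")`` returns `"ns::bar"`, and ``strip_quoted_parts("Foo::`vftable'")`` returns `"Foo"`.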
Mangled C++ names contain a lot of type information. Unfortunately, all of the C++ demanglers (and the `idc.Demangle` function in IDA) only provide an API for converting a string between its mangled and unmangled forms. The `internal.declaration` module was actually split out into its own module in the first place in order to support these different mangling formats. So, instead of demangling or mangling a symbol, the idea is to have the capability to extract the different components from a mangled symbol name. This way, if a Borland, GNU, or Microsoft Visual C++ mangled symbol is known, one would be able to extract type information from it, such as what type of method it is, its arguments and their types, its calling convention, and its return type. This can then be exposed to the user or used by the `function` module to eventually group different symbols together.

As mentioned before, the `internal.declaration` module was originally meant for this, but (due to laziness on the part of the author) it was hacked together with regexes. This, of course, is a completely inelegant and incorrect solution to this problem. To complete this, a proper parser will need to be implemented for the different mangling formats, which can then be integrated into `internal.declaration`.
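As a rough illustration of extracting components instead of converting strings, here is a minimal Python sketch of a small hand-written parser (no regexes) that pulls the name, function kind, calling convention, return type, and parameter types out of a Microsoft Visual C++ mangled name. It is only an illustration under narrow assumptions: `Declaration`, `parse_type`, and `parse_msvc_global_function` are hypothetical names, not the `internal.declaration` API. It handles only global (`Y`) functions whose return and parameter types are basic types. Pointers, classes, templates, member functions, and name back-references are out of scope, and unsupported codes raise `ValueError`.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical lookup tables covering only a small subset of the MSVC scheme.
CALLING_CONVENTIONS = {
    "A": "__cdecl", "C": "__pascal", "E": "__thiscall",
    "G": "__stdcall", "I": "__fastcall",
}
BASIC_TYPES = {
    "X": "void", "D": "char", "C": "signed char", "E": "unsigned char",
    "F": "short", "G": "unsigned short", "H": "int", "I": "unsigned int",
    "J": "long", "K": "unsigned long", "M": "float", "N": "double",
    "O": "long double",
}
# Two-character codes introduced by an underscore.
EXTENDED_TYPES = {"_N": "bool", "_J": "__int64", "_K": "unsigned __int64"}


@dataclass
class Declaration:
    """Components extracted from a mangled name (hypothetical structure)."""
    name: str
    kind: str
    calling_convention: str
    return_type: str
    parameters: List[str] = field(default_factory=list)

    def prototype(self) -> str:
        # An empty parameter list is rendered as "(void)".
        params = ", ".join(self.parameters) or "void"
        return f"{self.return_type} {self.calling_convention} {self.name}({params})"


def parse_type(symbol: str, pos: int) -> Tuple[str, int]:
    """Decode one type code at `pos`; return (type name, next position)."""
    if symbol.startswith("_", pos):
        code = symbol[pos:pos + 2]
        if code in EXTENDED_TYPES:
            return EXTENDED_TYPES[code], pos + 2
    elif symbol[pos:pos + 1] in BASIC_TYPES:
        return BASIC_TYPES[symbol[pos]], pos + 1
    raise ValueError(f"unsupported type code at offset {pos}: {symbol[pos:]!r}")


def parse_msvc_global_function(symbol: str) -> Declaration:
    """Parse ?<name>@@Y<cc><return><params>Z for basic-typed global functions."""
    if not symbol.startswith("?"):
        raise ValueError("not an MSVC-mangled symbol")
    name_end = symbol.find("@@", 1)
    if name_end < 0:
        raise ValueError("missing '@@' after the qualified name")
    # Name fragments are stored innermost-first, separated by '@'.
    name = "::".join(reversed(symbol[1:name_end].split("@")))
    pos = name_end + 2
    if symbol[pos:pos + 1] != "Y":
        raise ValueError("only global (non-member) functions are handled")
    cc_code = symbol[pos + 1:pos + 2]
    if cc_code not in CALLING_CONVENTIONS:
        raise ValueError(f"unsupported calling convention code {cc_code!r}")
    return_type, pos = parse_type(symbol, pos + 2)
    parameters: List[str] = []
    if symbol[pos:pos + 1] == "X":
        pos += 1  # "(void)": no '@' terminator follows
    else:
        while symbol[pos:pos + 1] != "@":
            param, pos = parse_type(symbol, pos)
            parameters.append(param)
        pos += 1  # skip the '@' that ends the parameter list
    if symbol[pos:] != "Z":
        raise ValueError("expected the trailing 'Z'")
    return Declaration(name, "global function",
                       CALLING_CONVENTIONS[cc_code], return_type, parameters)
```

For example, `parse_msvc_global_function("?foo@@YAHH@Z").prototype()` returns `"int __cdecl foo(int)"`. Because each component lands in its own field, a caller could group symbols by calling convention or parameter list rather than by comparing demangled strings.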