More i386 xrefs #899
floss/language/utils.py
Outdated
@@ -465,6 +465,34 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]:
    # dozens of seconds or more (suspect many minutes).


def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
    """
    scan for raw xrefs in .rdata section
what are raw xrefs? can you add an example disassembly listing and add some comments, please?
From the screenshots in #885 (comment), I can't tell whether these are strings. It would help to have some comments explaining what the algorithm looks for.
Raw xrefs are cross-references stored as plain pointer values in the binary's data, rather than inside instructions. Each one marks a point where the string data can be divided. I'll add an example with comments.
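To make the idea concrete, here is a minimal sketch of such a scan. It is not the PR's implementation: `find_raw_pointers`, its parameters, the 4-byte alignment, the address range, and the second sample address are all assumptions for illustration. It treats every aligned little-endian 32-bit value that falls inside a given address range as a candidate raw xref.

```python
import struct
from typing import Iterator


def find_raw_pointers(buf: bytes, low: int, high: int) -> Iterator[int]:
    """Yield each aligned 32-bit little-endian value in buf that lies in [low, high).

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Step by 4 so only pointer-aligned slots are considered (an assumption);
    # a trailing partial dword is ignored.
    for offset in range(0, len(buf) - 3, 4):
        (value,) = struct.unpack_from("<I", buf, offset)
        if low <= value < high:
            yield value


# Two pointer-like values with a non-pointer dword between them.
# 0x004C85B3 appears in the PR's listing; the other values are made up.
table = struct.pack("<III", 0x004C85B3, 0x00000010, 0x004C85C0)
candidates = list(find_raw_pointers(table, 0x004C0000, 0x004D0000))
```

Note that a value-in-range check like this says nothing about whether the destination actually holds string data.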
Thanks, can you share a few example binary hashes?
@@ -465,6 +465,48 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]:
    # dozens of seconds or more (suspect many minutes).


def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
please add a few tests for these strings
- def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
+ def get_raw_xrefs_rdata(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
This routine doesn't seem limited to i386, so let's remove that from the function name. Otherwise, we should add a check on the PE architecture to restrict it to i386.
If the data are virtual addresses (rather than RVAs), we could additionally use relocation entries to find pointers and/or verify this data is in fact a pointer.
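As a rough illustration of the relocation idea, the sketch below walks raw base relocation data and collects the RVAs of 32-bit absolute fixups (`IMAGE_REL_BASED_HIGHLOW`, type 3). The function name and the sample block are hypothetical, and the image base of 0x400000 used to derive the sample RVAs from the addresses in the PR's listing is an assumption. It only helps when the PE actually carries a relocation directory.

```python
import struct
from typing import Iterator

IMAGE_REL_BASED_HIGHLOW = 3  # 32-bit absolute address fixup


def iter_highlow_relocation_rvas(reloc: bytes) -> Iterator[int]:
    """Yield the RVA of every HIGHLOW fixup in raw base relocation data.

    Hypothetical helper; expects the bytes of the base relocation directory.
    """
    offset = 0
    while offset + 8 <= len(reloc):
        # Each block starts with the page RVA and the block size (header included).
        page_rva, block_size = struct.unpack_from("<II", reloc, offset)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(offset + block_size, len(reloc))
        for entry_offset in range(offset + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, entry_offset)
            # High 4 bits: fixup type. Low 12 bits: offset within the page.
            if entry >> 12 == IMAGE_REL_BASED_HIGHLOW:
                yield page_rva + (entry & 0x0FFF)
        offset += block_size


# One block for page 0xD6000: two HIGHLOW entries plus two padding entries (type 0).
block = struct.pack("<IIHHHH", 0x000D6000, 16, 0x31E0, 0x3240, 0x0000, 0x0000)
relocated_rvas = set(iter_highlow_relocation_rvas(block))
```

A candidate dword whose RVA is in `relocated_rvas` is one the loader would rebase, which is good evidence that it is a pointer rather than arbitrary data.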
lgtm but let's also wait for @mr-tz
.rdata:004D6240 dd offset unk_4C85B3

From the disassembly, they are called as follows:
.text:00498E56 push ds:off_4D61E0[ecx*4]
where does the length get stored?
It isn't; the raw xrefs are stored without explicit length information. Do we need the lengths, or is there a specific reason to look for where they are stored?
no, I don't think we need them. but I wondered how Go is able to use the string data without an associated length.
So far I haven't found any specific information for Go, but I did come across a similar approach in Rust. I've kept it in the utils.py file in case we run into a similar scenario when exploring other languages.
I am still curious how this data can be used as a string without the length being stored somewhere.
Hey @mr-tz, can we merge this?
Co-authored-by: Vasco Schiavo <[email protected]>
floss/language/utils.py
Outdated
def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
    """
    scan for raw xrefs in .rdata section.
    raw xrefs are 32-bit absolute addresses to strings in .rdata section (i386).
raw xrefs are 32-bit absolute addresses to strings in .rdata section (i386).
This routine doesn't validate that the destination is string-like data. Should it? If not, let's remove this part of the documentation.
    if not buf:
        return

    low, high = get_image_range(pe)
this appears to be the range of the entire file, not the .rdata section. please update the logic or documentation to make things consistent.
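The difference between the two ranges can be sketched as follows. Everything here is a stand-in: `Section`, `section_va_range`, `image_va_range`, and the section layout and sizes are hypothetical, and this is not how `get_image_range` is known to be implemented. Only the `.text` address is taken from the listing in the PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Section:
    """Hypothetical stand-in for a PE section header."""

    name: str
    virtual_address: int  # RVA of the section start
    virtual_size: int


def section_va_range(image_base: int, section: Section) -> Tuple[int, int]:
    """Return the half-open VA range [low, high) covered by one section."""
    low = image_base + section.virtual_address
    return low, low + section.virtual_size


def image_va_range(image_base: int, size_of_image: int) -> Tuple[int, int]:
    """Return the half-open VA range [low, high) covered by the whole image."""
    return image_base, image_base + size_of_image


# Made-up layout: image base 0x400000, .rdata at RVA 0xC0000.
rdata = Section(".rdata", 0x000C0000, 0x00020000)
rdata_range = section_va_range(0x00400000, rdata)
image_range = image_va_range(0x00400000, 0x00200000)

# An address in .text passes the image-wide check but not the .rdata check.
text_va = 0x00498E56
in_image = image_range[0] <= text_va < image_range[1]
in_rdata = rdata_range[0] <= text_va < rdata_range[1]
```

With an image-wide range, pointers into code or other sections are accepted as well, so either the filter or the docstring would need to change for the two to agree.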
        ),
    ],
)
def test_raw_xrefs(request, string, offset, encoding, rust_strings):
i think it's good that we have integration tests that show the whole system working together to find the strings. I think we should also have some tests for the specific routines that you added, so we can verify their behavior directly. something like test_get_raw_xrefs.
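A direct test of this kind could build a small synthetic buffer instead of loading a sample binary. The sketch below is only a shape for such a test: `scan_pointers` is a stand-in for the routine under test (the real function takes a `pefile.PE`, which is not modeled here), and `make_pointer_table` and the addresses other than 0x004C85B3 are made up.

```python
import struct
from typing import Iterable, List


def scan_pointers(buf: bytes, low: int, high: int) -> List[int]:
    """Stand-in for the routine under test (not the FLOSS implementation)."""
    values = (struct.unpack_from("<I", buf, i)[0] for i in range(0, len(buf) - 3, 4))
    return [v for v in values if low <= v < high]


def make_pointer_table(addresses: Iterable[int]) -> bytes:
    """Pack addresses as consecutive little-endian dwords."""
    return b"".join(struct.pack("<I", address) for address in addresses)


def test_scan_pointers_keeps_only_in_range_values():
    buf = make_pointer_table([0x004C85B3, 0xFFFFFFFF, 0x004C85C0])
    assert scan_pointers(buf, 0x004C0000, 0x004D0000) == [0x004C85B3, 0x004C85C0]


def test_scan_pointers_handles_empty_buffer():
    assert scan_pointers(b"", 0x004C0000, 0x004D0000) == []
```

Tests like these run without any sample files, so they can pin down edge cases (empty input, out-of-range values) that the integration tests do not isolate.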
Hi, what would be the best approach to that? There is not just test_get_raw_xrefs, but also others such as find_i386_push_xrefs, find_lea_xrefs, etc. Should we test them too, in a separate file or in another PR? What are your thoughts?
The concept here seems reasonable, though the documentation is inconsistent with the logic, so please fix that and then we can merge.
Referring to #885 (comment), this PR supplements additional xrefs discovered separately. For #885, the focus is solely on the UTF-decoder segment. 😄