-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More i386 xrefs #899
base: master
Are you sure you want to change the base?
More i386 xrefs #899
Changes from all commits
afe9a48
555b606
d29ee1a
ef13b33
ce80cd8
c4458cb
47c79f1
606d01a
3fdef6c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -473,6 +473,42 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]: | |
# dozens of seconds or more (suspect many minutes). | ||
|
||
|
||
def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]: | ||
""" | ||
scan for raw xrefs that are 32-bit absolute addresses in the PE file (i386). | ||
They are not encoded as struct String instances. | ||
|
||
example: | ||
.rdata:004D6234 dd offset unk_4C85C9 | ||
.rdata:004D6238 dd offset unk_4C85C3 | ||
.rdata:004D623C dd offset unk_4C85BB | ||
.rdata:004D6240 dd offset unk_4C85B3 | ||
|
||
From the disassembly, they are called as follows: | ||
.text:00498E56 push ds:off_4D61E0[ecx*4] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where does the length get stored? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not; the raw xrefs are stored without explicit length information. Lengths are not included in this context. Do we need them, or is there a specific reason for considering length storage? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, I don't think we need them. but I wondered how Go is able to use the string data without an associated length. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently, I couldn't find any specific information for Go, but I did come across a similar approach in Rust. I've kept it in the utils.py file, considering the possibility that we might encounter a similar scenario in the future when exploring other languages. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am still curious how this data can be used as a string without the length being stored somewhere. |
||
|
||
The above are not struct String instances, but are references to strings in the PE file. | ||
They can be used to divide the string blobs into smaller chunks. | ||
""" | ||
format = "I" | ||
|
||
if not buf: | ||
return | ||
|
||
low, high = get_image_range(pe) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this appears to be the range of the entire file, not the |
||
|
||
# using array module as a high-performance way to access the data as fixed-sized words. | ||
words = iter(array.array(format, buf)) | ||
|
||
last = next(words) | ||
for current in words: | ||
address = last | ||
last = current | ||
|
||
if address != 0x0 and low <= address < high: | ||
yield address | ||
|
||
|
||
def get_extract_stats( | ||
pe: pefile, all_ss_strings: List[StaticString], lang_strings: List[StaticString], min_len: int, min_blob_len=0 | ||
) -> float: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -80,3 +80,21 @@ def test_push(request, string, offset, encoding, rust_strings): | |
) | ||
def test_mov_jmp(request, string, offset, encoding, rust_strings): | ||
assert StaticString(string=string, offset=offset, encoding=encoding) in request.getfixturevalue(rust_strings) | ||
|
||
|
||
@pytest.mark.parametrize( | ||
"string,offset,encoding,rust_strings", | ||
[ | ||
# .rdata:004BFA48 dd offset unk_4BA13A | ||
# .rdata:004BFA4C dd offset unk_4BA100 | ||
pytest.param("Invalid branch target in DWARF expression", 0xB813A, StringEncoding.UTF8, "rust_strings32"), | ||
pytest.param( | ||
"Expected to find an FDE pointer, but found a CIE pointer instead.", | ||
0xB8163, | ||
StringEncoding.UTF8, | ||
"rust_strings32", | ||
), | ||
], | ||
) | ||
def test_raw_xrefs(request, string, offset, encoding, rust_strings): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i think its good that we have integration tests that show the whole system working together to find the strings. I think we should also have some tests for the specific routines that you added, so we can verify their behavior directly. something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi, what should be the best approach to that? As there are not just |
||
assert StaticString(string=string, offset=offset, encoding=encoding) in request.getfixturevalue(rust_strings) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a few tests for these strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This routine doesn't seem limited to
i386
so lets remove that from the function name. Otherwise, we should add a check to the PE architecture to restrict it to i386.If the data are virtual addresses (rather than RVAs), we could additionally use relocation entries to find pointers and/or verify this data is in fact a pointer.