More i386 xrefs #899

Open
wants to merge 9 commits into
base: master
11 changes: 9 additions & 2 deletions floss/language/rust/extract.py
@@ -10,7 +10,13 @@
 import binary2strings as b2s

 from floss.results import StaticString, StringEncoding
-from floss.language.utils import find_lea_xrefs, find_mov_xrefs, find_push_xrefs, get_struct_string_candidates
+from floss.language.utils import (
+    find_lea_xrefs,
+    find_mov_xrefs,
+    find_push_xrefs,
+    get_raw_xrefs_rdata_i386,
+    get_struct_string_candidates,
+)

 logger = logging.getLogger(__name__)

@@ -151,7 +157,8 @@ def get_string_blob_strings(pe: pefile.PE, min_length: int) -> Iterable[StaticSt
         xrefs_lea = find_lea_xrefs(pe)
         xrefs_push = find_push_xrefs(pe)
         xrefs_mov = find_mov_xrefs(pe)
-        xrefs = itertools.chain(struct_string_addrs, xrefs_lea, xrefs_push, xrefs_mov)
+        xrefs_raw_rdata = get_raw_xrefs_rdata_i386(pe, rdata_section.get_data())
+        xrefs = itertools.chain(struct_string_addrs, xrefs_lea, xrefs_push, xrefs_mov, xrefs_raw_rdata)

     elif pe.FILE_HEADER.Machine == pefile.MACHINE_TYPE["IMAGE_FILE_MACHINE_AMD64"]:
         xrefs_lea = find_lea_xrefs(pe)
28 changes: 28 additions & 0 deletions floss/language/utils.py
@@ -465,6 +465,34 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]:
# dozens of seconds or more (suspect many minutes).


+def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
Collaborator

please add a few tests for these strings

Collaborator

Suggested change
-def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
+def get_raw_xrefs_rdata(pe: pefile.PE, buf: bytes) -> Iterable[VA]:

This routine doesn't seem limited to i386, so let's remove that from the function name. Otherwise, we should add a check on the PE architecture to restrict it to i386.

If the data are virtual addresses (rather than RVAs), we could additionally use relocation entries to find pointers and/or verify this data is in fact a pointer.
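
To make the relocation idea concrete, here is a minimal sketch that walks a raw base relocation table and collects the RVAs of all 32-bit fixups. It assumes the caller has already read the `.reloc` directory bytes; the function name `iter_highlow_reloc_rvas` is hypothetical and not part of FLOSS or pefile.

```python
import struct
from typing import Iterator

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, carries no fixup
IMAGE_REL_BASED_HIGHLOW = 3  # 32-bit fixup, the type used for i386 pointers


def iter_highlow_reloc_rvas(reloc: bytes) -> Iterator[int]:
    """Yield the RVA of every 32-bit (HIGHLOW) fixup in a raw base relocation table."""
    offset = 0
    while offset + 8 <= len(reloc):
        # each block starts with the page RVA and the total block size in bytes
        page_rva, block_size = struct.unpack_from("<II", reloc, offset)
        if block_size < 8:
            break  # malformed block or zero padding at the end of the table
        end = min(offset + block_size, len(reloc))
        for pos in range(offset + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, pos)
            # high 4 bits: fixup type, low 12 bits: offset within the page
            if entry >> 12 == IMAGE_REL_BASED_HIGHLOW:
                yield page_rva + (entry & 0x0FFF)
        offset += block_size
```

A candidate word found at some RVA in .rdata could then be treated as a confirmed pointer only if that RVA is in `set(iter_highlow_reloc_rvas(...))`. This only helps for binaries that still carry relocation data, so it would complement the range check rather than replace it.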

+    """
+    scan for raw xrefs in .rdata section
Collaborator

What are raw xrefs? Can you add an example disassembly listing and some comments, please?

Collaborator

From the screenshots in #885 (comment), I can't tell whether these are strings. It would help to have some comments explaining what the algorithm looks for.

Collaborator Author

Raw xrefs are unprocessed cross-references found directly in the binary's data; they mark points where strings can be split. I'll add an example with comments.
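
As an illustration of what "points where strings can be divided" could mean in practice, here is a small sketch. It assumes the strings sit back to back in a blob without terminators, so that every address pointing into the blob marks the start of a new string. The helper `split_blob_at_xrefs` and the sample values are hypothetical.

```python
from typing import Iterable, List


def split_blob_at_xrefs(blob: bytes, blob_va: int, xrefs: Iterable[int]) -> List[bytes]:
    """Split `blob` (loaded at virtual address `blob_va`) at every xref pointing inside it."""
    # keep only offsets strictly inside the blob; its start and end act as outer bounds
    cuts = sorted({va - blob_va for va in xrefs if 0 < va - blob_va < len(blob)})
    bounds = [0] + cuts + [len(blob)]
    return [blob[start:end] for start, end in zip(bounds, bounds[1:])]


# two strings stored contiguously; the xref 0x40200C points at the second one
pieces = split_blob_at_xrefs(b"invalid argsfile not found", 0x402000, [0x40200C, 0x500000])
```

Here `pieces` holds the two separate strings, and the out-of-range address `0x500000` is ignored.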

Collaborator

Thanks, can you share a few example binary hashes?

+    """
+    format = "I"
+
+    if not buf:
+        return
+
+    low, high = get_image_range(pe)
Collaborator

This appears to be the range of the entire file, not the .rdata section. Please update the logic or documentation to make things consistent.
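
One way to make the logic match the docstring would be to bound candidates by the section's own virtual address range. The sketch below works on plain integers; with pefile these would presumably come from `pe.OPTIONAL_HEADER.ImageBase`, `section.VirtualAddress` and `section.Misc_VirtualSize`. Both helper names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def section_va_range(image_base: int, section_rva: int, section_size: int) -> Tuple[int, int]:
    """Return the half-open virtual address range [low, high) covered by one section."""
    low = image_base + section_rva
    return low, low + section_size


def filter_to_range(addresses: Iterable[int], low: int, high: int) -> Iterator[int]:
    """Yield only the addresses that fall inside [low, high)."""
    for address in addresses:
        if low <= address < high:
            yield address


low, high = section_va_range(0x400000, 0x2000, 0x1000)
kept = list(filter_to_range([0x401000, 0x402000, 0x402FFF, 0x403000], low, high))
```

Whether the narrower range is what the PR intends is an open question: pointers stored in .rdata may legitimately target other sections, in which case updating the docstring would be the better fix.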


+    # using array module as a high-performance way to access the data as fixed-sized words.
+    words = iter(array.array(format, buf))
+
+    last = next(words)
+    for current in words:
+        address = last
+        last = current
+
+        if address == 0x0:
+            continue
+
+        if not (low <= address < high):
+            continue
+
+        yield address
Arker123 marked this conversation as resolved.
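
For reference, a hedged alternative sketch of the same scan, not the PR's implementation: it reads explicit little-endian 32-bit words with `struct.iter_unpack`, tolerates a buffer whose length is not a multiple of four, and also examines the final word (the loop in the diff appears to stop one word early, since it only yields `last` once a following word exists). The name `scan_pointers32` is hypothetical.

```python
import struct
from typing import Iterator


def scan_pointers32(buf: bytes, low: int, high: int) -> Iterator[int]:
    """Yield each aligned little-endian 32-bit word in `buf` that lies in [low, high)."""
    usable = len(buf) - (len(buf) % 4)  # ignore trailing bytes that do not fill a word
    for (word,) in struct.iter_unpack("<I", buf[:usable]):
        # a zero word is rejected by the range check whenever low > 0
        if low <= word < high:
            yield word


data = struct.pack("<IIII", 0, 0x402010, 0x500000, 0x402FFC) + b"\x01\x02"
found = list(scan_pointers32(data, 0x402000, 0x403000))
```

The fixed `"<I"` format also makes the 32-bit assumption explicit, which ties in with the question of whether the routine should be restricted to i386.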


 def get_extract_stats(
     pe: pefile, all_ss_strings: List[StaticString], lang_strings: List[StaticString], min_len: int, min_blob_len=0
 ) -> float: