More i386 xrefs #899

Arker123 · 2023-11-09T16:49:03Z

Following up on #885 (comment): this PR adds the additional xrefs that were discovered separately, so that #885 can focus solely on the UTF-decoder part. 😄

mr-tz · 2023-11-10T08:42:18Z

floss/language/utils.py

@@ -465,6 +465,34 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]:
            # dozens of seconds or more (suspect many minutes).


+def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
+    """
+    scan for raw xrefs in .rdata section


what are raw xrefs? can you add an example disassembly listing and add some comments, please?

from the screenshots in #885 (comment) I don't see if these are strings and it would help to have some comments explaining what the algorithm looks for

Raw xrefs are unprocessed cross-references found directly in the binary file; they indicate points where strings can be split. I'll add an example with comments.

Thanks, can you share a few example binary hashes?

https://github.com/mandiant/flare-floss-testfiles/blob/master/language/rust/rust-hello/bin/rust-hello.exe
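To make the idea concrete, here is a minimal sketch of what scanning for raw xrefs could look like. It is not the PR's implementation: the function name `scan_raw_xrefs` is hypothetical, and it assumes the pointers are stored as 4-byte-aligned little-endian dwords, which may differ from what the actual routine does.

```python
import struct
from typing import Iterator


def scan_raw_xrefs(buf: bytes, low: int, high: int) -> Iterator[int]:
    """Yield every aligned little-endian dword in buf whose value lies in [low, high).

    Hypothetical helper: low/high are the bounds of the address range
    that a valid pointer is expected to fall into.
    """
    # Step through the buffer one dword at a time (assumed 4-byte alignment).
    for offset in range(0, len(buf) - 3, 4):
        (value,) = struct.unpack_from("<I", buf, offset)
        if low <= value < high:
            yield value


if __name__ == "__main__":
    # One plausible pointer followed by two values outside the range.
    data = struct.pack("<III", 0x4C85B3, 0, 0x12345678)
    print([hex(va) for va in scan_raw_xrefs(data, 0x4C8000, 0x4D7000)])
```

A value-range check like this only says the dword could be a pointer; it does not confirm that the target is string data.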

mr-tz · 2023-11-11T19:08:52Z

floss/language/utils.py

@@ -465,6 +465,48 @@ def get_struct_string_candidates(pe: pefile.PE) -> Iterable[StructString]:
            # dozens of seconds or more (suspect many minutes).


+def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:


please add a few tests for these strings

Suggested change

def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:

def get_raw_xrefs_rdata(pe: pefile.PE, buf: bytes) -> Iterable[VA]:

This routine doesn't seem limited to i386, so let's remove that from the function name. Otherwise, we should add a check on the PE architecture to restrict it to i386.
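If the architecture check is the preferred route, it could look roughly like the sketch below. It reads the `Machine` field of the COFF header straight from the file bytes using only the standard library; the helper names `pe_machine` and `is_i386` are hypothetical and not part of FLOSS or pefile.

```python
import struct

# Machine value for 32-bit x86 in the PE/COFF file header.
IMAGE_FILE_MACHINE_I386 = 0x014C


def pe_machine(buf: bytes) -> int:
    """Return the COFF Machine field of a PE file given its raw bytes."""
    if buf[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew (offset of the PE signature) lives at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if buf[e_lfanew : e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The Machine field is the first 16-bit word after the signature.
    (machine,) = struct.unpack_from("<H", buf, e_lfanew + 4)
    return machine


def is_i386(buf: bytes) -> bool:
    """True if the PE file targets 32-bit x86."""
    return pe_machine(buf) == IMAGE_FILE_MACHINE_I386
```

A caller could return early when `is_i386` is false, which would justify keeping `_i386` in the function name.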

If the data are virtual addresses (rather than RVAs), we could additionally use relocation entries to find pointers and/or verify this data is in fact a pointer.
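As a rough illustration of the relocation idea, the sketch below walks the raw bytes of a base relocation table and yields the RVA of every `IMAGE_REL_BASED_HIGHLOW` entry, i.e. every location the loader treats as a 32-bit absolute address. The function name `iter_highlow_relocs` is hypothetical, and it assumes the caller has already extracted the relocation data from the file; binaries with stripped relocations would yield nothing.

```python
import struct
from typing import Iterator

# Relocation type for a full 32-bit absolute address fixup.
IMAGE_REL_BASED_HIGHLOW = 3


def iter_highlow_relocs(reloc: bytes) -> Iterator[int]:
    """Yield the RVA of each HIGHLOW entry in a raw base relocation table."""
    offset = 0
    while offset + 8 <= len(reloc):
        # Each block starts with the page RVA and the total block size.
        page_rva, block_size = struct.unpack_from("<II", reloc, offset)
        if block_size < 8:
            break  # malformed or terminating block
        end = min(offset + block_size, len(reloc))
        # Entries are 16-bit words: high 4 bits type, low 12 bits page offset.
        for pos in range(offset + 8, end - 1, 2):
            (entry,) = struct.unpack_from("<H", reloc, pos)
            if entry >> 12 == IMAGE_REL_BASED_HIGHLOW:
                yield page_rva + (entry & 0x0FFF)
        offset += block_size
```

A candidate pointer whose own RVA appears in this set is one the loader would rebase, which is strong evidence that the dword really is an address.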

williballenthin

lgtm but let's also wait for @mr-tz

williballenthin · 2023-12-24T13:10:50Z

floss/language/utils.py

+    .rdata:004D6240                 dd offset unk_4C85B3
+
+    From the disassembly, they are called as follows:
+    .text:00498E56                 push    ds:off_4D61E0[ecx*4]


where does the length get stored?

It's not; the raw xrefs are stored without any explicit length information. Do we need lengths, or is there a specific reason to consider storing them?

no, I don't think we need them. but I wondered how Go is able to use the string data without an associated length.

So far I couldn't find any specific information for Go, but I did come across a similar approach in Rust. I've kept the routine in utils.py in case we encounter a similar scenario when exploring other languages.

I am still curious how this data can be used as a string without the length being stored somewhere.

Arker123 · 2024-06-23T12:45:46Z

Hey @mr-tz, can we merge this?

floss/language/utils.py

Co-authored-by: Vasco Schiavo <[email protected]>

williballenthin · 2024-06-24T06:57:04Z

floss/language/utils.py

+def get_raw_xrefs_rdata_i386(pe: pefile.PE, buf: bytes) -> Iterable[VA]:
+    """
+    scan for raw xrefs in .rdata section.
+    raw xrefs are 32-bit absolute addresses to strings in .rdata section (i386).


raw xrefs are 32-bit absolute addresses to strings in .rdata section (i386).

This routine doesn't validate that the destination is string-like data. Should it? If not, let's remove this part of the documentation.

williballenthin · 2024-06-24T06:59:44Z

floss/language/utils.py

+    if not buf:
+        return
+
+    low, high = get_image_range(pe)


this appears to be the range of the entire file, not the .rdata section. please update the logic or documentation to make things consistent.
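One way to bring the logic in line with the docstring would be to filter against the .rdata section's own address range rather than the whole image. The sketch below assumes the image base and the section's RVA and virtual size are already known (for example, from the section header); both helper names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def section_va_range(image_base: int, section_rva: int, section_vsize: int) -> Tuple[int, int]:
    """Return the half-open [low, high) virtual address range of a section."""
    low = image_base + section_rva
    return low, low + section_vsize


def filter_targets(candidates: Iterable[int], low: int, high: int) -> Iterator[int]:
    """Keep only candidate addresses that point inside [low, high)."""
    for va in candidates:
        if low <= va < high:
            yield va
```

With an assumed image base of 0x400000 and a section spanning RVAs 0xC8000 to 0xD7000, an address such as 0x4C85B3 would pass, while a pointer into .text would be rejected. Using the image range instead accepts both.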

williballenthin · 2024-06-24T07:01:27Z

tests/test_language_extract_rust.py

+        ),
+    ],
+)
+def test_raw_xrefs(request, string, offset, encoding, rust_strings):


I think it's good that we have integration tests that show the whole system working together to find the strings. I think we should also have some tests for the specific routines that you added, so we can verify their behavior directly, something like test_get_raw_xrefs.

Hi, what would be the best approach for that? It's not just test_get_raw_xrefs; there are other routines too, such as find_i386_push_xrefs, find_lea_xrefs, etc. Should we test them as well, either in a separate file or in another PR? What are your thoughts?

williballenthin

The concept here seems reasonable, though the documentation is inconsistent with the logic, so please fix that and then we can merge.

More i386 xrefs

afe9a48

mr-tz reviewed Nov 10, 2023

Arker123 added 2 commits November 10, 2023 17:14

Added Comments for raw xrefs

555b606

Tweaks

d29ee1a

mr-tz reviewed Nov 11, 2023

Arker123 and others added 2 commits December 24, 2023 14:06

Merge branch 'master' into Discovered-more-i386-xrefs

ef13b33

Add tests

ce80cd8

williballenthin approved these changes Dec 24, 2023

VascoSch92 reviewed Jun 23, 2024

floss/language/utils.py

Arker123 and others added 2 commits June 24, 2024 10:20

Update floss/language/utils.py

c4458cb

Co-authored-by: Vasco Schiavo <[email protected]>

Code Style

47c79f1

williballenthin reviewed Jun 24, 2024

williballenthin requested changes Jun 24, 2024

Arker123 added 2 commits June 26, 2024 10:31

Merge remote-tracking branch 'origin/master' into Discovered-more-i38…

606d01a

…6-xrefs

Update Documentation

3fdef6c

Arker123 requested a review from williballenthin June 30, 2024 20:34


