Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong/confusing memory addrs on Windows #76

Open
Swatinem opened this issue Nov 16, 2021 · 4 comments
Open

Wrong/confusing memory addrs on Windows #76

Swatinem opened this issue Nov 16, 2021 · 4 comments

Comments

@Swatinem
Copy link

This came up in getsentry/sentry-rust#386, with a possible workaround in getsentry/sentry-rust#387.

On Windows, we currently get values such as these:

actual_load_addr() = "0x7ff6b4091000"
stated_load_addr() = "0x1000"
virtual_memory_bias() = "0x7ff6b4090000"

Trying to use the actual_load_addr for symbolication yields the wrong results, and virtual_memory_bias() would give us the right value, though the name is very confusing in that case, and does not match the impl on other OSs.

virtual_memory_bias internally uses the module_base, and that value is also in line with what minidumps, and other minidump-related tools like breakpad and crashpad are yielding, and our symbolication is based on those.

CC @mitsuhiko (original author of #37) @jan-auer

We had a small talk yesterday, and concluded that it probably does not make sense to have this distinction on Windows at all, and essentially stated should always be 0, and thus actual == bias.

However I have looked a bit at the impl here, and it seems that those values are based on iterating/finding the executable segments of an executable, so not sure what the right approach to fix this might be right now.

@bjorn3
Copy link

bjorn3 commented Nov 16, 2021

The actual load address is the stated load address + the bias on all platforms on all platforms. The docs of actual_load_addr and vrtual_load_addr say that they return the address of the first segment, which on unix often starts at the first byte, but seemingly doesn't on windows.

ProgramHeaders(11):
  Idx   Type              Flags   Offset    Vaddr     Paddr     Filesz    Memsz     Align   
  0     PT_PHDR           R       0x40      0x40      0x40      0x268     0x268     0x8     
  1     PT_INTERP         R       0x2a8     0x2a8     0x2a8     0x1c      0x1c      0x1     
  2     PT_LOAD           R       0x0       0x0       0x0       0x1240    0x1240    0x1000  
  3     PT_LOAD           R+X     0x2000    0x2000    0x2000    0x3fb9    0x3fb9    0x1000  
  4     PT_LOAD           R       0x6000    0x6000    0x6000    0x2220    0x2220    0x1000  
  5     PT_LOAD           RW      0x8d90    0x9d90    0x9d90    0x470     0x470     0x1000

The first PT_LOAD has vaddr of 0, which is what findshlibs returns for the first segment as stated_virtual_memory_address and thus as virtual_load_addr. On windows the first segment doesn't cover the file header and starts at stated_virtual_memory_address 0x1000.

Basically the behavior matches the documented behavior of stated_load_addr and actual_load_addr, but I can understand that it is confusing, especially as on unix the stated_virtual_memory_address for the first segment is often zero. Maybe these methods should be removed and if a user needs them they can write it themself?

@philipc
Copy link
Contributor

philipc commented Nov 17, 2021

So there are two issues:

  1. virtual_memory_bias is documented to return the difference between an actual VA and a stated VA, but on Windows it is returning the difference between an actual VA and a stated RVA (i.e this difference is the module base). The problem here is that I don't think that findshlibs can determine the stated VA on Windows. My comment on Windows support #37 was that the bias should be module_base() - nt_headers.OptionalHeader.ImageBase, but it appears that the in memory copy of ImageBase is updated to be the same as module_base(), so that doesn't work. So I think the best we can do is document the behaviour and require the user to read the PE file to determine ImageBase (e.g. Fix tests for Windows addr2line#240), and maybe rename the function to make this clearer. For sentry's purposes, it actually wants the module base for Windows, so calling this function to get the module base makes sense (but see below).

  2. SharedLibrary::stated_load_addr and SharedLibrary::actual_load_addr return the address of the first segment as documented, but sentry wants a different meaning for Windows. So as @bjorn3 says, one option is to deprecate these APIs and require the user to write what they need themselves, or we can change the documentation and return stated = 0 and actual = module_base on Windows. These APIs were added purely for use by sentry. I don't mind having them in findshlibs, but they do seem quite specific to the conventions for minidumps/breakpad, and if we keep them then we should document that.

cc @vthib (who presumably is using this code too).

@vthib
Copy link
Contributor

vthib commented Nov 17, 2021

AFAICT There is an issue in the windows support, the bias is returned as module_base() and not as module_base() - ImageBase. So it looks like i forgot to apply this review comment when remaking this MR, sorry about that.
This change should fix the issues i believe? Looking at getsentry/sentry-rust#387 It definitely looks that way

@philipc
Copy link
Contributor

philipc commented Nov 17, 2021

That would fix the issue, except that I don't know how to determine ImageBase. Getting it from nt_headers doesn't work because it's not the same as the value that was in the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants