What are the special magic rules around malloc? #535
Comments
The issue I see with this magic arises if you implement malloc itself in Rust.
Another issue is LTO, or even cross-language LTO.
I agree that this magic is potentially problematic. I don't know whether LLVM has a way to disable it, though.
Fair enough. But I do believe Rust / LLVM need an answer for how to properly handle the above scenarios. How do I do these things soundly in Rust? Can I or can I not use LTO when writing a libc, for example? Also, as I understand it, any soundness issue that cannot be traced to an unsafe block (or an unsafe attribute, unsafe command-line flags (though I don't think those exist yet?), etc.) is a compiler bug? Though in this case I guess the unsafe bit is the no-mangle export of a function called …
Could I dig a bit more into why this is important? Could we avoid such issues by having the malloc implementation explicitly "carve out" part of an existing allocation and give it back to the Abstract Machine, minting a new allocation? I imagine this "carving out" would come with significant limitations, such as no access being allowed to that region of memory until it is returned. In this model, the …
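A minimal sketch of the "carving out" half of this idea, under stated assumptions: Arena and carve are hypothetical names, the arena is a fixed-size buffer inside a single Rust allocation, and chunks are handed out with a simple bump pointer. Rust has no API for the other half (giving the chunk back to the Abstract Machine as a fresh allocation), so a carved pointer here is still derived from the arena's buffer, which is exactly the relationship the malloc magic assumes cannot exist.

```rust
use std::cell::{Cell, UnsafeCell};

/// Hypothetical fixed-size arena that hands out non-overlapping chunks.
/// It only models "carving out"; nothing here mints a new allocation.
struct Arena<const N: usize> {
    buf: UnsafeCell<[u8; N]>,
    next: Cell<usize>, // bump offset into `buf`
}

impl<const N: usize> Arena<N> {
    fn new() -> Self {
        Arena { buf: UnsafeCell::new([0; N]), next: Cell::new(0) }
    }

    /// Carves `size` bytes aligned to `align` (a power of two) out of the
    /// arena, or returns None if the arena is exhausted.
    fn carve(&self, size: usize, align: usize) -> Option<*mut u8> {
        assert!(align.is_power_of_two(), "alignment must be a power of two");
        let base = self.buf.get() as *mut u8;
        let base_addr = base as usize;
        // Round the current bump position up to the requested alignment.
        let unaligned = base_addr.checked_add(self.next.get())?;
        let aligned = unaligned.checked_add(align - 1)? & !(align - 1);
        let offset = aligned - base_addr;
        let end = offset.checked_add(size)?;
        if end > N {
            return None; // arena exhausted
        }
        self.next.set(end);
        // Deriving the chunk from `base` keeps the arena's provenance: to the
        // Abstract Machine it is still part of the arena's allocation.
        Some(unsafe { base.add(offset) })
    }
}
```

The bump pointer never moves backwards, so chunks cannot overlap; the open question in this thread is what, if anything, would make such a chunk count as "truly fresh" memory.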
It's important because LLVM performs optimizations, and we have to ensure they don't break our model. This is a descriptivist issue, not a prescriptivist one. There are special magic rules for …
That's already part of the model for regular Rust global allocators. This issue is about malloc magic that goes beyond that. My understanding is that LLVM does more things for …
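For comparison, a regular Rust global allocator is a static that implements std::alloc::GlobalAlloc and is marked with #[global_allocator]. The sketch below (CountingAlloc and GLOBAL are hypothetical names) forwards every request to System and counts allocations; it only illustrates the kind of allocator the existing model already covers, not the extra malloc-specific behavior this issue is about.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator: forwards to `System` and counts `alloc` calls.
struct CountingAlloc {
    allocs: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocs.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every Box, Vec, String, etc. in this binary now goes through CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc { allocs: AtomicUsize::new(0) };
```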
Right, I'm trying to understand what part of the model these optimizations would break if malloc were implemented in the same compilation unit.
I think I was misunderstanding the nature of this issue, due to:
I thought that meant this had something to do with LLVM's intrinsic understanding of memory allocation and deallocation, but actually it could happen with any function that LLVM "knows" doesn't access any IR-visible value. LLVM has chosen to special-case only malloc and related functions because they are already "known" and are presumably the most common example of such a function? Do we know when LLVM considers something to be …
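To make the kind of optimization concrete, here is a small sketch (boxed_roundtrip is a hypothetical name) of a heap allocation whose only uses are one store and one load. As far as I understand, rustc marks its allocator entry points so that LLVM treats them as known allocation functions, which lets an optimized build drop the allocation and return the constant directly. Whether that happens depends on the compiler version and optimization level, so check the generated assembly (for example on Compiler Explorer) rather than relying on it.

```rust
/// Allocates a u32 on the heap, reads it back, and frees it.
/// An optimized build may remove the allocation and return 42 directly,
/// because the allocation is never observable outside this function.
pub fn boxed_roundtrip() -> u32 {
    let b = Box::new(41u32);
    *b + 1
}
```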
I am not an expert on this just yet, but here is what I do know about LLVM and …
So for example, https://godbolt.org/z/hdPYGf73v
This is adding the …
I would expect any optimizations LLVM does to be in accordance with §7.22.3 of the C standard, but of course no set of optimizations is perfect. That's about where my understanding ends, though; for example, I am curious as to why this doesn't have the …
AIUI, all functions defined by the C and C++ standards work like this. In practice, many of them are provided by a normal-looking implementation, but the standard wording is careful to allow direct (non-indirected) calls to use a different invocation mechanism than typical user-declared functions. In C, this is mainly noticeable in that standard functions may be provided as function-like macros, as long as behavior is not affected and the macro can be suppressed to get a non-macro implementation.
I believe LLVM will still recognize an unmangled …
This is needed for optimizing header-declared versions of the libc functions, which the C standard explicitly allows (at least before C23, which might have changed some things there; IIRC something changed about function addresses, at least for C++'s versions, if not for C itself).
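For illustration, this is roughly what such an unmangled declaration looks like from the Rust side: a minimal sketch that calls the C allocator through FFI (malloc_roundtrip is a hypothetical helper). It uses the unsafe extern block syntax, which needs Rust 1.82 or later, and assumes usize matches the target's size_t, as it does on mainstream platforms. Because the symbols are declared under their C names, LLVM may apply its malloc/free-specific reasoning to these calls in an optimized build.

```rust
use std::ffi::c_void;

// Unmangled declarations of the C allocator, resolved by the platform libc.
unsafe extern "C" {
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}

/// Allocates room for one u64 with C `malloc`, writes and reads the value,
/// then releases the memory with `free`. Returns None if `malloc` fails.
fn malloc_roundtrip(value: u64) -> Option<u64> {
    unsafe {
        // malloc returns memory suitably aligned for any fundamental type.
        let p = malloc(std::mem::size_of::<u64>()) as *mut u64;
        if p.is_null() {
            return None;
        }
        p.write(value);
        let out = p.read();
        free(p as *mut c_void);
        Some(out)
    }
}
```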
AIUI, LLVM treats the absence of a specified …
As long as it still doesn't read or write any state accessed by outside code, that should be fine.
LLVM knows two kinds of allocation functions: those marked with the …
Taken from #534:
Currently, LLVM doesn't do the second optimization. However, it does perform it if you manually set System to be the global allocator: https://rust.godbolt.org/z/a77PWjeKE [1]. This is due to this line, which is used by their GVN pass.

There are clearly special magic rules applying specifically to malloc, which mean that its memory must be truly fresh for the Abstract Machine and cannot be part of any previously existing stack/heap/other allocation. This is "fine" as long as malloc is called via FFI and all the state it works on is completely hidden from the current compilation unit. It becomes rather incoherent if there is ever a chance of malloc itself being inlined into surrounding code, or exchanging data with surrounding code via global state, so we had better have rules in place against things like that. I think we should say that malloc is reserved to be provided by the underlying runtime system, and that it must be called via FFI in a way that makes inlining impossible.

Note that this is separate from Rust's #[global_allocator] attribute, which does not get all the same magic that malloc gets. See #442 for discussion of the semantics of that attribute.

Footnotes

[1] You also get the malloc -> calloc transformation for types other than these hardcoded ones if you set System to be the global allocator manually.