-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add alloca support #1105
Comments
I also need this to implement jyn514/saltwater#63. |
…dealliance#1105) * clif-util: Make the `-t` flag work with the `wasm` subcommand The presence of the flag was checked in the code, so it was essentially already supported, but it was not properly configured when constructing the CLI arguments parser. * clif-util: also enable the `-c` flag for the `wasm` subcommand Similar to the parent commit, this functionality is already supported, just a mix up of the CLI parser construction made it not show up.
How hard would this be to implement? I'm willing to take a shot at it. |
@bnjbvr, @cfallin, @julian-seward1, can you comment on this? |
I'm assuming the question arises in the context of the new backend. From looking at LLVM's docs, it seems that If one can pass a dynamic input value that's the amount to allocate, it is likely to be much trickier, because we need to be able to track precisely the running SP value within the function's body: that's what the nominal SP offset does in a static manner. It should be implementable, but it might require using a register for this purpose. |
No, it also allows a dynamic input. It is just that a static input is equivalent to using stack slots in Cranelift. |
It's definitely possible to implement this with the new backends. It interacts with the way we address stack slots and spill slots; at least on aarch64, we address function arguments with The most straightforward approach would probably be to (i) detect when an alloca (or just a dynamic alloca) is present; then if so, (ii) allocate a separate scratch register in the prologue and copy nominal-SP to that; then (iii) access all stack and spill slots relative to that register. We lose a register in that case but I think that's unavoidable unless we revert to negative offsets from FP (which has a higher cost -- a few percent degradation at least, because it forces Happy to point out the bits that would need to change in more detail if you would like! |
Related to this, at least for the x86-64 ABIs, I would like to see Cranelift stop using RBP as a "traditional" frame pointer as both DWARF and Windows unwind encode enough information to properly describe frame layout without having to establish a frame pointer for frames of static size. This would free RBP to be used as a GPR for functions that do not have dynamic stack allocations or as the "nominal-SP" register for functions that have dynamic stack allocations. In fact, on Windows x64, a "frame pointer" is supposed to be exactly what you describe the "nominal-SP" register as: a register pointing at the base (or somewhere inside) of the "static" part of the local frame and used to reference args/locals (and CSRs for unwind) by positive offset. For that ABI, a frame pointer is therefore generally only established for frames calling Right now the x64 prologue/epilogue instructions relating to the establishment of a traditional frame pointer are simply wasted instructions on Windows. |
This should be an option in my opinion. Using DWARF unwinding for perf profiles as opposed to frame pointers results in much bigger |
Definitely, but I think omitting a traditional frame pointer should be default for these ABIs, at least for optimized compilations. An option to opt-in when they are legitimately needed (like in the case of a tool relying on them for fast stack walks) makes sense to me. |
@peterhuene that's a good point -- could you create a separate issue for that? I definitely agree that |
I opened #1149 a while back specific to Windows. Should we create a more general "omit frame pointers when permitted" issue? |
Sure, I think it makes sense to track with a separate issue; it's a distinct thing that we'd want to do on any platform when we're allowed to (by ABIs and by debug requirements). |
I've opened #2073. |
As per rust-lang/compiler-team#630 unsized locals will be removed from the compiler. No need to implement them in cg_clif any longer. It would still be necessary for implementing a C compiler based on Cranelift though. |
I no longer maintain saltwater-cc and don't have time to work on this. |
Okay, I guess let's close this issue. If somebody wants this feature in the future, feel free to re-open this issue then. |
I want to implement something similar to Swift's approach to unboxed polymorphism using Value Witness Tables.[1] [1] Implementing Swift Generics @ 2017 LLVM Developer's Meeting |
That use-case makes sense to me, @bryal. I gather your interest is related to https://git.sr.ht/~jojo/kapreolo/commit/93672f5, right? I like your current workaround of allocating a fixed-size stack slot and falling back to a heap allocation if you need more space, but we can definitely discuss how The suggestions that folks made several years ago have some associated costs, both in runtime when accessing stack slots, and in maintenance time. We'll just have to consider those costs carefully in this discussion. |
Exactly, @jameysharp, that's the one. I'm not intimately familiar with any concrete ISAs. Before Cranelift, my only experience with code at this level was using LLVM. I've had to consider calling conventions, but not much more than that. That is to say, I'm not sure I have much to contribute to the discussion of how That being said, if we manage to come up with a clear plan, I'd be happy to help out with the manual labour. |
I talked with several of the other people working on Cranelift (@cfallin, @fitzgen, @elliottt, and @lpereira) about this today and there is quite a bit to say about it, which I will try to organize here. If I misrepresent any of their positions I hope they will speak up. First off, we would welcome a PR demonstrating how this could work! But at least among the people I talked with, working on There are several reasons why a fixed-size stack frame is much easier to deal with. One is that Windows requires stack-probing in the function prologue, and while I assume there's a way to make that work with A larger reason is that accessing stack slots for spilled registers needs to be as cheap as possible. Currently there are two registers we can add fixed offsets to in order to find any stack slot in the current frame: specifically, each target has a frame pointer and a stack pointer. On x86-64 we could choose to use the frame pointer to access everything, which would allow You already have a workaround where you fall back to malloc for large allocations, which is a good approach. In a comment you noted that you're "not sure we can [free] this manually. The temporary may be passed as an arg in a tail call." I'd note that To avoid the performance cost of heap allocations, one suggestion that we came up with is that you could allocate a separate stack that is under your compiler's control, rather than trying to share it with the stack used for calling conventions and register allocation. This kind of "shadow stack" is a common solution when data lifetimes are tied to function call scopes. Allocation and deallocation are constant-time in the common and amortized cases, just like I'm going to go ahead and close this issue again to reflect that this is not currently planned, but we still welcome further discussion. |
Thank you @jameysharp for your work and thank you all for your input! Indeed I had not yet stopped to consider how my stack temporaries would play with tail calls. As I intend to employ optimized tail calls extensively in my generated code, you're of course right that this approach will not work as I had planned. I assume Swift does not (or, in 2017, did not) guarantee TCO in the same way, for the I'll see about using a shadow stack instead -- thanks for the suggestion! Now I need to think a bit about indirectly stored temporaries, tail recursion, and memory leaks. For my part, there's no need to implement this anymore. |
This is necessary to implement the
unsized_locals
rust feature.cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/15
The text was updated successfully, but these errors were encountered: