Implement some form of escape analysis to allocate values on the stack #776
Comments
A related and older idea is to bump allocate small objects using a custom allocator (#542). The idea here was that by using our own allocator we can avoid the cost of going through the system allocator.
As for what constitutes an escape: a conservative approach would be to just consider any move (except for a drop) an escape. This does mean that moving stack value S₁ into stack value S₂ results in S₁ being heap allocated, but it might be good enough as a start.
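As a rough illustration of this conservative rule, here's a minimal sketch in Python. The MIR is modeled as simple `(opcode, register)` tuples; the opcode names and layout are illustrative assumptions, not Inko's actual MIR:

```python
# Conservative escape rule sketch: any move of a register, other than
# its drop, is treated as an escape. The instruction format here is a
# hypothetical simplification of MIR.
def escapes(register, instructions):
    for op, reg in instructions:
        if reg != register:
            continue
        if op == "move":
            return True  # moving S1 into S2 counts as S1 escaping
        if op == "drop":
            return False  # a drop ends the lifetime without escaping
    return False
```

Under this rule, a register that is only allocated and dropped stays eligible for the stack, while any moved register is (conservatively) heap allocated.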
The way we generate MIR poses a challenge, in that we can't just look at a single instruction in isolation.
Results in something like this:
This means that if we look at just one instruction, we can't tell whether the value escapes.
Assuming the moves are all in the same scopes, it doesn't really matter how many moves there are, since we can copy the data if needed. Whether copying actually happens depends largely on how LLVM decides to compile/optimize the code. Things do get tricky if we borrow in between such moves, because the move would invalidate the borrow.
Given the following instructions:
R₁ should be stack allocated, but the decision to do that depends on R₂, R₃ and others not escaping the scope R₁ is defined in. So given a move chain of R₁ → R₂ → R₃, the decision is transitive. For this to work, we'd need to determine, for every register R that's assigned the result of an allocation, whether any register in its move chain escapes.
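A sketch of that transitive check, assuming a hypothetical representation where `moves` maps a destination register to the source register it was moved from:

```python
# R1 may only go on the stack if no register reachable through the move
# chain escapes. Fixed-point propagation over the move graph; all
# register names and data structures are illustrative.
def stack_allocatable(alloc_reg, moves, escaping):
    reachable = {alloc_reg}
    changed = True
    while changed:
        changed = False
        for dst, src in moves.items():
            if src in reachable and dst not in reachable:
                reachable.add(dst)
                changed = True
    return not (reachable & escaping)
```

For the chain R₁ → R₂ → R₃, marking R₃ as escaping makes R₁ ineligible for stack allocation, which matches the dependency described above.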
For #677 I have a WIP parser that uses a commonly used pattern. Stack allocating these values would likely greatly improve performance. However, I'm not sure it's enough to just rely on escape analysis. That is, for that to work the allocations need to be inlined, and given the complexity of the parser that's not a guarantee.

An alternative might be to not rely on escape analysis at all, but instead look at the type definition. That is, if the type's size is e.g. less than or equal to 64 bytes, and all its sub-values are trivial to copy, we just always stack allocate it and copy it upon borrowing. This does mean we need some sort of marker for such types.
I've spent some time building a bump allocator for small objects based on Immix, as found on a separate branch. For the work in progress I'm now wondering if we perhaps could take a simpler approach: instead of relying on escape analysis, we promote types to value types based on their definition rather than their usage. The downside is that "value type promotion" only works for simple types.
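For illustration, here's a minimal bump-allocation sketch. This is not the actual Immix-based implementation; it only shows the core idea of carving allocations out of a fixed-size block by bumping an offset, so most allocations avoid a system allocator call:

```python
# Minimal bump-allocator sketch (illustrative only). A real
# implementation also needs block recycling and GC/drop integration.
class BumpBlock:
    def __init__(self, size=1024):
        self.size = size
        self.offset = 0

    def allocate(self, bytes_needed, align=8):
        # Round the current offset up to the requested alignment.
        start = (self.offset + align - 1) & ~(align - 1)
        if start + bytes_needed > self.size:
            return None  # block is full; the caller grabs a fresh block
        self.offset = start + bytes_needed
        return start  # offset of the new object within the block
```

Allocation is just an add and a compare, which is where the speedup over going through the system allocator comes from.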
Another issue with value type promotion: if we just blindly promote a type to a value type (assuming it stores other value types), then we might break circular types (i.e. a linked list).
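Both concerns, the size/triviality rule and circular types, can be combined into one definition-based check. A sketch, where `TYPES`, the field layout, and the 64-byte limit are all illustrative assumptions:

```python
# Definition-based promotion sketch: a type is promoted to a value type
# only if it's small and all its sub-values are promotable. The `seen`
# set breaks the recursion for circular types such as linked lists.
TYPES = {
    "Int":   {"size": 8,   "fields": []},
    "Token": {"size": 24,  "fields": ["Int", "Int"]},
    "Node":  {"size": 16,  "fields": ["Node"]},  # circular (linked list)
    "Big":   {"size": 128, "fields": []},
}

def promotable(name, limit=64, seen=frozenset()):
    if name in seen:
        return False  # circular types would otherwise recurse forever
    typ = TYPES[name]
    return typ["size"] <= limit and all(
        promotable(f, limit, seen | {name}) for f in typ["fields"])
```

With this rule a circular type is simply never promoted, which sidesteps the linked-list problem at the cost of leaving those types on the heap.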
For the bump allocation scheme, I think we can make the following changes:
One option is to flip things around: values go on the stack by default, and a hypothetical opt-in construct is used for heap allocation. To make that safe, we'd have to check the borrow count upon moving a stack value such that we don't invalidate any borrows. In theory we can elide such checks if we can statically determine no borrows are present at the time of the move, but this won't be able to remove all such checks.

Implementation wise this is much simpler compared to escape analysis and value type promotion, and we drastically reduce the pressure on the system allocator. This in turn is likely to improve performance significantly. The downsides are:
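The move-time safety check described above can be sketched as follows. The class and function names are illustrative, not Inko's runtime API:

```python
# Moving a stack value while borrows exist would leave those borrows
# dangling, so the move checks the borrow count first and panics,
# mirroring Inko's runtime borrow counting.
class StackValue:
    def __init__(self, data):
        self.data = data
        self.borrows = 0

def move_value(value):
    if value.borrows != 0:
        raise RuntimeError("panic: moving a value that is still borrowed")
    return value  # ownership transfers; no borrows could be invalidated
```

Statically proving `borrows == 0` at a move site is exactly the case where the check could be elided.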
The above idea won't work well for resizable types such as arrays: if you have an array of N stack values and you create a borrow to a value, then resizing the array will invalidate that borrow. To make that safe as well, we'd have to check the borrow count for every value, which would be terrible performance-wise.
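To make the cost concrete, here's a sketch of what a safe resize would have to do under this scheme (names and structures are illustrative):

```python
# Before the backing buffer can move, the borrow count of every element
# must be checked -- an O(n) scan on each resize, which is why this is
# called out above as terrible performance-wise.
class Slot:
    def __init__(self, data):
        self.data = data
        self.borrows = 0

def grow(array, extra):
    for slot in array:  # the O(n) per-resize borrow check
        if slot.borrows != 0:
            raise RuntimeError("panic: resize would invalidate a borrow")
    array.extend(Slot(None) for _ in range(extra))
    return array
```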
Enums could be promoted to stack-based value types if the following conditions are met:
The reason this would be safe is that enums themselves can't be modified in-place, i.e. you can't change the constructor tag or the constructor's arguments. Technically you can modify those in-place, but for value types that would result in a copy being modified.

A challenge here is specialization: if we specialize for such specific types we can remove the need for object headers, as generics can use static dispatch. This however may result in more copies of methods that operate on different types with the same memory layout. We'd also need a form of auto boxing when such values are cast to a trait, though we could also just straight up disallow such casts. If we keep the headers this won't be necessary, but that comes with its own trade-offs.
#778 is merged, so the bulk of the work is done. I do think it's still worth implementing stack allocations on top of this. This way if a heap allocated type is used but we can trivially determine it should go on the stack, we can do just that automatically.
Escape analysis may be tricky to implement in an effective way: when encountering a dynamic dispatch call site we'd have to conservatively assume any arguments passed (including the receiver) escape the call, as we have no way of knowing which method is actually called at runtime. Since we use dynamic dispatch for most generics, this means that escape analysis might end up not being that useful.
Another challenge: iterators are often composed together, combined with using closures. These types may be constructed in one function but used in another.
A different approach is to do the following: instead of escape analysis, we introduce syntax/semantics for a new kind of type with severely restricted borrows. This would be safe because it's never possible to construct a regular borrow of such a value. Because of these restrictions we can allow field assignments.
When calling methods on such a value, the receiver is exposed as a borrow. This approach does have some downsides:
The first downside is the biggest one, as I really don't like it when there are multiple competing ways of doing (more or less) the same thing, because it results in a mess. For example, Python famously claims "there should be only one way to do things", only to end up with urllib/urllib2/urllib3 and similar situations. On the other hand, the benefit of this approach is that the resulting behavior is consistent/predictable.
We also run into an issue similar to the one described in #787: trait default methods may borrow the receiver. I think that in order to solve both, we essentially have to treat both cases the same way.
Skimming through some recent papers related to escape analysis (link), it seems that the usual stack allocation percentage hovers around 10% on average, with some cases of 20-30%. IIRC this matches findings from papers published in the late 90s/early 2000s. If this holds true across languages (or more specifically, Inko), then I'm not sure a 10% reduction in heap allocations is worth the compile-time complexity.
I think that for closures we can just make them stack allocated by default. What we should do is change the capturing logic such that one closure capturing another closure captures it by value, as this new setup wouldn't allow capturing by borrow. This really should be the default anyway, as capturing closures by borrow is almost always going to result in them being dropped prematurely.
My current thinking is to introduce unique types for this purpose.
There's in fact an area in the standard library where we mutably borrow a closure.
For more information, refer to #776. Changelog: changed
Per #786 (comment) we'll look into stack allocating closures as part of 0.19.0.
I quickly hacked together a method that checks how many allocations are dropped in the same scope, taking into account any moves of the original register containing the allocation. For the standard library there are 5407 heap allocation sites, but only 150 (2.77%) of those are dropped in the same method. If I increase the inline weight from 100 to 200 this only goes up to 2.86%. If I increase the threshold to 500, we're at 7.46%. If I force inlining of methods that accept any closures as arguments, the percentage is only 4.17% (423 out of 10156 allocations). If I combine this with only counting closure allocations, the percentage goes up to 15%. Without that forceful inlining the percentage is only 5.91%.

In other words, we'd have to inline much more aggressively for this to have any impact, and even then I doubt stack allocating only 15% of the closures is worth it.
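The measurement itself boils down to a simple ratio. A sketch of it, where each allocation site is reduced to a pair of the method that allocated it and the method that drops it (after following moves); the data layout is an illustrative assumption:

```python
# Count how many allocation sites are dropped in the method that
# created them, as a percentage of all sites.
def same_scope_ratio(sites):
    # sites: list of (alloc_method, drop_method) pairs, post-moves
    same = sum(1 for alloc, drop in sites if alloc == drop)
    return round(100 * same / len(sites), 2)
```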
OpenFlow seems to benefit a bit more from this approach: if we only count closure allocations, then without the inlining hack mentioned above about 16% of the closure allocations can go on the stack. Enabling the closure inlining hack only bumps that up to 19%. For clogs the results are very similar.
Another way of approaching this is the following: the allocation/drop check hack mentioned is very basic and likely a lot less accurate compared to escape analysis. If such a basic hack can reduce allocations by around 20% in the best case, then perhaps escape analysis can do a much better job than the usual 8-10% mentioned by the various papers linked in the issue description. For example, if we can stack allocate say 30% of all allocations, that's still pretty good, provided the escape analysis implementation isn't overly complex.
A unique type is a type that is allocated on the stack and subject to move semantics (i.e. it's not copied), but severely restricts borrows. This in turn makes it safe to assign fields new values (unlike with inline types), as the assignments are always correctly observed (because they always act on the same reference). Unique types are useful when creating mutable state machines and streams such as files, sockets, and iterators.

Similar to borrows of `uni T` _values_, one can use borrows of unique types as the receiver of a method. For example, if a unique type is stored in a field it's exposed as a `ref T` or `mut T`, and one can still call methods on such a borrow. What isn't possible is assigning such borrows, passing them around as arguments, or returning them.

Refer to #776 for more information. Changelog: changed
Working on #798 I realized the following: since we have single ownership and move semantics, determining if values escape should be easier than expected. Given some value V, if V is unconditionally dropped at every return then we know for a fact it doesn't escape the call. If V is merely moved or dropped conditionally, we have to assume that it might escape. Using this information we can then annotate arguments as "escape" or "doesn't escape" and use that for inter-procedural escape analysis. For dynamic dispatch we could produce this information for all potential targets, then "union" the results together: if given an argument A and for one or more potential targets A might escape, we treat it as escaping for all targets. Only if it doesn't escape for any of the targets would we consider it as non-escaping. With that said, we'd likely still be looking at maybe 10-20% fewer allocations in the best case. That's not bad, but I'm not sure it's worth it either.
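Both rules can be sketched compactly. The data structures here (per-path drop sets, a per-target escape map) are illustrative assumptions, not the compiler's actual representation:

```python
# Rule 1: a value definitely doesn't escape only when every return path
# unconditionally drops it; a conditional drop or a move forces the
# conservative answer.
def might_escape(value, return_paths):
    # return_paths: for each path, the set of values dropped on it
    return not all(value in dropped for dropped in return_paths)

# Rule 2: for dynamic dispatch, union the per-target answers; an
# argument is non-escaping only if it escapes in none of the targets.
def escapes_dynamic(arg, targets, per_target_escape):
    # per_target_escape: method name -> set of escaping arguments
    return any(arg in per_target_escape[t] for t in targets)
```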
Description
Types defined using the `class` keyword are heap allocated. While #750 talks about giving users the option to explicitly opt in to stack allocating data, I feel we also need a mechanism to determine if we can do this automatically. That is, we need some form of escape analysis, and to stack allocate values if we determine this is safe to do so.

An area of the standard library that can really benefit from this is the `std.iter` module. Iterators are used heavily and each incurs a heap allocation. When used in tight loops, it's not uncommon for most time to be dominated by the memory management necessary for these iterators.

The basic idea is simple: if an allocated value `V` doesn't escape the scope `S` it's allocated into (meaning `lifetime(V) <= lifetime(S)`), we can allocate it onto the stack instead. The implementation however is a bit more tricky. For example, we can only stack allocate values if they are allocated directly in their scope and not through a method call. This means that good inlining is crucial for escape analysis to have a meaningful impact. In addition, the implementation of escape analysis itself may prove a bit of a challenge.
Note that it's perfectly fine for borrows of stack allocated data to outlive the stack value, as dropping the stack value would still incur a runtime borrow check and produce a panic in such cases. In other words, such cases don't inhibit the ability to stack allocate.
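The runtime borrow check mentioned above can be sketched like this; the names are illustrative, not Inko's runtime API:

```python
# Dropping a stack value while borrows are still live panics, so a
# borrow outliving the value is caught at run time instead of being
# undefined behavior -- which is why such borrows don't inhibit stack
# allocation.
class Value:
    def __init__(self):
        self.borrows = 0

def drop_value(value):
    if value.borrows != 0:
        raise RuntimeError("panic: value dropped while still borrowed")
```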
Related issues
Related work
Points-to Calculation for Golang
Glasgow Haskell Compiler