The GC root placement pass 1.0 deserves #21888
Changes from all commits
7227007
3a9fa43
cff6a5c
eefbde7
226bca3
b2d0caf
b0a162c
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -107,3 +107,184 @@ study it and the pass of interest in isolation. | |
4. Strip the debug metadata and fix up the TBAA metadata by hand. | ||
|
||
The last step is labor intensive. Suggestions on a better way would be appreciated. | ||
|
||
## The jlcall calling convention | ||
|
||
Julia has a generic calling convention for unoptimized code, which looks somewhat | ||
as follows: | ||
``` | ||
jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int); | ||
``` | ||
where the first argument is the boxed function object, the second argument is | ||
an on-stack array of arguments and the third is the number of arguments. Now, | ||
we could perform a straightforward lowering and emit an alloca for the argument | ||
array. However, this would betray the SSA nature of the uses at the callsite, | ||
making optimizations (including GC root placement) significantly harder. | |
Instead, we emit it as follows: | ||
``` | ||
%bitcast = bitcast @any_unoptimized_call to %jl_value_t *(*)(%jl_value_t *, %jl_value_t *) | ||
call cc 37 %jl_value_t *%bitcast(%jl_value_t *%arg1, %jl_value_t *%arg2) | ||
``` | ||
The special `cc 37` annotation marks the fact that this call site is really using | |
the jlcall calling convention. This allows us to retain the SSA-ness of the | |
uses throughout the optimizer. GC root placement will later lower this call to | |
the original C ABI. In the code, the calling convention number is represented by | |
the `JLCALL_F_CC` constant. In addition, there is the `JLCALL_CC` calling | |
convention, which functions similarly but omits the first argument. | |
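
As a rough illustration of what the late lowering to the C ABI amounts to, the sketch below spills two SSA-style arguments into an on-stack array only at the call site. The `jl_value_t` layout and the names `count_args` and `call_with_two_args` are hypothetical stand-ins for this example, not Julia's runtime definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for Julia's boxed value type; the real layout differs.
struct jl_value_t { int tag; };

// Same shape as the generic signature above: function object, argument array,
// argument count. This toy callee reports how many non-null arguments it got.
jl_value_t *count_args(jl_value_t *f, jl_value_t **args, int nargs) {
    assert(nargs >= 0 && (args != nullptr || nargs == 0));
    (void)f;  // the function object is unused in this toy callee
    static jl_value_t result;
    result.tag = 0;
    for (int i = 0; i < nargs; ++i)
        if (args[i] != nullptr)
            ++result.tag;
    return &result;
}

// Conceptual result of lowering a two-argument jlcall: the arguments stay
// separate values until this point and are packed into an array for the call.
jl_value_t *call_with_two_args(jl_value_t *f, jl_value_t *arg1, jl_value_t *arg2) {
    jl_value_t *argv[2] = {arg1, arg2};  // on-stack argument array
    return count_args(f, argv, 2);
}
```

Keeping the arguments as plain operands until this step is what lets the optimizer see them as ordinary SSA uses.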
|
||
## GC root placement | ||
|
||
GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root | |
Review comment: an LLVM pass late in the pipeline |
||
placement this late enables LLVM to make more aggressive optimizations around | ||
code that requires GC roots, as well as allowing us to reduce the number of | ||
required GC roots and GC root store operations (since LLVM doesn't understand | ||
our GC, it wouldn't otherwise know what it is and is not allowed to do with | ||
Review comment: not allowed to do what? |
||
values stored to the GC frame, so it'll conservatively do very little). As an | ||
example, consider an error path | ||
``` | ||
if some_condition() | ||
#= Use some variables maybe =# | ||
error("An error occurred") | ||
end | ||
``` | ||
During constant folding, LLVM may discover that the condition is always false, | ||
and can remove the basic block. However, if GC root lowering is done early, | ||
the GC root slots used in the deleted block, as well as any values kept alive | ||
in those slots only because they were used in the error path, would be kept | ||
alive by LLVM. By doing GC root lowering late, we give LLVM the license to do | ||
any of its usual optimizations (constant folding, dead code elimination, etc.), | ||
without having to worry (too much) about which values may or may not be gc | ||
tracked. | ||
|
||
However, in order to be able to do late GC root placement, we need to be able to | ||
identify a) which pointers are gc tracked and b) all uses of such pointers. The | ||
goal of the GC placement pass is thus simple: | ||
|
||
Minimize the number of needed gc roots/stores to them subject to the constraint | ||
that at every safepoint, any live gc-tracked pointer (i.e. for which there is | ||
a path after this point that contains a use of this pointer) is in some gc slot. | ||
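
The constraint can be sketched for the simplest possible case. The following is a minimal model, assuming straight-line code with no branches and ignoring the question of whether a call's own operands need rooting; `Inst` and `liveAtSafepoints` are hypothetical names for this example and are not part of the actual pass. It computes, for each safepoint, the values that were defined earlier and are used by some later instruction, which are the ones that would have to sit in a GC slot.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical straight-line instruction for this sketch.
struct Inst {
    int def;                // value defined here, or -1 for none
    std::vector<int> uses;  // values read here
    bool safepoint;         // true if the GC may run here
};

// For each instruction index, returns the values that must be in a GC slot
// there: empty for non-safepoints, otherwise every value that is defined
// earlier and used by some later instruction.
std::vector<std::set<int>> liveAtSafepoints(const std::vector<Inst> &code) {
    std::vector<std::set<int>> needSlot(code.size());
    std::set<int> live;  // values used by some later instruction
    for (std::size_t i = code.size(); i-- > 0;) {  // backward scan
        const Inst &inst = code[i];
        if (inst.safepoint) {
            needSlot[i] = live;
            needSlot[i].erase(inst.def);  // its own result is not live across it
        }
        live.erase(inst.def);  // not live before its definition
        for (int u : inst.uses) {
            assert(u >= 0);
            live.insert(u);
        }
    }
    return needSlot;
}
```

A real implementation has to run this kind of analysis over the control flow graph and then assign slots so that as few roots and stores as possible are needed; the sketch only shows the liveness condition itself.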
|
||
### Representation | ||
|
||
The primary difficulty is thus choosing an IR representation that allows us to | ||
identify gc-tracked pointers and their uses, even after the program has been | ||
run through the optimizer. Our design makes use of three LLVM features to achieve | ||
this: | ||
- Custom address spaces | ||
- Operand Bundles | ||
- non-integral pointers | ||
|
||
Custom address spaces allow us to tag every pointer with an integer that needs | |
to be preserved through optimizations. The compiler may not insert casts between | ||
address spaces that did not exist in the original program and it must never | ||
change the address space of a pointer on a load/store/etc operation. This allows | ||
us to annotate which pointers are gc-tracked in an optimizer-resistant way. Note | ||
that metadata would not be able to achieve the same purpose. Metadata is supposed | ||
to always be discardable without altering the semantics of the program. However, | ||
failing to identify a gc-tracked pointer alters the resulting program behavior | ||
dramatically - it'll probably crash or return wrong results. We currently use | ||
three different address spaces (their numbers are defined in src/codegen_shared.cpp): | |
Review comment: address spaces (used as 2 words in the rest of this paragraph) |
||
|
||
- GC Tracked Pointers (currently 10): These are pointers to boxed values that may be put | ||
into a GC frame. It is loosely equivalent to a `jl_value_t*` pointer on the C | ||
side. N.B. It is illegal to ever have a pointer in this address space that may | ||
not be stored to a GC slot. | ||
- Derived Pointers (currently 11): These are pointers that are derived from some GC | ||
tracked pointer. Uses of these pointers generate uses of the original pointer. | ||
However, they need not themselves be known to the GC. The GC root placement | ||
pass MUST always find the GC tracked pointer from which this pointer is | ||
derived and use that as the pointer to root. | ||
- Callee Rooted Pointers (currently 12): This is a utility address space to express the | ||
notion of a callee rooted value. All values of this address space MUST be | ||
storable to a GC root (though it is possible to relax this condition in the | ||
future), but unlike the other pointers need not be rooted if passed to a | ||
call (they do still need to be rooted if they are live across another safepoint | ||
between the definition and the call). | ||
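
The requirement that a derived pointer is always traced back to its tracked base can be sketched as a walk along the chain of derivations. This is a toy model, not the pass itself: `Value`, `derivedFrom`, and `findTrackedBase` are hypothetical names, and the address space numbers are the current ones listed above.

```cpp
#include <cassert>

// Address space numbers as currently listed above; they may change.
enum AddressSpace : unsigned { Generic = 0, Tracked = 10, Derived = 11, CalleeRooted = 12 };

// Hypothetical value node: its address space and the value it was derived from.
struct Value {
    unsigned addrspace;
    const Value *derivedFrom;  // nullptr if this value is not derived from another
};

// Follows the derivation chain until a Tracked pointer is found. Returns
// nullptr if the chain ends without one, which would violate the invariants.
const Value *findTrackedBase(const Value *v) {
    assert(v != nullptr);
    while (v != nullptr && v->addrspace != Tracked)
        v = v->derivedFrom;
    return v;
}
```

The base returned here, rather than the derived pointer, is what would be stored to the GC slot.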
|
||
### Invariants | |
The GC root placement pass makes use of several invariants, which need | ||
to be observed by the frontend and are preserved by the optimizer. | ||
|
||
First, only the following address space casts are allowed: | |
- 0->{Tracked,Derived,CalleeRooted}: It is allowable to decay an untracked pointer to any of the | |
others. However, do note that the optimizer has broad license to not root | |
Review comment: others |
||
such a value. It is never safe to have a value in address space 0 in any part | |
of the program if it is (or is derived from) a value that requires a GC root. | ||
- Tracked->Derived: This is the standard decay route for interior values. The placement | ||
pass will look for these to identify the base pointer for any use. | ||
- Tracked->CalleeRooted: Addrspace CalleeRooted serves merely as a hint that a GC root is not | ||
required. However, do note that the Derived->CalleeRooted decay is prohibited, since | ||
pointers should generally be storable to a GC slot, even in this address space. | ||
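
The cast rules above fit in a small predicate. This is a sketch of the rules as stated in this section only, assuming the current address space numbers; `isAllowedCast` and `checkCast` are hypothetical helpers, not functions from the Julia code base.

```cpp
#include <cassert>

// Address space numbers as currently listed above; they may change.
enum AddressSpace : unsigned { Generic = 0, Tracked = 10, Derived = 11, CalleeRooted = 12 };

// Returns true only for the address space casts the invariants permit.
bool isAllowedCast(unsigned from, unsigned to) {
    if (from == Generic)  // untracked pointers may decay to any of the others
        return to == Tracked || to == Derived || to == CalleeRooted;
    if (from == Tracked)  // standard decay routes for tracked pointers
        return to == Derived || to == CalleeRooted;
    return false;         // everything else, including Derived->CalleeRooted
}

// Debug-build helper: abort on a cast that the invariants forbid.
inline void checkCast(unsigned from, unsigned to) {
    assert(isAllowedCast(from, to));
}
```

Note that nothing may be cast back to address space 0, which is why `pointer_from_objref` needs special treatment.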
|
||
Now let us consider what constitutes a use: | ||
- Loads whose loaded value is in one of the address spaces | |
- Stores of a value in one of the address spaces to a location | |
Review comment: these should be up to two uses, right? (the value and the slot) |
Review comment: yes, will add a separate bullet point |
||
- Stores to a pointer in one of the address spaces | ||
- Calls for which a value in one of the address spaces is an operand | ||
- Calls in jlcall ABI, for which the argument array contains a value | ||
- Return instructions. | ||
|
||
We explicitly allow load/stores and simple calls in address spaces Tracked/Derived. Elements of jlcall | ||
argument arrays must always be in address space Tracked (it is required by the ABI that | ||
they are valid `jl_value_t*` pointers). The same is true for return instructions | ||
(though note that struct return arguments are allowed to have any of the address | ||
spaces). The only allowable use of an address space CalleeRooted pointer is to pass it to | |
Review comment: CalleeRooted |
||
a call (which must have an appropriately typed operand). | ||
|
||
Further, we disallow getelementptr in addrspace Tracked. This is because unless | ||
the operation is a noop, the resulting pointer will not be validly storable | ||
to a GC slot and may thus not be in this address space. If such a pointer | ||
is required, it should be decayed to addrspace Derived first. | ||
|
||
Lastly, we disallow inttoptr/ptrtoint instructions in these address spaces. | ||
Having these instructions would mean that some i64 values are really gc tracked. | ||
This is problematic, because it breaks the stated requirement that we're able | |
to identify gc-relevant pointers. This invariant is accomplished using the LLVM | ||
"non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the | ||
optimizer from making optimizations that would introduce these operations. Note | |
that we can still insert static constants at JIT time by using inttoptr in address | |
space 0 and then decaying to the appropriate address space afterwards. | ||
|
||
### Supporting ccall | ||
One important aspect missing from the discussion so far is the handling of | ||
`ccall`. `ccall` has the peculiar feature that the location and scope of a use | ||
do not coincide. As an example consider: | ||
``` | ||
A = randn(1024) | ||
ccall(:foo, Void, (Ptr{Float64},), A) | ||
``` | ||
In lowering, the compiler will insert a conversion from the array to the | ||
pointer which drops the reference to the array value. However, we of course | ||
need to make sure that the array does stay alive while we're doing the ccall. | ||
To understand how this is done, first recall the lowering of the above code: | ||
``` | ||
return $(Expr(:foreigncall, :(:foo), Void, svec(Ptr{Float64}), :($(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), :(A), 0))), :(A))) | ||
``` | ||
The last `:(A)` is an extra argument list inserted during lowering that informs | |
the code generator which Julia-level values need to be kept alive for the | |
Review comment: capitalize Julia since referring to the language, not the command line executable |
||
duration of this ccall. We then take this information and represent it in an | ||
"operand bundle" at the IR level. An operand bundle is essentially a fake use | ||
that is attached to the call site. At the IR level, this looks like so: | ||
``` | ||
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ] | ||
``` | ||
The GC root placement pass will treat the jl_roots operand bundle as if it were | ||
a regular operand. However, as a final step, after the gc roots are inserted, | ||
it will drop the operand bundle to avoid confusing instruction selection. | ||
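
The two-step treatment of the bundle can be sketched with a toy call site. This does not use the LLVM API; `CallSite`, `collectUses`, and `dropRootsBundle` are hypothetical names, and values are represented by plain integers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical call site: regular operands plus the values listed in the
// "jl_roots" operand bundle.
struct CallSite {
    std::vector<int> operands;
    std::vector<int> jlRoots;
};

// During root placement, bundle entries count as uses just like operands.
std::vector<int> collectUses(const CallSite &cs) {
    std::vector<int> uses = cs.operands;
    uses.insert(uses.end(), cs.jlRoots.begin(), cs.jlRoots.end());
    return uses;
}

// Final step: once the roots are in place, the bundle is removed so that
// instruction selection only sees the real operands.
void dropRootsBundle(CallSite &cs) {
    cs.jlRoots.clear();
    assert(cs.jlRoots.empty());
}
```

In the `ccall` example above, the array `A` would appear only in the bundle, while the raw data pointer is the sole regular operand.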
|
||
### Supporting pointer_from_objref | ||
`pointer_from_objref` is special because it requires the user to take explicit | ||
control of GC rooting. By our above invariants, this function is illegal, | ||
because it performs an address space cast from 10 to 0. However, it can be useful | |
in certain situations, so we provide a special intrinsic: | |
``` | ||
declare %jl_value_t *@julia.pointer_from_objref(%jl_value_t addrspace(10)*) | |
``` | ||
which is lowered to the corresponding address space cast after gc root lowering. | ||
Do note however that by using this intrinsic, the caller assumes all responsibility | ||
for making sure that the value in question is rooted. Further, this intrinsic is | |
not considered a use, so the GC root placement pass will not provide a GC root | |
for the function. As a result, the external rooting must be arranged while the | |
value is still tracked by the system. I.e. it is not valid to attempt to use the | |
Review comment: to attempt to use |
||
result of this operation to establish a global root - the optimizer may have | ||
already dropped the value. | ||
Review comment: Would it make any sense for |
Review comment: No, the whole point of that function is to escape the GC. |
Review comment: This escape semantic would also be required so that we can stack allocate |
Review comment: there is the