
Commit 42e1f63

Merge pull request #21888 from JuliaLang/kf/gcroots
The GC root placement pass 1.0 deserves
2 parents b9316b2 + b0a162c commit 42e1f63

17 files changed: +2647 -1455 lines changed

doc/src/devdocs/llvm.md

Lines changed: 181 additions & 0 deletions
@@ -107,3 +107,184 @@ study it and the pass of interest in isolation.
4. Strip the debug metadata and fix up the TBAA metadata by hand.

The last step is labor intensive. Suggestions on a better way would be appreciated.
## The jlcall calling convention

Julia has a generic calling convention for unoptimized code, which looks somewhat
as follows:
```
jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int);
```
where the first argument is the boxed function object, the second argument is
an on-stack array of arguments, and the third is the number of arguments. Now,
we could perform a straightforward lowering and emit an alloca for the argument
array. However, this would betray the SSA nature of the uses at the callsite,
making optimizations (including GC root placement) significantly harder.
Instead, we emit it as follows:
```
%bitcast = bitcast @any_unoptimized_call to %jl_value_t *(*)(%jl_value_t *, %jl_value_t *)
call cc 37 %jl_value_t *%bitcast(%jl_value_t *%arg1, %jl_value_t *%arg2)
```
The special `cc 37` annotation marks the fact that this call site is really using
the jlcall calling convention. This allows us to retain the SSA-ness of the
uses throughout the optimizer. GC root placement will later lower this call to
the original C ABI. In the code the calling convention number is represented by
the `JLCALL_F_CC` constant. In addition, there is the `JLCALL_CC` calling
convention, which functions similarly but omits the first argument.
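As a rough, hand-written sketch (not the pass's actual output; the slot handling and
value names are illustrative), the lowering back to the C ABI for the call above might
look like:
```
; illustrative only: materialize the on-stack argument array and call
; through the original C signature (%arg1 is the boxed function object)
%args = alloca %jl_value_t*, i32 1
store %jl_value_t* %arg2, %jl_value_t** %args
%ret  = call %jl_value_t* @any_unoptimized_call(%jl_value_t* %arg1, %jl_value_t** %args, i32 1)
```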
## GC root placement

GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root
placement this late enables LLVM to make more aggressive optimizations around
code that requires GC roots, as well as allowing us to reduce the number of
required GC roots and GC root store operations (since LLVM doesn't understand
our GC, it wouldn't otherwise know what it is and is not allowed to do with
values stored to the GC frame, so it'll conservatively do very little). As an
example, consider an error path
```
if some_condition()
    #= Use some variables maybe =#
    error("An error occurred")
end
```
During constant folding, LLVM may discover that the condition is always false,
and can remove the basic block. However, if GC root lowering is done early,
the GC root slots used in the deleted block, as well as any values kept alive
in those slots only because they were used in the error path, would be kept
alive by LLVM. By doing GC root lowering late, we give LLVM the license to do
any of its usual optimizations (constant folding, dead code elimination, etc.),
without having to worry (too much) about which values may or may not be gc
tracked.

However, in order to be able to do late GC root placement, we need to be able to
identify a) which pointers are gc tracked and b) all uses of such pointers. The
goal of the GC placement pass is thus simple:

Minimize the number of needed gc roots/stores to them subject to the constraint
that at every safepoint, any live gc-tracked pointer (i.e. for which there is
a path after this point that contains a use of this pointer) is in some gc slot.
### Representation

The primary difficulty is thus choosing an IR representation that allows us to
identify gc-tracked pointers and their uses, even after the program has been
run through the optimizer. Our design makes use of three LLVM features to achieve
this:
- Custom address spaces
- Operand bundles
- Non-integral pointers

Custom address spaces allow us to tag every pointer with an integer that needs
to be preserved through optimizations. The compiler may not insert casts between
address spaces that did not exist in the original program and it must never
change the address space of a pointer on a load/store/etc operation. This allows
us to annotate which pointers are gc-tracked in an optimizer-resistant way. Note
that metadata would not be able to achieve the same purpose. Metadata is supposed
to always be discardable without altering the semantics of the program. However,
failing to identify a gc-tracked pointer alters the resulting program behavior
dramatically - it'll probably crash or return wrong results. We currently use
three different address spaces (their numbers are defined in src/codegen_shared.cpp);
an illustrative IR sketch follows the list:

- GC Tracked Pointers (currently 10): These are pointers to boxed values that may be put
  into a GC frame. It is loosely equivalent to a `jl_value_t*` pointer on the C
  side. N.B. It is illegal to ever have a pointer in this address space that may
  not be stored to a GC slot.
- Derived Pointers (currently 11): These are pointers that are derived from some GC
  tracked pointer. Uses of these pointers generate uses of the original pointer.
  However, they need not themselves be known to the GC. The GC root placement
  pass MUST always find the GC tracked pointer from which this pointer is
  derived and use that as the pointer to root.
- Callee Rooted Pointers (currently 12): This is a utility address space to express the
  notion of a callee rooted value. All values of this address space MUST be
  storable to a GC root (though it is possible to relax this condition in the
  future), but unlike the other pointers need not be rooted if passed to a
  call (they do still need to be rooted if they are live across another safepoint
  between the definition and the call).
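As a hand-written illustration (not actual codegen output; `@allocate_value` and
`@callee` are hypothetical), a value in each of the three address spaces looks like:
```
; %v is gc-tracked, %d is a derived view of it, %c is passed callee-rooted
%v = call %jl_value_t addrspace(10)* @allocate_value()
%d = addrspacecast %jl_value_t addrspace(10)* %v to %jl_value_t addrspace(11)*
%c = addrspacecast %jl_value_t addrspace(10)* %v to %jl_value_t addrspace(12)*
call void @callee(%jl_value_t addrspace(12)* %c)
```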
### Invariants
The GC root placement pass makes use of several invariants, which need
to be observed by the frontend and are preserved by the optimizer.

First, only the following address space casts are allowed:
- 0->{Tracked,Derived,CalleeRooted}: It is allowable to decay an untracked pointer to any of the
  others. However, do note that the optimizer has broad license to not root
  such a value. It is never safe to have a value in address space 0 in any part
  of the program if it is (or is derived from) a value that requires a GC root.
- Tracked->Derived: This is the standard decay route for interior values. The placement
  pass will look for these to identify the base pointer for any use.
- Tracked->CalleeRooted: Addrspace CalleeRooted serves merely as a hint that a GC root is not
  required. However, do note that the Derived->CalleeRooted decay is prohibited, since
  pointers should generally be storable to a GC slot, even in this address space.
Now let us consider what constitutes a use (a short IR illustration follows below):
- Loads whose loaded value is in one of the address spaces
- Stores of a value in one of the address spaces to a location
- Stores to a pointer in one of the address spaces
- Calls for which a value in one of the address spaces is an operand
- Calls in jlcall ABI, for which the argument array contains a value in one of the address spaces
- Return instructions.

We explicitly allow load/stores and simple calls in address spaces Tracked/Derived. Elements of jlcall
argument arrays must always be in address space Tracked (it is required by the ABI that
they are valid `jl_value_t*` pointers). The same is true for return instructions
(though note that struct return arguments are allowed to have any of the address
spaces). The only allowable use of an address space CalleeRooted pointer is to pass it to
a call (which must have an appropriately typed operand).
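For illustration (hand-written; `@g`, `%fieldptr`, `%slot`, and `%v` are hypothetical),
each of the following instructions counts as a use of a gc-tracked value:
```
%val = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)* addrspace(11)* %fieldptr ; loaded value is tracked
store %jl_value_t addrspace(10)* %v, %jl_value_t addrspace(10)* addrspace(11)* %slot        ; store of a tracked value
call void @g(%jl_value_t addrspace(10)* %v)                                                 ; tracked call operand
ret %jl_value_t addrspace(10)* %v                                                           ; return of a tracked value
```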
Further, we disallow getelementptr in addrspace Tracked. This is because unless
the operation is a noop, the resulting pointer will not be validly storable
to a GC slot and may thus not be in this address space. If such a pointer
is required, it should be decayed to addrspace Derived first.
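A hand-written sketch of taking an interior pointer the allowed way (the byte
offset is arbitrary and only for illustration):
```
; decay to the Derived address space first, then compute the interior pointer there
%d   = addrspacecast %jl_value_t addrspace(10)* %v to %jl_value_t addrspace(11)*
%d8  = bitcast %jl_value_t addrspace(11)* %d to i8 addrspace(11)*
%fld = getelementptr i8, i8 addrspace(11)* %d8, i64 8
```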
Lastly, we disallow inttoptr/ptrtoint instructions in these address spaces.
Having these instructions would mean that some i64 values are really gc tracked.
This is problematic, because it breaks the stated requirement that we're able
to identify gc-relevant pointers. This invariant is accomplished using the LLVM
"non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the
optimizer from making optimizations that would introduce these operations. Note
we can still insert static constants at JIT time by using inttoptr in address
space 0 and then decaying to the appropriate address space afterwards.
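Schematically (with `...` standing in for the concrete address, in the same
placeholder style as the ccall example below), that looks like:
```
; the constant enters the program in address space 0 and is then decayed
%p0 = inttoptr i64 ... to %jl_value_t*
%p  = addrspacecast %jl_value_t* %p0 to %jl_value_t addrspace(10)*
```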
### Supporting ccall
One important aspect missing from the discussion so far is the handling of
`ccall`. `ccall` has the peculiar feature that the location and scope of a use
do not coincide. As an example consider:
```
A = randn(1024)
ccall(:foo, Void, (Ptr{Float64},), A)
```
In lowering, the compiler will insert a conversion from the array to the
pointer which drops the reference to the array value. However, we of course
need to make sure that the array does stay alive while we're doing the ccall.
To understand how this is done, first recall the lowering of the above code:
```
return $(Expr(:foreigncall, :(:foo), Void, svec(Ptr{Float64}), :($(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), :(A), 0))), :(A)))
```
The last `:(A)` is an extra argument list inserted during lowering that informs
the code generator which Julia-level values need to be kept alive for the
duration of this ccall. We then take this information and represent it in an
"operand bundle" at the IR level. An operand bundle is essentially a fake use
that is attached to the call site. At the IR level, this looks like so:
```
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
```
The GC root placement pass will treat the `jl_roots` operand bundle as if it were
a regular operand. However, as a final step, after the gc roots are inserted,
it will drop the operand bundle to avoid confusing instruction selection.
### Supporting pointer_from_objref
`pointer_from_objref` is special because it requires the user to take explicit
control of GC rooting. By our above invariants, this function is illegal,
because it performs an address space cast from 10 to 0. However, it can be useful,
in certain situations, so we provide a special intrinsic:
```
declare %jl_value_t *julia.pointer_from_objref(%jl_value_t addrspace(10)*)
```
which is lowered to the corresponding address space cast after gc root lowering.
Do note however that by using this intrinsic, the caller assumes all responsibility
for making sure that the value in question is rooted. Further, this intrinsic is
not considered a use, so the GC root placement pass will not provide a GC root
for the function. As a result, the external rooting must be arranged while the
value is still tracked by the system. I.e. it is not valid to attempt to use the
result of this operation to establish a global root - the optimizer may have
already dropped the value.
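A hand-written sketch of one correct usage pattern (`@uses_raw_pointer` and
`@keeps_value_alive` are hypothetical): some other tracked use must keep the value
alive across the region in which the raw pointer is used.
```
; the intrinsic itself is not a use, so %v must be kept alive by something else
%raw = call %jl_value_t* @julia.pointer_from_objref(%jl_value_t addrspace(10)* %v)
call void @uses_raw_pointer(%jl_value_t* %raw)
call void @keeps_value_alive(%jl_value_t addrspace(10)* %v) ; later tracked use keeps %v rooted across the call above
```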

src/Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ endif
5353
LLVMLINK :=
5454

5555
ifeq ($(JULIACODEGEN),LLVM)
56-
SRCS += codegen jitlayers disasm debuginfo llvm-simdloop llvm-ptls llvm-gcroot llvm-lower-handlers cgmemmgr
56+
SRCS += codegen jitlayers disasm debuginfo llvm-simdloop llvm-ptls llvm-late-gc-lowering llvm-lower-handlers llvm-gc-invariant-verifier llvm-propagate-addrspaces cgmemmgr
5757
FLAGS += -I$(shell $(LLVM_CONFIG_HOST) --includedir)
5858
LLVM_LIBS := all
5959
ifeq ($(USE_POLLY),1)

src/ccall.cpp

Lines changed: 43 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -459,7 +459,9 @@ static Value *llvm_type_rewrite(
459459
// sizes.
460460
Value *from;
461461
Value *to;
462-
#if JL_LLVM_VERSION >= 30600
462+
#if JL_LLVM_VERSION >= 40000
463+
const DataLayout &DL = jl_data_layout;
464+
#elif JL_LLVM_VERSION >= 30600
463465
const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
464466
#else
465467
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
@@ -485,8 +487,8 @@ static Value *runtime_apply_type(jl_value_t *ty, jl_unionall_t *unionall, jl_cod
485487
args[0] = literal_pointer_val(ty);
486488
args[1] = literal_pointer_val((jl_value_t*)ctx->linfo->def.method->sig);
487489
args[2] = builder.CreateInBoundsGEP(
488-
LLVM37_param(T_pjlvalue)
489-
emit_bitcast(ctx->spvals_ptr, T_ppjlvalue),
490+
LLVM37_param(T_prjlvalue)
491+
emit_bitcast(decay_derived(ctx->spvals_ptr), T_pprjlvalue),
490492
ConstantInt::get(T_size, sizeof(jl_svec_t) / sizeof(jl_value_t*)));
491493
return builder.CreateCall(prepare_call(jlapplytype_func), makeArrayRef(args));
492494
}
@@ -639,7 +641,7 @@ static Value *julia_to_native(Type *to, bool toboxed, jl_value_t *jlto, jl_union
639641
// We're passing Any
640642
if (toboxed) {
641643
assert(!byRef); // don't expect any ABI to pass pointers by pointer
642-
return boxed(jvinfo, ctx);
644+
return maybe_decay_untracked(boxed(jvinfo, ctx));
643645
}
644646
assert(jl_is_datatype(jlto) && julia_struct_has_layout((jl_datatype_t*)jlto, jlto_env));
645647

@@ -1208,7 +1210,9 @@ static jl_cgval_t mark_or_box_ccall_result(Value *result, bool isboxed, jl_value
         Value *runtime_dt = runtime_apply_type(rt, unionall, ctx);
         // TODO: is this leaf check actually necessary, or is it structurally guaranteed?
         emit_leafcheck(runtime_dt, "ccall: return type must be a leaf DataType", ctx);
-#if JL_LLVM_VERSION >= 30600
+#if JL_LLVM_VERSION >= 40000
+        const DataLayout &DL = jl_data_layout;
+#elif JL_LLVM_VERSION >= 30600
         const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
 #else
         const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
@@ -1306,7 +1310,7 @@ std::string generate_func_sig()
 #else
         paramattrs.push_back(AttributeSet::get(jl_LLVMContext, 1, retattrs));
 #endif
-        fargt_sig.push_back(PointerType::get(lrt, 0));
+        fargt_sig.push_back(PointerType::get(lrt, AddressSpace::Derived));
         sret = 1;
         prt = lrt;
     }
@@ -1349,6 +1353,8 @@ std::string generate_func_sig()
         }

         t = julia_struct_to_llvm(tti, unionall_env, &isboxed);
+        if (isboxed)
+            t = T_prjlvalue;
         if (t == NULL || t == T_void) {
             std::stringstream msg;
             msg << "ccall: the type of argument ";
@@ -1369,7 +1375,7 @@ std::string generate_func_sig()
             pat = t;
         }
         else if (byRef) {
-            pat = PointerType::get(t, 0);
+            pat = PointerType::get(t, AddressSpace::Derived);
         }
         else {
             pat = abi->preferred_llvm_type((jl_datatype_t*)tti, false);
@@ -1459,6 +1465,8 @@ static const std::string verify_ccall_sig(size_t nargs, jl_value_t *&rt, jl_valu
     lrt = julia_struct_to_llvm(rt, unionall_env, &retboxed);
     if (lrt == NULL)
         return "ccall: return type doesn't correspond to a C type";
+    else if (retboxed)
+        lrt = T_prjlvalue;

     // is return type fully statically known?
     if (unionall_env == NULL) {
@@ -1652,8 +1660,16 @@ static jl_cgval_t emit_ccall(jl_value_t **args, size_t nargs, jl_codectx_t *ctx)
            ary = emit_unbox(largty, emit_expr(argi, ctx), tti);
        }
        JL_GC_POP();
-        return mark_or_box_ccall_result(emit_bitcast(ary, lrt),
-                                        retboxed, rt, unionall, static_rt, ctx);
+        if (!retboxed) {
+            return mark_or_box_ccall_result(
+                emit_bitcast(emit_pointer_from_objref(
+                    emit_bitcast(ary, T_prjlvalue)), lrt),
+                retboxed, rt, unionall, static_rt, ctx);
+        } else {
+            return mark_or_box_ccall_result(maybe_decay_untracked(
+                emit_bitcast(ary, lrt)),
+                retboxed, rt, unionall, static_rt, ctx);
+        }
     }
     else if (is_libjulia_func(jl_cpu_pause)) {
         // Keep in sync with the julia_threads.h version
@@ -1977,6 +1993,7 @@ jl_cgval_t function_sig_t::emit_a_ccall(
                                   ai + 1, ctx, &needStackRestore);
             bool issigned = jl_signed_type && jl_subtype(jargty, (jl_value_t*)jl_signed_type);
             if (byRef) {
+                v = decay_derived(v);
                 // julia_to_native should already have done the alloca and store
                 assert(v->getType() == pargty);
             }
@@ -1992,6 +2009,13 @@ jl_cgval_t function_sig_t::emit_a_ccall(
             }
             v = julia_to_address(largty, jargty_in_env, unionall_env, arg,
                                  ai + 1, ctx, &needStackRestore);
+            if (isa<UndefValue>(v)) {
+                JL_GC_POP();
+                return jl_cgval_t();
+            }
+            // A bit of a hack, but we're trying to get rid of this feature
+            // anyway.
+            v = emit_bitcast(emit_pointer_from_objref(v), pargty);
             assert((!toboxed && !byRef) || isa<UndefValue>(v));
         }

@@ -2019,7 +2043,7 @@ jl_cgval_t function_sig_t::emit_a_ccall(
                              literal_pointer_val((jl_value_t*)rt));
             sretboxed = true;
         }
-        argvals[0] = emit_bitcast(result, fargt_sig.at(0));
+        argvals[0] = emit_bitcast(decay_derived(result), fargt_sig.at(0));
     }

     Instruction *stacksave = NULL;
@@ -2107,9 +2131,11 @@ jl_cgval_t function_sig_t::emit_a_ccall(
     // Mark GC use before **and** after the ccall to make sure the arguments
     // are alive during the ccall even if the function called is `noreturn`.
     mark_gc_uses(gc_uses);
+    OperandBundleDef OpBundle("jl_roots", gc_uses);
     // the actual call
     Value *ret = builder.CreateCall(prepare_call(llvmf),
-                                    ArrayRef<Value*>(&argvals[0], nargs + sret));
+                                    ArrayRef<Value*>(&argvals[0], nargs + sret),
+                                    ArrayRef<OperandBundleDef>(&OpBundle, gc_uses.empty() ? 0 : 1));
     ((CallInst*)ret)->setAttributes(attributes);

     if (cc != CallingConv::C)
@@ -2151,6 +2177,9 @@ jl_cgval_t function_sig_t::emit_a_ccall(
         }
         else {
             Type *jlrt = julia_type_to_llvm(rt, &jlretboxed); // compute the real "julian" return type and compute whether it is boxed
+            if (jlretboxed) {
+                jlrt = T_prjlvalue;
+            }
             if (type_is_ghost(jlrt)) {
                 return ghostValue(rt);
             }
@@ -2166,7 +2195,9 @@ jl_cgval_t function_sig_t::emit_a_ccall(
             Value *strct = emit_allocobj(ctx, rtsz, runtime_bt);
             int boxalign = jl_gc_alignment(rtsz);
 #ifndef JL_NDEBUG
-#if JL_LLVM_VERSION >= 30600
+#if JL_LLVM_VERSION >= 40000
+            const DataLayout &DL = jl_data_layout;
+#elif JL_LLVM_VERSION >= 30600
             const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
 #else
             const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
