@@ -107,3 +107,184 @@ study it and the pass of interest in isolation.
107
107
4. Strip the debug metadata and fix up the TBAA metadata by hand.
108
108
109
109
The last step is labor intensive. Suggestions on a better way would be appreciated.
110
+
111
+ ## The jlcall calling convention
112
+
113
+ Julia has a generic calling convention for unoptimized code, which looks somewhat
114
+ as follows:
115
+ ```
116
+ jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int);
117
+ ```
118
+ where the first argument is the boxed function object, the second argument is
119
+ an on-stack array of arguments and the third is the number of arguments. Now,
120
+ we could perform a straightforward lowering and emit an alloca for the argument
121
+ array. However, this would betray the SSA nature of the uses at the callsite,
122
+ making optimizations (including GC root placement) significantly harder.
123
+ Instead, we emit it as follows:
124
+ ```
125
+ %bitcast = bitcast @any_unoptimized_call to %jl_value_t *(*)(%jl_value_t *, %jl_value_t *)
126
+ call cc 37 %jl_value_t *%bitcast(%jl_value_t *%arg1, %jl_value_t *%arg2)
127
+ ```
128
+ The special `cc 37` annotation marks the fact that this call site is really using the
129
+ jlcall calling convention. This allows us to retain the SSA-ness of the
130
+ uses throughout the optimizer. GC root placement will later lower this call to
131
+ the original C ABI. In the code the calling convention number is represented by
132
+ the `JLCALL_F_CC` constant. In addition, there is the `JLCALL_CC` calling
133
+ convention which functions similarly, but omits the first argument.
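As a rough illustration of what the call site above amounts to once it has been lowered back to the C ABI, the following sketch spells out the on-stack argument array explicitly. The `jl_value_t` stub, the body of `any_unoptimized_call`, and the helper name `call_lowered` are assumptions made for this example; they are not taken from Julia's sources.

```cpp
#include <cassert>

// Stand-in for Julia's boxed value type (assumption for this sketch).
struct jl_value_t { int tag; };

// Hypothetical callee with the generic signature described in the text:
// (boxed function object, on-stack argument array, number of arguments).
// The body is illustrative only: it returns the last argument, or the
// function object itself when there are no arguments.
jl_value_t *any_unoptimized_call(jl_value_t *f, jl_value_t **args, int nargs) {
    return nargs > 0 ? args[nargs - 1] : f;
}

// Conceptual C-ABI form of `call cc 37 %bitcast(%arg1, %arg2)`:
// %arg1 is the function object, %arg2 is the single real argument.
jl_value_t *call_lowered(jl_value_t *arg1, jl_value_t *arg2) {
    jl_value_t *argv[1] = {arg2};  // the on-stack array (an alloca in IR)
    return any_unoptimized_call(arg1, argv, 1);
}
```

In the `cc 37` form the arguments stay as ordinary SSA operands; the array only appears in a form like `call_lowered`, which is why that materialization is deferred until after optimization.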
134
+
135
+ ## GC root placement
136
+
137
+ GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root
138
+ placement this late enables LLVM to make more aggressive optimizations around
139
+ code that requires GC roots, as well as allowing us to reduce the number of
140
+ required GC roots and GC root store operations (since LLVM doesn't understand
141
+ our GC, it wouldn't otherwise know what it is and is not allowed to do with
142
+ values stored to the GC frame, so it'll conservatively do very little). As an
143
+ example, consider an error path
144
+ ```
145
+ if some_condition()
146
+ #= Use some variables maybe =#
147
+ error("An error occurred")
148
+ end
149
+ ```
150
+ During constant folding, LLVM may discover that the condition is always false,
151
+ and can remove the basic block. However, if GC root lowering is done early,
152
+ the GC root slots used in the deleted block, as well as any values kept alive
153
+ in those slots only because they were used in the error path, would be kept
154
+ alive by LLVM. By doing GC root lowering late, we give LLVM the license to do
155
+ any of its usual optimizations (constant folding, dead code elimination, etc.),
156
+ without having to worry (too much) about which values may or may not be gc
157
+ tracked.
158
+
159
+ However, in order to be able to do late GC root placement, we need to be able to
160
+ identify a) which pointers are gc tracked and b) all uses of such pointers. The
161
+ goal of the GC placement pass is thus simple:
162
+
163
+ Minimize the number of needed gc roots/stores to them subject to the constraint
164
+ that at every safepoint, any live gc-tracked pointer (i.e. for which there is
165
+ a path after this point that contains a use of this pointer) is in some gc slot.
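To make the constraint concrete, here is a minimal sketch that computes, for straight-line code only, which values are live at each safepoint and would therefore have to sit in some GC slot. The `Inst` representation and the function name `live_at_safepoints` are hypothetical; the actual pass works on LLVM IR and must also handle control flow, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy instruction stream: a value is defined, used, or we hit a safepoint.
enum class Kind { Def, Use, Safepoint };
struct Inst {
    Kind kind;
    int value;  // value id; ignored for safepoints
};

// For each safepoint (in program order), return the set of values that are
// defined before it and used after it, i.e. the values that must be rooted.
std::vector<std::set<int>> live_at_safepoints(const std::vector<Inst> &code) {
    std::vector<std::set<int>> result;
    std::set<int> live;
    // Standard backward liveness walk over a single basic block.
    for (auto it = code.rbegin(); it != code.rend(); ++it) {
        switch (it->kind) {
        case Kind::Use:       live.insert(it->value); break;
        case Kind::Def:       live.erase(it->value);  break;
        case Kind::Safepoint: result.push_back(live); break;
        }
    }
    std::reverse(result.begin(), result.end());  // restore program order
    return result;
}
```

A value that is never used after a safepoint does not appear in that safepoint's set, which is the sense in which late placement can reduce the number of roots.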
166
+
167
+ ### Representation
168
+
169
+ The primary difficulty is thus choosing an IR representation that allows us to
170
+ identify gc-tracked pointers and their uses, even after the program has been
171
+ run through the optimizer. Our design makes use of three LLVM features to achieve
172
+ this:
173
+ - Custom address spaces
174
+ - Operand Bundles
175
+ - Non-integral pointers
176
+
177
+ Custom address spaces allow us to tag every pointer with an integer that needs
178
+ to be preserved through optimizations. The compiler may not insert casts between
179
+ address spaces that did not exist in the original program and it must never
180
+ change the address space of a pointer on a load/store/etc operation. This allows
181
+ us to annotate which pointers are gc-tracked in an optimizer-resistant way. Note
182
+ that metadata would not be able to achieve the same purpose. Metadata is supposed
183
+ to always be discardable without altering the semantics of the program. However,
184
+ failing to identify a gc-tracked pointer alters the resulting program behavior
185
+ dramatically - it'll probably crash or return wrong results. We currently use
186
+ three different address spaces (their numbers are defined in src/codegen_shared.cpp):
187
+
188
+ - GC Tracked Pointers (currently 10): These are pointers to boxed values that may be put
189
+ into a GC frame. It is loosely equivalent to a `jl_value_t*` pointer on the C
190
+ side. N.B. It is illegal to ever have a pointer in this address space that may
191
+ not be stored to a GC slot.
192
+ - Derived Pointers (currently 11): These are pointers that are derived from some GC
193
+ tracked pointer. Uses of these pointers generate uses of the original pointer.
194
+ However, they need not themselves be known to the GC. The GC root placement
195
+ pass MUST always find the GC tracked pointer from which this pointer is
196
+ derived and use that as the pointer to root.
197
+ - Callee Rooted Pointers (currently 12): This is a utility address space to express the
198
+ notion of a callee rooted value. All values of this address space MUST be
199
+ storable to a GC root (though it is possible to relax this condition in the
200
+ future), but unlike the other pointers need not be rooted if passed to a
201
+ call (they do still need to be rooted if they are live across another safepoint
202
+ between the definition and the call).
203
+
204
+ ### Invariants
205
+ The GC root placement pass makes use of several invariants, which need
206
+ to be observed by the frontend and are preserved by the optimizer.
207
+
208
+ First, only the following address space casts are allowed:
209
+ - 0->{Tracked,Derived,CalleeRooted}: It is allowable to decay an untracked pointer to any of the
210
+ others. However, do note that the optimizer has broad license to not root
211
+ such a value. It is never safe to have a value in addressspace 0 in any part
212
+ of the program if it is (or is derived from) a value that requires a GC root.
213
+ - Tracked->Derived: This is the standard decay route for interior values. The placement
214
+ pass will look for these to identify the base pointer for any use.
215
+ - Tracked->CalleeRooted: Addrspace CalleeRooted serves merely as a hint that a GC root is not
216
+ required. However, do note that the Derived->CalleeRooted decay is prohibited, since
217
+ pointers should generally be storable to a GC slot, even in this address space.
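The cast rules above can be written down as a small predicate. This is only a sketch: the helper name `is_allowed_addrspacecast` and the `AddressSpace` enum are hypothetical, and the numeric values are the "currently" numbers quoted earlier in this section, not constants read from Julia's sources.

```cpp
#include <cassert>

// Address space numbers as quoted in the text (10, 11, 12); 0 is the
// default, untracked address space.
enum AddressSpace : unsigned {
    Generic = 0,
    Tracked = 10,
    Derived = 11,
    CalleeRooted = 12,
};

// Returns true if a cast from address space `from` to `to` is one of the
// three permitted forms listed above. Everything else, including
// Derived->CalleeRooted and any cast back to address space 0, is rejected.
bool is_allowed_addrspacecast(unsigned from, unsigned to) {
    const bool to_gc_space = to == Tracked || to == Derived || to == CalleeRooted;
    if (from == Generic)
        return to_gc_space;                         // 0->{Tracked,Derived,CalleeRooted}
    if (from == Tracked)
        return to == Derived || to == CalleeRooted; // Tracked->Derived, Tracked->CalleeRooted
    return false;
}
```

A check along these lines could serve as a frontend assertion; casts within a single address space are rejected here simply because they are not address space casts at all.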
218
+
219
+ Now let us consider what constitutes a use:
220
+ - Loads whose loaded value is in one of the address spaces
221
+ - Stores of a value in one of the address spaces to a location
222
+ - Stores to a pointer in one of the address spaces
223
+ - Calls for which a value in one of the address spaces is an operand
224
+ - Calls in jlcall ABI, for which the argument array contains a value
225
+ - Return instructions.
226
+
227
+ We explicitly allow load/stores and simple calls in address spaces Tracked/Derived. Elements of jlcall
228
+ argument arrays must always be in address space Tracked (it is required by the ABI that
229
+ they are valid `jl_value_t*` pointers). The same is true for return instructions
230
+ (though note that struct return arguments are allowed to have any of the address
231
+ spaces). The only allowable use of an address space CalleeRooted pointer is to pass it to
232
+ a call (which must have an appropriately typed operand).
233
+
234
+ Further, we disallow getelementptr in addrspace Tracked. This is because unless
235
+ the operation is a noop, the resulting pointer will not be validly storable
236
+ to a GC slot and may thus not be in this address space. If such a pointer
237
+ is required, it should be decayed to addrspace Derived first.
238
+
239
+ Lastly, we disallow inttoptr/ptrtoint instructions in these address spaces.
240
+ Having these instructions would mean that some i64 values are really gc tracked.
241
+ This is problematic, because it breaks the stated requirement that we're able
242
+ to identify gc-relevant pointers. This invariant is accomplished using the LLVM
243
+ "non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the
244
+ optimizer from making optimizations that would introduce these operations. Note that
245
+ we can still insert static constants at JIT time by using inttoptr in address
246
+ space 0 and then decaying to the appropriate address space afterwards.
247
+
248
+ ### Supporting ccall
249
+ One important aspect missing from the discussion so far is the handling of
250
+ `ccall`. `ccall` has the peculiar feature that the location and scope of a use
251
+ do not coincide. As an example consider:
252
+ ```
253
+ A = randn(1024)
254
+ ccall(:foo, Void, (Ptr{Float64},), A)
255
+ ```
256
+ In lowering, the compiler will insert a conversion from the array to the
257
+ pointer which drops the reference to the array value. However, we of course
258
+ need to make sure that the array does stay alive while we're doing the ccall.
259
+ To understand how this is done, first recall the lowering of the above code:
260
+ ```
261
+ return $(Expr(:foreigncall, :(:foo), Void, svec(Ptr{Float64}), :($(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), :(A), 0))), :(A)))
262
+ ```
263
+ The last `:(A)` is an extra argument list inserted during lowering that informs
264
+ the code generator which Julia-level values need to be kept alive for the
265
+ duration of this ccall. We then take this information and represent it in an
266
+ "operand bundle" at the IR level. An operand bundle is essentially a fake use
267
+ that is attached to the call site. At the IR level, this looks like so:
268
+ ```
269
+ call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
270
+ ```
271
+ The GC root placement pass will treat the jl_roots operand bundle as if it were
272
+ a regular operand. However, as a final step, after the gc roots are inserted,
273
+ it will drop the operand bundle to avoid confusing instruction selection.
274
+
275
+ ### Supporting pointer_from_objref
276
+ `pointer_from_objref` is special because it requires the user to take explicit
277
+ control of GC rooting. By our above invariants, this function is illegal,
278
+ because it performs an address space cast from 10 to 0. However, it can be useful
279
+ in certain situations, so we provide a special intrinsic:
280
+ ```
281
+ declare %jl_value_t *@julia.pointer_from_objref(%jl_value_t addrspace(10)*)
282
+ ```
283
+ which is lowered to the corresponding address space cast after gc root lowering.
284
+ Do note however that by using this intrinsic, the caller assumes all responsibility
285
+ for making sure that the value in question is rooted. Further this intrinsic is
286
+ not considered a use, so the GC root placement pass will not provide a GC root
287
+ for the function. As a result, the external rooting must be arranged while the
288
+ value is still tracked by the system. I.e. it is not valid to attempt to use the
289
+ result of this operation to establish a global root - the optimizer may have
290
+ already dropped the value.