@@ -50,8 +50,8 @@ development version of LLVM.

## Passing options to LLVM

- You can pass options to LLVM using *debug* builds of Julia. To create a debug build, run `make debug`.
- The resulting executable is `usr/bin/julia-debug`. You can pass LLVM options to this executable
+ You can pass options to LLVM using *debug* builds of Julia. To create a debug build, run `make debug`.
+ The resulting executable is `usr/bin/julia-debug`. You can pass LLVM options to this executable
via the environment variable `JULIA_LLVM_ARGS`. Here are example settings using `bash` syntax:

* `export JULIA_LLVM_ARGS=-print-after-all` dumps IR after each pass.
@@ -63,29 +63,29 @@ via the environment variable `JULIA_LLVM_ARGS`. Here are example settings using
## Debugging LLVM transformations in isolation

On occasion, it can be useful to debug LLVM's transformations in isolation from
- the rest of the julia system, e.g. because reproducing the issue inside julia
+ the rest of the Julia system, e.g. because reproducing the issue inside Julia
would take too long, or because one wants to take advantage of LLVM's tooling
(e.g. bugpoint). To get unoptimized IR for the entire system image, pass the
`--output-unopt-bc unopt.bc` option to the system image build process, which will
output the unoptimized IR to an `unopt.bc` file. This file can then be passed to
LLVM tools as usual. `libjulia` can function as an LLVM pass plugin and can be
loaded into LLVM tools to make Julia-specific passes available in this
environment. In addition, it exposes the `-julia` meta-pass, which runs the
- entire julia pass-pipeline over the IR. As an example, to generate a system
+ entire Julia pass-pipeline over the IR. As an example, to generate a system
image, one could do:
```
opt -load libjulia.so -julia -o opt.bc unopt.bc
llc -o sys.o opt.bc
cc -shared -o sys.so sys.o
```
- This system image can then be loaded by julia as usual.
+ This system image can then be loaded by Julia as usual.

- It is also possible to dump an LLVM IR module for just one julia function,
+ It is also possible to dump an LLVM IR module for just one Julia function,
using:
- ```
+ ```julia
f, T = +, Tuple{Int,Int} # Substitute your function of interest here
optimize = false
- open("plus.ll","w") do f
+ open("plus.ll", "w") do io  # name the stream `io` so it does not shadow the function `f`
    println(io, Base._dump_function(f, T, false, false, false, true, :att, optimize))
end
```
@@ -112,36 +112,36 @@ The last step is labor intensive. Suggestions on a better way would be apprecia

Julia has a generic calling convention for unoptimized code, which looks somewhat
as follows:
- ```
+ ```c
jl_value_t *any_unoptimized_call(jl_value_t *, jl_value_t **, int);
```
where the first argument is the boxed function object, the second argument is
an on-stack array of arguments and the third is the number of arguments. Now,
we could perform a straightforward lowering and emit an alloca for the argument
- array. However, this would betray the SSA nature of the uses at the callsite,
+ array. However, this would betray the SSA nature of the uses at the call site,
making optimizations (including GC root placement) significantly harder.
Instead, we emit it as follows:
- ```
+ ```llvm
%bitcast = bitcast @any_unoptimized_call to %jl_value_t *(*)(%jl_value_t *, %jl_value_t *)
call cc 37 %jl_value_t *%bitcast(%jl_value_t *%arg1, %jl_value_t *%arg2)
```
The special `cc 37` annotation marks the fact that this call site is really using
- jlcall calling convention. This allows us to retain the SSA-ness of the
+ the jlcall calling convention. This allows us to retain the SSA-ness of the
uses throughout the optimizer. GC root placement will later lower this call to
the original C ABI. In the code, the calling convention number is represented by
- the `JLCALL_F_CC` constant. In addition, there ist the `JLCALL_CC` calling
+ the `JLCALL_F_CC` constant. In addition, there is the `JLCALL_CC` calling
convention which functions similarly, but omits the first argument.
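
To make the argument layout concrete, the C sketch below pairs a callee that has the three-parameter shape above with a call site that performs the "straightforward lowering" into an on-stack argument array. Only that shape comes from the text. The `jl_value_t` struct body, `toy_sum`, `toy_sum_noself` (which mimics the `JLCALL_CC` form without the function object) and `call_two` are hypothetical stand-ins, not Julia runtime code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a boxed value; the real jl_value_t is an
   opaque, GC-managed object whose layout is not modeled here. */
typedef struct jl_value_t {
    long payload;
} jl_value_t;

/* Generic shape: function object, argument array, argument count. */
typedef jl_value_t *(*jl_fptr_t)(jl_value_t *, jl_value_t **, int);

/* Box used to return results in this sketch (not thread-safe). */
static jl_value_t toy_result;

/* Hypothetical callee: sums the payloads of all boxed arguments. */
jl_value_t *toy_sum(jl_value_t *self, jl_value_t **args, int nargs)
{
    (void)self; /* this toy callee never inspects its own function object */
    assert(nargs >= 0);
    long acc = 0;
    for (int i = 0; i < nargs; i++)
        acc += args[i]->payload;
    toy_result.payload = acc;
    return &toy_result;
}

/* JLCALL_CC-like shape: the same call without the function object. */
jl_value_t *toy_sum_noself(jl_value_t **args, int nargs)
{
    return toy_sum(NULL, args, nargs);
}

/* The "straightforward lowering": spill the arguments into an on-stack
   array (an alloca in IR terms) and pass its address and length. */
jl_value_t *call_two(jl_fptr_t f, jl_value_t *fobj, jl_value_t *a, jl_value_t *b)
{
    jl_value_t *argv[2] = {a, b};
    return f(fobj, argv, 2);
}
```

Once `argv` exists, the optimizer sees `a` and `b` only as stores into a stack slot rather than as direct call operands. The `cc 37` form keeps them as SSA operands until GC root placement lowers the call to this array-based ABI.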

## GC root placement

- GC root placement is done by an LLVM late in the pass pipeline. Doing GC root
+ GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root
placement this late enables LLVM to make more aggressive optimizations around
code that requires GC roots, as well as allowing us to reduce the number of
required GC roots and GC root store operations (since LLVM doesn't understand
our GC, it wouldn't otherwise know what it is and is not allowed to do with
values stored to the GC frame, so it'll conservatively do very little). As an
example, consider an error path:
- ```
+ ```julia
if some_condition()
    #= Use some variables maybe =#
    error("An error occurred")
@@ -153,37 +153,37 @@ the GC root slots used in the deleted block, as well as any values kept alive
in those slots only because they were used in the error path, would be kept
alive by LLVM. By doing GC root lowering late, we give LLVM the license to do
any of its usual optimizations (constant folding, dead code elimination, etc.),
- without having to worry (too much) about which values may or may not be gc
+ without having to worry (too much) about which values may or may not be GC
tracked.

However, in order to be able to do late GC root placement, we need to be able to
identify a) which pointers are GC-tracked and b) all uses of such pointers. The
goal of the GC placement pass is thus simple:

- Minimize the number of needed gc roots/stores to them subject to the constraint
- that at every safepoint, any live gc-tracked pointer (i.e. for which there is
- a path after this point that contains a use of this pointer) is in some gc slot.
+ Minimize the number of needed GC roots/stores to them subject to the constraint
+ that at every safepoint, any live GC-tracked pointer (i.e. for which there is
+ a path after this point that contains a use of this pointer) is in some GC slot.
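
The placement goal can be made concrete on straight-line code. The C sketch below is a toy model, not the LLVM pass. It assumes a hypothetical `toy_inst` record in which each instruction defines at most one of up to 32 tracked values and reads a bitmask of them. A single backward scan records, for each safepoint, which tracked values are still needed there. The safepoint's own result is excluded because it does not exist yet. Treating the safepoint's own operands as live across it is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy instruction. Tracked values are numbered 0..31 so
   that a set of them fits in a uint32_t bitmask. */
typedef struct {
    int def;            /* value defined here, or -1 for none */
    uint32_t uses;      /* bitmask of tracked values read here */
    bool is_safepoint;  /* may this instruction trigger a GC? */
} toy_inst;

static uint32_t toy_bit(int v)
{
    assert(v < 32);
    return v < 0 ? 0u : (1u << v);
}

static int toy_popcount(uint32_t x)
{
    int n = 0;
    for (; x != 0; x &= x - 1) /* clear the lowest set bit */
        n++;
    return n;
}

/* need[i] receives the tracked values that must sit in a GC slot while
   safepoint i runs (0 for other instructions). Returns the size of the
   largest such set. */
int toy_plan_roots(const toy_inst *code, int n, uint32_t *need)
{
    uint32_t live = 0; /* tracked values read by a later instruction */
    int max_slots = 0;
    for (int i = n - 1; i >= 0; i--) {
        uint32_t kill = toy_bit(code[i].def);
        need[i] = 0;
        if (code[i].is_safepoint) {
            need[i] = (live | code[i].uses) & ~kill;
            int k = toy_popcount(need[i]);
            if (k > max_slots)
                max_slots = k;
        }
        /* before instruction i: drop what it defines, add what it reads */
        live = (live & ~kill) | code[i].uses;
    }
    return max_slots;
}
```

For straight-line code, the returned maximum is also the fewest slots that suffice. Each value is needed at a contiguous run of safepoints, so the values that must not share a slot form an interval graph, and its chromatic number equals the largest overlap. The sketch handles only straight-line code. With branches, liveness has to be propagated over the control-flow graph instead of in one backward pass.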

### Representation

The primary difficulty is thus choosing an IR representation that allows us to
- identify gc-tracked pointers and their uses, even after the program has been
+ identify GC-tracked pointers and their uses, even after the program has been
run through the optimizer. Our design makes use of three LLVM features to achieve
this:
- Custom address spaces
- Operand Bundles
- - non-integral pointers
+ - Non-integral pointers

Custom address spaces allow us to tag every pointer with an integer that needs
to be preserved through optimizations. The compiler may not insert casts between
address spaces that did not exist in the original program, and it must never
change the address space of a pointer on a load/store/etc operation. This allows
- us to annotate which pointers are gc-tracked in an optimizer-resistant way. Note
+ us to annotate which pointers are GC-tracked in an optimizer-resistant way. Note
that metadata would not be able to achieve the same purpose. Metadata is supposed
to always be discardable without altering the semantics of the program. However,
- failing to identify a gc-tracked pointer alters the resulting program behavior
+ failing to identify a GC-tracked pointer alters the resulting program behavior
dramatically - it'll probably crash or return wrong results. We currently use
- three different addressspaces (their numbers are defined in src/codegen_shared.cpp):
+ three different address spaces (their numbers are defined in `src/codegen_shared.cpp`):

- GC Tracked Pointers (currently 10): These are pointers to boxed values that may be put
into a GC frame. They are loosely equivalent to a `jl_value_t*` pointer on the C
@@ -201,14 +201,15 @@ three different addressspaces (their numbers are defined in src/codegen_shared.c
call (they do still need to be rooted if they are live across another safepoint
between the definition and the call).

- ### Invariants.
+ ### Invariants
+
The GC root placement pass makes use of several invariants, which need
to be observed by the frontend and are preserved by the optimizer.

- First, only the following addressspace casts are allowed
+ First, only the following address space casts are allowed:
- 0->{Tracked,Derived,CalleeRooted}: It is allowable to decay an untracked pointer to any of the
- other. However, do note that the optimizer has broad license to not root
- such a value. It is never safe to have a value in addressspace 0 in any part
+ others. However, do note that the optimizer has broad license to not root
+ such a value. It is never safe to have a value in address space 0 in any part
of the program if it is (or is derived from) a value that requires a GC root.
- Tracked->Derived: This is the standard decay route for interior values. The placement
pass will look for these to identify the base pointer for any use.
@@ -228,63 +229,65 @@ We explicitly allow load/stores and simple calls in address spaces Tracked/Deriv
argument arrays must always be in address space Tracked (it is required by the ABI that
they are valid `jl_value_t*` pointers). The same is true for return instructions
(though note that struct return arguments are allowed to have any of the address
- spaces). The only allowable use of an address space CalleRooted pointer is to pass it to
+ spaces). The only allowable use of an address space CalleeRooted pointer is to pass it to
a call (which must have an appropriately typed operand).

- Further, we disallow getelementptr in addrspace Tracked. This is because unless
+ Further, we disallow `getelementptr` in addrspace Tracked. This is because unless
the operation is a noop, the resulting pointer will not be validly storable
to a GC slot and may thus not be in this address space. If such a pointer
is required, it should be decayed to addrspace Derived first.

- Lastly, we disallow inttoptr/ptrtoint instructions in these address spaces.
- Having these instructions would mean that some i64 values are really gc tracked.
+ Lastly, we disallow `inttoptr`/`ptrtoint` instructions in these address spaces.
+ Having these instructions would mean that some `i64` values are really GC tracked.
This is problematic, because it breaks the stated requirement that we're able
- to identify gc-relevant pointers. This invariant is accomplished using the LLVM
+ to identify GC-relevant pointers. This invariant is accomplished using the LLVM
"non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the
optimizer from making optimizations that would introduce these operations. Note
- we can still insert static constants at JIT time by using inttoptr in address
+ we can still insert static constants at JIT time by using `inttoptr` in address
space 0 and then decaying to the appropriate address space afterwards.
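
The per-instruction rules in this section fit in a small lookup. In the C sketch below, the enum names and `toy_op_allowed` are hypothetical and invented for illustration. Only Tracked = 10 comes from the text. The Derived and CalleeRooted values are placeholders that may not match the numbers in `src/codegen_shared.cpp`. The sketch covers only the rules stated in this excerpt:

- loads, stores and calls in Tracked/Derived
- no `getelementptr` on Tracked
- no `inttoptr`/`ptrtoint` outside address space 0
- CalleeRooted only as a call operand

The address space cast rules are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    AS_GENERIC = 0,        /* untracked pointers */
    AS_TRACKED = 10,       /* value given in the text */
    AS_DERIVED = 11,       /* placeholder value */
    AS_CALLEE_ROOTED = 12  /* placeholder value */
} toy_addrspace;

typedef enum {
    OP_LOAD,
    OP_STORE,
    OP_CALL_OPERAND,  /* pointer passed as an argument to a call */
    OP_GEP,
    OP_INTTOPTR,
    OP_PTRTOINT
} toy_op;

/* Returns whether `op` may be applied to a pointer in address space `as`. */
bool toy_op_allowed(toy_op op, toy_addrspace as)
{
    assert(op <= OP_PTRTOINT);
    if (as == AS_GENERIC)
        return true;               /* no restrictions modeled for address space 0 */
    if (op == OP_INTTOPTR || op == OP_PTRTOINT)
        return false;              /* non-integral: GC pointers must not hide in integers */
    if (as == AS_CALLEE_ROOTED)
        return op == OP_CALL_OPERAND;
    if (op == OP_GEP)
        return as != AS_TRACKED;   /* decay to Derived first */
    return true;                   /* loads, stores and calls in Tracked/Derived */
}
```

A frontend check built this way would reject `OP_GEP` on a Tracked pointer. The caller would then insert the decay to Derived and retry, which is the route the text prescribes.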

### Supporting ccall
+
One important aspect missing from the discussion so far is the handling of
`ccall`. `ccall` has the peculiar feature that the location and scope of a use
do not coincide. As an example, consider:
- ```
+ ```julia
A = randn(1024)
ccall(:foo, Void, (Ptr{Float64},), A)
```
In lowering, the compiler will insert a conversion from the array to the
pointer, which drops the reference to the array value. However, we of course
- need to make sure that the array does stay alive while we're doing the ccall.
+ need to make sure that the array does stay alive while we're doing the `ccall`.
To understand how this is done, first recall the lowering of the above code:
- ```
+ ```julia
return $(Expr(:foreigncall, :(:foo), Void, svec(Ptr{Float64}), :($(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), :(A), 0))), :(A)))
```
The last `:(A)` is an extra argument list inserted during lowering that informs
- the code generator which julia level values need to be kept alive for the
- duration of this ccall. We then take this information and represent it in an
+ the code generator which Julia-level values need to be kept alive for the
+ duration of this `ccall`. We then take this information and represent it in an
"operand bundle" at the IR level. An operand bundle is essentially a fake use
that is attached to the call site. At the IR level, this looks like so:
- ```
+ ```llvm
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
```
- The GC root placement pass will treat the jl_roots operand bundle as if it were
- a regular operand. However, as a final step, after the gc roots are inserted,
+ The GC root placement pass will treat the `jl_roots` operand bundle as if it were
+ a regular operand. However, as a final step, after the GC roots are inserted,
it will drop the operand bundle to avoid confusing instruction selection.
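
In the `ccall` example, the `double*` operand is an untracked address space 0 pointer, so without the bundle nothing at the call site would mention `A`. The toy C model below illustrates the two steps just described. All names are hypothetical, and tracked value 0 stands for `A`. During placement, bundle entries count as ordinary uses. Once roots are in place, the bundle is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical call-site record: ordinary operands and the "jl_roots"
   bundle are both bitmasks over up to 32 tracked values. */
typedef struct {
    uint32_t operands;  /* tracked values passed as real arguments */
    uint32_t jl_roots;  /* tracked values listed only in the bundle */
} toy_call;

/* During root placement, bundle entries count exactly like operands. */
uint32_t toy_uses_for_placement(const toy_call *c)
{
    return c->operands | c->jl_roots;
}

bool toy_counts_as_use(const toy_call *c, int value)
{
    assert(value >= 0 && value < 32);
    return (toy_uses_for_placement(c) >> value) & 1u;
}

/* Final step: after roots are inserted, drop the bundle so that
   instruction selection never sees it. */
void toy_drop_jl_roots(toy_call *c)
{
    c->jl_roots = 0;
}
```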

### Supporting pointer_from_objref
+
`pointer_from_objref` is special because it requires the user to take explicit
control of GC rooting. By our above invariants, this function is illegal,
- because it performs an addressspace cast from 10 to 0. However, it can be useful,
+ because it performs an address space cast from 10 to 0. However, it can be useful
in certain situations, so we provide a special intrinsic:
- ```
+ ```llvm
declare %jl_value_t *@julia.pointer_from_objref(%jl_value_t addrspace(10)*)
```
- which is lowered to the corresponding address space cast after gc root lowering.
+ which is lowered to the corresponding address space cast after GC root lowering.
Do note, however, that by using this intrinsic, the caller assumes all responsibility
for making sure that the value in question is rooted. Further, this intrinsic is
not considered a use, so the GC root placement pass will not provide a GC root
for the function. As a result, the external rooting must be arranged while the
- value is still tracked by the system. I.e. it is not valid to attempt use the
+ value is still tracked by the system. I.e. it is not valid to attempt to use the
result of this operation to establish a global root - the optimizer may have
already dropped the value.