Commit eac043a

add union-splitting commit comment to devdocs
starting a document describing the overall code-generator structure
1 parent 6131a0f commit eac043a

4 files changed, +126 -5 lines changed

doc/make.jl

+3-2
@@ -87,12 +87,13 @@ const PAGES = [
 "devdocs/reflection.md",
 "Documentation of Julia's Internals" => [
 "devdocs/init.md",
-"devdocs/eval.md",
 "devdocs/ast.md",
 "devdocs/types.md",
 "devdocs/object.md",
-"devdocs/functions.md",
+"devdocs/eval.md",
 "devdocs/callconv.md",
+"devdocs/compiler.md",
+"devdocs/functions.md",
 "devdocs/cartesian.md",
 "devdocs/meta.md",
 "devdocs/subarrays.md",

doc/src/devdocs/compiler.md

+119
@@ -0,0 +1,119 @@
# High-level Overview of the Native-Code Generation Process

<placeholder>

## Representation of Pointers

When emitting code to an object file, pointers will be emitted as relocations.
The deserialization code will ensure any object that pointed to one of these constants
gets recreated and contains the right runtime pointer.

Otherwise (when code is not being emitted to an object file), they will be emitted as literal constants.

To emit one of these objects, call `literal_pointer_val`.
It will handle tracking the Julia value and the LLVM global,
ensuring they are valid both for the current runtime and after deserialization.

When emitted into the object file, these globals are stored as references
in a large `gvals` table. This allows the deserializer to reference them by index,
and to implement a custom, manual GOT-like mechanism to restore them.

Function pointers are handled similarly.
They are stored as values in a large `fvals` table.
Like globals, this allows the deserializer to reference them by index.

Note that extern functions are handled separately,
by name, via the usual symbol resolution mechanism in the linker.

Note too that `ccall` functions are handled separately,
via a manual GOT and PLT.
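
To make the "reference by index" idea concrete, here is a minimal sketch of a `gvals`-style table, assuming a much simplified model in which compiled code holds only an index and the deserializer fills in fresh runtime addresses. Every name below (`SerializedValue`, `GlobalRef`, `GvalsTable`) is hypothetical and does not come from Julia's source; the same shape would apply to an `fvals`-style table of function pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch of a gvals-style table; not Julia's actual API.

// Stand-in for a serialized Julia constant.
struct SerializedValue {
    std::string contents;
};

// What the compiled code holds: not a raw address, only a table index.
struct GlobalRef {
    std::size_t index;
};

// Table of runtime pointers that compiled code loads through.
class GvalsTable {
public:
    // Deserialization: recreate every object, then record its new runtime
    // address at the same index the compiled code was built against.
    void restore(const std::vector<SerializedValue> &image) {
        objects_.clear();
        slots_.clear();
        for (const SerializedValue &v : image) {
            objects_.push_back(std::make_unique<std::string>(v.contents));
            slots_.push_back(objects_.back().get());
        }
    }

    // Code-side access: one indirect load through the table (GOT-like).
    const std::string *load(GlobalRef ref) const {
        return slots_.at(ref.index);
    }

private:
    std::vector<std::unique_ptr<std::string>> objects_;  // owns recreated objects
    std::vector<const std::string *> slots_;             // index -> runtime pointer
};
```

Because the code only stores `GlobalRef{1}`, restoring the same image in a different process yields different addresses but the same lookup result.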

## Representation of Intermediate Values

Values are passed around in a `jl_cgval_t` struct.
This represents an R-value, and includes enough information to
determine how to assign or pass it somewhere.

They are created via one of the helper constructors, usually
`mark_julia_type` (for immediate values) or `mark_julia_slot` (for pointers to values).

The function `convert_julia_type` can transform between any two types.
It returns an R-value with `cgval.typ` set to `typ`.
It will cast the object to the requested representation,
making heap boxes, allocating stack copies, and computing tagged unions as
needed to change the representation.

By contrast, `update_julia_type` will change `cgval.typ` to `typ`
only if it can be done at zero cost (i.e. without emitting any code).
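
The contrast between the two functions can be sketched with a heavily simplified model, assuming a value is described only by its type name and whether it is boxed, and modelling "emitting code" as appending a string to a list. `CgVal`, `convert_sketch` and `update_sketch` are hypothetical stand-ins, not the real `jl_cgval_t`, `convert_julia_type` or `update_julia_type`, and the model does not check that the type change is valid.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified R-value descriptor (not jl_cgval_t).
struct CgVal {
    std::string typ;  // the Julia type this value is known to have
    bool isboxed;     // true if it is a heap-allocated, type-tagged object
};

// Stand-in for the IR builder: each string is one "emitted instruction".
using Emitted = std::vector<std::string>;

// convert-style: always produces the requested representation,
// emitting code when the representation has to change.
CgVal convert_sketch(const CgVal &v, const std::string &typ, bool want_boxed,
                     Emitted &code) {
    if (v.isboxed != want_boxed)
        code.push_back(want_boxed ? "box " + v.typ : "unbox " + v.typ);
    return CgVal{typ, want_boxed};
}

// update-style: only relabels the type, and refuses if doing so
// would require emitting any code.
std::optional<CgVal> update_sketch(const CgVal &v, const std::string &typ,
                                   bool want_boxed) {
    if (v.isboxed != want_boxed)
        return std::nullopt;  // not zero-cost
    return CgVal{typ, v.isboxed};
}
```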

## Union Representation

Inferred union types may be stack-allocated via a tagged type representation.

The primitive routines that need to be able to handle tagged unions are:
- mark-type
- load-local
- store-local
- isa
- is
- emit_typeof
- emit_sizeof
- boxed
- unbox
- specialized cc-ret

Everything else should be possible to handle in inference by using these
primitives to implement union-splitting.

The representation of the tagged union is as a pair
of `< void* union, byte selector >`.
The selector is fixed-size as `byte & 0x7f`,
and will union-tag the first 126 isbits types.
It records the one-based depth-first count into the type-union of the
isbits objects inside. An index of zero indicates that the `union*` is
actually a tagged heap-allocated `jl_value_t*`,
and needs to be treated as normal for a boxed object rather than as a
tagged union.

The high bit of the selector (`byte & 0x80`) can be tested to determine if the
`void*` is actually a heap-allocated (`jl_value_t*`) box,
thus avoiding the cost of re-allocating a box,
while maintaining the ability to efficiently handle union-splitting based on the low bits.

It is guaranteed that `byte & 0x7f` is an exact test for the type
if the value can be represented by a tag: its low bits always hold that tag,
and it will never be marked `byte = 0x80`.
It is therefore not necessary to also test the type-tag when testing `isa`.
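
The selector layout described above can be sketched as a few bit-manipulation helpers. This is an illustration only, assuming the byte layout exactly as stated (low 7 bits hold the one-based isbits index, 0 meaning "not tagged"; the high bit means the pointer refers to a heap box); the helper names are hypothetical and are not functions from Julia's codegen. In the usage checks, the index `2` simply stands for "the second isbits member" of some union.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for the selector byte; not Julia's codegen API.
constexpr std::uint8_t kBoxedBit = 0x80;   // high bit: payload is a heap box
constexpr std::uint8_t kIndexMask = 0x7f;  // low bits: 1-based isbits index, 0 = none

// One-based depth-first index of the isbits member, or 0 if the value
// must be handled as an ordinary boxed jl_value_t*.
constexpr unsigned tag_index(std::uint8_t selector) {
    return selector & kIndexMask;
}

// True if the payload pointer already refers to a heap-allocated box,
// so boxing the value does not require a new allocation.
constexpr bool payload_is_boxed(std::uint8_t selector) {
    return (selector & kBoxedBit) != 0;
}

// isa against the k-th isbits member needs only the low bits, because a
// value that can be tagged always carries its tag (never a bare 0x80).
constexpr bool isa_member(std::uint8_t selector, unsigned k) {
    return tag_index(selector) == k;
}

// Build a selector for the k-th isbits member, optionally marked as boxed.
constexpr std::uint8_t make_selector(unsigned k, bool boxed) {
    return static_cast<std::uint8_t>((boxed ? kBoxedBit : 0) | (k & kIndexMask));
}
```

Note how a boxed value of a tagged member (`0x82` here) still passes the same `isa` test as its unboxed form (`0x02`), which is why no extra type-tag check is needed.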

The `union*` memory region may be allocated at *any* size.
The only constraint is that it is big enough to contain the data
currently specified by `selector`.
It might not be big enough to contain the union of all types that
could be stored there according to the associated Union type field.
Use appropriate care when copying.
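
One way to take "appropriate care" is to copy only the size of the member the selector currently names, never the size of the largest member. The sketch below assumes a hypothetical two-member union whose isbits members are numbered 1 (`int8`) and 2 (`double`); the size table and `copy_union_payload` are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical size table: index 0 is unused (boxed case),
// member 1 is assumed to be int8, member 2 is assumed to be double.
constexpr std::array<std::size_t, 3> kMemberSize = {0, sizeof(std::int8_t), sizeof(double)};

// Copies exactly the bytes of the currently selected member. `src` may point
// at a region that is only that large, so copying sizeof(largest member)
// could read past its end.
void copy_union_payload(void *dst, const void *src, std::uint8_t selector) {
    unsigned k = selector & 0x7f;  // 1-based member index (0 = boxed, not handled here)
    assert(k != 0 && k < kMemberSize.size());
    std::memcpy(dst, src, kMemberSize[k]);
}
```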

## Specialized Calling Convention Signature Representation

A `jl_returninfo_t` object describes the calling convention details of any callable.

If any of the arguments or return type of a method can be represented unboxed,
and the method is not varargs, it will be given an optimized calling convention
signature based on its `specTypes` and `rettype` fields.

The general principles are that:

- Primitive types get passed in int/float registers.
- Tuples of VecElement types get passed in vector registers.
- Structs get passed on the stack.
- Return values are handled similarly to arguments,
  with a size cutoff at which they will instead be returned via a hidden sret argument.

The full logic for this is implemented by `get_specsig_function` and `deserves_sret`.
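
The principles in the list above can be condensed into a toy classifier. This is a sketch under stated assumptions, not the logic of `get_specsig_function` or `deserves_sret`: the `Kind`/`Passing` categories, the `TypeInfo` fields and the 16-byte sret cutoff are all hypothetical values chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical categories modelling the principles above.
enum class Kind { Primitive, VecElementTuple, Struct };
enum class Passing { ScalarRegister, VectorRegister, Stack, HiddenSret };

struct TypeInfo {
    Kind kind;
    std::size_t size;  // size in bytes
};

Passing classify_argument(const TypeInfo &t) {
    switch (t.kind) {
    case Kind::Primitive:       return Passing::ScalarRegister;  // int/float registers
    case Kind::VecElementTuple: return Passing::VectorRegister;  // vector registers
    case Kind::Struct:          return Passing::Stack;           // passed on the stack
    }
    return Passing::Stack;  // unreachable; keeps compilers from warning
}

// Returns are classified like arguments, except that values larger than the
// (assumed, arbitrary) cutoff come back through a hidden sret pointer.
Passing classify_return(const TypeInfo &t, std::size_t sret_cutoff = 16) {
    if (t.size > sret_cutoff)
        return Passing::HiddenSret;
    return classify_argument(t);
}
```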

Additionally, if the return type is a union, it may be returned as a pair of values (a pointer and a tag).
If the union values can be stack-allocated, then sufficient space to store them will also be passed as a hidden first argument.
It is up to the callee whether the returned pointer will point to this space, a boxed object, or even other constant memory.
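
The union-return convention can be sketched as a callee that receives caller-provided scratch space and returns a (pointer, tag) pair. The member numbering (1 = `int32`, 2 = `double`), the `UnionReturn` struct and `returns_union` are hypothetical; the point is only that the caller must read the value through the returned pointer, which may or may not alias its buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical (pointer, tag) pair returned for a union-typed result.
struct UnionReturn {
    const void *ptr;   // where the returned value's bytes live
    std::uint8_t tag;  // selector byte, as in the tagged-union layout
};

// A constant the callee may choose to point at instead of the buffer.
static const double kConstantHalf = 0.5;

// `sret_space` plays the role of the hidden first argument: space large
// enough for any stack-allocatable member. The callee decides where the
// returned pointer points.
UnionReturn returns_union(void *sret_space, int x) {
    if (x >= 0) {
        std::int32_t v = x * 2;
        std::memcpy(sret_space, &v, sizeof v);  // use the caller-provided space
        return {sret_space, 1};
    }
    return {&kConstantHalf, 2};                 // point at constant memory instead
}
```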

doc/src/index.md

+3-2
@@ -71,12 +71,13 @@
 * [Reflection and introspection](@ref)
 * Documentation of Julia's Internals
 * [Initialization of the Julia runtime](@ref)
-* [Eval of Julia code](@ref)
 * [Julia ASTs](@ref)
 * [More about types](@ref)
 * [Memory layout of Julia Objects](@ref)
-* [Julia Functions](@ref)
+* [Eval of Julia code](@ref)
 * [Calling Conventions](@ref)
+* [High-level Overview of the Native-Code Generation Process](@ref)
+* [Julia Functions](@ref)
 * [Base.Cartesian](@ref)
 * [Talking to the compiler (the `:meta` mechanism)](@ref)
 * [SubArrays](@ref)

src/codegen.cpp

+1-1
@@ -448,7 +448,7 @@ struct jl_cgval_t {
 bool isboxed; // whether this value is a jl_value_t* allocated on the heap with the right type tag
 bool isghost; // whether this value is "ghost"
 bool isimmutable; // V points to something that is definitely immutable (e.g. single-assignment, but including memory)
-MDNode *tbaa; // The related tbaa node. Non-NULL iff this is not a pointer.
+MDNode *tbaa; // The related tbaa node. Non-NULL iff this holds an address.
 bool ispointer() const
 {
 // whether this value is compatible with `data_pointer`
