inference-directed codegen via CodeInstance edges and typeinf_ext_toplevel APIs #56880

Merged: 5 commits into master from jn/inference-direct-codegen on Jan 2, 2025

Conversation

vtjnash (Member) commented Dec 21, 2024

This builds upon the many efforts around using CodeInstance everywhere (especially as the return value from jl_type_infer and the input format for edges) by moving several hard-coded algorithms that were previously in C (such as `recursive_compile_graph` and `jl_ci_cache_lookup`), and which were therefore also slightly broken (especially in concurrent environments), into Julia's Compiler.jl code, where we can most likely maintain them much better going forward. See the descriptions in the individual commits for the specifics of the changes and fixes, and for how to change existing code to use these APIs correctly. In followup stages, most of the code relevant to precompile_utils, trim, and even the allocation-checker should be considered for moving into Julia as well, since being written in C/C++ currently provides negative value for maintaining it, and the change in the API boundary should make that additional conversion easier.

Gives a considerably smaller system image, despite containing more code, thanks to better algorithms that avoid allocating permanent garbage: 155 MB -> 147 MB in `.text`.

Makes a slightly larger Pkg.ji file cache, hopefully mainly due to being more strategic about which code is compiled (because that logic now mostly lives in Julia instead of C), as it appears to have both inferred and compiled about 10% more code according to a high-level analysis of it:

```
$ du -sh usr/share/julia/compiled/v1.12/
237M    # on PR
222M    # on master
```
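To make the shape of the new interface concrete, here is a minimal sketch of driving the batch mode described in the commit messages further down. The entry point name is from this PR, but the exact argument list and the `Base.method_instance` helper are assumptions about internal, unexported APIs:

```julia
# Hypothetical sketch: batch-infer a set of MethodInstances at given world
# ages and collect everything codegen would need to compile them.
mi = Base.method_instance(sin, (Float64,))   # internal helper, assumed
world = Base.get_world_counter()
codeinfos = Core.Compiler.typeinf_ext_toplevel(Any[mi], UInt[world], false)
```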

@vtjnash added the compiler:codegen (Generation of LLVM IR and native code) and don't squash (Don't squash merge) labels on Dec 21, 2024
@vtjnash added the merge me (PR is reviewed. Merge when all tests are passing) label on Dec 30, 2024
@vtjnash force-pushed the jn/inference-direct-codegen branch from cf7be9c to 1496acf on December 30, 2024
vtjnash (Member, Author) commented Dec 30, 2024

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier (Collaborator) commented

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

giordano (Contributor) commented Dec 30, 2024

```
The global RNG seed was 0x12ee4f054a7adbbbed0a9b894db3b578.
Error in testset SparseArrays/sparsematrix_ops:
Test Failed at /cache/build/tester-demeter6-12/julialang/julia-master/julia-1496acf023/share/julia/stdlib/v1.12/SparseArrays/test/sparsematrix_ops.jl:532
  Expression: f() == 0
   Evaluated: 64 == 0
```

Failed 4 times in a row on i686 Linux; looks relevant. For reference, the test measures the bytes allocated in a function, which apparently went from 0 to 64.

@giordano removed the merge me label on Dec 30, 2024
@vtjnash force-pushed the jn/inference-direct-codegen branch from 1496acf to 7d176fd on December 30, 2024
vtjnash (Member, Author) commented Dec 30, 2024

Turned out to be easily reproducible on any platform with:

```julia
using SparseArrays, LinearAlgebra
# Running this first is part of the reproduction: it compiles the
# relevant sparse code before the allocation measurement below.
lu(sprandn(10, 10, 0.99) + I)
function f()
    A = sprandn(10, 10, 0.1)
    X = copy(A)
    # transpose! into a preallocated destination should not allocate
    return @allocated transpose!(X, A)
end
@assert f() == 0
```

It was also easily fixed (I had just forgotten to copy a conditional correctly). Hopefully good now?

giordano (Contributor) commented

Analyzegc is unhappy:

```
/cache/build/tester-amdci5-15/julialang/julia-master/src/jitlayers.cpp:441:44: error: Argument value may have been GCed [julia.GCChecker]
  441 |                 jl_method_instance_t *mi = jl_get_ci_mi(codeinst);
      |                                            ^            ~~~~~~~~
```

```c
bool toplevel = !jl_is_method(jl_get_ci_mi(codeinst)->def.method);
if (!toplevel) {
    // don't remember toplevel thunks because
    // they may not be rooted in the gc for the life of the program,
```
Review comment (Member):

Should we use a finalizer for this instead? (thread safety might make this annoying) Though it would be quite a bit of added complexity

Reply (Member, Author):

We could, but that introduces some other consistency issues too, since the lookup into this map is async. It is probably best just to keep things the way they were.

```julia
    return ci
end

# This is a bridge for the C code calling `jl_typeinf_func()` on set of Method matches
```
Review comment (Member):

Is it method matches or method instances?

Reply (Member, Author):

A MethodInstance is equivalent to a MethodMatch, since they are trivially convertible (one is the memoization of the other)
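For illustration, a minimal sketch of that conversion using today's internal (unexported, subject-to-change) helpers:

```julia
# Sketch: a MethodMatch is memoized into a MethodInstance by
# Core.Compiler.specialize_method (internal API).
match = Base._which(Tuple{typeof(sin), Float64})  # -> Core.MethodMatch
mi = Core.Compiler.specialize_method(match)       # -> Core.MethodInstance
```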

Review comment (Member):

But does it accept both?

Reply (Member, Author):

It could, if someone wanted it to. This is hopefully just a temporary bridge until someone rewrites all of precompile_utils.c in Julia.

Sometimes an edge (especially from a precompile file, but sometimes from inference) will specify a CodeInstance that does not need to be compiled for its ABI and simply needs to be cloned to point to the existing copy of it. This was sometimes failing the h_world_age test.

This avoids unnecessary compression when running (not generating code). While generating code, we continue the legacy behavior of storing compressed code, since restarting from a ji file without it is quite slow. Eventually, we should remove that code as well, once we have generated the object file from it.

This replaces the defective SOURCE_MODE_FORCE_SOURCE option with a new `typeinf_ext_toplevel` batch-mode interface for compilation, which returns all required source code. Only two options remain now:

SOURCE_MODE_NOT_REQUIRED:
    Require only that the IPO information (e.g. rettype and friends) is present.

SOURCE_MODE_FORCE_ABI:
    Require that the IPO information is present (for ABI computation) and that the returned CodeInstance can be invoked on the host target (preferably after inference, called directly, but it is perfectly acceptable for Base.Compiler to instead force the runtime to use a stub there, or to call into it with the interpreter, by having failed to provide any code).
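As a sketch of how a caller might request each mode (the constant names follow the commit message above; the single-MethodInstance entry point and its exact signature are assumptions about internal APIs):

```julia
# Hypothetical sketch: ask inference for a CodeInstance under each mode.
mi = Base.method_instance(+, (Int, Int))   # internal helper, assumed
world = Base.get_world_counter()
# IPO information (rettype and friends) only; no invokable code required:
ci = Core.Compiler.typeinf_ext_toplevel(mi, world, Core.Compiler.SOURCE_MODE_NOT_REQUIRED)
# IPO information plus a CodeInstance invokable on the host target:
ci = Core.Compiler.typeinf_ext_toplevel(mi, world, Core.Compiler.SOURCE_MODE_FORCE_ABI)
```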

This replaces the awkward `jl_create_native` interface (which is now just a shim for calling the new batch-mode `typeinf_ext_toplevel`) with a simpler `jl_emit_native` API, which does not do any inference or other callbacks, but is simply a batch-mode call to `jl_emit_codeinfo` plus the work to build the external wrappers around them for linkage.
@vtjnash force-pushed the jn/inference-direct-codegen branch from 7d176fd to 9876823 on January 2, 2025
@vtjnash merged commit ed2cb49 into master on Jan 2, 2025 (7 checks passed)
vtjnash added a commit that referenced this pull request on Jan 2, 2025: inference-directed codegen via CodeInstance edges and typeinf_ext_toplevel APIs (#56880)
@vtjnash deleted the jn/inference-direct-codegen branch on January 2, 2025