Add precompile statements #203

Open
wants to merge 13 commits into base: master
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,4 +1,5 @@
*.jl.cov
*.jl.*.cov
*.jl.mem
Manifest.toml
Manifest.toml
LocalPreferences.toml
6 changes: 5 additions & 1 deletion Project.toml
@@ -1,14 +1,16 @@
name = "TensorOperations"
uuid = "6aa20fa7-93e2-5fca-9bc0-fbd0db3c71a2"
authors = ["Lukas Devos <[email protected]>", "Maarten Van Damme <[email protected]>", "Jutho Haegeman <[email protected]>"]
version = "5.1.4"
version = "5.2.0"

[deps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
LRUCache = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
PackageExtensionCompat = "65ce6f38-6b18-4e1d-a461-8949797d7930"
PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
Preferences = "21216c6a-2e73-6563-6e65-726566657250"
PtrArrays = "43287f4e-b6f4-7ad1-bb20-aadabca52c3d"
Strided = "5e0ebb24-38b0-5f93-81fe-25c709ecae67"
StridedViews = "4db3bf67-4bd7-4b4e-b153-31dc3fb37143"
@@ -38,6 +40,8 @@ LRUCache = "1"
LinearAlgebra = "1.6"
Logging = "1.6"
PackageExtensionCompat = "1"
PrecompileTools = "1.1"
Preferences = "1.4"
PtrArrays = "1.2"
Random = "1"
Strided = "2.2"
3 changes: 2 additions & 1 deletion docs/make.jl
@@ -11,7 +11,8 @@ makedocs(; modules=[TensorOperations],
"man/interface.md",
"man/backends.md",
"man/autodiff.md",
"man/implementation.md"],
"man/implementation.md",
"man/precompilation.md"],
"Index" => "index/index.md"])

# Documenter can also automatically deploy documentation to gh-pages.
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -5,7 +5,7 @@
## Table of contents

```@contents
Pages = ["index.md", "man/indexnotation.md", "man/functions.md", "man/interface.md", "man/backends.md", "man/autodiff.md", "man/implementation.md"]
Pages = ["index.md", "man/indexnotation.md", "man/functions.md", "man/interface.md", "man/backends.md", "man/autodiff.md", "man/implementation.md", "man/precompilation.md"]
Depth = 4
```

50 changes: 50 additions & 0 deletions docs/src/man/precompilation.md
@@ -0,0 +1,50 @@
# Precompilation

TensorOperations.jl has some support for precompiling commonly called functions.
The guiding philosophy is that tensor contractions are often (part of) the bottleneck of typical workflows,
and as such we want to maximize performance. As a result, we choose to specialize many functions, which
can lead to a rather large time-to-first-execution (TTFX). To mitigate this, some of that work can
be moved to precompile time, avoiding the need to recompile these specializations in every fresh Julia session.

Nevertheless, TensorOperations is designed to work with a large variety of input types, and simply enumerating
all of these tends to lead to prohibitively long precompilation times, as well as large system images.
Therefore, the desired level of precompilation can be customized, trading faster precompile times
against fast TTFX for a wider range of inputs.

!!! compat "TensorOperations v5.2.0"

Precompilation support requires at least TensorOperations v5.2.0.

## Defaults

By default, precompilation is enabled for "tensors" of type `Array{T,N}`, where `T` and `N` range over the following values:

* `T` is either `Float64` or `ComplexF64`
* `tensoradd!` is precompiled up to `N = 5`
* `tensortrace!` is precompiled up to `4` free output indices and `2` pairs of traced indices
* `tensorcontract!` is precompiled up to `3` free output indices on both inputs, and `2` contracted indices
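
As a rough illustration, a contraction like the following stays within those default ranges (a hedged sketch; the array sizes and index labels are arbitrary, and only the element type and the number of indices matter for precompilation coverage):

```julia
using TensorOperations

A = randn(Float64, 4, 4, 4)  # 2 free + 1 contracted index
B = randn(Float64, 4, 4, 4)  # 2 free + 1 contracted index

# 2 free indices on each input and 1 contracted index: within the default
# `tensorcontract!` workload for `Float64` arrays
@tensor C[i, j, k, l] := A[i, j, m] * B[m, k, l]
```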

## Custom settings

The default precompilation settings can be tweaked to allow for more or less expansive coverage. This is achieved
through a combination of `PrecompileTools`- and `Preferences`-based functionality.

To disable precompilation altogether, for example during development or when you prefer to have small binaries,
you can *locally* change the `"precompile_workload"` key in the preferences.

```julia
using TensorOperations, Preferences
set_preferences!(TensorOperations, "precompile_workload" => false; force=true)
```
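
This preference is written to a `LocalPreferences.toml` file next to your active project environment (hence its addition to `.gitignore` in this PR), and it only takes effect after restarting Julia so that the package can be precompiled again. As a quick sanity check, the value can be read back with the standard Preferences API (a minimal sketch):

```julia
using TensorOperations, Preferences

# returns `false` once the preference above has been set, or `nothing` if it was never set
load_preference(TensorOperations, "precompile_workload")
```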

Alternatively, you can keep precompilation enabled and change the settings above through the same machinery, via the following keys (see the sketch below the list):

* `"precompile_eltypes"`: a `Vector{String}` whose entries evaluate to the desired element types `T<:Number`
* `"precompile_add_ndims"`: an `Int` specifying the maximum `N` for `tensoradd!`
* `"precompile_trace_ndims"`: a `Vector{Int}` of length 2 specifying the maximal number of free and traced indices for `tensortrace!`
* `"precompile_contract_ndims"`: a `Vector{Int}` of length 2 specifying the maximal number of free and contracted indices for `tensorcontract!`

!!! note "Backends"

Currently, there is no support for precompiling methods that do not use the default backend. If this is a
feature you would find useful, feel free to contact us or open an issue.
10 changes: 10 additions & 0 deletions ext/TensorOperationsBumperExt.jl
@@ -3,6 +3,16 @@ module TensorOperationsBumperExt
using TensorOperations
using Bumper

# Hack to normalize the StridedView type to avoid too many specializations.
# This is allowed because Bumper ensures that the pointer won't be GC'd
# and we never return `parent(SV)` anyway.
@static if isdefined(Core, :Memory)
function TensorOperations.wrap_stridedview(A::Bumper.UnsafeArray)
mem_A = Base.unsafe_wrap(Memory{eltype(A)}, pointer(A), length(A))
return TensorOperations.StridedView(mem_A, size(A), strides(A), 0, identity)
end
end

function TensorOperations.tensoralloc(::Type{A}, structure, ::Val{istemp},
buf::Union{SlabBuffer,AllocBuffer}) where {A<:AbstractArray,
istemp}
2 changes: 2 additions & 0 deletions src/TensorOperations.jl
@@ -77,4 +77,6 @@ function __init__()
@require_extensions
end

include("precompile.jl")

end # module
24 changes: 14 additions & 10 deletions src/implementation/blascontract.jl
@@ -47,16 +47,21 @@ function _blas_contract!(C, A, pA, B, pB, pAB, α, β, backend, allocator)
flagC = isblasdestination(C, ipAB)
if flagC
C_ = C
_unsafe_blas_contract!(C_, A_, pA, B_, pB, ipAB, α, β)
_unsafe_blas_contract!(wrap_stridedview(C_),
wrap_stridedview(A_), pA,
wrap_stridedview(B_), pB,
ipAB, α, β)
else
C_ = SV(tensoralloc_add(TC, C, ipAB, false, Val(true), allocator))
_unsafe_blas_contract!(C_, A_, pA, B_, pB, trivialpermutation(ipAB),
one(TC), zero(TC))
C_ = tensoralloc_add(TC, C, ipAB, false, Val(true), allocator)
_unsafe_blas_contract!(wrap_stridedview(C_),
wrap_stridedview(A_), pA,
wrap_stridedview(B_), pB,
trivialpermutation(ipAB), one(TC), zero(TC))
tensoradd!(C, C_, pAB, false, α, β, backend, allocator)
tensorfree!(C_.parent, allocator)
tensorfree!(C_, allocator)
end
flagA || tensorfree!(A_.parent, allocator)
flagB || tensorfree!(B_.parent, allocator)
flagA || tensorfree!(A_, allocator)
flagB || tensorfree!(B_, allocator)
return C
end

@@ -81,12 +86,11 @@ function _unsafe_blas_contract!(C::StridedView{T},
return C
end

@inline function makeblascontractable(A, pA, TC, backend, allocator)
function makeblascontractable(A, pA, TC, backend, allocator)
flagA = isblascontractable(A, pA) && eltype(A) == TC
if !flagA
A_ = tensoralloc_add(TC, A, pA, false, Val(true), allocator)
Anew = SV(A_, size(A_), strides(A_), 0, A.op)
Anew = tensoradd!(Anew, A, pA, false, One(), Zero(), backend, allocator)
Anew = tensoradd!(A_, A, pA, false, One(), Zero(), backend, allocator)
pAnew = trivialpermutation(pA)
else
Anew = A
41 changes: 29 additions & 12 deletions src/implementation/diagonal.jl
@@ -11,13 +11,22 @@ function tensorcontract!(C::AbstractArray,
dimcheck_tensorcontract(C, A, pA, B, pB, pAB)

if conjA && conjB
_diagtensorcontract!(SV(C), conj(SV(A)), pA, conj(SV(B.diag)), pB, pAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(A)), pA,
conj(wrap_stridedview(B.diag)), pB,
pAB, α, β)
elseif conjA
_diagtensorcontract!(SV(C), conj(SV(A)), pA, SV(B.diag), pB, pAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(A)), pA,
wrap_stridedview(B.diag),
pB, pAB, α,
β)
elseif conjB
_diagtensorcontract!(SV(C), SV(A), pA, conj(SV(B.diag)), pB, pAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), wrap_stridedview(A), pA,
conj(wrap_stridedview(B.diag)),
pB, pAB, α,
β)
else
_diagtensorcontract!(SV(C), SV(A), pA, SV(B.diag), pB, pAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), wrap_stridedview(A), pA,
wrap_stridedview(B.diag), pB, pAB, α, β)
end
return C
end
@@ -41,13 +50,17 @@ function tensorcontract!(C::AbstractArray,
TupleTools.getindices(indCinoBA, tpAB[2]))

if conjA && conjB
_diagtensorcontract!(SV(C), conj(SV(B)), rpB, conj(SV(A.diag)), rpA, rpAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(B)), rpB,
conj(wrap_stridedview(A.diag)), rpA, rpAB, α, β)
elseif conjA
_diagtensorcontract!(SV(C), SV(B), rpB, conj(SV(A.diag)), rpA, rpAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), wrap_stridedview(B), rpB,
conj(wrap_stridedview(A.diag)), rpA, rpAB, α, β)
elseif conjB
_diagtensorcontract!(SV(C), conj(SV(B)), rpB, SV(A.diag), rpA, rpAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(B)), rpB,
wrap_stridedview(A.diag), rpA, rpAB, α, β)
else
_diagtensorcontract!(SV(C), SV(B), rpB, SV(A.diag), rpA, rpAB, α, β)
_diagtensorcontract!(wrap_stridedview(C), wrap_stridedview(B), rpB,
wrap_stridedview(A.diag), rpA, rpAB, α, β)
end
return C
end
@@ -62,13 +75,17 @@ function tensorcontract!(C::AbstractArray,
dimcheck_tensorcontract(C, A, pA, B, pB, pAB)

if conjA && conjB
_diagdiagcontract!(SV(C), conj(SV(A.diag)), pA, conj(SV(B.diag)), pB, pAB, α, β)
_diagdiagcontract!(wrap_stridedview(C), conj(wrap_stridedview(A.diag)), pA,
conj(wrap_stridedview(B.diag)), pB, pAB, α, β)
elseif conjA
_diagdiagcontract!(SV(C), conj(SV(A.diag)), pA, SV(B.diag), pB, pAB, α, β)
_diagdiagcontract!(wrap_stridedview(C), conj(wrap_stridedview(A.diag)), pA,
wrap_stridedview(B.diag), pB, pAB, α, β)
elseif conjB
_diagdiagcontract!(SV(C), SV(A.diag), pA, conj(SV(B.diag)), pB, pAB, α, β)
_diagdiagcontract!(wrap_stridedview(C), wrap_stridedview(A.diag), pA,
conj(wrap_stridedview(B.diag)), pB, pAB, α, β)
else
_diagdiagcontract!(SV(C), SV(A.diag), pA, SV(B.diag), pB, pAB, α, β)
_diagdiagcontract!(wrap_stridedview(C), wrap_stridedview(A.diag), pA,
wrap_stridedview(B.diag), pB, pAB, α, β)
end
return C
end
2 changes: 1 addition & 1 deletion src/implementation/functions.jl
@@ -79,7 +79,7 @@ See also [`tensorcopy`](@ref) and [`tensoradd!`](@ref)
"""
function tensorcopy!(C, A, pA::Index2Tuple, conjA::Bool=false, α::Number=One(),
backend=DefaultBackend(), allocator=DefaultAllocator())
return tensoradd!(C, A, pA, conjA, α, false, backend, allocator)
return tensoradd!(C, A, pA, conjA, α, Zero(), backend, allocator)
end

# ------------------------------------------------------------------------------------------
73 changes: 49 additions & 24 deletions src/implementation/strided.jl
@@ -38,51 +38,76 @@ end
#-------------------------------------------------------------------------------------------
# Force strided implementation on AbstractArray instances with Strided backend
#-------------------------------------------------------------------------------------------
const SV = StridedView
function tensoradd!(C::AbstractArray,
A::AbstractArray, pA::Index2Tuple, conjA::Bool,
α::Number, β::Number,
backend::StridedBackend, allocator=DefaultAllocator())

# Wrap any compatible array into a `StridedView` for the implementation.
# Additionally, we normalize the parent types to avoid having too many specializations.
# This is allowed because we never return `parent(SV)`, so we can safely wrap anything
# that represents the same data.
wrap_stridedview(A::AbstractArray) = StridedView(A)
@static if isdefined(Core, :Memory)
# For Arrays: we simply use the memory directly
# TODO: can we also do this for views?
wrap_stridedview(A::Array) = StridedView(A.ref.mem, size(A), strides(A), 0, identity)
end

Base.@constprop :none function tensoradd!(C::AbstractArray,
A::AbstractArray, pA::Index2Tuple, conjA::Bool,
α::Number, β::Number,
backend::StridedBackend,
allocator=DefaultAllocator())
# resolve conj flags and absorb into StridedView constructor to avoid type instabilities later on
if conjA
stridedtensoradd!(SV(C), conj(SV(A)), pA, α, β, backend, allocator)
stridedtensoradd!(wrap_stridedview(C), conj(wrap_stridedview(A)), pA, α, β, backend,
allocator)
else
stridedtensoradd!(SV(C), SV(A), pA, α, β, backend, allocator)
stridedtensoradd!(wrap_stridedview(C), wrap_stridedview(A), pA, α, β, backend,
allocator)
end
return C
end

function tensortrace!(C::AbstractArray,
A::AbstractArray, p::Index2Tuple, q::Index2Tuple, conjA::Bool,
α::Number, β::Number,
backend::StridedBackend, allocator=DefaultAllocator())
Base.@constprop :none function tensortrace!(C::AbstractArray,
A::AbstractArray, p::Index2Tuple,
q::Index2Tuple, conjA::Bool,
α::Number, β::Number,
backend::StridedBackend,
allocator=DefaultAllocator())
# resolve conj flags and absorb into StridedView constructor to avoid type instabilities later on
if conjA
stridedtensortrace!(SV(C), conj(SV(A)), p, q, α, β, backend, allocator)
stridedtensortrace!(wrap_stridedview(C), conj(wrap_stridedview(A)), p, q, α, β,
backend, allocator)
else
stridedtensortrace!(SV(C), SV(A), p, q, α, β, backend, allocator)
stridedtensortrace!(wrap_stridedview(C), wrap_stridedview(A), p, q, α, β, backend,
allocator)
end
return C
end

function tensorcontract!(C::AbstractArray,
A::AbstractArray, pA::Index2Tuple, conjA::Bool,
B::AbstractArray, pB::Index2Tuple, conjB::Bool,
pAB::Index2Tuple,
α::Number, β::Number,
backend::StridedBackend, allocator=DefaultAllocator())
Base.@constprop :none function tensorcontract!(C::AbstractArray,
A::AbstractArray, pA::Index2Tuple,
conjA::Bool,
B::AbstractArray, pB::Index2Tuple,
conjB::Bool,
pAB::Index2Tuple,
α::Number, β::Number,
backend::StridedBackend,
allocator=DefaultAllocator())
# resolve conj flags and absorb into StridedView constructor to avoid type instabilities later on
if conjA && conjB
stridedtensorcontract!(SV(C), conj(SV(A)), pA, conj(SV(B)), pB, pAB, α, β,
stridedtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(A)), pA,
conj(wrap_stridedview(B)), pB, pAB, α, β,
backend, allocator)
elseif conjA
stridedtensorcontract!(SV(C), conj(SV(A)), pA, SV(B), pB, pAB, α, β,
stridedtensorcontract!(wrap_stridedview(C), conj(wrap_stridedview(A)), pA,
wrap_stridedview(B), pB, pAB, α, β,
backend, allocator)
elseif conjB
stridedtensorcontract!(SV(C), SV(A), pA, conj(SV(B)), pB, pAB, α, β,
stridedtensorcontract!(wrap_stridedview(C), wrap_stridedview(A), pA,
conj(wrap_stridedview(B)), pB, pAB, α, β,
backend, allocator)
else
stridedtensorcontract!(SV(C), SV(A), pA, SV(B), pB, pAB, α, β,
stridedtensorcontract!(wrap_stridedview(C), wrap_stridedview(A), pA,
wrap_stridedview(B), pB, pAB, α, β,
backend, allocator)
end
return C
@@ -130,7 +155,7 @@ function stridedtensortrace!(C::StridedView,
newstrides = (strideA.(linearize(p))..., (strideA.(q[1]) .+ strideA.(q[2]))...)
newsize = (size(C)..., tracesize...)

A′ = SV(A.parent, newsize, newstrides, A.offset, A.op)
A′ = StridedView(A.parent, newsize, newstrides, A.offset, A.op)
Strided._mapreducedim!(Scaler(α), Adder(), Scaler(β), newsize, (C, A′))
return C
end