The TensorOperations.jl package is centered around the following features:
A macro @tensor for conveniently specifying tensor contractions and index permutations via Einstein's index notation convention (see the short example after this list). The index notation is analyzed at compile time and lowered into primitive tensor operations, namely (permuted) linear combinations and inner and outer contractions. The macro supports several keyword arguments to customize the lowering process, namely to insert additional checks that help with debugging, to specify the contraction order or to automatically determine the optimal contraction order for given costs (see the next bullet), and finally, to select different backends to evaluate those primitive operations.
The ability to optimize pairwise contraction order in complicated tensor contraction networks according to the algorithm in this paper, where custom (compile-time) costs can be specified, either as a keyword to @tensor or using the @tensoropt macro (for explicitness and backward compatibility). This optimization is performed at compile time, and the resulting contraction order is hard-coded into the resulting expression. The similar macro @tensoropt_verbose provides more information on the optimization process.
A function ncon (for network contractor) for contracting a group of tensors (a.k.a. a tensor network), as well as a corresponding @ncon macro that simplifies and optimizes this slightly. Unlike the previous macros, ncon and @ncon do not analyze the contractions at compile time, thus allowing them to deal with dynamic networks or index specifications.
(Experimental) support for automatic differentiation by supplying chain rules for the different tensor operations using the ChainRules.jl interface.
The ability to support different tensor types by overloading a minimal interface of tensor operations, or to support different implementation backends for the same tensor type.
An efficient default implementation for Julia Base arrays that qualify as strided, i.e. such that their entries are laid out according to a regular pattern in memory. The only exceptions are ReinterpretedArray objects. Additionally, Diagonal objects whose underlying diagonal data is stored as a strided vector are supported. This facilitates tensor contractions where one of the operands is e.g. a diagonal matrix of singular values or eigenvalues, which are returned as a Vector by Julia's eigen or svd methods. This implementation for AbstractArray objects is based on Strided.jl for efficient (cache-friendly and multithreaded) tensor permutations (transpositions) and gemm from BLAS for contractions. There is also a fallback contraction strategy that is natively built using Strided.jl, e.g. for scalar types which are not supported by BLAS. Additional backends (e.g. pure Julia Base using loops and/or broadcasting) may be added in the future.
Support for CuArray objects if used together with CUDA.jl and cuTENSOR.jl, by relying on (and thus providing a high level interface into) NVidia's cuTENSOR library.
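As a minimal illustration of the @tensor and ncon interfaces listed above (a hedged sketch; the array sizes and index labels are arbitrary):
using TensorOperations
A = randn(4, 4, 4)
B = randn(4, 4)
# contract the last index of A with the first index of B, permuting the open indices of the result
@tensor C[i, k, j] := A[i, j, l] * B[l, k]
# the same network using the dynamical ncon interface (NCON-style integer labels)
E = ncon((A, B), ([-1, -3, 1], [1, -2]))
# E ≈ C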
TensorOperations.jl supports 3 basic tensor operations, i.e. the primitives into which every more complicated tensor expression is decomposed; they are illustrated right after this list.
addition: Add a (possibly scaled version of) one tensor to another tensor, where the indices of both arrays might appear in different orders. This operation combines normal tensor addition (or linear combination more generally) and index permutation. It includes as a special case copying one tensor into another with permuted indices.
trace or inner contraction: Perform a trace/contraction over pairs of indices of a single tensor array, where the result is a lower-dimensional array.
(outer) contraction: Perform a general contraction of two tensors, where some indices of one array are paired with corresponding indices in a second array.
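In terms of the @tensor index notation discussed further in this manual, the three primitives can be sketched as follows (hypothetical arrays; the sizes are chosen only so that the labels match):
using TensorOperations
A = randn(3, 4, 5); B = randn(5, 4, 3); D = randn(4, 3, 3); M = randn(3, 5); N = randn(5, 2)
@tensor B[k, j, i] += 2 * A[i, j, k]   # addition: scaled, permuted accumulation
@tensor v[a] := D[a, b, b]             # trace / inner contraction over the repeated label b
@tensor P[i, k] := M[i, j] * N[j, k]   # (outer) contraction; here simply matrix multiplication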
Planned features for future development include:
Add more backends, e.g. using pure Julia Base functionality, or using LoopVectorization.jl.
Make it easier to modify the contraction order algorithm or its cost function (e.g. to optimize based on memory footprint) or to splice in runtime information.
TensorOperations offers experimental support for reverse-mode automatic differentiation (AD) through the use of ChainRules.jl. As the basic operations are multi-linear, their vector-Jacobian products can all be expressed in terms of the operations defined in VectorInterface and TensorOperations. Thus, any custom type whose tangent type also supports these interfaces will automatically inherit reverse-mode AD support.
As the @tensor macro rewrites everything in terms of the basic tensor operations, the reverse-mode rules for these methods are supplied. However, because most AD-engines do not support in-place mutation, effectively these operations will be replaced with a non-mutating version. This is similar to the behaviour found in BangBang.jl, as the operations will be in-place, except for the pieces of code that are being differentiated. In effect, this amounts to replacing all assignments (=) with definitions (:=) within the context of @tensor.
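As a minimal sketch of how this can be used (assuming Zygote.jl as the reverse-mode engine; any ChainRules-compatible AD engine should work similarly):
using TensorOperations, Zygote
A = randn(3, 4)
B = randn(4, 3)
# a scalar-valued function whose body is rewritten into the basic tensor operations
function f(A, B)
    @tensor C[i, k] := A[i, j] * B[j, k]
    return sum(C)
end
gA, gB = Zygote.gradient(f, A, B)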
Experimental
While some rudimentary tests are run, the AD support is currently not incredibly well-tested. Because of the way it is implemented, the use of AD will tacitly replace mutating operations with non-mutating variants. This might lead to unwanted bugs that are hard to track down. Additionally, for mixed scalar types there might also be unexpected or unwanted behaviour.
The elementary tensor operations can also be accessed via functions, mainly for compatibility with older versions of this toolbox. The function-based syntax is also required when the contraction pattern is not known at compile time but is rather determined dynamically.
The basic exposed interface, as listed below, makes use of any iterable IA, IB or IC to denote labels of indices, in a similar fashion as when used in the context of @tensor. When making use of this functionality, in-place operations are no longer supported, as these are reserved for the expert mode. Note that the return type is only inferred when the labels are entered as tuples, and also IC is specified.
The expert mode exposes both mutating and non-mutating versions of these functions. In this case, the selected indices are determined through permutations, specified by pA, pB and pC. In order to distinguish the non-mutating expert-mode methods from their simple-mode counterparts, the overlapping functionality is disambiguated by specializing on these permutations, which are required to take a particular form of the type Index2Tuple.
The motivation for this particular convention for specifying permutations comes from the fact that for many operations, it is useful to think of a tensor as a linear map or matrix, in which its different indices are partitioned into two groups, the first of which corresponds to the range of the linear map (the row index of the associated matrix), whereas the second group corresponds to the domain of the linear map (the column index of the associated matrix). This is most obvious for tensor contraction, which then becomes equivalent to matrix multiplication (which is also how it is implemented by the StridedBLAS backend). While less relevant for tensor permutations, we use this convention throughout for uniformity and generality (e.g. for compatibility with libraries that always represent tensors as linear maps, such as TensorKit.jl).
Note, finally, that only the expert mode call style exposes the ability to select custom backends.
tensorcopy([IC=IA], A, IA, [conjA=:N, [α=1]])
tensorcopy(pC::Index2Tuple, A, conjA, α) # expert mode
Create a copy of A, where the dimensions of A are assigned indices from the iterable IA and the indices of the copy are contained in IC. Both iterables should contain the same elements, optionally in a different order.
The result of this method is equivalent to α * permutedims(A, pC) where pC is the permutation such that IC = IA[pC]. The implementation of tensorcopy is however more efficient on average, especially if Threads.nthreads() > 1.
Optionally, the symbol conjA can be used to specify whether the input tensor should be conjugated (:C) or not (:N).
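A small illustration of the simple-mode call listed above (a hedged sketch; array size and labels are arbitrary):
using TensorOperations
A = randn(3, 4, 5)
# with IA = (:a, :b, :c) and IC = (:c, :a, :b), this is equivalent to permutedims(A, (3, 1, 2))
B = tensorcopy((:c, :a, :b), A, (:a, :b, :c))
# size(B) == (5, 3, 4)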
Return the result of adding arrays A and B where the iterables IA and IB denote how the array data should be permuted in order to be added. More specifically, the result of this method is equivalent to α * permutedims(A, pA) + β * permutedims(B, pB) where pA (pB) is the permutation such that IC = IA[pA] (IB[pB]). The implementation of tensoradd is however more efficient on average, as the temporary permuted arrays are not created.
Optionally, the symbols conjA and conjB can be used to specify whether the input tensors should be conjugated (:C) or not (:N).
tensortrace([IC], A, IA, [conjA], [α=1])
tensortrace(pC::Index2Tuple, A, pA::Index2Tuple, conjA, α=1, [backend]) # expert mode
Trace or contract pairs of indices of tensor A, by assigning them identical indices in the iterable IA. The untraced indices, which are assigned a unique index, can be reordered according to the optional argument IC. The default value corresponds to the order in which they appear. Note that only pairs of indices can be contracted, so that every index in IA can appear only once (for an untraced index) or twice (for an index in a contracted pair).
Optionally, the symbol conjA can be used to specify that the input tensor should be conjugated.
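For instance (a hedged sketch of the simple-mode call listed above):
using TensorOperations
A = randn(3, 5, 3)
# trace over the first and third index (both labelled :a); the open index :b remains
v = tensortrace((:b,), A, (:a, :b, :a))
# v[j] == sum(A[i, j, i] for i in 1:3)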
tensorcontract([IC], A, IA, [conjA], B, IB, [conjB], [α=1])
tensorcontract(pC::Index2Tuple, A, pA::Index2Tuple, conjA, B, pB::Index2Tuple, conjB, α=1, [backend]) # expert mode
Contract indices of tensor A with corresponding indices in tensor B by assigning them identical labels in the iterables IA and IB. The indices of the resulting tensor correspond to the indices that only appear in either IA or IB and can be ordered by specifying the optional argument IC. The default is to have all open indices of A followed by all open indices of B. Note that inner contractions of an array should be handled first with tensortrace, so that every label can appear only once in IA or IB separately, and once (for an open index) or twice (for a contracted index) in the union of IA and IB.
Optionally, the symbols conjA and conjB can be used to specify that the input tensors should be conjugated.
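For matrices, the simple-mode call listed above reduces to ordinary matrix multiplication (a hedged sketch):
using TensorOperations
A = randn(3, 4)
B = randn(4, 5)
# contract the shared label :j
C = tensorcontract((:i, :k), A, (:i, :j), B, (:j, :k))
# C ≈ A * B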
tensorproduct([IC], A, IA, [conjA], B, IB, [conjB], [α=1])
tensorproduct(pC::Index2Tuple, A, pA::Index2Tuple, conjA, B, pB::Index2Tuple, conjB, α=1, [backend]) # expert mode
Compute the tensor product (outer product) of two tensors A and B, i.e. returns a new tensor C with ndims(C) = ndims(A) + ndims(B). The indices of the output tensor are related to those of the input tensors by the pattern specified by the indices. Essentially, this is a special case of tensorcontract with no indices being contracted over. This method checks whether the indices indeed specify a tensor product instead of a genuine contraction.
Optionally, the symbols conjA and conjB can be used to specify that the input tensors should be conjugated.
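For example (a hedged sketch of the simple-mode call listed above; two vectors give their outer product):
using TensorOperations
a = randn(3)
b = randn(4)
# no labels are shared between the inputs, so nothing is contracted
M = tensorproduct((:i, :j), a, (:i,), b, (:j,))
# M[i, j] == a[i] * b[j]; size(M) == (3, 4)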
Compute C = β * C + α * permutedims(opA(A), pC) without creating the intermediate temporary. The operation opA acts as identity if conjA equals :N and as conj if conjA equals :C. Optionally specify a backend implementation to use.
Warning
The permutation needs to be trivial or C must not be aliased with A.
tensortrace!(C, pC, A, pA, conjA, α=1, β=0 [, backend])
Compute C = β * C + α * permutedims(partialtrace(opA(A)), pC) without creating the intermediate temporary, where A is partially traced, such that indices in pA[1] are contracted with indices in pA[2], and the remaining indices are permuted according to pC. The operation opA acts as identity if conjA equals :N and as conj if conjA equals :C. Optionally specify a backend implementation to use.
Compute C = β * C + α * permutedims(contract(opA(A), opB(B)), pC) without creating the intermediate temporary, where A and B are contracted such that the indices pA[2] of A are contracted with indices pB[1] of B. The remaining indices (pA[1]..., pB[2]...) are then permuted according to pC. The operation opA acts as identity if conjA equals :N and as conj if conjA equals :C; the operation opB is determined by conjB analogously. Optionally specify a backend implementation to use.
tensorproduct!(C, pC::Index2Tuple, A, pA::Index2Tuple, conjA, B, pB::Index2Tuple, conjB, α=1, β=0)
Compute the tensor product (outer product) of two tensors A and B, i.e. a wrapper of tensorcontract! with no indices being contracted over. This method checks whether the indices indeed specify a tensor product instead of a genuine contraction.
The @tensor macro and its relatives work as parsers for indexed tensor expressions. They transform these into a sequence of calls to the primitive tensor operations. This allows the support of custom types that implement the Interface. The actual implementation is achieved through the use of TensorParser, which provides the general framework to parse tensor expressions. The @tensor macro is then just a wrapper around this, which configures the default behavior and handles keyword arguments of the parser.
The TensorParser works by breaking down the parsing into a sequence of phases. First, a basic check of the supplied expression is performed, to ensure that it is a valid tensor expression. Next, a number of preprocessing steps can be performed, which are used to standardize expressions, allow for syntactic sugar features, and can also be used as a hook for writing custom parsers. After that, the different contractions within the tensor expression are analyzed and processed, which rewrites the expression into a set of binary rooted trees. Then the main step can be executed, namely transforming the whole expression into actual calls to the primitive tensor operations tensoradd!, tensortrace! and tensorcontract!, as well as calls to tensoralloc_add and tensoralloc_contract to allocate the temporary and final tensors; for these allocations, the resulting scalar type also needs to be determined. Finally, a number of postprocessing steps can be added, which are mostly used to clean up the resulting expression by flattening and by removing line number nodes, but also to incorporate the custom backend and allocation system.
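The end result of this pipeline can be inspected directly with Julia's @macroexpand (a hedged sketch; the generated temporary names are implementation details and will differ):
using TensorOperations
@macroexpand @tensor C[i, k] := A[i, j] * B[j, k]
# returns the generated expression, containing calls to tensoralloc_contract and tensorcontract!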
Check that ex is a valid tensor expression and throw an ArgumentError if not. Valid tensor expressions satisfy one of the following (recursive) rules:
The expression is a scalar expression or a tensor expression.
The expression is an assignment or a definition, and the left hand side and right hand side are valid tensor expressions or scalars.
The expression is a block, and all subexpressions are valid tensor expressions or scalars.
Extract all tensor objects which are not simple symbols and replace them with newly generated symbols, adding the necessary assignments before and after the expression, in order to avoid multiple evaluations of the expression constituting the tensor object.
Process the contractions in ex using the given treebuilder and treesorter functions. This is done by first extracting a network representation from the expression, then building and sorting the contraction trees with the given treebuilder and treesorter functions, and finally inserting the contraction trees back into the expression. When the costcheck argument equals :warn or :cache (as opposed to :nothing), the optimal contraction order is computed at runtime using the actual values of tensorcost, and this optimal order is compared to the contraction order that was determined at compile time. If the compile-time order deviates from the optimal order, a warning will be printed (in case of costcheck == :warn) or this particular contraction will be recorded in TensorOperations.costcache (in case of costcheck == :cache). Both the warning and the recorded cache entry contain an order suggestion that can be passed to the @tensor macro in order to encode the optimal contraction order at compile time.
Main parsing step to transform a tensor expression ex into a series of function calls associated with the primitive building blocks (tensor operations and allocations).
Insert a backend into a tensor operation, e.g. for any op ∈ operations, transform TensorOperations.op(args...) -> TensorOperations.op(args..., Backend{:backend}())
The macro @tensoropt or the combination of @tensor with the keyword opt can be used to optimize the contraction order of the expression at compile time. This is done by analyzing the contraction graph, where the nodes are the tensors and the edges are the contractions, in combination with the data provided in optdata, which is a dictionary associating a cost (either a number or a polynomial in some abstract scaling parameter) to every index. This information is then used to determine the (asymptotically) optimal contraction tree (in terms of number of floating point operations). The algorithm that is used is described in arXiv:1304.6112.
@tensor [kw_expr...] tensor_expr
Specify one or more tensor operations using Einstein's index notation. Indices can be chosen to be arbitrary Julia variable names, or integers. When contracting several tensors together, this will be evaluated as pairwise contractions in left to right order, unless the so-called NCON style is used (positive integers for contracted indices and negative indices for open indices).
Additional keyword arguments may be passed to control the behavior of the parser (a short usage sketch follows the list below):
order: A list of contraction indices of the form order=(...,) which specify the order in which they will be contracted.
opt: Contraction order optimization, similar to @tensoropt. Can be either a boolean or an OptExpr.
contractcheck: Boolean flag to enable runtime check for contractibility of indices with clearer error messages.
costcheck: Can be either :warn or :cache and adds runtime checks to compare the compile-time contraction order to the optimal order computed for the actual run time tensor costs. If costcheck == :warn, warnings are printed for every sub-optimal contraction that is encountered. If costcheck == :cache, only the most costly run of a particular sub-optimal contraction will be cached in TensorOperations.costcache. In both cases, a suggestion for the order keyword argument is computed to switch to the optimal contraction order.
backend: Inserts an implementation backend as a final argument in the different tensor operation calls in the generated code.
allocator: Inserts an allocation strategy as a final argument in the tensor allocation calls in the generated code.
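For example, the order keyword can be combined with a tensor expression as follows (a hedged sketch; the arrays, sizes and labels are arbitrary):
using TensorOperations
X = randn(2, 3, 4)
Y = randn(3, 5)
Z = randn(4, 5, 6)
# contract the label k first, then l and m
@tensor order = (k, l, m) D[i, j] := X[i, k, l] * Y[k, m] * Z[l, m, j]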
The preferred way to specify (a sequence of) tensor operations is by using the @tensor macro, which accepts an index notation format, a.k.a. Einstein notation (and in particular, Einstein's summation convention).
This can most easily be explained using a simple example:
using TensorOperations
α = randn()
A = randn(5, 5, 5, 5, 5, 5)
B = randn(5, 5, 5)
TensorOperations.tensorfree!(var"##E_A#242")
E
E = TensorOperations.tensoradd!(E, ((2, 3, 1), ()), C, :N, α, VectorInterface.One())
end
The different functions in which this tensor expression is decomposed are discussed in more detail in the Implementation section of this manual.
In this example, the tensor indices were labeled with arbitrary letters; also longer names could have been used. In fact, any proper Julia variable name constitutes a valid label. Note though that these labels are never interpreted as existing Julia variables. Within the @tensor macro they are converted into symbols and then used as dummy names, whose only role is to distinguish the different indices. Their specific value bears no meaning. They also do not appear in the generated code as illustrated above. This implies, in particular, that the specific tensor operations defined by the code inside the @tensor environment are completely specified at compile time. Various remarks regarding the index notation are in order.
TensorOperations.jl only supports the strict Einstein summation convention. This implies that there are two types of indices. Either an index label appears once in every term of the right hand side, and it also appears on the left hand side. We refer to the corresponding indices as open or free. Alternatively, an index label appears exactly twice within a given term on the right hand side. The corresponding indices are referred to as closed or contracted, i.e. the pair of indices takes equal values and are summed over their (equal) range. This is known as a contraction, either an outer contraction (between two indices of two different tensors) or an inner contraction (a.k.a. trace, between two indices of a single tensor). More liberal use of the index notation, such as simultaneous summation over three or more indices, or an open index appearing simultaneously in different tensor factors, is not supported by TensorOperations.jl.
Aside from valid Julia identifiers, index labels can also be specified using literal integer constants or using a combination of integers and symbols. Furthermore, it is also allowed to use primes (i.e. Julia's adjoint operator) to denote different indices, including using multiple subsequent primes. The following expression thus computes the same result as the example above:
If only integers are used for specifying index labels, this can be used to control the pairwise contraction order, by using the well-known NCON convention, where open indices in the left hand side are labelled by negative integers -1, -2, -3, whereas contraction indices are labelled with positive integers 1, 2, … Since the index order of the left hand side is in that case clear from the right hand side expression, the left hand side can be indexed with [:], which is automatically replaced with all negative integers appearing in the right hand side, in decreasing order. The value of the labels for the contraction indices determines the pairwise contraction order. If multiple tensors need to be contracted, a first temporary will be created consisting of the contraction of the pair of tensors that share contraction index 1, then the pair of tensors that share contraction index 2 (if not contracted away in the first pair) will be contracted, and so forth. The next subsection explains contraction order in more detail and gives some useful examples, as the example above only includes a single pair of tensors to be contracted.
Index labels always appear in square brackets [ ... ] but can be separated by either commas, as in D[a, b, c], (yielding a :ref expression) or by spaces, as in D[a b c], (yielding a :typed_hcat expression).
There is also the option to separate the indices into two groups using a semicolon. This can be useful for tensor types which have two distinct sets of indices, but has no effect when using Julia AbstractArray objects. While in principle both spaces and commas can be used within the two groups, e.g. as in D[a, b; c] or D[a b; c], there are some restrictions because of accepted Julia syntax. Both groups of indices should use the same convention. If there is only a single index in the first group, the second group should use spaces to constitute a valid expression. Finally, having no indices in the first group is only possible by writing an empty tuple. The second group can then use spaces, or also contain the indices as a tuple, i.e. both D[(); a b c] or D[(); (a, b, c)]. Writing the two groups of indices within a tuple (which uses a comma as natural separator), with both tuples separated by a semicolon, is always valid syntax, irrespective of the number of indices in each group.
Index expressions [...] are only interpreted as index notation on the highest level. For example, if you want to multiply two matrices which are stored in a list, you can write
Note, finally, that the @tensor specifier can be put in front of a single tensor expression, or in front of a begin ... end block to group and evaluate different expressions at once. Within an @tensor begin ... end block, the @notensor macro can be used to annotate indexing expressions that need to be interpreted literally.
As an illustration, note that the previous code examples about the matrix multiplication with matrices stored in a 3-way array can now also be written as
@tensor begin
@notensor A = list[:,:,1]
@notensor B = list[:,:,2]
result[i,j] = A[i,k] * B[k,j]
# cost as specified for listed indices, unlisted indices have cost 1 (any symbol for χ can be used)
@tensoropt (a => χ, b => χ^2, c => 2 * χ, e => 5) begin
C[a, b, c, d] := A[a, e, c, f, h] * B[f, g, e, b] * C[g, d, h]
end
Note that @tensoropt will optimize any tensor contraction sequence it encounters in the (block of) expressions. It will however not break apart expressions that have been explicitly grouped with parentheses, i.e. in
@tensoropt C[a, b, c, d] := A[a, e, c, f, h] * (B[f, g, e, b] * C[g, d, h])
it will always contract B and C first. For a single tensor contraction sequence, the optimal contraction order and associated (asymptotic) cost can be obtained using @optimalcontractiontree.
As a final remark, the optimization can also be accessed directly from @tensor by specifying the additional keyword argument opt=true, which will then use the default cost model, or opt=optex to further specify the costs.
# cost χ for all indices (a, b, c, d, e, f)
@tensor opt=true D[a, b, c, d] := A[a, e, c, f] * B[g, d, e] * C[g, f, b]
# cost χ for indices (a, b, c, e), other indices (d, f) have cost 1
end
Tensor network practitioners are probably more familiar with the network contractor function ncon to perform a tensor network contraction, as e.g. described in NCON. In particular, a graphical application TensorTrace was recently introduced to facilitate the generation of such ncon calls. TensorOperations.jl provides compatibility with this interface by also exposing an ncon function with the same basic syntax
where the lists of tensor objects and of index lists can be given as a vector or a tuple. The ncon function necessarily needs to analyze the contraction pattern at runtime, but this can be an advantage in cases where the contraction is determined by runtime information and thus not known at compile time. A downside of this, besides the fact that it can result in some overhead (though this is typically negligible for anything but very small tensor contractions), is that ncon is type-unstable, i.e. its return type cannot be inferred by the Julia compiler.
The full call syntax of the ncon method exposed by TensorOperations.jl is
ncon(tensorlist, indexlist, [conjlist, sym]; order = ..., output = ...)
where the first two arguments are those described above. Let us first discuss the keyword arguments. The keyword argument order can be used to change the contraction order, i.e. by specifying which contraction indices need to be processed first, rather than the strictly increasing order [1, 2, ...], as discussed in the previous subsection. The keyword argument output can be used to specify the order of the output indices, when it is different from the default [-1, -2, ...].
The optional positional argument conjlist is a list of Bool variables that indicate whether the corresponding tensor needs to be conjugated in the contraction. Passing conj(B) in the tensor list and passing B together with a true entry in conjlist thus compute the same result, as sketched below.
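An illustrative sketch of the two equivalent calls (hypothetical complex matrices A and B):
using TensorOperations
A = randn(ComplexF64, 3, 4)
B = randn(ComplexF64, 5, 4)
# explicitly conjugate B before passing it ...
C1 = ncon([A, conj(B)], [[-1, 1], [-2, 1]])
# ... or pass B as is and set its conjugation flag
C2 = ncon([A, B], [[-1, 1], [-2, 1]], [false, true])
# C1 ≈ C2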
The latter has the advantage that conjugating B is not an extra step (which would create an additional temporary requiring allocations); instead, the conjugation is performed at the same time as B is contracted.
As an alternative solution to the optional positional arguments, there is also an @ncon macro. It is just a simple wrapper over an ncon call and thus does not analyze the indices at compile time, so that they can be fully dynamical. However, it will transform the call so that explicit conj calls on the tensors in the list are stripped and replaced by the corresponding conjugation flags, as explained in more detail below.
ncon(tensorlist, indexlist, [conjlist, sym]; order = ..., output = ...)
Contract the tensors in tensorlist (of type Vector or Tuple) according to the network as specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) with the same length as tensorlist whose entries are themselves lists (preferably Vector{Int}) where every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers are used to label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers are used to label indices of the output tensor, and should appear only once.
An optional argument is another list of the same length, conjlist, whose entries are of type Bool and indicate whether the corresponding tensor object should be conjugated (true) or not (false). The default is false for all entries.
By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output allow these defaults to be changed.
@ncon(tensorlist, indexlist; order = ..., output = ...)
Contract the tensors in tensorlist (of type Vector or Tuple) according to the network as specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) with the same length as tensorlist whose entries are themselves lists (preferably Vector{Int}) where every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers are used to label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers are used to label indices of the output tensor, and should appear only once.
By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output allow these defaults to be changed.
The advantage of the macro @ncon over the function call ncon is that, if tensorlist is not just some variable but an actual list (as a tuple with parentheses or a vector with square brackets) at the call site, the @ncon macro will scan for conjugation calls, e.g. conj(A), and replace them with just A, while building a matching list of conjugation flags to be passed to ncon. This makes it more convenient to specify tensor conjugation, without paying the cost of actively performing the conjugation beforehand.
Indices with the same label, either open indices on the two sides of the equation, or contracted indices, need to be compatible. For AbstractArray objects, this means they must have the same size. Other tensor types might have more complicated structures associated with their indices, which then need to match between the tensors involved. The function checkcontractible is part of the interface that can be used to control when tensors can be contracted with each other along specific indices.
If indices do not match, the contraction will produce an error. However, this can be an error deep within the implementation, at which point the error message will provide little information as to which specific tensors and which indices are producing the mismatch. When debugging, it might be useful to add the keyword argument contractcheck = true to the @tensor macro. Explicit checks using checkcontractible are then enabled and run before any tensor operation is performed. When a mismatch is detected, these checks still have access to the label information and produce a more informative error message.
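For example (a hedged sketch; the deliberately mismatched sizes are only meant to trigger the check):
using TensorOperations
A = randn(3, 4)
B = randn(5, 6)   # size(A, 2) != size(B, 1), so the contraction over k cannot work
@tensor contractcheck = true C[i, j] := A[i, k] * B[k, j]
# should throw a more informative error, with access to the index labels, rather than failing deep inside the implementation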
A different type of check is the costcheck keyword argument, which can be given the values :warn or :cache. With either of both values for this keyword argument, additional checks are inserted that compare the contraction order of any tensor contraction of three or more factors against the optimal order based on the current tensor size. More generally, the function tensorcost is part of the interface and associated a cost value with every index of a tensor, which is then used in the cost model. With costcheck=:warn, a warning will be spawned for every tensor network where the actual contraction order (even when optimized using abstract costs) does not match with the ideal contraction order given the current tensorcost values. With costcheck = :cache, the tensor networks with non-optimal contraction order are stored in a global package variable TensorOperations.costcache. However, when a tensor network is evaluated several times with different tensor sizes or tensor costs, only the evaluation giving rise to the largest total contraction cost for that network will appear in the cache (provided the actual contraction order deviates from the optimal order in that largest case).
Every index expression will be evaluated as a sequence of elementary tensor operations, i.e. permuted additions, partial traces and contractions, which are implemented for strided arrays as discussed in Package features. In particular, these implementations rely on Strided.jl, and we refer to this package for a full specification of which arrays are supported. As a rule of thumb, this primarily includes Arrays from Julia base, as well as views thereof if sliced with a combination of Integers and Ranges. Special types such as Adjoint and Transpose from Base are also supported. For permuted addition and partial traces, native Julia implementations are used which could benefit from multithreading if JULIA_NUM_THREADS>1.
The binary contraction is performed by first permuting the two input tensors into a form such that the contraction becomes equivalent to one matrix multiplication on the whole data, followed by a final permutation to bring the indices of the output tensor into the desired order. This approach allows to use the highly efficient matrix multiplication kernel (gemm) from BLAS, which is multithreaded by default. There is also a native contraction implementation that is used for e.g. arrays with an eltype that is not <:LinearAlgebra.BlasFloat. It performs the contraction directly without the additional permutations, but still in a cache-friendly and multithreaded way (again relying on JULIA_NUM_THREADS > 1). This implementation can also be used for BlasFloat types (but will typically be slower), and the use of BLAS can be controlled by explicitly switching the backend between StridedBLAS and StridedNative using the backend keyword to @tensor. Similarly, different allocation strategies, when available, can be selected using the allocator keyword of @tensor.
The primitive tensor operations are also implemented for CuArray objects of the CUDA.jl library. This implementation is essentially a simple wrapper over the cuTENSOR library of NVidia, and will only be loaded when the cuTENSOR.jl package is loaded. The @tensor macro will then automatically work for operations between GPU arrays.
Mixed operations between host arrays (e.g. Array) and device arrays (e.g. CuArray) will fail. However, if one wants to harness the computing power of the GPU to perform all tensor operations, there is a dedicated macro @cutensor. This will transfer all host arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be copied back. If a new result array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. Arrays are transfered to the GPU just before they are first used, and in a complicated tensor expression, this might have the benefit that transer of the later arrays overlaps with computation of earlier operations.
Use the GPU to perform all tensor operations, through the use of the cuTENSOR library. This will transfer all arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be transferred back. If a new array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. This macro is equivalent to @tensor backend=cuTENSOR allocator=cuTENSOR tensor_expr.
Note
This macro requires the cuTENSOR library to be installed and loaded. This can be achieved by running using cuTENSOR or import cuTENSOR before using the macro.
the latter has the advantage that conjugating B is not an extra step (which would create an additional temporary and require extra allocations), but is performed at the same time as it is contracted.
As an alternative solution to the optional positional arguments, there is also an @ncon macro. It is just a simple wrapper around an ncon call and thus does not analyze the indices at compile time, so that they can be fully dynamic. However, it will transform conjugation calls such as conj(A) appearing in the tensor list into plain tensors combined with the appropriate conjugation flags, as detailed below.
ncon(tensorlist, indexlist, [conjlist, sym]; order = ..., output = ...)
Contract the tensors in tensorlist (of type Vector or Tuple) according to the network specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) of the same length as tensorlist, whose entries are themselves lists (preferably Vector{Int}) in which every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers label indices of the output tensor, and should appear only once.
Optionally, another list of the same length, conjlist, can be passed, whose entries are of type Bool and indicate whether the corresponding tensor object should be conjugated (true) or not (false). The default is false for all entries.
By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output can be used to change these defaults.
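As an illustration, a minimal sketch of an ncon call is shown below; the tensor names and array sizes are chosen purely for demonstration.

    using TensorOperations

    A = randn(2, 3, 4)
    B = randn(4, 3, 5)

    # label 1 contracts the third dimension of A with the first dimension of B,
    # label 2 contracts the second dimensions of A and B; the negative labels
    # -1 and -2 become the first and second index of the output
    C = ncon([A, B], [[-1, 2, 1], [1, 2, -2]])
    size(C) == (2, 5)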
@ncon(tensorlist, indexlist; order = ..., output = ...)
Contract the tensors in tensorlist (of type Vector or Tuple) according to the network specified by indexlist. Here, indexlist is a list (i.e. a Vector or Tuple) of the same length as tensorlist, whose entries are themselves lists (preferably Vector{Int}) in which every integer entry provides a label for the corresponding index/dimension of the corresponding tensor in tensorlist. Positive integers label indices that need to be contracted, and thus appear in two different entries within indexlist, whereas negative integers label indices of the output tensor, and should appear only once.
By default, contractions are performed in the order such that the indices being contracted over are labelled by increasing integers, i.e. first the contraction corresponding to label 1 is performed. The output tensor has an index order corresponding to decreasing (negative, so increasing in absolute value) index labels. The keyword arguments order and output can be used to change these defaults.
The advantage of the macro @ncon over the function call ncon is that, if tensorlist is not just some variable but an actual list (a tuple with parentheses or a vector with square brackets) at the call site, the @ncon macro will scan for conjugation calls, e.g. conj(A), and replace them with just A while building a matching list of conjugation flags to be passed to ncon. This makes it more convenient to specify tensor conjugation, without paying the cost of actively performing the conjugation beforehand.
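For example, both of the following (hypothetical, small) contractions should yield the same result, with the second avoiding the explicit construction of conj(B):

    A = randn(ComplexF64, 2, 3)
    B = randn(ComplexF64, 3, 2)

    # conjugation specified explicitly via conjlist
    C1 = ncon([A, B], [[-1, 1], [1, -2]], [false, true])
    # conj(B) is detected by the macro and turned into a conjugation flag
    C2 = @ncon([A, conj(B)], [[-1, 1], [1, -2]])
    C1 ≈ C2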
Indices with the same label, either open indices on the two sides of the equation, or contracted indices, need to be compatible. For AbstractArray objects, this means they must have the same size. Other tensor types might have more complicated structure associated with their indices, and require that these structures match. The function checkcontractible is part of the interface and can be used to control when tensors can be contracted with each other along specific indices.
If indices do not match, the contraction will throw an error. However, this can be an error deep within the implementation, at which point the error message provides little information as to which specific tensors and which indices are producing the mismatch. When debugging, it can be useful to add the keyword argument contractcheck = true to the @tensor macro. Explicit checks using checkcontractible are then enabled and run before any tensor operation is performed. When a mismatch is detected, these checks still have access to the label information and produce a more informative error message.
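The following sketch (with deliberately mismatched array sizes, chosen purely for illustration) shows how such a check might be enabled; with contractcheck = true the error should refer to the offending label j rather than to an internal size mismatch:

    A = randn(2, 3)
    B = randn(4, 5)   # the shared label j has size 3 in A but size 4 in B

    @tensor contractcheck = true C[i, k] := A[i, j] * B[j, k]  # throws an informative error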
A different type of check is the costcheck keyword argument, which can be given the values :warn or :cache. With either of these values, additional checks are inserted that compare the contraction order of any tensor contraction of three or more factors against the optimal order based on the current tensor sizes. More generally, the function tensorcost is part of the interface and associates a cost value with every index of a tensor, which is then used in the cost model. With costcheck = :warn, a warning is issued for every tensor network where the actual contraction order (even when optimized using abstract costs) does not match the optimal contraction order given the current tensorcost values. With costcheck = :cache, the tensor networks with non-optimal contraction order are stored in a global package variable TensorOperations.costcache. However, when a tensor network is evaluated several times with different tensor sizes or tensor costs, only the evaluation giving rise to the largest total contraction cost for that network will appear in the cache (provided the actual contraction order deviates from the optimal order in that largest case).
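A minimal sketch of enabling this check could look as follows (tensor names and sizes are illustrative only); if the hard-coded contraction order is suboptimal for these sizes, a warning is issued:

    A = randn(5, 5, 5)
    B = randn(5, 5, 5)
    C = randn(5, 5)

    # compares the actual contraction order of this three-factor network
    # against the optimal order for the current sizes
    @tensor costcheck = :warn D[a, f] := A[a, b, c] * B[b, c, e] * C[e, f]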
Every index expression will be evaluated as a sequence of elementary tensor operations, i.e. permuted additions, partial traces and contractions, which are implemented for strided arrays as discussed in Package features. In particular, these implementations rely on Strided.jl, and we refer to that package for a full specification of which arrays are supported. As a rule of thumb, this primarily includes Arrays from Julia Base, as well as views thereof if sliced with a combination of Integers and Ranges. Special types such as Adjoint and Transpose from Base are also supported. For permuted addition and partial traces, native Julia implementations are used which can benefit from multithreading if JULIA_NUM_THREADS > 1.
The binary contraction is performed by first permuting the two input tensors into a form such that the contraction becomes equivalent to a single matrix multiplication on the whole data, followed by a final permutation to bring the indices of the output tensor into the desired order. This approach allows the use of the highly efficient matrix multiplication kernel (gemm) from BLAS, which is multithreaded by default. There is also a native contraction implementation that is used, e.g., for arrays with an eltype that is not <:LinearAlgebra.BlasFloat. It performs the contraction directly, without the additional permutations, but still in a cache-friendly and multithreaded way (again relying on JULIA_NUM_THREADS > 1). This implementation can also be used for BlasFloat types (but will typically be slower), and the use of BLAS can be controlled by explicitly switching the backend between StridedBLAS and StridedNative using the backend keyword to @tensor. Similarly, different allocation strategies, when available, can be selected using the allocator keyword of @tensor.
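As a sketch, and assuming the backend keyword accepts the backend name in the same way as shown for cuTENSOR below (depending on the package version, the backend may instead need to be passed as an instance, e.g. StridedNative()), selecting the native kernel could look like:

    A = randn(4, 4, 4)
    B = randn(4, 4, 4)

    # bypass BLAS and use the Strided.jl-based contraction kernel
    @tensor backend = StridedNative C[i, l] := A[i, j, k] * B[j, k, l]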
The primitive tensor operations are also implemented for CuArray objects of the CUDA.jl library. This implementation is essentially a thin wrapper around NVIDIA's cuTENSOR library, and will only be loaded when the cuTENSOR.jl package is loaded. The @tensor macro will then automatically work for operations between GPU arrays.
Mixed operations between host arrays (e.g. Array) and device arrays (e.g. CuArray) will fail. However, if one wants to harness the computing power of the GPU to perform all tensor operations, there is a dedicated macro @cutensor. This will transfer all host arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be copied back. If a new result array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. Arrays are transferred to the GPU just before they are first used, and in a complicated tensor expression, this might have the benefit that the transfer of later arrays overlaps with the computation of earlier operations.
Use the GPU to perform all tensor operations, through the use of the cuTENSOR library. This will transfer all arrays to the GPU before performing the requested operations. If the output is an existing host array, the result will be transferred back. If a new array is created (i.e. using :=), it will remain on the GPU device and it is up to the user to transfer it back. This macro is equivalent to @tensor backend=cuTENSOR allocator=cuTENSOR tensor_expr.
Note
This macro requires the cuTENSOR library to be installed and loaded. This can be achieved by running using cuTENSOR or import cuTENSOR before using the macro.
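Assuming a CUDA-capable GPU and the required packages are available, a minimal usage sketch (with illustrative array sizes) is:

    using CUDA, cuTENSOR, TensorOperations

    A = randn(Float32, 8, 8, 8)   # ordinary host arrays
    B = randn(Float32, 8, 8, 8)

    # host arrays are transferred to the GPU before the contraction;
    # C is created with := and therefore remains a device array
    @cutensor C[i, l] := A[i, j, k] * B[j, k, l]
    C_host = Array(C)   # explicitly copy the result back to the host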