LinearAlgebra and TensorCore future design: Be more greedy #729
Comments
You might be interested in ITensors.jl, whose tensor objects are explicitly labelled by the spaces (as you suggest), although perhaps without distinguishing dual spaces, with a notion of primes to label indices in the same space which ought not to be contracted.

About the present behaviour: I'm not sure it's quite as dire as you make out. There is an implicit second dimension.
Thanks for your comment mcabbott 😊
This would be a good thing to discuss. For any application I have ever seen, it would not make physical sense. In any case, the point is that not every IEEE journal takes transposes seriously 😊

When Julia decided to do this right, we made a conscious decision to NOT think of vectors as single-column matrices. A single-column matrix still provides a representation of a vector, though. If a vector were a column matrix and a covector were a row matrix, then the notation would conflate evaluating a covector on a vector with matrix multiplication. The correct operation is evaluation, which produces a scalar rather than a 1×1 matrix.
Note on associativity of a
Therefore, such a
Edit: iTensor does write this as
This is not quite correct. They have
Anyway, one more thought: for hacking together such Tensor objects, I had a PR to NamedDims at some point which used names
@EricForgy, can I assume you've read #42? The concept of up and down dimensions in arrays was considered there and intentionally not chosen because it was too complex for such a basic type. The notion of explaining to new users why they cannot multiply
Hi Stefan, yeah, I read and even participated in #42 😊 I agree that up / down indices would be too complicated, but I never suggested doing that. The underlying representation can still be an `Array`. The 4 rank-2 tensors are all different types, but all have representations as a 2D array of numbers and are indexed like a normal `Array`. It is good to point that out though, so I can clarify 👍
If tensors were machines, then they would be callable. They are data structures.
Hi bionicles 👋
I didn't see this comment, but I would just note that tensors (in the context of multilinear algebra) are absolutely machines and, as such, should be callable. Making that connection is important (and would probably be a good PR to LinearAlgebra) 👍 As an example, a tensor can be evaluated on vectors and covectors to produce a number.

Note: As I noted in the original post, tensors "done right" should handle both covectors and vectors as first-class citizens, but that doesn't mean we need to change the indexing. Both vectors and covectors can be indexed with ordinary integer indices.
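As a minimal sketch of the "callable machine" idea in today's Julia (purely illustrative; the `Bilinear` wrapper below is hypothetical, not an existing type in LinearAlgebra):

```julia
using LinearAlgebra  # for the Adjoint type

# A rank-2 tensor viewed as a machine: feed it a covector and a vector,
# get back a number, linearly in each argument.
struct Bilinear{T}
    data::Matrix{T}
end

# Make it callable: B(α, v) contracts the covector α and the vector v against B.data.
(B::Bilinear)(α::Adjoint, v::AbstractVector) = α * (B.data * v)

B = Bilinear([1.0 2.0; 3.0 4.0])
α = [1.0, 0.0]'   # a covector (an Adjoint of a vector)
v = [0.0, 1.0]    # a vector
B(α, v)           # == 2.0, a plain scalar
```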
I was delighted to find this issue and would love to have a correct and powerful tensor representation in Julia. However, I agree with Stefan that this should probably be done in a two-layer approach, with only the very plain and easy-to-understand `Array` at the base layer.
Hi @pablosanjose 👋
I'm not sure if this could be done properly with just a plain array. For example, we need to distinguish

A = A^i_j,k e_i ⊗ dx^j ⊗ dx^k

from

B = B_i,j^k dx^i ⊗ dx^j ⊗ e_k

although the underlying data structure and indexing for both is still the same. We need more than the array itself.

It was brought to my attention just today that we have some maturing capabilities for applied category theory now in Julia with Catlab.jl. That is awesome 😍

As my friends at the nLab so eloquently put it.

Now I am thinking that the right way to go about this might be to build on that. See, for example, https://algebraicjulia.github.io/Catlab.jl/latest/apis/core/#Instances. 🤔
I probably did not express it clearly. Of course we would need a new `Tensor` type.
Oh right. Sorry. I did misunderstand. Yes. Absolutely.
My initial thought was to do the latter, but the former could also work and would probably be more palatable. As I said in the first post, the above could be implemented in `TensorCore` (or a new `TensorAlgebra`). If that is done "right" there, `LinearAlgebra` could then be made consistent with it.

A cool side benefit of bringing category theory in is that it would give us an excuse to use string diagrams to illustrate a lot of the concepts, e.g. a vector is an arrow pointing down, a covector is an arrow pointing up, a tensor product is two arrows next to each other, etc 😊
TL;DR
I'm proposing a change to the internal structure of `LinearAlgebra` that is more suitable for a to-be-developed `TensorAlgebra` library.

What it does:

- Introduces vector space types such as `ℝ3`, `U`, `U*`, etc.
- Makes `Covector`s first-class citizens, while retaining `Transpose` and `Adjoint`.
- Distinguishes `*` from `⊗`.

What it does not do:
For easily 95% of use cases, I don't think there will be any impact at all. Most of the machinery can be hidden behind the scenes, but it paves the way to a clean tensor implementation that makes the effort worthwhile.
Summary
This is a follow-up from a discussion in PR JuliaLang/julia#35150.
In particular (from Jeff):
Any concerns I have don't rise to the level of "dodgy" by any means since I think we already do things better than any alternative I have seen, but I do think we can do some things better.
If I were to try to summarize Jiahao's already nice summary, I'd say the conclusion was that scalar types, vector types and matrix types are not sufficient for linear algebra. You need a fourth fundamental type, i.e. covectors, which at the time they called `RowVector` (but at some point along the way, `RowVector` was removed from `LinearAlgebra` for some reason).

The fact that "LinearAlgebra done right" requires dual vectors is no surprise since a linear map (matrix) is already a once-contravariant (vector) and once-covariant (covector) tensor, i.e. a machine that takes vectors and spits out vectors in a linear manner.
Although we lost `RowVector` (which I would call `Covector` if we were to introduce it again), given a vector `u`, we still have `Transpose(u)` and `Adjoint(u)`. These are both, in fact, covectors, so this is already better than any alternatives I have seen. However, not all covectors arise from taking the transpose or adjoint of a vector. For instance, the simplest nontrivial example of a pure covector I can think of is the differential:

[Ascii math note: `@_x` denotes "partial derivative with respect to `x`"]

[Aside: Given the importance of machine learning and automatic differentiation, I think getting this stuff right has some real practical benefits, so I hope what I'm saying isn't seen as purely theoretical.]
From a differential topological point of view, the differential `df` is more fundamental than `grad(f)`, since the definition of `grad(f)` has a dependency on the metric, where `df` does not. In fact, when you see such objects in automatic differentiation, one could argue (correctly) that you are actually dealing with covectors. Calling these "gradients" is a misnomer that I'm not sure is healthy to propagate. You don't "pullback vectors". You "pullback covectors".
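For concreteness (standard facts, stated here in the ascii notation defined above): on ℝ3 with coordinates x, y, z, the differential of a scalar function f reads

```
df = @_x f dx + @_y f dy + @_z f dz
```

and `grad(f)` is the vector field obtained from `df` only with the help of a metric `g`, via

```
g(grad(f), v) = df(v)   for every vector v
```

so `df` stands on its own while `grad(f)` does not.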
In the spirit of greediness, I want Julia to be even more greedy about getting this right where no one else I am aware of (except maybe Haskell, which I am not familiar with) gets it right. The difference between vectors and covectors matters, although you can probably get away without worrying about it for 90% of use cases, but I don't think we are shooting for 90% excellence. We want to be 100% excellent and we can be.
LinearAlgebra
As I see it now, `LinearAlgebra` has 3 fundamental types:

- `T`
- `Array{T,1}`
- `Array{T,2}`

with corresponding derived types:

- `Transpose`
- `Adjoint`

Given a vector `u`, `Transpose(u)` and `Adjoint(u)` are covectors, but rightly have their own type distinct from `Covector` because of the way the data is laid out and how they interact with other operations (via dispatch).

To do `LinearAlgebra` right, I think we need (at least) 3 fundamental types:

- `T` (scalars)
- `Vector`
- `Covector`

with corresponding derived types:

- `Matrix`
- `Transpose`
- `Adjoint`

[Note: If we have vectors and covectors as fundamental types, a matrix is actually a derived type.]
If we were super greedy, we'd also include pseudovectors. For example, if `u` and `v` are vectors, the vector cross product `u × v` is actually a pseudovector.

I have put some thought into how to do this, but it is not easy to get right (otherwise everyone would have already done it). For example, one of my earlier ideas was to extend `Array` to negative dimensions, but that seems unsatisfactory and not great for students just learning linear algebra.
Then, there is a beautiful package: TensorKit.jl. The documentation goes into some awesome background material. It is probably my own failing, but some things there are a little hard for me to understand; what I do understand looks great and forms a nice basis for thinking through this stuff.
I think that getting `LinearAlgebra` right requires getting tensors right. While we sort all this out, Tim created TensorCore.jl.

TensorCore
After the discussion in JuliaLang/julia#35150, a merged PR that introduced the tensor product `⊗` to `LinearAlgebra` was reverted and moved to TensorCore.jl. That seems like a good place to experiment and sort this out.

Both vectors and covectors are special kinds of tensors, so it seems sensible to try to work this out for tensors and then make `Vector` and `Covector` special instances of a new `Tensor` type rather than `Array`.

I sketched some initial (only lightly baked) ideas in a comment on the PR. The idea is that given vector spaces `U`, `V`, `W` and their respective dual spaces `U*`, `V*`, `W*`, we can define a vector `u ϵ U` as an instance of `Tensor{T,U}`. Similarly, we can define a covector `α ϵ U*` as an instance of `Tensor{T,U*}`.

In this way, a matrix `A` would be an instance of type `Tensor{T,V,U*}`.
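A hedged sketch of how those type parameters might look in Julia. Everything below is hypothetical (none of these types exist in LinearAlgebra or TensorCore.jl), and it uses a `Tuple` of spaces in place of the varargs `Tensor{T,V,U*}` notation above, since a Julia struct cannot take a variable number of type parameters:

```julia
# Hypothetical sketch: vector spaces and their duals as types.
abstract type VectorSpace end
struct U <: VectorSpace end
struct V <: VectorSpace end
struct Dual{S<:VectorSpace} <: VectorSpace end   # Dual{U} plays the role of U*

# A tensor carries a scalar type T and an ordered list of spaces;
# the underlying storage is still a plain Array, indexed as usual.
struct Tensor{T,Spaces<:Tuple}
    data::Array{T}
end

u = Tensor{Float64,Tuple{U}}([1.0, 2.0, 3.0])        # a vector   u ϵ U
α = Tensor{Float64,Tuple{Dual{U}}}([1.0, 0.0, 0.0])  # a covector α ϵ U*
A = Tensor{Float64,Tuple{V,Dual{U}}}(rand(2, 3))     # a "matrix": a linear map U -> V
```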
The tensor product of two vectors of type `Tensor{T,U}` and `Tensor{T,V}` is a tensor of type `Tensor{T,U,V}`, i.e. the list of vector space type parameters is concatenated.

Given two vectors `u ϵ U` and `v ϵ V` together with covectors `α ϵ U*` and `β ϵ V*`, the tensor product results in four distinct geometric objects:

- `u ⊗ v`: An instance of `Tensor{T,U,V}`
- `u ⊗ β`: An instance of `Tensor{T,U,V*}`, i.e. a "matrix"
- `α ⊗ v`: An instance of `Tensor{T,U*,V}`
- `α ⊗ β`: An instance of `Tensor{T,U*,V*}`

with additional obvious permutations. This extends to higher order, e.g.

- `(u ⊗ v) ⊗ (α ⊗ β)`: An instance of `Tensor{T,U,V,U*,V*}`.
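Continuing the hypothetical sketch above (reusing its `Tensor`, `Dual`, `u`, `α` and the spaces `U`, `V`), `⊗` could simply concatenate the space parameters; the `kron`-based storage below is one arbitrary illustrative choice:

```julia
using LinearAlgebra  # for kron

# Tensor product: concatenate the space lists and take the outer product of the data.
function ⊗(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA<:Tuple,SB<:Tuple}
    spaces = Tuple{SA.parameters..., SB.parameters...}
    data = reshape(kron(vec(b.data), vec(a.data)), size(a.data)..., size(b.data)...)
    return Tensor{T,spaces}(data)
end

v = Tensor{Float64,Tuple{V}}([1.0, 2.0])
β = Tensor{Float64,Tuple{Dual{V}}}([0.0, 1.0])

u ⊗ v   # :: Tensor{Float64, Tuple{U, V}}
u ⊗ β   # :: Tensor{Float64, Tuple{U, Dual{V}}}, i.e. a "matrix"
α ⊗ v   # :: Tensor{Float64, Tuple{Dual{U}, V}}
α ⊗ β   # :: Tensor{Float64, Tuple{Dual{U}, Dual{V}}}
```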
Multilinear Machines
Tensors can be thought of as multilinear machines taking vectors and covectors as arguments and producing a number that is linear in each argument, e.g. if `A` is an instance of `Tensor{T,U,V*,U*,V}` with `u`, `v`, `α` and `β` as above, then `A(α,v,u,β)` is an instance of `T`.

Partial Evaluation
With the same `A` as above, we can consider partial evaluation: `A(-,v,u,β)`. This is an instance of type `Tensor{T,U}`, i.e. a vector.

Similarly, `A(α,-,u,β)` is an instance of type `Tensor{T,V*}`, i.e. a covector.

Hence, a vector can be thought of as a machine that takes a covector and produces a number in a linear manner. Similarly, a covector can be thought of as a machine that takes a vector and produces a number in a linear manner.
Now, we can revisit matrices as linear maps by considering `A(-,-,u,β)`, which is an instance of `Tensor{T,U,V*}`, which we've said is a matrix. If we feed `A` one more vector, as in `A(-,v,u,β)`, we get a machine that will take one more covector and produce a number, but we already said this type of machine is a vector, so `A(-,-,u,β)` is a linear map `V -> U`.

Tensor Operations
I think there is one area we did not get right with `LinearAlgebra` and I suspect it is the source of issues elsewhere. Given vectors `u ϵ U` and `v ϵ V`, `LinearAlgebra` currently defines `u * transpose(v)` to be a matrix. This is unfortunate. The correct expression for that matrix should be `u ⊗ transpose(v)`. So we have effectively made `* = ⊗`. This is unfortunate because `*` and `⊗` are two distinct tensor operators with different meanings, so identifying them is problematic.
Tensor
consists of a scalar typeT
and an ordered list of vector spaces and dual vector spaces, I'd probably introduce three methods:lastspace(A::Tensor)::VectorSpace
firstspace(A::Tensor)::VectorSpace
aredual(U::VectorSpace,V::VectorSpace)::Bool
Then, we can define
A1*A2
for tensorsA1
andA2
only ifi.e. multiplication of tensors is only defined if the last space of the first tensor is dual to the first space of the second tensor. This can be thought of as path concatenation. Two paths can only be concatenated if the end of the first path coincides with the beginning of the second path.
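A hedged sketch of those three methods and the resulting `*`, again building on the hypothetical `Tensor`/`Dual` types sketched earlier (names and signatures are illustrative; a real design would need more care):

```julia
# Hypothetical helpers over the sketched Tensor{T,Spaces} type.
firstspace(::Tensor{T,S}) where {T,S} = S.parameters[1]
lastspace(::Tensor{T,S})  where {T,S} = S.parameters[end]

aredual(::Type{Dual{S}}, ::Type{S}) where {S<:VectorSpace} = true
aredual(::Type{S}, ::Type{Dual{S}}) where {S<:VectorSpace} = true
aredual(::Type, ::Type) = false

# A1 * A2: contract the last index of A1 with the first index of A2,
# defined only when the corresponding spaces are dual to one another.
function Base.:*(A1::Tensor{T,S1}, A2::Tensor{T,S2}) where {T,S1,S2}
    aredual(lastspace(A1), firstspace(A2)) ||
        throw(ArgumentError("last space of A1 is not dual to first space of A2"))
    spaces = Tuple{collect(S1.parameters)[1:end-1]..., collect(S2.parameters)[2:end]...}
    n  = size(A1.data)[end]
    M1 = reshape(A1.data, :, n)          # collapse the leading indices of A1
    M2 = reshape(A2.data, n, :)          # collapse the trailing indices of A2
    data = reshape(M1 * M2, size(A1.data)[1:end-1]..., size(A2.data)[2:end]...)
    return Tensor{T,spaces}(data)
end

# With B :: Tensor{T,Tuple{U,Dual{V}}} (a "matrix") and w :: Tensor{T,Tuple{V}} (a vector):
#   B * w  :: Tensor{T,Tuple{U}}             -- contraction, i.e. a vector
#   B ⊗ w  :: Tensor{T,Tuple{U,Dual{V},V}}   -- no contraction, a higher-rank tensor
```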
If that is the case, then we simply contract the last tensor index of the first tensor with the first tensor index of the second tensor.
For instance, given a matrix `A` of type `Tensor{T,U,V*}` and a vector `u` of type `Tensor{T,V}`, then we have that `A*u` is of type `Tensor{T,U}`, i.e. a vector. This is clearly distinct from `A ⊗ u`, which is of type `Tensor{T,U,V*,V}`.
As a more general example, consider `A1` of type `Tensor{T,U,V*,W}` and `A2` of type `Tensor{T,W*,V}`. Since `W` and `W*` are dual, `A1*A2` is defined and is of type `Tensor{T,U,V*,V}`.
Conclusion
It may seem like overkill to bring in such discussion of tensor algebra just in order to get `LinearAlgebra` right, but I am greedy and I would like `LinearAlgebra` to be consistent with a more general `TensorAlgebra` (or `TensorCore`). I think it is worth trying to get tensors right and then revise `LinearAlgebra` accordingly for consistency.

I also think the scope of `LinearAlgebra` should be confined to vectors, covectors, matrices (as linear maps) and possibly other rank-2 tensors, since

- `u ⊗ v` can be thought of as a linear map `Covector -> Vector`
- `u ⊗ β` can be thought of as a linear map `Vector -> Vector`
- `α ⊗ v` can be thought of as a linear map `Covector -> Covector`
- `α ⊗ β` can be thought of as a linear map `Vector -> Covector`
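To spell out why, e.g., `u ⊗ β` acts as a `Vector -> Vector` map (a standard identity, in the same notation as above): feeding it a vector `w` gives

```
(u ⊗ β)(w) = β(w) * u
```

i.e. a scalar multiple of `u`, hence a vector; the other three cases work the same way.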
Operations that keep you in scope, e.g. `dot`, are fine, but things like `kron` that take you out of scope should probably be in another package like `TensorCore` (or some new `TensorAlgebra`).

I see this as a long term project. Maybe for Julia v2.0 or v3.0. It would be awesome if something like this could make it to v2.0, but it is a lot of work. I can try to make a POC package and see if it makes sense to others. In any case, after the discussion in JuliaLang/julia#35150, I felt like writing a comprehensive summary might be of interest.