Tensor terminology #1

devinamatthews · 2017-12-05T16:20:57Z

fruitfly1026 · 2017-12-06T15:35:15Z

Hi Devin,
They are good questions! Thanks for doing this.
This issue's checkboxes seem don't work well, but #2 issue's works well. After I checked the boxes above, it still showing "0 of 39" completed. Maybe I did this too early? haha~

fredrikbk · 2017-12-06T16:06:56Z

In the taco project we ended up with the following terminology after discussion with @peterahrens in the Julia group (and in the taco project): an order-3 tensor has 3 modes and the size of a mode is it's dimension. This avoids confusion for math- and physics-oriented people for whom a 3-dimensional vector is a vector with three components. It also means we can use the size term to refer to the size of a tensor in bytes.

CS and machine learning people, however, are more comfortable with referring to an array with three indices as having n dimensions. They often also use the concept of a shape to refer to the list containing the size of each dimension (e.g., TensorFlow which probably inherits its terminology from numpy).

This standard will probably be pretty low-level, so I don't think the above distinction between mode and dimension is that important. My votes are:
1 -> a: n-dimensional
2 -> a: dimensions
3 -> c: dimension
4 -> in taco we'll support both, but without code generation this is more work (it's probably possible to implement both without duplicating code with C macros, but still).
5 -> a: index
6 -> a: transpose
8 -> a: contracted (alternatively, "summed")
9 -> c: free

devinamatthews · 2017-12-06T16:28:47Z

@fruitfly1026 I don't think the checkboxes will work very well for "voting", but please feel free to comment and "vote" using the number/letter combos in the comments. Once there is a consensus I'll update the checkboxes to reflect that.

devinamatthews · 2017-12-06T16:32:57Z

@fredrikbk I think it might be prudent to look beyond the low-level interface for really fundamental stuff like this. High-level interfaces are in scope eventually as well.

fredrikbk · 2017-12-06T16:39:29Z

@devinamatthews yes, I suppose it is also a question of who the target audience is, since physicists/math people prefer mathematical terminology while CS people prefer array terminology. A tensor API can be used by both, but one group will have to learn new terminology. This effort seems to be heavy on scientists, so perhaps that argues for mode/order-terminology.

DmitryLyakh · 2017-12-06T19:10:02Z

1->c: order-n;
2->a: dimensions;
3->b: extent (dimension extent);
4->a: 0-based (default);
5->a: index;
6->a: transpose (tensor transpose);
7->b: I use, but no preference;
8->a: contracted;
9->a: uncontracted;

epifanovsky · 2017-12-06T19:30:45Z

1 -> a (n-dimensional) or c (order-n)
2 -> a (dimensions) or d (indices)
3 -> c (dimension)
4 -> a (0-based)
5 -> b (label)
6 ->b (permutation)
7 -> a ("comes from")
8 -> a (contracted)
9 -> a (uncontracted) or b (external)

fruitfly1026 · 2017-12-06T22:28:22Z

1->a: n-dimensional or c: order-n;
2->a: dimensions or b: modes;
3->a: length;
4->a: 0-based or b: 1-based;
5->a: index;
6->b: permutation;
7->a: (1,2,3,0) "comes-from";
8->a: contracted or b: internal;
9->a: uncontracted or b: external;

willow-ahrens · 2017-12-07T03:42:57Z

I will try to summarize some of the design decisions that went in to TACO's naming scheme.
1 c: order-n
I like order because it looks good as a function argument. I find the tensor constructor Tensor(int way) or Tensor(int adic) slightly more confusing than Tensor(int order). Using a plural of an answer to question 2 will make the interface confusing because the same word will be used twice in the interface but mean different things. (No to d: rank-n because rank can also refer to linear independence of columns).

2: b: modes
Ways is fine as a secondary. Using dimensions here presents a problem if you want to use dimension to talk about the answer to 3 (the function to query the answer to 3 would then be written dimension(int dimension)!). Index would work here too, but it makes more sense to use index to answer 5. Again I'm opposed to rank due to collisions with mathematics.

3: c: dimension
I think extent is fine too, but I'm not sure length makes sense if there are more than one lengths. Typically length represents only one degree of freedom, and others are denoted using width or height. Does a matrix have two lengths, a length and a width, or two dimensions?

b: 1-based
This is a personal preference, and I think it's easier to use 1-based indexing in a 0-based language than vice-versa.
a: index
In most explanations of Einstein summation (also called an index expression), this is called an index. It is likely that a tensor interface will use Einstein summation.
a: transpose
a: (1,2,3,0) "comes-from"
a: contracted (also I will echo fred's "summed")
c: free

evaleev · 2017-12-07T15:01:51Z

Largely overlap with @peterahrens thinking here.

order-n or rank-n or n-mode. "Order" can clash when used in same context as perturbation theory (or any kind of expansion), e.g. "First-order amplitudes T are an order4-tensor". "Rank" is not ideal either due to the confusion with matrix or tensor rank, but due to being a term of art in field theory this is OK. "Mode" has neutral meaning and perhaps is the best candidate. But we ideally need a term for the number of modes. P.S. I strongly dislike n-dimensional. I also don't like n-index in case we want to generalize to continuous degrees of freedom (probably irrelevant to low-level interface).
modes or ranks.
extent. Size (not listed here) could lead to confusion with the number of elements in tensor. Don't like dimension here or elsewhere: "dimension of the 2nd dimension of the 4-dimensional tensor ..." vs "extent of the 2nd mode or the order-4 tensor ...".
0-based as the default in most important languages. Probably makes sense to use the same base as the most likely implementation languages. In high-level it is possible to abstract out this choice as a Range concept that implements the index->ordinal conversion.
index or mode label
permutation. Transposition involves 2 indices. Permutation involves 0, 2 or more, and can be written as a product of transpositions.
no preference.
contracted or bound. Internal will be problem when discussing Hadamard as same index will appear in the result. This actually needs more thought. Maybe "contraction" for co-reduction, and "multiplied" (feel free to suggest better ideas) for Hadamard.
free. Don't like uncontracted (reversal of contraction?).

pauldj · 2017-12-07T16:09:30Z

n-dimensional or order-n.
Since "dimension" is used in a multitude of different way, I'd go for "order-n".
dimensions, modes, indices
"Dimensions" and "modes" follow the answer to Tensor terminology #1.
I would argue that "indices" is unmistakable.
extent or size.
0-based
index
This is really tricky.
It is true that "transposition" only involves 2 indices, but it is also true that "permutation" has a completely different meaning when applied to a matrix (e.g., you permute the rows of a matrix).

I'd go either with "transposition" or with something entirely different (reshape, rearrangement, ...)

a: comes-from
contracted
free

ShadenSmith · 2017-12-07T16:59:05Z

Multiple answers are ordered by preference.

n-mode, order-n, n-dimensional
modes (but should really just follow what we select for Tensor terminology #1)
dimension, extent
In source code? 0-based
index
tensor transpose ("permutation' should be an interchange of indices within a mode, whereas this crosses modes)
comes from
contracted
uncontracted, free

DmitryLyakh · 2017-12-07T21:10:22Z

To defend further 6a:"tensor transpose", the term "matrix transpose" is widely used, so a "tensor transpose" would be a natural generalization.

devinamatthews · 2017-12-07T21:34:30Z

I have also used "tensor transpose" quite heavily in the past, but I acknowledge that there are some rough edges on the notation. For example, in QC we also use the transpose w.r.t the Hilbert space, e.g. . On the other hand, as others have noted, "permutation" can be problematic as well. "reordering" is of course not a good idea if we adopt 1c. "reshape" a la MATLAB is something entirely different. It also doesn't seem like anyone likes "shuffle". So I guess I would vote:

order-n (or n-mode)
modes
extent or length (although Eigen uses dimension)
0-based (either can be supported in a high-level interface, mostly I was referring to how it should be defined mathematically for defining indexing, strides, etc.)
index
permutation (of modes, the other "permutation" could be "permutation of basis" or something)
comes from
bound
free

For 8 and 9, I would usually use contracted/uncontracted but for Hadamard indices this doesn't make as much sense.

Re @evaleev's comment: I have always used the term "simple permutation" to describe a two-index transpose/permutation.

evaleev · 2017-12-07T22:06:25Z

Re: permutation vs transpose. Just throwing out there to see if someone can generate something better: repack or re-mode, or mode permutation or mode shuffle? Repack might make sense for a low-level interface when we refer to changing the physical layout of the data. I like @devinamatthews idea of "mode permutation" and was thinking along same lines ... permutation of modes reorders modes, reordering a mode shuffles the indexing within the mode.

p.s. additional data point: from what I recall at a recent Simons tensor meeting meaning of "tensor transpose" was not immediately apparent to a roomful of physicists, i.e. they wanted to know if it meant permutation. FWIW.

pauldj · 2017-12-08T08:33:10Z

"Mode permutation" sounds good. Then one can always write:
<<In this paper, we use the terms "mode permutation" and "tensor transposition" interchangeably.>> :-)

devinamatthews · 2017-12-12T19:40:38Z

I've updated the list to reflect what seem at this point to be the least ambiguous and most compatible terms. Please voice your displeasure in the next day or two and then I will update the existing documentation and move on to the next issue(s).

ryanmrichard · 2017-12-13T16:26:03Z

I like the current front-runners, with a strong favoring for 3b (extent) over 3c (dimension) . I'll also chime in with support of not redefining math terminology, i.e. transpose is a permutation of 2, and only 2, elements in a sequence.

springer13 · 2017-12-13T17:04:13Z

(a) order-n or (c) n-way
(b) modes
(b) extent or size
- I don't like dimension too much since this could be easily mistaken with (1)
(a) 0-based
(a) index
(a) transpose
(b) goes-to
(a) contracted
(c) free

psaday · 2017-12-13T19:09:00Z

(a) order-n
(b) modes
(b) extent
(a) 0-based
(a) index
Index-permutation
No preference
(a) contracted
(c) free

annakrylov · 2017-12-17T21:02:27Z

I think that it is important to keep the terminology simple and consistent with what is commonly used in physical sciences, in particular in chemistry. Afterall, MolSSI is the chemistry institute. I found some of the suggested terms truly bizarre and confusing! I took quite a few math formal courses from math department back in Russia and have never encountered some of these names.

Regardless of what we agree on, we should make a page summarizing these different ways to describe tensors and common operations and have it somewhere on the MolSSI tensor page, as a reference of what can be encountered in the literature.

1 -> a (n-dimensional)
2 -> a (dimensions) or d (indices)
3 -> a (length) or c (dimension)
4 -> a (0-based)
5 ->a (index)
6 ->b (permutation)
7 -> a ("comes from")
8 -> a (contracted)
9 -> a (uncontracted)

devinamatthews · 2017-12-18T21:18:50Z

It seems the main points of contention are a) "dimension vs. mode" and b) "contracted/uncontracted vs. bound/free". Second place contention is c) "extent vs. dimension vs. length".

a) Additional data point: both TensorFlow and C++ refer to rank as the number of dimensions. Thus an n-dimensional tensor with rank n and n dimensions would actually make a lot of sense, and continue common chemistry usage. Perhaps order-n with n modes is just too unfamiliar? In any case I plan to describe alternative notations whatever "official" term we pick.

b) Of course contracted/uncontracted is very popular, but I have to say, "Think of the Hadamard indices!" I really think notational precision is more important than historical inertia here.

c) I would nix dimension whatever happens with a) since there is just too much room for confusion. But on the other hand extent just sounds weird to me. I personally think that length, width, ... extends quite naturally to length_0, length_1, ...

dgasmith · 2017-12-19T01:21:09Z

@devinamatthews

Thus an n-dimensional tensor with rank n and n dimensions would actually make a lot of sense, and continue common chemistry usage.

+1 for this.

For your c) point, it might be helpful to think about how to word this in context:

"The size/width/extent/length of dimension i is ..."
"Dimension i ranges from 0...n"
"Dimension i has dimension n" (really rough)

I would vote for size over width/length/extent which seems to imply a contiguous range of numbers which might not always be true and would adhere more to C++11/TensorFlow/NumPy/etc nomenclature.

DmitryLyakh · 2017-12-19T03:46:32Z

Dear Colleagues, May I insert few words: a) I would advocate to be intra-domain consistent (minimal). There is zero chance we can come up with a great unification here (inter-domain). I think it would be appropriate to pick up the nomenclature mostly used in the domain we plan the development for, at least at this point. b) I think the mitigation for inter-domain applications would be to introduce a map of synonyms which would extend the nomenclature to other (potential) domains, to avoid confusion. c) I am personally against using geometrically biased words as length, width, depth. And size, I believe, should be used with care. Thanks, Dmitry

…

On Dec 18, 2017 8:21 PM, "Daniel Smith" ***@***.***> wrote: @devinamatthews <https://github.com/devinamatthews> Thus an n-dimensional tensor with rank n and n dimensions would actually make a lot of sense, and continue common chemistry usage. +1 for this. For your c) point, it might be helpful to think about how to word this in context: - "The size/width/extent/length of dimension i is ..." - "Dimension i ranges from 0...n" - "Dimension i has dimension n" (really rough) I would vote for size over width/length/extent which seems to imply a contiguous range of numbers which might not always be true and would adhere more to C++11/TensorFlow/NumPy/etc nomenclature. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKDFR5Httgi767mEeBVoe-s4PLWznVzhks5tBw-GgaJpZM4Q2j6t> .

DmitryLyakh · 2017-12-19T04:08:39Z

Concerning Hadamard elementwise multiplication, and potential non-standard tensor products, the Einstein convention may be compromised. Also, let's include CP and derivatives (hypercontraction?). From my point of view, the two invariants are l.h.s. and r.h.s indices. Another two invariants are "being summed over" and "being iterated over", so, unless I am wrong, we should have four differnt kinds of binary tensor operations in this context. Thanks, Dmitry

…

On Dec 18, 2017 4:18 PM, "Devin Matthews" ***@***.***> wrote: It seems the main points of contention are a) "dimension vs. mode" and b) "contracted/uncontracted vs. bound/free". Second place contention is c) "extent vs. dimension vs. length". a) Additional data point: both TensorFlow and C++ refer to rank as the number of dimensions. Thus an n-dimensional tensor with rank n and n dimensions would actually make a lot of sense, and continue common chemistry usage. Perhaps order-n with n modes is just too unfamiliar? In any case I plan to describe alternative notations whatever "official" term we pick. b) Of course contracted/uncontracted is very popular, but I have to say, "Think of the Hadamard indices!" I really think notational precision is more important than historical inertia here. c) I would nix dimension whatever happens with a) since there is just too much room for confusion. But on the other hand extent just sounds weird to me. I personally think that length, width, ... extends quite naturally to length_0, length_1, ... — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AKDFR-SS7rbkTOmEC8xoJB0BcxWO2x_1ks5tBta8gaJpZM4Q2j6t> .

ShadenSmith · 2017-12-19T14:49:29Z

There is zero chance we can come up with a great unification here (inter-domain)

Agreed.

I think it would be appropriate to pick up the nomenclature mostly used in the domain we plan the development for, at least at this point.

Yes, but we should avoid selecting terminology only because of a field that it is used in (i.e., catering to the comfort of certain users). Every field will pick up bad habits in its terminology over the years, or select terminology that strongly clashes with other fields, and we should make a modest effort to select terminology that minimizes confusion across domains.

The point above by @annakrylov that we should prefer the terminology of chemists (who are of course major users of tensor contraction software) is quite reasonable, but we should use this group as a chance to talk to other fields and drop terminology that exists only due to historical momentum.

twindus · 2018-03-25T16:30:51Z

Dear Colleagues,

We at MolSSI (www.molssi.org) would like to get the conversation about tensors started up again within the working group. There was great momentum before the end of the year and we hope to spark the conversation to get it moving forward. Toward this end, we have asked Devin Matthews ([email protected]) to take on the leadership role of this working group to facilitate discussion, shepherd the decision making process and to make decisions to keep the momentum going. Devin has graciously agreed to take on this role!

As a reminder, MolSSI formed this group to have discussion around shared memory (on-node) tensor operations particularly for the molecular sciences domain – primarily quantum chemistry. To date we have focused on definitions mainly for contractions. We would like to get to the point where we can have a few APIs that are “standard” for the community to use – one that is more BLAS like and one that is more C++ oriented. In addition to this, other possible directions for the group to take are tensor manipulation (a more object oriented approach for indexing into tensors), tensor factorization, sparsity and maybe even distributed tensors. We are, of course, very open to other suggestions on directions that this group should or should not take.

Our plan is to continue the discussions over GitHub, but have sent this initial message over the google group as well just in case you haven’t been following the GitHub repo.

If you should have any questions or concerns, please feel free to contact me or Devin.

All the best,
Theresa Windus

emstoudenmire · 2018-03-29T19:43:26Z

Just to contribute to the discussion, I think this is a great initiative to find common interfaces for connecting libraries.

Keeping that more practical goal in mind, it's helpful to nail down this terminology with main goal of making sure everyone understands each other, and that consumers of libraries understand how to use them and how to read the API's etc.

I don't think mathematical correctness is nearly as important as just clarity and consistency.

Taking the example of the names size/dimension/extent etc. I think that the way to go is to narrow it down to just a few candidates, based on which are most likely to be intelligible to the widest number of people, then simply have someone (Devin) make the deciding vote that we're going to go with one of them. Similar with other terminology.

The main consideration (which Ed Valeev seems to focus on, and I think is the right thing to emphasize) is confusion with other terms. So I agree that "dimension" is to be avoided since although it's mathematically correct, it's ambiguous between the dimension of a single index versus the number of indices, etc.

So in light of the above, my votes would be:

order (physicists often use 'rank' but I think order is better since rank is ambiguous with number of non-zero eigenvalues / singular values)
indices (seems closer to how we talk about matrices when first learning them in college)
extent (it's hard to confuse with other things and is self-explanatory)
0-based indices are better for interacting with low-level systems languages and implementing slicing
(not sure; do you mean the numerical value of "i" or the letter?)
permutation (transpose seems like a matrix-specific term to me)
(no opinion)
contracted (if the meaning is being summed over, then I thought this terminology was fairly standard?)
(no opinion; all seem fine)

devinamatthews · 2018-04-09T15:17:02Z

I've updated the issue to reflect what seems to be the least problematic set of "canonical" terms. Of course, as long as we have a translation table then everyone can stay on the same page. I have avoided "dimension" in any usage since it is so heavily overloaded, and also tried to avoid using "index" for two different concepts (the "slot" in the tensor and what label that slot is given).

I will be updating the wiki page on terminology to reflect this discussion and provide a translation table of terms. Then, I suggest moving on to working on a basic low-level tensor contraction interface.

basanders · 2018-04-09T15:40:18Z

Just want to point out that these terms are for an interface in software, so it is possible to use more than one word if that is clearer or highlights relationships between variables. For example, if there is some object referred to by X, then the meaning of numX (or num_X--pick a convention) is obvious.

When people are talking or writing about their work, they can still use whatever term is usual in their domain.

devinamatthews · 2018-04-09T16:27:58Z

I've updated the issue to reflect what seems to be the least problematic set of "canonical" terms. Of course, as long as we have a translation table then everyone can stay on the same page. I have avoided "dimension" in any usage since it is so heavily overloaded, and also tried to avoid using "index" for two different concepts (the "slot" in the tensor and what label that slot is given).

I will be updating the wiki page on terminology to reflect this discussion and provide a translation table of terms. Then, I suggest moving on to working on a basic low-level tensor contraction interface.

evaleev · 2018-06-24T12:45:04Z

Re: issue #5 and the aligned mdspan ISO C++ standard proposal ... the terminology issue may need to be revisited if we decide to align with that proposal.

Comments?

devinamatthews · 2018-06-24T14:52:50Z

@evaleev I think we can reopen this discussion after we have made some progress on dense tensor contraction. It would simply be a matter of word replacement at that point and I would rather not get bogged back down now.

devinamatthews mentioned this issue Dec 5, 2017

Low-level Tensor Interface I #2

Open

15 tasks

ryanmrichard mentioned this issue Mar 19, 2018

Index Spaces documentation NWChemEx/TAMM#20

Merged

2 tasks

devinamatthews closed this as completed Apr 9, 2018

evaleev reopened this Jun 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tensor terminology #1

Tensor terminology #1

devinamatthews commented Dec 5, 2017 •

edited

Loading

fruitfly1026 commented Dec 6, 2017 •

edited

Loading

fredrikbk commented Dec 6, 2017 •

edited

Loading

devinamatthews commented Dec 6, 2017

devinamatthews commented Dec 6, 2017

fredrikbk commented Dec 6, 2017

DmitryLyakh commented Dec 6, 2017

epifanovsky commented Dec 6, 2017

fruitfly1026 commented Dec 6, 2017

willow-ahrens commented Dec 7, 2017 •

edited

Loading

evaleev commented Dec 7, 2017

pauldj commented Dec 7, 2017

ShadenSmith commented Dec 7, 2017

DmitryLyakh commented Dec 7, 2017

devinamatthews commented Dec 7, 2017

evaleev commented Dec 7, 2017

pauldj commented Dec 8, 2017

devinamatthews commented Dec 12, 2017

ryanmrichard commented Dec 13, 2017

springer13 commented Dec 13, 2017

psaday commented Dec 13, 2017

annakrylov commented Dec 17, 2017

devinamatthews commented Dec 18, 2017

dgasmith commented Dec 19, 2017

DmitryLyakh commented Dec 19, 2017 via email

DmitryLyakh commented Dec 19, 2017 via email

ShadenSmith commented Dec 19, 2017

twindus commented Mar 25, 2018

emstoudenmire commented Mar 29, 2018

devinamatthews commented Apr 9, 2018

basanders commented Apr 9, 2018

devinamatthews commented Apr 9, 2018

evaleev commented Jun 24, 2018

devinamatthews commented Jun 24, 2018

Tensor terminology #1

Tensor terminology #1

Comments

devinamatthews commented Dec 5, 2017 • edited Loading

fruitfly1026 commented Dec 6, 2017 • edited Loading

fredrikbk commented Dec 6, 2017 • edited Loading

devinamatthews commented Dec 6, 2017

devinamatthews commented Dec 6, 2017

fredrikbk commented Dec 6, 2017

DmitryLyakh commented Dec 6, 2017

epifanovsky commented Dec 6, 2017

fruitfly1026 commented Dec 6, 2017

willow-ahrens commented Dec 7, 2017 • edited Loading

evaleev commented Dec 7, 2017

pauldj commented Dec 7, 2017

ShadenSmith commented Dec 7, 2017

DmitryLyakh commented Dec 7, 2017

devinamatthews commented Dec 7, 2017

evaleev commented Dec 7, 2017

pauldj commented Dec 8, 2017

devinamatthews commented Dec 12, 2017

ryanmrichard commented Dec 13, 2017

springer13 commented Dec 13, 2017

psaday commented Dec 13, 2017

annakrylov commented Dec 17, 2017

devinamatthews commented Dec 18, 2017

dgasmith commented Dec 19, 2017

DmitryLyakh commented Dec 19, 2017 via email

DmitryLyakh commented Dec 19, 2017 via email

ShadenSmith commented Dec 19, 2017

twindus commented Mar 25, 2018

emstoudenmire commented Mar 29, 2018

devinamatthews commented Apr 9, 2018

basanders commented Apr 9, 2018

devinamatthews commented Apr 9, 2018

evaleev commented Jun 24, 2018

devinamatthews commented Jun 24, 2018

devinamatthews commented Dec 5, 2017 •

edited

Loading

fruitfly1026 commented Dec 6, 2017 •

edited

Loading

fredrikbk commented Dec 6, 2017 •

edited

Loading

willow-ahrens commented Dec 7, 2017 •

edited

Loading