Shape of transpose(x)' differs for real and complex x #703

Open
thisrod opened this issue Mar 19, 2020 · 25 comments
Comments

@thisrod

thisrod commented Mar 19, 2020

This is annoying if you run into it. I imagine that the best fix is a special method for the size of this complicated type of Complex array.

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.0-rc2.0 (2020-02-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> x = randn(3);

julia> transpose(x)'
3-element Array{Float64,1}:
 -0.36345466121621794
  0.07867715859941832
  0.14644026210406283

julia> x = Complex.(x);

julia> transpose(x)'
3×1 LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},Array{Complex{Float64},1}}}:
 -0.36345466121621794 - 0.0im
  0.07867715859941832 - 0.0im
  0.14644026210406283 - 0.0im
@thisrod thisrod changed the title Shape of transpose(x)' differs for Real and Complex Shape of transpose(x)' differs for real and complex x Mar 19, 2020
@KristofferC
Member

Could you explain a bit more what you think the problem is? What did you want to happen?

@StefanKarpinski
Member

I'd guess that the unexpected part is that adjoint "undoes" the transpose in the real case but not in the complex case. For real vectors transpose and adjoint are the same operation which is its own inverse. That's not the case for complex vectors. A possible different behavior would be for a composition of an adjoint and a transpose to produce an eagerly conjugated vector.

@KristofferC
Member

KristofferC commented Mar 19, 2020

I'd guess that the unexpected part is that adjoint "undoes" the transpose in the real case but not in the complex case.

Maybe unexpected but then unexpectedly nice? If you want a vector all the time just collect it?

@thisrod
Author

thisrod commented Mar 20, 2020

What did you want to happen?

I want a way to deduce size(transpose(x)') from size(x), without having to know eltype(x). I think those sizes should always be the same, but I care less about what the answer is and more about it being consistent.

The context is that I tested some code for Array{Complex}, then it failed when I inadvertently gave it an Array{Real}. The transpose and the adjoint were many lines of code apart.

While writing this code, I'm starting to suspect that identifying adjoint vectors with row matrices is a mistake: dual vectors should be vectors, and if you want a vector to turn into a matrix you should have to explicitly cast or reshape it. E.g.:

julia> x = randn(5);

julia> x == Array(x)
true

julia> x' == Array(x')
true

julia> x'*x
2.9138488824321147

julia> Array(x') * Array(x)
1-element Array{Float64,1}:
 2.9138488824321147

I would prefer:

julia> x'
5-element DualVector{Float64}:
...

julia> Array(x')
5-element Array{Float64,1}:
...

julia> x' isa AbstractArray
# not sure about that

julia> x' == Array(x')
false

julia> x'*x
2.9138488824321147

julia> Array(x') * Array(x)
ERROR: MethodError: no method matching *(::Array{Float64,1}, ::Array{Float64,1})
...

julia> Matrix(x')
1×5 Array{Float64,2}:
...

julia> Matrix(x')*Array(x)
1-element Array{Float64,1}:
...

But that's a side track for the issue.

@mcabbott
Collaborator

I'm starting to suspect that identifying adjoint vectors with row matrices is a mistake

It certainly creates lots of edge cases... I wasn't around but most other options got discussed in #42

However I wonder whether there should be a lazy Conj wrapper, which does not change size, this would be the natural thing to return here.
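A lazy conjugation wrapper along these lines would keep the parent's shape and only conjugate on access. A minimal sketch (hypothetical, not the LinearAlgebra API; `Conj` and `lazyconj` are illustrative names):

```julia
# Lazy conjugation wrapper: same size as the parent, elements conjugated on read.
struct Conj{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end

Base.size(C::Conj) = size(C.parent)
Base.getindex(C::Conj, I...) = conj(C.parent[I...])

lazyconj(A::AbstractArray) = Conj(A)   # wrap
lazyconj(C::Conj) = C.parent           # conj ∘ conj == identity: unwrap
```

Unlike `Adjoint` or `Transpose` of a vector, `size(lazyconj(x)) == size(x)` always holds, which is the property this issue is asking for.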

@StefanKarpinski
Member

Why not just return an eagerly conjugated vector?

@mcabbott
Collaborator

Someone could be relying on transpose(x)[1,1] = 0 updating x, I guess. Not sure I've ever knowingly done that.

@mbauman
Member

mbauman commented Mar 20, 2020

We did at one point have a lazy Conj wrapper in some of the initial sketch work... I don't recall all the considerations in deciding we didn't need it.

@StefanKarpinski
Member

I don't think being able to reliably mutate the original array through these kinds of mathy APIs is necessary. If you're doing these kinds of operations, you're really implicitly treating the data as immutable mathematical objects rather than mutable containers.

@thisrod
Author

thisrod commented Mar 23, 2020

most other options got discussed in #42

Thanks for that link. As it happens, the code I'm working on was inspired by up and down tuples. (Basic idea: an (n,m)-tensor maps m-dimensional arrays to n-dimensional ones.) I'm going to post it as soon as I tidy up the README, and ask for pointers to whatever wheel I'm reinventing. That thread might have some pointers.

@StefanKarpinski
Member

Redesigning the way dual vectors work is off the table until 2.0, but the fact that the shape of transpose(x)' depends on the element-type of x is not ideal, let's focus on that. I would propose that in the complex case it be modified to produce a conjugated vector (eagerly).

@mbauman
Member

mbauman commented Mar 23, 2020

I feel like sometimes-aliasing behavior is worse than sometimes-column-sometimes-vector.

@StefanKarpinski
Member

Then the alternative would be to re-introduce the lazy Conj wrapper type and return that.

@mbauman
Member

mbauman commented Mar 23, 2020

@andyferris what would you think about bringing back the lazy Conj wrapper? Do you remember the iterations here?

@StefanKarpinski
Member

The nice thing about adding a lazy Conj wrapper is that identity, adjoint, transpose and conj form a group (ℤ₂×ℤ₂); currently we're missing one element from the group.
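For concreteness, the group relations among the three non-identity elements can be checked with the eager `conj`, `transpose`, and `adjoint` functions on a complex matrix:

```julia
A = [1 + 2im  3 - 1im;
     0 + 1im  4 + 0im]

@assert conj(transpose(A)) == A'   # Conj ∘ Transpose == Adjoint
@assert conj(A') == transpose(A)   # Conj ∘ Adjoint == Transpose
@assert transpose(A') == conj(A)   # Transpose ∘ Adjoint == Conj
@assert (A')' == A                 # every element is its own inverse
```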

@thisrod
Author

thisrod commented Mar 24, 2020

To play devil's advocate, there is a general problem with adding things: the number of edge cases increases geometrically with the number of things you add.

If conj was lazy, someone would have to specify the shape of transpose(x)*conj.(y) and so on. As I illustrated above, there is no way to specify that consistently, because the current implementation is not consistent. There might not be a more consistent alternative to making transpose(x)' an eager conj.

This won't be a common issue. As far as I'm concerned, it's fine if you decide this isn't worth fixing, and I add some if eltype(x) <: Real ... elseif eltype(x) <: Complex type logic where needed.
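For what it's worth, the workaround can be written once as a dispatch-based helper instead of branching on `eltype` at every call site. A sketch (`asvector` is a hypothetical name, not a Base function):

```julia
# Flatten the result of transpose(x)' back to a vector regardless of eltype,
# papering over the Real/Complex shape discrepancy from the OP.
asvector(y::AbstractVector) = y     # real case: adjoint already undid the transpose
asvector(y::AbstractMatrix) =       # complex case: a 1-column Adjoint wrapper
    size(y, 2) == 1 ? vec(y) : throw(ArgumentError("expected a column"))

x = randn(3)
@assert asvector(transpose(x)') == x             # real: size (3,)
xc = Complex.(x)
@assert size(asvector(transpose(xc)')) == (3,)   # complex: also size (3,)
```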

@thisrod
Author

thisrod commented Mar 24, 2020

Hi Andy, too :-) Numerical quantum atom optics is obviously a small world.

@andyferris
Member

andyferris commented Mar 24, 2020

Yes, we shouldn't have the behaviour of container properties like size depend on the eltype; these must be orthogonal concerns.

As @mbauman alluded to, one of the early implementations of this had a Conj, which was precisely to make identity, adjoint, transpose and conj into a lazily evaluated group of structure ℤ₂×ℤ₂. I think it's going to be easiest to reason about if the lazy/eager structure follows that group structure - while making transpose(x)' an eager conj(x) might fix the size issue, it doesn't really make that much sense to me (though it might be a worthwhile bugfix for now).

Bringing back Conj would be a good option in my opinion. Note that care needs to be taken to nest the lazy wrappers in the correct order and so on, which is challenging when they end up either inside or outside other data structures, and with more different wrapper types it seemed to make dispatch a little complicated. It also needs plumbing into all the specializations in LinAlg, BLAS, the sparse library, etc. - it was basically a bunch of work that I didn't quite complete (it was taking months and everything was shifting underneath); meanwhile, through superhuman effort, Matt got us over the line for v1 with the current implementation.

As an implementation note - I used to consider one of the ℤ₂ factors to be transpose and the other conj, and that was reflected in my implementation - but I think after "taking matrix transposes seriously" enough I was convinced (thanks Steven) that thinking of the dichotomy as adjoint × conj was more natural/productive, and it would be interesting to see if the implementation is simplified by having just Adjoint and Conj and Conj{<:Adjoint}. I suspect having the rule of always bringing these to the outside, and in that order, might make things easier for packages like static arrays, sparse arrays, etc., to correctly follow the dispatch logic and not make the mistake in the OP about size.

I can't see any other option that lets us use all the BLAS operations of adjoint/transpose/conjugation without extra copies. We might consider reducing laziness to fix size issues but it feels like a regression. Would anyone consider adding Conj and some of the dispatch around that as a breaking change?

Hi Andy, too :-) Numerical quantum atom optics is obviously a small world.

Oh, so you're that Rod, @thisrod. 😆 Hi! Indeed it is.

@StefanKarpinski
Member

Since Adjoint, Transpose and Conj are all of the non-identity elements of the group, I don't quite see what the ordering issue is—any composition of these can always be reduced to a single one, so there's no ordering issue. Having them all be first-class instead of parameterizing one on the other seems like it would be simple enough. What does Conj{<:Adjoint} buy?

@andyferris
Member

andyferris commented Mar 25, 2020

My proposal was to make the group structure explicit - Conj obviously cancels itself and is ℤ₂-like, as is Adjoint. Transpose is just "both" and could become a type alias rather than a struct. I.e. follow the principle of letting two simple things compose rather than worrying about one complex thing.

Ordering them in a specific way is just one way of ensuring two Adjoints always cancel, for example, rather than getting Adjoint{<:Conj{<:Adjoint}} type things to worry about, and you wouldn't want to write special dispatch rules for both Conj{<:Adjoint{<:MyArrayType}} as well as Adjoint{<:Conj{<:MyArrayType}}. Dealing with ordering of lazy wrappers can become interesting with custom array types that may themselves contain arrays, for example, and it's nice to share a convention such as always pushing Adjoint towards the outside of such wrapper containers - which is important in avoiding the issue in the OP since we don't have a trait for "covector".

@StefanKarpinski
Member

StefanKarpinski commented Mar 25, 2020

Maybe I'm not making my point well enough, let me try to be more explicit. In addition to every element being its own inverse, the group has these relations:

  • Adjoint ∘ Conj == Transpose
  • Adjoint ∘ Transpose == Conj
  • Conj ∘ Adjoint == Transpose
  • Conj ∘ Transpose == Adjoint
  • Transpose ∘ Adjoint == Conj
  • Transpose ∘ Conj == Adjoint

There are no compositions of elements that don't reduce to one of the other elements. In particular, this means that you never need to nest these structures unless there's some other intervening abstraction that prevents collapsing them. The elements identity, Adjoint, Conj and Transpose aren't just the generators of the group, they are the whole group: there are no composite elements. In code, you can define

# Collapse rules: any stack of these wrappers reduces to at most one layer
# (assuming a lazy Conj type alongside Adjoint and Transpose):
Base.adjoint(a::Conj) = transpose(a.parent)
Base.adjoint(a::Transpose) = conj(a.parent)
Base.adjoint(a::Adjoint) = a.parent

Base.conj(a::Conj) = a.parent
Base.conj(a::Transpose) = adjoint(a.parent)
Base.conj(a::Adjoint) = transpose(a.parent)

Base.transpose(a::Conj) = adjoint(a.parent)
Base.transpose(a::Transpose) = a.parent
Base.transpose(a::Adjoint) = conj(a.parent)

There should never (unless you explicitly construct one) be a need for an Adjoint{<:Conj} array—that should become just Transpose. You can always collapse down to a single lazy transform. Perhaps you're arguing that instead of having Transpose, we should have Adjoint{<:Conj}. If so, that may be where we're disagreeing/disconnecting: it seems to me simpler, clearer, more symmetrical, more elegant, and to have far fewer (rather than more) corner cases, if all of the non-identity group elements have the same representation as a single lazy operation, rather than one arbitrarily chosen element of the group being represented as a composite.

Dealing with ordering of lazy wrappers can become interesting with custom array types that may themselves contain arrays, for example, and it's nice to share a convention such as always pushing Adjoint towards the outside of such wrapper containers - which is important in avoiding the issue in the OP since we don't have a trait for "covector".

I don't get why ordering is a concern at all: the above shows that there's only ever a single lazy layer if we have all three lazy transforms. Yes, there's an ordering concern when there are intervening abstractions, but we can't see through that because they are, you know, abstract. Those array types themselves have to decide and define appropriate methods to move these lazy operations in or out. But they should always do so using the function forms, adjoint, conj and transpose, specifically so that the above definitions can kick in and reduce a stack of these lazy transforms down to a single layer.

@andyferris
Member

Perhaps you're arguing that instead of having Transpose, we should have Adjoint{<:Conj}.

Yes, concretely that was my suggestion. (We can still export a Transpose type alias to make it less breaking).

If so, that may be where we're disagreeing/disconnecting: it seems to me simpler, clearer, more symmetrical, more elegant, and to have far fewer (rather than more) corner cases, if all of the non-identity group elements have the same representation as a single lazy operation, rather than one arbitrarily chosen element of the group being represented as a composite.

Yes, this is the argument. :) After already attempting what you said during v0.7 development, I was musing that perhaps decomposing the group of 4 into two separate ℤ₂ wrappers would work out easier to implement and reason about, precisely with the major drawback that "one arbitrarily chosen element of the group [is] represented as a composite". Perhaps I am wrong, but Julia is typically great at composing abstractions and I felt it may be worth exploring.

I don't get why ordering is a concern at all: the above shows that there's only ever a single lazy layer if we have all three lazy transforms.

Yes I was specifically addressing the case we remove Transpose; otherwise there is no ordering to consider.

Yes, there's an ordering concern when there are intervening abstractions, but we can't see through that because they are, you know, abstract. Those array types themselves have to decide and define appropriate methods to move these lazy operations in or out.

Yes, this is totally true - though from experience it becomes difficult to cover all the corner cases. The wrapper author might know all about the AbstractArray interface from Base but never encountered (or cared about) LinearAlgebra - is there anything we can do to make this work automatically or without too much work by the package author? Alternatively, would removing Transpose mean that the author only has to special-case two things instead of three? Or would it be more complex?

@thisrod
Author

thisrod commented Mar 26, 2020

While everyone is thinking about this, here's the package I'm working on. I feel like it's ready for other people to look at, and I'd welcome any feedback before I announce it on Discourse.

@StefanKarpinski
Member

StefanKarpinski commented Mar 26, 2020

Using composition is generally a winning approach when it offers a multiplicative effect: we get every combination of things for free instead of having to handle each possible combination. In this case, it gives us exactly one thing "for free". There's no win; it just makes things more complex for no advantage. That's why I keep harping on about how identity, Adjoint, Conj and Transpose are the entire group—because that means there is no multiplicative effect. It's way simpler and easier to just give that one thing its own name, especially since transposes are already a concept that has a name and which people are super familiar with!

As to corner cases, let's consider the two approaches with respect to that:

  • If we express Transpose as Adjoint{<:Conj}, then we have to tell people: Ok, so you only have to handle Adjoint and Conj. Oh, but you have to be careful about ordering. Make sure that if you compose them that you do it in the right order! Which is the right order? Hmm... I don't remember, let me go look that up... Also, are we sure that it's Adjoint and Conj that are first class, or was it Transpose and Conj? Or maybe it was Adjoint and Transpose?...

  • If we have all three of Adjoint, Conj and Transpose then we just tell people: If you handle these three lazy wrappers correctly, then you're done, that covers everything.

Add to that the fact that it's very common that Transpose should be handled specially anyway, and I just don't see what benefit expressing one of these three elements as a composition buys us. It seems to just add complexity with no upside.

@thisrod
Author

thisrod commented May 27, 2020

Has anyone considered this approach?

using LinearAlgebra: dot

struct RestridedView{T,N} <: AbstractArray{T,N}
    parent :: AbstractArray{T}
    dims :: Dims{N}
    strides :: Dims{N}
    offset :: Int
    conjugated :: Bool
end

Base.size(A::RestridedView) = A.dims

# Index linearly into the parent array
function Base.getindex(A::RestridedView, I...)
   x = A.parent[A.offset + dot(A.strides,I)]
   A.conjugated ? conj(x) : x
end

RestridedView(A::AbstractArray) = RestridedView(A, dso(size(A))..., false)

# dims, strides, offset
function dso(d)
    s = cumprod([1, d...])
    s = s[1:end-1]
    d, Tuple(s), 1-sum(s)
end

rv_conj(A::RestridedView) =
    RestridedView(A.parent, A.dims, A.strides, A.offset, !A.conjugated)
rv_conj(A) = rv_conj(RestridedView(A))

rv_permutedims(A::RestridedView, P) =
    RestridedView(A.parent, A.dims[P], A.strides[P], A.offset, A.conjugated)
rv_permutedims(A, P) = rv_permutedims(RestridedView(A), P)

# This only works for vectors, but it can be extended to the general case
Base.getindex(A::RestridedView, I::StepRange) =
    RestridedView(A.parent, (length(I),), (I.step,), I.start-1, A.conjugated)

rv_view(A, I) = RestridedView(A)[I]

rv_reshape(A::AbstractArray, dims) = RestridedView(A, dso(dims)..., false)

I'd have to think more about reshape. It can be absorbed by RestridedView in the special case that strides[j+1] = strides[j]*dims[j] (i.e. the view is contiguous, as in a dense column-major array), and no doubt in some other special cases.
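That contiguity condition can be sketched as a small predicate (`iscontiguous` is a hypothetical helper, and this is an assumed sufficient condition, not a complete treatment of reshape):

```julia
# A reshape can be absorbed into a strided view without copying when each
# stride equals the previous stride times the previous dimension, i.e. the
# view has the same layout as a dense column-major array.
iscontiguous(dims, strides) =
    all(strides[j+1] == strides[j] * dims[j] for j in 1:length(dims)-1)

@assert iscontiguous((3, 4), (1, 3))    # dense column-major: reshape absorbable
@assert !iscontiguous((3, 4), (1, 5))   # padded stride: reshape needs a copy
```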

I'm unsure how best to implement dual vectors.

@KristofferC KristofferC transferred this issue from JuliaLang/julia Nov 26, 2024