Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for low-rank constraint on symmetric matrices #2197

Open
blegat opened this issue Jun 8, 2023 · 0 comments · May be fixed by #2198
Open

Support for low-rank constraint on symmetric matrices #2197

blegat opened this issue Jun 8, 2023 · 0 comments · May be fixed by #2198
Labels
Type: Set Request Request for a new set
Milestone

Comments

@blegat
Copy link
Member

blegat commented Jun 8, 2023

Semidefinite programs are challenging to solve and so it's crucial to try to exploit as much structure as possible to solve them.
This makes it challenging when writing a generic SDP solver since it seems tailored implementation would always be faster.
That's where the extensibility of MOI/JuMP can actually be really helpful.
One particular structure is that the matrices A_i in the SDPA format are actually low-rank.
This is the case for instance in the MAX-CUT problem where they have rank 1 and in Sum-of-Squares (in particular with specific bases such as the interpolation basis).
Some solvers already support already supports this:

If we do have it part of the interface, the support might also appear in other solvers as well (@mariohsouto and @joaquimg told me it would be easy to add in ProxSDP.jl for instance).

This is actually supported by YALMIP as detailed in https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf. YALMIP even does not compute matrix products and keep then in symbolic form so that the user does not even have to specify the structure. But let's assume that the user give this explicitly for now.

I guess two important questions to see if it's worth it are:

  1. Is that improving the performance a lot ?
  2. Is that easy to add support for this ?

I'll try to convince you that it's "Yes and Yes".

About 1., it is improving performance quite a lot and it's been reported in numerous places. For instance, https://www.diva-portal.org/smash/get/diva2:471330/FULLTEXT01.pdf says:

To be fair, it should be pointed out that the theoretical idea exploited in this paper was incorporated already in LMI Lab (Gahinet et al. 1995), which was one of the first public implementations of an SDP solver, tailored specifically for control problems. However, many years of active research in the SDP field has led to much more efficient algorithms, and it is our goal to leverage on this. A major reason for pursuing the idea in this paper is that it is slightly embarrassing for people coming from the SDP field, that the now over 15 year old LMI Lab solver actually outperforms state-of-the-art general purpose SDP solvers on some small- to medium-scale control problems.

About 2., tl;dr; I detail below different natural ideas and the pros and cons, a shorter version is that #2198 is the one I think we should go for

We have two options, defining a new set or defining a new function. We want to avoid defining new functions as much as possible so let's first try by defining a new set.
A first possibility is to merge the PSD set and the low rank constraints together.
So we could have the constraint that sum_i <A_i, X_i> = b_i where <., .> is the Frobenius inner product of symmetric matrices and A_i are in LDLT form.

struct LDLT{T}
    L::Matrix{T}
    diagonal::Vector{T}
end
struct LowRankFrobeniusEqualTo{T}
    matrices::Vector{LDLT{T}}
    constant::T
end

And then you would do

X = Vector{MOI.VariableIndex}[]
for ...
    Xi, cXi = MOI.add_constrained_variables(model, MOI.PositiveSemidefiniteConeTriangle(...))
    push!(X, Xi)
end
MOI.add_constraint(model, MOI.VectorOfVariables(reduce(vcat, X)), MOI.LowRankFrobeniusEqualTo(A, 1.0))

And then of course, there would be a bridge from F-in-LowRankFrobeniusEqualTo into F-in-EqualTo.

There are a few things to think about here.

What if the user gives an invalid vector of variables ?

The vector of variables need to be a concatenation of PSD matrices of the right size for solvers to work.
If a user gives a vector of variables that isn't like that, we should bridge it into EqualTo so that the solver does work but there is no way to see that from the constraint type so we won't be able to see that when bridging and we will pass the constraint as is to the solver.
I think here we should simply not complicate things for a use case that does not exists. The users is not allowed to use non-SDP variables in this set similarly to the Indicator case where the first variable has to be binary. If he does not do that, he should expect an error.

What about variable bridges ?

If a variable of the VectorOfVariables get variable-bridged then it will turn into a VectorAffineFunction which won't be supported anymore by the solver so it will then turn into a ScalarAffineFunction-in-EqualTo.
This is actually a good thing because it means the vector of variables wasn't a concatenation of PSD variables so no worry to have on this side.

What if we dualize ?

We of course want this to work with Dualization so what's the dual of this set ? Just like EqualTo, this is not a cone so it does not really have a dual. However, we can make it work by implementing Dualization._dual_set so that one variable is added to the dual and then in MOI.Utilities.dot_coefficients, we can multiply this variables by the matrices A_i, expressed in symmetric-vectorized for (without the low-rank structure) so that it would be added in the VectorAffineFunction-in-PSDCone along with the dual of the other constraints of each PSD variable. The sad part of this story is that we lost the low-rank structure.

In the dual, to keep the low-rank structure, we would need something like (assuming the PSD matrix only appears on LowRankFrobeniusEqualTo constraints and not in any EqualTo constraint)

In the dual, to keep the low-rank structure, we would need something like

sum y_i A_i is PSD

where the A_i are low-rank.
The sum y_i A_i part would be represented by a VectorAffineFunction but how do we represent the sum z_i B_i ?
Maybe with a set

struct SumLowRankPSD
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end

Wait, isn't that a convex cone ? So why don't we use the dual of that in standard conic form so that it's easy to dualize ?

Ok, so according to [Rockafellar "Convex Analysis", Corollary 16.3.2], dual(adjoint(M), S) = preimage(M, dual(S)).
So here we have M(y) = sum z_i A_i and <M(y), Q> = sum y_i <A_i, Q> + <x, Q> = <y, (<A_1, Q>, ..., <A_m, Q>)) so the dual set is

struct LowRankFrobeniusOfPSD
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end

which is defined as the set of vectors of the form (<matrices[1], Q>, ..., <matrices[m], Q>) where Q is PSD.

Here, that's preventing Q to be used as scalar product with something else than low-rank equality constraints because Q is not part of the set itself!
If we want to allow Q to be used in classical ScalarAffineFunction-in-Equalto then we need to consider that the dual will become

sum y_i A_i + sum z_i B_i is PSD

where the A_i are low-rank and the B_i are not low-rank.
The sum z_i B_i part would be represented by a VectorAffineFunction but how do we represent the sum z_i B_i` ?
Maybe with a set

struct PlusSumLowRankPSD
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end

Where the first length(matrices) dimensions would be y and the last dimensions would be sum z_i B_i.

Wait, so the first dimensions should be VectorOfVariables and the last ones should be a VectorAffineFunction, isn't that an issue ? No, we are already doing that for IndicatorSet.

Again, according to [Rockafellar "Convex Analysis", Corollary 16.3.2], dual(adjoint(M), S) = preimage(M, dual(S)).
So here we have M(y, x) = x + sum y_i A_i and <M(y, x), Q> = sum y_i <A_i, Q> + <x, Q> = <(y, x), ((<A_i, Q>)_i, Q) so the set is

struct LowRankFrobeniusAndPSD
    side_dimension::Int
    matrices::Vector{LDLT{T}}
end

which is defined as the set of vectors of the form (<matrices[1], Q>, ..., <matrices[m], Q>, Q) where Q is PSD. Note that this time we include Q at the end.

So I think there are two options here.

  • The first one is the pair of dual cones PlusSumLowRankPSD and LowRankFrobeniusAndPSD. It's good because it also allows Q to be used outside but if a solver only supports only matrices with pure low-rank or matrices with no low-rank then it won't know how to communicate it with supports_constraint.
  • The second one LowRankFrobeniusOfPSD and SumLowRankPSD is simpler and allows solvers that only support that to communicate it but we should check which solver is in this case and also that does not really allow Q to be used in the objective. So even the MAX-CUT problem which has n rank-1 constraints on one nxn PSD matrix and a linear objective wouldn't fall into that category because of the objective. The dual version of this is that the symmetric matrix of the objective needs to be in the second part of the constants field of the VectorAffineFunction in the PlusSumLowRankPSD constraint.

So I would probably start experimenting with the first option and see what happens.

@blegat blegat linked a pull request Jun 8, 2023 that will close this issue
20 tasks
@odow odow added this to the v1.x milestone Oct 24, 2023
@odow odow added the Type: Set Request Request for a new set label Oct 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Set Request Request for a new set
Development

Successfully merging a pull request may close this issue.

2 participants