
Add QR algorithm #472

Open
Aweptimum wants to merge 10 commits into master

Conversation

Aweptimum (Contributor)

While investigating the problems in #468, I decided to try implementing the QR algorithm to have an alternative way of getting the eigenvectors.

This is still a work in progress, but it adds:

  1. NumericMatrix::upperHessenberg - converts a matrix to upper Hessenberg form using Householder reflections. I only have one test case, but it implicitly passes every Eigenvalue::qrAlgorithm test case, so that's neat.
  2. Eigenvalue::qrAlgorithm - finds the eigenvalues of a matrix using the QR algorithm, which follows the same principle as the power method. It relies on upperHessenberg. I implemented it with the Wilkinson shift strategy and deflation, so when the lowest diagonal element converges it recurses on the submatrix that excludes that diagonal entry (a sketch of the shift step follows this list). It's pretty fast: none of the test cases take more than ~100 iterations to converge. All tests pass.
  3. Eigenvector::qrAlgorithm - this just takes the output of the above method and feeds it to Eigenvector::eigenvectors. The problem is that it fails in cases where the eigenvalues are correct, so I'm pretty sure something is wrong in the eigenvectors method. There is a way to solve directly for the eigenvectors using the QR algorithm, but I haven't fully understood it yet.
  4. Adds failing test cases from Add Moore-Penrose Inverse #468.
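For context on item 2: each iteration picks a shift μ from the trailing 2×2 block of the Hessenberg matrix H, factors H − μI = QR, forms the next iterate RQ + μI, and deflates the last row and column once the trailing subdiagonal entry is numerically zero. Below is a rough sketch of the shift selection only, as a hypothetical standalone helper (not necessarily how the PR's calcShift is written): the Wilkinson shift is the eigenvalue of the trailing 2×2 block that is closest to its bottom-right entry.

<?php
// Hypothetical helper, for illustration only: given the trailing 2x2 block
// [[a, b], [c, d]] of the Hessenberg matrix, return the eigenvalue of that
// block closest to d (the Wilkinson shift).
function wilkinsonShift(float $a, float $b, float $c, float $d): float
{
    $trace = $a + $d;
    $det   = $a * $d - $b * $c;
    $disc  = ($trace * $trace) / 4 - $det;
    if ($disc < 0) {
        // Complex conjugate pair; fall back to a simple Rayleigh-quotient shift.
        return $d;
    }
    $root = sqrt($disc);
    $mu1  = $trace / 2 + $root;
    $mu2  = $trace / 2 - $root;
    return abs($mu1 - $d) <= abs($mu2 - $d) ? $mu1 : $mu2;
}
// One shifted QR step would then be: factor (H - mu*I) = QR, set H = RQ + mu*I,
// and deflate when the last subdiagonal entry of H is (numerically) zero.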

I'm not sure how you'd want to expose Eigenvector::qrAlgorithm in the public API, so I haven't worked on that. Might need a refactor similar to Eigenvector

Comment on lines +1511 to +1523
yield [
    'A' => [
        [3, 0, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 2, 0],
        [0, 1, 0, 1],
    ],
    'H' => [
        [3, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 2],
    ]
markrogoyski (Owner)

Question about Hessenberg matrices: Is there only one upper Hessenberg matrix for any matrix, or are there multiple possible Hessenberg matrices that satisfy the criteria?

I tried this test case in R and SciPy and got a slightly different answer.

R

> A <- rbind(c(3, 0, 0, 0), c(0, 1, 0, 1), c(0, 0, 2, 0), c(0, 1, 0, 1))
> hb <- hessenberg(A)
> hb
$H
     [,1]          [,2]          [,3]          [,4]
[1,]    3  0.000000e+00  0.000000e+00  0.000000e+00
[2,]    0  1.000000e+00 -1.000000e+00  2.220446e-16
[3,]    0 -1.000000e+00  1.000000e+00 -5.551115e-16
[4,]    0  2.220446e-16 -4.440892e-16  2.000000e+00

$P
     [,1] [,2]          [,3]          [,4]
[1,]    1    0  0.000000e+00  0.000000e+00
[2,]    0    1  0.000000e+00  0.000000e+00
[3,]    0    0  2.220446e-16 -1.000000e+00
[4,]    0    0 -1.000000e+00  2.220446e-16

Python

In [1]: import numpy as np
In [2]: from scipy.linalg import hessenberg

In [10]: A = np.array([[3, 0, 0, 0], [0, 1, 0, 1], [0, 0, 2, 0], [0, 1, 0, 1]])

In [11]: A
Out[11]:
array([[3, 0, 0, 0],
       [0, 1, 0, 1],
       [0, 0, 2, 0],
       [0, 1, 0, 1]])

In [12]: H = hessenberg(A)

In [13]: H
Out[13]:
array([[ 3.,  0.,  0.,  0.],
       [ 0.,  1., -1.,  0.],
       [ 0., -1.,  1.,  0.],
       [ 0.,  0.,  0.,  2.]])

This online calculator however gets the same answer as you have here.

I'd like to help with generating test cases, but I don't know enough about it so I want to make sure I understand how it works before offering help.

Thanks,
Mark

Aweptimum (Contributor, Author)

I think the values are unique; the signs are not. In addition to the upper Hessenberg form, the algorithm also computes a permutation matrix P such that A = PHPᵀ. Wolfram returns the Hessenberg form with the added context of the permutation matrix, and multiplying accordingly gives back the original matrix.

In the NumericMatrix::upperHessenberg method, the P matrix calculated is exactly the same as the one from R, Python, and Wolfram; it just has different signs:

[1, 0, 0, 0]
[0, 1, 0, 0]
[0, 0, 0, 1]
[0, 0, 1, 0]

Multiplying out PHPᵀ gives the original matrix as well.

So it might be better to return both matrices for context rather than just the upper Hessenberg matrix.
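A quick way to double-check that relationship with the matrices above is to multiply P·H·Pᵀ and compare against A. A minimal sketch using the library (assuming MathPHP's MatrixFactory::create(), multiply(), transpose(), and getMatrix(); this isn't code from the PR):

<?php
use MathPHP\LinearAlgebra\MatrixFactory;

// A, H, and P from the test case and comment above.
$A = MatrixFactory::create([[3, 0, 0, 0], [0, 1, 0, 1], [0, 0, 2, 0], [0, 1, 0, 1]]);
$H = MatrixFactory::create([[3, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 2]]);
$P = MatrixFactory::create([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]);

// P H Pᵀ swaps rows/columns 3 and 4 of H, which recovers the original A.
$PHPt = $P->multiply($H)->multiply($P->transpose());
print_r($PHPt->getMatrix());   // [[3,0,0,0], [0,1,0,1], [0,0,2,0], [0,1,0,1]]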

Also, I had originally tried the other method of getting the matrix into upper Hessenberg form using Givens rotations. It seems more widely used than the Householder method, and I think that's because it allows for parallelization and works better on sparse matrices. This particular matrix needed (I think) a planar rotation of π/2 to zero out the 1 at (4, 2), so the Givens rotation block was just:

[cos(π/2), -sin(π/2)]    =    [0, -1]
[sin(π/2),  cos(π/2)]         [1,  0]

All that to say, I wouldn't be surprised if the Givens method introduced negative signs to both P and H
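As a quick standalone sanity check (plain PHP, not the PR's code), applying that π/2 rotation in the (3, 4) plane as a similarity transform G·A·Gᵀ does reproduce the Hessenberg form with the signs R and SciPy report above:

<?php
// Naive n x n matrix product, just for this check.
function matmul(array $X, array $Y): array
{
    $n = count($X);
    $Z = array_fill(0, $n, array_fill(0, $n, 0.0));
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            for ($k = 0; $k < $n; $k++) {
                $Z[$i][$j] += $X[$i][$k] * $Y[$k][$j];
            }
        }
    }
    return $Z;
}

$A = [[3, 0, 0, 0], [0, 1, 0, 1], [0, 0, 2, 0], [0, 1, 0, 1]];

// Rotation by pi/2 in the (3, 4) plane: cos = 0, sin = 1.
$G  = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]];
$Gt = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, -1, 0]];

print_r(matmul(matmul($G, $A), $Gt));
// [[3, 0, 0, 0], [0, 1, -1, 0], [0, -1, 1, 0], [0, 0, 0, 2]]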

Aweptimum (Contributor, Author) commented Nov 3, 2023

So the Eigenvector::qrAlgorithm test failures aren't actually that bad. The first two failures return the same eigenvectors, but the columns are swapped around. The other two failures return repeated eigenvectors, but those are cases where the eigenvalues are repeated. I think this is because I'm feeding the eigenvalues directly into the eigenvectors method. It looks like the method tries to deal with duplicates, but I'm not sure whether I need to change the eigenvalue input to match its expectations.

Given:
[2, 0, 1]
[2, 1, 2]
[3, 0, 4]

Eigenvalues: 5, 1, 1

Expected eigenvectors:
[0.26726124191242, 0, 0.70710678118655]
[0.53452248382485, 1, 0]
[0.80178372573727, 0, -0.70710678118655]

Actual:
[0.26726124191242, 0, 0]
[0.53452248382485, 1, 1]
[0.80178372573727, 0, 0]
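Since only the column order differs in the first two failures, one way to sanity-check results independent of ordering is a residual test A·v ≈ λ·v on each returned column. A small standalone sketch (plain PHP with a hypothetical isEigenpair helper, not library code):

<?php
// Returns true when A·v ≈ λ·v, i.e. v is an eigenvector of A for eigenvalue λ.
function isEigenpair(array $A, array $v, float $lambda, float $tol = 1e-9): bool
{
    foreach ($A as $i => $row) {
        $Av_i = 0.0;
        foreach ($row as $j => $a_ij) {
            $Av_i += $a_ij * $v[$j];
        }
        if (abs($Av_i - $lambda * $v[$i]) > $tol) {
            return false;
        }
    }
    return true;
}

$A = [[2, 0, 1], [2, 1, 2], [3, 0, 4]];
var_dump(isEigenpair($A, [0.26726124191242, 0.53452248382485, 0.80178372573727], 5));  // true
var_dump(isEigenpair($A, [0.70710678118655, 0, -0.70710678118655], 1));                // true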

@Beakerboy what sort of input does the eigenvectors method expect for the $eigenvalues variable? I thought I'd try only giving it unique eigenvalues but it complains the length of the array doesn't match the length of the diagonal.

Beakerboy (Contributor) commented Nov 6, 2023

@Aweptimum According to the source, it's an array of floats and is the same size as the matrix. Duplicated eigenvalues should be listed multiple times. When you say "it complains", what is the error that is reported?

Aweptimum (Contributor, Author)

I just called array_unique on the eigenvalues returned from the QR algorithm and passed them in. The error is at line 72 in Eigenvectors: MathPHP\Exception\MatrixException: Matrices have different number of rows
The identity matrix is constructed using the length of the $eigenvalues array, so I figured it might be intended that the duplicates be passed in.

Beakerboy (Contributor)

Yeah, that's what I said: I think the duplicates should be passed in, so don't call array_unique.

Aweptimum (Contributor, Author) commented Nov 6, 2023

Yeah, I meant to communicate that it was my assumption, and I'm glad to have your affirmation. I'm having a hard time writing today.

I guess there's some weird edge case in the eigenvectors method when the supplied eigenvalues contain repeats. I put the failing eigenvector cases (#4 and #5 in dataProviderForEigenvector) in the eigenvalue test and they pass. The baffling part is that the identity matrix case, where the eigenvalues are all 1s, returns the correct eigenvectors.

Beakerboy (Contributor)

From scanning the code, it looks like the algorithm is supposed to return two orthogonal vectors when it sees the first “1” in the eigenvalue list. Then, when it is generating the return matrix, it will sort them so that one of them is assigned to the position of the first “1” and the other to the position of the second “1”. It’s been years since I wrote this, so I have no idea what might be going on.

Aweptimum (Contributor, Author) commented Nov 6, 2023

It's OK, I found the culprit! It's this line here.
The floating-point comparison fails in every case (except the identity matrix one), so $key is always false when it isn't meant to be. I rewrote it to this:

$found = \array_column($solution_array, 'eigenvalue');
$filtered = \array_filter($found, fn($v) => Arithmetic::almostEqual($v, $eigenvalue, $A->getError()));
$key = array_key_first($filtered);
if ($key === null) {
    // ... solve
}

And it returns the correct solutions for the repeats. I think I'll submit this patch as a separate PR so Mark can just merge it, although array_key_first was introduced in PHP 7.3 so I'll have to use something else.
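For the pre-7.3 replacement, a loop-based variant of the same check might look like the sketch below (assuming the same surrounding variables $solution_array, $eigenvalue, and $A as the snippet above; not the final patch):

// Find an already-solved eigenvalue by approximate comparison instead of
// array_search's strict float matching; $key stays false when there is no match.
$key = false;
foreach ($solution_array as $i => $solution) {
    if (Arithmetic::almostEqual($solution['eigenvalue'], $eigenvalue, $A->getError())) {
        $key = $i;
        break;
    }
}
if ($key === false) {
    // ... solve
}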

array_search seems to fail in most cases when looking for a float in an array of floats. And even if it does find a match, if the key is 0, PHP evaluates `!0` to true.

To find a match, we can instead loop through and compare the numbers with `Arithmetic::almostEqual` and then explicitly check if `$key === false`.
Converts a matrix to upper Hessenberg form using Householder reflections.
The Wilkinson shift accelerates the rate of convergence of the QR algorithm - it's in the static calcShift helper.

Additionally, moved the iteration to an inner helper function to hide the $values parameter from the public method.
Just gets the eigenvalues from the QR algorithm and feeds them to the eigenvectors method.
They work, so the problem must be in the eigenvectors method.
Aweptimum (Contributor, Author)

I have found another problem: the original matrix that produces a pseudo-inverse failure also produces an rref failure.

If you take this guy and pass it to the eigenvectors method with its eigenvalues:

[2, 0, 0, 0, 0, 0]
[0, 2, 0, 0, 0, 1729.7]
[0, 0, 2, 0, -1729.7, 0]
[0, 0, 0, 0, 0, 0]
[0, 0, -1729.7, 0, 2828785.69, 0]
[0, 1729.7, 0, 0, 0, 2828785.69]

The first step the eigenvector method does when it loops over the given eigenvalues is to subtract the largest eigenvalue (times the identity) from the matrix (2828791.05765 in the first loop) and then take the rref. The rref returned by the matrix method on line 74 is this:

[1, 0, 0, 0, 0, 0]
[0, 1, 0, 0, 0, -0.00061146305530409]
[0, 0, 1, 0, 0.00061146305530409, 0]
[0, 0, 0, 1, 0, 0]
[0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0]

Whereas Wolfram and this random rref calculator I found produce an identity matrix (the latter using the scaled-down matrix from the eigenvectors method).

I'll add the matrix as a test case to the ReducedRowEchelonFormTest and submit a separate PR with it so anyone can look at what's going on
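For anyone who wants to poke at it, the step described above can be reproduced outside the Eigenvector code with something like the sketch below (assuming MathPHP's MatrixFactory::create()/identity(), subtract(), scalarMultiply(), rref(), and getMatrix(); it also ignores the scaling the real method applies):

<?php
use MathPHP\LinearAlgebra\MatrixFactory;

$A = MatrixFactory::create([
    [2, 0, 0, 0, 0, 0],
    [0, 2, 0, 0, 0, 1729.7],
    [0, 0, 2, 0, -1729.7, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, -1729.7, 0, 2828785.69, 0],
    [0, 1729.7, 0, 0, 0, 2828785.69],
]);

// First loop of the eigenvectors method: shift by the largest eigenvalue
// (the value quoted above) and row-reduce A - λI.
$lambda = 2828791.05765;
$T      = $A->subtract(MatrixFactory::identity(6)->scalarMultiply($lambda));
print_r($T->rref()->getMatrix());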

Aweptimum mentioned this pull request Nov 8, 2023
Aweptimum (Contributor, Author) commented Feb 23, 2024

OK, it's been a bit, but I found the real problem, and I'm not sure what to do about it.

In the above test case, when it gets to the 3rd eigenvalue, the resulting RREF is an identity matrix.
This causes the algorithm to die.

The matrix copied from above:

[2, 0, 0, 0, 0, 0]
[0, 2, 0, 0, 0, 1729.7]
[0, 0, 2, 0, -1729.7, 0]
[0, 0, 0, 0, 0, 0]
[0, 0, -1729.7, 0, 2828785.69, 0]
[0, 1729.7, 0, 0, 0, 2828785.69]

The eigenvalues: 2828786.747649, 2828786.747649, 2, 0.94235235527, 0.94235235527, 0

The offending eigenvalue, 2, produces this $T (A-λI) matrix (after everything is scaled down by the largest matrix element):

[3.7388808264281E-7, 0, 0, 0, 0, 0]
[0, 3.7388808264281E-7, 0, 0, 0, 0.00061146378324616]
[0, 0, 3.7388808264281E-7, 0, -0.00061146378324616, 0]
[0, 0, 0, -3.3312906859285E-7, 0, 0]
[0, 0, -0.00061146378324616, 0, 0.99999966687093, 0]
[0, 0.00061146378324616, 0, 0, 0, 0.99999966687093]

I plugged this into Wolfram and asked for the rref. It also gives an identity.

So nothing is wrong with the RREF implementation or the eigenvalue implementation. However, Wolfram can solve for the eigenvectors of the original matrix, so the eigenvector procedure must be missing something - there is either a way to recover when the RREF doesn't point to a solution, or an alternative method of getting the eigenvectors.

Edit: @Beakerboy I don't know if you'd know anything about this, but any thoughts? Do you know of a resource to look at?

markrogoyski (Owner)

Hi @Aweptimum,

Thanks for your continued work on this.

I'm presuming the eigenvalue calculation is correct. If that is the case, and we think the eigenvector calculation is the issue, are you able to reproduce the issue with a simpler matrix, or another matrix, which might help identify what characteristics are causing the issue?

Alternatively, is there a more robust method of calculating eigenvectors that we could be using?

Aweptimum (Contributor, Author)

The eigenvalue calculation is correct, yes - I have the values verified by NumPy + Wolfram + a professor who has MATLAB.
I'm not sure how to reproduce it with a simpler matrix :/ but I can try some ideas.

I'm not sure there's a more "robust" way, because the eigenvector algorithm is just solving (A - λI)x = 0 directly.
That said, the professor I tried getting help from just pointed me at the MATLAB docs for eig, which uses the QZ algorithm in the general case.

The QZ algorithm is analogous to the QR algorithm, but it uses the results of the QR algorithm to form what's called the Schur decomposition, then solves the system using the decomposition rather than the original matrix, and then transforms the eigenvectors back to match the original matrix.

The original paper is behind a paywall and the only implementations I can find are in C or LAPACK (and I can read neither). However, there is a NASA paper someone wrote on extending the algorithm that might be plain enough to work from: https://ntrs.nasa.gov/citations/19730018792

Beakerboy (Contributor)

I thought I’d at least chime in to say that I have no idea how to help, but from a quick scan of the NASA paper it looks like there’s enough direction that a well-motivated individual could just follow the provided steps and implement it. I am not a mathematician; most of my contributions here were just implementing algorithms from Wikipedia, textbooks, or other software packages in PHP, just like this case.

Aweptimum (Contributor, Author)

Sorry it's been a while, but my previous employer laid me off a few months ago. I was allowed to contribute to math-php on that job, but my current job no longer touches PHP (and probably never will).

I can no longer work on my PRs, basically :(

markrogoyski (Owner)

@Aweptimum,

Thank you for your work up till now. We appreciate it. I'll continue with what you've started.

Thanks again.
Mark
