Use scipy's decompositions in PCA. (#1080)
* Replace numpy with scipy decomps.

* add changelog entry

---------

Co-authored-by: Sebastian Raschka
fkdosilovic and rasbt authored Mar 31, 2024
1 parent 8592cc7 commit 63a2655
Showing 2 changed files with 6 additions and 4 deletions.
3 changes: 2 additions & 1 deletion docs/sources/CHANGELOG.md
@@ -31,7 +31,8 @@ The CHANGELOG for the current development version is available at

 ##### Changes

-- Add `n_classes_` attribute to stacking classifiers for compatibility with scikit-learn 1.3 ([#1091](https://github.com/rasbt/mlxtend/issues/1091)
+- Add `n_classes_` attribute to stacking classifiers for compatibility with scikit-learn 1.3 ([#1091](https://github.com/rasbt/mlxtend/issues/1091))
+- Use Scipy's instead of NumPy's decompositions in PCA for improved accuracy in edge cases ([#1080](https://github.com/rasbt/mlxtend/issues/1080) via [[fkdosilovic](https://github.com/rasbt/mlxtend/issues?q=is%3Apr+is%3Aopen+author%3Afkdosilovic)])



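The CHANGELOG entry describes the change as swapping NumPy's decompositions for SciPy's. The sketch below, using made-up random data, puts the old and new SVD calls side by side. NumPy's `numpy.linalg.svd` is documented as using LAPACK's `gesdd` routine, and `scipy.linalg.svd` also defaults to `lapack_driver="gesdd"`, so the commit's explicit `lapack_driver="gesvd"` selects a different LAPACK routine. The entry does not say which edge cases it targets, so this example only checks that the calls agree on an ordinary, well-conditioned matrix; it does not reproduce those edge cases.

```python
import numpy as np
import scipy.linalg

# Illustrative data only (an assumption of this sketch): 20 samples, 5 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 5))
X_centered = X - X.mean(axis=0)

# Before the commit: NumPy's SVD (LAPACK gesdd under the hood).
_, s_numpy, _ = np.linalg.svd(X_centered.T)

# After the commit: SciPy's SVD with the gesvd driver requested explicitly.
_, s_gesvd, _ = scipy.linalg.svd(X_centered.T, lapack_driver="gesvd")

# SciPy's default driver ("gesdd"), for comparison.
_, s_gesdd, _ = scipy.linalg.svd(X_centered.T)

print(np.allclose(s_numpy, s_gesvd))  # True
print(np.allclose(s_gesdd, s_gesvd))  # True
```

Singular values are unique, so comparing them across drivers is safe; the singular vectors can differ in sign between routines, which is why the sketch does not compare `u` directly.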
7 changes: 4 additions & 3 deletions mlxtend/feature_extraction/principal_component_analysis.py
@@ -7,6 +7,7 @@
 # License: BSD 3 clause

 import numpy as np
+import scipy as sp

 from .._base import _BaseModel

@@ -152,14 +153,14 @@ def _covariance_matrix(self, X):

     def _decomposition(self, mat, n_samples):
         if self.solver == "eigen":
-            e_vals, e_vecs = np.linalg.eig(mat)
+            e_vals, e_vecs = sp.linalg.eig(mat)
         elif self.solver == "svd":
             # Only SVD on mean centered data is equivalent to
             # PCA via covariance matrix (note that computing
-            # the covariance matrix will implicitely center
+            # the covariance matrix will implicitly center
             # the data)
             mat_centered = mat - mat.mean(axis=0)
-            u, s, v = np.linalg.svd(mat_centered.T)
+            u, s, _ = sp.linalg.svd(mat_centered.T, lapack_driver="gesvd")
             e_vecs, e_vals = u, s
             e_vals = e_vals**2 / (n_samples - 1)
             if e_vals.shape[0] < e_vecs.shape[1]:
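The comment in the `svd` branch states that SVD of the mean-centered data is equivalent to PCA via the covariance matrix. A minimal sketch of that equivalence follows. `pca_eigen` and `pca_svd` are hypothetical standalone helpers, not part of mlxtend; they mirror the two branches of `_decomposition` shown above (`scipy.linalg.eig` on the covariance matrix, and `scipy.linalg.svd(..., lapack_driver="gesvd")` on the transposed centered data with `e_vals = s**2 / (n_samples - 1)`). Using `np.cov` (whose default denominator is `n_samples - 1`, matching the `svd` branch) and sorting the eigenpairs in descending order are assumptions of this sketch; the hunk does not show how mlxtend builds the covariance matrix or orders the components.

```python
import numpy as np
import scipy.linalg


def pca_eigen(X):
    """Eigenpairs of the sample covariance matrix, sorted by descending eigenvalue."""
    cov = np.cov(X, rowvar=False)  # centers X and divides by n_samples - 1
    e_vals, e_vecs = scipy.linalg.eig(cov)
    # eig returns complex eigenvalues; the covariance matrix is symmetric,
    # so the imaginary parts are zero and only the real parts are kept.
    e_vals, e_vecs = e_vals.real, e_vecs.real
    order = np.argsort(e_vals)[::-1]
    return e_vals[order], e_vecs[:, order]


def pca_svd(X):
    """The same quantities from the SVD of the mean-centered data."""
    n_samples = X.shape[0]
    X_centered = X - X.mean(axis=0)
    u, s, _ = scipy.linalg.svd(X_centered.T, lapack_driver="gesvd")
    # Singular values are returned in descending order already.
    e_vals = s**2 / (n_samples - 1)
    return e_vals, u


# Illustrative correlated data: 50 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))

vals_eig, vecs_eig = pca_eigen(X)
vals_svd, vecs_svd = pca_svd(X)

# Eigenvalues agree; eigenvectors agree up to sign, so compare the
# absolute cosine similarity of matching columns.
print(np.allclose(vals_eig, vals_svd))  # True
print(np.allclose(np.abs(np.sum(vecs_eig * vecs_svd, axis=0)), 1.0))  # True
```

Eigenvectors are only defined up to sign (and both routines return unit-norm vectors), so the check compares absolute column-wise dot products rather than the vectors themselves.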
