Skip to content

A Fast Canonical-Correlation-Based Feature Selection Algorithm

License

Notifications You must be signed in to change notification settings

scikit-learn-contrib/fastcan

Repository files navigation

FastCan: A Fast Canonical-Correlation-Based Feature Selection Algorithm

conda Codecov CI Doc PythonVersion PyPi Black ruff pixi

FastCan is a feature selection method, which has following advantages:

  1. Extremely fast.
  2. Support unsupervised feature selection.
  3. Support multioutput feature selection.
  4. Skip redundant features.
  5. Evaluate relative usefulness of features.

Check Home Page for more information.

Installation

Install FastCan via PyPi:

  • Run pip install fastcan

Or via conda-forge:

  • Run conda install -c conda-forge fastcan

Getting Started

>>> from fastcan import FastCan
>>> X = [[ 0.87, -1.34,  0.31 ],
...     [-2.79, -0.02, -0.85 ],
...     [-1.34, -0.48, -2.55 ],
...     [ 1.92,  1.48,  0.65 ]]
>>> y = [[0, 0], [1, 1], [0, 0], [1, 0]] # Multioutput feature selection
>>> selector = FastCan(n_features_to_select=2, verbose=0).fit(X, y)
>>> selector.get_support()
array([ True,  True, False])
>>> selector.get_support(indices=True) # Sorted indices
array([0, 1])
>>> selector.indices_ # Indices in selection order
array([1, 0], dtype=int32)
>>> selector.scores_ # Scores for selected features in selection order
array([0.91162413, 0.71089547])
>>> # Here Feature 2 must be included
>>> selector = FastCan(n_features_to_select=2, indices_include=[2], verbose=0).fit(X, y)
>>> # We can find the feature which is useful when working with Feature 2
>>> selector.indices_
array([2, 0], dtype=int32)
>>> selector.scores_
array([0.34617598, 0.95815008])

Citation

FastCan is a Python implementation of the following papers.

If you use the h-correlation method in your work please cite the following reference:

@article{ZHANG2022108419,
   title = {Orthogonal least squares based fast feature selection for linear classification},
   journal = {Pattern Recognition},
   volume = {123},
   pages = {108419},
   year = {2022},
   issn = {0031-3203},
   doi = {https://doi.org/10.1016/j.patcog.2021.108419},
   url = {https://www.sciencedirect.com/science/article/pii/S0031320321005951},
   author = {Sikai Zhang and Zi-Qiang Lang},
   keywords = {Feature selection, Orthogonal least squares, Canonical correlation analysis, Linear discriminant analysis, Multi-label, Multivariate time series, Feature interaction},
   }

If you use the eta-cosine method in your work please cite the following reference:

@article{ZHANG2025111895,
   title = {Canonical-correlation-based fast feature selection for structural health monitoring},
   journal = {Mechanical Systems and Signal Processing},
   volume = {223},
   pages = {111895},
   year = {2025},
   issn = {0888-3270},
   doi = {https://doi.org/10.1016/j.ymssp.2024.111895},
   url = {https://www.sciencedirect.com/science/article/pii/S0888327024007933},
   author = {Sikai Zhang and Tingna Wang and Keith Worden and Limin Sun and Elizabeth J. Cross},
   keywords = {Multivariate feature selection, Filter method, Canonical correlation analysis, Feature interaction, Feature redundancy, Structural health monitoring},
   }