Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is an unsupervised dimensionality reduction technique. The goal of PCA is to project the dataset onto a lower-dimensional space while preserving as much of the variance of the dataset as possible.

PCA can be performed in 5 steps:

1. Subtract the mean of each variable
2. Calculate the Covariance Matrix
3. Compute the Eigenvalues and Eigenvectors
4. Sort Eigenvectors by corresponding Eigenvalues in descending order and select a subset from the rearranged Eigenvector matrix
5. Recast data along the principal components

1. Subtract the mean of each variable

First, subtract the mean of each variable from the dataset so that the data is centered on the origin. Most PCA implementations, including Scikit-Learn's, center the data by default but do not rescale it (for example, to unit variance or to the range 0 to 1).
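As a minimal sketch of the centering step, assuming NumPy and a small made-up dataset (the array `X` and its values are hypothetical, with one row per sample and one column per variable):

```python
import numpy as np

# Toy dataset, invented for illustration: 5 samples, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

# Mean of each variable (column), subtracted from every sample.
mean = X.mean(axis=0)
X_centered = X - mean

print(mean)                      # per-variable means
print(X_centered.mean(axis=0))   # approximately zero after centering
```

After this step every column of `X_centered` has mean zero, which is what the later covariance computation relies on.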

2. Calculate the Covariance Matrix

After centering the dataset, the covariance matrix is calculated. The covariance matrix is a symmetric square matrix whose entries give the covariance between each pair of variables. Since the covariance of a variable with itself is its variance, the main diagonal (top left to bottom right) contains the variance of each original variable.

The covariance tells us how pairs of variables vary together. If the covariance is positive, the two variables tend to increase and decrease together. If, on the other hand, the covariance is negative, one variable tends to increase as the other decreases.
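The covariance matrix of centered data can be sketched as follows, again assuming NumPy and the same hypothetical toy dataset. The sample covariance (dividing by the number of samples minus one) is an assumption here; it matches the default of `np.cov`.

```python
import numpy as np

# Toy dataset, invented for illustration: 5 samples, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

X_centered = X - X.mean(axis=0)
n_samples = X_centered.shape[0]

# Sample covariance matrix: (X_c^T X_c) / (n - 1), shape (2, 2).
cov = X_centered.T @ X_centered / (n_samples - 1)

print(cov)
# The diagonal holds the variances; the off-diagonal entry is positive,
# so the two variables tend to increase and decrease together.
```

The result agrees with `np.cov(X, rowvar=False)`, which treats columns as variables.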

3. Compute the Eigenvalues and Eigenvectors

Next, compute the eigenvalues and eigenvectors of the covariance matrix. The eigenvectors are the directions of the new axes, the principal components; the eigenvector with the largest eigenvalue points in the direction of greatest variance (most information). Each eigenvalue is the scalar associated with an eigenvector and gives the amount of variance carried along that principal component.
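One way to compute the eigendecomposition, as a sketch: since a covariance matrix is symmetric, `np.linalg.eigh` is a reasonable choice (using it instead of `np.linalg.eig` is an assumption made here, not something the text prescribes). The matrix `cov` below is a hypothetical 2x2 covariance matrix.

```python
import numpy as np

# Hypothetical covariance matrix of two variables.
cov = np.array([
    [2.5, 2.25],
    [2.25, 4.0],
])

# eigh is intended for symmetric matrices; it returns eigenvalues in
# ascending order and the matching eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Check the defining property cov @ v = lambda * v for each pair.
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    assert np.allclose(cov @ v, eigenvalues[i] * v)

# The eigenvalues sum to the total variance (the trace of cov).
print(eigenvalues.sum(), np.trace(cov))
```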

4. Sort Eigenvectors by corresponding Eigenvalues in descending order and select a subset from the rearranged Eigenvector matrix

Ranking the eigenvectors by their eigenvalues, from highest to lowest, gives the principal components in order of significance. We then choose the top $k$ eigenvectors, where $k$ is the number of dimensions we want to keep.
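The sorting and selection step might look like the sketch below. The matrix `cov`, the choice `k = 1`, and the names `W` and `explained_variance_ratio` are all illustrative assumptions.

```python
import numpy as np

# Hypothetical covariance matrix of two variables.
cov = np.array([
    [2.5, 2.25],
    [2.25, 4.0],
])

eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Indices that sort the eigenvalues from largest to smallest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # reorder the columns to match

# Keep the top k eigenvectors (k is chosen by the user).
k = 1
W = eigenvectors[:, :k]                 # shape (n_variables, k)

# Fraction of the total variance carried by each component.
explained_variance_ratio = eigenvalues / eigenvalues.sum()
print(explained_variance_ratio)
```

The explained variance ratio is a common aid for choosing how many components to keep.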

5. Recast data along the principal components

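After selecting the eigenvectors, we can use the resulting $d \times k$-dimensional eigenvector matrix $W$, whose columns are the $k$ selected eigenvectors and where $d$ is the number of original variables, to transform the data onto the new subspace via the following equation:

$$Y = XW$$

Here $X$ is the centered dataset with one row per sample, and $Y$ contains the same samples expressed along the $k$ principal components.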
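Putting the steps together, the projection onto the selected eigenvectors can be sketched as a single matrix product of the centered data with the eigenvector matrix. The dataset, the value of `k`, and the names `W` and `Y` are hypothetical, and this is an illustration under those assumptions rather than a reference implementation.

```python
import numpy as np

# Toy dataset, invented for illustration: 5 samples, 2 variables.
X = np.array([
    [2.0, 1.0],
    [3.0, 5.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])

# Steps 1 and 2: center the data and compute the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = X_centered.T @ X_centered / (X_centered.shape[0] - 1)

# Steps 3 and 4: eigendecomposition, sorted by descending eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

k = 1
W = eigenvectors[:, :k]      # shape (n_variables, k)

# Step 5: project the centered data onto the top k components.
Y = X_centered @ W           # shape (n_samples, k)

print(Y.shape)
# The variance of the projected data along the first component
# equals the largest eigenvalue.
print(np.var(Y[:, 0], ddof=1), eigenvalues[0])
```

The sign of each eigenvector is arbitrary, so the projected values may be flipped in sign relative to another implementation's output.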
