
Commit

many fixes
rasbt committed Apr 24, 2014
1 parent 8af6b76 commit 89cee55
Showing 11 changed files with 1,571 additions and 266 deletions.
330 changes: 286 additions & 44 deletions .ipynb_checkpoints/h4-checkpoint.ipynb

Large diffs are not rendered by default.

102 changes: 82 additions & 20 deletions .ipynb_checkpoints/max_likelihood_est_distributions-checkpoint.ipynb

Large diffs are not rendered by default.

344 changes: 292 additions & 52 deletions .ipynb_checkpoints/parzen_window_technique-checkpoint.ipynb

Large diffs are not rendered by default.

141 changes: 124 additions & 17 deletions .ipynb_checkpoints/principal_component_analysis-checkpoint.ipynb
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:201fdb96e2d2e70b01d2feec96ab182ba81534bd7c95df312f999ee3cf580bf2"
"signature": "sha256:87c7ae7d39c07a3a2123b7d3c47b83e683cd0894b1d5e5190c7c65cbff25d616"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -12,10 +12,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Sebastian Raschka \n",
"last updated: 04/17/2014\n",
"Sebastian Raschka \n",
"last updated: 04/17/2014 \n",
"\n",
"- Link to the containing GitHub Repository: [https://github.com/rasbt/pattern_classification](https://github.com/rasbt/pattern_classification)\n",
"- Link to this IPython Notebook on GitHub: [principal_component_analysis.ipynb](https://github.com/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb\")"
"- Link to this IPython Notebook on GitHub: [principal_component_analysis.ipynb](\"https://github.com/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb\") "
]
},
{
@@ -48,8 +49,8 @@
"- <a href=\"#sample_data\">Generating 3-dimensional sample data</a>\n",
"- <a href=\"#gen_data\">The step by step approach</a>\n",
" - 1.&nbsp;<a href=\"#drop_labels\">Taking the whole dataset ignoring the class labels</a> \n",
" - 2.&nbsp;<a href=\"#mean_vec\"> Compute the $d$-dimensional mean vector</a>\n",
" - 3.&nbsp;<a href=\"#sc_matrix\">Computing the scatter matrix (alternatively, the covariance matrix)</a>\n",
" - 2.&nbsp;<a href=\"#mean_vec\">Compute the $d$-dimensional mean vector</a>\n",
" - 3.&nbsp;<a href=\"#comp_scatter\">Computing the scatter matrix (alternatively, the covariance matrix)</a>\n",
" - 4.&nbsp;<a href=\"#eig_vec\">Computing eigenvectors and corresponding eigenvalues</a>\n",
" - 5.&nbsp;<a href=\"#sort_eig\">Ranking and choosing $k$ eigenvectors</a>\n",
" - 6.&nbsp;<a href=\"#transform\">Transforming the samples onto the new subspace</a>\n",
Expand All @@ -62,7 +63,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"introduction\"></a>\n",
"<br>\n",
"<a name=\"introduction\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Introduction"
]
},
@@ -105,7 +114,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"sample_data\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"sample_data\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Generating some 3-dimensional sample data"
]
},
@@ -221,8 +239,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='drop_labels'></a>\n",
"\n",
"<br>\n",
"<br>\n",
"<a name='drop_labels'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#1. Taking the whole dataset ignoring the class labels\n",
"\n",
"Because we don't need class labels for the PCA analysis, let us merge the samples for our 2 classes into one $3\\times40$-dimensional array."
@@ -244,7 +269,15 @@
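The hunks above document step 1 of the notebook: dropping the class labels and merging both classes into a single 3×40 array. As an aside, that step can be sketched with numpy roughly as follows (the seeded random arrays and names are illustrative stand-ins for the notebook's own sample data, not taken from the diff):

```python
import numpy as np

np.random.seed(1)
# Hypothetical stand-ins for the notebook's two 3x20 class samples.
class1_sample = np.random.multivariate_normal([0, 0, 0], np.eye(3), 20).T
class2_sample = np.random.multivariate_normal([1, 1, 1], np.eye(3), 20).T

# Step 1: ignore the class labels and merge everything into one 3x40 array.
all_samples = np.concatenate((class1_sample, class2_sample), axis=1)
```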
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='mean_vec'></a>\n",
"<br>\n",
"<br>\n",
"<a name='mean_vec'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#2. Computing the d-dimensional mean vector"
]
},
@@ -280,7 +313,16 @@
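Step 2 (the d-dimensional mean vector) amounts to averaging each of the three features across all 40 columns. A minimal sketch, again with a seeded random stand-in for the merged data:

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data

# Step 2: the d-dimensional (here 3-dimensional) mean vector,
# averaging each feature over all 40 samples.
mean_vector = all_samples.mean(axis=1).reshape(3, 1)  # shape (3, 1)
```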
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"mean_vec\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"comp_scatter\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# 3. a) Computing the Scatter Matrix"
]
},
@@ -319,6 +361,15 @@
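The scatter matrix of step 3a, and its relation to the covariance matrix of step 3b, can be sketched like this (stand-in data; with N = 40 samples the sample covariance is the scatter matrix divided by N − 1 = 39):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
mean_vector = all_samples.mean(axis=1).reshape(3, 1)

# Step 3a: scatter matrix S = sum_k (x_k - m)(x_k - m)^T
scatter_matrix = np.zeros((3, 3))
for k in range(all_samples.shape[1]):
    diff = all_samples[:, k].reshape(3, 1) - mean_vector
    scatter_matrix += diff @ diff.T

# Step 3b: np.cov normalizes by N - 1, so cov = scatter / 39 here.
cov_matrix = np.cov(all_samples)
```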
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<a name=\"comp_cov\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -367,7 +418,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"eig_vec\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"eig_vec\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#4. Computing eigenvectors and corresponding eigenvalues\n",
"\n",
"To show that the eigenvectors are indeed identical whether we derived them from the scatter or the covariance matrix, let us put an `assert` statement into the code. Also, we will see that the eigenvalues were indeed scaled by the factor 39 when we derived it from the scatter matrix."
@@ -533,6 +593,15 @@
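The claim in step 4 — identical eigenvectors from either matrix, eigenvalues scaled by 39 when derived from the scatter matrix — can be checked with a short sketch (stand-in data; `eigh` is used here since both matrices are symmetric, and eigenvectors are compared up to sign):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
centered = all_samples - all_samples.mean(axis=1, keepdims=True)
scatter_matrix = centered @ centered.T  # equivalent to the outer-product sum
cov_matrix = np.cov(all_samples)       # scatter / (N - 1)

# Step 4: eigendecomposition of both symmetric matrices.
eig_val_sc, eig_vec_sc = np.linalg.eigh(scatter_matrix)
eig_val_cov, eig_vec_cov = np.linalg.eigh(cov_matrix)

# Same eigenvectors (up to sign); eigenvalues differ by the factor N - 1 = 39.
assert np.allclose(np.abs(eig_vec_sc), np.abs(eig_vec_cov))
assert np.allclose(eig_val_sc, eig_val_cov * 39)
```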
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<a name=\"sort_eig\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -597,6 +666,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"#5.2. Choosing *k* eigenvectors with the largest eigenvalues\n",
"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our $d \\times k$-dimensional eigenvector matrix $\\pmb W$."
]
@@ -628,7 +699,16 @@
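Step 5 — ranking eigenvectors by decreasing eigenvalue and keeping the top k = 2 to form the 3×2 matrix W — looks roughly like this in numpy (stand-in data; `eigh` returns eigenvalues in ascending order, hence the reversed argsort):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
eig_val, eig_vec = np.linalg.eigh(np.cov(all_samples))

# Step 5: rank eigenvectors by decreasing eigenvalue, keep the top k = 2
# columns to build the d x k (here 3x2) projection matrix W.
order = np.argsort(eig_val)[::-1]
matrix_w = eig_vec[:, order[:2]]
```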
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='transform'></a>\n",
"<br>\n",
"<br>\n",
"<a name='transform'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#6. Transforming the samples onto the new subspace\n",
"In the last step, we use the $2 \\times 3$-dimensional matrix $\\pmb W$ that we just computed to transform our samples onto the new subspace via the equation $\\pmb y = \\pmb W^T \\times \\pmb x$."
]
@@ -678,7 +758,16 @@
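The final projection of step 6, y = Wᵀx, applied to all samples at once, can be sketched as (stand-in data and names):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
eig_val, eig_vec = np.linalg.eigh(np.cov(all_samples))
matrix_w = eig_vec[:, np.argsort(eig_val)[::-1][:2]]  # 3x2, top-2 eigenvectors

# Step 6: project every sample onto the 2D subspace, y = W^T x.
transformed = matrix_w.T @ all_samples  # shape (2, 40)
```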
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"mat_pca\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"mat_pca\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#Using the PCA() class from the matplotlib.mlab library\n",
"\n",
"Now, that we have seen how a principal component analysis works, we can use the in-built `PCA()` class from the `matplotlib` library for our convenience in future applications.\n",
@@ -771,7 +860,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"_diff_mat_pca\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"_diff_mat_pca\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Differences between the step by step approach and matplotlib.mlab.PCA()\n",
"\n",
"When we plot the transformed dataset onto the new 2-dimensional subspace, we observe that the scatter plots from our step by step approach and the `matplotlib.mlab.PCA()` class do not look identical. This is due to the fact that `matplotlib.mlab.PCA()` class ***scales the variables to unit variance*** prior to calculating the covariance matrices. This will/could eventually lead to different variances along the axes and affect the contribution of the variable to principal components. \n",
@@ -784,7 +882,16 @@
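The difference this section describes — `matplotlib.mlab.PCA()` scaling variables to unit variance before computing the covariance matrix, which changes each variable's contribution to the components — can be reproduced without `mlab` at all (that class has since been removed from matplotlib). A pure-numpy sketch with deliberately unequal variances, where all data and helper names are illustrative:

```python
import numpy as np

np.random.seed(1)
# Stand-in data with very unequal variances per variable.
X = np.random.randn(3, 40) * np.array([[1.0], [5.0], [0.2]])

def pca_2d(data):
    """Project 3xN data onto its top-2 principal components."""
    centered = data - data.mean(axis=1, keepdims=True)
    eig_val, eig_vec = np.linalg.eigh(np.cov(data))
    return eig_vec[:, np.argsort(eig_val)[::-1][:2]].T @ centered

raw_proj = pca_2d(X)                                  # step-by-step approach
std_proj = pca_2d(X / X.std(axis=1, keepdims=True))   # unit-variance scaling first

# Scaling to unit variance changes the projection, which is why the two
# scatter plots do not look identical.
assert not np.allclose(np.abs(raw_proj), np.abs(std_proj))
```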
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"sklearn_pca\"> </a>\n",
"<br>\n",
"<br>\n",
"<a name=\"sklearn_pca\"> </a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Using the PCA() class from the sklearn.decomposition library to confirm our results"
]
},
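For the sklearn confirmation the notebook section announces, the usual pattern is the one below — a sketch with stand-in data; note that `sklearn.decomposition.PCA` expects samples as rows, so the 3×40 array is transposed first:

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data

# sklearn expects one sample per row, hence the transpose to 40x3.
sklearn_pca = PCA(n_components=2)
transformed = sklearn_pca.fit_transform(all_samples.T)  # shape (40, 2)
```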
Binary file added PDFs/parzen_window_sebastian_raschka.pdf
Binary file not shown.
Binary file not shown.
3 changes: 3 additions & 0 deletions README.md
@@ -98,6 +98,8 @@ Reduction</strong></a><br>
<br><br>
[View IPython Notebook](http://nbviewer.ipython.org/github/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb?create=1)

[Download PDF](https://github.com/rasbt/pattern_classification/raw/master/PDFs/principal_component_analysis_sebastian_raschka.pdf)

<p><a name="mda"></a></p>
<br>
<br>
@@ -174,6 +176,7 @@ Reduction</strong></a><br>

[View IPython Notebook](http://nbviewer.ipython.org/github/rasbt/pattern_classification/blob/master/parameter_estimation_techniques/parzen_window_technique.ipynb?create=1)

[Download PDF](https://github.com/rasbt/pattern_classification/raw/master/PDFs/parzen_window_sebastian_raschka.pdf)
<br>
<br>
<br>
