
Commit

many fixes
rasbt committed Apr 24, 2014
1 parent 8af6b76 commit 89cee55
Showing 11 changed files with 1,571 additions and 266 deletions.
330 changes: 286 additions & 44 deletions .ipynb_checkpoints/h4-checkpoint.ipynb

Large diffs are not rendered by default.

102 changes: 82 additions & 20 deletions .ipynb_checkpoints/max_likelihood_est_distributions-checkpoint.ipynb

Large diffs are not rendered by default.

344 changes: 292 additions & 52 deletions .ipynb_checkpoints/parzen_window_technique-checkpoint.ipynb

Large diffs are not rendered by default.

141 changes: 124 additions & 17 deletions .ipynb_checkpoints/principal_component_analysis-checkpoint.ipynb
@@ -1,7 +1,7 @@
{
"metadata": {
"name": "",
"signature": "sha256:201fdb96e2d2e70b01d2feec96ab182ba81534bd7c95df312f999ee3cf580bf2"
"signature": "sha256:87c7ae7d39c07a3a2123b7d3c47b83e683cd0894b1d5e5190c7c65cbff25d616"
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -12,10 +12,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Sebastian Raschka \n",
"last updated: 04/17/2014\n",
"Sebastian Raschka \n",
"last updated: 04/17/2014 \n",
"\n",
"- Link to the containing GitHub Repository: [https://github.com/rasbt/pattern_classification](https://github.com/rasbt/pattern_classification)\n",
"- Link to this IPython Notebook on GitHub: [principal_component_analysis.ipynb](https://github.com/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb\")"
"- Link to this IPython Notebook on GitHub: [principal_component_analysis.ipynb](\"https://github.com/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb\") "
]
},
{
@@ -48,8 +49,8 @@
"- <a href=\"#sample_data\">Generating 3-dimensional sample data</a>\n",
"- <a href=\"#gen_data\">The step by step approach</a>\n",
" - 1.&nbsp;<a href=\"#drop_labels\">Taking the whole dataset ignoring the class labels</a> \n",
" - 2.&nbsp;<a href=\"#mean_vec\"> Compute the $d$-dimensional mean vector</a>\n",
" - 3.&nbsp;<a href=\"#sc_matrix\">Computing the scatter matrix (alternatively, the covariance matrix)</a>\n",
" - 2.&nbsp;<a href=\"#mean_vec\">Compute the $d$-dimensional mean vector</a>\n",
" - 3.&nbsp;<a href=\"#comp_scatter\">Computing the scatter matrix (alternatively, the covariance matrix)</a>\n",
" - 4.&nbsp;<a href=\"#eig_vec\">Computing eigenvectors and corresponding eigenvalues</a>\n",
" - 5.&nbsp;<a href=\"#sort_eig\">Ranking and choosing $k$ eigenvectors</a>\n",
" - 6.&nbsp;<a href=\"#transform\">Transforming the samples onto the new subspace</a>\n",
Expand All @@ -62,7 +63,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"introduction\"></a>\n",
"<br>\n",
"<a name=\"introduction\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Introduction"
]
},
@@ -105,7 +114,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"sample_data\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"sample_data\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Generating some 3-dimensional sample data"
]
},
@@ -221,8 +239,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='drop_labels'></a>\n",
"\n",
"<br>\n",
"<br>\n",
"<a name='drop_labels'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#1. Taking the whole dataset ignoring the class labels\n",
"\n",
"Because we don't need class labels for the PCA analysis, let us merge the samples for our 2 classes into one $3\\times40$-dimensional array."
@@ -244,7 +269,15 @@
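The hunks above document step 1 of the notebook: dropping the class labels and merging both classes into a single 3×40 array. As an aside, that step can be sketched with numpy roughly as follows (the seeded random arrays and names are illustrative stand-ins for the notebook's own sample data, not taken from the diff):

```python
import numpy as np

np.random.seed(1)
# Hypothetical stand-ins for the notebook's two 3x20 class samples.
class1_sample = np.random.multivariate_normal([0, 0, 0], np.eye(3), 20).T
class2_sample = np.random.multivariate_normal([1, 1, 1], np.eye(3), 20).T

# Step 1: ignore the class labels and merge everything into one 3x40 array.
all_samples = np.concatenate((class1_sample, class2_sample), axis=1)
```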
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='mean_vec'></a>\n",
"<br>\n",
"<br>\n",
"<a name='mean_vec'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#2. Computing the d-dimensional mean vector"
]
},
@@ -280,7 +313,16 @@
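Step 2 (the d-dimensional mean vector) amounts to averaging each of the three features across all 40 columns. A minimal sketch, again with a seeded random stand-in for the merged data:

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data

# Step 2: the d-dimensional (here 3-dimensional) mean vector,
# averaging each feature over all 40 samples.
mean_vector = all_samples.mean(axis=1).reshape(3, 1)  # shape (3, 1)
```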
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"mean_vec\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"comp_scatter\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# 3. a) Computing the Scatter Matrix"
]
},
@@ -319,6 +361,15 @@
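The scatter matrix of step 3a, and its relation to the covariance matrix of step 3b, can be sketched like this (stand-in data; with N = 40 samples the sample covariance is the scatter matrix divided by N − 1 = 39):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
mean_vector = all_samples.mean(axis=1).reshape(3, 1)

# Step 3a: scatter matrix S = sum_k (x_k - m)(x_k - m)^T
scatter_matrix = np.zeros((3, 3))
for k in range(all_samples.shape[1]):
    diff = all_samples[:, k].reshape(3, 1) - mean_vector
    scatter_matrix += diff @ diff.T

# Step 3b: np.cov normalizes by N - 1, so cov = scatter / 39 here.
cov_matrix = np.cov(all_samples)
```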
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<a name=\"comp_cov\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -367,7 +418,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"eig_vec\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"eig_vec\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#4. Computing eigenvectors and corresponding eigenvalues\n",
"\n",
"To show that the eigenvectors are indeed identical whether we derived them from the scatter or the covariance matrix, let us put an `assert` statement into the code. Also, we will see that the eigenvalues were indeed scaled by the factor 39 when we derived it from the scatter matrix."
@@ -533,6 +593,15 @@
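The claim in step 4 — identical eigenvectors from either matrix, eigenvalues scaled by 39 when derived from the scatter matrix — can be checked with a short sketch (stand-in data; `eigh` is used here since both matrices are symmetric, and eigenvectors are compared up to sign):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
centered = all_samples - all_samples.mean(axis=1, keepdims=True)
scatter_matrix = centered @ centered.T  # equivalent to the outer-product sum
cov_matrix = np.cov(all_samples)       # scatter / (N - 1)

# Step 4: eigendecomposition of both symmetric matrices.
eig_val_sc, eig_vec_sc = np.linalg.eigh(scatter_matrix)
eig_val_cov, eig_vec_cov = np.linalg.eigh(cov_matrix)

# Same eigenvectors (up to sign); eigenvalues differ by the factor N - 1 = 39.
assert np.allclose(np.abs(eig_vec_sc), np.abs(eig_vec_cov))
assert np.allclose(eig_val_sc, eig_val_cov * 39)
```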
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"<a name=\"sort_eig\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -597,6 +666,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<br>\n",
"<br>\n",
"#5.2. Choosing *k* eigenvectors with the largest eigenvalues\n",
"For our simple example, where we are reducing a 3-dimensional feature space to a 2-dimensional feature subspace, we are combining the two eigenvectors with the highest eigenvalues to construct our $d \\times k$-dimensional eigenvector matrix $\\pmb W$."
]
@@ -628,7 +699,16 @@
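Step 5 — ranking eigenvectors by decreasing eigenvalue and keeping the top k = 2 to form the 3×2 matrix W — looks roughly like this in numpy (stand-in data; `eigh` returns eigenvalues in ascending order, hence the reversed argsort):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
eig_val, eig_vec = np.linalg.eigh(np.cov(all_samples))

# Step 5: rank eigenvectors by decreasing eigenvalue, keep the top k = 2
# columns to build the d x k (here 3x2) projection matrix W.
order = np.argsort(eig_val)[::-1]
matrix_w = eig_vec[:, order[:2]]
```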
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name='transform'></a>\n",
"<br>\n",
"<br>\n",
"<a name='transform'></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#6. Transforming the samples onto the new subspace\n",
"In the last step, we use the $2 \\times 3$-dimensional matrix $\\pmb W$ that we just computed to transform our samples onto the new subspace via the equation $\\pmb y = \\pmb W^T \\times \\pmb x$."
]
@@ -678,7 +758,16 @@
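The final projection of step 6, y = Wᵀx, applied to all samples at once, can be sketched as (stand-in data and names):

```python
import numpy as np

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data
eig_val, eig_vec = np.linalg.eigh(np.cov(all_samples))
matrix_w = eig_vec[:, np.argsort(eig_val)[::-1][:2]]  # 3x2, top-2 eigenvectors

# Step 6: project every sample onto the 2D subspace, y = W^T x.
transformed = matrix_w.T @ all_samples  # shape (2, 40)
```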
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"mat_pca\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"mat_pca\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"#Using the PCA() class from the matplotlib.mlab library\n",
"\n",
"Now, that we have seen how a principal component analysis works, we can use the in-built `PCA()` class from the `matplotlib` library for our convenience in future applications.\n",
@@ -771,7 +860,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"_diff_mat_pca\"></a>\n",
"<br>\n",
"<br>\n",
"<a name=\"_diff_mat_pca\"></a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Differences between the step by step approach and matplotlib.mlab.PCA()\n",
"\n",
"When we plot the transformed dataset onto the new 2-dimensional subspace, we observe that the scatter plots from our step by step approach and the `matplotlib.mlab.PCA()` class do not look identical. This is due to the fact that `matplotlib.mlab.PCA()` class ***scales the variables to unit variance*** prior to calculating the covariance matrices. This will/could eventually lead to different variances along the axes and affect the contribution of the variable to principal components. \n",
@@ -784,7 +882,16 @@
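The difference this section describes — `matplotlib.mlab.PCA()` scaling variables to unit variance before computing the covariance matrix, which changes each variable's contribution to the components — can be reproduced without `mlab` at all (that class has since been removed from matplotlib). A pure-numpy sketch with deliberately unequal variances, where all data and helper names are illustrative:

```python
import numpy as np

np.random.seed(1)
# Stand-in data with very unequal variances per variable.
X = np.random.randn(3, 40) * np.array([[1.0], [5.0], [0.2]])

def pca_2d(data):
    """Project 3xN data onto its top-2 principal components."""
    centered = data - data.mean(axis=1, keepdims=True)
    eig_val, eig_vec = np.linalg.eigh(np.cov(data))
    return eig_vec[:, np.argsort(eig_val)[::-1][:2]].T @ centered

raw_proj = pca_2d(X)                                  # step-by-step approach
std_proj = pca_2d(X / X.std(axis=1, keepdims=True))   # unit-variance scaling first

# Scaling to unit variance changes the projection, which is why the two
# scatter plots do not look identical.
assert not np.allclose(np.abs(raw_proj), np.abs(std_proj))
```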
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"sklearn_pca\"> </a>\n",
"<br>\n",
"<br>\n",
"<a name=\"sklearn_pca\"> </a>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Using the PCA() class from the sklearn.decomposition library to confirm our results"
]
},
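For the sklearn confirmation the notebook section announces, the usual pattern is the one below — a sketch with stand-in data; note that `sklearn.decomposition.PCA` expects samples as rows, so the 3×40 array is transposed first:

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(1)
all_samples = np.random.randn(3, 40)  # stand-in for the merged 3x40 data

# sklearn expects one sample per row, hence the transpose to 40x3.
sklearn_pca = PCA(n_components=2)
transformed = sklearn_pca.fit_transform(all_samples.T)  # shape (40, 2)
```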
Binary file added PDFs/parzen_window_sebastian_raschka.pdf
Binary file not shown.
Binary file not shown.
3 changes: 3 additions & 0 deletions README.md
@@ -98,6 +98,8 @@ Reduction</strong></a><br>
<br><br>
[View IPython Notebook](http://nbviewer.ipython.org/github/rasbt/pattern_classification/blob/master/dimensionality_reduction/projection/principal_component_analysis.ipynb?create=1)

[Download PDF](https://github.com/rasbt/pattern_classification/raw/master/PDFs/principal_component_analysis_sebastian_raschka.pdf)

<p><a name="mda"></a></p>
<br>
<br>
@@ -174,6 +176,7 @@ Reduction</strong></a><br>

[View IPython Notebook](http://nbviewer.ipython.org/github/rasbt/pattern_classification/blob/master/parameter_estimation_techniques/parzen_window_technique.ipynb?create=1)

[Download PDF](https://github.com/rasbt/pattern_classification/raw/master/PDFs/parzen_window_sebastian_raschka.pdf)
<br>
<br>
<br>
