Improve Notebooks (#150)
Additional notes have been added to the Oil and Algae notebooks, and the Algae notebook has been expanded.
James McClain authored Jul 12, 2021
1 parent 81d692f commit 9b7eb88
Showing 2 changed files with 530 additions and 26 deletions.
427 changes: 419 additions & 8 deletions src/hyperspectral/notebooks/Algal Bloom Exploration.ipynb

Large diffs are not rendered by default.

129 changes: 111 additions & 18 deletions src/hyperspectral/notebooks/Deepwater Horizon Exploration.ipynb
@@ -31,6 +31,14 @@
"# Explore #"
]
},
{
"cell_type": "markdown",
"id": "a28974f4",
"metadata": {},
"source": [
"This spectrum was obtained from [The USGS Spectral Library Version 7](https://crustal.usgs.gov/speclab/QueryAll07a.php). It can be found by going to that site and searching for `Benzene17` (or similar) in the quick-search box."
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -48,6 +56,14 @@
"spectrum_normalized = scipy.signal.resample(spectrum, 224) - spectrum.mean()"
]
},
{
"cell_type": "markdown",
"id": "fceec4ee",
"metadata": {},
"source": [
"In the last line above, the signal is resampled to 224 bands. The spectrum spans 350nm to roughly 2500nm. AVIRIS spans either 350nm to 2500nm or 400nm to 2500nm (I have not been able to determine which; the former is assumed in this notebook)."
]
},
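The resampling step can be sketched in isolation. The Gaussian `spectrum` below is a hypothetical stand-in for the USGS library curve loaded above, and the band centers reflect the 350nm–2500nm assumption stated here.

```python
import numpy as np
import scipy.signal

# Hypothetical spectrum sampled at 1nm from 350nm to 2500nm (2151 points),
# standing in for the USGS library spectrum loaded in the notebook.
wavelengths = np.arange(350, 2501)
spectrum = np.exp(-((wavelengths - 1200) / 300.0) ** 2)

# Resample to the 224 AVIRIS bands and remove the mean, as in the notebook.
spectrum_normalized = scipy.signal.resample(spectrum, 224) - spectrum.mean()

# Approximate band centers under the 350nm-2500nm assumption.
band_centers = np.linspace(350, 2500, 224)
```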
{
"cell_type": "code",
"execution_count": null,
@@ -88,6 +104,43 @@
" out_ds.write(data, window=window)"
]
},
{
"cell_type": "markdown",
"id": "23b29c3e",
"metadata": {},
"source": [
"AVIRIS imagery is used. The imagery was retrieved with the following query polygon:\n",
"```json\n",
"{\n",
" \"type\":\"Polygon\",\n",
" \"coordinates\":[\n",
" [\n",
" [\n",
" -88.38783144950867,\n",
" 28.735932344899926\n",
" ],\n",
" [\n",
" -88.38593244552611,\n",
" 28.735932344899926\n",
" ],\n",
" [\n",
" -88.38593244552611,\n",
" 28.73741872317998\n",
" ],\n",
" [\n",
" -88.38783144950867,\n",
" 28.73741872317998\n",
" ],\n",
" [\n",
" -88.38783144950867,\n",
" 28.735932344899926\n",
" ]\n",
" ]\n",
" ]\n",
"}\n",
"```"
]
},
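For reference, the same polygon can be written as a Python dict and sanity-checked; `query_polygon` is simply a transcription of the GeoJSON above, not part of the notebook.

```python
# Transcription of the query polygon from the markdown cell above.
query_polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-88.38783144950867, 28.735932344899926],
        [-88.38593244552611, 28.735932344899926],
        [-88.38593244552611, 28.73741872317998],
        [-88.38783144950867, 28.73741872317998],
        [-88.38783144950867, 28.735932344899926],
    ]],
}

ring = query_polygon["coordinates"][0]
# A valid GeoJSON ring repeats its first vertex at the end.
is_closed = ring[0] == ring[-1]
lons = [p[0] for p in ring]
lats = [p[1] for p in ring]
bbox = (min(lons), min(lats), max(lons), max(lats))  # (min lon, min lat, max lon, max lat)
```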
{
"cell_type": "code",
"execution_count": null,
@@ -285,7 +338,7 @@
"metadata": {},
"outputs": [],
"source": [
"def infer2(in_filename, out_filename):\n",
"def infer2(in_filename, out_filename, spec):\n",
" with rio.open(in_filename, 'r') as in_ds:\n",
" profile = copy.deepcopy(in_ds.profile)\n",
" profile.update(count=1, driver='GTiff', bigtiff='yes', compress='deflate', predictor='2', tiled='yes', dtype=np.float32, sparse_ok='yes')\n",
@@ -303,7 +356,7 @@
" data /= norm\n",
" data -= np.mean(data, axis=2)[...,None]\n",
" data = whiten(data, W, 0)\n",
" data = np.dot(data, whitened_spectrum)\n",
" data = np.dot(data, spec)\n",
" data[np.isnan(data)] = 0\n",
" data = data.reshape(1, width, height).astype(np.float32)\n",
" out_ds.write(data, window=window)"
@@ -338,7 +391,7 @@
"source": [
"in_filename = 'data2/f100517t01p00r14rdn_b/f100517t01p00r14rdn_b_sc01_ort_img.tif'\n",
"out_filename = 'data2/results/f100517t01p00r14rdn_b_sc01_ort_img_result_whitened.tif'\n",
"infer2(aviris, out_filename)"
"infer2(in_filename, out_filename, whitened_spectrum)"
]
},
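The per-pixel pipeline inside `infer2` (unit-normalize, de-mean, whiten, project onto the target spectrum) can be sketched on a synthetic chip. The identity matrix standing in for the ZCA whitening matrix and the random target spectrum are illustration-only assumptions.

```python
import numpy as np

# Synthetic (height, width, bands) chip standing in for one raster window.
rng = np.random.default_rng(0)
height, width, bands = 4, 5, 224
data = rng.normal(size=(height, width, bands)).astype(np.float32)

norm = np.linalg.norm(data, ord=2, axis=2)[..., None]
data /= norm                               # unit-normalize each pixel's spectrum
data -= data.mean(axis=2)[..., None]       # remove the per-pixel mean
W_white = np.eye(bands, dtype=np.float32)  # identity stands in for the ZCA matrix
data = np.dot(data, W_white)
whitened_spectrum = rng.normal(size=bands).astype(np.float32)  # stand-in target
score = np.dot(data, whitened_spectrum)    # one matched-filter score per pixel
```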
{
@@ -478,7 +531,7 @@
"metadata": {},
"outputs": [],
"source": [
"whitened_spectrum = whiten(spectrum, W, 0)"
"opt_whitened_spectrum = whiten(spectrum, W, 0)"
]
},
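`zca_whitening_matrix` and `whiten` are defined earlier in the notebook and are not shown in this diff; a common ZCA formulation, which may differ from the notebook's helper in detail, is:

```python
import numpy as np

def zca_whitening_matrix(X, eps=1e-5):
    # X: (samples, bands). Returns the ZCA whitening matrix and the mean.
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = np.cov(Xc, rowvar=False)
    # For a symmetric PSD matrix, SVD gives cov = U @ diag(S) @ U.T.
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return W, mean

rng = np.random.default_rng(0)
neg = rng.normal(size=(500, 8))
W, mean = zca_whitening_matrix(neg)
whitened = (neg - mean) @ W
# After whitening, the sample covariance is approximately the identity.
cov_w = np.cov(whitened, rowvar=False)
```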
{
@@ -490,7 +543,7 @@
"source": [
"in_filename = 'data2/f100517t01p00r14rdn_b/f100517t01p00r14rdn_b_sc01_ort_img.tif'\n",
"out_filename = 'data2/results/f100517t01p00r14rdn_b_sc01_ort_img_result_whitened_opt.tif'\n",
"infer2(in_filename, out_filename)"
"infer2(in_filename, out_filename, opt_whitened_spectrum)"
]
},
{
@@ -504,13 +557,27 @@
{
"cell_type": "code",
"execution_count": null,
"id": "ed171092",
"id": "8f9addc3",
"metadata": {},
"outputs": [],
"source": [
"start = neg_subset.shape[0]\n",
"length = pos.shape[0]\n",
"according_to_salience = argsort(model, samples, target, list(range(start, start+length)))"
"subset_of_samples = samples[list(range(start, start+length)),...]\n",
"mean_of_samples = subset_of_samples.mean(axis=0).unsqueeze(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed171092",
"metadata": {},
"outputs": [],
"source": [
"# start = neg_subset.shape[0]\n",
"# length = pos.shape[0]\n",
"# according_to_salience = argsort(model, samples, target, list(range(start, start+length)))\n",
"according_to_salience = argsort(model, mean_of_samples, target, [0])"
]
},
{
@@ -573,6 +640,22 @@
"according_to_salience = dictionary.get('according_to_salience')"
]
},
{
"cell_type": "markdown",
"id": "ddb01e5f",
"metadata": {},
"source": [
"## Best 48 ##"
]
},
{
"cell_type": "markdown",
"id": "fd1eb508",
"metadata": {},
"source": [
"Find the best 48 bands (according to salience)."
]
},
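Band selection by salience reduces to a top-k index selection. The random `salience` scores below are a hypothetical stand-in for the model-derived ordering (`according_to_salience`) computed above.

```python
import numpy as np

# Hypothetical per-band salience scores; in the notebook the ordering
# comes from the model via argsort(...).
rng = np.random.default_rng(0)
salience = rng.normal(size=224)

# Band indices ordered from most to least salient.
order = np.argsort(-salience)
best_48 = order[:48]  # the 48 most salient bands
rest = order[48:]     # the remaining, less salient bands
```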
{
"cell_type": "code",
"execution_count": null,
@@ -591,7 +674,7 @@
"id": "516f5917",
"metadata": {},
"source": [
"Should probably renormalize at this point."
"(Should probably renormalize at this point.)"
]
},
{
@@ -621,7 +704,7 @@
"metadata": {},
"outputs": [],
"source": [
"def infer3(in_filename, out_filename):\n",
"def infer3(in_filename, out_filename, spec, W, bands):\n",
" with rio.open(in_filename, 'r') as in_ds:\n",
" profile = copy.deepcopy(in_ds.profile)\n",
" profile.update(count=1, driver='GTiff', bigtiff='yes', compress='deflate', predictor='2', tiled='yes', dtype=np.float32, sparse_ok='yes')\n",
@@ -634,12 +717,12 @@
" data = in_ds.read(1, window=window)\n",
" if np.abs(data).sum() == 0:\n",
" continue\n",
" data = np.transpose(in_ds.read(tuple(best_48), window=window).astype(np.float32), (1,2,0))\n",
" data = np.transpose(in_ds.read(bands, window=window).astype(np.float32), (1,2,0))\n",
" norm = np.linalg.norm(data, ord=2, axis=2)[..., None].astype(np.float32)\n",
" data /= norm\n",
" data -= np.mean(data, axis=2)[...,None]\n",
" data = whiten(data, W, 0)\n",
" data = np.dot(data, whitened_spectrum48)\n",
" data = np.dot(data, spec)\n",
" data[np.isnan(data)] = 0\n",
" data = data.reshape(1, width, height).astype(np.float32)\n",
" out_ds.write(data, window=window)"
@@ -654,15 +737,23 @@
"source": [
"in_filename = 'data2/f100517t01p00r14rdn_b/f100517t01p00r14rdn_b_sc01_ort_img.tif'\n",
"out_filename = 'data2/results/f100517t01p00r14rdn_b_sc01_ort_img_result_whitened_48.tif'\n",
"infer3(in_filename, out_filename)"
"infer3(in_filename, out_filename, whitened_spectrum48, W, tuple(best_48))"
]
},
{
"cell_type": "markdown",
"id": "f1a40e6e",
"metadata": {},
"source": [
"### Worst 48 ###"
"## Worst 48 ##"
]
},
{
"cell_type": "markdown",
"id": "84df0d5c",
"metadata": {},
"source": [
"Extract the 48 worst bands (for testing purposes)."
]
},
{
@@ -672,10 +763,12 @@
"metadata": {},
"outputs": [],
"source": [
"best_48 = according_to_salience[48:].squeeze()\n",
"pos48 = pos[:, best_48].squeeze()\n",
"neg48 = neg[:, best_48].squeeze()\n",
"spectrum48 = spectrum[worst_48].reshape(1,-1)"
"worst_48 = according_to_salience[48:].squeeze()\n",
"pos48 = pos[:, worst_48].squeeze()\n",
"neg48 = neg[:, worst_48].squeeze()\n",
"spectrum48 = spectrum[worst_48].reshape(1,-1)\n",
"W, mean = zca_whitening_matrix(neg48)\n",
"whitened_spectrum48 = whiten(spectrum48, W, 0).reshape(-1,1)"
]
},
{
@@ -687,7 +780,7 @@
"source": [
"in_filename = 'data2/f100517t01p00r14rdn_b/f100517t01p00r14rdn_b_sc01_ort_img.tif'\n",
"out_filename = 'data2/results/f100517t01p00r14rdn_b_sc01_ort_img_result_whitened_worst_48.tif'\n",
"infer3(in_filename, out_filename)"
"infer3(in_filename, out_filename, whitened_spectrum48, W, tuple(worst_48))"
]
},
{
