Commit
modify as suggested in comments, add demo notebook
Showing 5 changed files with 460 additions and 31 deletions.
@@ -0,0 +1,351 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "ca8eeb2a",
"metadata": {},
"source": [
"# NormMUSIC Demo\n",
"\n",
"The goal of this notebook is to demonstrate the effect of frequency normalization when MUSIC is applied to broadband signals.\n",
"\n",
"The notebook is structured as follows:\n",
"1. [Dataset generation](#dataset)\n",
"2. [Prediction](#prediction)\n",
"    <ol>\n",
"        <li>MUSIC: Standard implementation without normalization</li>\n",
"        <li>NormMUSIC: Implementation with frequency normalization as suggested in [1]</li>\n",
"    </ol>\n",
"3. [Evaluation](#evaluation)\n",
"4. [Intuition: Why normalization?](#intuition)\n",
"5. [Recommendation](#recommendation)\n",
"--- \n",
"[1] D. Salvati, C. Drioli and G. L. Foresti, \"Incoherent Frequency Fusion for Broadband Steered Response Power Algorithms in\n",
"Noisy Environments,\" in IEEE Signal Processing Letters, vol. 21, no. 5, pp. 581-585, 2014."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdea0764",
"metadata": {},
"outputs": [],
"source": [
"# imports\n",
"import pickle\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from scipy.signal import stft\n",
"from random import uniform, sample\n",
"from pyroomacoustics import doa, Room, ShoeBox"
]
},
{
"cell_type": "markdown",
"id": "78bc9d8d",
"metadata": {},
"source": [
"<a id='dataset'></a>\n",
"## Dataset generation\n",
"In the following, we simulate a small dataset to evaluate the performance of MUSIC and NormMUSIC.\n",
"We assume a single sound source.\n",
"- Simulate different rooms:\n",
"    - DOA on 1° grid\n",
"    - 3 samples per DOA\n",
"    - Source signal: random Gaussian noise\n",
"    - Different SNRs between 0 and 30 dB\n",
"- Calculate 30 STFT time frames for each sample"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "869fafb6",
"metadata": {},
"outputs": [],
"source": [
"# constants / config\n",
"fs = 16000\n",
"nfft = 1024\n",
"n = 5*fs # simulation length of the source signal (5 seconds)\n",
"n_frames = 30\n",
"max_order = 10\n",
"doas_deg = np.linspace(start=0, stop=359, num=360, endpoint=True)\n",
"rs = [0.5, 1, 1.5]\n", | ||
"mic_center = np.c_[[2,2,1]]\n", | ||
"mic_locs = mic_center + np.c_[[ 0.04, 0.0, 0.0],\n", | ||
" [ 0.0, 0.04, 0.0],\n", | ||
" [-0.04, 0.0, 0.0],\n", | ||
" [ 0.0, -0.04, 0.0],\n", | ||
"]\n", | ||
"snr_lb, snr_ub = 0, 30" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"id": "67b4dbda",
"metadata": {},
"outputs": [],
"source": [
"# room simulation\n",
"data = []\n",
"for r in rs:\n",
"    for doa_deg in doas_deg:\n",
"        doa_rad = np.deg2rad(doa_deg)\n",
"        source_loc = mic_center[:,0] + np.c_[r*np.cos(doa_rad), r*np.sin(doa_rad), 0][0]\n",
"        room_dim = [uniform(4,6), uniform(4,6), uniform(2, 4)] # meters\n",
"\n",
"        room = ShoeBox(room_dim, fs=fs, max_order=max_order)\n",
"        room.add_source(source_loc, signal=np.random.randn(n)) # random Gaussian source signal\n",
"        room.add_microphone_array(mic_locs)\n",
"        room.simulate(snr=uniform(snr_lb, snr_ub))\n",
"        signals = room.mic_array.signals\n",
"\n",
"        # calculate n_frames STFT frames starting at 1 second\n",
"        stft_signals = stft(signals[:,fs:fs+n_frames*nfft], fs=fs, nperseg=nfft, noverlap=0, boundary=None)[2]\n",
"        data.append([r, doa_deg, stft_signals])"
]
},
{
"cell_type": "markdown",
"id": "f0479fa0",
"metadata": {},
"source": [
"<a id='prediction'></a>\n",
"## Prediction\n",
"In the following, we apply MUSIC and NormMUSIC to the simulated dataset.\n",
"\n",
"The results are stored in a pandas DataFrame."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6111ceaf",
"metadata": {},
"outputs": [],
"source": [
"kwargs = {'L': mic_locs,\n",
"          'fs': fs,\n",
"          'nfft': nfft,\n",
"          'azimuth': np.deg2rad(np.arange(360))\n",
"}\n",
"algorithms = {\n",
"    'MUSIC': doa.music.MUSIC(**kwargs),\n",
"    'NormMUSIC': doa.normmusic.NormMUSIC(**kwargs),\n",
"}\n",
"columns = [\"r\", \"DOA\"] + list(algorithms.keys())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de1f6b13",
"metadata": {},
"outputs": [],
"source": [
"predictions = {col: [] for col in columns}\n",
"for r, doa_deg, stft_signals in data:\n",
"    predictions['r'].append(r)\n",
"    predictions['DOA'].append(doa_deg)\n",
"    for algo_name, algo in algorithms.items():\n",
"        algo.locate_sources(stft_signals)\n",
"        predictions[algo_name].append(np.rad2deg(algo.azimuth_recon[0]))\n",
"df = pd.DataFrame.from_dict(predictions)"
]
},
{
"cell_type": "markdown",
"id": "a752705a",
"metadata": {},
"source": [
"<a id='evaluation'></a>\n",
"## Evaluation\n",
"In the next cells, we calculate the following metrics:\n",
"- Mean Absolute Error (MAE)\n",
"- Median Absolute Error (MEDAE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "03b7fb64",
"metadata": {},
"outputs": [],
"source": [
"MAE, MEDAE = {}, {}\n",
"\n",
"def calc_ae(a,b):\n", | ||
" x = np.abs(a-b)\n", | ||
" return np.min(np.array((x, np.abs(360-x))), axis=0)\n", | ||
"\n", | ||
"for algo_name in algorithms.keys():\n", | ||
" ae = calc_ae(df.loc[:,[\"DOA\"]].to_numpy(), df.loc[:,[algo_name]].to_numpy())\n", | ||
" MAE[algo_name] = np.mean(ae)\n", | ||
" MEDAE[algo_name] = np.median(ae)\n" | ||
] | ||
}, | ||
{
"cell_type": "code",
"execution_count": null,
"id": "f49c579d",
"metadata": {},
"outputs": [],
"source": [
"print(f\"MAE\\t MUSIC: {MAE['MUSIC']:5.2f}\\t NormMUSIC: {MAE['NormMUSIC']:5.2f}\")\n",
"print(f\"MEDAE\\t MUSIC: {MEDAE['MUSIC']:5.2f}\\t NormMUSIC: {MEDAE['NormMUSIC']:5.2f}\")"
]
},
{
"cell_type": "markdown",
"id": "18804383",
"metadata": {},
"source": [
"<a id='intuition'></a>\n",
"## Intuition: Why normalization?\n",
"\n",
"### MUSIC revisited: Complex narrowband signals\n",
"Before we discuss the intuition behind frequency normalization, let's revisit the MUSIC algorithm for complex narrowband signals.\n",
"\n",
"#### The MUSIC pseudo spectrum\n",
"\n",
"The MUSIC pseudo spectrum $\\hat{P}_{MUSIC}(\\theta)$ is defined as\n",
"\n",
"$$\\hat{P}_{MUSIC}(\\theta) = \\frac{1}{\\sum_{i=p+1}^{N} |\\mathbf{e}(\\theta)^H \\mathbf{v}_i|^2},$$\n",
"where\n",
"- $\\mathbf{v}_i$ are the noise eigenvectors\n",
"- $\\mathbf{e}(\\theta)$ is the candidate steering vector\n",
"- $\\theta$ is the candidate DOA\n",
"\n",
"MUSIC obtains its estimated DOA $\\hat{\\theta}$ by maximizing the pseudo spectrum $\\hat{P}_{MUSIC}(\\theta)$ over the candidate DOAs $\\theta \\in \\{\\theta_1, \\theta_2, \\ldots, \\theta_I\\}$, i.e.,\n",
"$$\\hat{\\theta} = \\arg\\max_\\theta \\;\\hat{P}_{MUSIC}(\\theta)$$\n",
"\n",
"#### Orthogonality\n",
"The main property exploited by MUSIC is the orthogonality between the noise eigenvectors $\\mathbf{v}_i$ and the steering vector $\\mathbf{e}(\\theta^{\\star})$ of the true DOA $\\theta^{\\star}$, i.e.,\n",
"$$ \\mathbf{e}(\\theta^{\\star}) \\perp \\mathrm{span}(\\mathbf{v}_{p+1}, \\mathbf{v}_{p+2}, \\ldots, \\mathbf{v}_{N}) \\;\\; \\Leftrightarrow \\;\\; |\\mathbf{e}(\\theta^{\\star})^H \\mathbf{v}_i| = 0 \\;\\; \\forall i \\in \\{p+1, p+2, \\ldots, N\\}$$\n",
"\n",
"In practice, $\\mathbf{e}(\\hat{\\theta})$ is only approximately orthogonal to the noise eigenvectors $\\mathbf{v}_i$, i.e.,\n",
"$$ |\\mathbf{e}(\\hat{\\theta})^H \\mathbf{v}_i| \\approx 0 \\;\\; \\forall i \\in \\{p+1, p+2, \\ldots, N\\}$$\n",
"\n",
"Therefore, $\\max_{\\theta} \\; \\hat{P}_{MUSIC}(\\theta) = \\frac{1}{\\epsilon}$, with $\\epsilon \\ll 1$, which results in a curve with (one or more) distinct peak(s) that is characteristic of the MUSIC pseudo spectrum.\n",
"\n",
"An example of a MUSIC pseudo spectrum is plotted below.\n",
"\n",
""
]
},
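{
"cell_type": "markdown",
"id": "0f1e2d3c",
"metadata": {},
"source": [
"The next cell is a minimal, self-contained NumPy sketch of the narrowband MUSIC pseudo spectrum defined above. It is illustrative only and does not use pyroomacoustics' implementation; the half-wavelength uniform linear array, the source angle of 60°, and the noise level are assumptions made for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4b5a6978",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: narrowband MUSIC pseudo spectrum (assumed setup, for illustration only)\n",
"rng = np.random.default_rng(0)\n",
"n_mics, n_snapshots, p = 4, 200, 1 # array size, number of snapshots, number of sources\n",
"\n",
"def steering(theta):\n",
"    # half-wavelength uniform linear array: inter-mic phase shift of pi*sin(theta)\n",
"    return np.exp(1j * np.pi * np.arange(n_mics) * np.sin(theta))\n",
"\n",
"# one complex narrowband source at 60 degrees plus white complex noise\n",
"s = rng.standard_normal(n_snapshots) + 1j*rng.standard_normal(n_snapshots)\n",
"noise = 0.1 * (rng.standard_normal((n_mics, n_snapshots)) + 1j*rng.standard_normal((n_mics, n_snapshots)))\n",
"x = np.outer(steering(np.deg2rad(60)), s) + noise\n",
"\n",
"R = x @ x.conj().T / n_snapshots # spatial covariance estimate\n",
"eigvals, eigvecs = np.linalg.eigh(R) # eigenvalues in ascending order\n",
"V = eigvecs[:, :n_mics - p] # noise subspace: eigenvectors of the smallest eigenvalues\n",
"\n",
"thetas = np.deg2rad(np.arange(-90, 91))\n",
"pseudo = np.array([1 / np.sum(np.abs(steering(t).conj() @ V)**2) for t in thetas])\n",
"plt.plot(np.rad2deg(thetas), pseudo)\n",
"plt.xlabel(\"angle [°]\")\n",
"plt.title(\"Narrowband MUSIC pseudo spectrum (sketch)\")"
]
},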
{
"cell_type": "markdown",
"id": "e0c40674",
"metadata": {},
"source": [
"### MUSIC revisited: STFT processing for broadband signals\n",
"When MUSIC is applied to broadband signals via STFT processing, a pseudo spectrum $\\hat{P}_{MUSIC}(\\theta, k)$ is calculated for each individual frequency bin $k$.\n",
"\n",
"The individual MUSIC pseudo spectra $\\hat{P}_{MUSIC}(\\theta, k)$ are then summed across all frequency bins $k \\in \\{1, 2, \\ldots, K\\}$.\n",
"\n",
"In the simplest case, this is performed without normalization, i.e.,\n",
"$$\\tilde{P}_{MUSIC}(\\theta) = \\sum_{k=1}^{K} \\hat{P}_{MUSIC}(\\theta, k)$$\n",
"\n",
"While this is the commonly used implementation, it is far from optimal: the maxima of the narrowband pseudo spectra $\\hat{P}_{MUSIC}(\\theta, k)$ may differ by orders of magnitude. When summing across frequencies without normalization, the sum is dominated by the few frequency bins with the highest peaks, and the information in the remaining bins contributes almost nothing to the DOA estimate.\n",
"\n",
"To illustrate this, let's first plot the MUSIC pseudo spectra of 10 randomly selected frequency bins.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3df2f670",
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(14,10))\n",
"frequencies = sample(list(range(algorithms['MUSIC'].Pssl.shape[1])), k=10)\n",
"for k in frequencies:\n",
"    plt.plot(algorithms[\"MUSIC\"].Pssl[:,k])\n",
"plt.xlabel(\"angle [°]\")\n",
"plt.title(\"Multiple narrowband MUSIC pseudo spectra in one plot\", fontsize=15)\n"
]
},
{
"cell_type": "markdown",
"id": "a45c4a62",
"metadata": {},
"source": [
"In the next cell, we calculate the maximum of each narrowband MUSIC pseudo spectrum per frequency bin $k$, i.e.,\n",
"$$ \\hat{P}_{max}(k) = \\max_\\theta \\;\\hat{P}_{MUSIC}(\\theta, k),$$\n",
"and visualize these maxima in a swarm plot.\n",
"\n",
"From the plot, it should be clear that, without normalization, effectively only a few frequency bins contribute to the solution."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8865f10",
"metadata": {},
"outputs": [],
"source": [
"# calculation\n",
"maxima = sorted(np.max(algorithms[\"MUSIC\"].Pssl, axis=0))\n",
"\n",
"# plotting\n",
"fig, ax = plt.subplots(1, 1, figsize=(14,10))\n",
"sns.swarmplot(data=maxima, ax=ax, size=6)\n",
"\n",
"ax.set_title(\"\\nDistribution: Maxima of the MUSIC pseudo spectra of multiple frequency bins\\n\", fontsize=20)\n",
"ax.set_xticks([1])\n"
]
},
{
"cell_type": "markdown",
"id": "f4c6b8c9",
"metadata": {},
"source": [
"To avoid the problem mentioned above, one can simply normalize each narrowband MUSIC pseudo spectrum by its maximum before summing across frequencies, i.e.,\n",
"$$\\hat{P}_{NormMUSIC}(\\theta, k) = \\frac{\\hat{P}_{MUSIC}(\\theta, k)}{\\hat{P}_{max}(k)}$$\n",
"$$\\tilde{P}_{NormMUSIC}(\\theta) = \\sum_{k=1}^{K} \\hat{P}_{NormMUSIC}(\\theta, k)$$\n",
"\n"
]
},
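{
"cell_type": "markdown",
"id": "9c8d7e6f",
"metadata": {},
"source": [
"As a quick sanity check, the next cell applies this per-bin normalization directly to the narrowband pseudo spectra that the last `locate_sources` call left in the `Pssl` attribute, and compares the broadband DOA estimates with and without normalization. This is a rough sketch of the idea behind NormMUSIC, not its exact implementation, and it only looks at the last simulated sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1a2b3c4d",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: per-bin max normalization of the stored narrowband pseudo spectra.\n",
"# Pssl has shape (n_angles, n_freq_bins); the azimuth grid is 0..359 degrees,\n",
"# so the argmax index is directly the estimated DOA in degrees.\n",
"P = algorithms[\"MUSIC\"].Pssl\n",
"P_sum = P.sum(axis=1) # plain sum across frequency bins\n",
"P_norm_sum = (P / P.max(axis=0, keepdims=True)).sum(axis=1) # normalized sum\n",
"\n",
"print(f\"DOA estimate without normalization: {np.argmax(P_sum)}°\")\n",
"print(f\"DOA estimate with normalization:    {np.argmax(P_norm_sum)}°\")"
]
},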
{
"cell_type": "markdown",
"id": "87e2a691",
"metadata": {},
"source": [
"<a id='recommendation'></a>\n",
"## Recommendation\n",
"\n",
"- As its performance is more robust, we recommend using NormMUSIC over MUSIC.\n",
"- When MUSIC is used as a baseline in publications, we recommend including both MUSIC without frequency normalization and NormMUSIC, because:\n",
"    - (i) Using MUSIC without normalization ensures comparability with older papers.\n",
"    - (ii) Using NormMUSIC ensures a fair comparison. Especially if you tweak the parameters of a new algorithm, you should not use a suboptimal implementation as a baseline."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}