Update 2_dtw.md

ElektrikAkar · ElektrikAkar · commit a04eb8472039 · 2023-12-09T21:54:37.000Z
diff --git a/docs/2_method/2_dtw.md b/docs/2_method/2_dtw.md
@@ -10,25 +10,24 @@ Dynamic time warping is a well-known technique for manipulating time series to e
 
 ## DTW Algorithm
 
-Consider a time series to be a vector of some arbitrary length. Consider that we have ($p$) such vectors in total, each possibly differing in length. To find a subset of ($k$) clusters within the set of ($p$) vectors using MIP formulation, we must first make $\frac{1}{2} {p \choose 2}$ pairwise comparisons between all vectors within the total set and find the `similarity' between each pair. In this case, the similarity is defined as the DTW distance. Consider two time series ($x$) and ($y$) of differing lengths ($n$) and ($m$) respectively,
+Consider a time series to be a vector of some arbitrary length. Consider that we have ($$p$$) such vectors in total, each possibly differing in length. To find a subset of ($$k$$) clusters within the set of ($$p$$) vectors using MIP formulation, we must first make $$\frac{1}{2} {p \choose 2}$$ pairwise comparisons between all vectors within the total set and find the `similarity' between each pair. In this case, the similarity is defined as the DTW distance. Consider two time series ($$x$$) and ($$y$$) of differing lengths ($$n$$) and ($$m$$) respectively,
 
 $$
-x=(x_1, x_2, ..., x_n)
-$$
-$$
-y=(y_1, y_2, ..., y_m).
+\begin{aligned} 
+x&=(x_1, x_2, ..., x_n)\\
+y&=(y_1, y_2, ..., y_m).
 $$
 
 The DTW distance is the sum of the Euclidean distance between each point and its matched point(s) in the other vector. The following constraints must be met: 
 
 1. The first and last elements of each series must be matched.
-2. Only unidirectional forward movement through relative time is allowed, i.e., if $x_1$ is mapped to $y_2$ then $x_2$ may not be mapped to
-    $y_1$ (monotonicity). 
+2. Only unidirectional forward movement through relative time is allowed, i.e., if $$x_1$$ is mapped to $$y_2$$ then $$x_2$$ may not be mapped to
+    $$y_1$$ (monotonicity). 
 3. Each point is mapped to at least one other point, i.e., there are no jumps in time (continuity).
 
-![Two time series with DTW pairwise alignment between each element, showing one-to-many mapping properties of DTW (left). Cost matrix $C$ for the two time series, showing the warping path and final DTW cost at $C_{14,13}$ (right).](../../media/Merged_document.pdf)
+![Two time series with DTW pairwise alignment between each element, showing one-to-many mapping properties of DTW (left). Cost matrix $$C$$ for the two time series, showing the warping path and final DTW cost at $$C_{14,13}$$ (right).](../../media/Merged_document.png)
 
-Finding the optimal warping arrangement is an optimisation problem that can be solved using dynamic programming, which splits the problem into easier sub-problems and solves them recursively, storing intermediate solutions until the final solution is reached. To understand the memory-efficient method used in ''DTW-C++``, it is useful to first examine the full-cost matrix solution, as follows. For each pairwise comparison, an ($n$) by ($m$) matrix $C^{n\times m}$ is calculated, where each element represents the cumulative cost between series up to the points $x_i$ and $y_j$:
+Finding the optimal warping arrangement is an optimisation problem that can be solved using dynamic programming, which splits the problem into easier sub-problems and solves them recursively, storing intermediate solutions until the final solution is reached. To understand the memory-efficient method used in ''DTW-C++``, it is useful to first examine the full-cost matrix solution, as follows. For each pairwise comparison, an ($$n$$) by ($$m$$) matrix $$C^{n\times m}$$ is calculated, where each element represents the cumulative cost between series up to the points $$x_i$$ and $$y_j$$:
 
 \[
 c_{i,j} = (x_i-y_j)^2+\min \left\{
@@ -38,11 +37,11 @@ c_{i-1,j-1} & c_{i-1,j} & c_{i,j-1}
 \right\}
 \]
 
-The final element $c_{n,m}$ is then the total cost, $C_{x,y}$, which provides the comparison metric between the two series $x$ and $y$. \autoref{fig:warping_signals} shows an example of this cost matrix $C$ and the warping path through it.
+The final element $$c_{n,m}$$ is then the total cost, $$C_{x,y}$$, which provides the comparison metric between the two series $$x$$ and $$y$$. \autoref{fig:warping_signals} shows an example of this cost matrix $$C$$ and the warping path through it.
 
-For the clustering problem, only this final cost for each pairwise comparison is required; the actual warping path (or mapping of each point in one time series to the other) is superfluous for k-medoids clustering. The memory complexity of the cost matrix $C$ is $O(nm)$, so as the length of the time series increases, the memory required increases greatly. Therefore, significant reductions in memory can be made by not storing the entire $C$ matrix. When the warping path is not required, only a vector containing the previous row for the current step of the dynamic programming sub-problem is required (i.e., the previous three values $c_{i-1,j-1}$, $c_{i-1,j}$, $c_{i,j-1}$).
+For the clustering problem, only this final cost for each pairwise comparison is required; the actual warping path (or mapping of each point in one time series to the other) is superfluous for k-medoids clustering. The memory complexity of the cost matrix $$C$$ is $$O(nm)$$, so as the length of the time series increases, the memory required increases greatly. Therefore, significant reductions in memory can be made by not storing the entire $$C$$ matrix. When the warping path is not required, only a vector containing the previous row for the current step of the dynamic programming sub-problem is required (i.e., the previous three values $$c_{i-1,j-1}$$, $$c_{i-1,j}$$, $$c_{i,j-1}$$).
 
-The DTW distance $C_{x,y}$ is found for each pairwise comparison. Pairwise distances are then stored in a separate symmetric matrix, $D^{p\times p}$, where ($p$) is the total number of time series in the clustering exercise. In other words, the element $d_{i,j}$ gives the distance between time series ($i$) and ($j$).
+The DTW distance $$C_{x,y}$$ is found for each pairwise comparison. Pairwise distances are then stored in a separate symmetric matrix, $$D^{p\times p}$$, where ($$p$$) is the total number of time series in the clustering exercise. In other words, the element $$d_{i,j}$$ gives the distance between time series ($$i$$) and ($$j$$).
 
 ### Warping Window