
Commit 182198b

Update 3_mip.md
1 parent c3fb725 commit 182198b

File tree

1 file changed: +0 additions, -32 deletions


docs/2_method/3_mip.md

Lines changed: 0 additions & 32 deletions
@@ -31,35 +31,3 @@ After solving this integer program, the non-zero diagonal entries of $A$ represe
Finding global optimality can increase the computation time, depending on the number of time series within the dataset and the DTW distances. Therefore, there is also a built-in option to cluster using k-medoids, described in [k-Medoids Clustering](link to that). The k-medoids method is often quicker as it is an iterative approach; however, it is subject to getting stuck in local optima. The results in the next section show the timing and memory performance of both MIP clustering and k-medoids clustering using `DTW-C++` compared to other packages.

The DTW distance matrix is a square matrix $D_{n\times n}$, where $n$ is the number of data series in the problem, so $D_{ij}=C(i,j)$. The problem formulation begins with a binary square matrix $A_{n\times n}$, where $A_{ij}=1$ if data series $j$ is in the cluster with centroid $i$, and 0 otherwise. $B$ is a $1\times n$ binary vector, where

$$
B_{i} = \begin{cases}
1, & \text{if series } i \text{ is a centroid}\\
0, & \text{otherwise}
\end{cases}
$$

and the number of centroids equals the number of clusters, $k$:

$$\sum_{i=1}^n B_{i}=k$$

The following constraints apply:

1. Each data series must be in exactly 1 cluster:

$$ \sum_{i=1}^n A_{ij}=1 \qquad \forall j$$

2. Only the $k$ centroid rows may have non-zero values:

$$ A_{ij} \le B_i \qquad \forall i,j$$

With the cost function to be minimised:

$$ F=\min \sum_{i} \sum_{j} D_{ij} \odot A_{ij}$$

where $\odot$ represents element-wise multiplication.
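To make the objective and constraints concrete, here is a minimal sketch that solves the same integer program by brute-force enumeration over the feasible choices of $B$, on a toy distance matrix. The matrix values, variable names, and the exhaustive search are illustrative assumptions only; they are not part of `DTW-C++`.

```python
from itertools import combinations

# Toy symmetric "DTW distance" matrix for n = 4 series (illustrative
# values only -- not produced by DTW-C++).
D = [[0, 2, 9, 8],
     [2, 0, 7, 9],
     [9, 7, 0, 1],
     [8, 9, 1, 0]]
n, k = len(D), 2

best_cost, best_centroids = float("inf"), None
# Enumerate every feasible B: exactly k of the n series are centroids.
for centroids in combinations(range(n), k):
    # Given B, the optimal A assigns each series j to its nearest centroid,
    # which satisfies both constraints: one cluster per series, and
    # A_ij <= B_i (only centroid rows are used).
    cost = sum(min(D[i][j] for i in centroids) for j in range(n))
    if cost < best_cost:
        best_cost, best_centroids = cost, centroids

print(best_centroids, best_cost)  # → (0, 2) 3
```

Here series 0 and 1 are close to each other and far from series 2 and 3, so the optimal solution picks one centroid from each pair. Enumeration is only viable for tiny $n$; the branch-and-bound approach described below avoids visiting most of these candidates.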
After the problem formulation, there are many possible solutions to be explored. To reduce this search space, linear programming relaxation and branch and bound are used. The relaxation drops the binary constraint, allowing values between 0 and 1; for example, a data series could be 0.2 in one cluster, 0.2 in another, and 0.6 in a third. Relaxing the problem gives a cost at least as good as any binary solution, because there are more degrees of freedom, so the relaxed cost is a lower bound. Each value in $A$ is then rounded to 0 or 1 and the cost recalculated to give a feasible solution. When exploring each branch with relaxation, if the relaxed cost is greater than that of a previously found feasible solution, the branch can be cut and not explored any further: even the relaxation, which gives a better cost, is still worse, so a better solution is not possible down that branch. This process continues until the best solution is found. This was implemented using the YALMIP package in MATLAB.
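As a rough illustration of the pruning idea, the sketch below branches on which series are centroids and cuts a branch whenever an optimistic lower bound cannot beat the incumbent feasible solution. For simplicity it uses a cheap combinatorial bound in place of the LP relaxation described above, and the toy distance matrix is an assumption, not output from the package.

```python
# Toy symmetric "DTW distance" matrix (illustrative values only).
D = [[0, 2, 9, 8],
     [2, 0, 7, 9],
     [9, 7, 0, 1],
     [8, 9, 1, 0]]
n, k = len(D), 2

best = {"cost": float("inf"), "centroids": None}

def bound(chosen, idx):
    # Optimistic lower bound: pretend every still-undecided series
    # (index >= idx) could also serve as a centroid, so the true cost of
    # any completion of this branch can never be lower than this value.
    allowed = set(chosen) | set(range(idx, n))
    return sum(min(D[i][j] for i in allowed) for j in range(n))

def branch(idx, chosen):
    if len(chosen) == k:
        # Leaf: a fully decided, feasible B; update the incumbent.
        cost = sum(min(D[i][j] for i in chosen) for j in range(n))
        if cost < best["cost"]:
            best["cost"], best["centroids"] = cost, tuple(chosen)
        return
    if n - idx < k - len(chosen):
        return  # not enough series left to pick k centroids
    if bound(chosen, idx) >= best["cost"]:
        return  # prune: even the optimistic bound cannot beat the incumbent
    branch(idx + 1, chosen + [idx])  # decide: series idx is a centroid
    branch(idx + 1, chosen)          # decide: series idx is not a centroid

branch(0, [])
print(best["centroids"], best["cost"])  # → (0, 2) 3
```

On this toy instance the bound prunes, for example, the branch containing centroids $\{1, 3\}$ once the incumbent cost of 3 is known, which is exactly the cut-and-stop-exploring behaviour described above; an LP-relaxation bound plays the same role but is usually tighter.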
