In the following we compare two clustering techniques: k-means and spectral clustering. For spectral clustering we implemented the normalized spectral clustering algorithm of Ng, Jordan, and Weiss, described in the following reference:
A Tutorial on Spectral Clustering, Ulrike von Luxburg, 2007
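As a sketch of this algorithm (the function name and the use of NumPy/scikit-learn here are our own assumptions, not the report's actual code):

```python
import numpy as np
from sklearn.cluster import KMeans

def normalized_spectral_clustering(A, k, seed=0):
    """Normalized spectral clustering (Ng, Jordan & Weiss variant) -- a sketch.

    A : (n, n) symmetric adjacency/affinity matrix; k : number of clusters.
    """
    d = A.sum(axis=1)
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} A D^{-1/2}
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Eigenvectors of the k smallest eigenvalues (eigh sorts ascending)
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Row-normalize the spectral embedding (the Ng-Jordan-Weiss step)
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    T = U / np.where(norms > 0, norms, 1.0)
    # Final k-means step in the embedded space
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(T)
```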
In Part I we use a generative model of graphs called the mixed membership stochastic block model (MMSBM), from the reference:
Mixed Membership Stochastic Blockmodels, Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing, 2008
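A minimal MMSBM sampler, assuming the standard generative process from the paper (Dirichlet membership vectors and a block matrix of edge probabilities; the function and parameter names are ours):

```python
import numpy as np

def sample_mmsbm(n, alpha, B, rng=None):
    """Sample an undirected MMSBM graph -- a sketch.

    n     : number of vertices
    alpha : (K,) Dirichlet concentration for the membership vectors
    B     : (K, K) community-to-community edge probabilities
    """
    rng = np.random.default_rng(rng)
    K = len(alpha)
    pi = rng.dirichlet(alpha, size=n)      # membership proportions per vertex
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            zi = rng.choice(K, p=pi[i])    # community i adopts toward j
            zj = rng.choice(K, p=pi[j])    # community j adopts toward i
            A[i, j] = A[j, i] = rng.binomial(1, B[zi, zj])
    return A, pi
```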
We show how k-means and spectral clustering perform in this framework for graphs of varying complexity.
Graph complexity depends on the complexity of the vertex membership profiles. To compare performance we compute a label recovery error, although it should be noted that both methods only perform partial recovery: each assigns every vertex to a single cluster, whereas the MMSBM is more expressive in general, since it maps each vertex to a proportion of membership in each community. We therefore take the true cluster of each vertex to be the community in which its membership proportion is highest.
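Since cluster labels are only defined up to permutation, the recovery error described above should be minimized over label matchings. A sketch, assuming hard true labels obtained as `pi.argmax(axis=1)` and using the Hungarian algorithm (the function name is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def recovery_error(true_labels, pred_labels, k):
    """Fraction of mislabeled vertices, minimized over cluster permutations."""
    # Confusion matrix: rows = true clusters, cols = predicted clusters
    C = np.zeros((k, k), dtype=int)
    for t, p in zip(true_labels, pred_labels):
        C[t, p] += 1
    # Hungarian matching finds the best alignment of predicted to true labels
    row, col = linear_sum_assignment(-C)
    return 1.0 - C[row, col].sum() / len(true_labels)
```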
In Part II we use the previous graph clustering methods to cluster point cloud data (PCD).
Again we show how k-means and spectral clustering perform in this setting.
Here is a summary of the generative model:
- model:
We sampled MMSBM graphs with parameters:
- Kmeans clustering using adjacency representation:
- Normalized Spectral Clustering:
- Results (averaged over 12 samplings):
- model:
We sampled MMSBM graphs with parameters:
- Kmeans clustering using adjacency representation:
- Normalized Spectral Clustering:
- Results (averaged over 12 samplings):
- model:
We sampled MMSBM graphs with parameters:
we sample each edge with probability (1 - sparsity_param) x proba_edge
- Kmeans clustering using adjacency representation:
- Normalized Spectral Clustering:
- Results (averaged over 12 samplings):
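The sparsified edge sampling used in this model can be sketched as follows (function and argument names are ours; `P` plays the role of proba_edge):

```python
import numpy as np

def sample_sparse_edges(P, sparsity_param, rng=None):
    """Sample each edge with probability (1 - sparsity_param) * P[i, j].

    P : (n, n) base edge probabilities; sparsity_param in [0, 1].
    """
    rng = np.random.default_rng(rng)
    n = len(P)
    # Bernoulli draw per pair with the downscaled probability
    A = (rng.random((n, n)) < (1.0 - sparsity_param) * P).astype(int)
    A = np.triu(A, 1)   # keep upper triangle, drop self-loops
    return A + A.T      # symmetrize for an undirected graph
```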
- With low sparsity, k-means tends to perform slightly better. This is expected, since k-means on the adjacency matrix uses many more dimensions than spectral clustering.
- With high sparsity, spectral clustering performs better than k-means. This can be explained by the fact that, although k-means uses many more dimensions, it focuses on intra-cluster information and easily gets stuck in local minima. On the contrary, spectral clustering focuses on inter-cluster information. In fact, the high-sparsity graph resembles a k-nearest-neighbor graph; in the next part we use this property to cluster point cloud data.
- Point cloud data model and k-NN graphs. The graph is made symmetric by taking A_sym[i, j] = max(A[i, j], A[j, i]), i.e. A_sym = np.maximum(A, A.T)
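A sketch of the k-NN graph construction with the max-symmetrization above (the function name and the brute-force distance computation are our assumptions):

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric k-NN adjacency matrix from a point cloud X of shape (n, d)."""
    # Pairwise squared Euclidean distances
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D, np.inf)            # exclude self-neighbors
    A = np.zeros_like(D, dtype=int)
    nn = np.argsort(D, axis=1)[:, :k]      # k nearest neighbors per point
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nn.ravel()] = 1
    # Symmetrize: A_sym[i, j] = max(A[i, j], A[j, i])
    return np.maximum(A, A.T)
```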
- Kmeans clustering using the PCD and adjacency representations, respectively:
- Normalized Spectral Clustering:
- Results (averaged over 10 samplings):
Spectral clustering performs better than k-means on k-NN graphs, and therefore on point cloud data. It is also a more efficient technique, since it uses far fewer dimensions than k-means.