Project 2: After preprocessing the data with the Python script, develop a partitional clustering algorithm based on spherical k-means. Your program should take as input the vector-space representation of the objects, the number of clusters, the number of trials to perform (each trial is seeded with a different randomly selected set of objects), and the class labels of the objects. The class label of an object is the newsgroup in which the corresponding posting appeared. Upon completion, your program should write the clustering solution to a file and report the value of the objective function for the best trial. The solution from the best trial is then used to analyze the characteristics of the clusters in terms of the class distribution of the objects they contain. To do that, your program should output a two-dimensional matrix of dimensions (# of clusters) x (# of classes) whose entries are the number of objects of a particular class that belong to a particular cluster. To evaluate the quality of the clustering solution, your program needs to compute the entropy and purity of the solution with respect to that class distribution.
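The Project 2 pipeline could be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation. It assumes a dense NumPy matrix with no all-zero rows, takes the objective to be the sum of cosine similarities between each object and its cluster centroid, and reports entropy in bits without normalizing by the number of classes (some tools divide by log2 of the class count). The function names spherical_kmeans, class_distribution, and entropy_and_purity are hypothetical.

```python
import numpy as np

def spherical_kmeans(X, k, n_trials=5, max_iter=100, seed=0):
    """Return (labels, objective) for the best of n_trials runs."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Normalize rows to unit length so dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    best_labels, best_obj = None, -np.inf
    for _ in range(n_trials):
        # Seed each trial with k distinct randomly selected objects.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            new_labels = (X @ centroids.T).argmax(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # assignments stopped changing
            labels = new_labels
            for j in range(k):
                members = X[labels == j]
                if len(members):  # keep the old centroid if a cluster empties
                    c = members.sum(axis=0)
                    centroids[j] = c / np.linalg.norm(c)
        # Assumed objective: total cosine similarity to assigned centroids.
        obj = (X @ centroids.T)[np.arange(len(X)), labels].sum()
        if obj > best_obj:
            best_labels, best_obj = labels.copy(), obj
    return best_labels, best_obj

def class_distribution(labels, classes, k):
    """Build the (# of clusters) x (# of classes) count matrix."""
    class_names, class_idx = np.unique(classes, return_inverse=True)
    M = np.zeros((k, len(class_names)), dtype=int)
    np.add.at(M, (labels, class_idx), 1)  # count each (cluster, class) pair
    return M, class_names

def entropy_and_purity(M):
    """Size-weighted entropy (bits) and purity of a count matrix."""
    n = M.sum()
    sizes = M.sum(axis=1)
    nonempty = sizes > 0
    P = M[nonempty] / sizes[nonempty, None]  # class proportions per cluster
    # Treat 0 * log(0) as 0 by leaving those entries at zero.
    logP = np.log2(P, where=P > 0, out=np.zeros_like(P))
    cluster_entropy = -(P * logP).sum(axis=1)
    entropy = (sizes[nonempty] / n * cluster_entropy).sum()
    purity = M.max(axis=1).sum() / n
    return entropy, purity
```

A typical call sequence would be spherical_kmeans(X, k, n_trials) to get the best labels and objective value, then class_distribution and entropy_and_purity on those labels. Lower entropy and higher purity indicate clusters that align more closely with the newsgroup labels.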
Project 3: Classify emails by learning a model from the training data and applying it to the test data. Two kinds of classifiers:
- KNN classifier.
- Centroid-based classifier.
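The two Project 3 classifiers could look roughly like this. It is a sketch under stated assumptions: documents are dense, nonzero vector-space rows, similarity is cosine (the project description does not name a measure), and KNN ties go to the class seen first among the nearest neighbors. The names knn_predict and centroid_predict are hypothetical.

```python
import numpy as np
from collections import Counter

def _unit_rows(A):
    """Scale each row to unit length (assumes no all-zero rows)."""
    A = np.asarray(A, dtype=float)
    return A / np.linalg.norm(A, axis=1, keepdims=True)

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k most cosine-similar training rows."""
    y_train = np.asarray(y_train)
    sims = _unit_rows(X_test) @ _unit_rows(X_train).T
    # Indices of the k most similar training documents per test row.
    nearest = np.argsort(-sims, axis=1)[:, :k]
    return np.array(
        [Counter(y_train[row]).most_common(1)[0][0] for row in nearest]
    )

def centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class with the most similar centroid."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    # One centroid per class: the mean of that class's training vectors.
    centroids = np.vstack([X_train[y_train == c].mean(axis=0) for c in classes])
    sims = _unit_rows(X_test) @ _unit_rows(centroids).T
    return classes[sims.argmax(axis=1)]
```

KNN keeps every training vector and does all its work at prediction time, while the centroid-based classifier compresses each class to a single vector, so it predicts faster but can miss classes that are not well described by one mean.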