Principal Component Analysis for Semantic Classification

Final Project for AMATH 582: Computational Methods for Data Analysis

Principal component analysis (PCA) and supervised classification are two widely used techniques in data science. In this project, we combine them to classify news articles by their word-frequency content. We find that the articles can be classified accurately after projecting onto a small subset of principal components, which reduces the feature space from nearly 10,000 dimensions to only 4. We also compare results from the traditional and robust PCA formulations, and discuss what additional semantic information can be inferred from the results.
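The pipeline described above (project word-frequency vectors onto a few principal components, then classify in the reduced space) can be sketched roughly as follows. This is only an illustration under stated assumptions: the README does not name the project's language, dataset loader, or classifier, so the sketch uses Python with NumPy, a small synthetic word-count matrix in place of the news articles, and a nearest-centroid classifier as a stand-in. The function names `pca_project`, `nearest_centroid_fit`, and `nearest_centroid_predict` are hypothetical, and only traditional PCA is shown, not the robust formulation.

```python
import numpy as np

def pca_project(X, k):
    """Center X (documents x words) and project onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy SVD: the rows of Vt are the principal directions in word space.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, components, mean

def nearest_centroid_fit(Z, y):
    """Compute one centroid per class in the reduced space (stand-in classifier)."""
    labels = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def nearest_centroid_predict(Z, labels, centroids):
    """Assign each row of Z to the class with the closest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]

# Synthetic stand-in for the article data (assumption): 3 topics, 500 words,
# where each topic uses its own block of 50 words more frequently.
rng = np.random.default_rng(0)
n_per_class, n_words, n_classes = 40, 500, 3
rates = np.ones((n_classes, n_words))
for c in range(n_classes):
    rates[c, c * 50:(c + 1) * 50] = 6.0
X = np.vstack([
    rng.poisson(rates[c], size=(n_per_class, n_words)) for c in range(n_classes)
]).astype(float)
y = np.repeat(np.arange(n_classes), n_per_class)

# Hold out 30 documents for testing.
idx = rng.permutation(len(y))
train, test = idx[:90], idx[90:]

# Fit PCA on the training documents only, keeping 4 components as in the abstract.
Z_train, components, mean = pca_project(X[train], k=4)
# Project the held-out documents using the training mean and components.
Z_test = (X[test] - mean) @ components.T

labels, centroids = nearest_centroid_fit(Z_train, y[train])
pred = nearest_centroid_predict(Z_test, labels, centroids)
accuracy = (pred == y[test]).mean()
print(f"held-out accuracy with 4 components: {accuracy:.2f}")
```

Fitting the mean and components on the training split alone keeps information from the held-out documents out of the projection. Any other classifier could be substituted for the nearest-centroid step once the data are in the reduced space.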
