This project applies the statistical foundations of predictive data analytics, using the R programming language, to the Automobile dataset, which consists of data from the 1985 Ward’s Automotive Yearbook. The project is divided into two phases: the first develops a model to accurately predict the price of listed automobiles, and the second clusters various makes of cars on the basis of their specifications and features.
The final objective of my analysis is to create a recommendation system that integrates the price forecasts from the first model with the cluster groups from the second model. Such a system would help customers pick a vehicle of their choice and help car manufacturers identify segments for their advertising or production purposes.
While statistical measures lay the foundations for data-driven projects, cars must also be analysed through intuition, influence, and rationale. Therefore, alongside statistical knowledge, I will use intuition and creativity to make strategic decisions while building models that convey data-driven insights for future automobile production. Using these insights, this report will help consumers pick an ideal car on the basis of their needs and help manufacturers target the segments of the population they want to cater to.
For the first phase, forecasting car prices, a gradient boosted decision tree model was used. For the second phase, the different makes of cars were clustered using the K-Means algorithm on a PCA-transformed dataset.
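To illustrate the clustering step, the following is a minimal sketch of K-Means on PCA scores using only base R (`prcomp` and `kmeans` from the `stats` package). It does not use the Automobile dataset: the `specs` data frame is synthetic, and its column names, the choice of two principal components, and `k = 2` are all assumptions made for the example rather than the settings used in this project.

```r
# Synthetic stand-in for numeric car specifications (NOT the Automobile dataset).
# Column names and distributions are hypothetical, chosen only for illustration.
set.seed(42)
n <- 60
specs <- data.frame(
  horsepower  = c(rnorm(n / 2, 70, 8),     rnorm(n / 2, 160, 15)),
  curb_weight = c(rnorm(n / 2, 2000, 150), rnorm(n / 2, 3200, 200)),
  city_mpg    = c(rnorm(n / 2, 32, 3),     rnorm(n / 2, 18, 2))
)

# PCA on centered and scaled features, so that variables measured in
# large units (e.g. weight) do not dominate the components.
pca <- prcomp(specs, center = TRUE, scale. = TRUE)

# Keep the first two principal components (an assumed choice).
scores <- pca$x[, 1:2]

# K-Means on the PCA scores; k = 2 is an assumption for this toy data.
# nstart = 25 reruns the algorithm from several random starts and keeps the best.
km <- kmeans(scores, centers = 2, nstart = 25)

# Cluster sizes and the proportion of variance explained by each component.
print(table(km$cluster))
print(summary(pca)$importance["Proportion of Variance", ])
```

In practice, the number of components and the number of clusters would be chosen from the data, for example by inspecting the explained variance and comparing the within-cluster sum of squares (`km$tot.withinss`) across several values of `k`.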