This project uses machine learning to predict the popularity and pricing of LEGO sets from attributes such as theme, release year, number of parts, and availability.
We used a dataset containing information on LEGO sets, including columns like 'Set_ID', 'Name', 'Year', 'Theme', 'Theme_Group', 'Subtheme', 'Category', 'Packaging', 'Num_Instructions', 'Availability', 'Pieces', 'Minifigures', 'Owned', 'Rating', 'USD_MSRP', 'Total_Quantity', and 'Current_Price'. The dataset was cleaned and preprocessed to prepare it for machine learning modeling.
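The exact cleaning steps are not documented here, so the following is only a minimal sketch of what such preprocessing might look like with pandas. The sample rows are made up, the `clean_sets` helper is hypothetical, and the imputation choices (zero for missing minifigures, median for missing piece counts) are assumptions rather than the project's confirmed method.

```python
import pandas as pd

# Made-up sample rows using column names from the dataset description.
raw = pd.DataFrame({
    "Set_ID": ["A-1", "B-1", "B-1", "C-1", "D-1"],
    "Theme": ["City", "Technic", "Technic", "Friends", "City"],
    "Year": [2019, 2020, 2020, 2021, 2022],
    "Pieces": [1000, 500, 500, None, 300],
    "Minifigures": [4, None, None, 2, 1],
    "USD_MSRP": [99.99, 49.99, 49.99, 29.99, None],
})


def clean_sets(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning steps (assumed, not the project's documented ones)."""
    # Keep one row per set.
    out = df.drop_duplicates(subset="Set_ID").copy()
    # A set without a retail price cannot serve as a pricing target.
    out = out.dropna(subset=["USD_MSRP"])
    # Assumption: a missing minifigure count means the set has none.
    out["Minifigures"] = out["Minifigures"].fillna(0).astype(int)
    # Assumption: fill missing piece counts with the median of the known ones.
    out["Pieces"] = out["Pieces"].fillna(out["Pieces"].median())
    return out.reset_index(drop=True)


cleaned = clean_sets(raw)
print(cleaned)
```

Whether dropping or imputing is appropriate depends on how much data is missing in each column; the choices above are placeholders to be revisited against the real dataset.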
We trained regression models to predict price and classification models to predict popularity. Feature engineering was used to derive more informative features from the raw columns, and the models were trained and evaluated with cross-validation and hyperparameter tuning to check their robustness.
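As a rough illustration of the workflow described above, the sketch below fits a price regressor with a small grid search and a popularity classifier scored by cross-validation, using scikit-learn. Everything specific in it is an assumption: the data is synthetic, random forests are just one possible model choice, `build_pipeline` is a hypothetical helper, and the "popular" label (owned by more collectors than the median set) is an invented definition, since the project's own definition is not stated.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data so the sketch runs on its own; not the project's data.
rng = np.random.default_rng(0)
n = 200
sets = pd.DataFrame({
    "Theme": rng.choice(["City", "Technic", "Friends"], size=n),
    "Year": rng.integers(2000, 2024, size=n),
    "Pieces": rng.integers(50, 3000, size=n),
    "Minifigures": rng.integers(0, 10, size=n),
    "Owned": rng.integers(0, 20000, size=n),
})
# Made-up price relationship: about ten cents per piece plus noise.
sets["USD_MSRP"] = 0.10 * sets["Pieces"] + rng.normal(0, 5, size=n)
# Assumed popularity label: owned by more collectors than the median set.
sets["Popular"] = (sets["Owned"] > sets["Owned"].median()).astype(int)

features = ["Theme", "Year", "Pieces", "Minifigures"]


def build_pipeline(model):
    """One-hot encode Theme, pass numeric columns through, then fit the model."""
    encode = ColumnTransformer(
        [("theme", OneHotEncoder(handle_unknown="ignore"), ["Theme"])],
        remainder="passthrough",
    )
    return Pipeline([("encode", encode), ("model", model)])


# Price regression: grid search over two hyperparameters, scored by 5-fold CV.
price_search = GridSearchCV(
    build_pipeline(RandomForestRegressor(random_state=0)),
    param_grid={"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
price_search.fit(sets[features], sets["USD_MSRP"])
best_mae = -price_search.best_score_  # scorer is negated, so flip the sign

# Popularity classification: 5-fold cross-validated accuracy.
popularity_scores = cross_val_score(
    build_pipeline(RandomForestClassifier(n_estimators=50, random_state=0)),
    sets[features],
    sets["Popular"],
    cv=5,
    scoring="accuracy",
)

print("Best price parameters:", price_search.best_params_)
print("Cross-validated price MAE:", round(best_mae, 2))
print("Mean popularity accuracy:", round(popularity_scores.mean(), 3))
```

Because the synthetic `Owned` counts are random, the popularity accuracy printed here should sit near chance. Only the workflow carries over to the real data, not the numbers.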
Both models gave promising results. The pricing model predicted set prices with a high degree of accuracy, and the popularity model performed well when predicting popularity from set attributes.
In future iterations, we plan to refine the models by adding features and experimenting with other machine learning algorithms. We also want to explore image data as an additional input and to investigate how external factors such as market trends and collector preferences affect the pricing and popularity of LEGO sets.
- [James Pellicane]