Rossmann is one of the largest drugstore chains in Europe, with over 3,000 drugstores in 7 European countries. Store sales are influenced by many factors, including promotions, competition, school and state holidays, seasonality, and locality. Accurate sales projection depends largely on each store's unique circumstances and on the timing of the prediction.
In this project, I built machine learning and neural network models that fit historical sales data and predict daily sales for 1,115 Rossmann stores in Germany. The dataset used in this work can be found on the Kaggle website:
https://www.kaggle.com/competitions/rossmann-store-sales/data
This repository contains two notebooks. They follow the same structure and differ only in Sections 4 (Evaluate Algorithms) and 5 (Finalize Model):
- Prepare Problem
a) Load libraries
b) Load dataset
- Summarize Data
a) Descriptive statistics
b) Data visualizations
- Prepare Data
a) Data cleaning
b) Split data into train and test sets
c) Data transforms
- Evaluate Algorithms
a) Spot check algorithms (cross-validation)
  - Notebook 1: machine learning (Linear Regression, Random Forest, and XGBoost) and neural network (Keras Sequential) models
  - Notebook 2: recurrent neural network (Long Short-Term Memory (LSTM)) model
b) Compare algorithms
- Finalize Model
a) Predictions on validation dataset
b) Save the model for later use
- Conclusions
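
As a rough illustration of the "Split data into train and test sets" step, here is a minimal sketch of a chronological hold-out, which is one common choice for daily sales data. It is not taken from the notebooks: the function name `chronological_split`, the `test_days` parameter, and the toy rows are hypothetical, and the notebooks may split the data differently. The keys `Store`, `Date`, and `Sales` are only meant to resemble the columns of the Kaggle training file.

```python
from datetime import date, timedelta


def chronological_split(rows, test_days):
    """Hold out the last `test_days` calendar days as the test set.

    `rows` is a list of dicts that each carry a "Date" key (datetime.date).
    Everything on or before the cutoff date goes to train, the rest to test.
    """
    if test_days <= 0:
        raise ValueError("test_days must be positive")
    last_day = max(row["Date"] for row in rows)
    cutoff = last_day - timedelta(days=test_days)
    train = [row for row in rows if row["Date"] <= cutoff]
    test = [row for row in rows if row["Date"] > cutoff]
    return train, test


# Toy data (made up): 2 hypothetical stores, 10 consecutive days each.
start = date(2015, 1, 1)
rows = [
    {"Store": store, "Date": start + timedelta(days=d), "Sales": 1000 * store + 10 * d}
    for store in (1, 2)
    for d in range(10)
]

# Keep the final 3 days of every store for testing.
train, test = chronological_split(rows, test_days=3)
print(len(train), len(test))
```

Splitting by date rather than at random keeps every test day strictly later than every training day, so the model is never evaluated on days that precede its training data.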