Through this project I wanted to explore how two branches of ML, Sentiment Analysis and Time Series Forecasting (TSF), go hand in hand. The result can be seen as a hybrid of NLP and TSF.
- One of the columns is highly skewed, and transforming it would make the model less interpretable, so instead of Linear Regression, various non-linear regressors were tried to find the best RMSE value.
- The correlation heatmap also showed no strong correlation between any variable and the target variable, which is another reason not to use Linear Regression.
- A SARIMAX model was built for 'Closing Price', which is our target variable.
- The output of the sentiment analysis is used as exogenous variables for the SARIMAX model.
- The 'Time Varying Linear Regression' model gave the best result in terms of RMSE value.
- Predictions for the next 30 days were also made and plotted along with their confidence intervals, since single point forecasts on their own can be misleading.
It was observed that when the data entries are autocorrelated, it is better to use TSF than traditional regressors.
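
One simple way to check for autocorrelation is to compute the sample autocorrelation at a given lag. The sketch below is illustrative only: the helper `autocorrelation` is a hypothetical name, and the two series are synthetic (a random walk, which behaves like a price series, and plain white noise for comparison).

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence of numbers at the given lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


random.seed(0)

# White noise: each value is independent of the previous one.
noise = [random.gauss(0, 1) for _ in range(500)]

# Random walk: each value is the previous value plus a noise step.
walk = []
level = 0.0
for step in noise:
    level += step
    walk.append(level)

acf_noise = autocorrelation(noise, 1)
acf_walk = autocorrelation(walk, 1)

print(f"lag-1 autocorrelation, white noise: {acf_noise:.3f}")
print(f"lag-1 autocorrelation, random walk: {acf_walk:.3f}")
```

A lag-1 value close to 1, as for the random walk, indicates that consecutive entries are strongly dependent, which is the situation where a time series model is the more natural choice.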