Ajay Krishna & Danilo Giarlini
"Emission by Country" from Kaggle: https://www.kaggle.com/datasets/thedevastator/global-fossil-co2-emissions-by-country-2002-2022
11 columns and 63,000 rows (reduced to around 3,000 for our purposes). Columns include time, total emissions, emissions from coal, emissions from gas, etc.
Multivariate time series
Vector Autoregression (VAR)
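For reference, the standard VAR(p) model for a country's vector y_t of K series in year t (for example total, coal, oil and gas emissions; this choice of series is only illustrative) regresses every series on p lags of all K series, with an intercept vector c and a white-noise error ε_t. The lag order p is a modelling choice, commonly picked with an information criterion such as AIC:

```latex
y_t = c + A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + \varepsilon_t,
\qquad y_t \in \mathbb{R}^{K},\quad A_i \in \mathbb{R}^{K \times K}
```

If total emissions are the first component of y_t, the first row of each fitted A_i expresses total emissions as a linear function of lagged total and lagged sector emissions, which is one candidate for the relation between total and sector-based emissions that the task list asks about.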
Design a multivariate time-series model that predicts total emissions and emissions per energy sector for a selected set of countries.
Visualize total emissions based on the input features (using Python or Tableau).
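A minimal Python (matplotlib) sketch for the visualization, assuming one DataFrame per country indexed by year with one column per series. The optional `forecast` argument draws predicted values as dashed lines in the matching colour, so the same helper could serve the later plotting steps. The function name, its arguments and the generic "Emissions" axis label are assumptions; use the units stated in the dataset.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_emissions(history: pd.DataFrame, forecast: pd.DataFrame = None, title: str = "Emissions"):
    """Line plot of each column over time; forecast columns (if given) are dashed."""
    fig, ax = plt.subplots(figsize=(8, 4))
    for col in history.columns:
        (line,) = ax.plot(history.index, history[col], label=col)
        if forecast is not None and col in forecast:
            # Same colour as the history line so each series reads as one curve.
            ax.plot(forecast.index, forecast[col], linestyle="--", color=line.get_color())
    ax.set_xlabel(history.index.name or "Year")
    ax.set_ylabel("Emissions")
    ax.set_title(title)
    ax.legend()
    fig.tight_layout()
    return fig
```

Calling `plot_emissions(df).savefig("usa.png")` would write one figure per country; in a notebook the returned figure displays directly.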
Pick 10 categories (3 developed countries, 3 developing countries, 3 underdeveloped countries, and 1 global).
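One way to pull the 10 categories out of the full table, sketched with pandas. The country names, their grouping into developed, developing and underdeveloped, and the `Country`/`Year` column names are illustrative assumptions only; the real selection and the exact spellings in the CSV (including the name of whatever row holds the global total) have to be checked.

```python
import pandas as pd

# Illustrative picks only, not the project's final selection; spellings must match the CSV.
CATEGORIES = {
    "developed": ["USA", "Germany", "Japan"],
    "developing": ["India", "Brazil", "Indonesia"],
    "underdeveloped": ["Bangladesh", "Ethiopia", "Nepal"],
    "global": ["Global"],
}


def select_categories(df: pd.DataFrame, country_col: str = "Country", year_col: str = "Year") -> dict:
    """Return {country: DataFrame indexed by year} for the 10 selected categories."""
    group_of = {c: group for group, countries in CATEGORIES.items() for c in countries}
    subset = df[df[country_col].isin(list(group_of))].copy()
    missing = set(group_of) - set(subset[country_col])
    if missing:
        # Usually a spelling mismatch between CATEGORIES and the dataset.
        raise ValueError(f"not found in data: {sorted(missing)}")
    subset["Category"] = subset[country_col].map(group_of)
    # One year-indexed table per country, ready for per-country VAR fits.
    return {c: g.set_index(year_col).sort_index() for c, g in subset.groupby(country_col)}
```

Failing loudly on missing names is deliberate: a silently dropped country would only show up later as a missing line in the accuracy comparison.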
- Design a multivariate time-series model.
- Use VAR to predict total emissions and emissions per energy sector.
- Explore the data.
- Select the data for the 10 categories mentioned above.
- Check whether the dataset passes the tests required before applying VAR (Granger's causality test and the cointegration test). If the tests fail, try, for example, dropping some columns or trimming the data, and then run the tests again.
- Implement VAR in Python for one country (one of the sub-datasets).
- Write a for loop and check whether the VAR approach works for all selected countries.
- Plot the results.
- Compare prediction accuracy on the test set across countries.
- Predict emissions for unseen time points with VAR, i.e., forecast emissions from 2023 to 2030.
- Plot the prediction.
- Check whether a mathematical relation between total emissions and sector-based emissions can be extracted from VAR. If so:
3a. evaluate which sector contributes most to total emissions, and
3b. work out whether and how sectors can be prioritized (e.g., a feature-importance approach using Lasso or Ridge regression).
- Further improvements and learning
- Visualization in Tableau/Python to motivate the problem.
- Presentation
2a. Make the structure and skeleton
2b. Brainstorm what we want to show
- Final presentation
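For 3a and 3b, two simple views sketched with pandas and scikit-learn: each sector's share of cumulative total emissions, and the absolute coefficients of Lasso and Ridge regressions of total emissions on standardized sector emissions (larger means more influence on the fit). This regression approach is swapped in for illustration and is not extracted from the VAR itself; `alpha`, the `Total` column name and the helper names are assumptions. If Total is an exact sum of the sectors, the standardized coefficients mostly reflect how much each sector varies over time, so the two views answer different questions and are worth comparing.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sector_share(df: pd.DataFrame, target: str = "Total") -> pd.Series:
    """Each sector's share of cumulative total emissions over the rows in `df`."""
    return df.drop(columns=target).sum() / df[target].sum()


def sector_importance(df: pd.DataFrame, target: str = "Total", alpha: float = 1.0) -> pd.DataFrame:
    """Absolute Lasso and Ridge coefficients of `target` on standardized sector columns."""
    X, y = df.drop(columns=target), df[target]
    table = {}
    for name, model in {"lasso": Lasso(alpha=alpha), "ridge": Ridge(alpha=alpha)}.items():
        # Standardizing first makes coefficients comparable across sectors.
        pipe = make_pipeline(StandardScaler(), model).fit(X, y)
        table[name] = np.abs(pipe[-1].coef_)
    return pd.DataFrame(table, index=X.columns).sort_values("lasso", ascending=False)
```

Increasing `alpha` drives more Lasso coefficients to exactly zero, so the order in which sectors drop out is one possible prioritization.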