Update 08.decision_tree.md
sofia-frenk authored Nov 15, 2024
1 parent 810ba40 commit efc7aa3
Showing 1 changed file with 23 additions and 0 deletions.
23 changes: 23 additions & 0 deletions content/08.decision_tree.md
@@ -0,0 +1,23 @@
# Decision Tree Analysis

This section covers the decision tree analysis. Because the dependent variable is continuous rather than categorical, scikit-learn's DecisionTreeRegressor was used.
The first decision tree was fit on the original dataset (with Duration_hours and Duration_min combined into a single variable, Total_Duration) and produced an \( R^2 \) value of 0.999977, which seemed suspiciously perfect.
This near-perfect fit is also visible in Figure 7, a plot of actual vs. predicted values, in which the predicted values fall almost exactly along the actual values.

<p align="center">
<img src="images/Actual_Predicted_Vals_Original_Data.png" alt="Actual vs predicted values for the original dataset" width="600px">
<br>
<strong>Figure 7:</strong> Actual vs. predicted values for the original dataset.
</p>
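The fitting step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins generated on the spot, Total_Duration is assumed to be in minutes, the `Stops` column is a hypothetical extra feature, and the train/test split is an assumption, since the text does not say how the model was evaluated.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for the flight data (assumption: not the real dataset).
duration_hours = rng.integers(1, 15, size=n)
duration_min = rng.integers(0, 60, size=n)
df = pd.DataFrame({
    # Combine hours and minutes into one variable (assumed unit: minutes).
    "Total_Duration": duration_hours * 60 + duration_min,
    "Stops": rng.integers(0, 3, size=n),  # hypothetical extra feature
})
# Assumed relationship: emissions grow with duration, plus a little noise.
df["CO2_Emitted (US Ton)"] = (
    0.002 * df["Total_Duration"] + rng.normal(0, 0.01, size=n)
)

X = df[["Total_Duration", "Stops"]]
y = df["CO2_Emitted (US Ton)"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit a regression tree and score it on the held-out rows.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 on held-out data: {r2:.6f}")
```

When one feature determines the target almost completely, as in this synthetic setup, a regression tree scores close to 1 even on held-out rows, so a very high \( R^2 \) alone does not indicate a coding mistake.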



To understand the origin of this \( R^2 \) value, a correlation matrix was created first. The first correlation matrix is shown in Figure 8:

<p align="center">
<img src="images/Correlation_Mat_Original_Data.png" alt="Correlation matrix created using the original dataset" width="600px">
<br>
<strong>Figure 8:</strong> Correlation matrix created using the original dataset.
</p>

As Figure 8 shows, the strongest correlation is between Total_Duration and CO2_Emitted (US Ton), the dependent variable. This is expected: the longer a plane is in flight, the more \( CO_2 \) it emits.
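A correlation check of this kind could be sketched as below. The data are again synthetic, the `Stops` column is hypothetical, and the use of pandas' default Pearson correlation is an assumption, since the text does not say which coefficient was plotted.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-in for the flight data (assumption: not the real dataset).
df = pd.DataFrame({
    "Total_Duration": rng.integers(60, 900, size=n),  # assumed unit: minutes
    "Stops": rng.integers(0, 3, size=n),              # hypothetical feature
})
df["CO2_Emitted (US Ton)"] = (
    0.002 * df["Total_Duration"] + rng.normal(0, 0.01, size=n)
)

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank the features by absolute correlation with the dependent variable.
target = "CO2_Emitted (US Ton)"
ranked = corr[target].drop(target).abs().sort_values(ascending=False)
top_feature = ranked.index[0]
print(ranked)
```

Ranking by absolute correlation with the target makes it easy to spot a single dominant predictor before looking at the full matrix.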
