Commit efc7aa3 (1 parent: 810ba40). Showing 1 changed file with 23 additions and 0 deletions.
# Decision Tree Analysis

This section covers the decision tree analysis. Because the dependent variable is not categorical, scikit-learn's `DecisionTreeRegressor` was used.
The first decision tree was fit on the original dataset (with Duration_hours and Duration_min combined into a single variable, Total_Duration) and produced an \( R^2 \) value of 0.999977, which seemed suspiciously close to perfect.
The effect of this high \( R^2 \) value can also be seen in Figure 7, a plot of actual vs. predicted values: the predicted values fall almost perfectly along the actual values.
<p align="center">
<img src="images/Actual_Predicted_Vals_Original_Data.png" alt="Actual vs predicted values for the original dataset" width="600px">
<br>
<strong>Figure 7:</strong> Actual vs. predicted values for the original dataset.
</p>
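
A near-perfect \( R^2 \) of this kind can be reproduced with a minimal sketch, assuming synthetic data in which the target is almost a linear function of Total_Duration. The data, the coefficient 0.05, the `noise_feature` column, and the train/test split are illustrative assumptions, not the analysis that produced the reported value; only `DecisionTreeRegressor` and the variable name Total_Duration come from the text.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the flight data (NOT the real dataset).
n = 2000
total_duration = rng.uniform(30, 900, size=n)  # minutes, hypothetical range
noise_feature = rng.normal(size=n)             # hypothetical unrelated feature
# Assumed relationship: emissions almost proportional to flight duration.
co2_emitted = 0.05 * total_duration + rng.normal(scale=0.05, size=n)

X = np.column_stack([total_duration, noise_feature])
y = co2_emitted

# Hold out a test set so the score is not measured only on training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train, y_train)

r2_train = r2_score(y_train, tree.predict(X_train))
r2_test = r2_score(y_test, tree.predict(X_test))
importances = tree.feature_importances_  # order matches the columns of X

print(f"train R^2 = {r2_train:.6f}")
print(f"test  R^2 = {r2_test:.6f}")
print(f"importance of Total_Duration = {importances[0]:.4f}")
```

In this sketch a single feature that almost determines the target is enough to push \( R^2 \) very close to 1 and to dominate the feature importances, which is one plausible reading of a score that looks too good.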

To understand where this \( R^2 \) value comes from, a correlation matrix was created first. It is shown in Figure 8.
<p align="center">
<img src="images/Correlation_Mat_Original_Data.png" alt="Correlation matrix created using the original dataset" width="600px">
<br>
<strong>Figure 8:</strong> Correlation matrix created using the original dataset.
</p>

As Figure 8 shows, the highest correlation appears between Total_Duration and CO2_Emitted (US Ton), the dependent variable. This makes sense: the longer a plane is in flight, the more \( CO_2 \) it emits.
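
The correlation check described above can be sketched as follows, assuming pandas and a small synthetic frame. The column `Unrelated_Feature` and all generated values are hypothetical stand-ins; only the names Total_Duration and CO2_Emitted (US Ton) come from the text, and the real dataset is not used here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000

# Synthetic stand-in for the flight data (NOT the real dataset).
total_duration = rng.uniform(30, 900, size=n)
df = pd.DataFrame(
    {
        "Total_Duration": total_duration,
        "Unrelated_Feature": rng.normal(size=n),  # hypothetical column
        # Assumed relationship: emissions almost proportional to duration.
        "CO2_Emitted (US Ton)": 0.05 * total_duration
        + rng.normal(scale=0.05, size=n),
    }
)

target = "CO2_Emitted (US Ton)"

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

# Rank the features by absolute correlation with the dependent variable.
ranked = corr[target].drop(target).abs().sort_values(ascending=False)
top_feature = ranked.index[0]

print(corr.round(3))
print(f"most correlated feature: {top_feature}")
```

Ranking the target's column of the matrix makes it easy to spot a feature that nearly duplicates the dependent variable before a model is fit.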