Update 05.data-analysis.md
sofia-frenk authored Nov 5, 2024
1 parent f481a1e commit 767c47c
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions content/05.data-analysis.md
@@ -150,22 +150,22 @@ It is worth noting that there is limited data available for multiple-carrier fli
Before applying SVD or PCA, we must normalize the data. Since most of our variables are categorical, only two variables needed to be normalized: Fuel_Consumption_normalized and CO2_Emitted_normalized. Their normalized values are shown in the bar chart in **Figure 3**. With so few numerical variables, this was an early indication that SVD and PCA might not be needed.

<p align="center">
- <img src="images/Normalized_numerical_data.png" alt="Normalized Numerical Data" width="600px">
+ <img src="images/05.data-analysis-PCA-plot1.png" alt="Normalized Numerical Data" width="600px">
<br>
<strong>Figure 3:</strong> Bar chart showing the normalized values of our two numerical variables.
</p>
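
The normalization step described above can be sketched as follows. This is only an illustration: the text does not state which normalization was used, so z-score standardization is an assumption here, and the raw arrays `fuel_consumption` and `co2_emitted` are hypothetical stand-ins filled with made-up values, not the project's data.

```python
import numpy as np

# Hypothetical stand-in values; the project's real dataset is not reproduced here.
fuel_consumption = np.array([1200.0, 1500.0, 900.0, 2100.0, 1750.0])
co2_emitted = np.array([3800.0, 4700.0, 2900.0, 6600.0, 5500.0])


def zscore(values):
    """Center to mean 0 and scale to unit (population) standard deviation.

    Assumption: z-score standardization; the source does not name the method.
    """
    return (values - values.mean()) / values.std()


# Names follow the two normalized variables mentioned in the text.
Fuel_Consumption_normalized = zscore(fuel_consumption)
CO2_Emitted_normalized = zscore(co2_emitted)

print(Fuel_Consumption_normalized)
print(CO2_Emitted_normalized)
```

After this step both columns are on a common scale, so neither dominates the SVD simply because of its units.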

Since only two numerical variables are analysed in this dataset, the SVD produces only two singular values, as can be seen below in **Figure 4**.

<p align="center">
- <img src="images/Singular_values_plot.png" alt="Plot of Singular Values" width="600px">
+ <img src="images/05.data-analysis-PCA-plot2.png" alt="Plot of Singular Values" width="600px">
<br>
<strong>Figure 4:</strong> Line plot with two points representing two singular values.
</p>
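
To illustrate why a matrix with two numerical columns has exactly two singular values, here is a minimal sketch using `numpy.linalg.svd`. The matrix `X` is synthetic (random, loosely correlated columns) and only stands in for the two normalized variables; the row count and the correlation strength are arbitrary assumptions.

```python
import numpy as np

# Synthetic stand-in for the two normalized numerical columns (assumed shape: 100 rows x 2).
rng = np.random.default_rng(0)
col_a = rng.normal(size=100)
col_b = 0.9 * col_a + 0.3 * rng.normal(size=100)  # arbitrary correlation, for illustration
X = np.column_stack([col_a, col_b])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

# An n x 2 matrix has min(n, 2) = 2 singular values, returned in descending order.
singular_values = np.linalg.svd(X, compute_uv=False)

print(singular_values.shape)
print(singular_values)
```

The number of singular values is bounded by the smaller dimension of the matrix, which is why the plot can only contain two points.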

Because we have only 2 numerical variables, PCA yields exactly two principal components, each a linear combination of those two variables, and the data points are fully described by their coordinates along the first and second components. This can be seen below in **Figure 5**. With so few numerical variables, there is little dimensionality left to reduce, and dropping either component could discard valuable information. Hence, we will not use PCA and will proceed with regression analysis in the next section of our project.

<p align="center">
- <img src="images/First_second_PCA.png" alt="PCA Plot" width="600px">
+ <img src="images/05.data-analysis-PCA-plot3.png" alt="PCA Plot" width="600px">
<br>
<strong>Figure 5:</strong> First and second principal components.
</p>
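
The reasoning about information loss can be checked with a small sketch that computes PCA through the SVD. The data are again synthetic and the variable names (`fuel`, `co2`) are hypothetical; the point is only that keeping both components reproduces the data exactly, while keeping one component leaves a residual equal to the second singular value.

```python
import numpy as np

# Synthetic stand-in for two standardized numerical columns (not the project's data).
rng = np.random.default_rng(1)
fuel = rng.normal(size=200)
co2 = 0.9 * fuel + 0.3 * rng.normal(size=200)  # arbitrary correlation, for illustration
X = np.column_stack([fuel, co2])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T  # coordinates of each point along PC1 and PC2

# Share of total variance captured by each component.
explained_variance_ratio = s**2 / np.sum(s**2)

# Keeping both components loses nothing; keeping only PC1 drops the PC2 part.
X_full = scores @ Vt
X_rank1 = scores[:, :1] @ Vt[:1, :]
rank1_error = np.linalg.norm(X - X_rank1)  # Frobenius norm of what was discarded

print(explained_variance_ratio)
print(rank1_error, s[1])
```

With two variables the only possible reduction is from two dimensions to one, so whatever variance the second component carries is exactly what would be lost.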
