Ibmcloud #59

Open
wants to merge 8 commits into
base: master
from
3,160 changes: 2,598 additions & 562 deletions Notebooks/02_data_wrangling.ipynb

Large diffs are not rendered by default.

25 changes: 22 additions & 3 deletions Notebooks/03_exploratory_data_analysis.ipynb
@@ -3463,7 +3463,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**A: 1** Your answer here"
"**A: 1** Montana is the third-largest state in the list by area, but its population is low relative to the other states, so it is sparsely populated. California has the highest population, and New York hosts the most resorts. Montana is in the top five for total skiable area, while New York is not. New York does, however, top the list for night skiing area.\n",
"\n",
"Two resort-density features were added because they could help predict ticket prices in a state: resorts relative to state population and resorts relative to state size. High competition could possibly be suppressing ticket prices. Vermont ranks high in both of these new features.\n",
"\n",
"Because the dataset has many dimensions, which makes it hard to analyse directly, principal component analysis (PCA) was used to reduce it to fewer dimensions that are better suited to modelling. The data was first scaled, the PCA transformation was fitted to the scaled data, and the transformation was then applied to obtain the derived features. The first two principal components were plotted on a scatterplot, with the points color-coded by ticket price.\n",
"\n",
"The plot showed no obvious pattern in price, which tempts us to treat all states the same in our analysis. Two states stood out as outliers on both components: New Hampshire and Vermont. Examining the two PCA components showed that the second was heavily influenced by resorts_per_100kcapita and resorts_per_100ksq_mile (0.662458 and 0.637691 respectively). Both outlier states are more than three standard deviations from the mean for these two features.\n",
"\n",
"Next, after analysing the state-level data, the analysis moved to the resort level. State data was combined with resort data to show how large a share of its state's resources, such as skiable area and population (market), each resort holds.\n",
"\n",
"A heatmap showed that the ratio of a resort's night skiing area to the state's total night skiing area was the feature most correlated with ticket price. This suggests that a resort holding a greater share of its state's night skiing capacity may be able to charge more for tickets. Other features that correlated well with ticket price were Runs and total_chairs, and vertical drop was also positively correlated with price.\n",
"\n",
"After reviewing the correlations with ticket price, scatterplots were made for each feature with ticket price on the y-axis to get a clearer view of these relationships. The scatterplots showed a strong correlation between ticket price and vertical drop. Other features with a visible relationship to price included fastQuads, Runs, and total_chairs.\n",
"\n",
"Ticket price showed quite a lot of variance at low resorts_per_100kcapita values, but as the value rose, ticket price rose as well. One possible explanation is that the more resorts there are in an area, the more popular that area is for skiing. This is just speculation, of course.\n",
"\n",
"The final step of the analysis was to visualize the relationship between the chairs-to-runs ratio and ticket price for the resorts. The relationship appeared negative, although the correlation was not very strong: the fewer chairs a resort had per run, the higher its ticket price. This does not necessarily mean that fewer chairs generate more revenue. Tickets could be priced higher because chairs are scarce, while fewer customers are able to use them. Data on the number of customers per year would have been useful here.\n",
"\n",
"\n",
"\n"
]
},
{
@@ -3932,7 +3951,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -3946,7 +3965,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
"version": "3.9.7"
},
"toc": {
"base_numbering": 1,