-
Notifications
You must be signed in to change notification settings - Fork 1
/
DvirBlanderCoreyKozlovskiStockMarketSuccess.Rmd
402 lines (328 loc) · 26.8 KB
/
DvirBlanderCoreyKozlovskiStockMarketSuccess.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
---
title: "StockMarketSuccess"
author: "Dvir Blander and Corey Kozlovski"
date: "4/26/2019"
output: pdf_document
---
## R Markdown
## Goal and Description of Project
Our goal for this project is to be able to predict stocks by combining four methods of forecasting and weighing each one based on their individual successes in order to create one method that can accurately predict each stock individually. More importantly, we want to learn and understand new methods of analyzing data and being able to apply them in real world situations. Specifically, we will look at the stock of “Disney”, “Tesla”, “Coca Cola” and based on our analysis, we will determine the future trends of these stocks. We will be using Time Series, Neural Networks, Linear Regression, and lastly our own qualitative analysis of current events in order to come to an aggregate conclusion for each stock.
## Importing All the Packages and loading all the Libraries
```{r }
library(quantmod)
library(TimeWarp)
library(xts)
library(forecast)
library(zoo)
library(lubridate)
library(neuralnet)
library(BBmisc)
```
Getting the Tesla Data from the quantmod library and a plot of the closing values.
```{r }
getSymbols(Symbols = "TSLA", auto.assign = TRUE)
plot(TSLA$TSLA.Close)
```
Getting the Coca Cola Data from the quantmod library and a plot of the closing values
```{r }
getSymbols(Symbols = "KO", auto.assign = TRUE)
plot(KO$KO.Close)
```
Getting the Disney Data from the quantmod library and a plot of the closing values.
```{r }
getSymbols(Symbols = "DIS", auto.assign = TRUE)
plot(DIS$DIS.Close)
```
## Description for Current Event Analysis
Lastly, we will be using a more qualitative form of analysis through the means of current events. This analysis is very different than the other three, but we think it will bring very useful insight and context to our predictions from the other models. As this is a more general analysis, it will be most useful in predicting overall trends in the company’s future in regards to real-life situations. Some drawbacks include being too reactionary to current events, as that would impact our predictions greatly, and furthermore this method isn't scientifically based as the other three are, but despite these shortcomings, we believe the context it provides will be essential to our predictions.
Current events for Tesla:
When it comes to Tesla and News, these two terms almost come hand in hand. Be it from Elon Musk’s Twitter antics, their full line-up of S-3-X-Y cars, and announcements of a new semi-truck and sports car, it can be very hard to use current events in predicting Tesla’s future. They have some amazing developments, including battery recycling and releasing new cars recently that are more budget friendly, but also face some sharp downfalls. These include potential SEC violations that Musk faces due to his antics, a slowing down of car manufacturing for Q1 of 2019, and having the electric car as a whole not fully developed and realized hurts them as well. Overall, it is hard to predict the future of Tesla based on current events, but within the context of society and the growth of electric vehicles, it can be safe to assume Tesla will grow.
Current events for Coca Cola:
The current events for Coca Cola are as follows. There is going to be advertising for Coca Cola in the newest Star Wars movie, which may bring significant revenue for the company. It is so a recession-proof business that has a healthy dividend, source: https://www.fool.com/investing/2019/04/14/3-reasons-coca-cola-is-a-buy.aspx. Additionally, it is a product that many people want even though people may try for healthier alternatives. Thus, expect continuous growth from Coca Cola.
Current ecents for Disney:
The current events for Disney are as follows. The company is releasing a streaming service called Disney Plus. It is going to be cheaper than Netflix but it is not expected to create much of a dent into Netflix’s market share but, it should create some slow growth for the company. Source:https://www.cnbc.com/2019/04/15/what-a-1000-dollar-investment-in-disney-10-years-ago-would-be-worth-now.html
## Description for Time Series
Starting with Time Series, more specifically through the Holt-Winters method, we understand that it is good at seeing long term trends overall and only relies on historical data, which is what we have through the means of the “quantmod” library in R. Some problems in relation to Holt-Winters Time Series is that historical data may not be particularly useful in forecasting future stocks and that it doesn’t take current events into account. To do this, we will be using the “forecast” library in conjunction with the native Holt-Winters Time Series method in R and analyzing the trend it predicts from the data.
Here is the Holt-Winters code for the Tesla dataset. First, we create the HoltWinters data set shown below. Here is a plot of the graph predicted by HoltWinters. This is the code that has HoltWinters predict what the future of the TSLA stock could be.
```{r }
TSLA.pred <- HoltWinters(TSLA$TSLA.Close, beta = FALSE, gamma = FALSE)
plot(TSLA.pred)
TSLA.pred2 <- forecast:::forecast.HoltWinters(TSLA.pred, h = 50)
TSLA.pred2
forecast:::plot.forecast(TSLA.pred2, xlab = "Number of Days Recorded with Quantmod", ylab = "Stock Price", main = "HoltWinters Tesla")
```
This shows the expected range in stock for the next 50 days.
Here is the Holt-Winters code for the Coca Cola dataset. First, we create the HoltWinters data set shown below. Here is a plot of the graph predicted by HoltWinters. This is the code that has HoltWinters predict what the future of the TSLA stock could be.
```{r }
KO.pred <- HoltWinters(KO$KO.Close, beta = FALSE, gamma = FALSE)
plot(KO.pred)
KO.pred2 <- forecast:::forecast.HoltWinters(KO.pred, h = 50)
KO.pred2
forecast:::plot.forecast(KO.pred2, xlab = "Number of Days Recorded with Quantmod", ylab = "Stock Price", main = "HoltWinters Coca Cola")
```
This shows the expected range in stock for the next 50 days.
Here is the Holt-Winters code for the Disney dataset. First, we create the HoltWinters data set shown below. Here is a plot of the graph predicted by HoltWinters. This is the code that has HoltWinters predict what the future of the TSLA stock could be.
```{r }
DIS.pred <- HoltWinters(DIS$DIS.Close, beta = FALSE, gamma = FALSE)
plot(DIS.pred)
DIS.pred2 <- forecast:::forecast.HoltWinters(DIS.pred, h = 50)
DIS.pred2
forecast:::plot.forecast(DIS.pred2, xlab = "Number of Days Recorded with Quantmod", ylab = "Stock Price", main = "HoltWinters Disney")
```
This shows the expected range in stock for the next 50 days.
## Description for Neural Networks
Moving forward with Neural Networks, we will be using the “neuralnet”, ”xts”, ”zoo”, ”lubridate”, ”BBmisc” libraries in r. We understand that it is beneficial for larger datasets and that it can be more efficient in terms of time to be run. However, our datasets are not in the size of millions and it is difficult to build and understand a neural network algorithm. Thus, it is to be expected that this method will be weighed less than other methods.
Here is the code for Tesla. We first start by creating a new dataframe, as the values we are using to learn are different than the other methods. We also have to filter the dataset, as seen in the for loop.
```{r }
TSLA.df <- data.frame("Open" = TSLA$TSLA.Open, "Close" = TSLA$TSLA.Close, "Year"=as.numeric(format(index(TSLA), "%Y")),
"Month"=as.numeric(format(index(TSLA), "%m")), "Day"=as.numeric(format(index(TSLA), "%d")),
"Is_Month_Start" = FALSE, "Is_Month_End" = FALSE, "Is_Year_End" = FALSE,
"Is_Year_Start" = FALSE, "Is_Quarter_Start" = FALSE, "Is_Quarter_End" = FALSE,
"Quarters"=quarters(index(TSLA)))
for (i in 1:nrow(TSLA.df)){
date <- as.POSIXlt(paste(TSLA.df[i,]$Year, "-", TSLA.df[i,]$Month, "-", TSLA.df[i,]$Day, sep=""))
#Determine if it is beginning/end of the month and change dataset accordingly
if (TSLA.df[i,]$Day <= 7){ #7 is the first ~25% of the month
TSLA.df[i,]$Is_Month_Start = TRUE
} else if (TSLA.df[i,]$Day > 24){ #24 is the last ~25% of the month
TSLA.df[i,]$Is_Month_End = TRUE
}
#Determine if it is the beginning/end of the year and change dataset accordingly
if (TSLA.df[i,]$Quarters == 'Q1'){ #Q1 is the first ~25% of the year
TSLA.df[i,]$Is_Year_Start = TRUE
} else if (TSLA.df[i,]$Quarters == 'Q4'){ #Q4 is the last ~25% of the year
TSLA.df[i,]$Is_Year_End = TRUE
}
#Determine if it is the beginning of the quarter and change dataset accordingly
if (yday(date) <= 22){ #days 1-22 is the first ~25% of Q1
TSLA.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 91 & yday(date) <= 113){ #days 91-113 is the first ~25% of Q2
TSLA.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 182 & yday(date) <=204){ #days 182-204 is the first ~25% of Q3
TSLA.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 273 & yday(date) <=296){ #days 273-296 is the first ~25% of Q4
TSLA.df[i,]$Is_Quarter_Start = TRUE
}
#Determine if it is the end of the quarter and change dataset accordingly
if (yday(date) <= 91 & yday(date) >= 69){ #days 69-91 is the last ~25% of Q1
TSLA.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 182 & yday(date) >= 160){ #days 160-182 is the last ~25% of Q2
TSLA.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 273 & yday(date) >= 251){ #days 251-273 is the last ~25% of Q3
TSLA.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 366 & yday(date) >= 343){ #days 343-365 is the last ~25% of Q4
TSLA.df[i,]$Is_Quarter_End = TRUE
}
}
TSLA.df <- TSLA.df[,1:11]#Remove the last column, as it was only used to clean data
#change true/false values into 1/0s to prepare for normalization.
cols <- sapply(TSLA.df, is.logical)
TSLA.df[,cols] <- lapply(TSLA.df[,cols], as.numeric)
```
To use neural networks, we have to normalize the data set. We used the BBmsic library to help us with this.
```{r}
TSLA.df <- BBmisc::normalize(TSLA.df)
```
Now, we make a testing and training set to evaluate the performance of neural networks. We decided to use 75% of the original data set.
```{r}
TSLA.train <- TSLA.df[1:(.75 * nrow(TSLA.df)),]
TSLA.test <- TSLA.df[(.75 * nrow(TSLA.df)):nrow(TSLA.df),]
```
We now create the neural network and display it.
```{r}
TSLA.NN <- neuralnet(TSLA.Close ~ TSLA.Open + Year + Month + Day + Is_Month_Start + Is_Month_End + Is_Year_End +
Is_Year_Start + Is_Quarter_Start + Is_Quarter_End
, data = TSLA.train, linear.output=TRUE, hidden=c(5,3,3))
plot(TSLA.NN)
```
Using the created neural network, we predict the values and compare them to the actual values. The red line indicates where the training data ends and where the prediction and actual values, respectively, begin.
```{r}
TSLA.pred <- predict(TSLA.NN, TSLA.test)
TSLA.pred.total <- c(TSLA.train$TSLA.Close, TSLA.pred)
plot(TSLA.pred.total, xlab="Number of Days recorded with Quantmod", ylab="Predicted Stock Value Normalized", main="Normalized Stock Predictions of TSLA")
abline(v = (.75 * nrow(TSLA.df)), col='red')
plot(TSLA.df$TSLA.Close, xlab="Number of Days recorded with Quantmod", ylab="Actual Stock Value Normalized", main="Actual Stock Values of TSLA")
abline(v = (.75 * nrow(TSLA.df)), col='red')
```
Comparing both graphs, we can see that our predictor did a pretty poor job of predicting the values for TSLA. It predicted much less volatility than what actually happened, and did not predict it to go nearly as high as it did based on the normalized scale. The prediction maxed out at just above 1, but the actual values went more towards 1.5. Other than that, it did manage to predict that there would be 3 large "dips", as both the predicted graph and the actual graph show it. I would not recommend using this method, with the parameters we chose, to predict, but it did do well in seeing dips.
Here is the code for Coca Cola. We first start by creating a new dataframe, as the values we are using to learn are different than the other methods. We also have to filter the dataset, as seen in the for loop.
```{r }
KO.df <- data.frame("Open" = KO$KO.Open, "Close" = KO$KO.Close, "Year"=as.numeric(format(index(KO), "%Y")),
"Month"=as.numeric(format(index(KO), "%m")), "Day"=as.numeric(format(index(KO), "%d")),
"Is_Month_Start" = FALSE, "Is_Month_End" = FALSE, "Is_Year_End" = FALSE,
"Is_Year_Start" = FALSE, "Is_Quarter_Start" = FALSE, "Is_Quarter_End" = FALSE,
"Quarters"=quarters(index(KO)))
for (i in 1:nrow(KO.df)){
date <- as.POSIXlt(paste(KO.df[i,]$Year, "-", KO.df[i,]$Month, "-", KO.df[i,]$Day, sep=""))
#Determine if it is beginning/end of the month and change dataset accordingly
if (KO.df[i,]$Day <= 7){ #7 is the first ~25% of the month
KO.df[i,]$Is_Month_Start = TRUE
} else if (KO.df[i,]$Day > 24){ #24 is the last ~25% of the month
KO.df[i,]$Is_Month_End = TRUE
}
#Determine if it is the beginning/end of the year and change dataset accordingly
if (KO.df[i,]$Quarters == 'Q1'){ #Q1 is the first ~25% of the year
KO.df[i,]$Is_Year_Start = TRUE
} else if (KO.df[i,]$Quarters == 'Q4'){ #Q4 is the last ~25% of the year
KO.df[i,]$Is_Year_End = TRUE
}
#Determine if it is the beginning of the quarter and change dataset accordingly
if (yday(date) <= 22){ #days 1-22 is the first ~25% of Q1
KO.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 91 & yday(date) <= 113){ #days 91-113 is the first ~25% of Q2
KO.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 182 & yday(date) <=204){ #days 182-204 is the first ~25% of Q3
KO.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 273 & yday(date) <=296){ #days 273-296 is the first ~25% of Q4
KO.df[i,]$Is_Quarter_Start = TRUE
}
#Determine if it is the end of the quarter and change dataset accordingly
if (yday(date) <= 91 & yday(date) >= 69){ #days 69-91 is the last ~25% of Q1
KO.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 182 & yday(date) >= 160){ #days 160-182 is the last ~25% of Q2
KO.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 273 & yday(date) >= 251){ #days 251-273 is the last ~25% of Q3
KO.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 366 & yday(date) >= 343){ #days 343-365 is the last ~25% of Q4
KO.df[i,]$Is_Quarter_End = TRUE
}
}
KO.df <- KO.df[,1:11]#Remove the last column, it was only used to clean data
#change true/false values into 1/0s
cols <- sapply(KO.df, is.logical)
KO.df[,cols] <- lapply(KO.df[,cols], as.numeric)
```
To use neural networks, we have to normalize the data set. We used the BBmsic library to help us with this.
```{r}
KO.df <- BBmisc::normalize(KO.df)
```
Now, we make a testing and training set to evaluate the performance of neural networks. We decided to use 75% of the original data set.
```{r}
KO.train <- KO.df[1:(.75 * nrow(KO.df)),] #75% of data set
KO.test <- KO.df[(.75 * nrow(KO.df)):nrow(KO.df),]
```
We now create the neural network and display it.
```{r}
KO.NN <- neuralnet(KO.Close ~ KO.Open + Year + Month + Day + Is_Month_Start + Is_Month_End + Is_Year_End +
Is_Year_Start + Is_Quarter_Start + Is_Quarter_End
, data = KO.train, linear.output=TRUE, hidden=c(5,3,3))
plot(KO.NN)
```
Using the created neural network, we predict the values and compare them to the actual values. The red line indicates where the training data ends and where the prediction and actual values, respectively, begin.
```{r}
KO.pred <- predict(KO.NN, KO.test)
KO.pred.total <- c(KO.train$KO.Close, KO.pred)
plot(KO.pred.total, xlab="Number of Days recorded with Quantmod", ylab="Predicted Stock Value Normalized", main="Normalized Stock Predictions of KO")
abline(v = (.75 * nrow(KO.df)), col='red')
plot(KO.df$KO.Close, xlab="Number of Days recorded with Quantmod", ylab="Actual Stock Value Normalized", main="Actual Stock Values of KO")
abline(v = (.75 * nrow(KO.df)), col='red')
```
Surprisingly, this model worked very well for Coca Cola. It managed to predict the 3 peaks fairly accurately and both dips as well. It is nearly exact overall but the only sigificant difference would be that the neural network was significantly less aggressive when predicting peaks. The actual values ended up going a bit higher than predicted, but they were very close. For this data set, I would recommend using this method.
Here is the code for Disney. We first start by creating a new dataframe, as the values we are using to learn are different than the other methods. We also have to filter the dataset, as seen in the for loop.
```{r }
DIS.df <- data.frame("Open" = DIS$DIS.Open, "Close" = DIS$DIS.Close, "Year"=as.numeric(format(index(DIS), "%Y")),
"Month"=as.numeric(format(index(DIS), "%m")), "Day"=as.numeric(format(index(DIS), "%d")),
"Is_Month_Start" = FALSE, "Is_Month_End" = FALSE, "Is_Year_End" = FALSE,
"Is_Year_Start" = FALSE, "Is_Quarter_Start" = FALSE, "Is_Quarter_End" = FALSE,
"Quarters"=quarters(index(DIS)))
for (i in 1:nrow(DIS.df)){
date <- as.POSIXlt(paste(DIS.df[i,]$Year, "-", DIS.df[i,]$Month, "-", DIS.df[i,]$Day, sep=""))
#Determine if it is beginning/end of the month and change dataset accordingly
if (DIS.df[i,]$Day <= 7){ #7 is the first ~25% of the month
DIS.df[i,]$Is_Month_Start = TRUE
} else if (DIS.df[i,]$Day > 24){ #24 is the last ~25% of the month
DIS.df[i,]$Is_Month_End = TRUE
}
#Determine if it is the beginning/end of the year and change dataset accordingly
if (DIS.df[i,]$Quarters == 'Q1'){ #Q1 is the first ~25% of the year
DIS.df[i,]$Is_Year_Start = TRUE
} else if (DIS.df[i,]$Quarters == 'Q4'){ #Q4 is the last ~25% of the year
DIS.df[i,]$Is_Year_End = TRUE
}
#Determine if it is the beginning of the quarter and change dataset accordingly
if (yday(date) <= 22){ #days 1-22 is the first ~25% of Q1
DIS.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 91 & yday(date) <= 113){ #days 91-113 is the first ~25% of Q2
DIS.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 182 & yday(date) <=204){ #days 182-204 is the first ~25% of Q3
DIS.df[i,]$Is_Quarter_Start = TRUE
} else if (yday(date) > 273 & yday(date) <=296){ #days 273-296 is the first ~25% of Q4
DIS.df[i,]$Is_Quarter_Start = TRUE
}
#Determine if it is the end of the quarter and change dataset accordingly
if (yday(date) <= 91 & yday(date) >= 69){ #days 69-91 is the last ~25% of Q1
DIS.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 182 & yday(date) >= 160){ #days 160-182 is the last ~25% of Q2
DIS.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 273 & yday(date) >= 251){ #days 251-273 is the last ~25% of Q3
DIS.df[i,]$Is_Quarter_End = TRUE
} else if (yday(date) <= 366 & yday(date) >= 343){ #days 343-365 is the last ~25% of Q4
DIS.df[i,]$Is_Quarter_End = TRUE
}
}
DIS.df <- DIS.df[,1:11]#Remove the last column, it was only used to clean data
#change true/false values into 1/0s
cols <- sapply(DIS.df, is.logical)
DIS.df[,cols] <- lapply(DIS.df[,cols], as.numeric)
```
To use neural networks, we have to normalize the data set. We used the BBmsic library to help us with this.
```{r}
DIS.df <- BBmisc::normalize(DIS.df)
```
Now, we make a testing and training set to evaluate the performance of neural networks. We decided to use 75% of the original data set.
```{r}
DIS.train <- DIS.df[1:(.75 * nrow(DIS.df)),] #75% of data set
DIS.test <- DIS.df[(.75 * nrow(DIS.df)):nrow(DIS.df),]
```
We now create the neural network and display it.
```{r}
DIS.NN <- neuralnet(DIS.Close ~ DIS.Open + Year + Month + Day + Is_Month_Start + Is_Month_End + Is_Year_End +
Is_Year_Start + Is_Quarter_Start + Is_Quarter_End
, data = DIS.train, linear.output=TRUE, hidden=c(5,3,3))
plot(DIS.NN)
```
Using the created neural network, we predict the values and compare them to the actual values. The red line indicates where the training data ends and where the prediction and actual values, respectively, begin.
```{r}
DIS.pred <- predict(DIS.NN, DIS.test)
DIS.pred.total <- c(DIS.train$DIS.Close, DIS.pred)
plot(DIS.pred.total, xlab="Number of Days recorded with Quantmod", ylab="Predicted Stock Value Normalized", main="Normalized Stock Predictions of DIS")
abline(v = (.75 * nrow(DIS.df)), col='red')
plot(DIS.df$DIS.Close, xlab="Number of Days recorded with Quantmod", ylab="Actual Stock Value Normalized", main="Actual Stock Values of DIS")
abline(v = (.75 * nrow(DIS.df)), col='red')
```
Again, this model performed exceedingly well! It was able to predict 3 peaks, 3 dips, and even manage to predict, towards the very end, the start of a peak. As with the last one, Neural Networks seem to be less aggressive with the predictions overall, as the actual data set reached and exceeded 2, but the predictions didn't quite get there. I would again recommend using this model for our overall stock market prediciton.
## Description for Linear Regression
Onward, we will be looking into the Linear Regression method of predicting the future values of our chosen stocks. The method in which this will be executed is by choosing the most recent trend in the data by looking at the plot for each graph and finding a linear regression of the graphs to make predictions. Some benefits for this method is that it will use all of the data and fit the data in the best possible way and the model will also favor newer data rather than older data. Moreover, a property in Calculus is that if one restricts the domain in any graph enough, the graph will show up as linear. This is why Linear Regression is a valid form of analyzinng the data. However, there is a risk in excluding necessary data to create the trend of the graph. Additionally, this is a pretty simplistic way of predicting future stocks. Thus, this may be weighed less than other options. The Timewarp library was used in addition to the quantmod library.
Here are the steps for Linear Regression in the Tesla dataset. First we restrict the domain of the dataset to only view the most recent trend of the data. We next use Chartseries with subset as the main argument. We then use a linear model on the same range of the chart in addition to plotting the chart.
```{r }
subset <- TSLA[1700:nrow(TSLA),]
chartSeries(subset, TA = NULL, theme = "white", up.col = "green", dn.col = "red")
indices = 1:nrow(subset)
model=lm(TSLA.Close~indices,data=subset)
abline(model$coefficients[1],model$coefficients[2])
```
Based on the graph above, Tesla seems to be a stock that is decreasing in value.
Here are the steps for Linear Regression in the Coca Cola dataset. First we restrict the domain of the dataset to only view the most recent trend of the data. We next use Chartseries with subset as the main argument. We then use a linear model on the same range of the chart in addition to plotting the chart.
```{r }
subset = KO[1500:nrow(KO),]
chartSeries(subset, TA = NULL, theme = "white", up.col = "green", dn.col = "red")
indices = 1:nrow(subset)
model=lm(KO.Close~indices,data=subset)
abline(model$coefficients[1],model$coefficients[2])
```
Based on the graph above, Coca Cola seems to be a stock that is increasing in value.
Here are the steps for Linear Regression in the Disney dataset. First we restrict the domain of the dataset to only view the most recent trend of the data. We next use Chartseries with subset as the main argument. We then use a linear model on the same range of the chart in addition to plotting the chart.
```{r }
subset <- DIS[2000:nrow(DIS),]
chartSeries(subset, TA = NULL, theme = "white", up.col = "green", dn.col = "red")
indices = 1:nrow(subset)
model=lm(DIS.Close~indices,data=subset)
abline(model$coefficients[1],model$coefficients[2])
```
Based on the graph above, Disney seems to be a stock that is increasing in value.
## Conclusion
Based on the pros and cons of each method of forecasting, we are going to weigh each model differently, placing higher weight on Time-Series and Current Events, and lower weights on Neural Networks and Linear Regression. The different weights for each model will be based on how we expect each model to contribute to our conclusion. Since our pros and cons for the Time Series and the Current Events has led us to believe more in those methods, they will have higher contribution than Neural Networks and Linear Regression. If we learn that our method is successful, we think it could be very useful in predicting the outcome of future stocks for a variety of other companies as well.
In terms of Tesla, we predicted that the current events would have an increase for the stock value. The Holt-winters analysis predicted a large range of values for the future, which creates a lot of uncertainty. The Linear regression analysis showed that the Tesla stock tended to decrease. The Neural Network analysis did not really fit the Tesla data and we wouldn’t feel safe using this, in a different context, to predict Tesla’s future. Thus, we are not sure what will happen with the Tesla stock in the future due to the conflicting analysis’ for Linear Regression and Current Events.
For Coca Cola, we found that in almost every test that it would end up growing. For Current Events, we found that everything that is happening for that company overall benefits them and Holt Winters Time Series we found that the range of values isn’t nearly as volatile as Tesla’s. These two tests we valued higher than the other two, but despite that, we found that Linear Regression also increased alongside having Neural Networks be very accurate in predictions. With that knowledge, if we could apply Neural Networks in a different context and predict it, we believe it would also see an increase. Overall, all of our tests tend to lead towards Coca Cola only growing from here.
Finally, the Disney stock should grow slowly based on the following results. The Current Event analysis had the stock increasing, the Linear Regression indicated in an increasing stock value, and the Holt-Winters had a smaller range of possible values for Disney. The Neural Networks was also very accurate in its analysis. With that knowledge, if we could apply Neural Networks in a different context and predict it, we believe it would also see an increase.
From this project, we noticed a few trends that were apparent. First of all, if Holt Winters performed well and had not that much uncertainty, we found that Neural Networks would perform very well. If it did, Neural Networks struggled to perform, as seen in the tests ran on Tesla. In addition, we found that for all 3 companies, current events would always lead to us believing in an increase to the stock value. This could be very dependent on our biases, in the sense of how we value each event, in addition to news source articles that we received the information from. Overall, it was very interesting applying and learning new ways to predict real-life situations.