Week 2 #10

Open: wants to merge 5 commits into base: master

4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
128 changes: 128 additions & 0 deletions Data/IODS 2022 Exercise set 2_JV08112022.r
@@ -0,0 +1,128 @@
####################################
### Jussi Pekka Vehviläinen 08112022
### IODS 2022 Exercise set 2
####################################


# LIBRARIES
library(ggplot2) # aes()
library(GGally)  # ggpairs(), wrap()

# Own functions

#########################################
### MAIN CODE
#########################################

### Data wrangling
# PART 1
# SET UP WORKING DIR AND BUILD UP DATA FOLDER
setwd("D://IODS-project")
dir.create(paste0(getwd(),"/Data"))
setwd(paste0(getwd(),"/Data"))

# PART 2
# Read the tab-separated .txt file, downloaded beforehand to the Data folder, into R
learning_2014<-read.table("learning2014.txt", header = TRUE, sep = "\t")

# The table should have 183 rows and 60 columns, with integer values in every column
# except the last one, which holds gender as a character value.

# PART 3
# Subset the data set to contain the columns gender, age, attitude, deep, stra, surf and points.
# Deep, Stra and Surf are the row means of the question columns below, excluding 0 values:
# Deep: c("D03","D11","D19","D27","D07","D14","D22","D30","D06","D15","D23","D31")
# Stra: c("ST01","ST09","ST17","ST25","ST04","ST12","ST20","ST28")
# Surf: c("SU02","SU10","SU18","SU26","SU05","SU13","SU21","SU29","SU08","SU16","SU24","SU32")

# Analysis dataset: keep observations with non-zero Points and treat the remaining 0 answers as missing (NA)
analy_dataset<-subset(learning_2014, as.numeric(Points)!=0)
analy_dataset[analy_dataset==0]<-NA

# New DataFrame
df<-data.frame(Gender=analy_dataset$gender,Age=analy_dataset$Age,Attitude=NA, Deep=NA,Stra=NA, Surf=NA, Points=analy_dataset$Points)

# Add Deep by taking the mean
deep <- c("D03","D11","D19","D27","D07","D14","D22","D30","D06","D15","D23","D31")
df$Deep<-rowMeans(analy_dataset[,colnames(analy_dataset) %in% deep], na.rm=TRUE)

# Add Stra by taking the mean
stra <- c("ST01","ST09","ST17","ST25","ST04","ST12","ST20","ST28")
df$Stra<-rowMeans(analy_dataset[,colnames(analy_dataset) %in% stra], na.rm=TRUE)

# Add Surf by taking the mean
surf <- c("SU02","SU10","SU18","SU26","SU05","SU13","SU21","SU29","SU08","SU16","SU24","SU32")
df$Surf<-rowMeans(analy_dataset[,colnames(analy_dataset) %in% surf], na.rm=TRUE)

# Add Attitude by taking the mean
attitude <- c("Da","Db","Dc","Dd","De","Df","Dg","Dh","Di","Dj")
df$Attitude<-rowMeans(analy_dataset[,colnames(analy_dataset) %in% attitude], na.rm=TRUE)
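
# A minimal sketch on hypothetical toy data (not part of the exercise): after 0 answers
# are recoded to NA, rowMeans(..., na.rm = TRUE) averages only the remaining answers,
# which is how the Deep, Stra, Surf and Attitude means above exclude 0.
toy_answers <- data.frame(D03 = c(4, 2), D11 = c(0, 3), D19 = c(5, 1))
toy_answers[toy_answers == 0] <- NA
toy_means <- rowMeans(toy_answers, na.rm = TRUE)
print(toy_means)  # 4.5 for the first row (its 0 is excluded), 2 for the second row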

# PART 4
# Change working dir
setwd("D://IODS-project")
# Write results to .csv format file
write.csv2(df,"Data/learning2014.csv")
# Read the saved .csv file. header = TRUE means that the file has headers, sep = ";" means that the values
# are separated by ";" and dec = "," means that "," is used as the decimal mark in the file
learning_2014.2<-read.table("Data/learning2014.csv", header = TRUE, sep = ";", dec=",")
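
# A sketch on hypothetical toy data of the write.csv2()/read.table() round trip used above:
# write.csv2() writes ";" as the separator and "," as the decimal mark, and by default it also
# writes the row names as an unnamed first column, which read.table() renames to "X".
# That extra column is why the analysis below skips the first column(s) when indexing.
toy_path <- tempfile(fileext = ".csv")
toy_df <- data.frame(Gender = c("F", "M"), Attitude = c(3.7, 3.1))
write.csv2(toy_df, toy_path)
toy_back <- read.table(toy_path, header = TRUE, sep = ";", dec = ",")
print(names(toy_back))  # "X" "Gender" "Attitude"
unlink(toy_path)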

#########################################
### Analysis
#########################################

# PART 1
# Read the file into R
# The data set contains summary results of the survey of the course
# 'Johdatus yhteiskuntatilastotieteeseen, syksy 2014' (Introduction to Social Statistics, autumn 2014).
# There should be 7 variables (gender, age, attitude, deep, stra, surf and points) and 166 observations;
# read.table() also keeps the unnamed row-name column written by write.csv2() as a column called "X".
# Gender: M (Male), F (Female)
# Age: Age (in years) derived from the date of birth
# Attitude: Global attitude toward statistics. Mean of original variables (~Da+Db+Dc+Dd+De+Df+Dg+Dh+Di+Dj)
# Deep: Deep approach. Mean of original variables (~d_sm+d_ri+d_ue)
# Stra: Strategic approach. Mean of original variables (~st_os+st_tm)
# Surf: Surface approach. Mean of original variables (~su_lp+su_um+su_sb)
# Points: Exam points
# More information about the variables: http://www.helsinki.fi/~kvehkala/JYTmooc/JYTOPKYS2-meta.txt

learning_2014.2<-as.data.frame(read.table("Data/learning2014.csv", header = TRUE, sep = ";", dec=","))
# Dimensions of DataFrame
dim(learning_2014.2)

# PART 2
# Summary
print(summary(learning_2014.2[,2:ncol(learning_2014.2)]))

pairs(learning_2014.2[-(1:2)])

# Scatterplot matrix
p <- ggpairs(learning_2014.2[-(1:2)], mapping = aes(), lower = list(combo = wrap("facethist", bins = 20)))
print(p)

# The scatterplot matrix describes the relationships between the variables. It is constructed from the
# data frame with the ggpairs() function (GGally package, built on ggplot2).
# In addition to the pairwise relationships, the plot shows the distribution of each variable and gives
# the correlation coefficients, with asterisks showing the level of significance.
# The most promising relationships seem to be between Attitude and Points, and between Surf and Deep.
# There also seems to be some kind of relationship between Stra and Surf.
# Overall, the matrix gives a good overview of the data and a starting point for studying the relationships between the variables further.
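
# A minimal sketch with made-up numbers (not the course data): cor.test() gives a correlation
# coefficient and its p-value. The asterisks in the ggpairs() panels are assumed here to follow
# the usual R convention (*** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1).
toy_x <- c(1, 2, 3, 4, 5)
toy_y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
toy_ct <- cor.test(toy_x, toy_y)
toy_stars <- as.character(cut(toy_ct$p.value,
                              breaks = c(-Inf, 0.001, 0.01, 0.05, 0.1, Inf),
                              labels = c("***", "**", "*", ".", ""),
                              right = FALSE))
print(c(r = round(unname(toy_ct$estimate), 3), stars = toy_stars))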


# PART 3 and PART 4
# Create a regression model with multiple explanatory variables
my_model2 <- lm(Points ~ Attitude + Stra + Surf, data = learning_2014.2)
summary(my_model2)
# The model summary shows that only Attitude seems to be significantly associated with Points.
# The output shows the model residuals and the Coefficients table (estimate, std. error, t-value and
# p-value for each explanatory variable in the model of Points).
# The significance of each variable can be read from its p-value (last column of the Coefficients table);
# the significance-level thresholds are given under the table.
# The summary also gives an F-test p-value for the whole model. It is significant because Attitude is in
# the model, but Stra and Surf have no significant relationship with Points and add little to the model.
# In the next step, let's remove the other variables from the model and see what happens.
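
# A minimal sketch, using R's built-in mtcars data as a stand-in for the course data: the
# coefficient p-values are the "Pr(>|t|)" column of summary()$coefficients, and the whole-model
# p-value printed at the bottom of summary() comes from the F-statistic via pf().
toy_fit <- lm(mpg ~ wt + qsec + drat, data = mtcars)
toy_summary <- summary(toy_fit)
toy_coef_p <- toy_summary$coefficients[, "Pr(>|t|)"]
toy_f <- toy_summary$fstatistic
toy_model_p <- unname(pf(toy_f["value"], toy_f["numdf"], toy_f["dendf"], lower.tail = FALSE))
print(toy_coef_p)
print(toy_model_p)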

# Only Attitude seems to be significant, so let's fit the model again with Attitude as the only explanatory variable
my_model3 <- lm(Points ~ Attitude, data = learning_2014.2)
summary(my_model3)
# Now the p-value is significant both for Attitude and for the whole model, and no non-significant variables remain.
# Multiple R-squared is the proportion of the variation in the dependent variable that the explanatory
# variables explain. In the model with three explanatory variables, 20.74 % of the variation in Points
# is explained, while Attitude alone explains 19.06 %. Since R-squared never decreases when variables
# are added, this small difference shows that the effect of Stra and Surf on Points is minimal.
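
# A worked sketch of the R-squared definition, on the built-in mtcars data rather than the course
# data: R^2 = 1 - SS_residual / SS_total, and adding variables to a nested model never lowers it,
# which is why the Adjusted R-squared in summary() penalises the number of explanatory variables.
toy_small <- lm(mpg ~ wt, data = mtcars)
toy_large <- lm(mpg ~ wt + qsec + drat, data = mtcars)
toy_ss_res <- sum(residuals(toy_small)^2)
toy_ss_tot <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
toy_r2 <- 1 - toy_ss_res / toy_ss_tot
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), here with p = 1 explanatory variable
toy_n <- nrow(mtcars)
toy_adj_r2 <- 1 - (1 - toy_r2) * (toy_n - 1) / (toy_n - 1 - 1)
print(c(manual = toy_r2, lm = summary(toy_small)$r.squared))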

# PART 5
# Study the model further with diagnostic plots: Residuals vs Fitted (1), Normal Q-Q (2) and Residuals vs Leverage (5)
plot(my_model3, which=c(1,2,5))

# The Residuals vs Leverage plot shows which observations, and how many, are influential. In our case the data look good: no point lies outside the Cook's distance lines.
# The Residuals vs Fitted plot also looks good: the residuals are spread evenly around zero across the fitted values, with no clear pattern.
# The Q-Q plot checks whether the residuals are approximately normally distributed. If the points depart from the line at the ends, the residuals may not be normally distributed.
# In this case the Q-Q plot looks good, but not perfect.
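
# A minimal sketch, on the built-in mtcars data as a stand-in: cooks.distance() returns the values
# behind the Residuals vs Leverage plot, whose dashed contours plot.lm() draws at Cook's distance
# 0.5 and 1 by default, and shapiro.test() is one numeric companion to the Q-Q plot.
toy_diag_fit <- lm(mpg ~ wt, data = mtcars)
toy_cd <- cooks.distance(toy_diag_fit)
toy_influential <- which(toy_cd > 1)  # observations beyond the outer (Cook's distance 1) contour
print(toy_influential)
print(shapiro.test(residuals(toy_diag_fit))$p.value)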
167 changes: 167 additions & 0 deletions Data/learning2014.csv
@@ -0,0 +1,167 @@
"";"Gender";"Age";"Attitude";"Deep";"Stra";"Surf";"Points"
"1";"F";53;3,7;3,5;3,375;2,58333333333333;25
"2";"M";55;3,1;3,125;2,75;3,16666666666667;12
"3";"F";49;2,5;3,25;3,625;2,25;24
"4";"M";53;3,5;3,375;3,125;2,25;10
"5";"M";49;3,7;3,75;3,625;2,83333333333333;22
"6";"F";38;3,8;4,625;3,625;2,41666666666667;21
"7";"M";50;3,5;3,625;2,25;1,91666666666667;21
"8";"F";37;2,9;3,25;4;2,83333333333333;31
"9";"M";37;3,8;4,375;4,25;2,16666666666667;24
"10";"F";42;2,1;4;3,5;3;26
"11";"M";37;3,9;3,625;3,625;2,66666666666667;31
"12";"F";34;3,8;4;4,75;2,41666666666667;31
"13";"F";34;2,4;4,25;3,625;2,25;23
"14";"F";34;3;2,875;3,5;2,75;25
"15";"M";35;2,6;4;1,75;2,33333333333333;21
"16";"F";33;4,1;3,875;3,875;2,33333333333333;31
"17";"F";32;2,6;3,75;1,375;2,91666666666667;20
"18";"F";44;2,6;3,75;3,25;2,5;22
"19";"M";29;1,7;4,125;3;3,75;9
"20";"F";30;2,7;3,875;3,75;2,75;24
"21";"M";27;3,9;3,875;2,625;2,33333333333333;28
"22";"M";29;3,4;4;2,375;2,41666666666667;30
"23";"F";31;2,7;4;3,625;3;24
"24";"F";37;2,3;3,5;2,75;2,41666666666667;9
"25";"F";26;3,7;3,75;1,75;2,83333333333333;26
"26";"F";26;4,4;4,625;3,25;3,16666666666667;32
"27";"M";30;4,1;3,875;4;3;32
"28";"F";33;3,7;3,75;3,625;2;33
"29";"F";33;2,5;3,25;2,875;3,5;29
"30";"M";28;3;3,75;3;3,75;30
"31";"M";26;3,4;4,875;1,625;2,5;19
"32";"F";27;3,2;3,375;3,25;2,08333333333333;23
"33";"F";25;2;2,625;3,5;2,41666666666667;19
"34";"F";31;2,4;3,75;3;2,58333333333333;12
"35";"M";20;4,2;4,5;3,25;1,58333333333333;10
"36";"F";39;1,6;3,875;1,875;2,83333333333333;11
"37";"M";38;3,1;3,875;4,375;1,83333333333333;20
"38";"M";24;3,8;2,875;3,625;2,41666666666667;26
"39";"M";26;3,8;2,125;2,5;3,25;31
"40";"M";25;3,3;3,125;1,25;3,41666666666667;20
"41";"F";30;1,7;4,375;4;3,41666666666667;23
"42";"F";25;2,5;2,75;3;3,16666666666667;12
"43";"M";30;3,2;3,125;2,5;3,5;24
"44";"F";48;3,5;4;4,875;2,66666666666667;17
"45";"F";24;3,2;3,5;5;2,41666666666667;29
"46";"F";40;4,2;4,5;4,375;3,58333333333333;23
"47";"M";25;3,1;3,25;3,25;2,08333333333333;28
"48";"F";23;3,9;3,75;4;3,75;31
"49";"F";25;1,9;4;3,125;2,91666666666667;23
"50";"F";23;2,1;3,125;2,5;2,91666666666667;25
"51";"M";27;2,5;4;3,125;2,41666666666667;18
"52";"M";25;3,2;3,625;3,25;3;19
"53";"M";23;3,2;2,75;2,125;3,41666666666667;22
"54";"F";23;2,6;4;2,75;2,91666666666667;25
"55";"F";23;2,3;2,875;2,375;3,25;21
"56";"F";45;3,8;2,375;3,125;3,25;9
"57";"F";22;2,8;4,125;4;2,33333333333333;28
"58";"F";23;3,3;3;4;3,25;25
"59";"M";21;4,8;3,25;2,25;2,5;29
"60";"M";21;4;4,125;3,25;1,75;33
"61";"F";21;4;4,125;3,625;2,25;33
"62";"F";21;4,7;3;3,625;2,08333333333333;25
"63";"F";26;2,3;3,375;2,5;2,83333333333333;18
"64";"F";25;3,1;4,625;1,875;2,83333333333333;22
"65";"F";26;2,7;3;2;2,41666666666667;17
"66";"M";21;4,1;3,375;1,875;2,25;25
"67";"F";23;3,4;3,5;4;2,83333333333333;28
"68";"F";22;2,5;3,625;2,875;2,25;22
"69";"F";22;2,1;1,375;3,875;1,83333333333333;26
"70";"F";22;1,4;3,125;2,5;2,91666666666667;11
"71";"F";23;1,9;4,25;2,75;2,91666666666667;29
"72";"M";22;3,7;4,25;4,5;2,08333333333333;22
"73";"M";23;3,2;4,75;3,375;2,33333333333333;21
"74";"M";24;2,8;3,625;2,625;2,41666666666667;28
"75";"F";22;4,1;3,375;4,125;2,75;33
"76";"F";23;2,5;4,125;2,625;3,25;16
"77";"M";22;2,8;3,75;2,25;1,75;31
"78";"M";20;3,8;3,875;2,75;2,58333333333333;22
"79";"M";22;3,1;3,25;3;3,33333333333333;31
"80";"M";21;3,5;4,625;1,625;2,83333333333333;23
"81";"F";22;3,6;4,25;1,875;2,5;26
"82";"F";23;2,6;4,125;3,375;2,41666666666667;12
"83";"M";21;4,4;4,125;3,75;2,41666666666667;26
"84";"M";22;4,5;4,25;2,125;2,58333333333333;31
"85";"M";29;3,2;3,5;2,375;3;19
"86";"F";29;3,9;3,125;2,75;2;30
"87";"F";21;2,5;3;3,125;3,41666666666667;12
"88";"M";28;3,3;3,875;3,5;2,83333333333333;17
"89";"F";21;3,3;4,375;2,625;2,25;18
"90";"F";30;3;3,75;3,375;2,75;19
"91";"F";21;2,9;3,75;2,25;3,91666666666667;21
"92";"M";23;3,3;3,875;3;2,33333333333333;24
"93";"F";21;3,3;3,875;4;2,75;28
"94";"F";21;3,5;3,5;3,5;2,75;17
"95";"F";20;3,6;3,75;2,625;2,91666666666667;18
"96";"M";22;3,7;4,25;2,5;2,08333333333333;17
"97";"M";21;4,2;3,625;3,75;3,66666666666667;23
"98";"M";21;3,2;4,125;3,625;2,83333333333333;26
"99";"F";20;5;4,25;4,125;3,41666666666667;28
"100";"M";22;4,7;4,25;4,375;1,58333333333333;31
"101";"F";20;3,6;4,75;2,625;2,91666666666667;27
"102";"F";20;3,6;3,75;4;3;25
"103";"M";24;2,9;3,25;2,75;2,91666666666667;23
"104";"F";20;3,5;3,75;2,75;2,66666666666667;21
"105";"F";19;4;2,75;1,375;3;27
"106";"F";21;3,5;3,5;2,25;2,75;28
"107";"F";21;3,2;3,25;3,625;3,08333333333333;23
"108";"F";22;2,6;4,125;3,75;2,5;21
"109";"F";25;2;3,875;4;2,33333333333333;25
"110";"F";21;2,7;2,875;3,125;3;11
"111";"F";22;3,2;4,125;3,25;3;19
"112";"F";25;3,3;2,5;2,125;4;24
"113";"F";20;3,9;3,375;2,875;3,25;28
"114";"M";24;3,3;3,125;1,5;3,5;21
"115";"F";20;3;2,75;2,5;3,5;24
"116";"M";21;3,7;3,125;3,25;3,83333333333333;24
"117";"F";20;2,5;4,125;3,625;2,91666666666667;20
"118";"F";20;2,9;3,5;3,875;2,16666666666667;19
"119";"M";31;3,9;3,75;3,875;1,66666666666667;30
"120";"F";20;3,6;4,125;2,375;2,08333333333333;22
"121";"F";22;2,9;3,25;3;2,83333333333333;16
"122";"F";22;2,1;2,75;3,375;3,41666666666667;16
"123";"M";21;3,1;3,375;2,75;3,33333333333333;19
"124";"M";22;4;3,75;4,5;2,58333333333333;30
"125";"F";21;3,1;4;2,625;2,83333333333333;23
"126";"F";21;2,3;4,125;2,75;3,33333333333333;19
"127";"F";21;2,8;3,875;3,25;3;18
"128";"F";21;3,7;4,75;4,125;2,58333333333333;28
"129";"F";20;2,6;3,625;3,375;2,41666666666667;21
"130";"F";21;2,4;3,375;2,75;3,58333333333333;19
"131";"F";25;3;3,75;4,125;2,08333333333333;27
"132";"M";21;2,8;2,25;3,25;4,33333333333333;24
"133";"F";24;2,9;4,125;2,875;2,66666666666667;21
"134";"F";20;2,4;3,5;2,875;3;20
"135";"M";21;3,1;4;2,375;2,66666666666667;28
"136";"F";20;1,9;3;3,875;2,16666666666667;12
"137";"F";20;2;3,375;2,125;2,66666666666667;21
"138";"F";18;3,8;3,125;4;2,25;28
"139";"F";21;3,4;3,5;3,25;2,66666666666667;31
"140";"F";19;3,7;3,375;2,625;3,33333333333333;18
"141";"F";21;2,9;3,875;2,75;3,5;25
"142";"F";20;2,3;3,625;4;2,75;19
"143";"M";21;4,1;4,375;3;2;21
"144";"F";20;2,7;3,125;3,375;2,83333333333333;16
"145";"F";21;3,5;4;3,875;3,5;7
"146";"F";20;3,4;3,625;3,25;2,5;21
"147";"F";18;3,2;4,375;3,375;3,16666666666667;17
"148";"M";22;3,3;3,75;4,125;3,08333333333333;22
"149";"F";22;3,3;3,625;3,5;2,91666666666667;18
"150";"M";24;3,5;2,5;2;3,16666666666667;25
"151";"F";19;3,2;4;3,625;2,5;24
"152";"F";20;3,1;3,375;3,375;3,83333333333333;23
"153";"F";20;2,8;4,375;2,125;2,25;23
"154";"F";17;1,7;3,375;4,625;3,41666666666667;26
"155";"M";19;1,9;2,25;2,5;3,75;12
"156";"F";20;3,5;3,25;2,875;3;32
"157";"F";20;2,4;3,5;2,75;2,58333333333333;22
"158";"F";20;2,1;3,75;4;3,33333333333333;20
"159";"F";20;2,9;4,25;2,375;2,83333333333333;21
"160";"F";19;1,9;3,75;3,875;3;23
"161";"F";19;2;4;3,375;2,83333333333333;20
"162";"F";22;4,2;3;1,75;3,16666666666667;28
"163";"M";35;4,1;3,75;3;2,75;31
"164";"F";18;3,7;3,375;2,625;3,41666666666667;18
"165";"F";19;3,6;3,5;2,625;3;30
"166";"M";21;1,8;4;3,375;2,66666666666667;19