Core dump (#5)
* fix organise typos

* added core code to import, describe, inference, appendix

* Installed pkgs, library chunk, and data folder

---------

Co-authored-by: Your Name <[email protected]>
Co-authored-by: lisaawilliams <[email protected]>
Co-authored-by: Fonti Kar <[email protected]>
4 people authored May 20, 2024
1 parent 1640ff2 commit c754501
Showing 35 changed files with 1,911 additions and 8 deletions.
Empty file modified .Rprofile
100755 → 100644
Empty file.
Empty file modified .gitignore
100755 → 100644
Empty file.
Empty file modified README.md
100755 → 100644
Empty file.
Binary file removed book/.DS_Store
Binary file not shown.
1 change: 1 addition & 0 deletions book/.gitignore
100755 → 100644
@@ -15,3 +15,4 @@ site_libs
index.tex
docs/
.quarto/
.DS_Store
Empty file modified book/LICENSE
100755 → 100644
Empty file.
Empty file modified book/README.md
100755 → 100644
Empty file.
Empty file modified book/_quarto.yml
100755 → 100644
Empty file.
74 changes: 74 additions & 0 deletions book/appendix.qmd
100644 → 100755
@@ -4,6 +4,11 @@ format: html
editor: visual
---

# how to install R and RStudio on your machine

The marvellous Danielle Navarro has LOTS of useful R learning resources on her YouTube channel. [This playlist](https://www.youtube.com/playlist?list=PLRPB0ZzEYegOZivdelOuEn-R-XUN-DOjd) about how to install R and RStudio is particularly useful; no matter which operating system you are dealing with... Dani has you covered.


# how to install packages

You only need to install a package once on your machine. Once the package is installed, you will need to use `library()` to load its functions every time you want to use them, but the installation itself is a one-time job. You can do it either in the console or via the Packages tab.
@@ -14,6 +19,7 @@ Install a package by typing the following command with the name of the package y

```
install.packages("packagename")
```

## option 2
@@ -23,3 +29,71 @@ Alternatively, search for the package you would like to install in the packages
![You can search for packages and install them from CRAN via the packages tab](images/install.png)

> Remember, once you have installed a package, you will need to use the `library()` function to load it before it will work.

## useful packages for psychology


- `tidyverse` this is a cluster of super helpful data wrangling and visualisation tools.
- `here` this package helps direct R to the correct place for files based on the current working directory.
- `janitor` this package helps us clean up data - especially awkward variable names.
- `qualtRics` this package is helpful in reading in data files from Qualtrics... except for .sav SPSS format files! (see next)
- `haven` this package is a good one for reading in .sav SPSS format files
- `sjPlot` this package is helpful for making a 'codebook' of your variables and values from imported .sav files
- `surveytoolbox` this package is helpful in drawing out the value labels of variables imported in .sav format
  - note: because `surveytoolbox` is on GitHub and not CRAN, you'll want to do the following two steps *in the console* (we only need to do this once!). If the install asks you about updating packages, go ahead and do it!
    1. install the `devtools` package: `install.packages("devtools")`
    2. install via GitHub: `devtools::install_github("martinctc/surveytoolbox")`
- `ufs` this package (short for user friendly science) is a nice tool for computing the internal reliability of scales
  - note: one of the commands we will use in `ufs` requires the `psych` package to be installed (but it doesn't need to be loaded via `library()`). Ensure you install that first. Two steps:
    1. install the `remotes` package: `install.packages("remotes")`
    2. install via GitLab: `remotes::install_gitlab('r-packages/ufs')`
- `apa` nice for making statistical output into APA style
- `gt` nice for making your tables look pretty
- `apaTables` makes nice APA-styled tables of correlation, ANOVA, regression etc. output
- `report` is a package to help with results reporting
- `psych` is an umbrella package for lots of common psych tasks
- `ez` is a great package for stats, including analysis of variance
- `emmeans` is helpful for comparing specific means in a factorial design
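
If you would like to grab the CRAN packages above in one go, here is a minimal sketch you could paste into the console (the list simply mirrors the packages above; `surveytoolbox` and `ufs` are installed from GitHub/GitLab as described in the notes):

```
install.packages(c("tidyverse", "here", "janitor", "qualtRics", "haven",
                   "sjPlot", "apa", "gt", "apaTables", "report",
                   "psych", "ez", "emmeans"))
```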


# using inline code

> JR maybe this piece needs to go in a separate chapter about writing with RMarkdown, papaja etc

```{r eval = FALSE}
# pull counts from the exclusions_summary tabyl created above
comments_only_count <- exclusions_summary %>% filter(exclude_coded == "comments only") %>% pull(n)
# the same value pulled out with base R instead of dplyr
comments_only_count2 <- exclusions_summary$n[which(exclusions_summary$exclude_coded == "comments only")]
time_only_count <- exclusions_summary %>% filter(exclude_coded == "time only") %>% pull(n)
variance_only_count <- exclusions_summary %>% filter(exclude_coded == "variance only") %>% pull(n)
total <- exclusions_summary %>% filter(exclude_coded == "Total") %>% pull(n)
kept <- exclusions_summary %>% filter(exclude_coded == "kept") %>% pull(n)
```

Use of inline code is really helpful in avoiding transcription errors and saving time when writing up! Here, we use code to pull in some descriptive statistics from the exclusion reason table we made above:

> INSERT INLINE EXAMPLE HERE
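
In the meantime, a minimal sketch of the kind of sentence that example might contain (hypothetical wording, assuming the counts from the chunk above exist). When the document is rendered, each back-ticked `r ...` piece is replaced with the stored value:

```
We excluded `r time_only_count` participants based on completion time and
`r variance_only_count` based on low response variance, leaving `r kept`
participants in the final sample.
```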

# helpful console commands

- `names(objectname)` - returns a list of variable names for that dataframe, making it less likely you will type things incorrectly
- `getwd()` - returns the path to the current working directory. Run this in the console.
- `rm(objectname)` - removes the object from your global environment. Can be helpful in cleaning up any 'test' objects you make while troubleshooting code.
- `?packagename` - brings up the Help info for that package
- `?functionname` - brings up the Help info for that function

# useful keyboard shortcuts

Option-Command-I (Ctrl-Alt-I on Windows) = inserts a new code chunk
Command-Enter (Ctrl-Enter on Windows) = runs the line of code that your cursor is on, or the currently selected code
Command-Shift-Enter (Ctrl-Shift-Enter on Windows) = runs the whole code chunk that your cursor is in


# commonly encountered errors


Empty file modified book/contributing.qmd
100755 → 100644
Empty file.
Binary file added book/data/sampledata.sav
Binary file not shown.
Empty file modified book/datasets.qmd
100755 → 100644
Empty file.
49 changes: 48 additions & 1 deletion book/describe.qmd
100755 → 100644
@@ -1 +1,48 @@
# Describe
# Describe

## getting a feel for your data

`str`

`glimpse`

`skimr`
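
A minimal sketch of these in action (assuming your dataframe is called `data` and that `dplyr` and `skimr` are installed):

```{r eval = FALSE}
str(data)          # base R: structure of the object, variable types, first few values
glimpse(data)      # dplyr: one row per variable, with type and a preview of values
skimr::skim(data)  # skimr: summary table including missingness and distributions
```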

## counting things

`group_by` + `summarise` + `count`

`n`

`tabyl`
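
A sketch of two ways to count (assuming a dataframe called `data` with a `condition` variable):

```{r eval = FALSE}
# dplyr: number of participants in each condition
data %>%
  group_by(condition) %>%
  summarise(n = n())

# janitor: the same counts, plus percentages
data %>%
  tabyl(condition)
```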

## getting descriptives

`group_by` + `summarise` + `mean` & `sd`

```{r eval = FALSE}
scale1_by_condition12 <- data_scalescomputed %>%
group_by(condition12) %>%
summarise(mean_scale1 = mean(scale1_index, na.rm = TRUE),
sd_scale1 = sd(scale1_index, na.rm = TRUE))
```

### Three things to remember

1. When we compute means for reporting, we usually want to control the number of decimal places via `round()`.
2. We also need to tell R to calculate the mean, even if some of the contributing data points are missing. This is what `na.rm = TRUE` does.
3. As noted above, `rowwise` asks R to do something for each row (which is what we want here -- to compute the mean of the contributing items for each participant). Whenever we use `rowwise` (or `group_by`), we need to `ungroup()` at the end to avoid issues down the line.

## tables??

`gt`

```{r eval = FALSE}
gt(scale1_by_condition12)
```

`apaTables`
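
A sketch of how `apaTables` might be used here (assuming `data_scalescomputed` contains the numeric variables you want to correlate):

```{r eval = FALSE}
library(apaTables)
# write an APA-style correlation table to a Word document
apa.cor.table(data_scalescomputed, filename = "correlation_table.doc")
```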
Empty file modified book/images/computer-arson.png
100644 → 100755
Empty file modified book/images/cover.png
100755 → 100644
Empty file modified book/images/install.png
100644 → 100755
Empty file modified book/images/project.png
100644 → 100755
Empty file modified book/images/readme.png
100644 → 100755
Empty file modified book/images/renv.png
100755 → 100644
Empty file modified book/images/structure.png
100644 → 100755
95 changes: 95 additions & 0 deletions book/import.qmd
100755 → 100644
@@ -1,5 +1,18 @@
# Import

# Packages for this chapter

```{r}
library(tidyverse)
library(here)
library(janitor)
library(haven)
library(sjPlot)
library(surveytoolbox)
```


## Reading in Excel spreadsheets

This is gobbledygook.
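
A minimal sketch in the meantime, assuming the `readxl` package is installed and a hypothetical Excel file sits in the same data folder used below:

```{r eval = FALSE}
library(readxl)
# read the first sheet of a hypothetical Excel file in the data folder
data_excel <- read_excel(here("book", "data", "sampledata.xlsx"))
```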
@@ -12,3 +25,85 @@ This is gobbledygook.

### Reading in .sav

# Read in the data

Remember the file setup described above? This is where it starts to matter. Our working directory (i.e., where R thinks "here" is) was set via the .Rproj file, so it is the "Williams Lab Core R" folder. You can check this by typing `getwd()` or `here()` in the console.

For most of this core script, we'll be using data from a file called sampledata.sav, which should be in the data subfolder from the zipped file. If not, sort that out now!

A .sav file is in SPSS format. When you export from Qualtrics into .sav/SPSS format, it retains helpful information like question wording and response labels. If you export straight to .csv, you lose that info and will find yourself cross-checking back to Qualtrics. So, a strong word of advice: always export to .sav.

The code below uses the `here` command to direct R to the data folder *from the working directory*, and then the .sav file within it.

The `glimpse` command gives a nice overview of the variables, their type, and a preview of the data.

```{r}
data <- read_sav(here("book", "data", "sampledata.sav"))
glimpse(data)
```


These variable names won't be very nice to work with, given their awkward and inconsistent capitalisation. Actual Qualtrics exports are even messier!

The `clean_names` function from `janitor` helps clean them up!

`data_cleanednames <-` at the start of the line saves the change to a new dataframe. Alternatively, you could write it back to the same dataframe (e.g., `data <-`), but this should be done very intentionally, as it makes it harder to backtrack to the source of an error. The general rule is to create a new dataframe each time you implement a big change on the data.

The `glimpse` command here confirms that you have successfully cleaned the variable names!

```{r}
data_cleanednames <- clean_names(data)
glimpse(data_cleanednames)
```

A few things about working with files in SPSS format (.sav) before we continue. The reason why we bother with this is that the SPSS format maximises the information in the file. Unlike exporting to .csv or another spreadsheet format, .sav retains information about question wording (saved as a variable label) and response labelling (saved as a value label).

If you look at the variable types at the right of the `glimpse` output, you'll see that some of the variables are dbl (numeric) while some are dbl+lbl (numeric with labelled values). If you view the `data` object (by clicking on it in the Environment or using `view(data)`), you will see that some of the variables have the question wording below the variable name.

Having this information on hand is really helpful when working with your data!

The `view_df` function from the `sjPlot` package creates a really nicely formatted html file that includes variable names, question wording, response options, and response labelling. This code saves the html file to the `output_files` folder using the `here` package (which starts where your Rproj file is). This html file is nice as a reference for your own use or to share with a supervisor or collaborator!

```{r }
view_df(data_cleanednames)
view_df(data_cleanednames, file=here("output_files","spsstest_codebook.html"))
```

The `data_dict` function from `surveytoolbox` makes a dataframe with all the variable and response labels - similar to the html created above, but this can be called upon later in R as it's now part of the environment.

```{r }
datadictionary <- data_cleanednames %>%
data_dict()
```

Let's say you just want to know the question wording or response labels for a particular variable. You can get this via code rather than checking the whole dataset: the `extract_vallab` command from `surveytoolbox` returns the value labels for a given variable.

```{r }
data_cleanednames %>%
extract_vallab("demographicscateg")
```

There are (evidently) times when packages *do not like* labelled data. So, here are a few tools from the `haven` package for removing labels. Keep these up your sleeve for problem solving later! `zap_labels()` and `zap_label()`, not surprisingly, remove the labels: the first removes the value labels and the second removes the variable labels. The code below makes a new data dictionary of the zapped dataframe and glimpses the new dataframe to confirm the labels are gone.

```{r }
data_zapped <- data_cleanednames %>%
zap_labels() %>%
zap_label()
datadictionary_zapped <- data_zapped %>%
data_dict()
glimpse(data_zapped)
```

For the rest of this script, we will work with the *zapped* dataframe. This is the recommended approach to save headaches with errors down the line.
Empty file modified book/index.qmd
100755 → 100644
Empty file.
