
This package implements the quantitative measures of straightlining outlined by Kim et al. (2019). The package includes measures for straightlining on the individual level, but can also generate a report on how much straightlining happens across a set (or several sets) of questions.


You can install the development version of straightliner from GitHub with:

# install.packages("devtools")


To see which respondents are straightlining, use straightlining():


survey <- data.frame("respondent" = c("A", "B", "C"),
                     "item1" = c(5, 5, 4),
                     "item2" = c(5, 5, 2),
                     "item3" = c(5, 5, 5),
                     "item4" = c(5, 1, 3))

st_results <- straightlining(survey, varnames = c("item1", "item3", "item4"))
#> The scale point variation measure for this battery of questions is 0.37. 
#> The highest possible value of scale point variation for a given respondent is 0.67
#> The proportion of nondifferentiation is 0.33
#>         mrp       mir      qsd       spv nondiff
#> 1 1.0000000 1.0000000 0.000000 0.0000000    TRUE
#> 2 0.0000000 0.6666667 2.309401 0.4444444   FALSE
#> 3 0.2928932 0.3333333 1.000000 0.6666667   FALSE

This returns a data frame in which each row is a respondent and each column is that respondent's score on one measure. You can cbind() it to your original dataset, or pass keep_original = TRUE to get your original dataset back with these measures appended.

To compare batteries of questions to one another, use the straightlining_qset() function. It needs a named list with one entry for each set of questions; the names are used to label the rows of the output table:

batteries <- list("first two" = c("item1", "item2"),
                   "last two" = c(4,5))

straightlining_qset(survey, batteries, measures = c("mrp", "spv"))
#>            mrp  spv spv_max
#> first two 0.67 0.17     0.5
#> last two  0.43 0.33     0.5


For details on the measures call ?straightlining() to pull up the documentation and see the ‘details’ section.

Individual-level Measures.

In the formulas below, $n$ refers to the number of questions in a battery, $r$ refers to the response option chosen on a given question, $i$ refers to the respondent, and $j$ is the counter for responses.

  • "sd": Battery Standard Deviation: simply calculates the standard deviation of the battery for each respondent. Lower values indicate more straightlining.

  • "spv": Scale Point Variation (Linville, Salovey, Fischer 1986; who called it probability of differentiation):

$$P = 1 - \sum_{j=1}^n{p_{j}^2}$$

where $p_j$ is, for each response option the respondent chooses, the proportion of their responses that take that value. A respondent choosing A once and B twice on a 3-question battery has $P = 1 - \left[\left(\frac{1}{3}\right)^2 + \left(\frac{2}{3}\right)^2\right] \approx 0.44$. Lower values indicate more straightlining. The scale point variation for the battery as a whole is the average of $P$ across all respondents.

  • "mir": Maximum Identical Rating: In what proportion of items does a respondent use their most common response option?

$$MIR = \frac{r_{max}}{n}$$

where $r_{max}$ is the number of items on which the respondent chose their most frequently used response option within this battery. Higher values indicate more straightlining.
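The individual-level formulas above can be checked by hand in base R. The sketch below does not call the package; spv_one and mir_one are hypothetical helper names introduced only for this illustration, and the data are item1, item3 and item4 from the example survey.

```r
# Three respondents, three items (item1, item3, item4 from the example survey).
battery <- data.frame(item1 = c(5, 5, 4),
                      item3 = c(5, 5, 5),
                      item4 = c(5, 1, 3))

# Hypothetical helper: scale point variation for one respondent,
# i.e. 1 minus the sum of squared response proportions.
spv_one <- function(r) {
  p <- table(r) / length(r)
  1 - sum(p^2)
}

# Hypothetical helper: maximum identical rating for one respondent,
# i.e. the share of items answered with the most-used response option.
mir_one <- function(r) {
  max(table(r)) / length(r)
}

spv <- apply(battery, 1, spv_one)  # one value per respondent
mir <- apply(battery, 1, mir_one)
qsd <- apply(battery, 1, sd)       # battery standard deviation

round(cbind(qsd, spv, mir), 2)
mean(spv)  # battery-level scale point variation
```

The per-respondent values should agree with the qsd, spv and mir columns printed by straightlining() in the example near the top, and mean(spv) rounds to the 0.37 reported there.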

Battery-level Measures.

These measures give an overview of how much straightlining there was across the whole battery of questions. It might be useful to compare how respondents react to different ways of measuring the same concept.

  • "nondiff": simple nondifferentiation. Returns the proportion of respondents who always use the same response in a battery.

  • "mrp": Mean Root Pairs (Mulligan, Krosnick, Smith, Green, and Bizer, 2001). For each respondent, calculate an index of differentiation by summing the square roots of the absolute differences between all pairs of items, and then taking the average. Then normalize this index with reference to the range it takes on within the sample. E.g. on a battery consisting of three questions:

$$T_i = \frac{\sqrt{\left|r_1-r_2\right|}+\sqrt{\left|r_1-r_3\right|}+\sqrt{\left|r_2-r_3\right|}}{3}$$

$$MRP_i = \frac{T_i - \max(T)}{\min(T) - \max(T)}$$
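As a rough check of the two formulas, the following base R sketch computes the index for the two-item battery item3 and item4 from the example survey. It is written directly from the formulas rather than taken from the package source, and root_pairs is a hypothetical helper name.

```r
# Three respondents, two items (item3 and item4 from the example survey).
battery <- data.frame(item3 = c(5, 5, 5),
                      item4 = c(5, 1, 3))

# Hypothetical helper: T_i, the mean of the square roots of the absolute
# differences over all pairs of items for one respondent.
root_pairs <- function(r) {
  pairs <- combn(r, 2)  # one column per item pair
  mean(sqrt(abs(pairs[1, ] - pairs[2, ])))
}

t_index <- apply(battery, 1, root_pairs)

# Normalize against the range of T in the sample: the respondent with the
# least differentiation gets 1, the one with the most gets 0.
# Note: this divides by zero if every respondent has the same T.
mrp <- (t_index - max(t_index)) / (min(t_index) - max(t_index))

mrp
mean(mrp)
```

With this normalization, higher values indicate more straightlining. The mean rounds to 0.43, the value shown for the "last two" battery in the straightlining_qset() output.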


Both straightlining and straightlining_qset automatically convert the passed data to numeric if it is not already, because these calculations don't work otherwise. They will warn you if this happens, and let you know how many levels are present in the data for each variable. If you design a battery where all questions offer the same number of response options in the survey (say a 5-point Likert scale) but on one of the questions people only pick the bottom 4, then the conversion to numbers will cause a misalignment across this battery and render these statistics useless.
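The kind of misalignment described above can be reproduced in base R. This is only a sketch of the general mechanism, assuming text responses are turned into numbers via factor levels inferred from the observed data; the package's own conversion may differ in detail. All variable names here are made up for the illustration, and the unused option is the lowest one because that makes the shift easy to see.

```r
# The five response options offered in the (hypothetical) survey.
scale_levels <- c("1 - never", "2 - rarely", "3 - sometimes",
                  "4 - often", "5 - always")

# Five respondents, two items. Nobody picked "1 - never" on item_b.
item_a <- c("1 - never", "2 - rarely", "3 - sometimes", "4 - often", "5 - always")
item_b <- c("2 - rarely", "2 - rarely", "3 - sometimes", "4 - often", "5 - always")

# Levels inferred from the data: item_b only has four levels, so its codes
# are shifted down by one relative to item_a.
naive_a <- as.integer(factor(item_a))
naive_b <- as.integer(factor(item_b))

# Declaring the full scale up front keeps the codes aligned across items.
fixed_a <- as.integer(factor(item_a, levels = scale_levels))
fixed_b <- as.integer(factor(item_b, levels = scale_levels))

rbind(naive_a, naive_b, fixed_a, fixed_b)
```

The second respondent answered "2 - rarely" on both items, yet the inferred codes differ (2 and 1), so a straightliner would look like they differentiated. Converting the columns yourself with an explicit levels argument before passing them to the package is one way to avoid this.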


  • This package has some different functions than careless, which is an excellent R package. With time I hope to integrate their straightlining measures here too, to make it a one-stop shop.
  • The examples in the documentation do not currently run.