README.Rmd

---
output:
  md_document:
    variant: markdown_github
---


```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# Description

This package is for finding change points in sequences of responses. Putative change
points are first detected, then a statistical test at a predefined significance
level (Criterion) is applied to decide if the change point is supported or not.
Based on algorithm published by [Gallistel et al. (2004)](http://dx.doi.org/10.1073/pnas.0404965101), translated from Matlab into R. 

# Installation

You can use `devtools` to install the package from Github. Install the devtools package first, then execute this code in R(Studio):

```{r, eval = FALSE}
devtools::install_github('ontogenerator/cpdetectoR')
```

# Examples

There is a main wrapper function, called `cp_wrapper`, which can be used to analyze
sequences of responses. The first argument of the function is the input. Thes second argument indicates whether the input contains discrete-trial data (TRUE), or successive real value intervals (FALSE).  The third argument gives the statistical test ("KS", "ttest", "binomial", or "chisquare"). The different tests usually give similar, but often different results. The third and last argument is the criterion, with values between 1.3 and 6.

The input can be either a vector:

```{r}
library(cpdetectoR) # load package first
set.seed(25)
cp_wrapper(c(rbinom(50, 1, 0.3), rbinom(50, 1, 0.8)), TRUE, "binomial", 2)
```

or a data frame:

```{r}
d_responses <- data.frame(Responses = c(rbinom(50, 1, 0.3), rbinom(50, 1, 0.8)))
cp_wrapper(d_responses, TRUE, "chisquare", 2)
```

For the same value of the criterion, the chi square test usually gives a higher number of change points (in this case, false positives) than the binomial test. The value of the criterion should lie between 1.3 and 6, corresponding to p values of 0.05 and 0.000001, respectively. These values are the logarithms of the odds against the null (no-change) hypothesis.

Let us look first at the included `eyeblink` data set:
```{r}
eyeblink[,] # inspect data set
```

Gallistel et al. advise against using the chi square test on these data. Indeed, with a criterion of 2, the test fails:
```{r, eval = FALSE}
cp_wrapper(eyeblink, TRUE, "chisquare", 2)
```

However, using either a higher criterion or the "binomial" test gives a result:
```{r}
cp_wrapper(eyeblink, TRUE, "chisquare", 3)

cp_wrapper(eyeblink, TRUE, "binomial", 2)
```

With ggplot we can visualize the results by first generating a `data.frame` with the cumulative responses:
```{r}
library(ggplot2) # load the ggplot package
eyeblinkdata <- data.frame(Trial = 1:length(eyeblink[,]),
                           CumRespMeasure = cumsum(eyeblink)[,])
changepoints <- cp_wrapper(eyeblink, TRUE, "binomial", 4) # save the output of the change point analysis
#generate a cumulative response vs trial plot:
ggplot(eyeblinkdata) + geom_line(aes(Trial, CumRespMeasure)) +
  geom_point(data = changepoints, aes(Trial, CumSs), size = 3)
```

Another type of plot one can look at is the average response rate per trial vs trial.
The `plusmaze` data set included with the package contains frequency data, that are again preferrably analyzed with the random rate (binomial) test.

```{r}
plusmaze[,] # inspect data set
(cp.1 <- cp_wrapper(plusmaze, TRUE, "binomial", 1.3)) #find the change points
# plot average response rate per trial
ggplot() + geom_step(data = cp.1, aes(Trial,Slopes)) +
  ylab("Average Response Rate per Trial")
# for comparison, the cumulative response vs trial plot, as in the example above:
plusmazedata <- data.frame(Trial = 1:length(plusmaze[,]),
                           CumRespMeasure = cumsum(plusmaze)[,])
ggplot(plusmazedata) + geom_line(aes(Trial, CumRespMeasure)) +
  geom_point(data = cp.1, aes(Trial, CumSs), size = 3)
```

The attached data set `hopperentry` contains hopper-entry speeds from pigeons, an example of normally distributed data. Consequently the t test can be used.

```{r}
(cp.2 <- cp_wrapper(hopperentry, TRUE, "ttest", 4)) #find the change points
# cumulative response vs trial plot
hedata <- data.frame(Trial = 1:length(hopperentry[,]),
                           CumRespMeasure = cumsum(hopperentry)[,])
pl1 <- ggplotGrob(ggplot(hedata) + geom_line(aes(Trial, CumRespMeasure)) +
  geom_point(data = cp.2, aes(Trial, CumSs), size = 3))
# plot average response rate per trial
pl2 <- ggplotGrob(ggplot(cp.2) + geom_step(aes(Trial, Slopes)) +
  ylab("Average Response Rate per Trial"))
# stack the two plots vertically using the grid package
grid::grid.draw(rbind(pl1, pl2, size = "first"))
```

An example for continuous data is the attached `matching` data set. The plots need different axes and labels:

```{r}
(cp.3 <- cp_wrapper(matching, FALSE, "binomial", 2)) #find the change points
# cumulative response vs trial plot
matchingdata <- data.frame(Events = 1:length(matching[,]),
                           Time = cumsum(matching)[,])
pl3 <- ggplotGrob(ggplot(matchingdata) + geom_line(aes(Time, Events)) +
  geom_point(data = cp.3, aes(Time, Events), size = 3))
# plot average response rate per trial
pl4 <- ggplotGrob(ggplot(cp.3) + geom_step(aes(Time, Slopes)) +
  ylab("Events per Unit Time"))
grid::grid.draw(rbind(pl3, pl4, size = "first"))
```


# References
1. Gallistel CR, Fairhurst S, Balsam P (2004) The learning curve:
  Implications of a quantitative analysis. PNAS 101:13124-13131. doi: 10.1073/pnas.0404965101