README.Rmd

---
title: "`sport` an R package for Online Ranking Methods"
output:
  github_document:
    pandoc_args: --webtex
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
```

# sport <img src="man/figures/hexlogo.png" align="right" />

<!-- badges: start -->
[![CRAN badge](https://cranlogs.r-pkg.org/badges/sport)](https://cran.r-project.org/web/packages/sport/index.html)
[![Travis-CI Build Status](https://travis-ci.org/gogonzo/sport.svg?branch=master)](https://travis-ci.org/gogonzo/sport)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/gogonzo/sport?branch=master&svg=true)](https://ci.appveyor.com/project/gogonzo/sport)
[![License: GPL v2](https://img.shields.io/badge/License-GPL%20v2-blue.svg)](https://www.gnu.org/licenses/old-licenses/gpl-2.0.html)
[![Coverage status](https://codecov.io/gh/gogonzo/sport/branch/master/graph/badge.svg)](https://codecov.io/gh/gogonzo/sport)
<!-- badges: end -->

# About
Name `sport` is an abbreviation for Sequential Pairwise Online Rating Techniques. Package contains functions calculating ratings for two-player or multi-player matchups. Methods included in package are able to estimate ratings (players strengths) and their evolution in time, also able to predict output of challenge. 
Algorithms are based on Bayesian Approximation Method, and they don't involve any matrix inversions nor likelihood estimation. `sport` incorporates methods such glicko, glicko2, bayesian Bradley-Terry, dynamic logistic regression. Parameters are updated sequentially, and computation doesn't require any additional RAM to make estimation feasible. Additionally, base of the package is written in `C++` what makes `sport` computation even faster.

# Package Usage

## Installation
Install package from CRAN or development version from github.

```{r eval=FALSE}
devtools::install_github("gogonzo/sport")
install.packages("sport",repos = "https://cloud.r-project.org/")
```

## Available Data
Package contains actual data from Speedway Grand-Prix. There are two data.frames: 

1. `gpheats` - results SGP heats. Column `rank` is a numeric version of column `position` - rider position in race.
2. `gpsquads` - summarized results of the events, with sum of point and final position.

```{r echo=TRUE, message=FALSE, warning=FALSE}
library(sport) 
str(gpheats)
```

Data used in `sport` package must be in so called long format. Typically data.frame contains at least `id`, `name` and `rank`, with one row for one player within specific match. Package allows for any number of players within event and allows ties also. For all games, *output needs to be a rank/position in event*. Don't mix up rank output with typical 1-win, 0-lost. In `sport` package output for two player game is 1-winner 2-looser. Below example of two matches with 4 players each.

```{r echo=FALSE}
gpheats[1:8,c("id","rider","rank")]
```

## Estimate dynamic ratings

To compute ratings using each algorithms one has to specify formula. Form `rank | id ~ name` is required, which estimates `name` - rating of a player, by observing outputs - `rank`, nested within particular event - `id`. Variable names in formula are unrestricted, but model structure remains the same. All methods are named `method_run`.
`formula = rank|id ~ name` 

```{r warning=FALSE, message=FALSE}
glicko  <- glicko_run(formula = rank|id ~ player(rider), data = gpheats)
glicko2 <- glicko2_run(formula = rank|id ~ player(rider), data = gpheats)
bbt     <- bbt_run(formula = rank|id ~ player(rider), data = gpheats)
dbl     <- dbl_run(formula = rank|id ~ player(rider), data = gpheats)

print(dbl)
```

## Output

Objects returned by `method_run` are of class `rating` and have their own `print`
`summary` which provides most important informations. -`print.sport` shows 
condensed informations about model performance like accuracy and consistency of 
model predictions with observed probabilities. More profound summary are given 
by `summary` by showing ratings, ratings deviations and comparing model win 
probabilities with observed.

```{r message=FALSE}
summary(dbl)
```

To visualize top n ratings with their 95% confidence interval one can use 
dedicated `plot.rating` function. For DBL method top coefficients are presented 
not necessarily ratings. It's also possible to examine ratings evolution in time, 
by specifying `players` argument.

```{r message=FALSE, fig.show='hold', out.width = "50%"}
plot(glicko, n = 15)
plot(glicko, 
     players = c("Greg HANCOCK","Tomasz GOLLOB","Tony RICKARDSSON"))
```

Except dedicated `print`,`summary` and `plot` there is possibility to extract more detailed information to be analyzed. `rating` object contains following elements:

```{r message=FALSE}
names(glicko)
```

* `rating$final_r` and `rating$final_rd` contains ratings and ratings deviations estimations.
* `r` contains data.frame with sequential ratings estimations from first event to the last. Number of rows in `r` equals number of rows in input data.
* `pairs` pairwise combinations of players in analyzed events with prior probability and result of a challenge. 

```{r message=FALSE}
tail(glicko$r)
tail(glicko$pairs)
```