README.Rmd

---
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##",
  fig.path = "man/images/"
)
```
```{r echo = FALSE, results = "hide", message = FALSE}
library("badger")
```

# quanteda.sentiment

<!-- badges: start -->
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/quanteda.sentiment)](https://cran.r-project.org/package=quanteda.sentiment)
`r badge_devel("quanteda/quanteda.sentiment", "royalblue")`
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![Codecov test coverage](https://codecov.io/gh/quanteda/quanteda.sentiment/branch/master/graph/badge.svg)](https://app.codecov.io/gh/quanteda/quanteda.sentiment?branch=master)
[![R-CMD-check](https://github.com/quanteda/quanteda.sentiment/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/quanteda/quanteda.sentiment/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Installation

You can install **quanteda.sentiment** from GitHub with:

```{r eval = FALSE}
remotes::install_github("quanteda/quanteda.sentiment")
```

The package is not yet on CRAN.

## About

**quanteda.sentiment** extends the **quanteda** package with functions for computing sentiment on text.  It has two main functions, for computing two types of sentiment.  These follow the structure of a **quanteda** dictionary, which consists of _key_ entries expressing the canonical concept, and _value_ patterns (such as "good", "sad*", etc.) to be matched in a text and counted as occurrences of that key.

The approach to sentiment in this package approaches sentiment computation in two ways, depending on whether sentiment is considered a key attribute, in which case the keys are assigned a _polarity_ such as _positive_ or _negative_, or whether individual values are assigned a _valence_, in the form of some continuous value indicating a degree of sentiment.  Each is implemented in a separate function:

*  **Polarity-based sentiment.**  This is implemented via `textstat_polarity()`, for computing a sentiment based on keys set as "poles" of positive versus negative sentiment.  Setting polarity is dones through the `polarity()<-` function and can be set for any dictionary, for any keys.  "Sentiment" here can be broadly construed as any contrasting pair of poles, such as "Democrat" versus "Republican", for instance.  More than one key can be associated with the same pole.

    Polar values are converted into sentiment scores using a flexible function, such as $\mathrm{log}(pos / neg)$, or $(pos - neg)/(pos + neg)$.  **quanteda.sentiment** offers three built-in functions, but the user can supply any function for combining polarities.

* **Valence-based sentiment.**  This is implemented via `textstat_valence()`, for computing sentiment as the average valence of a document's words, based on a dictionary whose values have numeric valence scores.  Valence scores are set using the `valence()<-` function.  Each key in a dictionary may have values with difference valences.

The package comes with the following built-in dictionaries:

| Name                             | Description                                                   | Polarity | Valence |
|:---------------------------------|:--------------------------------------------------------------|:--------:|:-------:|
| data_dictionary_AFINN            | Nielsen's (2011) 'new ANEW' valenced word list                |          |    ✔    |
| data_dictionary_ANEW             | Affective Norms for English Words (ANEW)                      |          |    ✔    |
| data_dictionary_geninqposneg     | Augmented General Inquirer _Positiv_ and _Negativ_ dictionary |     ✔    |         |
| data_dictionary_HuLiu            | Positive and negative words from Hu and Liu (2004)            |     ✔    |         |
| data_dictionary_LoughranMcDonald | Loughran and McDonald Sentiment Word Lists                    |     ✔    |         |
| data_dictionary_LSD2015          | Lexicoder Sentiment Dictionary (2015)                         |     ✔    |         |
| data_dictionary_NRC              | NRC Word-Emotion Association Lexicon                          |     ✔    |         |
| data_dictionary_Rauh             | Rauh's German Political Sentiment Dictionary                  |     ✔    |         |
| data_dictionary_sentiws          | SentimentWortschatz (SentiWS)                                 |     ✔    |    ✔    |


## Examples

For a polarity dictionary, we can use the positive and negative key categories from the General Inquirer dictionary:
```{r}
library("quanteda.sentiment")

# inspect the dictionary and its polarities
print(data_dictionary_geninqposneg, max_nval = 8)

# compute sentiment
tail(data_corpus_inaugural) |>
  textstat_polarity(dictionary = data_dictionary_geninqposneg)
```

For a valence dictionary, we can compute this for the "pleasure" category of the Affective Norms for English Words (ANEW): 
```{r}
library("quanteda", warn.conflicts = FALSE, quietly = TRUE)
library("quanteda.sentiment")

# inspect the dictionary and its valences
print(data_dictionary_ANEW, max_nval = 8)
lapply(valence(data_dictionary_ANEW), head, 8)

# compute the sentiment
tail(data_corpus_inaugural) |>
  textstat_valence(dictionary = data_dictionary_ANEW["pleasure"])
```

We can compare two measures computed in different ways (although they are not comparable, really, since they are different lexicons):
```{r}
# ensure we have this package's version of the dictionary
data("data_dictionary_LSD2015", package = "quanteda.sentiment")

sent_pol <- tail(data_corpus_inaugural, 25) |>
  textstat_polarity(dictionary = data_dictionary_LSD2015)
sent_pol <- dplyr::mutate(sent_pol, polarity = sentiment)
sent_val <- tail(data_corpus_inaugural, 25) |>
  textstat_valence(dictionary = data_dictionary_AFINN)

library("ggplot2")

ggplot(data.frame(sent_pol, valence = sent_val$sentiment),
       aes(x = polarity, y = valence)) +
  geom_point()
```

Good enough for government work!

## Where to learn more

Each dictionary and function has extensive documentation, including references to social scientific research articles where each sentiment concept is described in detail.  There is also a package vignette with more detailed examples.