---
title: "Limitations"
---

```{r import, echo=FALSE, message=FALSE, warning=FALSE}
library(rio)
library(tidyverse)

# read in the version number
version_number <- readLines("VERSION")[1]

### dynamically construct the file paths

# directory
data_dir <- "_data/database"

# politician
politicians_file_path <- file.path(data_dir, paste0("politician_", version_number, ".csv"))
politician <- read.csv(politicians_file_path)
```

## Missing URLs
`r nrow(politician[grep("index", politician$name_wikitag), ])` politicians listed in the database do not have a corresponding Wikipedia article. This matters because the Wikipedia tag at the end of the URL is essential to the data structure: it forms the basis for the id variable. The URLs are used because they remain consistent over time.

For politicians without a corresponding article, Wikipedia automatically generates a default red-link URL, beginning with "https://de.wikipedia.org/" and followed by "w/index.php?title= *Firstname as written in table_Surname as written in table* &action=edit&redlink=1".
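
As a rough illustration, such default URLs could be told apart from regular article URLs by their path and query string. The sketch below uses two made-up URLs (the names are hypothetical and not taken from the database) and assumes that the default URL always contains both `/w/index.php?` and `redlink=1`.

```r
# Toy URLs for illustration only; the names are hypothetical
urls <- c(
  "https://de.wikipedia.org/wiki/Erika_Mustermann",
  "https://de.wikipedia.org/w/index.php?title=Max_Mustermann&action=edit&redlink=1"
)

# TRUE where the URL is a default (red-link) URL rather than an article URL;
# fixed = TRUE matches the strings literally instead of as regular expressions
is_redlink <- grepl("/w/index.php?", urls, fixed = TRUE) &
  grepl("redlink=1", urls, fixed = TRUE)

is_redlink
```

Matching literally avoids treating `.` and `?` as regular-expression metacharacters, which could otherwise match unintended strings.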

With the data provided, the missing Wikipedia articles are not a serious problem, since the *ids* created from the default URLs remain unique in all cases. However, if a Wikipedia page is later created for one of these politicians, the provided *id* would become outdated. Furthermore, metadata such as birthplace and birthdate could not be obtained for these individuals. This affects `r nrow(politician[grep("/w/index", politician$name_wikitag), ])` out of `r nrow(politician)` MPs.
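
The count and the uniqueness claim above could be checked along the following lines. This is only a sketch on a toy data frame: the column name `name_wikitag` mirrors the one used in this document, but the values are hypothetical and the real file may store the tags differently.

```r
# Toy data for illustration only; values are hypothetical
toy <- data.frame(
  name_wikitag = c(
    "Erika_Mustermann",
    "/w/index.php?title=Max_Mustermann&action=edit&redlink=1",
    "/w/index.php?title=Anna_Beispiel&action=edit&redlink=1"
  )
)

# Rows whose tag stems from a default (red-link) URL
redlink_rows <- grepl("/w/index", toy$name_wikitag, fixed = TRUE)

# Number and share of affected rows
n_affected <- sum(redlink_rows)
share_affected <- n_affected / nrow(toy)

# TRUE if no tag occurs more than once, so ids built from the tags stay unique
tags_unique <- anyDuplicated(toy$name_wikitag) == 0
```

Because the default URL embeds the politician's name, two such tags only collide if two politicians without an article share the same first name and surname as written in the table.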

## Report an Issue

If you find any flaws in the data or suspect an error, please do not hesitate to report an issue on the project's [GitHub repository](https://github.com/StatePol/Database/issues/new).