Skip to content

Latest commit

 

History

History
99 lines (83 loc) · 3.73 KB

readme.md

File metadata and controls

99 lines (83 loc) · 3.73 KB

Board Games Database

This week's data comes from the Board Game Geek database. The site's database has more than 90,000 games, with crowd-sourced ratings. There is also an R package with the fulldataset (bggAnalysis) but it hasn't been updated in ~2 years.

To follow along with a fivethirtyeight article, I limited to only games with at least 50 ratings and for games between 1950 and 2016. This still leaves us with 10,532 games!

board_games <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-12/board_games.csv")

Data Dictionary

variable class description
game_id character Unique game identifier
description character A paragraph of text describing the game
image character URL image of the game
max_players integer Maximum recommended players
max_playtime integer Maximum recommended playtime (min)
min_age integer Minimum recommended age
min_players integer Minimum recommended players
min_playtime integer Minimum recommended playtime (min)
name character Name of the game
playing_time integer Average playtime
thumbnail character URL thumbnail of the game
year_published integer Year game was published
artist character Artist for game art
category character Categories for the game (separated by commas)
compilation character If part of a multi-compilation - name of compilation
designer character Game designer
expansion character If there is an expansion pack - name of expansion
family character Family of game - equivalent to a publisher
mechanic character Game mechanic - how game is played, separated by comma
publisher character Comoany/person who published the game, separated by comma
average_rating double Average rating on Board Games Geek (1-10)
users_rated double Number of users that rated the game

Data Cleaning

library(bggAnalysis)
library(tidyverse)

named_games <- BoardGames %>% 
  janitor::clean_names() %>% 
  set_names(~str_replace(.x, "details_", "")) %>% 
  set_names(~str_replace(.x, "attributes_boardgame", "")) %>% 
  set_names(~str_replace(.x, "stats_", "")) %>% 
  select(game_id:average, usersrated, averageweight) %>% 
  filter(!is.na(yearpublished)) %>% 
  filter(yearpublished <=2016 & yearpublished >= 1950) %>% 
  filter(usersrated >= 50, game_type == "boardgame")

named_games %>% 
  group_by(yearpublished) %>% 
  summarize(count = n()) %>% 
  ggplot(aes(x = yearpublished, y = count)) +
  geom_line()

tidy_names <- c("game_id",
  "game_type",
  "description",
  "image",
  "max_players",
  "max_playtime",
  "min_age",
  "min_players",
  "min_playtime",
  "name",
  "playing_time",
  "thumbnail",
  "year_published",
  "artist",
  "category",
  "compilation",
  "designer",
  "expansion",
  "family",
  "implementation",
  "integration",
  "mechanic",
  "publisher",
  "attributes_total",
  "average_rating",
  "users_rated",
  "average_weight")

tidy_games <- named_games %>% 
  set_names(nm = tidy_names) %>% 
  select(-attributes_total, -game_type, - implementation, -integration, - average_weight)

tidy_games %>% 
  tomtom::create_dictionary()

tidy_games %>% write_csv("board_games.csv")