adding english-monarchs-and-marriages (enmoma) dataset #721

Merged
14 changes: 14 additions & 0 deletions data/curated/enmoma/cleaning.R
@@ -0,0 +1,14 @@
library(rvest)

# url to scrape:
root <- "https://www.ianvisits.co.uk/articles/a-list-of-monarchs-by-marriage-6857/"

# get table
tables <- read_html(root) |> html_nodes("table")
df <- tables[1] |> html_table() |> as.data.frame()

df <- df[, -6] # remove spoiler
df <- df[-c(1,2), ] # remove double-header effect

cols <- c("king_name", "king_age", "consort_name", "consort_age", "year_of_marriage")
colnames(df) <- cols
90 changes: 90 additions & 0 deletions data/curated/enmoma/english-monarchs-and-marriages.csv
@@ -0,0 +1,90 @@
king_name,king_age,consort_name,consort_age,year_of_marriage
Æthelwulf,?,Osburh,?,851(?)
Æthelwulf,50(?),Judith of Flanders,12,856
Æthelbald,24,Judith of Flanders,14,858
Æthelberht,–,–,–,–
Æthelred,?,Wulfthryth?,?,?
Alfred the Great,19,Ealhswith,16,868
Edward the Elder,19,Ecgwynn,?,893
Edward the Elder,28,Aelffaed,?,902
Edward the Elder,31,Eadgifu of Kent,?,905
Æthelstan,–,–,–,–
"Edmund
the Magnificent",,Ælfgifu of Shaftesbury,–,–
"Edmund
the Magnificent",22,Æthelflæd of Damerham,?,944
Eadred,–,–,–,–
Eadwig,,Ælfgifu,?,955(?)
Edgar the Peaceful,17,Æthelflæd,?,960
Edgar the Peaceful,21,Ælfthryth,19,964
Edward the Martyr,–,–,–,–
Æthelred the Unready,23,Ælfgifu of York,21,991
Æthelred the Unready,34,Emma of Normandy,17,1002
"Sweyn
Forkbeard",,Gunhild of Wenden,?,990
"Sweyn
Forkbeard",,Sigrid the Haughty,?,1000
"Edmund
Ironside",,Edith of East Anglia,?,?
Cnut,18,Aelfgifu of Northampton,?,1013(?)
Cnut,22,Emma of Normandy,?,1017
"Harold
Harefoot",,Ælfgifu,?,?
Harthacnut,–,–,–,–
Edward the Confessor,42,Edith of Wessex,20,1045
Harold Godwinson,24,Edith Swannesha,19,1044(?)
Harold Godwinson,42,Ealdgyth,?,1064
William I,25,Matilda of Flanders,22,1053
William II,–,–,–,–
Henry I,32,Matilda of Scotland,20,1100
Henry I,53,Adeliza of Louvain,18,1121
Stephen,29,Matilda of Boulogne,20,1125
Henry II,19,Eleanor of Aquitaine,30,1152
Henry the Young King,5,Margaret of France,3,1160
Richard I,34,Berengaria of Navarre,26,1191
John,23,Isabel of Gloucester,16,1189
John,34,Isabella of Angoulême,12,1200
Henry III,29,Eleanor of Provence,13,1236
Edward I,15,Eleanor of Castile,13,1254
Edward I,60,Margaret of France,20,1299
Edward II,24,Isabella of France,13,1308
Edward III,16,Philippa of Hainault,14,1328
Richard II,15,Anne of Bohemia,16,1382
Richard II,29,Isabella of Valois,7,1396
Henry IV,14,Mary de Bohun,12,1380
Henry IV,37,Joanna of Navarre,33,1403
Henry V,34,Catherine of Valois,19,1420
Henry VI,24,Margaret of Anjou,15,1445
Edward IV,22,Elizabeth Woodville,27,1464
Edward V,–,–,–,–
Richard III,20,Anne Neville,16,1472
Henry VII,29,Elizabeth of York,20,1486
Henry VIII,18,Catherine of Aragon,24,1509
Henry VIII,42,Anne Boleyn,32,1533
Henry VIII,45,Jane Seymour,28,1536
Henry VIII,49,Anne of Cleves,25,1540
Henry VIII,49,Catherine Howard,19,1540
Henry VIII,52,Catherine Parr,31,1543
Edward VI,–,–,–,–
Mary I,38,Philip II of Spain,27,1554
Elizabeth I,–,–,–,–
James I,23,Anne of Denmark,15,1589
Charles I,25,Henrietta Maria of France,16,1625
Charles II,32,Catherine of Braganza,24,1662
James II,27,Anne Hyde,22,1660
James II,40,Mary of Modena,15,1673
Mary II,15,William III,27,1677
William III,27,Mary II,15,1677
Anne,18,George of Denmark,30,1683
George I,22,Sophia Dorothea of Brunswick-Lueneburg-Celle,16,1682
George II,22,Caroline of Ansbach,22,1705
George III,23,Charlotte of Mecklenburg-Strelitz,17,1761
George IV,23,Maria Anne Fitzherbert,29,1785
George IV,33,Caroline of Brunswick,27,1795
William IV,53,Adelaide of Saxe-Meiningen,26,1818
Victoria,21,Albert of Saxe-Coburg and Gotha,21,1840
Edward VII,22,Alexandra of Denmark,19,1863
George V,28,Mary of Teck,26,1893
Edward VIII,43,Wallis Warfield Simpson,41,1937
George VI,28,Elizabeth Bowes-Lyon,23,1923
Elizabeth II,21,Philip of Greece and Denmark,26,1947
Binary file added data/curated/enmoma/gg_enmoma.png
52 changes: 52 additions & 0 deletions data/curated/enmoma/instructions.md
@@ -0,0 +1,52 @@
# How to Submit a Dataset

To submit a dataset there are a few steps:

1. Find a dataset.
2. Prepare your repository.
3. Prepare the dataset.

## Find a dataset

Find a dataset that would be good for TidyTuesday: either one that is already ready for analysis, or one that you can clean so that it meets the criteria.
These are the requirements for a dataset:
* Files are `.csv` files.
* The whole dataset (all files) is less than 20MB.
* You can describe each variable (either using an existing data dictionary or by creating your own).
* The data is publicly available and free for reuse, either with or without attribution.

You will also need:
* The source of the dataset
* An article about the dataset or that uses the dataset
* At least one image related to or using the dataset
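
The file-type and size requirements above can be checked before you start. The sketch below is only an illustration: `check_dataset_files()` is a hypothetical helper (it is not part of the TidyTuesday tooling), and it assumes the dataset is the set of `.csv` files sitting directly in one folder.

```r
# Hypothetical helper, not part of the TidyTuesday tooling: check that a
# folder contains at least one .csv file and that the .csv files together
# are smaller than `max_mb` megabytes.
check_dataset_files <- function(dir, max_mb = 20) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  total_mb <- sum(file.info(files)$size) / 1024^2
  list(
    n_csv = length(files),
    total_mb = total_mb,
    ok = length(files) > 0 && total_mb < max_mb
  )
}

# Example with a throwaway folder:
example_dir <- file.path(tempdir(), "size_check_example")
dir.create(example_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(example_dir, "example.csv"), row.names = FALSE)
check_dataset_files(example_dir)$ok
```

The check only covers the first two requirements; whether the variables can be described and whether the data is free for reuse still need a human look.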

## Prepare your repository

To submit datasets, we use a fork/branch approach.
You're going to fork this repository, and then create a branch in your forked repository to submit the pull request.

1. Fork the tidytuesday repository (this one).
2. Create a new branch in that fork, with something similar to the name of the dataset you're submitting. For instance, if it's a dataset on American baseball, something like "american-baseball" or "baseball" works.
3. Do the next steps in this fork/branch.

## Prepare the dataset

These instructions are for preparing a dataset using the R programming language.
We hope to provide instructions for other programming languages soon.

1. Navigate to the `data/curated` folder in your branch of the repository.
2. Make a copy of the `template` folder for your dataset, inside the `curated` folder. Name it something descriptive, like "funspotr" or "ttmeta", not "my_dataset".
3. Navigate to the folder you just created. That's where you're going to do your work.
4. `cleaning.R`: Modify the `cleaning.R` file to get and clean the data.
    * Write the code to download and clean the data in `cleaning.R`.
    * If you're getting the data from a GitHub repo, remember to use the 'raw' version of the URL.
5. `saving.R`: Use `saving.R` to save your datasets. This creates the `.csv` file(s), and the data dictionary template file(s).
6. `{dataset}.md`: Edit the `{dataset}.md` files to describe your datasets. There should be one file for each dataset saved in step 5. Most likely you only need to fill in the "description" column with a description of each variable.
7. `intro.md`: Edit the `intro.md` file to describe your dataset.
8. Find at least one image for your dataset, and ideally two. These often come from the article about your dataset. Save the images in your folder as `png` files.
9. `meta.yaml`: Edit `meta.yaml` to provide information about your dataset.
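
For the 'raw' URL tip in step 4, here is a minimal sketch of turning a regular GitHub file URL into its raw equivalent. `to_raw_url()` is a hypothetical helper written for this illustration, and the user, repository, and file names in the example URL are placeholders, not a real dataset.

```r
# Hypothetical helper: rewrite a github.com "blob" URL to the matching
# raw.githubusercontent.com URL. URLs that do not match the pattern are
# returned unchanged.
to_raw_url <- function(url) {
  pattern <- "^https://github\\.com/([^/]+)/([^/]+)/blob/(.+)$"
  sub(pattern, "https://raw.githubusercontent.com/\\1/\\2/\\3", url)
}

# Placeholder URL for illustration only:
page_url <- "https://github.com/some-user/some-repo/blob/main/data.csv"
raw_url <- to_raw_url(page_url)

# The raw URL can then be read directly, for example:
# df <- read.csv(raw_url)
```

Reading the regular page URL would return the HTML of the GitHub page rather than the file contents, which is why the raw version is needed.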

### Submit your pull request with the data

1. Commit the changes in this folder to your branch.
2. Submit a pull request to https://github.com/rfordatascience/tidytuesday.
7 changes: 7 additions & 0 deletions data/curated/enmoma/intro.md
@@ -0,0 +1,7 @@
# English Monarchs and Marriages

This week we are exploring [English Monarchs and Marriages](https://github.com/frankiethull/english_monarch_marriages)!

> This dataset focuses on the names, ages, and marriages of various 'kings' and 'consorts'. The data ranges all the way back to 850, where the details are a bit fuzzy, and spans all the way to the current day. Names contain special characters; ages and years can be a bit of a regex puzzle as well. Additionally, kings and consorts may show quite a large age gap.

The data was scraped from [Ian Visits](https://www.ianvisits.co.uk/articles/a-list-of-monarchs-by-marriage-6857/).
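
As a starting point for the regex puzzle, here is a minimal sketch of one way to turn values such as `50(?)`, `?`, or `–` into numbers. The helpers `parse_fuzzy_number()` and `is_uncertain()` are hypothetical names made up for this example; they are not part of the dataset or of any package, and treating every value without digits as missing is an assumption you may want to revisit.

```r
# Hypothetical helper: pull the first run of digits out of a messy value.
# Values without any digits ("?", "–", "") become NA.
parse_fuzzy_number <- function(x) {
  x <- as.character(x)
  has_digits <- grepl("[0-9]", x)
  out <- rep(NA_integer_, length(x))
  out[has_digits] <- as.integer(sub("^[^0-9]*([0-9]+).*$", "\\1", x[has_digits]))
  out
}

# Hypothetical helper: flag values that the source marks with a question mark.
is_uncertain <- function(x) {
  grepl("?", as.character(x), fixed = TRUE)
}

# A few values in the style of the king_age and year_of_marriage columns:
ages <- c("50(?)", "12", "?", "")
parse_fuzzy_number(ages)
is_uncertain(ages)
```

Keeping the uncertainty flag in a separate column preserves the information carried by `(?)` instead of silently discarding it when the number is extracted.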
11 changes: 11 additions & 0 deletions data/curated/enmoma/meta.yaml
@@ -0,0 +1,11 @@
title: English Monarchs and Marriages | Names, Ages, and Historical Years
article:
  title: monarchs and marriages
  url: https://github.com/frankiethull/english_monarch_marriages
data_source:
  title: A list of Monarchs by marriage
  url: https://www.ianvisits.co.uk/articles/a-list-of-monarchs-by-marriage-6857/
images:
  # Please include at least one image, and up to three images
  - file: gg_enmoma.png
    alt: Chart showing the relationship between the year of marriage and the age of king/consort. The x-axis lists the years and the y-axis represents the age range. The chart indicates that the average age at marriage for a consort is much lower than the king's age.
9 changes: 9 additions & 0 deletions data/curated/enmoma/saving.R
@@ -0,0 +1,9 @@
source("data/curated/enmoma/cleaning.R")

# Path of the output CSV file (is this the right way to specify it?)
dir_name <- "data/curated/enmoma/english-monarchs-and-marriages.csv"

# Open question: how do I use ttsave()? I tried looking around tidytuesdayR:::
# ttsave(df, dir_name = dir_name)
readr::write_csv(df, dir_name)