---
title: "Supplementary Materials"
execute: 
  echo: false
  warning: false
knitr: 
  opts_chunk: 
    message: false
format: 
  html:
    format-links: false
    echo: true
    toc: true
    code-fold: true
    embed-resources: true
  pdf:
    toc: true
bibliography: "../paper/references.bib"
---

```{r}
#| label: setup
#| include: false
library(here)
library(readr)
devtools::load_all()

# upload data
# metadata <- read_tsv(here("analysis/data/raw_data/metadata.tsv"))
# demography <- read_csv(here("analysis/data/raw_data/demography.csv"))
# lloq <- read_tsv(here("analysis/data/raw_data/lloq.tsv"))
# uhplc_data_comb <- read_csv(here("analysis/data/derived_data/uhplc-data_combined.csv"))
# #uhplc_calculus <- read_csv(here("analysis/data/derived_data/uhplc-calculus_cleaned.csv"))
# dental_inv <- read_csv(here("analysis/data/raw_data/dental-inv.csv"))
# caries <- read_csv(here("analysis/data/raw_data/caries.csv"))
# periodont <- read_csv(here("analysis/data/raw_data/periodontitis.csv"))
# periap <- read_csv(here("analysis/data/raw_data/periapical.csv"))
# calculus <- read_csv(here("analysis/data/raw_data/calculus.csv"))
# calculus_full <- read_csv(here("analysis/data/raw_data/calculus_full.csv"))
# sinusitis_clean <- read_csv(here("analysis/data/derived_data/sinusitis_cleaned.csv"))
# path_cond_clean <- read_csv(here("analysis/data/derived_data/path-conditions_cleaned.csv"))

source(here("analysis/scripts/setup-qmd.R"))
```

These supplementary figures and tables are a variable hodgepodge of things that
didn't fit in the main manuscript, and additional things I thought might be
useful. The best way to explore/verify the results and interpretations is to
download all the data and code (<https://doi.org/10.5281/zenodo.7649824>)
and just play around with it yourself. Enjoy!

## Samples

Calculus samples are a combination of leftovers from a previous aDNA study and
newly sampled individuals. In some cases, individuals from the previous study were
sampled again because not enough calculus was left over (@tbl-sampling).

### Selection

```{r}
#| label: tbl-sampling
#| tbl-cap: "Table showing which individuals were sampled in this study and which individuals were sampled in the previous study. When both are TRUE, the individual was sampled twice."
metadata %>% 
  select(id, element, KZ_element) %>% 
  mutate(
    # TRUE if an element was sampled (i.e. the entry is not missing)
    element = !is.na(element),
    KZ_element = !is.na(KZ_element)
    ) %>%
  arrange(id) %>% 
  knitr::kable(col.names = c("ID", "this study", "previous study"))
```


### Demographics

```{r}
#| label: male-sample
males <- demography %>%
  filter(sex == "m" | sex == "pm")
```

The sample consists of `r nrow(demography)` individuals, most of whom are
middle adult males (@fig-age-distribution). Middle adult males were
preferentially targeted because, in our observation, they tend to have larger
calculus deposits; the age and sex distribution of the sample is therefore not
representative of the population. This targeting also served to limit potential
confounding factors, and reflects that pipe notches, which served as a positive
control for tobacco, are predominantly seen in male individuals at the site.

```{r}
#| label: fig-age-distribution
#| fig-cap: "Distribution of age and sex in the sample. f = female; pf = probable female; pm = probable male; m = male; eya = early young adult (18-24 years); lya = late young adult (25-34 years); ma = middle adult (35-49 years); old = old adult (50+ years)."
demography %>%
  ggplot(aes(x = age, fill = sex)) +
    geom_bar() +
    theme_minimal()
```

### Missing data

An overview of the missing teeth can be found in @fig-missing-teeth. Missing
scores per tooth can be found in @tbl-missing-scores and @fig-missing-dental.

```{r}
#| label: fig-missing-teeth
#| fig-cap: "Heatmap of missing teeth per individual in the sample. 1 = present, 0 = missing."
dental_inv_long %>% 
  mutate(status = if_else(status == "p", 1, 0)) %>% 
  ggplot(aes(x = tooth, y = id, fill = status)) +
    geom_tile()
```


```{r}
#| label: missing-values-teeth
caries_missing <- caries %>%
  mutate(across(!id, ~ is.na(.x))) %>%
  select(!id) %>%
  colSums() %>% 
  as_tibble_row()

periodont_missing <- periodont %>%
  mutate(across(!id, ~ is.na(.x))) %>%
  select(!id) %>%
  colSums() %>% 
  as_tibble_row()

periap_missing <- periap %>%
  mutate(across(!id, ~ is.na(.x))) %>%
  select(!id) %>%
  colSums() %>% 
  as_tibble_row()

missing_tbl <- tibble(caries_missing) %>% 
  add_case(periodont_missing) %>% 
  add_case(periap_missing) %>% 
  mutate(score = c("caries", "periodontitis", "periapical"), .before = 1)
```

```{r}
#| label: tbl-missing-scores
#| tbl-cap: "Table of missing scores by tooth."
knitr::kable(missing_tbl)
```

```{r}
#| label: fig-missing-dental
#| fig-cap: "Missing scores per tooth and individual."
#| layout-ncol: 2
#| fig-subcap: 
#|   - "Caries"
#|   - "Periodontitis"
#|   - "Periapical lesions"
#|   - "Combined caries, periodontitis, periapical."
caries %>%
  mutate(across(!id, ~ is.na(.x))) %>% 
  pivot_longer(-id, names_to = "tooth") %>% 
  mutate(value = if_else(value == TRUE, "missing", "present")) %>% 
  ggplot(aes(x = tooth, y = id, fill = value)) +
    geom_tile()

periodont %>%
  mutate(across(!id, ~ is.na(.x))) %>% 
  pivot_longer(-id, names_to = "tooth") %>% 
  mutate(value = if_else(value == TRUE, "missing", "present")) %>% 
  ggplot(aes(x = tooth, y = id, fill = value)) +
    geom_tile()

periap %>% 
  mutate(across(!id, ~ is.na(.x))) %>% 
  pivot_longer(-id, names_to = "tooth") %>% 
  mutate(value = if_else(value == TRUE, "missing", "present")) %>% 
  ggplot(aes(x = tooth, y = id, fill = value)) +
    geom_tile()

dental_long %>%
  mutate(across(c(caries, periodont, periap), is.na)) %>%
  select(c(caries, periodont, periap, id, tooth)) %>%
  #group_by(id, tooth) %>% 
  rowwise() %>% 
  mutate(missing = sum(caries, periodont, periap)) %>% 
  ggplot(aes(x = tooth, y = id, fill = missing)) +
    geom_tile()
```


## UHPLC analysis

```{r}
#| label: setup-uhplc

# presence/absence data frame
uhplc_calculus_bin <- uhplc_calculus_long %>%
  mutate(presence = if_else(quant > 0, 1, quant))

# successfully replicated samples only
uhplc_calculus_replicated <- uhplc_calculus_bin %>%
  mutate(compound = str_remove(compound, "_calc")) %>%
  group_by(id, sample, compound) %>% # combine batches
  summarise(presence = sum(presence)) #%>%


uhplc_calculus_replicated <- uhplc_calculus_bin %>%
  filter(id %in% filter(metadata, replicated == TRUE)$id) %>%
  group_by(id, sample, compound) %>% # combine batches
  summarise(presence = sum(presence)) %>%
  filter(presence == 0 | presence == 2) %>% # remove compounds only detected in one batch
  group_by(compound) %>%
  mutate(presence = if_else(presence == 0, 0, 1)) %>%  # convert replications to presence/absence
  ungroup()

uhplc_replicated_wide <- uhplc_calculus_replicated %>%
  mutate(compound = case_when(compound == "nicotine" ~ "tobacco",
                              compound == "cotinine" ~ "tobacco",
                              TRUE ~ compound)) %>%
  group_by(id, sample, compound) %>%
  summarise(presence = sum(presence)) %>% # combine nicotine and cotinine
  #remove_missing() %>%
  mutate(presence = case_when(presence > 0 ~ TRUE,
                              TRUE ~ FALSE)) %>%
  pivot_wider(names_from = "compound", values_from = "presence")
```


The UHPLC-MS/MS method was validated in a separate study on cadavers received
for forensic autopsy and toxicological analysis. Results from dental calculus were
validated against compounds detected in whole blood samples from the same
individuals [@sorensenDrugsCalculus2021].  
In the original method, samples were washed three times to remove residual
substances originating from oral fluids from the surface of the calculus, so that
only substances from within the calculus were extracted. In our samples the washes
served to remove potential contaminants from the burial environment and
post-excavation handling.

Briefly, dental calculus was treated with citric acid and the dissolution extracts
were cleaned using weak and strong polymeric cation-exchange sorbents. Samples
were washed with 0.5 mL MeOH for 10 seconds and weighed before and after
each wash. The wash solvent was evaporated to a residual volume of 10 µL, and
50 µL of 30% methanol was added.
Samples were air-dried for 24 hours at room temperature after each wash.
Extracts from each wash were analysed by injecting 5 µL into the column
on an Exion UHPLC system that consisted of two Exion AD pumps, an Exion AD
multiplate autosampler set at 10 $\pm$ 2 &deg;C and an Exion AC column oven set
at 40 $\pm$ 2 &deg;C (Sciex, Ontario, Canada).
Separation was performed using a Raptor Biphenyl UHPLC column (2.7 µm, 2.1 mm I.D.
$\times$ 100 mm) (Restek, Bellefonte, PA). The mass spectrometer was a Sciex QTRAP
6500+ with a TurboIonSpray probe for electrospray ionisation.  
The remaining calculus was dissolved using lysing tube beads in 800 $\mu$L of 0.5
$\small{M}$ citric acid (CA) and 50 $\mu$L of a solution of stable isotope-labelled
analogues used as internal standards (SIL-IS) for 1 h at ambient
temperature with gentle shaking. The suspension was then mixed with 800 $\mu$L
MeOH and centrifuged at 10,000 $\times$ g for 5 min, and analysed by the same
method as the wash extracts.
Data analysis was performed using Analyst 1.7 and MultiQuant 3.0.3 (Sciex).
Raw quantities of compounds are presented in ng and concentrations as ng / mg.
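
As a quick illustration of the two units, the sketch below converts raw
quantities (ng) to concentrations (ng/mg) by dividing by the sample weight. It
assumes that concentration is simply quantity normalised by the weight of the
calculus sample; the data frame and its column names (`quant_ng`, `weight_mg`,
`conc_ng_mg`) are made up for the example and are not the study data.

```r
# Hypothetical example values; not taken from the study data.
example <- data.frame(
  sample    = c("s1", "s2", "s3"),
  quant_ng  = c(1.2, 0.0, 4.5),  # raw quantity of a compound (ng)
  weight_mg = c(4.0, 2.0, 9.0)   # weight of the calculus sample (mg)
)

# Assumption: concentration = quantity / sample weight (ng/mg).
example$conc_ng_mg <- example$quant_ng / example$weight_mg
example
```

A compound that was not detected (quantity of 0) keeps a concentration of 0
regardless of sample weight.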

The samples in the replication batch were processed in the same way, but
analysed on different equipment used exclusively for oral samples.

Target compounds and their lower limits of quantitation are listed in @tbl-lloq.
Raw quantities of compounds detected in the dissolved calculus from batches
1 and 2 are presented in @tbl-uhplc-batch-1 and @tbl-uhplc-batch-2. Since
these tables may not be legible in PDF format, and don't adhere to FAIR
principles in this format, the raw data can also be downloaded
from Zenodo (<https://doi.org/10.5281/zenodo.8061483>).

```{r}
#| label: tbl-lloq
#| tbl-cap: "Target compounds and lower limits of quantitation (LLOQ)."
lloq %>%
  arrange(compound) %>% 
  knitr::kable(col.names = c("Target", "LLOQ"))
```

```{r}
#| label: tbl-uhplc-batch-1
#| tbl-cap: "Results from the first batch of the UHPLC analysis. Quantity of each compound in the dissolved calculus, given in ng and rounded to three decimal places."
uhplc_calculus_long %>%
  filter(batch == "batch1") %>%
  mutate(quant = round(quant, 3)) %>% 
  select(!c(sample, conc, extraction, batch, presence)) %>%
  pivot_wider(names_from = "compound", values_from = "quant") %>% 
  knitr::kable()
```

```{r}
#| label: tbl-uhplc-batch-2
#| tbl-cap: "Results from the second batch of the UHPLC analysis. Quantity of each compound in the dissolved calculus (after the third wash), given in ng and rounded to three decimal places."
uhplc_calculus_long %>%
  filter(batch == "batch2") %>%
  mutate(quant = round(quant, 3)) %>% 
  pivot_wider(names_from = "compound", values_from = "quant") %>%
  select(!c(extraction, batch)) %>% 
  knitr::kable()
```

### Authentication

No modern synthetic drugs were detected in any of the samples.

Samples were replicated to verify results from the initial analysis. Of the
`r nrow(demography)` samples initially analysed,
`r nrow(filter(metadata, replicated == TRUE))` samples were replicated.

Only caffeine, theophylline, nicotine, cotinine, and salicylic acid were found
in the replicated samples.
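
The idea behind treating a detection as replicated can be sketched as follows.
This is a minimal base R example on made-up data (the `detections` data frame
is hypothetical), assuming that a result only counts when both batches agree:
detected in both batches, or detected in neither.

```r
# Hypothetical presence/absence data: one row per individual, compound, and batch.
detections <- data.frame(
  id       = rep(c("A", "B", "C"), each = 2),
  compound = "nicotine",
  batch    = rep(c("batch1", "batch2"), times = 3),
  presence = c(1, 1, 1, 0, 0, 0)
)

# Sum presence over batches:
# 2 = detected in both, 1 = detected in one only, 0 = detected in neither.
n_detected <- aggregate(presence ~ id + compound, data = detections, FUN = sum)

# Keep only concordant results and convert them back to presence/absence.
concordant <- n_detected[n_detected$presence %in% c(0, 2), ]
concordant$presence <- as.integer(concordant$presence == 2)
concordant
```

Individual B, with a detection in only one batch, is dropped rather than being
counted as either present or absent.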

Most plots show a large increase in the extracted mass of a compound between the
calculus wash extracts (wash 1-3) and the dissolved calculus (calc)
(@fig-auth-plot-batch1 and @fig-auth-plot-batch2). In most samples
containing theophylline and caffeine, the quantity extracted was largest in the
first wash and decreased in washes 2 and 3. There is
an increase between wash 3 and the dissolved calculus in all samples.
The patterns are consistent across batches 1 and 2. The pattern we expect to see
in a sample with an authentic ('ancient') compound is a reduction in quantity from
wash 1 to wash 3, followed by a spike in the final extraction from the dissolved
calculus. Compounds that are completely absent in
all three washes and present in high quantities in the final extraction may instead
be suggestive of lab contamination.
This has not been thoroughly tested and is
only based on what we expect to see. The interpretation of these graphs
is therefore itself open to interpretation.
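
The expected pattern can be written down as a simple rule. The helper below,
`classify_profile()`, is a hypothetical sketch and not part of the analysis
code; it only encodes the untested expectations described above, and it does not
attempt to define what counts as a 'high' quantity in the final extraction.

```r
# Hypothetical helper: classify one sample-compound profile.
# Inputs are extracted quantities (ng) in washes 1-3 and in the dissolved calculus.
classify_profile <- function(wash1, wash2, wash3, calc) {
  washes <- c(wash1, wash2, wash3)
  if (calc <= 0) {
    "not detected in calculus"
  } else if (all(washes == 0)) {
    # nothing in any wash, but present in the final extraction
    "absent from washes (possible lab contamination)"
  } else if (wash1 >= wash2 && wash2 >= wash3 && calc > wash3) {
    # declining washes followed by an increase in the dissolved calculus
    "expected pattern"
  } else {
    "ambiguous"
  }
}

declining <- classify_profile(wash1 = 0.8, wash2 = 0.3, wash3 = 0.1, calc = 2.4)
no_washes <- classify_profile(wash1 = 0, wash2 = 0, wash3 = 0, calc = 5.0)
c(declining, no_washes)
```

Any profile that fits neither description is labelled ambiguous, which is
probably the honest answer for many of the lines in the plots.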

```{r}
#| label: fig-auth-plot-batch1
#| fig-cap: "Plot of extracted quantities of each compound across the three washes and calculus extraction in batch 1. Each line represents an individual."
uhplc_data_long %>%
  filter(
    batch == "batch1",
    ) %>% 
  semi_join(quant_filter, by = c("sample", "compound")) %>% 
  mutate(
    extraction = factor(extraction, levels = c("wash1", "wash2", "wash3", "calc")),
    sample = as.factor(sample)
    ) %>% 
  ggplot(aes(x = extraction, y = quant, group = sample, colour = sample)) +
    geom_line() +
    geom_point(size = 0.2) +
    facet_wrap(~ compound, scales = "free_y", ncol = 3) +
    theme_bw()
```

```{r}
#| label: fig-auth-plot-batch2
#| fig-cap: "Plot of extracted quantities of each compound across the three washes and calculus extraction in batch 2. Each line represents an individual."
uhplc_data_long %>%
  filter(
    batch == "batch2",
    !compound %in% c("cbd", "cbn", "cocaine", "thc", "thca-a", "thcva") # remove compounds not detected in batch 2
    ) %>% 
  semi_join(quant_filter, by = c("sample", "compound")) %>% # remove compounds not detected in each sample 
  mutate(
    extraction = factor(extraction, levels = c("wash1", "wash2", "wash3", "calc")),
    sample = as.factor(sample)
    ) %>% 
  ggplot(aes(x = extraction, y = quant, group = sample, colour = sample)) +
    geom_line() +
    geom_point(size = 0.2) +
    facet_wrap(~ compound, scales = "free_y", ncol = 2) +
    theme_bw()
```

### Quantity vs. sample weight

There is no clear relationship between sample weight and the amount of compound
detected, except for salicylic acid, where the amount of extracted compound increases
with increasing sample weight. In batch 2 there is also a slight positive trend
for caffeine, nicotine, and cotinine.
Nicotine and cotinine display the same relative relationship between samples: where
the nicotine quantity is high compared to other samples, the cotinine quantity
is similarly high (@fig-quant-weight-1 and @fig-quant-weight-2).

The positive trends between the weight of the calculus
sample and the recovered quantities of these compounds suggest that sample weight
may affect the ability to detect compounds, although we were able to detect
compounds in samples as small as 2 mg (@fig-quant-weight-1 and @fig-quant-weight-2).
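
The trends above were assessed visually. For readers who want a number, one
option (not something done in this study) is a rank correlation per compound.
The sketch below uses made-up values in a hypothetical `toy` data frame and
Spearman's rho from base R.

```r
# Hypothetical values; not the study data.
toy <- data.frame(
  compound  = rep(c("salicylic acid", "caffeine"), each = 5),
  weight_mg = rep(c(2, 4, 6, 9, 12), times = 2),
  quant_ng  = c(0.5, 0.9, 1.4, 1.3, 2.8,  # mostly increases with weight
                1.1, 0.2, 0.9, 0.4, 0.7)  # no clear trend
)

# Spearman's rank correlation between sample weight and quantity, per compound.
rho <- sapply(split(toy, toy$compound), function(d) {
  cor(d$weight_mg, d$quant_ng, method = "spearman")
})
rho
```

With so few detections per compound, any such coefficient should be read as a
description of the plot rather than as a test.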

```{r}
#| label: fig-quant-weight-1
#| fig-cap: "Quantity of a compound (ng) found in a sample plotted against the weight of the calculus sample. Results from batch 1."
uhplc_data_comb %>%
  select(sample, contains("batch1")) %>% 
  select(sample, batch1_weight, contains("calc")) %>% 
  pivot_longer(
    -c(sample, batch1_weight),
    names_to = c("compound", "batch"),
    names_pattern = "(.*)_(.*)",
    values_to = c("conc")
  ) %>% 
  filter(conc > 0) %>% 
  mutate(compound = str_remove(compound, "_calc")) %>%
  ggplot(aes(x = batch1_weight, y = conc, col = as.factor(sample))) +
    geom_point() +
    facet_wrap(~ compound, scales = "free_y") +
    theme_bw() +
    theme(legend.position = "none") +
    labs(x = "Calculus weight (mg)", y = "Quantity (ng)")
```

```{r}
#| label: fig-quant-weight-2
#| fig-cap: "Quantity of a compound (ng) found in a sample plotted against the weight of the calculus sample. Results from batch 2."
uhplc_data_comb %>%
  select(sample, contains("batch2")) %>% 
  select(sample, batch2_weight, contains("calc")) %>% 
  pivot_longer(
    -c(sample, batch2_weight),
    names_to = c("compound", "batch"),
    names_pattern = "(.*)_(.*)",
    values_to = c("conc")
  ) %>% 
  mutate(compound = str_remove(compound, "_calc")) %>%
  #remove_missing() %>% 
  filter(conc > 0) %>% 
  ggplot(aes(x = batch2_weight, y = conc, col = as.factor(sample))) +
    geom_point() +
    facet_wrap(~ compound, scales = "free_y") +
    theme_bw() +
    labs(y = "Quantity (ng)", x = "Calculus weight (mg)", col = "Sample number")
```


### Distribution of compounds detected in the samples

<!-- absolute counts in each batch -->

```{r}
#| label: fig-compounds-detect
#| fig-cap: "Number of individuals in which each compound was detected in batches 1 and 2."

uhplc_calculus_long %>%
  filter(quant > 0) %>% 
  ggplot(aes(x = compound, fill = compound)) +
    geom_bar() +
    facet_wrap(~ batch) +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
      legend.position = "none"
      )
```

The replication showed that caffeine, theophylline, cotinine, nicotine, and
salicylic acid could be consistently detected in the samples, although theophylline
detection decreased between batches 1 and 2. CBD, CBN, cocaine, and THCA-A were not
detected at all in the second batch (@fig-compounds-detect).

<!-- absolute counts in the replicated individuals -->

```{r}
#| label: fig-compounds-detect2
#| fig-cap: "Number of individuals in which each compound was detected in batches 1 and 2. Only replicated individuals are shown."

uhplc_calculus_long %>%
  filter(id %in% filter(metadata, replicated == T)$id,
         quant > 0) %>%
  ggplot(aes(x = compound, fill = compound)) +
    geom_bar() +
    facet_wrap(~ batch) +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
      legend.position = "none"
      )
```

### Detection and preservation

To see if preservation of the skeletal remains had any effect on the detection of
compounds, absolute quantities of compounds were compared across the levels of
preservation (@fig-detection-preservation and @fig-detection-preservation2).

```{r}
#| label: fig-detection-preservation
#| fig-cap: "Plot of relationship between the absolute quantity of a detected compound (ng) and the overall skeletal preservation of the individuals in which the compound was detected. Showing results for batch 1."
uhplc_calculus_long %>%
  filter(
    !is.na(preservation),
    batch == "batch1",
    quant > 0,
    ) %>% 
  ggplot(aes(x = preservation, y = quant)) +
    geom_violin(aes(fill = preservation), alpha = 0.6) +
    geom_boxplot(width = 0.2) +
    facet_wrap(~ compound, scales = "free_y", labeller = labeller(compound = compound_names), dir = "v") +
    theme_bw() +
    theme(
      legend.position = "none",
      panel.grid.major.x = element_blank(),
      axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
      ) +
    labs(x = "Preservation", y = "Quantity (ng)") +
    scale_fill_viridis_d()
```

```{r}
#| label: fig-detection-preservation2
#| fig-cap: "Plot of relationship between the absolute quantity of a detected compound (ng) and the overall skeletal preservation of the individuals in which the compound was detected. Showing results for batch 2."
uhplc_calculus_long %>%
  filter(
    !is.na(preservation),
    quant > 0,
    batch == "batch2",
    #!compound %in% c("cbd", "cbn", "cocaine", "thc", "thca-a", "thcva"), # remove compounds not detected
    ) %>% 
  ggplot(aes(x = preservation, y = quant)) +
    geom_violin(aes(fill = preservation), alpha = 0.6) +
    geom_boxplot(width = 0.2) +
    facet_wrap(~ compound, scales = "free_y", labeller = labeller(compound = compound_names), dir = "v") +
    theme_bw() +
    theme(
      legend.position = "none",
      panel.grid.major.x = element_blank(),
      axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)
      ) +
    labs(x = "Preservation", y = "Quantity (ng)") +
    scale_fill_viridis_d()


```

We also looked at the distribution of preservation states in batches 1 and 2 to
check whether the number of skeletons in each state affects the relationships
shown above. Because our sample contains few individuals with fair preservation,
our interpretations may be biased (@fig-preservation-detection).

```{r}
#| label: fig-preservation-detection
#| fig-cap: "Plot of the number of skeletons in each state of preservation separated by batch."
uhplc_calculus_long %>% 
  ungroup() %>% 
  distinct(id, batch, .keep_all = TRUE) %>%
  remove_missing(vars = c("preservation", "weight")) %>% 
  ggplot(aes(x = preservation)) +
    geom_bar() +
    facet_wrap(~ batch) +
    theme_bw()
```


### Detection of tobacco

Given that pipe notches are present in the majority of individuals, the presence
of pipe notch(es) in an individual and concurrent detection of nicotine and/or
cotinine is used as a rough indicator of the accuracy of the method.

```{r}
#| label: fig-corr-notches
#| fig-cap: "Plot of relationships between the total number of pipe notches in an individual and the concentration of detected compounds. The only relevant comparisons in this case are nicotine and cotinine. The others are just included because I couldn't be bothered filtering them out."
# correlation between nicotine and cotinine conc, and pipe notches
uhplc_calculus_long %>%
  filter(compound %in% c("caffeine", "cotinine", "nicotine", "salicyl", "theophyl")) %>% 
  ggplot(aes(x = pipe_notch, y = conc)) +
    geom_point() +
    facet_wrap(~ compound + batch, scales = "free_y") +
    theme_bw() +
    labs(x= "number of pipe notches", y = "detected compound concentration (ng/mg)")
```


```{r}
#| label: tobacco-accuracy-setup
tobacco <- uhplc_calculus_long %>%
  filter(compound %in% c("nicotine", "cotinine")) %>% 
  mutate(detection = if_else(is.na(conc), FALSE, TRUE))

tobacco <- uhplc_calculus_long %>%
  filter(
    #batch == "batch2",
    compound %in% c("nicotine", "cotinine"),
    sex == "m" | sex == "pm"
    ) %>% 
  mutate(detection = if_else(quant == 0, 0, 1)) # 0 = not detected; NA = not included in batch 2.

tobacco_accuracy <- tobacco %>%
  remove_missing(vars = "quant") %>%
  group_by(sample, id, .drop = F) %>% 
  summarise(
    detection = sum(detection),
    ) %>% 
  left_join(select(demography, id, pipe_notch, preservation, age), by = "id") %>% 
  mutate(
    pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
    correct = case_when(
      detection == 0 & pipe_notch == "N" ~ 1,
      detection > 0 & pipe_notch == "Y" ~ 1,
      detection > 0 & pipe_notch == "N" ~ NaN, # no way of knowing macroscopically if the person smoked without pipe
      TRUE ~ 0
    )
  )


accuracy_age <- tobacco_accuracy %>% # move to supplementary material
  group_by(age) %>% 
  summarise(mean = mean(correct, na.rm = T))


tobacco_comb <- tobacco %>% 
  group_by(batch, sample, id, .add = TRUE) %>%
  summarise(
    detection = sum(detection), 
    .groups = "keep" # combine compounds: 0 = none detected; 1 = 1 detected; 2 = both detected
  ) %>%
  ungroup() %>%
  left_join(select(demography, id, pipe_notch), by = "id") %>%
  mutate(
    pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
    correct = case_when(
      detection > 0 & pipe_notch == "Y" ~ TRUE,
      detection == 0 & pipe_notch == "N" ~ TRUE,
      detection > 0 & pipe_notch == "N" ~ NA, # can't be sure if this is true or not
      TRUE ~ FALSE
      )
    )

nicotine <- tobacco %>%
  filter(compound == "nicotine") %>% 
  mutate(
    pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
    correct = case_when(
      detection > 0 & pipe_notch == "Y" ~ TRUE,
      detection == 0 & pipe_notch == "N" ~ TRUE,
      detection > 0 & pipe_notch == "N" ~ NA, # can't be sure if this is true or not
      TRUE ~ FALSE
    )
  )

cotinine <- tobacco %>%
  filter(compound == "cotinine") %>% 
  mutate(
    pipe_notch = if_else(pipe_notch > 0, "Y", "N"), # recode as for nicotine so the comparisons below work
    correct = case_when(
      detection > 0 & pipe_notch == "Y" ~ TRUE,
      detection == 0 & pipe_notch == "N" ~ TRUE,
      detection > 0 & pipe_notch == "N" ~ NA, # can't be sure if this is true or not
      TRUE ~ FALSE
    )
  )

# accuracy in replicated samples
uhplc_accuracy_batch2 <- uhplc_calculus_long %>%
  filter(compound == "nicotine" | compound == "cotinine",
         batch == "batch2") %>%
  group_by(id) %>% 
  arrange(desc(presence), .by_group = T) %>%
  distinct(id, .keep_all = T) %>% 
  mutate(
    pipe_notch = if_else(pipe_notch > 0, "Y", "N"),
    correct = case_when(presence == 1 & pipe_notch == "Y" ~ TRUE,
                             presence == 0 & pipe_notch == "N" ~ TRUE, # can we be sure if this is correct?
                             presence == 1 & pipe_notch == "N" ~ NA, # can't be sure if this is true or not
                             TRUE ~ FALSE))

# ratio of nicotine to cotinine
tobacco_ratio <- tobacco %>%
  select(id, batch, compound, conc) %>% 
  pivot_wider(names_from = compound, values_from = conc) %>% 
  group_by(id, batch) %>% 
  summarise(ratio = cotinine / nicotine) %>%
  remove_missing() %>% 
  left_join(select(demography, id, age, sex, preservation, pipe_notch)) %>% 
  filter(batch == "batch2",
         ratio != Inf) %>% 
  mutate(
    age = case_when(
      age == "eya" ~ 0,
      age == "lya" ~ 1,
      age == "ma" ~ 2,
      age == "old" ~ 3
    ),
    preservation = case_when(
      preservation == "fair" ~ 0,
      preservation == "good" ~ 1,
      preservation == "excellent" ~ 2
    )
  )
```

We found no apparent correlation between the number of pipe notches
and the concentration of nicotine or cotinine (@fig-corr-notches). This suggests
that our ability to detect tobacco consumption in dental calculus does not
necessarily rely on targeting frequent smokers; here, we consider individuals with
multiple pipe notches as likely to have been heavy smokers.

When combining the results of both batches, the method was able to detect some form
of tobacco in
`r nrow(filter(tobacco_accuracy, pipe_notch == "Y", detection > 0))`
of `r nrow(filter(tobacco_accuracy, pipe_notch == "Y"))`
individuals with a pipe notch
(`r scales::percent(nrow(filter(tobacco_accuracy, pipe_notch == "Y", detection > 0)) / nrow(filter(tobacco_accuracy, pipe_notch == "Y")), accuracy = 0.1)`).
When the absence of a tobacco alkaloid together with the absence of a pipe notch
is also counted as correct, the accuracy of the method is
`r scales::percent(mean(tobacco_accuracy$correct, na.rm = T), accuracy = 0.1)`.
Accuracy in the old adult age category is
`r scales::percent(filter(accuracy_age, age == "old")$mean, accuracy = 0.1)`.
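
To make the two percentages above concrete, here is a small sketch of the same
logic in base R on made-up data. The `toy` data frame is hypothetical; as in the
analysis code, a detection without a pipe notch is set to `NA` because there is
no macroscopic way of knowing whether that person used tobacco.

```r
# Hypothetical individuals; not the study data.
toy <- data.frame(
  id         = c("A", "B", "C", "D", "E"),
  pipe_notch = c(2, 1, 0, 0, 3),                      # number of pipe notches
  detected   = c(TRUE, FALSE, FALSE, TRUE, TRUE)      # nicotine and/or cotinine detected
)
has_notch <- toy$pipe_notch > 0

# Proportion of individuals with a pipe notch in whom tobacco was detected.
detected_with_notch <- sum(toy$detected & has_notch) / sum(has_notch)

# Agreement between detection and pipe notch.
# Detection without a pipe notch cannot be verified, so it is excluded (NA).
toy$correct <- ifelse(toy$detected & !has_notch, NA, toy$detected == has_notch)
accuracy <- mean(toy$correct, na.rm = TRUE)

c(detected_with_notch = detected_with_notch, accuracy = accuracy)
```

Excluding the unverifiable cases means the accuracy is calculated on fewer
individuals than the full sample.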

In the replicated samples only, tobacco detection was successful in
`r nrow(filter(uhplc_accuracy_batch2, pipe_notch == "Y", presence > 0))`
out of
`r nrow(filter(uhplc_accuracy_batch2, pipe_notch == "Y"))`
pipe smokers
(`r scales::percent(nrow(filter(uhplc_accuracy_batch2, pipe_notch == "Y", presence > 0)) / nrow(filter(uhplc_accuracy_batch2, pipe_notch == "Y")), accuracy = 0.1)`).
Counting the absence of a pipe notch with concurrent absence of
compounds as a correct identification gives an overall accuracy of
`r scales::percent(mean(uhplc_accuracy_batch2$correct, na.rm = T), accuracy = 0.1)`.

One individual---an old adult, probable female---was positive
for both nicotine and cotinine, and had no signs of a pipe notch.

## Dental analysis

Pipe notches were identified by wear on the mesial and distal
sides of the crowns between two teeth, resulting from the practice of clenching a
pipe between adjacent and isomeric teeth, which differs from the occlusal
wear that occurs through mastication. Wear occurring between adjacent and isomeric
teeth was counted as a single pipe notch.

Some of the teeth were missing because they had been sent elsewhere for DNA
sampling. These teeth were considered present when determining antemortem loss
ratios, and absent when scoring caries, periodontitis, and calculus.

An overview of available teeth can be seen in @fig-dental-inv.

```{r}
#| label: fig-dental-inv
#| fig-cap: "Overview of the dental inventory of the sample. Teeth removed for DNA analysis considered 'present'."
dental_long %>%
  mutate(status = case_when(status == "dna" ~ "p", TRUE ~ status)) %>% 
  dental_plot(fill = status)
```

### AMTL

Ratios of antemortem lost teeth per present teeth at the site were calculated per
individual (@tbl-aml-id), tooth class (@tbl-aml-class), and tooth type (@tbl-aml-type).
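
For orientation, the sketch below shows one way such a ratio could be computed
for a single individual. It is not the `amtl_ratio()` function used in the
chunks below, whose exact definition may differ. The status codes `"amtl"` and
`"pmtl"` are hypothetical, and the denominator (present teeth plus teeth lost
antemortem) is an assumption. Teeth removed for DNA sampling are counted as
present, as described above.

```r
# Hypothetical tooth statuses for one individual:
# "p" = present, "dna" = removed for DNA sampling,
# "amtl" = lost antemortem, "pmtl" = lost postmortem (codes assumed).
status <- c("p", "p", "amtl", "dna", "pmtl", "p", "amtl", "p")

# Teeth removed for DNA sampling are treated as present.
status[status == "dna"] <- "p"

n_amtl    <- sum(status == "amtl")
n_present <- sum(status == "p")

# Assumed definition: antemortem losses per observable tooth position.
amtl <- n_amtl / (n_amtl + n_present)
amtl
```

Teeth lost postmortem are left out of both the numerator and the denominator
here, since their antemortem status is unknown.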

```{r}
#| label: tbl-aml-id
#| tbl-cap: "AMTL ratio per individual."
dental_long %>% 
  group_by(id) %>% 
  amtl_ratio(.status = status, .add = T) %>% 
  knitr::kable()
```

```{r}
#| label: tbl-aml-class
#| tbl-cap: "AMTL ratio per tooth class."
dental_long %>% 
  group_by(class) %>% 
  amtl_ratio(.status = status, .add = T) %>% 
  knitr::kable()
```

```{r}
#| label: tbl-aml-type
#| tbl-cap: "AMTL ratio per tooth type."
dental_long %>% 
  group_by(type) %>% 
  amtl_ratio(.status = status, .add = T) %>% 
  knitr::kable()
```

### Caries

Caries were scored by their location on each individual tooth. Multiple locations
on a single tooth were separated with `;`. The size of caries was also
recorded, but not used in further analysis. Large caries covering multiple
surfaces, where the surface of origin was unknown, were recorded as 'crown'.

| code | surface |
|---|---|
| mes | mesial surface |
| dis | distal surface |
| lin | lingual surface |
| buc | buccal surface (including labial surface) |
| occ | occlusal surface (including incisal surface) |
| crown | caries covers 2+ surfaces |
| none | No caries visible on surface |
| NA | Not observable/tooth missing |

In the `r caries_ratio_site$n_teeth` teeth that were examined, `r caries_ratio_site$count`
teeth had caries
(`r scales::percent(caries_ratio_site$ratio, accuracy = 0.1)`).
This pooled frequency has very little meaning on its own, and was further broken
down into a ratio for each individual and each tooth class (@tbl-caries-id-class
and @fig-caries-class). As expected, the molars have a higher frequency of caries
than the other teeth. The caries rate per tooth in the pooled sample is shown in
@fig-caries-fun.
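
The scoring scheme and the ratio can be illustrated with a few made-up scores.
This is a minimal base R sketch, not the package code: the `scores` vector is
hypothetical, and it assumes the ratio is the number of teeth with at least one
carious location divided by the number of teeth that could be scored.

```r
# Hypothetical caries scores for four teeth of one individual (FDI notation).
scores <- c(t16 = "occ;mes", t17 = "none", t26 = "crown", t27 = NA)

# Multiple locations on one tooth are separated with ";".
locations <- strsplit(scores, ";", fixed = TRUE)

# Number of carious locations per tooth: "none" counts as 0, NA stays NA.
n_lesions <- sapply(locations, function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x != "none")
})

# Assumed ratio: teeth with caries / teeth that could be scored.
ratio <- sum(n_lesions > 0, na.rm = TRUE) / sum(!is.na(n_lesions))

n_lesions
ratio
```

Teeth scored as `NA` are dropped from the denominator, so a missing tooth is
never counted as caries-free.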

```{r}
#| label: tbl-caries-id-class
#| tbl-cap: "Table of caries ratios per individual per tooth class."
caries_count %>%
  caries_ratio(.caries = count, id, class) %>% 
  knitr::kable()
```


```{r}
#| label: fig-caries-class
#| fig-cap: "Plot of caries ratios calculated per individual per tooth class."
caries_count %>%
  caries_ratio(.caries = count, class, id) %>%
  ggplot(aes(x = class, y = ratio)) +
    geom_violin(aes(fill = class)) +
    geom_boxplot(width = 0.1) +
    theme_minimal() +
    theme(legend.position = "none") +
    labs(x = "tooth class", y = "caries ratio")
```


```{r}
#| label: fig-caries-fun
#| fig-cap: "Plot of caries rate per tooth in pooled sample from all individuals. Teeth reordered along the x-axis to match position in the mouth (yes, the plot is supposed to resemble a mouth)."
upper_order <- c(paste0("t", 18:11), paste0("t", 21:28))
lower_order <- c(paste0("t", 48:41), paste0("t", 31:38))

maxilla <- caries_count %>%
  filter(region == "maxilla") %>% 
  mutate(tooth = factor(tooth, levels = c(upper_order, lower_order))) %>% 
  group_by(tooth) %>% 
  summarise(
    n_teeth = n(),
    count = sum(count, na.rm = T),
    rate = count / n_teeth
    ) %>% 
  ggplot(aes(x = tooth, y = rate)) +
    geom_col(fill = "white") +
    theme_dark() +
    scale_y_reverse(limits = c(0.33,0), sec.axis = sec_axis(~.)) +
    scale_x_discrete(position = "top") +
    theme(
      axis.title.x = element_blank(),
      axis.ticks.x = element_blank(),
      axis.line = element_line(colour = "red", size = 1),
      axis.line.x.top = element_line(colour = "red", size = 4)
      ) +
  labs(y = "")

mandible <- caries_count %>%
  filter(region == "mandible") %>% 
  mutate(tooth = factor(tooth, levels = c(upper_order, lower_order))) %>% 
  group_by(tooth) %>% 
  summarise(
    n_teeth = n(),
    count = sum(count, na.rm = T),
    rate = count / n_teeth
    ) %>% 
  ggplot(aes(x = tooth, y = rate)) +
    geom_col(fill = "white") +
    scale_y_continuous(sec.axis = sec_axis(~.)) +
    theme_dark() +
    theme(axis.line = element_line(colour = "red", size = 1),
          axis.line.y.right = element_line(colour = "red", size = 1),
          axis.line.x.bottom = element_line(colour = "red", size = 6),
          axis.ticks.x = element_blank()) +
    labs(y = "caries rate")
    

maxilla / mandible + plot_layout(guides = "collect")
```


### Periodontitis

Periodontitis was scored qualitatively on a scale of 0 to 3 according to the amount of
horizontal bone loss from the cemento-enamel junction (CEJ) to the alveolar bone, allowing for ca. 2 mm
of gingival thickness. The distribution of scores per tooth in the pooled sample
is shown in @fig-periodont-scores.

```{r}
#| label: fig-periodont-scores
#| fig-cap: "Distribution of periodontitis scores in each tooth (FDI notation) in the pooled sample."
periodont %>% 
  dental_longer(-id) %>% 
  remove_missing(vars = "score") %>% 
  dental_plot(fill = score)
```


### Calculus

```{r}
calc_index <- calculus_full %>%
  dental_longer(-id) %>% 
  calculus_index()
```

Calculus was scored on each tooth surface (interproximal surfaces were given a single score)
on a scale of 0 to 3, ranging from absence of calculus (0) to heavy deposits (3).
The distribution of individual calculus indices within the sample, separated by
quadrant, shows that the lower anterior quadrant had the largest deposits (@fig-calculus-quad).

There was no apparent influence of the lower anterior calculus index on the
presence/absence of a compound (or vice versa) (@fig-calc-compound).
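
As a rough illustration of how a per-quadrant index can be derived from surface
scores, the sketch below takes the mean of all scored surfaces within each
quadrant. This is an assumption made for illustration and not necessarily what
`calculus_index()` does; the `scores` data frame, its values, and the name
`index_sketch` are hypothetical. The block is shown for reference and is not
evaluated in this document.

```r
# Hypothetical long-format scores: one row per individual, tooth, and surface.
scores <- data.frame(
  id      = rep(c("A", "B"), each = 6),
  tooth   = rep(c("t11", "t16", "t31", "t31", "t46", "t46"), times = 2),
  surface = rep(c("bucc", "lin", "bucc", "lin", "bucc", "ip"), times = 2),
  score   = c(0, 1, 3, 2, 1, NA,  # individual A (one surface not scoreable)
              1, 1, 2, 2, 0, 1)   # individual B
)

# Derive the quadrant from the FDI tooth number (assumed grouping):
# first digit 1-2 = upper, 3-4 = lower; second digit 1-3 = anterior, 4-8 = posterior.
fdi <- as.integer(sub("^t", "", scores$tooth))
arch <- ifelse(fdi %/% 10 <= 2, "U", "L")
position <- ifelse(fdi %% 10 <= 3, "A", "P")
scores$quadrant <- paste0(arch, position)

# Assumed index: mean of the scored surfaces per individual and quadrant.
# The formula interface drops rows with a missing score before averaging.
index_sketch <- aggregate(score ~ id + quadrant, data = scores, FUN = mean)
index_sketch
```

Averaging rather than summing keeps the index comparable between individuals
with different numbers of scoreable surfaces.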

```{r}
#| label: fig-calculus-quad
#| fig-cap: "Calculus index per quadrant. LA = lower anterior, LP = lower posterior, UA = upper anterior, UP = upper posterior."
calc_index %>%
  ggplot(aes(x = quadrant, y = calc_index)) +
    geom_violin(aes(fill = quadrant), alpha = 0.6) +
    geom_boxplot(width = 0.1) +
    theme_minimal() +
    theme(panel.grid.major.x = element_blank())
```

```{r}
#| label: fig-calc-compound
#| fig-cap: "Relationship between the presence (1) or absence (0) of a compound and the calculus index of the lower anterior quadrant of an individual."
calc_index %>%
  left_join(uhplc_calculus_long, by = "id") %>%
  filter(
    batch == "batch1",
    compound != "thc",
    compound != "cbd",
    compound != "thcva",
    quadrant == "LA"
    ) %>% 
  ggplot(aes(x = as.factor(presence), y = calc_index)) +
    geom_violin(aes(fill = as.factor(presence)), alpha = 0.6) +
    geom_boxplot(width = 0.1) +
    facet_wrap(~ compound) +
    labs(x = "Presence/absence", y = "Calculus index (LA)") +
    theme_bw() +
    theme(legend.position = "none")
```


## Pathological conditions

Pathological conditions and lesions that occur frequently in the population were
included in the analysis. Data were
dichotomised to presence/absence to allow statistical analysis. A conservative
approach was taken, so when in doubt, absence of a disease was assumed.
Osteoarthritis was considered present in cases where eburnation was visible
on one or more joint surfaces.
Vertebral osteophytosis was identified by marginal lipping and/or osteophyte
formation on the margin of the superior and inferior surfaces of the vertebral
body.
Cribra orbitalia was diagnosed based on the presence of pitting on the superior
surface of the orbit. No distinction was made between active and healing lesions.
Degenerative disc disease, or spondylosis, was identified as a large diffuse
depression of the
superior and/or inferior surfaces of the vertebral body [@rogersPalaeopathologyJoint2000].
Schmorl's nodes were identified as any cortical depressions on the surface of
the vertebral body. A note was made whether the lesion perforated the vertebral
margin, but both perforating and non-perforating lesions were recorded as present.
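
The conservative dichotomisation described above can be sketched as follows.
The `raw` data frame and its labels ("present", "absent", "possible") are
invented for illustration and do not reflect the actual recording scheme. The
only point is that anything short of an unambiguous observation is coded as
absent. The block is not evaluated in this document.

```r
# Hypothetical raw observations for one condition; labels are invented.
raw <- data.frame(
  id = c("A", "B", "C", "D"),
  OA = c("present", "absent", "possible", "present"),
  stringsAsFactors = FALSE
)

# Conservative rule: only an unambiguous "present" counts as presence (1);
# doubtful observations such as "possible" are coded as absence (0).
raw$OA_binary <- ifelse(raw$OA == "present", 1L, 0L)
raw
```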

Data on chronic maxillary sinusitis from @casnaUrbanizationRespiratory2021 were
included in this study to assess the relationship between upper respiratory
diseases and environmental factors (i.e. tobacco smoke, caffeine consumption).
Chronic maxillary sinusitis (CMS) is the inflammation of the lower paranasal
sinuses, air-filled pockets located in the skull that defend the organism against
inhaled particulate matter and pathogens. This occurs through the production of
mucus carried by small hairs toward an opening situated on the superior part of
the sinus, where pathogens are drained [@slavinDiagnosisManagement2005]. Without
drainage, mucus begins to accumulate in the sinuses, providing an ideal environment
for bacterial growth and thereby contributing to inflammation of the mucous
membranes and subsequently of the bone surfaces [@jangBoneInvolvement2002].
Lesions associated with CMS as defined by @boocockMaxillarySinusitis1995 were
recorded for each individual and classified as "pitting", "spicule-type bone
formation", "remodeled spicules", or "white pitted bone". CMS was scored as absent
when the sinus presented smooth surfaces with little or no associated pitting.
To facilitate inspection, fragmented sinuses were cleaned using a dry toothbrush
and water where necessary. If the sinuses were not observable with the naked eye,
they were examined with a flexible medical endoscope (Pentax, model: FNL-10RBS,
ø = 4 mm; view angle = 30°) inserted through minor breaks naturally occurring on the
inferior nasal conchae and palatine bone, where the bone tissue is thinner.

<!-- description of other diseases
Osteoarthritis
Vertebral osteophytosis
Cribra orbitalia
Degenerative disc disease
Schmorl's nodes
-->

## Statistical analysis

### Point-biserial correlation

Point-biserial (Pearson) correlation was conducted on compound concentrations,
calculus index, caries ratio, and binary variables (@fig-pearson-corr).
This was done to check whether any correlations existed before the continuous
variables were discretised. Irrelevant correlations (anything other than a
correlation between two continuous variables, or between a continuous and a
binary variable) were removed from the plot.

```{r}
#| label: fig-pearson-corr
#| fig-cap: "Pearson correlation plot."
conc_cor %>%
  as_tibble(rownames = "var") %>% 
  mutate(across(!var, ~ if_else(.x == 1, NaN, .x))) %>%
  column_to_rownames("var") %>% 
  ggcorrplot::ggcorrplot()
```
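
The point-biserial coefficient is the Pearson coefficient calculated with one
variable coded 0/1, which is why the two names are used interchangeably here.
The sketch below checks this on simulated data; the variable names and values
are made up and have nothing to do with the real sample. The block is not
evaluated in this document.

```r
set.seed(1)

# Simulated data: a binary indicator and a continuous measurement.
present <- rep(c(0, 1), times = c(12, 8))
conc <- rnorm(20, mean = 5 + 2 * present)

# Pearson correlation with the binary variable coded 0/1.
r_pearson <- cor(conc, present)

# Textbook point-biserial formula, using the population (n-denominator) SD.
m1 <- mean(conc[present == 1])
m0 <- mean(conc[present == 0])
p <- mean(present)
s_n <- sqrt(mean((conc - mean(conc))^2))
r_pb <- (m1 - m0) / s_n * sqrt(p * (1 - p))

# Compare the two coefficients up to floating-point tolerance.
all.equal(r_pearson, r_pb)
```

Because the two coefficients agree up to floating-point error, `cor()` can be
applied directly to a mix of continuous and 0/1 variables.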

### Polychoric correlation

Before analysing the sample with a polychoric correlation (@tbl-polycorr and @fig-polycorr),
the calculus index and caries ratio for each individual were converted to
ordinal variables using quartiles, providing a score from 0--4.

```{r caries-calculus-quartiles}
caries_discrete
calculus_discrete
```
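
For readers unfamiliar with quartile binning, here is a minimal sketch in base
R. The values in `caries_ratio_example` are hypothetical, and the integer
coding (1 to 4) is a choice made for this example that may not match the coding
used in the analysis. The block is not evaluated in this document.

```r
# Hypothetical caries ratios for eight individuals.
caries_ratio_example <- c(0.00, 0.05, 0.10, 0.12, 0.20, 0.25, 0.40, 0.60)

# Quartile boundaries (minimum, Q1, median, Q3, maximum).
breaks <- quantile(caries_ratio_example, probs = seq(0, 1, by = 0.25))

# Bin each value by quartile; labels = FALSE returns integer codes.
score <- cut(caries_ratio_example, breaks = breaks,
             include.lowest = TRUE, labels = FALSE)

# Store as an ordered factor so the ranking is retained.
score_ordinal <- factor(score, levels = 1:4, ordered = TRUE)
table(score_ordinal)
```

Note that `cut()` stops with an error when the breaks are not unique, which can
happen if many individuals share the same value (e.g. a caries ratio of zero),
so real data may need the boundaries adjusted.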

```{r}
#| label: tbl-polycorr
#| tbl-cap: "Table of polychoric correlations (rho)."
knitr::kable(polycorr$rho, digits = 3)
```

```{r}
#| label: fig-polycorr
#| fig-cap: "Heatmap of polychoric correlations (rho)."
ggcorrplot::ggcorrplot(polycorr$rho, type = "lower", show.diag = FALSE)
```


The sampling bias caused by targeting larger calculus deposits may also affect the detection of
compounds, as caffeinated drinks are often
acidic and may reduce calculus formation.
We found only a weak negative correlation
between caffeine concentration and dental calculus index (@fig-pearson-corr),
but since we targeted individuals with
calculus, this may inadvertently have been controlled for during sample collection.

<!-- correlations on caries and calculus separated by quadrant/tooth type -->

## Data dictionary

All raw data are available for download from Zenodo (<https://zenodo.org/record/8061483>).

### metadata.tsv

<!-- id	sample	element	KZ_sample	KZ_element	replicated	batch1_weight	batch2_weight -->

| variable | description |
|---|---|
| id | unique identifier for the individual |
| sample | sample number for the UHPLC-MS/MS analysis |
| element | which tooth (FDI notation) was sampled for UHPLC-MS/MS analysis |
| KZ_element | which tooth (FDI notation) was sampled for aDNA in original study |
| replicated | Whether the sample was included in the replication batch (TRUE/FALSE) |
| batch1_weight | weight (mg) of sample in batch 1 |
| batch2_weight | weight (mg) of sample in batch 2 |


### lloq.tsv

| variable | description |
|---|---|
| compound | name of target compound |
| lloq | Lower limit of quantitation |

### uhplc-results(_batch2).csv

| variable | description |
|---|---|
| sample | UHPLC-MS/MS sample number |
| weight | weight of calculus sample in ng before washes |
| `weight_wash<1..3>` | weight of calculus sample in ng following each wash |
| weight_avg | mean of weights |
| `<compound>_wash<1..3>` | Extracted quantity of compound in ng from washes |
| `<compound>_calc` | Extracted quantity of compound in ng from calculus |

### dental-inv.csv

Dental inventory

| variable | description |
|---|---|
| id | unique identifier for the individual |
| t11..t48 | status of tooth (FDI notation for variable name) |

Dental inventory key:

- p = present
- m = missing (for unknown reason - likely postmortem loss)
- aml = ante-mortem loss
- dna = previously removed for DNA sampling
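
The tooth-level files are stored in wide format (one column per tooth), so most
summaries first require reshaping to one row per individual and tooth. The
sketch below does this in base R on a made-up three-tooth excerpt; `inv` and
`inv_long` are hypothetical names, and this is not the package's
`dental_longer()` function (`tidyr::pivot_longer()` would do the same job). The
block is not evaluated in this document.

```r
# Hypothetical excerpt of a wide dental inventory (three teeth only).
inv <- data.frame(
  id  = c("A", "B"),
  t11 = c("p", "aml"),
  t12 = c("p", "m"),
  t13 = c("dna", "p"),
  stringsAsFactors = FALSE
)

tooth_cols <- setdiff(names(inv), "id")

# Stack the tooth columns into one row per individual and tooth.
inv_long <- data.frame(
  id     = rep(inv$id, times = length(tooth_cols)),
  tooth  = rep(tooth_cols, each = nrow(inv)),
  status = unlist(inv[tooth_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Count teeth per status for each individual.
table(inv_long$id, inv_long$status)
```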

### caries.csv

Caries lesions location

| variable | description |
|---|---|
| id | unique identifier for the individual |
| t11..t48 | location of caries lesion(s) |

Caries location key:

- none = no caries present
- mes = mesial
- dis = distal
- lin = lingual
- buc = buccal (and labial)
- occ = occlusal
- crown = crown (large caries lesion covering multiple surfaces)
- root = root
- blank = tooth not present

### periodontitis.csv

| variable | description |
|---|---|
| id | unique identifier for the individual |
| t11..t48 | periodontitis score (0-3) |

Periodontitis score:

- 0 = none
- 1 = slight
- 2 = moderate
- 3 = severe
- blank = tooth not present

### periapical.csv

| variable | description |
|---|---|
| id | unique identifier for the individual |
| t11..t48 | location of periapical lesion |

- none = no lesion
- bucc = buccal
- lin = lingual
- perf = perforated alveolar bone
- blank = not scoreable

### calculus_full.csv

| variable | description |
|---|---|
| id | unique identifier for the individual |
| t11_bucc..t48_ip | calculus deposit size (0-3) per tooth surface (bucc = buccal; lin = lingual; ip = interproximal) |

- 0 = no calculus
- 1 = slight calculus
- 2 = moderate calculus
- 3 = heavy calculus

### path-conditions.csv

Pathological conditions

- OA = osteoarthritis
- IVDD = intervertebral disc disease
- TB = tuberculosis
- DISH = diffuse idiopathic skeletal hyperostosis
- VOP = vertebral osteophytosis
- SN = Schmorl's nodes
- DDD = degenerative disc disease
- PNBF = periosteal new bone formation
- OD = osteochondritis dissecans
- CF = cribra femora
- CO = cribra orbitalia

### sinusitis.csv

| variable | description |
|---|---|
| id | Individual ID |
| CMS | Presence (YES) or absence (NO) of chronic maxillary sinusitis |
| IPR | Presence (YES) or absence (NO) of periosteal reaction on visceral surface of ribs |

### path-conditions.csv

| variable | description |
|----------|-------------|
| id | Individual ID |
| OA | Presence/absence of lesions related to osteoarthritis |
| IVDD | Presence/absence of lesions related to intervertebral disc disease |
| TB | Presence/absence of lesions related to tuberculosis |
| Mastoiditis | Presence/absence of lesions related to mastoiditis |
| DISH | Presence/absence of lesions related to diffuse idiopathic skeletal hyperostosis |
| VOP | Presence/absence of lesions related to vertebral osteophytosis |
| SN | Presence/absence of lesions related to Schmorl's node(s) |
| DDD | Presence/absence of lesions related to degenerative disc disease |
| PNBF | Presence/absence of periosteal new bone formation |
| OD | Presence/absence of lesions related to osteochondritis dissecans |
| CF | Presence/absence of lesions related to cribra femora |
| CO | Presence/absence of lesions related to cribra orbitalia |

## Session information

This report was generated on `r Sys.Date()` using the following computational environment and dependencies:

```{r}
print(sessionInfo(), locale = FALSE)
```

## References {-}