357 stat methods vignette #365

149 changes: 149 additions & 0 deletions vignettes/articles/creating-ards-for-stats-methods.Rmd
@@ -0,0 +1,149 @@
---
title: "Creating ARDs with statistical methods"
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

The basic functions offered by {cards} can create a wide variety of ARDs. However, sometimes we may need to include the outputs from more complicated statistical methods in our ARDs. In this article we'll look at a few different ways to include the output from these statistical methods.
Collaborator

Can we put each sentence on a new line? It makes it easier to track changes to the article in the future.


## {cardx}

The {cardx} package is an extension of {cards}. The idea is that {cards} provides the core functions to create ARDs, while {cardx} contains extensions that implement a wide variety of commonly used statistical methods, including (but not limited to):

* ANOVA
* Chi-squared Test
* t-test
* LS Mean Difference
* Survival Estimates and Differences


When looking to include the output from a statistical method, your first port of call should be to check whether it has already been implemented in {cardx}. You can find the full list of available functions [here](https://insightsengineering.github.io/cardx/main/reference/index.html).


As an example, let's consider a simple t-test to compare the mean age (variable `AGE`) across two treatment arms (`ARM`).

In {cardx} we have the function `ard_stats_t_test()`:

```{r cardx}
pharmaverseadam::adsl |>
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cardx::ard_stats_t_test(by = ARM, variables = AGE)
```

In the output we see all of the results from the t-test: the mean difference, the confidence interval limits and the p-value. It's also really useful to see the function's inputs. For example, we can see that we didn't assume equal variances, because the `stat` is `FALSE` for the `stat_name` `var.equal`. This helps with re-use: if we need to run the test again, the ARD tells us which options to use to recreate the result.
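To see where a `var.equal` value of `FALSE` comes from, here is a minimal base R sketch. It assumes that the {cardx} function wraps `stats::t.test()`, and it uses the built-in `sleep` data instead of ADSL purely for illustration.

```r
# Illustration only: base R's t.test() on the built-in `sleep` data.
# Look up the default value of `var.equal` in the default t.test method.
default_var_equal <- formals(getS3method("t.test", "default"))$var.equal

# Default call (unequal variances) versus an explicit equal-variance call.
welch <- t.test(extra ~ group, data = sleep)
pooled <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

default_var_equal
welch$method
pooled$method
```

Recording the option in the ARD matters because the two calls run different tests on the same data.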

# But what do we do if the statistical method that we want to use hasn't been implemented already in {cardx}?

Implementing a new statistical method in {cards} to output an ARD of its results and inputs is really easy. All we need to do is write a function that outputs all of the information we want to include as a named list!

We'll first look at how `broom::tidy` can make this even easier for us, then give an example of how to implement a method from scratch.

## Implementing a stats method using `broom::tidy`

The output of most commonly used statistical methods can be passed through `broom::tidy`, which converts it into a tibble (which is just a fancy named list!). We can then pass the output of `broom::tidy` to the `statistic` argument of the ARD function we wish to use. This leads to an ARD like the one in the example above, with one row per relevant input or output of the statistical method.
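To see what `broom::tidy` hands back, here is a minimal sketch on the built-in `mtcars` data (an assumption for illustration, not the ADSL data used in this article). A one-sample t-test tidies to a one-row tibble, and each column is one named statistic.

```r
# Illustration only: tidy a one-sample t-test on a built-in dataset.
tidied <- t.test(mtcars$mpg) |> broom::tidy()

nrow(tidied)     # number of rows in the tidied result
names(tidied)    # one name per statistic
as.list(tidied)  # the same content viewed as a plain named list
```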

Let's extend our t-test example from above. This time we want to carry out a one-sample t-test. This isn't implemented in {cardx} (it is now but let's just pretend...). We can write the code for the one-sample t-test, pipe its output through `broom::tidy`, and supply the result to the `statistic` argument like so:

```{r tidy-stats-method}
pharmaverseadam::adsl |>
  dplyr::filter(ARM %in% c("Xanomeline High Dose", "Xanomeline Low Dose")) |>
  cards::ard_continuous(
    variables = AGE,
    statistic = everything() ~ list(t_test = \(x) t.test(x) |> broom::tidy())
  ) |>
  dplyr::mutate(context = "t_test_one_sample")
```

Over 100 different statistical methods implemented in R can be 'tidied' using `broom::tidy`. However, the method you aim to use might not be one of them, or the current `broom::tidy` implementation might not return the information that you need in your ARD. In that case we'll have to format the output ourselves.
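One quick way to judge whether a tidied result fits the pattern above is to look at its shape. The sketch below is illustrative only and uses the built-in `mtcars` data: a t-test tidies to a single row, whereas a linear model tidies to one row per term, so it is no longer a flat list with one value per statistic.

```r
# Illustration only: compare the shape of two tidied results.
one_row <- broom::tidy(t.test(mtcars$mpg))
multi_row <- broom::tidy(lm(mpg ~ wt, data = mtcars))

nrow(one_row)    # a single row of statistics
nrow(multi_row)  # one row per model term
multi_row$term   # the terms behind those rows
```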
Collaborator

Should we make the distinction that only broom::tidy() methods that return a single row of results work here?


## Implementing a new stats method from scratch

As we mentioned above, we just need to define a function which carries out our required statistical method and outputs a named list of the information we wish to include in the ARD.

As an example, let's write a function which carries out a Wilcoxon signed rank test over one variable using the function `wilcox.test`. As an output we just want to record the method and the p-value.

Collaborator

Would this be easier to read as `wilcox_one_var <- \(x) wilcox.test(x)[c("method", "p.value")]`?

```{r wilcox-function}
# One-sample Wilcoxon signed rank test of `x` against its own median.
# Returns a named list holding only the method description and the p-value.
wilcox_one_var <- function(x) {
  wilcox_results <- wilcox.test(x, mu = median(x))

  return(wilcox_results[c("method", "p.value")])
}
```
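Before handing the function to {cards}, it can help to call it directly on a plain vector and confirm that it returns a named list. The sketch below repeats the function so that it is self-contained, and uses simulated ages (hypothetical data, not ADSL).

```r
# Same function as above, repeated so this sketch runs on its own.
wilcox_one_var <- function(x) {
  wilcox_results <- wilcox.test(x, mu = median(x))
  wilcox_results[c("method", "p.value")]
}

# Simulated ages, for illustration only.
set.seed(123)
simulated_age <- rnorm(60, mean = 75, sd = 8)

result <- wilcox_one_var(simulated_age)
names(result)  # the names that become the statistic names in the ARD
str(result)    # a character method description and a numeric p-value
```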

Let's now use this function when creating an ARD with {cards}. Remember we just need the statistic to be a named list, so we'll supply our function inside a named list. We also don't need to specify any arguments: the function's single argument `x` receives the data we are testing, in this case `AGE` within each individual treatment arm.

```{r wilcox-with-cards}
pharmaverseadam::adsl |>
  cards::ard_continuous(
    variables = AGE,
    by = ARM,
    statistic = ~ list(wilcox = wilcox_one_var)
  )
```
We see here that we get an output of 8 rows: 2 rows (one for the method, and one for the p-value) for each of the 4 treatment arms.

# Handling errors

Let's consider what happens when we encounter an error in our statistical method.

```{r wilcox-fn-error}
wilcox_one_var_error <- function(x) {
  # Deliberately fail before the test runs
  stop("AN ERROR!")

  wilcox_results <- wilcox.test(x, mu = median(x))

  return(wilcox_results[c("method", "p.value")])
}

pharmaverseadam::adsl |>
  cards::ard_continuous(
    variables = AGE,
    by = ARM,
    statistic = ~ list(wilcox = wilcox_one_var_error)
  )
```

In the output we see that we only get 4 rows. The error has been stored in the `error` column, but `stat_name` and `stat_label` now just take the list name "wilcox" that we defined in the `statistic` argument. This could have unintended effects in downstream code: we may be relying on `stat_name` and `stat_label` having the values "method" and "p.value", or simply on the output having 2 rows per treatment arm.

To handle this we can specify the expected results from our function. Even if we encounter an error during the code run, the output will then have a consistent format and will not disrupt downstream code.

Here's an example of how to specify the expected output using `as_cards_fn`:

```{r as_cards_fn}
wilcox_one_var_error <- cards::as_cards_fn(
  function(x) {
    # Deliberately fail before the test runs
    stop("AN ERROR!")

    wilcox_results <- wilcox.test(x, mu = median(x))

    return(wilcox_results[c("method", "p.value")])
  },
  # Names we expect the function to return
  stat_names = c("method", "p.value")
)

pharmaverseadam::adsl |>
  cards::ard_continuous(
    variables = AGE,
    by = ARM,
    statistic = ~ list(wilcox = wilcox_one_var_error)
  )
```

Our function becomes the first argument to `as_cards_fn`, then the second argument is `stat_names` where we specify the expected names of the output list.

In the output shown here, the `error` column is still populated with the error. However, we now have the expected 8 rows, and the `stat_name` and `stat_label` values match those specified in the `stat_names` argument of `as_cards_fn()`, helping us to avoid problems in code that relies on this output.
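The value of declaring the expected names up front can be illustrated outside {cards} with a small base R sketch. This is not how {cards} implements `as_cards_fn()`; the helper `with_expected_names()` is hypothetical and only shows why knowing the names in advance lets a failure still produce one entry per statistic.

```r
# Hypothetical helper, for illustration only (not part of {cards}).
# Returns a wrapped version of `fn`. If `fn` errors, the wrapper returns
# a list of NULL placeholders with the expected names plus the error message.
with_expected_names <- function(fn, stat_names) {
  function(x) {
    tryCatch(
      list(result = fn(x), error = NULL),
      error = function(e) {
        placeholder <- stats::setNames(
          vector("list", length(stat_names)),
          stat_names
        )
        list(result = placeholder, error = conditionMessage(e))
      }
    )
  }
}

failing_stat <- function(x) stop("AN ERROR!")
safe_stat <- with_expected_names(failing_stat, c("method", "p.value"))

out <- safe_stat(c(1, 2, 3))
names(out$result)  # expected names are present despite the failure
out$error          # the captured error message
```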