Allow for user-specified & cross-experiment-consistent column ordering #181

katossky opened this issue Jan 18, 2025 · 0 comments

First, thanks so much for the package. This is not meant as criticism, even though I am not very good at making it sound nice.

At the moment it is quite tedious to reuse the variable ordering of one corrr-generated plot in another plot.

Say I want to try two different correlation measures, or two ways of pre-processing my data. I could not find any straightforward way to plot the correlations in exactly the same variable order. I have just spent 1h30 trying to hack the resulting cor_df and ggplot object to get what I wanted, without success. In the end I had to build the plot from scratch, inspecting the autoplot code to reach my goal.

I see two ways forward:

  1. add examples of how to do that simply (hopefully there is an undocumented easy way)
  2. add an option to rearrange in a user-specified order

Current strategy:

library(dplyr)    # select(), mutate(), filter()
library(ggplot2)  # ggplot(), aes(), geom_tile(), ...

g <- iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  corrr::rearrange() |>
  corrr::shave() |>
  corrr::stretch() |>
  dplyr::mutate(
    x = factor(x, levels = unique(x)),
    y = factor(y, levels = unique(y))
  ) |>
  dplyr::filter(!is.na(r)) |>
  ggplot() + aes(x = x, y = y, fill = r) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(limits = c(-1, 1)) +
  scale_x_discrete() +
  labs(x = NULL, y = NULL, fill = NULL) +
  coord_fixed() +
  theme_minimal() +
  theme(
    panel.grid = element_blank(), 
    axis.text.x = element_text(angle = 315, vjust = 1, hjust = 0)
  )
g

Now change the correlation method from the default (Pearson) to Spearman, keeping the variable order of the first plot.

ordering <- levels(g$data$x)

iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE, method = "spearman") |> # new !!!
  corrr::stretch() |>
  dplyr::mutate(
      x = factor(x, levels = ordering), # new !!!
      y = factor(y, levels = ordering)  # new, needed for filtering !!!
  ) |>
  dplyr::filter(!is.na(r), as.integer(x) < as.integer(y)) |> # new !!!
  dplyr::mutate(
      y = factor(y, levels = rev(ordering)) # back to standard plotting !!!
  ) |>
  ggplot() + aes(x = x, y = y, fill = r) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(limits = c(-1, 1)) +
  scale_x_discrete() +
  labs(x = NULL, y = NULL, fill = NULL) +
  coord_fixed() +
  theme_minimal() +
  theme(
      panel.grid = element_blank(), 
      axis.text.x = element_text(angle = 315, vjust = 1, hjust = 0)
  )
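
The workaround above can be condensed into one small helper that takes the ordering as an argument. This is only a minimal sketch: `cor_long_ordered()` is a hypothetical name, not part of corrr, and it calls `stats::cor()` directly instead of `corrr::correlate()` so that it runs without extra packages.

```r
# Hypothetical helper (NOT part of corrr): upper-triangle correlations in long
# format, with x and y stored as factors whose levels follow `ordering`.
cor_long_ordered <- function(data, ordering, method = "pearson") {
  # Correlation matrix with rows and columns already in the requested order
  m <- stats::cor(data[, ordering, drop = FALSE], method = method)

  # (row, col) positions of the upper triangle, i.e. x comes before y
  idx <- which(upper.tri(m), arr.ind = TRUE)

  data.frame(
    x = factor(ordering[idx[, "row"]], levels = ordering),
    y = factor(ordering[idx[, "col"]], levels = ordering),
    r = m[idx]
  )
}

# Same variable order for two correlation measures
ordering <- c("Sepal.Width", "Sepal.Length", "Petal.Width", "Petal.Length")
pearson  <- cor_long_ordered(iris[, 1:4], ordering)
spearman <- cor_long_ordered(iris[, 1:4], ordering, method = "spearman")
```

Both data frames can then go through the same `ggplot() + aes(x = x, y = y, fill = r) + geom_tile()` code as above, and the tiles will line up because the factor levels are identical. Reverse the levels of `y` before plotting if the usual triangular layout is wanted.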

Ideal strategy:

iris_corr <- iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  corrr::rearrange(method = "PCA")
cols <- setdiff(colnames(iris_corr), "term") # fixed ordering, reused for side-by-side comparisons
autoplot(iris_corr)

# then variations
iris |>
  select(-Species) |>
  corrr::correlate(quiet = TRUE, method = "pearson") |> # valid values: "pearson", "kendall", "spearman"
  autoplot(ordering = cols) # proposed `ordering` argument, does not exist yet

iris |>
  mutate(Sepal.Length = ifelse(Sepal.Length > 0, 0, Sepal.Length)) |> # or whatever
  select(-Species) |>
  corrr::correlate(quiet = TRUE) |>
  # If this path is chosen, `ordering` should perhaps supersede `method`,
  # though there may be edge cases in programmatic use where that causes problems.
  autoplot(ordering = cols)
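
For illustration, here is a minimal sketch of what a user-specified reordering could amount to. It assumes the correlation table is a data frame with a `term` column plus one column per variable, as the `setdiff(colnames(iris_corr), "term")` line above suggests. `reorder_terms()` is a hypothetical helper, not an existing corrr function, and the table is built with base R as a stand-in for a real cor_df.

```r
# Hypothetical helper (NOT part of corrr): put rows and columns of a
# correlation table (a `term` column + one column per variable) in `ordering`.
reorder_terms <- function(cor_tbl, ordering) {
  # The requested order must name exactly the variables in the table
  stopifnot(setequal(ordering, cor_tbl$term))
  cor_tbl[match(ordering, cor_tbl$term), c("term", ordering), drop = FALSE]
}

# Stand-in for a cor_df, built with base R so the sketch has no dependencies
m <- stats::cor(iris[, 1:4])
cor_tbl <- data.frame(term = rownames(m), m, check.names = FALSE)

cols <- c("Petal.Width", "Petal.Length", "Sepal.Length", "Sepal.Width")
reordered <- reorder_terms(cor_tbl, cols)
```

Whether plain `[` subsetting keeps every attribute of a real cor_df is an assumption to check. The point is only that the reordering itself is a single indexing step once the order is known.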