
strata ordering fails to respect factor levels #13

Open
corybrunson opened this issue Jan 19, 2018 · 7 comments

@corybrunson
Owner

corybrunson commented Jan 19, 2018

Originally raised in a comment on #6. See the example here.

@corybrunson
Owner Author

corybrunson commented Jan 20, 2018

The problem is that some strata are used at multiple axes. Inside stat_alluvium(), when to_lodes() combines factor levels to create a single column comprising all strata, these levels are passed through unique(), which puts them in order of appearance; "Level 2" and "Level 3" therefore appear first, among the other strata at axis1, while the other levels at axis2 appear after them. I'm not sure of the best way to resolve this: e.g. annotate the strata to distinguish them, or keep track of their positions and restore them later.
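The effect of unique() can be seen with plain character vectors. This is a simplified illustration with made-up stratum labels, not the actual code inside to_lodes(): combining the strata of two axes keeps each value at its first appearance, so strata that already occurred at the first axis end up ahead of "Level 1".

```r
# Strata observed at each axis (hypothetical labels);
# "Level 2" and "Level 3" occur at both axes.
axis1_strata <- c("Level 2", "Level 3")
axis2_strata <- c("Level 1", "Level 2", "Level 3", "Level 4")

# unique() keeps the first appearance of each value, so the strata shared
# with axis1 come before "Level 1" in the combined set.
combined <- unique(c(axis1_strata, axis2_strata))
print(combined)
#> [1] "Level 2" "Level 3" "Level 1" "Level 4"
```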

In the meantime, @svenhalvorson, you can dodge the problem either (a) by slightly changing the names of the strata at one axis:

levels(subsank_math$`Winter Projection`) <-
  paste0(levels(subsank_math$`Winter Projection`), " ")

or (b) by assigning the same, larger set of levels to both axes:

all_levels <- apply(
  expand.grid(c(" (+)", "", " (-)"), paste("Level", 5:1)),
  1,
  function(x) paste(rev(x), collapse = "")
)
subsank_math$subgroup <-
  factor(subsank_math$subgroup, levels = all_levels)
subsank_math$`Winter Projection` <-
  factor(subsank_math$`Winter Projection`, levels = all_levels)

@corybrunson
Owner Author

corybrunson commented Jan 20, 2018

Current plan for to_lodes():

  1. Include a parameter, maybe axis.strata, with options TRUE to paste stratum values with axis values or FALSE (default) to put combined stratum levels in order of appearance. Also allow the user to provide a vector of combined stratum levels to use in the given order?
  2. If axis.strata is not set, and any stratum appears at more than one axis, then print a warning.
  3. Have stat_*() functions accept axis.strata and (a) pass it to to_lodes() if data is in alluvia format or (b) print a warning if data is already in lodes format.
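The plan above could look roughly like the following sketch. Everything here is an assumption for illustration: stack_strata() is a hypothetical name, it only builds the x and stratum columns (no alluvium ids or weights), and it is not how to_lodes() is implemented.

```r
# Hypothetical sketch of the proposed `axis.strata` option (not ggalluvial code).
stack_strata <- function(data, axes, axis.strata = FALSE) {
  strata <- lapply(data[axes], as.character)

  # Step 2: if `axis.strata` was not supplied, warn about strata that occur
  # at more than one axis.
  if (missing(axis.strata)) {
    n_axes <- table(unlist(lapply(strata, unique)))
    shared <- names(n_axes)[n_axes > 1]
    if (length(shared) > 0) {
      warning("Strata appear at more than one axis: ",
              paste(shared, collapse = ", "))
    }
  }

  x <- rep(axes, each = nrow(data))
  stratum <- unlist(strata, use.names = FALSE)

  if (isTRUE(axis.strata)) {
    # TRUE: paste axis and stratum so that each axis gets its own levels
    stratum <- paste(x, stratum, sep = ": ")
    lev <- unique(stratum)
  } else if (is.character(axis.strata)) {
    # Character vector: use the supplied combined levels in the given order
    # (strata missing from the vector become NA)
    lev <- axis.strata
  } else {
    # FALSE (default): combined levels in order of appearance
    lev <- unique(stratum)
  }

  data.frame(x = factor(x, levels = axes),
             stratum = factor(stratum, levels = lev))
}

d <- data.frame(axis1 = c("Level 2", "Level 3"),
                axis2 = c("Level 1", "Level 2"))
levels(stack_strata(d, c("axis1", "axis2"), axis.strata = TRUE)$stratum)
# "axis1: Level 2" "axis1: Level 3" "axis2: Level 1" "axis2: Level 2"
```

Calling stack_strata(d, c("axis1", "axis2")) without axis.strata returns the levels in order of appearance ("Level 2", "Level 3", "Level 1") and warns that "Level 2" appears at more than one axis.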

@corybrunson
Owner Author

An attempt to resolve this issue is on the order branch. It adds a parameter, relevel.strata, that can be either logical (ignored if FALSE) or a character vector of (a subset of) the levels in the desired order. Some examples of its use can be found in ex-alluvial-data.r (accessible via help(to_lodes)).

I'm not especially fond of this patch, since it requires passing (yet another) cumbersome parameter to every ggalluvial layer. But the problem is likely to be rare enough that the extra parameter shouldn't be much of a burden (and it can be easily programmed). Feedback would be very welcome!
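As a rough illustration of how a character-valued relevel.strata could reorder a subset of levels, here is a sketch with a hypothetical relevel_strata() helper. It assumes the listed levels go first and the unlisted ones keep their relative order afterwards; the thread doesn't say how the order branch treats unlisted levels, or what TRUE does, so logical values are simply passed through here.

```r
# Hypothetical illustration; not the code on the order branch.
relevel_strata <- function(f, relevel.strata = FALSE) {
  f <- as.factor(f)
  # Logical values are not modelled: the factor is returned unchanged
  if (is.logical(relevel.strata)) return(f)
  # Listed levels that actually occur go first, in the given order...
  first <- intersect(relevel.strata, levels(f))
  # ...followed by the remaining levels in their existing relative order
  factor(f, levels = c(first, setdiff(levels(f), first)))
}

strata <- factor(c("Level 1", "Level 2", "Level 3", "Level 4"))
levels(relevel_strata(strata, c("Level 3", "Level 1")))
#> [1] "Level 3" "Level 1" "Level 2" "Level 4"
```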

@svenhalvorson

Thank you, corybrunson! I've got some beautiful graphs out of this package!

@corybrunson
Owner Author

@svenhalvorson, that's great to know! Thank you for saying so. :)

Have you tried the relevel.strata parameter (still on the order branch)? I think what might work better is to give the user a few simple options:

  • NULL, the default and current behavior, to add new levels for each axis in sequence (in the order of their levels, if they are factor variables)
  • "appearance" to put them in order of appearance (first along strata within axes, then along axes)
  • "unique" to add a suffix (probably the axis number) to each duplicated stratum level before combining them, analogous to make.unique()

Anything more creative than that, and it's probably better for the user to handle it manually (as in the solution posted on Stack Overflow). Does that sound reasonable?
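The three proposed options might be sketched as follows. combine_strata() is a hypothetical helper, not a ggalluvial function, and two details are assumptions: "appearance" is read as the order of values in the rows of the data, axis by axis, and "unique" appends "." plus the axis number to levels already used at an earlier axis, in the spirit of make.unique() (without checking for collisions with existing names).

```r
# Hypothetical helper: put the strata of several axes on one shared set of
# factor levels, following one of the proposed options.
combine_strata <- function(axes, order_strata = NULL) {
  if (!is.null(order_strata) && !order_strata %in% c("appearance", "unique")) {
    stop("`order_strata` must be NULL, \"appearance\" or \"unique\"")
  }
  # Levels of each axis: factor levels, or order of appearance if not a factor
  axis_levels <- lapply(axes, function(a) {
    if (is.factor(a)) levels(a) else unique(as.character(a))
  })
  values <- lapply(axes, as.character)

  if (identical(order_strata, "unique")) {
    seen <- character(0)
    for (i in seq_along(axes)) {
      lev <- axis_levels[[i]]
      # Suffix levels already used at an earlier axis with this axis's number
      new_lev <- ifelse(lev %in% seen, paste0(lev, ".", i), lev)
      values[[i]] <- new_lev[match(values[[i]], lev)]
      seen <- c(seen, lev)
      axis_levels[[i]] <- new_lev
    }
  }

  combined <- if (identical(order_strata, "appearance")) {
    # Order of appearance in the data, axis by axis (factor levels ignored)
    unique(unlist(values, use.names = FALSE))
  } else {
    # NULL and "unique": each axis's levels in turn, adding only new ones
    unique(unlist(axis_levels, use.names = FALSE))
  }
  lapply(values, factor, levels = combined)
}

axes <- list(
  axis1 = factor(c("Level 3", "Level 1"), levels = c("Level 1", "Level 3")),
  axis2 = factor(c("Level 3", "Level 2"), levels = c("Level 2", "Level 3"))
)
levels(combine_strata(axes)$axis1)                # "Level 1" "Level 3" "Level 2"
levels(combine_strata(axes, "appearance")$axis1)  # "Level 3" "Level 1" "Level 2"
levels(combine_strata(axes, "unique")$axis2)      # "Level 1" "Level 3" "Level 2" "Level 3.2"
```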

@rbutleriii

Hi @corybrunson, I'm trying to accomplish something similar, but when I follow the solutions above and convert to lodes form, I get an error that the data isn't in a recognized alluvial form, even though is_lodes_form() returns TRUE.

library(ggalluvial)
a <- data.frame(
  "Older structure" = c(
    "Class", 
    "Neighborhood", 
    "Neighborhood", 
    "Subclass", 
    "Cluster", 
    "None"
  ),
  "Current structure" = c(
    "Neurotransmitter type", 
    "Neighborhood", 
    "Class", 
    "Subclass", 
    "Supertype", 
    "Cluster"
  ),
  N = rep(1, 6)
)
a$Older.structure <- factor(a$Older.structure, levels = unique(a$Older.structure))
a$Current.structure <- factor(a$Current.structure, levels = a$Current.structure)
ggplot(data = a, aes(axis1 = Older.structure, axis2 = Current.structure, y = N)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Older structure", "Current structure")) +
  theme_classic() +
  ggtitle("Naming structure changes",
          "Yao et al 2021 --> Yao et al 2023")

This produces a plot in which the strata at the second axis are out of order (image not preserved).

Converting to lodes form, I can't get past the checks:

b <- to_lodes_form(a[1:2])
is_lodes_form(b, key = x, value = stratum, id = alluvium, silent = TRUE)
# [1] TRUE
ggplot(data = b, aes(x = x,  stratum = stratum, alluvium = alluvium)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Older structure", "Current structure")) +
  theme_classic() +
  ggtitle("Naming structure changes",
          "Yao et al 2021 --> Yao et al 2023")

Error in `geom_alluvium()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `setup_data()`:
! Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
Run `rlang::last_trace()` to see where the error occurred.

Trying a form closer to the Stack Overflow version still fails at the same step:

b$N <- 1
ggplot(b, aes(x = x,  stratum = stratum, alluvium = alluvium, weight = N, label = stratum)) +
  geom_flow(aes(fill = stratum)) +
  geom_stratum() +
  geom_text(stat = "stratum") +
  scale_x_discrete(limits = c("Older structure", "Current structure")) +
  theme_classic() +
  ggtitle("Naming structure changes",
          "Yao et al 2021 --> Yao et al 2023")
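One possible cause of the lodes-form error, not confirmed in this thread, is the x scale. data.frame() converts the column names to Older.structure and Current.structure, and to_lodes_form() takes the x values from those names, so scale_x_discrete(limits = c("Older structure", "Current structure")) matches neither value. Values outside the limits of a discrete scale become NA, and the stat may then no longer see one lode per alluvium per axis. The sketch below relabels the axes with labels instead of limits and keeps N as a weight column; it is a guess under those assumptions, and it does not by itself fix the strata order.

```r
library(ggalluvial)  # also attaches ggplot2

a <- data.frame(
  "Older structure" = c("Class", "Neighborhood", "Neighborhood",
                        "Subclass", "Cluster", "None"),
  "Current structure" = c("Neurotransmitter type", "Neighborhood", "Class",
                          "Subclass", "Supertype", "Cluster"),
  N = rep(1, 6)
)
names(a)  # data.frame() has turned the spaces into dots

# Stack the two axis columns; N is kept as an ordinary column
b <- to_lodes_form(a, axes = 1:2)

p <- ggplot(b, aes(x = x, stratum = stratum, alluvium = alluvium, y = N)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  # Relabel the axes rather than setting limits that don't match the x values
  scale_x_discrete(labels = c("Older structure", "Current structure")) +
  theme_classic()
```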

@corybrunson
Owner Author

Hi @rbutleriii, and I apologize for taking a while to respond; I've been traveling and distracted for several weeks.

I think what you need to do here is define both variables as factors with the same combined set of levels, with the levels in the desired order. Could you give that a try and report back?

If that's not clear, let me know and I'll do an illustration using your data within a few days.
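A minimal sketch of that suggestion, using the data from the earlier comment: both axis variables get one shared set of levels, in the order they should be stacked. The particular order in all_levels is only an example (an assumption); rearrange it as desired.

```r
library(ggalluvial)  # also attaches ggplot2

a <- data.frame(
  "Older structure" = c("Class", "Neighborhood", "Neighborhood",
                        "Subclass", "Cluster", "None"),
  "Current structure" = c("Neurotransmitter type", "Neighborhood", "Class",
                          "Subclass", "Supertype", "Cluster"),
  N = rep(1, 6)
)

# One combined set of levels covering every stratum at either axis
all_levels <- c("Neurotransmitter type", "Neighborhood", "Class",
                "Subclass", "Supertype", "Cluster", "None")
a$Older.structure <- factor(a$Older.structure, levels = all_levels)
a$Current.structure <- factor(a$Current.structure, levels = all_levels)

p <- ggplot(a, aes(axis1 = Older.structure, axis2 = Current.structure, y = N)) +
  geom_alluvium() +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Older structure", "Current structure")) +
  theme_classic()
```

Because both factors share identical levels, combining them across axes cannot reorder anything; every stratum that is missing from all_levels would become NA, so the list must be complete.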

@corybrunson corybrunson reopened this Jun 16, 2024