Package issue with alphabetizing #138

Open
etbeilfuss opened this issue Dec 12, 2024 · 4 comments

@etbeilfuss

Description of the issue

I'm working on a visualization of timber species groupings. I'm taking individual species and grouping them into aggregate groups based on their properties. To best visualize this I'm using ggalluvial. The package seems to sort the species alphabetically, but on the group axis the stratum "White Oak" is placed above the groups that start with "A" instead of at the end of the alphabetical order.

If this is a bug, please explain the behavior you want versus the behavior you get, and feel free to add any additional comments.

I want the term "White Oak" to be on the bottom, where it belongs alphabetically, without needing to coerce or trick the package. I think needing to coerce the package should be the rare exception, not the standard approach. Another approach could be to let users convert variables to factors with a custom level order. I'm not sure whether that's already a feature; I wasn't able to find anything on it.

If this is not a bug, please just explain the issue as you see fit!

Reproducible example (preferably using reprex::reprex())

library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.1.3
library(ggalluvial)
#> Warning: package 'ggalluvial' was built under R version 4.1.3

species <- data.frame( 
  Species = c("Black Ash", "Green Ash", "White Ash", "Balsam Poplar", "Big-Tooth Aspen", "Eastern Cottonwood", "Quaking Aspen", "Paper Birch", "River Birch", "Yellow Birch", "Eastern Red Cedar", "Northern White-Cedar", "Choke Cherry", "Hackberry", "Hop Hornbeam", "Hornbeam", "Mountain-Ash", "Pin Cherry", "Red Mulberry", "Serviceberry", "White Mulberry", "Willow", "American Elm", "Rock Elm", "Siberian Elm", "Slippery Elm", "Bitternut Hickory", "Shagbark Hickory", "Douglas-Fir", "Scotch Pine", "Black Oak", "Northern Pin Oak", "Northern Red Oak", "Black Maple", "Norway Maple", "Red Maple", "Silver Maple", "Blue Spruce", "Norway Spruce", "White Spruce", "Bur Oak", "Chinkapin Oak", "Swamp White Oak", "White Oak"), 
  Group = c("Ash", "Ash", "Ash", "Aspen", "Aspen", "Aspen", "Aspen", "Birch", "Birch", "Birch", "Cedar", "Cedar", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Elm", "Elm", "Elm", "Elm", "Hickory", "Hickory", "Other Pine", "Other Pine", "Red Oak", "Red Oak", "Red Oak", "Soft Maple", "Soft Maple", "Soft Maple", "Soft Maple", "Spruce", "Spruce", "Spruce", "White Oak", "White Oak", "White Oak", "White Oak") ) 

# Plot without forced alphabetization (White Oak on top)
defaultPlot <- ggplot(data = species,
       aes(axis1 = Species, axis2 = Group)) +
  geom_alluvium(aes(fill = Group)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum)))+
  scale_x_discrete(limits = c("Species", "Group"),
                   expand = c(0.15, 0.05))+
  theme_void()+
  theme(legend.position = "none")

# Plot with forced alphabetization
## Done by adding "ss" in front of "White Oak" then subbing "ss" for ""
species$Group <- ifelse(species$Group == "White Oak", "ssWhite Oak", species$Group)

desiredPlot <- ggplot(data = species,
       aes(axis1 = Species, axis2 = Group)) +
  geom_alluvium(aes(fill = Group)) +
  geom_stratum() +
  geom_text(stat = "stratum",
            aes(label = sub("^ss", "", after_stat(stratum)))) +
  scale_x_discrete(limits = c("Species", "Group"),
                   expand = c(0.15, 0.05))+
  theme_void()+
  theme(legend.position = "none")

defaultPlot
#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.
#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.

#> Warning in to_lodes_form(data = data, axes = axis_ind, discern =
#> params$discern): Some strata appear at multiple axes.

desiredPlot

reprex(venue = "gh")
#> Error in reprex(venue = "gh"): could not find function "reprex"

Created on 2024-12-12 with reprex v2.1.1


@corybrunson
Owner

corybrunson commented Dec 12, 2024

Hi @etbeilfuss, thanks for mentioning this. The problem arises from multiple axes having values in common, in this case "White Oak". It's a familiar problem that i originally halfway resolved by introducing the discern parameter, as illustrated below; see help("alluvial-data") for details and other examples. Does that address your immediate needs?

The warning about some strata appearing at multiple axes is meant to flag this; i'm sure the warnings in this package could be improved, but i'm not yet familiar with how to write them more informatively. Moreover, this is not an optimal solution, and changes to {ggplot2} in the intervening years might have opened up an opportunity to solve it some other way. I don't have the bandwidth to focus on it right now, but i'll leave this issue open to attempt a fix when i can.

library(ggplot2)
library(ggalluvial)

species <- data.frame( 
  Species = c("Black Ash", "Green Ash", "White Ash", "Balsam Poplar", "Big-Tooth Aspen", "Eastern Cottonwood", "Quaking Aspen", "Paper Birch", "River Birch", "Yellow Birch", "Eastern Red Cedar", "Northern White-Cedar", "Choke Cherry", "Hackberry", "Hop Hornbeam", "Hornbeam", "Mountain-Ash", "Pin Cherry", "Red Mulberry", "Serviceberry", "White Mulberry", "Willow", "American Elm", "Rock Elm", "Siberian Elm", "Slippery Elm", "Bitternut Hickory", "Shagbark Hickory", "Douglas-Fir", "Scotch Pine", "Black Oak", "Northern Pin Oak", "Northern Red Oak", "Black Maple", "Norway Maple", "Red Maple", "Silver Maple", "Blue Spruce", "Norway Spruce", "White Spruce", "Bur Oak", "Chinkapin Oak", "Swamp White Oak", "White Oak"), 
  Group = c("Ash", "Ash", "Ash", "Aspen", "Aspen", "Aspen", "Aspen", "Birch", "Birch", "Birch", "Cedar", "Cedar", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Dropped", "Elm", "Elm", "Elm", "Elm", "Hickory", "Hickory", "Other Pine", "Other Pine", "Red Oak", "Red Oak", "Red Oak", "Soft Maple", "Soft Maple", "Soft Maple", "Soft Maple", "Spruce", "Spruce", "Spruce", "White Oak", "White Oak", "White Oak", "White Oak") ) 

# overlap between axis levels
intersect(species$Species, species$Group)
#> [1] "White Oak"

# default plot with `discern = TRUE`
ggplot(data = species,
       aes(axis1 = Species, axis2 = Group)) +
  geom_alluvium(aes(fill = Group), discern = TRUE) +
  geom_stratum(discern = TRUE) +
  geom_text(stat = "stratum", discern = TRUE,
            aes(label = after_stat(stratum)))+
  scale_x_discrete(limits = c("Species", "Group"),
                   expand = c(0.15, 0.05))+
  theme_void()+
  theme(legend.position = "none")

Created on 2024-12-12 with reprex v2.1.1
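
For readers who want to see why a shared label jumps the queue, here is a minimal base-R sketch. It assumes (this is not confirmed in the thread) that the strata levels of all axes are pooled into one ordered set, axis by axis, and that `discern = TRUE` disambiguates repeated labels in the manner of `make.unique()`. The two small vectors are hypothetical stand-ins for the `Species` and `Group` columns.

```r
# Hypothetical miniature of the two axes; only "White Oak" is shared.
species_levels <- levels(factor(c("Bur Oak", "White Oak", "Black Ash")))
group_levels   <- levels(factor(c("White Oak", "White Oak", "Ash")))

# Assumed default behaviour: pool levels across axes and drop duplicates.
# The shared label keeps the position it earned on the first axis,
# so it ends up ahead of every label that only occurs on the second axis.
pooled <- unique(c(species_levels, group_levels))
print(pooled)

# Assumed `discern = TRUE` behaviour: repeated labels are made distinct,
# so the second-axis "White Oak" is ordered among its own axis's labels.
discerned <- make.unique(c(species_levels, group_levels))
print(discerned)
```

In the pooled vector "White Oak" precedes "Ash", which is consistent with the default plot placing it above the other groups; in the discerned vector the second copy sorts after "Ash" at the cost of a `.1` suffix.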

@corybrunson
Owner

Note: Stat*$finish_layer() might be used to resolve this.

@corybrunson
Owner

@etbeilfuss a possible fix is implemented in the discern branch. Could you try installing from there, as follows, and let me know if the problem is resolved on your end? You'll still need to use discern = TRUE, but the suffix .1 should not appear in the plot.

remotes::install_github("corybrunson/ggalluvial", ref = "discern")

@etbeilfuss
Author

SOLVED

The "discern = TRUE" argument solved the issue. I hadn't considered that the same name appearing in both columns could cause a problem, but it makes sense that it could break something.

I appreciate your quick response. Hopefully anyone experiencing this issue in the future will be able to find the solution here.
