Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to order rectangles while minimizing crossing of flows #79

Open
Alex7722 opened this issue Mar 19, 2021 · 12 comments
Open

Attempt to order rectangles while minimizing crossing of flows #79

Alex7722 opened this issue Mar 19, 2021 · 12 comments

Comments

@Alex7722
Copy link

Alex7722 commented Mar 19, 2021

A first attempt to order rectangles while minimizing crossing of flows:

Data used to build the function can be found here

minimize_crossing <- function(alluv_dt = alluv_dt, stratum = new_Id_com, alluvium = Id, x = Window){
  
  #' This function 
  #' 
  #' @alluv_dt
  #' The dt used for the alluvial
  #' @stratum
  #' Stratum column
  #' @alluvium
  #' Alluvium column
  #' @x
  #' x column
  require(tidyverse)
  require(data.table)
  require(ggalluvial)
  require(tidygraph)
  require(ggplot2)
  require(forcats)
  
  new_Id_com <- deparse(substitute(stratum))
  Id <- deparse(substitute(alluvium))
  Window <- deparse(substitute(x))
  
  dt<-alluv_dt[order(Id,Window)][,.(new_Id_com, Id, Window)]
  
  dt[,tot_window_leiden:=.N,.(Window,new_Id_com)]
  
  dt[,Source:=new_Id_com,Id]
  dt[,Target:=shift(new_Id_com),Id]
  
  
  dt <- dt %>% rename(tot_window_leiden_Source = tot_window_leiden)
  dt[,tot_window_leiden_Target:=shift(tot_window_leiden_Source),Id]
  
  dt <- dt[Source > Target, c("Target", "Source") := list(Source, Target)] # exchanging
  dt <- dt[Source > Target, c("tot_window_leiden_Target", "tot_window_leiden_Source") := list(tot_window_leiden_Source, tot_window_leiden_Target)] # exchanging

  
  dt[,link_strength:=.N,.(Source,Target,Window)]
  
  dt <- dt[is.na(Target)==FALSE & Source!=Target]
  
  dt[,cosine_strength:=link_strength/sqrt(tot_window_leiden_Target*tot_window_leiden_Source)]
  dt[,max_cosine_strength:=max(cosine_strength),.(Source,Target)]
  
  dt<-dt[,.N,.(Source,Target,max_cosine_strength)][order(-N)]
  
  #Make the dt for naming
  edges_meta<-dt
  edges_meta[,Source:=as.character(Source)]
  edges_meta[,Target:=as.character(Target)]
  edges_meta[,from:=Source]
  edges_meta[,to:=Target]
  edges_meta[,weight:=max_cosine_strength]
  
  nodes_meta <-alluv_dt[,.N,new_Id_com]
  nodes_meta <-nodes_meta[,new_Id_com:=as.character(new_Id_com)]
  nodes_meta <-nodes_meta[,Id:=new_Id_com]
  
  tbl_meta<-tbl_graph(nodes = nodes_meta, edges = edges_meta, directed = FALSE, node_key = "new_Id_com")
  components <- tbl_meta %>% 
    activate(nodes) %>% 
    mutate(components_att = group_components(type = "weak")) %>% 
    as.data.table()
  
  components[,size_compo:=.N,components_att][order(-N)]
  components <- components[size_compo==1,components_att:=0]
  setnames(components, "components_att", paste0("components_att_","0"))
  components <- components[,.(new_Id_com, "components_att_0"= get("components_att_0"))]
  
  for (links_to_remove in unique(dt[order(max_cosine_strength)]$max_cosine_strength)) {
    dt<-dt[max_cosine_strength>links_to_remove]
    edges_meta<-dt
    edges_meta[,Source:=as.character(Source)]
    edges_meta[,Target:=as.character(Target)]
    edges_meta[,from:=Source]
    edges_meta[,to:=Target]
    edges_meta[,weight:=max_cosine_strength]
    
    nodes_meta <-alluv_dt[,.N,new_Id_com]
    nodes_meta <-nodes_meta[,new_Id_com:=as.character(new_Id_com)]
    nodes_meta <-nodes_meta[,Id:=new_Id_com]
    
    tbl_meta<-tbl_graph(nodes = nodes_meta, edges = edges_meta, directed = FALSE, node_key = "Id")
    
    components2 <- tbl_meta %>% 
      activate(nodes) %>% 
      mutate(components_att = group_components(type = "weak")) %>% 
      as.data.table()
    
    components2[,size_compo:=.N,components_att][order(-N)]
    components2 <- components2[size_compo==1,components_att:=0]
    name <- paste0("components_att_", links_to_remove)
    setnames(components2, "components_att", name)
    components2 <- components2[,.(new_Id_com, get(name))]
    setnames(components2, "V2", name)
    
    components <- merge(components, components2, all.x = TRUE, all.y = TRUE, by= "new_Id_com")
    
  }
  
  columns_to_paste <- names(components)
  columns_to_paste <- columns_to_paste[columns_to_paste != "new_Id_com"] 
  
  community_order <- components %>% unite(order, c(columns_to_paste), sep = " ", remove = FALSE)
  community_order <- community_order[,.(new_Id_com,order)][order(order)]
  
  
  alluv_dt_meta <-merge(alluv_dt,community_order, by="new_Id_com", all.x = TRUE)
  
  alluv_dt_meta$new_Id_com <- fct_reorder(alluv_dt_meta$new_Id_com, alluv_dt_meta$order,min, .desc = TRUE)
  return(alluv_dt_meta)
}
@corybrunson
Copy link
Owner

@Alex7722 thank you! As i mentioned by email, i'll come back to this when i have more bandwidth, but it would be fantastic to finally address this need in the package itself.

I wonder if you could edit the issue into a formal reproducible example and include the generation of the plot you shared by email? Reproducible examples can be relatively easily prepared using the reprex package, and this vignette offers some guidance. (Since the data set is large, it's OK with me if you include a readRDS() call rather than use dput() to reconstruct it from scratch.) If this turns out to be too time-consuming, then i can put one together from your function code when i come back to this issue.

@corybrunson
Copy link
Owner

Note: This would address #6.

@corybrunson
Copy link
Owner

I may not come back to this for another month or two, but i've come across two more points of reference: This IEEE paper formalizes the optimization problem, and the CRAN Task View on Optimization lists several packages that might be used to implement it.

@ajwilk
Copy link

ajwilk commented Jun 28, 2021

Hi @corybrunson, do you have an update on this by chance? I saw that you posted an article on this a couple weeks ago, but I wasn't able to reproduce the behavior in that article with the development version or the optimization branch. I can share examples but figured I'd check and see if these changes were even pushed yet. Thanks!!

@corybrunson
Copy link
Owner

Hi @ajwilk, sorry, i haven't gotten started on this. The vignette you linked to has actually been around for a while and does not contain any optimization of this kind. Once something is implemented, though, i'll definitely include an example of it in the vignette! (Note: If the method from this issue is used for optimization, then the optimization branch will be rendered obsolete.)

@Alex7722
Copy link
Author

My and a coauthor just released a beta version of our package for dynamic network analysis. We cleaned up the function to minimize the crossing of flows in a way that is very flexible with some examples: https://github.com/agoutsmedt/networkflow/blob/master/R/minimize_crossing_alluvial.R

It seems to work well even for complex alluvials, however, right now it is for a particular kind of alluvial. An example:
image

@corybrunson
Copy link
Owner

@Alex7722 nifty! Could you provide a side-by-side comparison of plots prepared using versus not using your function? That would provide solid justification for me to make {networkflow} an optional dependency for the next release. (Smaller examples would be included in documentation, but a large example might be suitable for a vignette.)

@Alex7722
Copy link
Author

Alex7722 commented Feb 23, 2023

Here is an example using as.factor(dt$stratum) vs the function:
image

image

On the student example, the difference is not as great, but a little bit better (before/after):
image
image

As you can see, the function only orders stratums a particular way, but not alluvium inside stratum. That why the student example does not change much. The function works bests when there are a lot of stratums that have strong relationships with a few other stratums + the student example is a bit lucky in its randomness.

In the way we use it, it is better than nothing and good enough, but it might not be for different uses.

@corybrunson
Copy link
Owner

It is great to be able to describe the cases for which the technique works best. Do you mind my adapting the larger example into a vignette? Would you prefer to? And when do you expect networkflow to be on CRAN? (I don't think it can be included as a dependency until then.)

@Alex7722
Copy link
Author

We should be on CRAN in the coming months, but no specific timeline yet. However the big drawback of our function is that it's also dependent on quite a few packages, and most notably the latest development version of data.table(). Also, the function parameters are really suited for our package as it is really a middle-step function in a larger workflow for network analysis and its name doesn't make a lot of sense. Finally, the alluvial needs to (1) be structured as a long format and (2) be a data.table.

I made a minimal example by changing the student data with a bad-case scenario where "Sculpure" (btw a small typo there) go into "Digital Art" :

data(majors)
majors <- majors %>% as.data.table()
majors[(student == 6 | student == 9 | student == 12) & curriculum !="Sculpure", curriculum := "Digital Art"] 
majors$curriculum <- as.factor(majors$curriculum)

ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  # geom_text(label = student) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("student curricula across several semesters")

image

data(majors)
majors <- majors %>% as.data.table()
majors[(student == 6 | student == 9 | student == 12) & curriculum !="Sculpure", curriculum := "Digital Art"] 
majors <- networkflow::minimize_crossing_alluvial(majors, intertemporal_cluster_column = "curriculum", node_key = "student", window_column = "semester")
majors$curriculum <- forcats::fct_reorder(majors$curriculum, majors$minimize_crossing_order,min, .desc = TRUE)

ggplot(majors,
       aes(x = semester, stratum = curriculum, alluvium = student,
           fill = curriculum, label = curriculum)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  # geom_text(label = student) +
  geom_flow(stat = "alluvium", lode.guidance = "frontback",
            color = "darkgray") +
  geom_stratum() +
  theme(legend.position = "bottom") +
  ggtitle("student curricula across several semesters")

image

Its even possible to play with the Iode.guidance (backfront) to get results close to optimum, but it would be better if a function also ordered the alluvium in an optimal way inside the stratums.

image

@corybrunson
Copy link
Owner

Ah, that's an interesting caveat. For comparison, {ggalluvial} doesn't Import: {ggrepel}, but that package is used in a vignette, so it's listed under Suggests: instead. I don't know whether it would make more sense to suggest a dependency-heavy package like {networkflow} or to simply leave it out but refer to it in the README. At least it's mentioned here. What do you think? When {networkflow} is on CRAN, we can revisit this.

@Alex7722
Copy link
Author

I will let you know once we are on CRAN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants