Inconsistency in Results Using sum(. != 3) within Pipeline Operations (%>%) dplyr version 1.1.4 #7037

Closed
ZHBHSMILE opened this issue May 31, 2024 · 4 comments


ZHBHSMILE commented May 31, 2024

I encountered an inconsistency between two ways of computing the same value in R. When I use the expression sum(. != 3) on the right-hand side of a pipe (%>%), the result differs from what I get when I compute it directly.

  library(dplyr)  # provides %>%

  data <- data.frame(
    GO.BiologicalProcess = c("-", "-", "A", "B"),
    GO.CellularComponent = c("-", "C", "-", "D"),
    GO.MolecularFunction = c("-", "-", "-", "E")
  )
  
  # Calculate the number of occurrences of "-" in each row
  go_num <- rowSums(data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-")
  
  # Count the rows where the number of "-" is not equal to 3 (expected result: 3)
  go_bg_num <- sum(go_num != 3)
  
  # Using pipe operator and sum(. != 3) (unexpectedly returns 10)
  error_sum <- rowSums(data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-") %>% sum(. != 3)

ZHBHSMILE changed the title from "Inconsistency in Results Using sum(. != 3) within Pipeline Operations (%>%)" to "Inconsistency in Results Using sum(. != 3) within Pipeline Operations (%>%) dplyr version 1.1.4" on May 31, 2024


SpatLyu commented Jun 10, 2024

Calculate it like this:

library(magrittr)

data <- data.frame(
  GO.BiologicalProcess = c("-", "-", "A", "B"),
  GO.CellularComponent = c("-", "C", "-", "D"),
  GO.MolecularFunction = c("-", "-", "-", "E")
)

{data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-"} %>% 
  rowSums() %>% 
  {. != 3} %>% 
  sum()
#> [1] 3

Created on 2024-06-10 with reprex v2.1.0


philibe commented Jun 10, 2024

Initial data:

  foo<-{data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-"} %>% rowSums() 
  foo
  # [1] 3 2 2 0

{. != 3} produces a logical vector:

  foo2 <-  foo  %>%    {. != 3}
  foo2
  # [1] FALSE  TRUE  TRUE  TRUE

Therefore sum(foo2) counts the TRUE values, which gives 3.

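By contrast, piping into sum(.=TRUE) returns 4, because the named argument . = TRUE is passed to sum() as one extra value (3 + 1):

  foo %>%  {. != 3} %>% sum(.=TRUE)
  # [1] 4

In base R, with or without %>%, summing the values that differ from 3 also gives 4: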

  sum(foo[foo!=3] )
  # [1] 4
  foo[foo!=3] %>% sum(.)
  # [1] 4

Or with a helper function, piped or not:

  testsum <- function (x) if_else(x!=3,x,0)
  foo %>% testsum(.) %>% sum(.)
  # [1] 4
  sum(testsum(foo))
  # [1] 4

Member

DavisVaughan commented Jun 10, 2024

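Read ?`%>%` closely, particularly the section "Using the dot for secondary purposes", and you will see that this is actually known and expected behavior! Because . appears only inside a nested expression (. != 3) and not as a direct argument of sum(), the LHS is still placed as the first argument of sum(), and is also substituted for . in the second argument. The call is therefore equivalent to sum(x, x != 3), which for x = c(3, 2, 2, 0) gives 7 + 3 = 10.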
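
A minimal sketch of that expansion, assuming magrittr is installed. The names x, piped and expanded are illustrative and do not come from the thread; the vector holds the row counts from the original example.

```r
library(magrittr)

# Row counts from the original example (illustrative name `x`)
x <- c(3, 2, 2, 0)

# `.` appears only inside the nested expression `. != 3`, so the pipe
# still inserts the LHS as the first argument of sum()
piped <- x %>% sum(. != 3)

# The equivalent call written out by hand:
# sum of the values (7) plus the count of values that are not 3 (3)
expanded <- sum(x, x != 3)

print(piped)
print(expanded)
print(identical(piped, expanded))
```

Both calls should print 10, which matches the surprising result reported in this issue.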

You can use braces to avoid this:

rowSums(data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-") %>% {
  sum(. != 3)
}

but I think avoiding the pipe entirely is cleaner here:

sums <- rowSums(data[, c("GO.BiologicalProcess", "GO.CellularComponent", "GO.MolecularFunction")] == "-")
not_three <- sum(sums != 3)  # named so it does not shadow the sum() function
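
For comparison, here is a hedged sketch of three ways to get the intended count. It assumes magrittr is installed and R is version 4.1 or later (needed for the native pipe and the \(v) lambda syntax); the variable names are illustrative only.

```r
library(magrittr)

# Row counts from the original example (illustrative name `x`)
x <- c(3, 2, 2, 0)

# Braces stop magrittr from also inserting the LHS as the first argument
with_braces <- x %>% { sum(. != 3) }

# Base R native pipe with an anonymous function; no dot placeholder is involved
with_native <- x |> (\(v) sum(v != 3))()

# Plain call without any pipe
direct <- sum(x != 3)

print(c(with_braces, with_native, direct))
```

All three variants count the elements that differ from 3, so each should give 3.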


philibe commented Jun 10, 2024

Thanks for the explanation. I had been trying to work out why c(3, 2, 2, 0) %>% sum(. != 3) returns 10, which seemed very weird. :)
