improve spread, gather error message: Computation failed in stat_*(): Each row of output must be identified by a unique combination of keys.
#55
Comments
Hi @guangingmai, thanks for raising the issue. It's difficult to know exactly what the problem is without a reproducible example. Would you be able to share a subset of the data you're using that produces the same error? Check out the reprex package for how to generate an example. The error message comes from Please let me know if this doesn't help! |
First of all, thanks for your reply.
I'm glad you've found a solution, at least! I don't know what the columns contain, so I can't be sure why it works when another doesn't. If you can't share your entire data set, see if you can boil it down to a small data set that hits the same problem and that you can share.
You can try my code where the dataset is stored on the website.
Warning output:
Could you say in more detail what sort of plot you're trying to produce? Most alluvial plots require three aesthetic specs: |
I hope it's okay if I piggyback here. I am trying to do a similar thing over a time course. I have multiple days and (for the reprex) multiple US states reporting some value (pct), but not every state reports every day, so there aren't always alluvia going between consecutive days. I've discovered that something about the shape of the data determines whether this fails or not, but I can't determine what, since the error message about duplicated rows is either misleading or refers to the data at an intermediate stage that is not exposed to me. The only difference between the two plots below is the random seed used to generate the fake data. The second plot is exactly the desired output.
library(reprex)
#> Warning: package 'reprex' was built under R version 3.6.1
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.3
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
#> Warning: package 'tidyr' was built under R version 3.6.2
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.3
library(ggalluvial)
set.seed(123) # fails
fake_tmp <- data.frame(rowname = 1:20,
                       date = c("Day 1", "Day 2", "Day 3", "Day 4", "Day 5"),
                       pct = rnorm(20, mean = 5, sd = 2),
                       gene = sample(state.abb[1:20], 20, replace = TRUE))
tmp2 <- fake_tmp %>%
  gather(key, stratum, -rowname, -date, -pct)
ggplot(tmp2, aes(x = date,
                 y = pct,
                 stratum = stratum,
                 alluvium = stratum)) +
  geom_alluvium(aes(fill = stratum)) +
  geom_stratum(aes(fill = stratum)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
#> Warning: Computation failed in `stat_alluvium()`:
#> Each row of output must be identified by a unique combination of keys.
#> Keys are shared for 2 rows:
#> * 7, 8 # Error refers to rows 7 & 8
tmp2[7:8,]
#> rowname date pct key stratum
#> 7 7 Day 2 5.921832 gene GA
#> 8 8 Day 3 2.469878 gene CT
set.seed(464) # succeeds
fake_tmp <- data.frame(rowname = 1:20,
                       date = c("Day 1", "Day 2", "Day 3", "Day 4", "Day 5"),
                       pct = rnorm(20, mean = 5, sd = 2),
                       gene = sample(state.abb[1:20], 20, replace = TRUE))
tmp2 <- fake_tmp %>%
  gather(key, stratum, -rowname, -date, -pct)
ggplot(tmp2, aes(x = date,
                 y = pct,
                 stratum = stratum,
                 alluvium = stratum)) +
  geom_alluvium(aes(fill = stratum)) +
  geom_stratum(aes(fill = stratum)) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Created on 2020-04-22 by the reprex package (v0.3.0)
@mfoos absolutely fine. Thanks for bringing it up. First, an apology: I have not yet learned how to produce the intelligent and informative warning and error messages of other packages, in particular ggplot2 and its tidyverse siblings. I should probably create an issue and invite help on that. The error message that identifies rows 7 and 8 in your first example was spit out by
count(tmp2, date, stratum)
Please check back if this doesn't resolve the issue. I'll at least have the next version check for this sort of problem and throw an error earlier, since I still run into the same issue from time to time.
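One way to locate the offending rows before plotting is to count the combinations of the variables mapped to `x` and `alluvium` and keep those that occur more than once. The following is a minimal sketch on a hand-made data frame; the `toy` data and the decision to sum `pct` within duplicates are assumptions made for illustration and are not taken from the reprex above.

```r
library(dplyr)

# Hypothetical toy data: "GA" is reported twice on "Day 1",
# so the (date, stratum) pair does not identify a single row.
toy <- data.frame(
  date    = c("Day 1", "Day 1", "Day 2", "Day 2"),
  stratum = c("GA", "GA", "GA", "CT"),
  pct     = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Combinations of date and stratum that appear more than once
dupes <- toy %>%
  count(date, stratum) %>%
  filter(n > 1)

# One possible resolution (an assumption about what the plot should show):
# collapse the duplicates by summing pct within each combination
toy_unique <- toy %>%
  group_by(date, stratum) %>%
  summarise(pct = sum(pct)) %>%
  ungroup()
```

If `dupes` has no rows, every value of `stratum` occurs at most once per `date`, which is the condition the "unique combination of keys" message is complaining about. Whether summing, averaging, or dropping rows is the right way to collapse duplicates depends on what the values mean.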
awesome awesome awesome, this is super helpful, thank you!
@corybrunson The spread function is in the "Retired" lifecycle stage. Quote: "Development on spread() is complete, and for new code we recommend switching to pivot_wider()".
@andzandz11 thanks for mentioning this. A future major release, probably the one after next, will indeed replace |
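For readers comparing the two tidyr interfaces mentioned above, here is a short sketch of the same reshaping written with `spread()` and with `pivot_wider()`, followed by a case with a duplicated key that makes `spread()` stop. The data frame `long` and its column names are hypothetical, and nothing here is meant to describe how ggalluvial itself calls these functions.

```r
library(tidyr)

# Hypothetical long-format data: one row per (alluvium, x) pair
long <- data.frame(
  alluvium = c(1, 1, 2, 2),
  x        = c("Day 1", "Day 2", "Day 1", "Day 2"),
  stratum  = c("GA", "CT", "CT", "CT"),
  stringsAsFactors = FALSE
)

# Retired interface
wide_old <- spread(long, key = x, value = stratum)

# Recommended replacement
wide_new <- pivot_wider(long, names_from = x, values_from = stratum)

# With a duplicated (alluvium, x) pair, spread() stops with an error;
# capture its message instead of halting the script
dup <- rbind(long, long[1, ])
msg <- tryCatch(
  spread(dup, key = x, value = stratum),
  error = function(e) conditionMessage(e)
)
```

Here `msg` holds the text of the error raised by `spread()` on the duplicated key. To my knowledge, `pivot_wider()` treats the same situation differently: it warns and returns list-columns instead of stopping, unless a summary function is supplied through `values_fn`.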
I want to use ggalluvial in a bar plot, but I get a warning message when I run the following code. Does anyone know how to fix it?
Warning message: