[Question] Bacterial Relative Abundance Over Time #136

Open
AntonKjellberg opened this issue Oct 17, 2024 · 3 comments


AntonKjellberg commented Oct 17, 2024

Hi!

What an amazing package.

I'm trying to display how the mean relative abundance of different bacterial genera develops over time. I filtered for IDs that have samples for all three time points; however, I still can't get it to work.

Here is a fraction of the dataset. abundance has one entry for every combination of time (3 levels), id (only 2 here) and genus (11), so 3 x 2 x 11 = 66 rows in total.

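library(tibble)      # tibble()
library(ggplot2)     # ggplot(), aes()
library(ggalluvial)  # geom_stratum(), geom_flow()

toy <- tibble(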
    id = c(rep(1, 33), rep(2, 33)),
    abundance = c(
      5.097338e-01, 1.447320e-01, 1.391562e-01, 6.961131e-02, 3.244924e-02, 2.139261e-02, 7.220953e-02,
      5.860208e-03, 2.465460e-03, 2.361152e-03, 2.844761e-05, 9.675987e-01, 1.484639e-02, 1.070846e-02,
      3.937304e-03, 8.777429e-04, 8.275862e-04, 1.203762e-03, 0.000000e+00, 0.000000e+00, 0.000000e+00,
      0.000000e+00, 5.081549e-01, 2.959873e-01, 8.429322e-02, 4.622756e-02, 2.640779e-02, 2.235469e-02,
      1.338496e-02, 2.144936e-03, 1.044612e-03, 0.000000e+00, 0.000000e+00,
      9.718995e-01, 2.220788e-02, 5.055938e-03, 4.302926e-04, 1.434309e-04, 1.195257e-04, 1.195257e-04,
      2.390514e-05, 0.000000e+00, 0.000000e+00, 0.000000e+00, 7.839328e-01, 1.552875e-01, 5.078613e-02,
      5.729054e-03, 2.110704e-03, 8.184364e-04, 5.169072e-04, 4.738316e-04, 2.153780e-04, 1.292268e-04,
      0.000000e+00, 8.063558e-01, 9.668371e-02, 4.877554e-02, 2.358499e-02, 2.120435e-02, 3.081920e-03,
      2.768192e-04, 3.690922e-05, 0.000000e+00, 0.000000e+00, 0.000000e+00),
    genus = c(
      "Staphylococcus", "Haemophilus", "Moraxella", "Corynebacterium", "Streptococcus", "Veillonella", "Other",
      "Gemella", "Escherichia-Shigella", "Neisseria", "Dolosigranulum", "Staphylococcus", "Streptococcus", 
      "Corynebacterium", "Veillonella", "Moraxella", "Gemella", "Other", "Escherichia-Shigella", "Neisseria", 
      "Haemophilus", "Dolosigranulum", "Moraxella", "Staphylococcus", "Corynebacterium", "Streptococcus", 
      "Dolosigranulum", "Veillonella", "Other", "Gemella", "Haemophilus", "Escherichia-Shigella", "Neisseria", 
      "Staphylococcus", "Streptococcus", "Gemella", "Other", "Corynebacterium", "Haemophilus", "Moraxella", 
      "Escherichia-Shigella", "Veillonella", "Neisseria", "Dolosigranulum", "Other", "Streptococcus", 
      "Staphylococcus", "Moraxella", "Veillonella", "Gemella", "Corynebacterium", "Haemophilus", "Dolosigranulum", 
      "Neisseria", "Escherichia-Shigella", "Streptococcus", "Moraxella", "Other", "Staphylococcus", "Dolosigranulum", 
      "Corynebacterium", "Haemophilus", "Gemella", "Veillonella", "Escherichia-Shigella", "Neisseria"),
    time = rep(c("1w", "1m", "3m"), each = 11, times = 2)
  )
  
ggplot(toy, aes(x = time, stratum = genus, alluvium = id, y = abundance)) +
  geom_stratum() +
  geom_flow()

Error in geom_stratum():
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in setup_data():
! Data is not in a recognized alluvial form (see help('alluvial-data') for details).
Run rlang::last_trace() to see where the error occurred.

corybrunson (Owner) commented

Hi @AntonKjellberg, thanks for raising the issue.

Looking back, i think the query functions is_alluvia_form() and is_lodes_form() need to be better documented and their parameters overhauled to match the aesthetic mappings. Here's the check you want to run, based on the aesthetic mappings you've specified:

is_lodes_form(toy, key = time, value = genus, id = id)

When i run it, i get the following message:

#> Duplicated id-axis pairings.
#> [1] FALSE

So, the problem is that some values of id appear with the same value of time more than once, which is not allowed in an alluvial plot. In fact, there are many such duplications:

dplyr::count(toy, time, id)
#> # A tibble: 6 × 3
#>   time     id     n
#>   <chr> <dbl> <int>
#> 1 1m        1    11
#> 2 1m        2    11
#> 3 1w        1    11
#> 4 1w        2    11
#> 5 3m        1    11
#> 6 3m        2    11
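
The same duplication can be confirmed with base R alone. This is only a minimal sketch: it assumes a stand-in data frame called keys with the same key structure as toy (2 ids, 3 time points, 11 genera), with placeholder genus names and no abundances, since only the keys matter for this check.

```r
# Stand-in with the same shape as `toy`: 2 ids x 3 time points x 11 genera.
# Genus names are placeholders (an assumption); abundances are left out on purpose.
keys <- expand.grid(
  genus = paste0("genus_", 1:11),
  time  = c("1w", "1m", "3m"),
  id    = 1:2,
  stringsAsFactors = FALSE
)

# Lodes form needs each (id, time) pair to occur at most once.
pair_counts <- table(keys$time, keys$id)                   # rows per (time, id)
n_dup_pairs <- sum(duplicated(keys[c("id", "time")]))      # repeated pairings

# Adding genus to the key makes every row unique.
n_dup_triples <- sum(duplicated(keys[c("id", "time", "genus")]))
```

Every (id, time) pairing is repeated once per genus, while (id, time, genus) identifies a row uniquely.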

You'll need to think carefully about what information you want to convey in the plot. What are the individuals or groups (alluvium) that you want to track across multiple measurements (x), and what values can they take (stratum)? Is there a plot in the examples that is similar to what you want?
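
For contrast, here is a rough sketch of data that does satisfy the requirement. The data frame valid and its columns subject and state are hypothetical names made up for illustration; the is_lodes_form() call uses the same arguments as the check above and only runs if ggalluvial is installed.

```r
# Hypothetical data: 3 subjects, each observed exactly once at each of 2 time points.
valid <- data.frame(
  subject = rep(1:3, times = 2),
  time    = rep(c("t1", "t2"), each = 3),
  state   = c("a", "a", "b", "a", "b", "b"),
  stringsAsFactors = FALSE
)

# Intended mapping: x -> time, alluvium -> subject, stratum -> state.
# No (subject, time) pairing is repeated, so there is nothing ambiguous to draw.
unique_pairs <- !any(duplicated(valid[c("subject", "time")]))

# The same question asked of ggalluvial, guarded in case the package is missing.
if (requireNamespace("ggalluvial", quietly = TRUE)) {
  lodes_ok <- ggalluvial::is_lodes_form(valid, key = time, value = state, id = subject)
}
```

Here each subject is one alluvium that may move between strata from t1 to t2, which is the structure the questions above are asking about.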

AntonKjellberg (Author) commented Oct 18, 2024

Thank you for your reply, Cory.

That makes sense! Unfortunately, I still struggle to display the data the way I want.

I want a plot like the one below, but with streams connecting the blocks based on the genus abundance within the different ids:

ggplot(toy, aes(x = time, stratum = genus, y = abundance, fill = genus)) +
  geom_stratum()

[image: plot produced by the code above]

The plot below has the same overall structure, but its data weren't available to me. There, wave plays the role of time, n of abundance, and key of genus, with id as the alluvium.

[image: alluvial plot from the linked post]
https://longitudinalanalysis.com/visualizing-transitions-in-time-using-r-and-alluvial-graphs/

I couldn't find a similar plot in the examples.

corybrunson (Owner) commented

Hi @AntonKjellberg, notice from the source that the second plot is based on an id variable derived from a row index when the data were in wide (or "alluvia") form. That is why each value of id appears in only one row with any given value of wave. In your data, id is manually defined as repetitions of only two values, which would allow for only two alluvia in the plot. You'll need a different identifier if you want a similar plot. Since i don't know the provenance of your data, i don't want to speculate on how it's structured or, therefore, on how the identifiers should be defined.
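
Purely as an illustration of what "a different identifier" could look like, here is a sketch under a loud assumption: that the thing to track over time is each (id, genus) pair. Whether that is appropriate depends on the real data. The data frame demo, its made-up abundances, and the column track are all hypothetical, and the plotting part only runs if ggplot2 and ggalluvial are installed.

```r
# Small stand-in for `toy`: 2 ids x 3 time points x 3 placeholder genera.
demo <- expand.grid(
  genus = c("A", "B", "C"),
  time  = c("1w", "1m", "3m"),
  id    = 1:2,
  stringsAsFactors = FALSE
)
# Made-up abundances that sum to 1 within each id and time point.
demo$abundance <- rep(c(0.5, 0.3, 0.2), times = 6)

# ASSUMPTION: one alluvium per (id, genus) pair.
demo$track <- interaction(demo$id, demo$genus, drop = TRUE)

# Put the time points in chronological order; as plain strings they
# would sort alphabetically (1m, 1w, 3m).
demo$time <- factor(demo$time, levels = c("1w", "1m", "3m"))

# Each (track, time) pairing now occurs once, as lodes form requires.
ok <- !any(duplicated(demo[c("track", "time")]))

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(demo, aes(x = time, stratum = genus, alluvium = track,
                        y = abundance, fill = genus)) +
    geom_flow() +
    geom_stratum()
}
```

With this identifier every alluvium stays within its own genus, so the flows show how each genus's share changes rather than transitions between genera. The bar heights are sums over ids; dividing abundance by the number of ids beforehand would show means instead.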
