Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

variables are collinear #41

Open
lwaldron opened this issue Jun 15, 2024 · 6 comments
Open

variables are collinear #41

lwaldron opened this issue Jun 15, 2024 · 6 comments
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@lwaldron
Copy link
Member

This is a new warning being generated by the tests, and I don't think there are actually should be collinear variables here so I'm not sure what's happening.

suppressPackageStartupMessages(library(lefser))
data("zeller14")
zellersub <- zeller14[1:150, zeller14$study_condition != "adenoma"]
## subsetting a DataFrame with NULL
zellersub <- relativeAb(zellersub)
results <- lefser(zellersub, groupCol = "study_condition", blockCol = NULL)
#> The outcome variable is specified as 'study_condition' and the reference category is 'control'.
#>  See `?factor` or `?relevel` to change the reference category.
#> Warning in lda.default(x, grouping, ...): variables are collinear

Created on 2024-06-15 with reprex v2.1.0

@lwaldron lwaldron added the bug Something isn't working label Jun 15, 2024
@lwaldron
Copy link
Member Author

Likely cause - the relab_sub_t_dfin the call raw_lda_scores <- ldaFunction(relab_sub_t_df, lgroupf) within lefser() looks like this, when these should be relative abundances:

image

@lwaldron
Copy link
Member Author

Actually these relative abundances are correct, I forgot that they are scaled to add to 1e6. I think this is real collinearity, caused by the presence of synthetic clades (e.g. "Bacteria") where a parent node has only one child, or two children but one is dominant. Two TODOs:

  1. modify tests and examples to do the analysis at species level only.
  2. make warning more helpful, to explain this likely cause of collinearity and recommend analyzing a single taxonomic level only
  3. modify documentation to recommend analyzing a single taxonomic level only

@lwaldron lwaldron added documentation Improvements or additions to documentation and removed bug Something isn't working labels Jun 15, 2024
@lwaldron lwaldron self-assigned this Jun 15, 2024
@shbrief
Copy link
Contributor

shbrief commented Jun 16, 2024

It seems like you are solving this with the get_terminal_nodes function. Is this something you'd like to add to the lefser function?

@shbrief
Copy link
Contributor

shbrief commented Jun 16, 2024

I also want to clarify points 1 & 3: if the terminal node of an input data is a mix of strain, species, and genus, for example, what will be the recommendation?

@shbrief
Copy link
Contributor

shbrief commented Jun 21, 2024

9e1afa1

@sdgamboa
Copy link
Contributor

sdgamboa commented Sep 10, 2024

One of the features of lefse-conda is determining which clades are differentially abundant.
For example this result. This clade was created by the lefse program.

I think the abundance of synthetic clades (e.g., Bacteria, etc.) is necessary for the cladogram if the internal nodes are of interest. Otherwise, the cladogram could just depict which taxa in the terminal nodes are differentially abundant.

If the taxonomy is included in the rowData instead of the rownames and we restrict (recommend) the input to only terminal nodes at the same taxonomic level we could use the mia package:

library(mia)
data("GlobalPatterns")
l <- splitByRanks(GlobalPatterns)
l$features <- GlobalPatterns
mergedSE <- mergeSEs(l, collapse.cols = TRUE)
## Instead of merging, the lefse analysis could be run at all taxonomic levels individually

This was referenced Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

No branches or pull requests

3 participants