Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Tengeler2020 #545

Merged
merged 4 commits into from
May 17, 2024
Merged

Improve Tengeler2020 #545

merged 4 commits into from
May 17, 2024

Conversation

RiboRings
Copy link
Member

@RiboRings RiboRings commented May 16, 2024

Hi! Tengeler2020 now has more informative feature names and unique labels for tree tips and nodes. Related to microbiome/miaViz#123.

library(mia)
data("Tengeler2020", package = "mia")
tse <- Tengeler2020

head(rownames(tse))
# [1] "Bacteroides"     "Bacteroides_1"   "Parabacteroides"
# [4] "Bacteroides_2"   "Akkermansia"     "Bacteroides_3"

head(rowTree(tse)$tip.label)
# [1] "[Eubacterium]_coprostanoligenes_group"
# [2] "[Clostridium]_innocuum_group"         
# [3] "[Clostridium]_innocuum_group_1"       
# [4] "Dielma"                               
# [5] "Unidentified_Lachnospiraceae_12"      
# [6] "Epulopiscium"

head(rowTree(tse)$node.label)
# [1] "node_1" "node_2" "node_3" "node_4" "node_5" "node_6"

@antagomir
Copy link
Member

Just checking: would it make sense to have identical rownames and tip.labels? If there is a one-to-one match (?)

@RiboRings
Copy link
Member Author

RiboRings commented May 17, 2024

There is a one-to-one match, but the rownames and the tree tips are in a different order. Not sure why. In GlobalPatterns they have the same order, in Tengeler2020 they don't (see below).

data(GlobalPatterns)
head(rownames(GlobalPatterns))
[1] "549322" "522457" "951"    "244423" "586076" "246140"
head(rowTree(GlobalPatterns)$tip.label)
[1] "549322" "522457" "951"    "244423" "586076" "246140"

data(Tengeler2020)
head(rownames(Tengeler2020))
[1] "1726470"  "1726471"  "17264731" "17264726" "1726472"  "17264724"
head(rowTree(Tengeler2020)$tip.label)
[1] "172647198" "1726478"   "1726479"   "172647201" "172647222" "17264798"

But the situation with different orders seems right when looking at the tree, whereas the same order gives an incorrect tree:

library(miaViz)
plotRowTree(Tengeler2020, edge_colour_by = "Phylum")

Screenshot 2024-05-17 at 10 05 44

rowTree(Tengeler2020)$tip.label <- rownames(Tengeler2020)
plotRowTree(Tengeler2020, edge_colour_by = "Phylum")

Screenshot 2024-05-17 at 10 06 28

@antagomir
Copy link
Member

Probably no one paid attention to this when the dataset was constructed.

It might be good for educational purposes to fix the data set and give them the same order?

Also check if TreeSummarizedExperiment constructor throws a warning if these are not in the same order; I think it should. If it doesn't we could at least propose an issue in that package?

@TuomasBorman
Copy link
Contributor

TuomasBorman commented May 17, 2024

Tips can be in any order / any name. rowLinks links rows and rowTree tips. rowLinks allows more flexible matching; it is not problem or bug. There could be a feature that is not in tips. For example, if dataset is agglomerated to Phylum level, and certain taxon has only kingdom level information. Then this row is not linked to tips but to node.

@antagomir
Copy link
Member

A main consideration is that the methods should be sufficientyl intuitive so that users will at least notice if they are trying to do something unexpected. Does the plotRowTree example indicate any need for changes?

@TuomasBorman
Copy link
Contributor

This is related to this issue microbiome/miaViz#123

I checked the problem, and the problem was that tree nodes/tips were not named unique way. I believe we have not taken into account the fact the links between rows and tree. We have just used rows and tree directly

@antagomir
Copy link
Member

We should take some measures to ensure that mistakes would not happen too easily with this.

To some extent it should be checked already when constructing TreeSE (ie in the TreeSummarizedExperiment package) but it might not be a bad idea to have some generic extra checks (hidden functions) in place in mia, to throw warning if there is something suspicious going on?

@RiboRings
Copy link
Member Author

I'm not familiar specifically with the construction of phylo objects, but to me it doesn't seem that many things can go wrong when building a TreeSE. If something is out of place, the constructor usually raises an error, or some function later on (like in this case).

Also maybe it's not that bad to have a TreeSE that is working but is not perfect, and realise the problem only when a function (like plotRowTree) specifically needs the component that has a problem. So I would mainly focus on nice messages when errors or warnings arise.

@antagomir
Copy link
Member

Yes I agree that in some sense it is nice to have problematic demo data sets, this helps us to constantly test the robustness of the methods and workflows.

Pedagogically it is sometimes useful, sometimes harmful. It depends also on the level of teaching and learners.

@RiboRings
Copy link
Member Author

Is this PR then safe to merge or should we consider changes?

@antagomir
Copy link
Member

For me OK to merge (if @TuomasBorman agrees) but could we check the mia & OMA examples that use the Tengeler tree data (if any?) for possible issues? In addition, it would be good to have a clear example demonstrating this somewhat unexpected behavior; for instance in the data set documentation?

@TuomasBorman
Copy link
Contributor

Yes, I think it is ok to merge. I also fixed the issue about plotRowTree microbiome/miaViz#124

@TuomasBorman
Copy link
Contributor

We have not been able to fix rworkflows yet, but I think it is ok to bypass check since we are only modifying the dataset

@TuomasBorman TuomasBorman merged commit 1de8231 into devel May 17, 2024
1 of 3 checks passed
@TuomasBorman TuomasBorman deleted the radboud_prep branch May 17, 2024 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants