-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Tengeler2020 #545
Improve Tengeler2020 #545
Conversation
Just checking: would it make sense to have identical rownames and tip.labels? If there is a one-to-one match (?) |
There is a one-to-one match, but the rownames and the tree tips are in a different order. Not sure why. In GlobalPatterns they have the same order, in Tengeler2020 they don't (see below).
But the situation with different orders seems right when looking at the tree, whereas the same order gives an incorrect tree:
|
Probably no one paid attention to this when the dataset was constructed. It might be good for educational purposes to fix the data set and give them the same order? Also check if TreeSummarizedExperiment constructor throws a warning if these are not in the same order; I think it should. If it doesn't we could at least propose an issue in that package? |
Tips can be in any order / any name. rowLinks links rows and rowTree tips. rowLinks allows more flexible matching; it is not problem or bug. There could be a feature that is not in tips. For example, if dataset is agglomerated to Phylum level, and certain taxon has only kingdom level information. Then this row is not linked to tips but to node. |
A main consideration is that the methods should be sufficientyl intuitive so that users will at least notice if they are trying to do something unexpected. Does the plotRowTree example indicate any need for changes? |
This is related to this issue microbiome/miaViz#123 I checked the problem, and the problem was that tree nodes/tips were not named unique way. I believe we have not taken into account the fact the links between rows and tree. We have just used rows and tree directly |
We should take some measures to ensure that mistakes would not happen too easily with this. To some extent it should be checked already when constructing TreeSE (ie in the TreeSummarizedExperiment package) but it might not be a bad idea to have some generic extra checks (hidden functions) in place in mia, to throw warning if there is something suspicious going on? |
I'm not familiar specifically with the construction of phylo objects, but to me it doesn't seem that many things can go wrong when building a TreeSE. If something is out of place, the constructor usually raises an error, or some function later on (like in this case). Also maybe it's not that bad to have a TreeSE that is working but is not perfect, and realise the problem only when a function (like plotRowTree) specifically needs the component that has a problem. So I would mainly focus on nice messages when errors or warnings arise. |
Yes I agree that in some sense it is nice to have problematic demo data sets, this helps us to constantly test the robustness of the methods and workflows. Pedagogically it is sometimes useful, sometimes harmful. It depends also on the level of teaching and learners. |
Is this PR then safe to merge or should we consider changes? |
For me OK to merge (if @TuomasBorman agrees) but could we check the mia & OMA examples that use the Tengeler tree data (if any?) for possible issues? In addition, it would be good to have a clear example demonstrating this somewhat unexpected behavior; for instance in the data set documentation? |
Yes, I think it is ok to merge. I also fixed the issue about plotRowTree microbiome/miaViz#124 |
We have not been able to fix rworkflows yet, but I think it is ok to bypass check since we are only modifying the dataset |
Hi! Tengeler2020 now has more informative feature names and unique labels for tree tips and nodes. Related to microbiome/miaViz#123.