-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding matrix completion in rclr/rAitchison #570
Comments
Hey Devs & co., Thanks as ever for the hard work. Came across this issue today (#433, #570) - as stands This seems fairly dangerous given that there's no warning in the output or documentation - the average vegan user, being trusting and kind, would have no reason to suspect their data was suddenly a strange, non-sparse beast. Could a note be added in the meantime to flag this?... I could mock up a PR or similar, but not sure warranted. |
Thanks @handibles - I agree, and planning to go through this during the Xmas vacations if the situation allows. If there are further bottlenecks we should add a warning. |
Gentle post-yule pre-student bump. |
Yes, almost getting there..! Just a bit more time. |
Hi antagomir, and thanks for incorporating and developing the robust Aitichson method within the vegdist function. I appreciate your work! I have hesitated to apply the robust Aitichson in vegdist, as I observed a decrease in the explained variation in my models without matrix completion, but I prefer to keep my workflow in R. Have you been able to incorporate the matrix completion algorithm? I'm keen to give it a whirl! |
If I am lucky I might be able to do it next week. |
Ok @LukeLikesDirt , experimental version for the matrix completion is now available for testing and feedback at https://github.com/antagomir/vegan. I am preparing a PR after doing the following:
To get started testing the code:
|
Hi @antagomir Thanks for getting to this so quickly, and apologies for taking so long to get back to you. Unfortunately, I can't provide feedback because I can't download your VEGAN version due to some kind of Gfortran comparability error. I have Mac Monterey and Gfortran v12.1 for intel (the latest version for Monterey). Not asking you to troubleshoot my issue, but I tried the older versions of Gfortran but still failed to download your version of VEGAN. I have limited coding ability - I'm an ecologist - and it looks like I'll need to wait until the update to go live before testing. Thank you for your work. Much appreciated. |
Noprob. I can try to check and finalize this soonish. Are you able to specify which vegan version you had problems with? Just for my checking purposes. |
The initial issue I had was in early 2022, when I was running R 4.0.something, and a version of VEGAN that I would have installed during late 2021 (sorry for the vague response). I was comparing my results from VEGAN to those of DEICODE and noted a stark variation in my results - results based on DEICODE had greater power compared to those from VEGAN. I then manually computed robust Aichison distances using rCLR + Euclid and realised that VAGAN was not implementing the matrix completion step, at which point I returned to using DEICODE. Looking forward to the new release! Luke |
Ok I have some DEICODE installation issues on 3 different systems it seems but would be happy to hear if anyone has the chance to run comparisons (while I continue to troubleshoot that one..) |
Thank you for this so interesting chat and issue. As @LukeLikesDirt , I have been trying to obtain similar transformed matrices as those obtained with DEICODE (so RCLR + Matrix completion) in R, but seems like those with vegan in R do not perform the Matrix completion, and seem to provide less robust outcomes. In cross talks, also people working with DEICODE in QIIME2 were trying to obtain the RCLR+Matrix transformed data from the DEICODE pipeline, with indeed only provide as outputs the ordinationa the distance matrix, but not the transformed data applying RCLR + Matrix completion, see issues: biocore/DEICODE#59 and biocore/DEICODE#51. I am wondering if any has at date gotten this wished matrix transform type (RCLR+Matrix Completion) resolved to be obtained in a straightforward way, for those not bioinformatic programers... I have tried both the R options here and those in the issue mentioned before. But, I am not sure I am doing things right. In R: I installed Vegan 2.6.5, and still get diverse results, and less robust as with DEICODE. But, I am not sure if my functions are doing/calling the matrix completion, as if I copy and run the new scripts provided by @antagomir in the revised scritps for the function "vegan::decostand(otu_table(phy), "rclr")", I am not sure how to force vegan to call those functions with Matrix completion with the same names you provided, and not the ones coming in the original package vegan version 2.6.5. Not sure if I can explain well... as said, I am not a bioinformatics programer... Same with trying to obtain the RCLR+Matrix completion on the DEICODE pipeline: Here, even if I substitute the rpca.py file by the one revised in Github which provides RCLR+matrix completion output object, I do not get this to work... So, is there a way to obtain, either with R or with QIIME2 this RCLR+Matrix Completion transformed matrix data?... Thank you and happy day! LAU |
@antagomir Is this "robust.aitchison" distance suitable for humann3(https://huttenhower.sph.harvard.edu/humann/) output data, the CPM(counts per million reads) normalized data? Thank you! |
Hi @LauGuaUuu and @LukeLikesDirt - these should now be solved in PR #667 , also thanks to great help from @cameronmartino . The PR is pending for review and these updates should become available through the pkg if the PR is accepted. |
@Ivy-ops I think this should be discussed elsewhere but yes, in general these are applicable to relative abundance data. |
Thanks for your work and for keeping me updated Leo! I am looking forward to giving this a whirl. Cheers |
Looking forward to whirling this also! Thanks as ever @antagomir . Re using it on HUMAnN as per above:
As:
, had thought that compositional analysis would not be suitable for pathway abundances (genes should be okay but those tables are exhaustive and often uninformative). However, the HUMAnN community regularly use CLR transforms for pathways 1 2, and can see some logic for it. As per @antagomir this could do with discussion elsewhere, but if using CLR on the pathway abundances, I would make sure that you're using the right features (i.e. exclude |
The issue #433 has received request (from @johannesbjork) to add robust matrix completion in the robust CLR (rclr) method in the vegan::decostand function, following the procedure in Martino et al. (2019). I agree with this suggestion.
The R package ROptSpace includes an implementation of the OptSpace robust matrix completion algorithm used by Martino et al. (2019). There are many variants of matrix completion for missing values but this solution has been recently promoted in microbial ecology, and I am not immediately aware of other solutions that should be prioritized over this one.
It would be easy to add matrix completion in the rclr transformation in vegan::decostand by using the ROptSpace functionality. However, this would introduce a new dependency in vegan, and this may not be ideal.
Fortunately, it seems that the ROptSpace R package has been released with the MIT license and vegan with GPL-2. License compatibility allows including MIT licensed code in GPL-2. It seems possible to add this function in vegan and cite the original source, then use this to add the matrix completion step in decostand
rclr
method .I can implement this change and open a PR but would like to cross-check first if the vegan developers?
Edit: another solution is to use R function filling::fill.OptSpace but the license is here MIT as well, so it doesn't really change the discussion; Edit2: ROptSpace is primary since "filling" uses it as dependency.
The text was updated successfully, but these errors were encountered: