Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to Capture Common Sources of Variance in Data #57

Open
whitleyo opened this issue Aug 27, 2020 · 1 comment
Open

Failure to Capture Common Sources of Variance in Data #57

whitleyo opened this issue Aug 27, 2020 · 1 comment

Comments

@whitleyo
Copy link

Hi,

I've tried MOFA with several datatypes, some having ~50 samples while others have 15 or 22 samples.

Here's an overview of the data:

image

RNA.vst = vst transformed RNA-seq data
DNAm = DNAm m-values
metab_annot_extract = metabolites, cell extract, annotated
metab_annot_secreted = metabolites, cell secretion, annotated
metab_unannot_extract = metabolites, cell extract, unannotated
crispr_qBF = quantile normalized Bayes Factors from CRISPR screens similar to that in Hart et al. 2015, but using a smaller library.

I ran MOFA with the following training options on this data (20 other models were run, most producing similar results, none having a common axis of variation shared between all datatypes)

## $maxiter
## [1] 20000
## 
## $tolerance
## [1] 0.02
## 
## $DropFactorThreshold
## [1] 0
## 
## $verbose
## [1] 0
## 
## $seed
## [1] 2020

The resulting model has the following explained variance:
image

and correlattion between factors:
image

The results would seem to imply that metabolomics data do not share a common axis with the RNA-seq and DNA methylation data. When I run PCA on eahc of the data matrices as input to MOFA individually however, I get clean or relatively clean separation of clusters identified in RNA-seq data in each datatype:

RNA-seq:
image

DNA methylation:
image

CRISPR Screen:
image

Metabolites, Annotated Cell Extract:
image

Metabolites, Unannotated Cell Extract:
image

image

Metabolites, Annotated Secretion:
image
image

I would have expected a common factor to be found for all datatypes, but I'm wondering if there's too much missing data here.

Session Info:

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
##  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
##  [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] reticulate_1.15             ggplot2_3.3.0              
##  [3] pheatmap_1.0.12             su2cproj_0.1.034           
##  [5] MultiAssayExperiment_1.8.3  SummarizedExperiment_1.12.0
##  [7] DelayedArray_0.8.0          BiocParallel_1.16.6        
##  [9] matrixStats_0.55.0          Biobase_2.42.0             
## [11] GenomicRanges_1.34.0        GenomeInfoDb_1.18.2        
## [13] IRanges_2.16.0              S4Vectors_0.20.1           
## [15] BiocGenerics_0.28.0         MOFA_1.3.1                 
## 
## loaded via a namespace (and not attached):
##  [1] ggrepel_0.8.2          Rcpp_1.0.4.6           lattice_0.20-38       
##  [4] assertthat_0.2.1       digest_0.6.25          foreach_1.5.0         
##  [7] R6_2.4.1               plyr_1.8.6             evaluate_0.14         
## [10] highr_0.8              pillar_1.4.4           zlibbioc_1.28.0       
## [13] rlang_0.4.6            Matrix_1.2-18          rmarkdown_2.1         
## [16] labeling_0.3           stringr_1.4.0          RCurl_1.98-1.2        
## [19] munsell_0.5.0          compiler_3.5.2         vipor_0.4.5           
## [22] xfun_0.12              pkgconfig_2.0.3        ggbeeswarm_0.6.0      
## [25] htmltools_0.4.0        tidyselect_1.0.0       tibble_3.0.1          
## [28] GenomeInfoDbData_1.2.0 codetools_0.2-15       reshape_0.8.8         
## [31] withr_2.2.0            crayon_1.3.4           dplyr_0.8.5           
## [34] rappdirs_0.3.1         bitops_1.0-6           grid_3.5.2            
## [37] GGally_1.4.0           jsonlite_1.6.1         gtable_0.3.0          
## [40] lifecycle_0.2.0        magrittr_1.5           scales_1.1.1          
## [43] stringi_1.4.6          farver_2.0.3           XVector_0.22.0        
## [46] reshape2_1.4.3         doParallel_1.0.15      ellipsis_0.3.1        
## [49] vctrs_0.3.0            cowplot_1.0.0          Rhdf5lib_1.4.3        
## [52] RColorBrewer_1.1-2     iterators_1.0.12       tools_3.5.2           
## [55] glue_1.4.1             beeswarm_0.2.3         purrr_0.3.3           
## [58] yaml_2.2.1             rhdf5_2.26.2           colorspace_1.4-1      
## [61] corrplot_0.84          knitr_1.28
@rargelaguet
Copy link
Contributor

Hi @whitleyo ,
it is an interesting problem. Based on the PCA results, the separation is more clear in some data types than in others, so it is not be surprising if MOFA does not capture a factor that is common across all data types. But it should definitely capture a factor that separates the groups that you are considering. Can you check if this is the case?

P.S. Please upgrade to MOFA2 (https://github.com/bioFAM/MOFA2), it is a better software and we are no longer maintaining this repository. Also, feel free to join the Slack group where we can have a more interactive discussion.

Best,
Ricard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants