Commit

minor update getting started notebook
rargelaguet committed Feb 11, 2023
1 parent a43018c commit 09062fb
Showing 2 changed files with 21 additions and 64 deletions.
20 changes: 10 additions & 10 deletions .gitignore
@@ -2,11 +2,8 @@
*Icon*
.DS_Store

# Rstudio projects
*.Rproj
.Rhistory

*_site/

# Pycharm
.idea

@@ -34,14 +31,17 @@ var/
*.egg-info/
.installed.cfg
*.egg
.Rproj.user
*.Rcheck
.Rproj.user
.Rhistory
.RData

*.ipynb_checkpoints

*_cache/
*_files/
*_site/

*.tar.gz

# pypi
push_pypi.sh

*.tar.gz
# virtual environments
.venv
65 changes: 11 additions & 54 deletions mofapy2/notebooks/getting_started_python.ipynb
@@ -16,7 +16,7 @@
"metadata": {},
"source": [
"This notebook contains a detailed tutorial on how to train MOFA using Python.\n",
"A template script to run the code below can be found [here](https://github.com/bioFAM/MOFA2/blob/master/inst/scripts/template_script.py)."
"A template script to run the code below can be found [here](https://github.com/bioFAM/MOFA2/tree/master/inst/scripts)."
]
},
{
@@ -253,7 +253,6 @@
},
"source": [
"### 2.3 Define data options\n",
"- **scale_groups**: if groups have different ranges/variances, it is good practice to scale each group to unit variance. Default is False\n",
"- **scale_views**: if views have different ranges/variances, it is good practice to scale each view to unit variance. Default is False"
]
},
@@ -269,7 +268,6 @@
"outputs": [],
"source": [
"ent.set_data_options(\n",
" scale_groups = False, \n",
" scale_views = False\n",
")"
]
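For context on what `scale_views` does, here is a minimal numpy sketch (not MOFA2 internals) of scaling a view to unit variance; the global-standard-deviation convention is an assumption for illustration:

```python
import numpy as np

# Hypothetical view matrix (samples x features) with a large dynamic range.
rng = np.random.default_rng(0)
view = rng.normal(loc=5.0, scale=12.0, size=(100, 20))

# Unit-variance scaling, analogous in spirit to scale_views=True:
# divide the whole view by its standard deviation.
view_scaled = view / view.std()

print(round(view_scaled.std(), 6))  # 1.0
```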
@@ -281,7 +279,7 @@
"### 2.4) Add the data to the model\n",
"\n",
"This has to be run after defining the data options\n",
"- **likelihoods**: a list of strings, either \"gaussian\", \"poisson\" or \"bernoulli\". If None (default), they are guessed internally"
"- **likelihoods**: a list of strings, either \"gaussian\" (default), \"poisson\" or \"bernoulli\""
]
},
{
@@ -320,14 +318,11 @@
}
],
"source": [
"# option 1: data.frame format (slower)\n",
"# option 1: data.frame format\n",
"ent.set_data_df(data_dt, likelihoods = [\"gaussian\",\"gaussian\"])\n",
"\n",
"# option 2: nested matrix format (faster)\n",
"ent.set_data_matrix(data_mat, likelihoods = [\"gaussian\",\"gaussian\"])\n",
"\n",
"# AnnData format\n",
"# (...)"
"# option 2: nested matrix format\n",
"ent.set_data_matrix(data_mat, likelihoods = [\"gaussian\",\"gaussian\"])"
]
},
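The two input layouts above can be sketched with toy data as follows. The long-format column names and the `data_mat[view][group]` nesting order follow the MOFA2 documentation, but verify them against your mofapy2 version:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
samples = [f"sample_{i}" for i in range(5)]
features = [f"feature_{j}" for j in range(3)]

# Option 1: long data.frame with one row per measurement.
data_dt = pd.DataFrame(
    [(s, "group_0", f, "view_0", rng.normal()) for s in samples for f in features],
    columns=["sample", "group", "feature", "view", "value"],
)

# Option 2: nested list of matrices, indexed as data_mat[view][group],
# each of shape (n_samples, n_features). One view, one group here.
data_mat = [[rng.normal(size=(len(samples), len(features)))]]

print(data_dt.shape)         # (15, 5)
print(data_mat[0][0].shape)  # (5, 3)
```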
{
@@ -338,8 +333,7 @@
"\n",
"- **factors**: number of factors\n",
"- **spikeslab_weights**: use spike-slab sparsity prior in the weights? default is TRUE\n",
"- **ard_factors**: use ARD prior in the factors? Default is TRUE if using multiple groups. This is guessed by default.\n",
"- **ard_weights**: use ARD prior in the weights? Default is TRUE if using multiple views. This is guessed by default.\n",
"- **ard_weights**: use ARD prior in the weights? Default is TRUE if using multiple views.\n",
"\n",
"Only change the default model options if you are familiar with the underlying mathematical model!\n"
]
@@ -374,7 +368,6 @@
"ent.set_model_options(\n",
" factors = 10, \n",
" spikeslab_weights = True, \n",
" ard_factors = True,\n",
" ard_weights = True\n",
")"
]
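To build intuition for the `spikeslab_weights` option, here is a standalone numpy sketch of sampling from a spike-and-slab prior; the inclusion probability is made up for illustration and this is not MOFA2's inference code:

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_factors = 200, 10

# Spike-and-slab: each weight is a Bernoulli "on/off" switch times a
# Gaussian slab, driving many loadings to exactly zero (sparsity).
theta = 0.3                                        # inclusion probability (assumed)
s = rng.random((n_features, n_factors)) < theta    # spike indicators
w = s * rng.normal(size=(n_features, n_factors))   # sparse weight matrix

print(w[~s].sum())  # 0.0 -> switched-off weights are exactly zero
```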
@@ -385,13 +378,9 @@
"source": [
"## 4) Set training options\n",
"\n",
"- **iter**: number of iterations. Default is 1000.\n",
"- **convergence_mode**: \"fast\", \"medium\", \"slow\". For exploration, the fast mode is good enough.\n",
"- **startELBO**: initial iteration to compute the ELBO (the objective function used to assess convergence)\n",
"- **freqELBO**: frequency of computations of the ELBO (the objective function used to assess convergence)\n",
"- **convergence_mode**: \"fast\" (default), \"medium\", \"slow\".\n",
"- **dropR2**: minimum variance explained criteria to drop factors while training\n",
"gpu_mode: use GPU mode? (needs cupy installed and a functional GPU, see https://cupy.chainer.org/)\n",
"- **verbose**: verbose mode?\n",
"- **gpu_mode**: use GPU? (needs cupy installed and a functional GPU, see https://biofam.github.io/MOFA2/gpu_training.html)\n",
"- **seed**: random seed"
]
},
@@ -420,45 +409,13 @@
],
"source": [
"ent.set_train_options(\n",
" iter = 1000, \n",
" convergence_mode = \"fast\", \n",
" startELBO = 1, \n",
" freqELBO = 1, \n",
" dropR2 = 0.001, \n",
" gpu_mode = True, \n",
" verbose = False, \n",
" seed = 1\n",
")"
]
},
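Before passing `gpu_mode = True`, it can help to probe for a working cupy install and fall back to CPU training otherwise. A guarded sketch (this only checks that cupy imports and can see a CUDA device):

```python
# Probe for a usable GPU before requesting gpu_mode=True.
try:
    import cupy
    cupy.cuda.runtime.getDeviceCount()  # raises if no CUDA device is visible
    gpu_mode = True
except Exception:
    gpu_mode = False

print(f"training with gpu_mode={gpu_mode}")
```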
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 5) (Optional) stochastic inference options\n",
"\n",
"If the number of samples is very large (in the order of >1e4), you may want to try the stochastic inference scheme. However, it requires some additional hyperparameters that in some data sets may need to be optimised (see the [stochastic vignette](https://raw.githack.com/bioFAM/MOFA2/master/MOFA2/vignettes/stochastic_inference.html)).\n",
"\n",
"- **batch_size**: float value indicating the batch size (as a fraction of the total data set: either 0.10, 0.25 or 0.50)\n",
"- **learning_rate**: learning rate (we recommend values from 0.25 to 0.50)\n",
"- **forgetting_rate**: forgetting rate (we recommend values from 0.75 to 1.0)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# We do not want to use stochastic inference for this data set\n",
"\n",
"# ent.set_stochastic_options(\n",
"# batch_size = 0.5,\n",
"# learning_rate = 0.5, \n",
"# forgetting_rate = 0.25\n",
"# )"
]
},
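In stochastic variational inference, a `learning_rate`/`forgetting_rate` pair typically parameterises a Robbins–Monro step-size schedule. A sketch of that schedule (the offset `tau` is an assumption here, not a documented mofapy2 default):

```python
# Step sizes rho_t = learning_rate * (t + tau) ** (-forgetting_rate),
# as commonly used in stochastic variational inference.
learning_rate = 0.5
forgetting_rate = 0.75
tau = 1.0  # assumed offset, not taken from mofapy2

rho = [learning_rate * (t + tau) ** (-forgetting_rate) for t in range(1, 101)]

# Step sizes shrink monotonically, so later minibatches move the
# variational parameters less than early ones.
print(rho[0] > rho[-1])  # True
```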
{
"cell_type": "markdown",
"metadata": {},
@@ -565,13 +522,13 @@
"source": [
"## 7) Downstream analysis\n",
"\n",
"This finishes the tutorial on how to train a MOFA object from python. To continue with the downstream analysis you have to switch to R. Please, follow [this tutorial](https://raw.githack.com/bioFAM/MOFA2/master/MOFA2/vignettes/downstream_analysis.html)."
"This finishes the tutorial on how to train a MOFA model from Python. To continue with the downstream analysis you can either use the [mofax](https://github.com/gtca/mofax) Python package or the [MOFA2](https://www.bioconductor.org/packages/release/bioc/html/MOFA2.html) R package. Please visit our [tutorials](https://biofam.github.io/MOFA2/tutorials.html) webpage for more information."
]
}
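Trained MOFA models are saved to an HDF5 file, so a quick sanity check before downstream analysis is to list the file's top-level groups. A guarded sketch using h5py (returns None if h5py is missing or the file cannot be opened; the filename is hypothetical):

```python
def list_model_groups(path):
    """Return the top-level HDF5 group names of a trained MOFA model file,
    or None if h5py is unavailable or the file cannot be read."""
    try:
        import h5py
    except ImportError:
        return None
    try:
        with h5py.File(path, "r") as f:
            return sorted(f.keys())
    except OSError:
        return None

# Prints the group names if "model.hdf5" exists next to this script.
print(list_model_groups("model.hdf5"))
```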
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -585,7 +542,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.10.5"
},
"latex_metadata": {
"affiliation": "European Bioinformatics Institute, Cambridge, UK",
