Commit

minor update getting started notebook
rargelaguet committed Feb 11, 2023
1 parent a43018c commit 09062fb
Showing 2 changed files with 21 additions and 64 deletions.
20 changes: 10 additions & 10 deletions .gitignore
@@ -2,11 +2,8 @@
*Icon*
.DS_Store

# Rstudio projects
*.Rproj
.Rhistory

*_site/

# Pycharm
.idea

@@ -34,14 +31,17 @@ var/
*.egg-info/
.installed.cfg
*.egg
.Rproj.user
*.Rcheck
.Rproj.user
.Rhistory
.RData

*.ipynb_checkpoints

*_cache/
*_files/
*_site/

*.tar.gz

# pypi
push_pypi.sh

*.tar.gz
# virtual environments
.venv
65 changes: 11 additions & 54 deletions mofapy2/notebooks/getting_started_python.ipynb
@@ -16,7 +16,7 @@
"metadata": {},
"source": [
"This notebook contains a detailed tutorial on how to train MOFA using Python.\n",
"A template script to run the code below can be found [here](https://github.com/bioFAM/MOFA2/blob/master/inst/scripts/template_script.py)."
"A template script to run the code below can be found [here](https://github.com/bioFAM/MOFA2/tree/master/inst/scripts)."
]
},
{
@@ -253,7 +253,6 @@
},
"source": [
"### 2.3 Define data options\n",
"- **scale_groups**: if groups have different ranges/variances, it is good practice to scale each group to unit variance. Default is False\n",
"- **scale_views**: if views have different ranges/variances, it is good practice to scale each view to unit variance. Default is False"
]
},
@@ -269,7 +268,6 @@
"outputs": [],
"source": [
"ent.set_data_options(\n",
" scale_groups = False, \n",
" scale_views = False\n",
")"
]
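For context on what `scale_views` does, here is a minimal numpy sketch (not MOFA2 internals) of scaling a view to unit variance; the global-standard-deviation convention is an assumption for illustration:

```python
import numpy as np

# Hypothetical view matrix (samples x features) with a large dynamic range.
rng = np.random.default_rng(0)
view = rng.normal(loc=5.0, scale=12.0, size=(100, 20))

# Unit-variance scaling, analogous in spirit to scale_views=True:
# divide the whole view by its standard deviation.
view_scaled = view / view.std()

print(round(view_scaled.std(), 6))  # 1.0
```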
@@ -281,7 +279,7 @@
"### 2.4) Add the data to the model\n",
"\n",
"This has to be run after defining the data options\n",
"- **likelihoods**: a list of strings, either \"gaussian\", \"poisson\" or \"bernoulli\". If None (default), they are guessed internally"
"- **likelihoods**: a list of strings, either \"gaussian\" (default), \"poisson\" or \"bernoulli\""
]
},
{
@@ -320,14 +318,11 @@
}
],
"source": [
"# option 1: data.frame format (slower)\n",
"# option 1: data.frame format\n",
"ent.set_data_df(data_dt, likelihoods = [\"gaussian\",\"gaussian\"])\n",
"\n",
"# option 2: nested matrix format (faster)\n",
"ent.set_data_matrix(data_mat, likelihoods = [\"gaussian\",\"gaussian\"])\n",
"\n",
"# AnnData format\n",
"# (...)"
"# option 2: nested matrix format\n",
"ent.set_data_matrix(data_mat, likelihoods = [\"gaussian\",\"gaussian\"])"
]
},
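The two input layouts above can be sketched with toy data as follows. The long-format column names and the `data_mat[view][group]` nesting order follow the MOFA2 documentation, but verify them against your mofapy2 version:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
samples = [f"sample_{i}" for i in range(5)]
features = [f"feature_{j}" for j in range(3)]

# Option 1: long data.frame with one row per measurement.
data_dt = pd.DataFrame(
    [(s, "group_0", f, "view_0", rng.normal()) for s in samples for f in features],
    columns=["sample", "group", "feature", "view", "value"],
)

# Option 2: nested list of matrices, indexed as data_mat[view][group],
# each of shape (n_samples, n_features). One view, one group here.
data_mat = [[rng.normal(size=(len(samples), len(features)))]]

print(data_dt.shape)         # (15, 5)
print(data_mat[0][0].shape)  # (5, 3)
```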
{
@@ -338,8 +333,7 @@
"\n",
"- **factors**: number of factors\n",
"- **spikeslab_weights**: use spike-slab sparsity prior in the weights? default is TRUE\n",
"- **ard_factors**: use ARD prior in the factors? Default is TRUE if using multiple groups. This is guessed by default.\n",
"- **ard_weights**: use ARD prior in the weights? Default is TRUE if using multiple views. This is guessed by default.\n",
"- **ard_weights**: use ARD prior in the weights? Default is TRUE if using multiple views.\n",
"\n",
"Only change the default model options if you are familiar with the underlying mathematical model!\n"
]
@@ -374,7 +368,6 @@
"ent.set_model_options(\n",
" factors = 10, \n",
" spikeslab_weights = True, \n",
" ard_factors = True,\n",
" ard_weights = True\n",
")"
]
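To build intuition for the `spikeslab_weights` option, here is a standalone numpy sketch of sampling from a spike-and-slab prior; the inclusion probability is made up for illustration and this is not MOFA2's inference code:

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_factors = 200, 10

# Spike-and-slab: each weight is a Bernoulli "on/off" switch times a
# Gaussian slab, driving many loadings to exactly zero (sparsity).
theta = 0.3                                        # inclusion probability (assumed)
s = rng.random((n_features, n_factors)) < theta    # spike indicators
w = s * rng.normal(size=(n_features, n_factors))   # sparse weight matrix

print(w[~s].sum())  # 0.0 -> switched-off weights are exactly zero
```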
@@ -385,13 +378,9 @@
"source": [
"## 4) Set training options\n",
"\n",
"- **iter**: number of iterations. Default is 1000.\n",
"- **convergence_mode**: \"fast\", \"medium\", \"slow\". For exploration, the fast mode is good enough.\n",
"- **startELBO**: initial iteration to compute the ELBO (the objective function used to assess convergence)\n",
"- **freqELBO**: frequency of computations of the ELBO (the objective function used to assess convergence)\n",
"- **convergence_mode**: \"fast\" (default), \"medium\", \"slow\".\n",
"- **dropR2**: minimum variance explained criteria to drop factors while training\n",
"gpu_mode: use GPU mode? (needs cupy installed and a functional GPU, see https://cupy.chainer.org/)\n",
"- **verbose**: verbose mode?\n",
"- **gpu_mode**: use GPU? (needs cupy installed and a functional GPU, see https://biofam.github.io/MOFA2/gpu_training.html)\n",
"- **seed**: random seed"
]
},
@@ -420,45 +409,13 @@
],
"source": [
"ent.set_train_options(\n",
" iter = 1000, \n",
" convergence_mode = \"fast\", \n",
" startELBO = 1, \n",
" freqELBO = 1, \n",
" dropR2 = 0.001, \n",
" gpu_mode = True, \n",
" verbose = False, \n",
" seed = 1\n",
")"
]
},
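Before passing `gpu_mode = True`, it can help to probe for a working cupy install and fall back to CPU training otherwise. A guarded sketch (this only checks that cupy imports and can see a CUDA device):

```python
# Probe for a usable GPU before requesting gpu_mode=True.
try:
    import cupy
    cupy.cuda.runtime.getDeviceCount()  # raises if no CUDA device is visible
    gpu_mode = True
except Exception:
    gpu_mode = False

print(f"training with gpu_mode={gpu_mode}")
```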
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### 5) (Optional) stochastic inference options\n",
"\n",
"If the number of samples is very large (in the order of >1e4), you may want to try the stochastic inference scheme. However, it requires some additional hyperparameters that in some data sets may need to be optimised (see the [stochastic vignette](https://raw.githack.com/bioFAM/MOFA2/master/MOFA2/vignettes/stochastic_inference.html)).\n",
"\n",
"- **batch_size**: float value indicating the batch size (as a fraction of the total data set: either 0.10, 0.25 or 0.50)\n",
"- **learning_rate**: learning rate (we recommend values from 0.25 to 0.50)\n",
"- **forgetting_rate**: forgetting rate (we recommend values from 0.75 to 1.0)\n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# We do not want to use stochastic inference for this data set\n",
"\n",
"# ent.set_stochastic_options(\n",
"# batch_size = 0.5,\n",
"# learning_rate = 0.5, \n",
"# forgetting_rate = 0.25\n",
"# )"
]
},
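In stochastic variational inference, a `learning_rate`/`forgetting_rate` pair typically parameterises a Robbins–Monro step-size schedule. A sketch of that schedule (the offset `tau` is an assumption here, not a documented mofapy2 default):

```python
# Step sizes rho_t = learning_rate * (t + tau) ** (-forgetting_rate),
# as commonly used in stochastic variational inference.
learning_rate = 0.5
forgetting_rate = 0.75
tau = 1.0  # assumed offset, not taken from mofapy2

rho = [learning_rate * (t + tau) ** (-forgetting_rate) for t in range(1, 101)]

# Step sizes shrink monotonically, so later minibatches move the
# variational parameters less than early ones.
print(rho[0] > rho[-1])  # True
```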
{
"cell_type": "markdown",
"metadata": {},
@@ -565,13 +522,13 @@
"source": [
"## 7) Downstream analysis\n",
"\n",
"This finishes the tutorial on how to train a MOFA object from python. To continue with the downstream analysis you have to switch to R. Please, follow [this tutorial](https://raw.githack.com/bioFAM/MOFA2/master/MOFA2/vignettes/downstream_analysis.html)."
"This finishes the tutorial on how to train a MOFA model from Python. To continue with the downstream analysis you can either use the [mofax](https://github.com/gtca/mofax) Python package or the [MOFA2](https://www.bioconductor.org/packages/release/bioc/html/MOFA2.html) R package. Please visit our [tutorials](https://biofam.github.io/MOFA2/tutorials.html) webpage for more information."
]
}
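Trained MOFA models are saved to an HDF5 file, so a quick sanity check before downstream analysis is to list the file's top-level groups. A guarded sketch using h5py (returns None if h5py is missing or the file cannot be opened; the filename is hypothetical):

```python
def list_model_groups(path):
    """Return the top-level HDF5 group names of a trained MOFA model file,
    or None if h5py is unavailable or the file cannot be read."""
    try:
        import h5py
    except ImportError:
        return None
    try:
        with h5py.File(path, "r") as f:
            return sorted(f.keys())
    except OSError:
        return None

# Prints the group names if "model.hdf5" exists next to this script.
print(list_model_groups("model.hdf5"))
```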
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -585,7 +542,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
"version": "3.10.5"
},
"latex_metadata": {
"affiliation": "European Bioinformatics Institute, Cambridge, UK",
