add some notes to example notebook

kassonlab · Jun 12, 2018 · 1630d1a
1 parent 0b2f9a9
Showing 1 changed file with 67 additions and 17 deletions.
diff --git a/examples/example.ipynb b/examples/example.ipynb
@@ -1,5 +1,22 @@
 {
  "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# gmxapi sample workflow using restrained ensemble plugin\n",
+    "\n",
+    "In this notebook, we will walk through a workflow in which we examine a toy system (alanine-dipeptide) with several distinct regions of conformational space, then apply a restrained ensemble biased sampling method to explore the conformational ensemble near the configuration of interest.\n",
+    "\n",
+    "This system was chosen for its low computational cost and well-established literature.\n",
+    "\n",
+    "The biased sampling method we will use follows a restrained ensemble technique that applies a pair restraint between selected atoms so that an (experimentally) observable pair distance distribution guides MD sampling. The restraint force is a function of the difference between the target distribution and the simulated ensemble distribution. Our intent is not to promote this biasing technique for this particular system, but rather to demonstrate, all at once, a gmxapi workflow, the gmxapi MD plugin framework, and one of the example plugin implementations included in the sample_restraint repository. The plugin was developed for simulations requiring tens of thousands of CPU hours, but these examples run in at most a few minutes on a desktop computer.\n",
+    "\n",
+    "The `gmx` Python module is from the gmxapi package. The plugins built with this `sample_restraint` repository are bundled in a package named `myplugin`. While some users may find the restrained ensemble plugin useful, the repository is intended to serve as a template and starting point for developing custom pair restraint potentials. By claiming what is hopefully the least interesting of all possible plugin names, I have left the better ones free: researchers are encouraged to rename both the repository and the Python module.\n",
+    "\n",
+    "A note on nomenclature: in Python lingo, `myplugin` is a Python package, a Python module, and a C++ extension module, but these classifications are not generally equivalent. In this case, the code that calculates forces is written in C++ and built into a shared object library that can be imported into Python. Python objects created with the package's functions can be passed through gmxapi, which lets GROMACS create local (compiled C++) objects that support high-performance MD simulation while executing the specified workflow."
+   ]
+  },
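To make "the restraint force is a function of the difference between the target distribution and the simulated ensemble distribution" concrete, here is a minimal pure-Python sketch. It assumes one scalar pair distance per ensemble member, Gaussian smoothing of width `sigma` on a fixed grid, and a squared-difference penalty with force constant `k`. This is an illustrative formulation chosen for this example, not the restrained ensemble plugin's actual potential, and every name in it is hypothetical.

```python
import math


def gaussian(x, sigma):
    """Normalized Gaussian kernel used to smooth individual pair distances."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def smoothed_distribution(distances, grid, sigma):
    """Ensemble-averaged, Gaussian-smoothed pair distance distribution sampled on a grid."""
    n = len(distances)
    return [sum(gaussian(b - r, sigma) for r in distances) / n for b in grid]


def ensemble_forces(distances, target, grid, sigma, k):
    """Force along each member's pair distance for the toy penalty
    U = k/2 * sum_b (P_sim(b) - P_target(b))**2 * db, i.e. F_i = -dU/dr_i.
    """
    n = len(distances)
    db = grid[1] - grid[0]
    sim = smoothed_distribution(distances, grid, sigma)
    diff = [s - t for s, t in zip(sim, target)]
    forces = []
    for r in distances:
        # dP_sim(b)/dr_i = g(b - r_i) * (b - r_i) / (n * sigma**2)
        dU_dr = k * db * sum(
            d * gaussian(b - r, sigma) * (b - r) / (n * sigma ** 2)
            for d, b in zip(diff, grid)
        )
        forces.append(-dU_dr)
    return forces


# Four members sit at 1.0 while the hypothetical target distribution peaks at 2.0.
grid = [0.05 * i for i in range(81)]  # grid from 0 to 4 in steps of 0.05
target = smoothed_distribution([2.0] * 4, grid, sigma=0.5)
forces = ensemble_forces([1.0] * 4, target, grid, sigma=0.5, k=100.0)
print(forces)  # all positive: each member is pushed toward larger distances
```

Because the penalty only depends on the ensemble-wide distribution, the force vanishes once the simulated and target distributions agree, even though individual members may sit at different distances.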
   {
    "cell_type": "code",
    "execution_count": null,
@@ -31,18 +48,19 @@
    "outputs": [],
    "source": [
     "# This only works if the gmx binary path was set in the parent process before launching the Jupyter server.\n",
-    "def find_program(program): \n",
-    "    \"\"\"Return the first occurrence of program in PATH or None if not found.\"\"\"\n",
-    "    for path in os.environ[\"PATH\"].split(os.pathsep):\n",
-    "        fpath = os.path.join(path, program)\n",
-    "        if os.path.isfile(fpath) and os.access(fpath, os.X_OK):\n",
-    "            return fpath\n",
-    "    return None\n",
-    "gmx_path = find_program(\"gmx\")\n",
-    "if gmx_path is None:\n",
-    "    gmx_path = find_program(\"gmx_mpi\")\n",
-    "if gmx_path is None:\n",
-    "    raise UserWarning(\"gmx executable not found in path.\")"
+    "# \\todo Make the docker image use the jovyan user PATH\n",
+    "# def find_program(program): \n",
+    "#     \"\"\"Return the first occurrence of program in PATH or None if not found.\"\"\"\n",
+    "#     for path in os.environ[\"PATH\"].split(os.pathsep):\n",
+    "#         fpath = os.path.join(path, program)\n",
+    "#         if os.path.isfile(fpath) and os.access(fpath, os.X_OK):\n",
+    "#             return fpath\n",
+    "#     return None\n",
+    "# gmx_path = find_program(\"gmx\")\n",
+    "# if gmx_path is None:\n",
+    "#     gmx_path = find_program(\"gmx_mpi\")\n",
+    "# if gmx_path is None:\n",
+    "#     raise UserWarning(\"gmx executable not found in path.\")"
    ]
   },
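The commented-out `find_program()` helper re-implements a PATH search. The standard library's `shutil.which()` performs the same kind of lookup (an existing, executable file on PATH), so a shorter equivalent might look like the sketch below. The `find_gmx` name is hypothetical, and this version warns instead of raising so that a missing binary does not abort the notebook.

```python
import shutil
import warnings


def find_gmx(candidates=("gmx", "gmx_mpi")):
    """Return the path of the first candidate executable found on PATH, or None.

    shutil.which() checks PATH entries for an existing, executable file, much like
    the commented-out find_program() helper above.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


gmx_path = find_gmx()
if gmx_path is None:
    # Like the original cell, this only sees the PATH inherited by the Jupyter server.
    warnings.warn("gmx executable not found in PATH.", UserWarning)
```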
   {
@@ -51,27 +69,45 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Get the path to the `gmx` executable associated with the library we linked against so that we can wrap CLI tools not yet in the API.\n",
     "gmx_path = os.path.join(os.environ['HOME'], 'install/gromacs/bin/gmx')\n",
     "assert os.access(gmx_path, os.X_OK)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the following cell, we set the path to the directory where some input files have been stashed.\n",
+    "It is a subdirectory of the `examples` directory and should contain a topology, MD parameters file, and four (previously equilibrated) atomic configurations from the same alanine-dipeptide system for independent trajectories in an ensemble simulation."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Make sure we've got access to the files we expect.\n",
     "datadir = os.path.abspath('alanine-dipeptide')\n",
     "workingdir = os.path.basename(datadir)\n",
     "os.listdir(datadir)"
    ]
   },
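Listing the directory shows what is there but does not say whether anything is missing. A small check like the following sketch could report problems before `grompp` runs. Only the `equil{N}.gro` naming comes from this notebook; matching the topology and MD parameters files by their usual `.top` and `.mdp` extensions is an assumption, and `check_ensemble_inputs` is a hypothetical helper.

```python
import glob
import os


def check_ensemble_inputs(datadir, n_members=4):
    """Return a list of problems found in datadir; an empty list means inputs look complete.

    Assumes the topology ends in .top, the MD parameters file ends in .mdp,
    and starting configurations are named equil0.gro, equil1.gro, ...
    """
    problems = []
    if not glob.glob(os.path.join(datadir, "*.top")):
        problems.append("no topology (*.top) file")
    if not glob.glob(os.path.join(datadir, "*.mdp")):
        problems.append("no MD parameters (*.mdp) file")
    for i in range(n_members):
        gro = os.path.join(datadir, "equil{}.gro".format(i))
        if not os.path.isfile(gro):
            problems.append("missing " + os.path.basename(gro))
    return problems
```

In the notebook, `assert not check_ensemble_inputs(datadir), check_ensemble_inputs(datadir)` would stop early with a readable list of what is missing.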
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "gmxapi 0.0.5 requires TPR files as input but provides no API tool to generate them from MDP files, so we wrap the `gmx grompp` command-line tool to generate run input files for the four simulations."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
+    "# Turn input files into runnable binary job input.\n",
     "for structure in range(4):\n",
     "    structure_file = os.path.join(datadir, 'equil{}.gro'.format(structure))\n",
     "    tpr_file = os.path.join(datadir, 'input{}.tpr'.format(structure))\n",
@@ -82,6 +118,13 @@
     "    subprocess.call([gmx_path, \"grompp\"] + grompp_args)"
    ]
   },
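The cell above calls `subprocess.call()`, whose return code is not checked in the lines visible here, so a failed `grompp` run would go unnoticed until the missing TPR file is loaded. A sketch using `subprocess.run(..., check=True)` stops at the first failure instead. The `-f`, `-c`, `-p`, and `-o` options are standard `gmx grompp` flags; because the actual `grompp_args` are not shown in this diff, the `mdp_name` and `top_name` parameters and both helper names are hypothetical.

```python
import os
import subprocess


def grompp_command(gmx_path, mdp_file, structure_file, topology_file, tpr_file):
    """Build the argument list for `gmx grompp`, which writes a .tpr run input file."""
    return [gmx_path, "grompp",
            "-f", mdp_file,        # MD parameters
            "-c", structure_file,  # starting configuration
            "-p", topology_file,   # topology
            "-o", tpr_file]        # output run input file


def make_tprs(gmx_path, datadir, mdp_name, top_name, n_members=4):
    """Run grompp once per ensemble member; raise CalledProcessError if any call fails."""
    tpr_files = []
    for i in range(n_members):
        cmd = grompp_command(
            gmx_path,
            os.path.join(datadir, mdp_name),
            os.path.join(datadir, "equil{}.gro".format(i)),
            os.path.join(datadir, top_name),
            os.path.join(datadir, "input{}.tpr".format(i)),
        )
        subprocess.run(cmd, check=True)
        tpr_files.append(cmd[-1])
    return tpr_files
```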
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We generated the input files formulaically above. Next, we load the list of four TPR files into a specification of work. The result is a dependency graph of gmxapi operations that is nominally human-readable but, more importantly, serializable and sufficient to direct the construction of a data-flow graph and the lower-level API calls that execute the intended work."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -91,15 +134,22 @@
     "tpr_files = [os.path.join(datadir, 'input{}.tpr'.format(i)) for i in range(4)]\n",
     "md = gmx.workflow.from_tpr(input=tpr_files, grid=[1,1,1])\n",
     "\n",
-    "print(\"MD simulation element:\\n{}\".format(md.serialize()))\n",
+    "print(\"MD simulation element:\\n\\n{}\".format(md.serialize()))\n",
     "\n",
-    "print(\"\\nWork specification (pretty printed)\")\n",
+    "print(\"\\nWork specification (pretty printed)\\n\")\n",
     "print(str(md.workspec))\n",
     "\n",
-    "print(\"\\nSerialized work specification\")\n",
+    "print(\"\\nSerialized work specification\\n\")\n",
     "print(md.workspec.serialize())"
    ]
   },
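Why a serializable dependency graph is useful can be illustrated with a toy structure: once the graph is plain data, it can be written out, read back elsewhere, and ordered so that every element runs after what it depends on. The dictionary layout and the `load_tpr` operation name below are hypothetical and are not gmxapi's work specification schema; the `md` to `potential` edge mirrors the `md.add_dependency(potential)` call later in this notebook.

```python
import json

# Hypothetical toy work specification: each element lists the elements it depends on.
workspec = {
    "tpr_input": {"operation": "load_tpr", "depends": []},
    "md": {"operation": "md", "depends": ["tpr_input", "potential"]},
    "potential": {"operation": "ensemble_restraint", "depends": []},
}


def execution_order(spec):
    """Return element names so each comes after its dependencies (depth-first topological sort)."""
    order, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at " + name)
        visiting.add(name)
        for dep in spec[name]["depends"]:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in sorted(spec):
        visit(name)
    return order


text = json.dumps(workspec, sort_keys=True)  # serialize for storage or transmission
restored = json.loads(text)                  # a consumer can rebuild the same graph
print(execution_order(restored))             # ['tpr_input', 'potential', 'md']
```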
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the initial version of this walk-through, we have not chosen or implemented a way to execute the 4-rank simulation ensemble from within this notebook. We can run a single ensemble member (below), or we can resort to the Python script in this same directory: from `sample_restraint/examples`, run `mpiexec -n 4 python -m mpi4py example.py` to run the 4-member ensemble and generate the data for the first Ramachandran plot."
+   ]
+  },
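One way a script launched with `mpiexec -n 4 python -m mpi4py ...` could divide the work is to let each MPI rank pick the TPR file with its own index. The sketch below illustrates that idea only; it is not the contents of `example.py`, which this diff does not show, and both helper names are hypothetical. `MPI.COMM_WORLD.Get_rank()` is the standard mpi4py call for a process's rank.

```python
import os


def member_input(rank, datadir, n_members=4):
    """Map an MPI rank to its ensemble member's run input file (one member per rank)."""
    if not 0 <= rank < n_members:
        raise ValueError(
            "rank {} has no ensemble member; launch exactly {} ranks".format(rank, n_members))
    return os.path.join(datadir, "input{}.tpr".format(rank))


def current_rank():
    """Rank of this process in MPI_COMM_WORLD, or 0 when mpi4py is not installed."""
    try:
        from mpi4py import MPI
    except ImportError:
        return 0  # serial fallback: behave like the single-member run in this notebook
    return MPI.COMM_WORLD.Get_rank()
```

Inside such a script, `member_input(current_rank(), datadir)` would give each of the four ranks a different `input{N}.tpr`, and launching with the wrong number of ranks fails loudly rather than silently skipping or duplicating members.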
   {
    "cell_type": "code",
    "execution_count": null,
@@ -280,9 +330,9 @@
    "outputs": [],
    "source": [
     "potential = gmx.workflow.WorkElement(namespace=\"myplugin\",\n",
-    "                                     operation=\"create_restraint\",\n",
+    "                                     operation=\"ensemble_restraint\",\n",
     "                                     params=[1, 4, 2.0, 10000.0])\n",
-    "potential.name = \"harmonic_restraint\"\n",
+    "potential.name = \"restrained_ensemble\"\n",
     "md.add_dependency(potential)"
    ]
   },