work in progress

kassonlab · Dec 24, 2018 · 8ba891a · 8ba891a
1 parent 150914a
commit 8ba891a
Showing 1 changed file with 385 additions and 0 deletions.
diff --git a/examples/updatedExample.ipynb b/examples/updatedExample.ipynb
@@ -0,0 +1,385 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# gmxapi Python module demonstration\n",
+    "This notebook illustrates the Python interface for gmxapi with current and planned functionality and syntax.\n",
+    "Additional design aspects are illustrated where possible."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# module built and installed with https://github.com/kassonlab/gmxapi\n",
+    "import gmx\n",
+    "# module built and installed with https://github.com/kassonlab/sample_restraint\n",
+    "import myplugin"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Some test files are bundled with the package\n",
+    "from gmx.data import tpr_filename"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# wrap `grompp` CLI while procedural interface evolves\n",
+    "source = gmx.commandline_operation('gmx', 'grompp',\n",
+    "                                   input={},\n",
+    "                                   output={})\n",
+    "simulation = gmx.mdrun(input=source)\n",
+    "print(simulation)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The object returned is an element of a complete specification of runnable work.\n",
+    "\n",
+    "(version 0.1.0) WorkElement is a view into a WorkSpec object\n",
+    "\n",
+    "(version 0.0.5) WorkElement has a reference to the WorkSpec it is associated with.\n",
+    "\n",
+    "Though the helper function generates more than one WorkElement, the element associated with the MD simulation is the only meaningful thing to return a handle to. With a convention that all elements have an attribute for the associated workspec, functions requiring workspec inputs can be much more flexible and intuitive to users.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "simulation.workspec"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# WorkElement is individually serializeable as JSON\n",
+    "simulation.serialize()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Referenced WorkSpec is serializeable for persistence and portability.\n",
+    "simulation.workspec.serialize()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(simulation.workspec)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first version of the schema to represent user-requested work has a data structure that is easily serialized with simple grammar. Dependencies of one element on another determines order of processing and allows binding between API objects to be managed at launch. \"gmxapi\" and \"gromacs\" namespaces are reserved for operations provided by the libraries. Other namespaces are assumed to be accessible Python modules.\n",
+    "\n",
+    "Schema version 0.2 should probably specify a character encoding, but will not have major syntactical differences. Its primary purpose will be to establish tighter constraints on content and more elaborate semantics.\n",
+    "\n",
+    "The workflow (and subsets thereof) must be uniquely identifiable to support artifact management and optimal restartability."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "simulation.workspec.uid()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "MD extension code binds itself to the MD runner at run time with a library API. The user interface just expresses the work components and dependencies in launching the workflow. The WorkSpec is buit up with additional elements that assert a loadable module (_namespace_), a Director functor for building up the execution graph (_operation_), parameters to the operation (_params_), and launch sequence (_depends_). The user should not be required to view or understand these details, but automated generation of appropriate helper functions has not yet been added to gmxapi or the sample plugin template code."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Parameters for Hooke's law pair restraint\n",
+    "params = {'sites': [1,4], # indices of participating sites\n",
+    "          'r0': 2.0,      # equilibrium separation\n",
+    "          'k': 100.0      # spring constant\n",
+    "         }"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if gmx.version.api_is_at_least(0, 1, 0):\n",
+    "    # Near-term planned feature is automatic helper functions for plugins\n",
+    "    potential = myplugin.harmonic_restraint(**params)\n",
+    "    \n",
+    "else:\n",
+    "    # This API should not really be visible to users...\n",
+    "    potential = gmx.workflow.WorkElement(namespace='myplugin',\n",
+    "                                         operation='create_restraint',\n",
+    "                                         params=params)\n",
+    "    potential.name = 'harmonicRestraint'\n",
+    "\n",
+    "print(gmx.__version__)    "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Procedural interface hides several abstraction layers\n",
+    "simulation.add_dependency(potential)\n",
+    "status = gmx.run(simulation)\n",
+    "\n",
+    "# gmx.Status interface needs some work\n",
+    "bool(status)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`gmx.run()` provides the \"big green go button\" that users expect, but the hidden layers of abstraction are also accessible to users. gmxapi 0.1 specificies that gmx.run() will use or configure an appropriate execution context for the specified work, launch a session, and run until data flow for results is resolved. It is assumed to be essentially an alias for the following."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with gmx.get_context(simulation.workspec) as session:\n",
+    "    session.run()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The Context abstraction exists to allow modular handling of computing resources and environments. The user could specify non-default Context implementations or load Contexts from other Python modules.\n",
+    "\n",
+    "```\n",
+    "context = gmx.context.ParallelArrayContext(simulation)\n",
+    "with context as session:\n",
+    "    session.run()\n",
+    "```\n",
+    "\n",
+    "Near term plans for Context extensions include\n",
+    "\n",
+    "* support for tMPI or MPI GROMACS builds,\n",
+    "* serial execution fallback for workflows when MPI is not available\n",
+    "* remote SLURM execution\n",
+    "* container management"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Operations in work specifications are essentially assumed to be \"vectorized\" for parallel pipelines. The degree of required synchrony depends on whether and how often \"ensemble\" operations occur that block across multiple pipelines.\n",
+    "\n",
+    "The API to express synchrony is not yet specified. For the short term, parallel pipelines must be executed synchronously to allow operations to communicate using Session-provided ensemble resources.\n",
+    "\n",
+    "Through 0.0.5, gmxapi uses mpi4py to execute arrays of work synchronously. \"Width\" of the workflow can be asserted by work elements according to input parameters, such as the length of a list of input filenames."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "simulation = gmx.workflow.from_tpr([tpr_filename, tpr_filename])\n",
+    "print(simulation.workspec)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`simulation` now represents an array of MD simulation jobs. The singular restraint potential is \"broadcast\" across the trajectory ensemble. Each simulation is bound to separate instances of the restraint potential, each initialized with the same parameters.\n",
+    "\n",
+    "At launch, the restraint code is provided with ensemble resources by the Session with which to perform operations such as an ensemble-wide reduction of array data.\n",
+    "\n",
+    "Individual restraint objects don't know or care whether or how many other members are in the ensemble.\n",
+    "\n",
+    "The Context proxies access to the various filesystem-backed results and to the parallel (MPI) execution context."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if gmx.version.api_is_at_least(0, 1, 0):\n",
+    "    potential = myplugin.harmonic_restraint(**params)\n",
+    "else:\n",
+    "    potential = gmx.workflow.WorkElement(namespace='myplugin',\n",
+    "                                         operation='create_restraint',\n",
+    "                                         params=params)\n",
+    "    potential.name = 'harmonicRestraint'\n",
+    "\n",
+    "simulation.add_dependency(potential)\n",
+    "\n",
+    "print(simulation.workspec)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`sample_restraint/examples/restrained-ensemble.py` is the (updated) script used for the restrained ensemble proof-of-concept workflow implemented with gmxapi 0.0.4. [DOI: 10.1093/bioinformatics/bty484](https://doi.org/10.1093/bioinformatics/bty484)\n",
+    "\n",
+    "As formulated below, an ensemble of 20 trajectories was coupled by a restrained ensemble plugin potential that periodically combined statistical information from across the ensemble to refine a restraint potential."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Restrained-ensemble formalism is a variant of that defined by Roux et al., 2013\n",
+    "\n",
+    "import os\n",
+    "import sys\n",
+    "import gmx\n",
+    "\n",
+    "import logging\n",
+    "logging.getLogger().setLevel(logging.DEBUG)\n",
+    "# create console handler\n",
+    "ch = logging.StreamHandler()\n",
+    "ch.setLevel(logging.DEBUG)\n",
+    "# create formatter and add it to the handler\n",
+    "formatter = logging.Formatter('%(asctime)s:%(name)s:%(levelname)s: %(message)s')\n",
+    "ch.setFormatter(formatter)\n",
+    "# add the handlers to the logger\n",
+    "logging.getLogger().addHandler(ch)\n",
+    "logger = logging.getLogger()\n",
+    "\n",
+    "import myplugin\n",
+    "\n",
+    "logger.info(\"myplugin is {}\".format(myplugin.__file__))\n",
+    "\n",
+    "if len(sys.argv) > 1:\n",
+    "    size = int(sys.argv[1])\n",
+    "else:\n",
+    "    size = 20\n",
+    "input_dir_list = ['aa_{:02d}'.format(i) for i in range(size)]\n",
+    "print(\"Input directory list: {}\".format(input_dir_list))\n",
+    "\n",
+    "tpr_list = [os.path.abspath(os.path.join(directory, 'mRMR.tpr')) for directory in input_dir_list]\n",
+    "\n",
+    "# dt = 0.002\n",
+    "# First restraint applied between atoms 387 and 2569\n",
+    "# Second restraint applied between atom 1330 and 2520\n",
+    "# Restraint site coordinates relative to atom 1735\n",
+    "# Gathers 50 distance samples over 10ps, then averages the histogram across the ensemble to\n",
+    "# get a smooth histogram for the sample window. At each update (10 ps), updates the bias\n",
+    "# potential with the average statistics from the last 20 windows.\n",
+    "params = {\n",
+    "    'sites': [387, 1735, 2569],\n",
+    "    'k': 100.,\n",
+    "    'sigma': 0.2,\n",
+    "    'nbins': 70,\n",
+    "    'binWidth': 0.1,\n",
+    "    'max_dist': 6.0,\n",
+    "    'min_dist': 1.9,\n",
+    "    'experimental': [1.799741371805743e-21, 1.394386099050501e-19, 8.502972718446353e-18, 4.085581973134053e-16, 1.548764419813057e-14, 4.638792711370235e-13, 1.0996581066864261e-11, 2.0673577567003664e-10, 3.0896196375481035e-09, 3.680765701683053e-08, 3.5071251461730994e-07, 2.683161946307409e-06, 1.6559233749685584e-05, 8.289071953350906e-05, 0.00033870482482321125, 0.0011381605345541928, 0.0031720369601255603, 0.007403098031042665, 0.014627984136430199, 0.024781412546281113, 0.036538520471987135, 0.04776926450937005, 0.056703917249967935, 0.06290983130284482, 0.06723680313442071, 0.07080726976949929, 0.07402160052652465, 0.0765402296381977, 0.07828089499810763, 0.08047585508402359, 0.08570745927698609, 0.09674399700081715, 0.11510583738691207, 0.14051216332953817, 0.17140325711911122, 0.20554101371952832, 0.23970384865306096, 0.2690817283268939, 0.2883208679737597, 0.2947929622276882, 0.2912919631495654, 0.28445334990710786, 0.27916526960634136, 0.2740440567694397, 0.2627885645671059, 0.24084583312911054, 0.2119920710658402, 0.18961266641719554, 0.19160148241368294, 0.23183520488057596, 0.31258390292452404, 0.4212625964074282, 0.5329315503397933, 0.6177862653849041, 0.6510130306580447, 0.621354685844679, 0.5350130330039692, 0.4131558700737114, 0.28383821525616004, 0.17174126508493523, 0.0904869618596048, 0.04102083542122555, 0.0158113507543527, 0.00512381828028717, 0.0013817258262933331, 0.00030725601604590235, 5.5898129564429734e-05, 8.270145724798172e-06, 1.066958972950409e-06, 8.525674649177577e-07],\n",
+    "    'nsamples': 5, # window size: 100 ps\n",
+    "    'sample_period': 10000*0.002, # 20 ps\n",
+    "    'nwindows': 100, # averaging period: 10 ns\n",
+    "    }\n",
+    "\n",
+    "potential1 = gmx.workflow.WorkElement(\n",
+    "    namespace=\"myplugin\",\n",
+    "    operation=\"ensemble_restraint\",\n",
+    "    depends=[],\n",
+    "    params=params\n",
+    "    )\n",
+    "potential1.name = \"ensemble_restraint_1\"\n",
+    "\n",
+    "params['sites'] = [1330, 1735, 2520]\n",
+    "params['experimental'] = [8.750538172089207e-20, 4.963054541010076e-18, 2.2585602895138136e-16, 8.296295971141421e-15, 2.4727072025999945e-13, 6.001704592322284e-12, 1.188164346191529e-10, 1.917922517069427e-09, 2.52033705254017e-08, 2.691327568605142e-07, 2.3326113123212396e-06, 1.641120092881496e-05, 9.38976910432654e-05, 0.0004385427322814972, 0.0016815454179692623, 0.005334700709331302, 0.014137960662648396, 0.03164990069467528, 0.06058416858044986, 0.10044829089636065, 0.1462620702334438, 0.19008247987025195, 0.22511473247247643, 0.2496090827957139, 0.2672559820151655, 0.28375718473756745, 0.3027960501795795, 0.3245600144118122, 0.346907132615181, 0.3674499825469857, 0.3851742255922683, 0.40052661472987944, 0.4135408960076755, 0.4218337765359014, 0.42147736029942173, 0.4107213639593705, 0.39238589395514606, 0.37160127657310765, 0.3512844290946452, 0.3307655537130543, 0.30883452118638083, 0.2868146457416455, 0.2677676912313197, 0.2532530719313838, 0.2417481938021803, 0.23018156335307405, 0.21644933221299598, 0.20050997978961216, 0.18377185607646654, 0.16786393271557773, 0.15362016010974153, 0.1405238761751657, 0.12680145743115132, 0.11034786248372368, 0.09017088591908493, 0.06741463593670398, 0.045069844741355614, 0.026422842565654883, 0.013359579997979203, 0.005742341678209477, 0.0020723966306519146, 0.0006212767709345187, 0.00015329363473072974, 3.088693304526221e-05, 5.048219168357052e-06, 6.655228160961857e-07, 7.04490968489331e-08, 6.246463184699169e-09, 4.5050655561489735e-09, 4.676670950356501e-08]\n",
+    "\n",
+    "potential2 = gmx.workflow.WorkElement(\n",
+    "    namespace=\"myplugin\",\n",
+    "    operation=\"ensemble_restraint\",\n",
+    "    depends=[],\n",
+    "    params=params\n",
+    ")\n",
+    "potential2.name = \"ensemble_restraint_2\"\n",
+    "\n",
+    "\n",
+    "# Settings for a 20 core HPC node. Use 18 threads for domain decomposition for pair potentials\n",
+    "# and the remaining 2 threads for PME electrostatics.\n",
+    "md = gmx.workflow.from_tpr(tpr_list, tmpi=20, grid=[3, 3, 2], ntomp_pme=1, npme=2, ntomp=1)\n",
+    "md.add_dependency(potential1)\n",
+    "md.add_dependency(potential2)\n",
+    "\n",
+    "context = gmx.context.ParallelArrayContext(md)\n",
+    "\n",
+    "with context as session:\n",
+    "    session.run()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}