This document, which is a work in progress (contributions welcome), provides information on preparing your computing environment for PharmSci 175/275.
This class involves using a considerable amount of Python and computational chemistry software. There are two main ways you can use this software, with rather different setup and hardware requirements:
- Install and run in the cloud via Google Colab (minimal hardware requirements; may even work on tablets and/or phones)
- Install and run locally on your computer
Here, we recommend and support the first approach, especially for novice users. The second approach is also documented, but historically poses major challenges for many Windows users (most scientific computing is done in a Linux-like environment, e.g. Mac or Linux) and/or those without significant computing/command-line experience.
Thus, here, we first describe the Google Colab approach as recommended for all users, then discuss the local installation approach further down.
The one major downside of the Google Colab approach is that running calculations requires your computer to be on, awake, and connected to the internet. Some of the assignments in this course can run for a few hours or more, so these will require planning (or running overnight) if you are using Google Colab.
For Google Colab, you will be running everything in the cloud on Google Colab, which is free. If you plan to go this route (which we recommend unless you have a strong reason to run locally, though see the caveat above about internet access), this getting started notebook can be opened in Colab and used to install and test the requisite software. (Two getting started notebooks for Colab are provided; the condacolab one is probably superior, but the other is a fallback option.)
Each time you begin using Google Colab, it's like beginning on a new computer. This means that before every lecture or assignment, you will need to install the required software on Google Colab, which takes just a few minutes. In other words, you will need to insert the commands from the getting started notebook at the beginning of the notebook you will be using on Colab, and run them.
Notebooks we use in class will have pointers to this "getting started" content to remind you of this.
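A notebook can check whether it is running on Colab before deciding whether the install commands need to be re-run. The following is a minimal sketch, assuming the widely used (but unofficial) idiom of looking for the google.colab module among the loaded modules; the helper name running_in_colab is hypothetical and not part of the course materials.

```python
import sys


def running_in_colab() -> bool:
    """Return True when the google.colab module has already been loaded.

    Assumption: Colab runtimes load google.colab when the kernel starts,
    so its presence in sys.modules is used here as the signal.
    """
    return "google.colab" in sys.modules


if running_in_colab():
    print("On Colab: run the getting started install cells first.")
else:
    print("Not on Colab: using the locally installed software.")
```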
If you are using Colab, you do not need to read the remainder of this document.
As noted, we can support Google Colab across platforms and even for novices. Local installation requires more expertise, and is often problematic for Windows users.
Before getting started, note that the below assumes at least some modest familiarity with the BASH shell and the idea of paths, file names, and basic Linux commands. If you do not have this familiarity, you may need to consult the BASH cheat sheet and other sources of information (such as the Linux/bash crash course here) before proceeding. (MIT's The Missing Semester of Your CS Education provides a more complete and diverse introduction in the form of a full course.)
If you are on Windows, you basically have three options:
- Dual boot into Linux (best option, but requires some expertise and/or care to set up, and not something this course will help you do)
- Use Windows Subsystem for Linux (may work for much but likely not all of the software used in this course)
- Boot into Linux from a USB drive, e.g. a thumb drive with a persistent Linux distribution
We discuss each of those in turn here:
Dual boot (WINDOWS ONLY):
This is your best option on Windows, but not one we can help you set up. It requires some technical proficiency, not because it is difficult, but because missteps can leave your computer unbootable (i.e. "a brick"). If you decide to go this route, this class cannot support you in doing so, and we are not responsible for any damage you might cause to your computer.
Windows Subsystem for Linux (WSL):
Install BASH on Windows: Follow the official guide linked here
If the guide works and you can successfully use BASH commands (e.g. cd, ls), try performing the installation steps below to see whether Anaconda/OpenEye can be installed on your local machine too.
There may be some software we use in this course which does not work well on Windows (specifically, the Windows Subsystem for Linux (WSL) as discussed here) as Windows is not broadly used in scientific computing/our open source stack.
In the past, we have on occasion explored an alternative approach involving bootable USB drives (e.g. thumb drives) with a persistent Linux distribution available pre-installed.
If needed, we can explore this with Dr. Mobley, though it has not been attempted in several years.
Because of hardware differences, it is unlikely that the same installation (below) will work both on a personal laptop and on the computers in the classroom, so you would need to pick one or the other and coordinate with Dr. Mobley.
If using the USB drive approach, see docs/persistent-usb.md for additional information and instructions before following the instructions below.
Here, you will need to complete several main steps in the install, each of which has its own section:
1. Install Anaconda Python and get the repository
2. (Optional) Configure a conda environment
3. Use conda to install gfortran
4. Use f2py3 to compile libraries for lectures/assignments (sometimes breaks if step (5) is done first)
5. Use conda to finish installing prerequisites
6. Install the OpenEye license
Download the Anaconda Python 3 installer from the Anaconda website, or fetch it from the command prompt:
(Linux/OSX shell; the link below is the OS X installer)
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-MacOSX-x86_64.sh
(You can get a related link for Windows or Linux and use a similar command.)
Install Anaconda (this may take 15-30 minutes), replacing the "fillintheresthere" part with the appropriate name of the file you downloaded above (or run the interactive installer if you downloaded that):
bash Anaconda_fillintheresthere.sh -b
Make sure the anaconda3 path is added to your ~/.bash_profile (often this is automatically added by the installer, but make sure it ends up there), e.g. via:
echo 'PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bash_profile
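The -b flag above runs the installer in batch mode, which does not prompt you. If you run the interactive installer instead and it asks whether to add Anaconda to your bash shell PATH, select YES. (If you are using a different shell, you need to make a similar change to your shell's configuration.)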
Check that Anaconda installed properly by first running which python, which should show your newly installed python, e.g.:
#: which python
$HOME/anaconda3/bin/python
To ensure it works, run the command python in a new terminal. Its output should look something like:
Python 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Type exit() or ctrl-d ("control-d") to leave the python shell.
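The same check can be made from inside the python shell itself. The following is a minimal sketch, assuming the default install directory names anaconda3 or miniconda3; the helper name looks_like_conda_python is hypothetical, and a custom install location would need a different test.

```python
import sys
from pathlib import Path


def looks_like_conda_python(executable: str) -> bool:
    """Return True if a directory in the path is named anaconda3 or miniconda3.

    Assumption: conda was installed into a directory with its default name.
    """
    return any(part in ("anaconda3", "miniconda3") for part in Path(executable).parts)


# sys.executable is the full path of the interpreter running this code.
print(sys.executable)
print("conda python:", looks_like_conda_python(sys.executable))
```

If the second line prints False, the interpreter you started is not the one Anaconda installed, and the troubleshooting notes that follow apply.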
Troubleshooting python
If which python just gives a blank line, it cannot find any python in your $PATH. To check that ~/.bash_profile was modified correctly, run grep -c anaconda3/bin ~/.bash_profile and check that it prints a number greater than 0 (0 means not found). If you do get 0, go back and follow the above steps. Otherwise, try to source the file (source ~/.bash_profile) and repeat the checks above. If it works this time, your terminal is not sourcing ~/.bash_profile automatically. Some operating systems, e.g. certain Linux distributions, source ~/.bashrc rather than ~/.bash_profile, since ~/.bash_profile is only sourced for login shells (a technicality not important here). A common approach is to put everything in ~/.bashrc and have ~/.bash_profile source it, ensuring every terminal works the same:
$: cat ~/.bash_profile
# /etc/skel/.bash_profile
# This file is sourced by bash for login shells. The following line
# runs your .bashrc and is recommended by the bash info pages.
[[ -f ~/.bashrc ]] && . ~/.bashrc
where cat is a program that prints the file. The last line is bash code for "if ~/.bashrc is a file, then source it", where . ~/.bashrc is short for source ~/.bashrc and ~ is short for $HOME.
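When it is unclear which python your shell will pick, it can help to list every directory on $PATH that provides one, in search order. The following is a minimal sketch using only the Python standard library; the helper name python_candidates is hypothetical. The first directory printed is the one that which python reports.

```python
import os


def python_candidates(path_var: str) -> list:
    """List PATH directories, in search order, holding an executable named python."""
    hits = []
    for directory in path_var.split(os.pathsep):
        candidate = os.path.join(directory, "python")
        # Skip empty entries and anything that is not an executable file.
        if directory and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


for directory in python_candidates(os.environ.get("PATH", "")):
    print(directory)
```

If anaconda3/bin appears in the list but not first, another python is shadowing it, and the order of entries in $PATH needs to change.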
Q: What if which python shows a python that is not from anaconda3?
Then another python is installed on your system, and you likely did not complete the above steps for modifying $PATH. If you prefer not to have anaconda load its own python in every terminal, you can add the following to your ~/.bashrc:
alias miniconda="source ~/miniconda3/etc/profile.d/conda.sh; conda activate base"
where alias is essentially a macro (replace ~/miniconda3 with ~/anaconda3 if you installed the full Anaconda as described above). conda.sh is the script that a default installation normally sources automatically; together with conda activate, it modifies $PATH so that conda's python is found first. The "base" environment activated here is one you may or may not want to use (see the next section).
Q: Anaconda versus miniconda? The only difference is that the full Anaconda installation given above includes a GUI (graphical user interface) and a lot of prepackaged software. You may opt to install miniconda instead, which provides the exact same terminal functionality and installs only the basics (python, conda, etc.). As a quick comparison, Anaconda is ~500MB whereas miniconda is ~50MB. (See this thread for more discussion.)
Go to https://conda.io/en/latest/miniconda.html to download miniconda.
Anytime after Anaconda is installed (and before step 4 involving pre-compiling the libraries), run the following on the command-prompt:
git clone git@github.com:MobleyLab/drug-computing.git
cd drug-computing
This checks out (obtains) a copy of this repository so you can work with it and the files in it; you'll be using this to access lecture content and other materials from this class. (If you have trouble with this, you may want to try the https version of the command, git clone https://github.com/MobleyLab/drug-computing.git)
Anaconda includes the conda environment/package manager, which can install the other software you need.
Here we will use conda to install that software.
First, you need to decide whether or not to use a conda environment (env) for the course:
- If you have no idea what this means, only just installed anaconda, and do not have an existing set of conda packages you use extensively, you probably do not want to use an env
- If you already have an extensive set of packages managed with conda and you want to ensure you do not break or modify your existing installation, you probably DO want to create a custom environment (env) for this course.
If you do not need an env, just proceed straight to installation.
If you do need an env, use this info to create a new Python 3.8 conda environment called drugcomp (e.g. conda create -n drugcomp python=3.8) and activate this environment (source activate drugcomp, or conda activate drugcomp on newer versions of conda) before doing the installs discussed below.
Whenever you do work for the class, you will need to activate this environment.
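Forgetting to activate the environment is easy to do, so a quick check at the top of a notebook or script can save confusion. The following is a minimal sketch that reads the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated; the helper name active_conda_env is hypothetical, and drugcomp is the example environment name used above.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active."""
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
if env != "drugcomp":
    # Only relevant if you chose to create the drugcomp environment.
    print(f"Active environment is {env!r}; expected 'drugcomp' if you created one.")
```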
For some of our assignments/lectures (energy minimization, MD, MC) we will use f2py3 to compile some Fortran code for use in Python (to make some numerical operations fast), which requires a Fortran compiler.
We do this step first because, in some cases, we have seen other packages cause problems for gfortran.
Proceed to installation:
conda install -c conda-forge libgfortran --yes
If on Mac OS, you also need XCode's command-line tools, which you can install via something like xcode-select --install from the command-line.
Without these you will get an error message relating to limits.h when attempting to execute f2py3.
A subtle problem arises if you install a compiler with conda (e.g. gcc) and have XCode installed as well. This can be a source of headache/confusion, so just be aware that multiple compilers will exist on your machine, and care must be taken to ensure only one is used at a time.
f2py3 can turn prepared Fortran code into Python libraries; here, we use this for a few computationally intensive portions of the course.
You should pre-compile the course libraries before finishing installation as we find that sometimes, subsequent installations in the same conda environment break gfortran.
Using the command-line, pre-compile the relevant libraries listed here:
- assignments:
  - MC/mclib.f90
  - MD/mdlib.f90
  - energy_minimization/emlib.f90
- lectures:
  - MC/mc_sandbox.f90
  - MD/md_sandbox.f90
To compile each, navigate to the relevant directory (which you will have checked out to your local computer from GitHub) and use a command such as f2py3 -c -m mclib mclib.f90 (this example is for the MC library); the argument after -m is the final name of the module and the last argument is the name of the .f90 file. You would execute the above in the MC directory, then do a similar thing for the MD assignment/library, the energy_minimization assignment/library, and the MC and MD lectures. You should end up with a total of five compiled libraries, each in the appropriate directory.
(If you fail to pre-compile these libraries, you can fix this issue by creating a clean conda environment later, installing only gfortran, then compiling the libraries. Then you can use them back in your original conda environment.)
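Since the same f2py3 command is repeated five times in different directories, it can be scripted. The following is a minimal sketch, not the course's official procedure: it assumes the assignments and lectures directories sit directly under the directory you run it from (adjust REPO_ROOT if your checkout is laid out differently), and the names LIBRARIES, f2py_command, and compile_all are hypothetical. By default it only prints the commands; pass dry_run=False to actually run them.

```python
import subprocess
from pathlib import Path

# Assumption: directory layout relative to where this script is run.
REPO_ROOT = Path(".")

# (directory, Fortran source file) pairs taken from the list above.
LIBRARIES = [
    ("assignments/MC", "mclib.f90"),
    ("assignments/MD", "mdlib.f90"),
    ("assignments/energy_minimization", "emlib.f90"),
    ("lectures/MC", "mc_sandbox.f90"),
    ("lectures/MD", "md_sandbox.f90"),
]


def f2py_command(source: str) -> list:
    """Build the f2py3 command for one source file (mclib.f90 -> module mclib)."""
    module = Path(source).stem
    return ["f2py3", "-c", "-m", module, source]


def compile_all(dry_run: bool = True) -> list:
    """Print each command and, unless dry_run is True, run it in its directory."""
    commands = []
    for directory, source in LIBRARIES:
        cmd = f2py_command(source)
        commands.append((directory, cmd))
        print(f"(in {directory}) {' '.join(cmd)}")
        if not dry_run:
            # check=True stops at the first library that fails to compile.
            subprocess.run(cmd, cwd=REPO_ROOT / directory, check=True)
    return commands


compile_all(dry_run=True)
```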
Proceed to finish installation:
conda config --add channels conda-forge --yes
conda install parmed --yes
conda install openff-toolkit pdbfixer nb_conda mpld3 scikit-learn seaborn bokeh py3dmol --yes
conda install -c openeye openeye-toolkits --yes
conda install -c anaconda requests --yes
conda install pyemma --yes
conda install -c openeye/label/Orion oeommtools --yes
# Optional (enables 3D movies in Jupyter notebooks, but can be finicky)
conda install -c conda-forge nglview --yes
The above installs quite a variety of software packages and may take a reasonable chunk of time to complete, even on a fairly fast connection.
Specific notebooks/assignments used in class may have additional requirements and in general these will be mentioned at the top of the notebook; you should set aside some extra time to install before using a particular notebook.
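After a long series of installs it is useful to confirm which packages can actually be found without importing each one by hand. The following is a minimal sketch using only the standard library; the import names in COURSE_MODULES are assumptions based on each package's usual top-level module (they can differ from the conda package name, e.g. scikit-learn is imported as sklearn), and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Assumed top-level import names for the packages installed above.
COURSE_MODULES = [
    "parmed", "openff", "pdbfixer", "mpld3", "sklearn", "seaborn",
    "bokeh", "py3Dmol", "openeye", "requests", "pyemma",
]


def missing_modules(names):
    """Return the subset of names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


print("Missing:", missing_modules(COURSE_MODULES) or "none")
```

This only checks that each module can be located, not that it imports cleanly or is licensed, so a clean result is a necessary check rather than a sufficient one.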
Also install the OpenEye oenotebook Jupyter helper, IF it installs OK in your environment (this is handy, but nonessential):
pip install --extra-index-url https://pypi.org/simple --extra-index-url https://pypi.anaconda.org/openeye/simple/ -i https://pypi.anaconda.org/openeye/label/oenotebook/simple openeye-oenotebook
Download a copy of the oe_license.txt OpenEye license file from the course Canvas site, as you will need it for what follows.
Put the license file somewhere safe, where you can find it again, and note the path (directory location) where you put it. Then do the following, replacing PATHTOFILE with where you put it:
#Add the OE_LICENSE to your ~/.bash_profile
# (Here, replace <PATHTOFILE> with the path of the place you have put this file)
echo 'export OE_LICENSE="<PATHTOFILE>/oe_license.txt"' >> ~/.bash_profile
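Once a new terminal has picked up the variable, you can confirm from Python that OE_LICENSE is set and points at a real file. The following is a minimal sketch using only the standard library; the helper name license_problem is hypothetical, and it checks only that the file exists, not that the license inside it is valid.

```python
import os


def license_problem(environ=os.environ):
    """Describe what is wrong with OE_LICENSE, or return None if it looks fine."""
    path = environ.get("OE_LICENSE")
    if not path:
        return "OE_LICENSE is not set"
    if not os.path.isfile(path):
        return f"OE_LICENSE points to a missing file: {path}"
    return None


print(license_problem() or "OE_LICENSE is set and the file exists")
```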
Open a new terminal (or run source ~/.bash_profile) so the change takes effect, then verify the installation with:
oecheminfo.py
The output should look something like:
oechem
info.py
Installed OEChem version: 3.1.1.1 platform: osx-clang++-x64 built: 20210708 release name: 2021.1.1
Examples: /Users/dmobley/opt/anaconda3/envs/drugcomp/lib/python3.8/site-packages/openeye/examples
Doc Examples: /Users/dmobley/opt/anaconda3/envs/drugcomp/lib/python3.8/site-packages/openeye/docexamples
code| ext | description |read? |write?
----+---------------+------------------------------------------+------+------
1 | smi | Canonical stereo SMILES | yes | yes
2 | mdl,mol,rxn | MDL Mol | yes | yes
3 | pdb,ent | PDB | yes | yes
4 | mol2,syb | Tripos MOL2 | yes | yes
5 | usm | Non-Canonical non-stereo SMILES | yes | yes
6 | ism,isosmi | Canonical stereo SMILES | yes | yes
7 | mol2h | MOL2 with H | yes | yes
8 | sdf,sd | MDL SDF | yes | yes
9 | can | Canonical non-stereo SMILES | yes | yes
10 | mf | Molecular Formula | no | yes
11 | xyz | XYZ | yes | yes
12 | fasta,seq | FASTA | yes | yes
13 | mopac,pac | MOPAC | no | yes
14 | oeb | OEBinary v2 | yes | yes
15 | dat,mmd,mmod | Macromodel | yes | yes
16 | sln | Tripos SLN | no | yes
17 | rdf,rd | MDL RDF | yes | no
18 | cdx | ChemDraw CDX | yes | yes
19 | skc | MDL ISIS Sketch File | yes | no
20 | inchi | IUPAC InChI | yes | yes
21 | inchikey | IUPAC InChI Key | no | yes
22 | csv | Comma Separated Values | yes | yes
23 | json | JavaScript Object Notation | yes | yes
24 | cif | Crystallographic Information File (CIF) | yes | no
25 | oez | Zstd Compressed OEBinary | yes | yes
26 | cif | Macromolecular CIF | yes | no
----+---------------+------------------------------------------+------+------
You should double-check that the OpenEye installation is working correctly by opening python (on the command prompt) or a Jupyter notebook and typing:
from openeye.oechem import *
mol = OEMol()
and you should get no errors.
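A slightly more informative check distinguishes between a missing package, a missing license, and a working installation. The following is a minimal sketch, assuming the OEChem functions OEChemIsLicensed and OESmilesToMol behave as described in the OpenEye toolkit documentation; the helper name openeye_status and the benzene test molecule are illustrative choices, not part of the course materials.

```python
def openeye_status() -> str:
    """Report whether openeye imports, is licensed, and can parse a SMILES string."""
    try:
        from openeye import oechem
    except ImportError:
        return "openeye is not importable in this environment"
    if not oechem.OEChemIsLicensed():
        return "openeye imported, but no valid OEChem license was found"
    mol = oechem.OEMol()
    # Benzene is used here purely as a small, unambiguous test molecule.
    if not oechem.OESmilesToMol(mol, "c1ccccc1"):
        return "openeye imported, but parsing a SMILES string failed"
    return f"openeye OK: test molecule parsed with {mol.NumAtoms()} atoms"


print(openeye_status())
```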
If you have errors with your OpenEye installation and have verified that you have an OpenEye license file, that it is in the correct place, and that it is properly listed in your ~/.bash_profile file, you may need to edit your ~/.bashrc file to point to your ~/.bash_profile file. In particular, I have noticed that this step may be necessary on some dual boot installations of Ubuntu. You would just add a line to the end of your ~/.bashrc file that says source ~/.bash_profile (do not do this if your ~/.bash_profile already sources ~/.bashrc as described in the troubleshooting section, since the two files would then source each other endlessly).
After the above, please also use jupyter-nbextension enable nglview --py --sys-prefix on the command-line to enable the nglview extension for interactive visualization in Jupyter notebooks.
Many of the notebooks are formatted as RISE notebooks which can be presented as slides using the RISE plugin. To get this to work, I have needed to:
- Install as normal (above)
- conda update notebook (RISE needed a more recent version)
- conda install -c conda-forge rise
- python -m ipykernel install --user --name=drugcomp (or whatever environment is being used above) to make sure this environment is available in the notebook
- Once installed, if a notebook with RISE slides is active, use option-R to enter the slideshow.
Note that Google Colab and environments without RISE will strip out the RISE formatting from notebooks and make it necessary to re-add it.