If you have any problems with this installation, please file an issue describing them so we can improve the instructions.
Download the Anaconda installer for Python 3.7 and run it.
Then go to the Start Menu and open the Anaconda Prompt.
For easy access, pin the Anaconda Prompt to the taskbar.
The Git version control system is used to download repositories from GitHub.
Skip this step if you already have Git installed.
Download Git and run the Git installer (accept all default options).
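If you are not sure whether Git is already installed, one way to check is to look for the git executable on your PATH. The following is a minimal Python sketch using only the standard library (you can run it with the Python that comes with Anaconda). The function name git_version is hypothetical and is not part of this repository.

```python
import shutil
import subprocess


def git_version():
    """Return the output of 'git --version', or None if git is not on PATH."""
    # shutil.which returns the full path to the executable, or None if not found.
    git = shutil.which("git")
    if git is None:
        return None
    result = subprocess.run(
        [git, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    version = git_version()
    print(version if version else "git not found; install it before cloning")
```

If this prints a version string, you can skip the Git installation step.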
In the Anaconda Prompt, clone the mmtf-genomics repository:
git clone https://github.com/sbl-sdsc/mmtf-genomics.git
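Set the environment variables, replacing <your path> with the full path of the folder in which you ran git clone:
setx SPARK_CONF_DIR <your path>\mmtf-genomics\conf
setx HADOOP_HOME <your path>\mmtf-genomics\conf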
Important: close the Anaconda Prompt and reopen it. Variables set with setx are not visible in the prompt that ran setx, only in prompts opened afterwards.
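Once the prompt has been reopened, you can confirm that both variables are visible. This is a minimal sketch, assuming only that SPARK_CONF_DIR and HADOOP_HOME should each point to an existing folder; check_env is a hypothetical helper, not something shipped with mmtf-genomics.

```python
import os
from pathlib import Path

# The two variables set with setx in the step above.
REQUIRED_VARS = ("SPARK_CONF_DIR", "HADOOP_HOME")


def check_env(environ=None):
    """Map each required variable to a problem description, or to None if it looks OK."""
    if environ is None:
        environ = os.environ
    problems = {}
    for name in REQUIRED_VARS:
        value = environ.get(name)
        if not value:
            # setx does not update the prompt it was run in, so an old prompt won't see it.
            problems[name] = "not set (did you reopen the Anaconda Prompt?)"
        elif not Path(value).is_dir():
            problems[name] = "points to a missing folder: " + value
        else:
            problems[name] = None
    return problems


if __name__ == "__main__":
    for name, problem in check_env().items():
        print(name, "OK" if problem is None else problem)
```

Passing a dictionary instead of reading os.environ directly keeps the check easy to try out without changing your real environment.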
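Change into the repository folder and create the conda environment from the provided environment file:
cd mmtf-genomics
conda env create -f binder/environment.yml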
Activate the environment and launch Jupyter Lab:
conda activate mmtf-genomics
jupyter lab
When you are finished, deactivate the environment:
conda deactivate
Any time you want to use the environment, activate it again (conda activate mmtf-genomics) and start Jupyter Lab (jupyter lab).
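To confirm from Python (for example, in the first cell of a notebook) that the mmtf-genomics environment is the active one, you can read the CONDA_DEFAULT_ENV variable that conda activate sets. This sketch assumes the environment was activated by name, as in the commands above; active_conda_env and require_env are hypothetical helpers.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    if environ is None:
        environ = os.environ
    # conda activate sets CONDA_DEFAULT_ENV to the environment name.
    return environ.get("CONDA_DEFAULT_ENV") or None


def require_env(expected="mmtf-genomics", environ=None):
    """Raise RuntimeError unless the expected environment is active."""
    current = active_conda_env(environ)
    if current != expected:
        raise RuntimeError(
            "expected conda env %r but found %r; run: conda activate %s"
            % (expected, current, expected)
        )
    return current
```

Calling require_env() at the top of a notebook gives an early, readable error if Jupyter was started from the wrong environment.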
To remove the environment completely:
conda env remove -n mmtf-genomics
When running PySpark on many cores (e.g., more than 8), you may need to increase the memory for the Spark driver and workers. To change the memory settings, go to the mmtf-genomics\conf folder and edit the file spark-env.cmd. By default, this file has the following settings:
SPARK_DRIVER_MEMORY=4G
SPARK_WORKER_MEMORY=4G
When running this repository on a 24-core machine, you may need to increase both memory settings to 20G.
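If you prefer not to edit spark-env.cmd by hand, the change can be scripted. The sketch below is not part of this repository: set_memory and update_spark_env are hypothetical names, and it assumes the two settings appear one per line as NAME=VALUE, optionally preceded by set as in a Windows batch file.

```python
import re
from pathlib import Path

# The two settings listed above; all other lines in the file are left untouched.
MEMORY_KEYS = ("SPARK_DRIVER_MEMORY", "SPARK_WORKER_MEMORY")


def set_memory(text, size):
    """Return text with each SPARK_*_MEMORY=<value> assignment changed to size."""
    if not re.fullmatch(r"\d+[gGmM]", size):
        raise ValueError("size should look like '20G' or '4096M': %r" % size)
    for key in MEMORY_KEYS:
        # Matches e.g. "SPARK_DRIVER_MEMORY=4G", with or without a leading "set ".
        pattern = re.compile(
            r"^([ \t]*(?:set[ \t]+)?%s=)\S*" % key, re.IGNORECASE | re.MULTILINE
        )
        # A function replacement avoids "\1" + "20G" being read as group 120.
        text = pattern.sub(lambda m: m.group(1) + size, text)
    return text


def update_spark_env(path, size):
    """Rewrite the memory settings of a spark-env.cmd file in place."""
    path = Path(path)
    path.write_text(set_memory(path.read_text(), size))
```

For example, update_spark_env(r"<your path>\mmtf-genomics\conf\spark-env.cmd", "20G") would set both values to 20G. Because the file is read when Spark starts, restart any running notebook kernels afterwards so that new Spark sessions pick up the values.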