
OpenFold Local Jupyter Notebook 📔 | Metrics, Plots, Concurrent Inference #484

Open · wants to merge 15 commits into `main`
@juliocesar-io commented Aug 28, 2024

Overview

This PR introduces a fully featured local notebook for running inference, computing metrics, ranking the best model, and generating plots in a structured, reproducible way, particularly for experimentation with large datasets.

The metrics are similar to those in the Colab notebook but optimized for a local installation with Docker. It also introduces parallel execution to leverage multiple GPUs.

The notebook operates by executing Docker commands using the Docker client and accessing OpenFold functions within a standalone environment. This approach ensures that the OpenFold codebase remains unaffected, serving as a client to help reproduce metrics and results from the Colab notebook locally.
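The container-dispatch pattern described above can be sketched with a small helper that assembles the options a Docker client would pass to `containers.run()`. The image tag, mount targets, and command below are illustrative assumptions, not the notebook's actual layout:

```python
# Sketch of how a Docker-based client can assemble container options.
# The image tag and bind paths are assumptions for illustration.

def build_run_kwargs(databases_dir: str, output_dir: str, command: list) -> dict:
    """Assemble keyword arguments for docker-py's containers.run().

    Mounts the databases read-only and the output directory read-write,
    and detaches so the caller can poll the container later.
    """
    return {
        "image": "openfold:latest",  # assumed local image tag
        "command": command,
        "volumes": {
            databases_dir: {"bind": "/databases", "mode": "ro"},
            output_dir: {"bind": "/output", "mode": "rw"},
        },
        "detach": True,
    }

kwargs = build_run_kwargs(
    "/data/dbs", "/data/out", ["python", "run_pretrained_openfold.py"]
)
print(kwargs["volumes"]["/data/dbs"]["bind"])  # prints "/databases"
```

With a real client these options would be passed as `docker.from_env().containers.run(**kwargs)`, keeping all OpenFold code inside the container and the notebook as a thin client.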

Usage

Refer to the instructions in `notebooks/OpenFoldLocal.ipynb`.

Set up the notebook

First, build OpenFold using Docker. Follow this guide.

Then, go to the notebook folder

```bash
cd notebooks
```

Create an environment to run Jupyter with the requirements

```bash
mamba create -n openfold_notebook python==3.10
```

Activate the environment

```bash
mamba activate openfold_notebook
```

Install the requirements

```bash
pip install -r src/requirements.txt
```

Start your Jupyter server in the current folder

```bash
jupyter lab . --ip="0.0.0.0"
```

Access the notebook URL or connect remotely using VSCode.

Inference example

Initializing the client:

```python
import docker
from src.inference import InferenceClientOpenFold

# You can also use a remote Docker server
docker_client = docker.from_env()

# Initialize the OpenFold Docker client, setting the databases path
databases_dir = "/path/to/databases"

openfold_client = InferenceClientOpenFold(databases_dir, docker_client)
```

Running Inference:

```python
# For multiple sequences, separate them with a colon `:`
input_string = "DAGAQGAAIGSPGVLSGNVVQVPVHVPVNVCGNTVSVIGLLNPAFGNTCVNA:AGETGRTGVLVTSSATNDGDSGWGRFAG"

model_name = "multimer"  # or "monomer"
weight_set = "AlphaFold"  # or "OpenFold"

# Run inference
run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_string)
```
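A colon-separated multimer string like the one above can be built from a multi-record FASTA file with the standard library alone; the helper name is illustrative, not part of the notebook's API:

```python
import os
import tempfile

def fasta_to_multimer_input(path: str) -> str:
    """Join all sequences in a FASTA file with ':' for multimer input."""
    sequences, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    sequences.append("".join(current))
                current = []
            elif line:
                current.append(line)
    if current:
        sequences.append("".join(current))
    return ":".join(sequences)

# Demo with a throwaway two-record FASTA (placeholder sequences)
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as f:
    f.write(">chain_A\nDAGA\n>chain_B\nAGET\n")

print(fasta_to_multimer_input(f.name))  # prints "DAGA:AGET"
os.remove(f.name)
```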

Using a file:

```python
input_file = "/path/to/test.fasta"

run_id = openfold_client.run_inference(weight_set, model_name, inference_input=input_file)
```
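The parallel execution mentioned above can be sketched with `concurrent.futures`; a stub stands in for `openfold_client.run_inference` (which in the real notebook dispatches a Docker container per call), and the per-input GPU pinning is an assumption, not the notebook's actual scheduling:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub standing in for openfold_client.run_inference; the real client
# launches a Docker container and returns a run_id.
def run_inference_stub(weight_set, model_name, inference_input, gpu_id):
    return f"run-{gpu_id}"

inputs = ["SEQA", "SEQB"]  # placeholder sequences

# One worker per GPU; each input is submitted with its own GPU index.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(run_inference_stub, "AlphaFold", "monomer", seq, gpu)
        for gpu, seq in enumerate(inputs)
    ]
    run_ids = [fut.result() for fut in futures]

print(run_ids)  # prints ['run-0', 'run-1']
```

Threads (rather than processes) suffice here because each call mostly waits on a container, so the work is I/O-bound from the notebook's point of view.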

Screenshots

*(Two screenshots of the notebook, captured 2024-08-27.)*
