
Memory issues for AlphaFlow #89

Closed
3 tasks done
jyaacoub opened this issue Mar 27, 2024 · 3 comments
Labels
bug (Something isn't working) · main hurdle/issue (This is an issue that was a pivotal moment during the project.)

Comments


jyaacoub (Owner) commented Mar 27, 2024

related: #84
Potential Solutions

Input files 12 and up failed for Davis (~178 proteins impacted)

203/229 for KIBA

jyaacoub added the bug label Mar 27, 2024

jyaacoub (Owner, Author) commented Mar 28, 2024

A histogram plot with staggered labels shows the extent of this issue, especially for Davis, which has many sequences above 1000 residues.
[Image: sequence-length histograms for the Davis, KIBA, PDBbind, and Platinum datasets]

CODE FOR PLOTS

#%%
import os
import pandas as pd
import matplotlib.pyplot as plt

# Function to load sequences and their lengths from csv files
def load_sequences(directory):
    lengths = []
    labels_positions = {}  # Maps each file's numeric index to the first sequence length in that file (used to position its label)
    files = sorted([f for f in os.listdir(directory) if f.endswith('.csv') and f.startswith('input_')])
    for file in files:
        file_path = os.path.join(directory, file)
        data = pd.read_csv(file_path)
        # Extract lengths
        current_lengths = data['seqres'].apply(len)
        lengths.extend(current_lengths)
        # Record the label position using the first length in the current file
        labels_positions[int(file.split('_')[1].split('.')[0])] = current_lengths.iloc[0]
    return lengths, labels_positions

p = lambda d: f"/cluster/home/t122995uhn/projects/data/{d}/alphaflow_io"

DATASETS = {d: p(d) for d in ['davis', 'kiba', 'pdbbind']}
DATASETS['platinum'] = "/cluster/home/t122995uhn/projects/data/PlatinumDataset/raw/alphaflow_io"

fig, axs = plt.subplots(len(DATASETS), 1, figsize=(10, 5*len(DATASETS) + len(DATASETS)))

n_bins = 50  # Adjust the number of bins according to your preference

for i, (dataset, d_dir) in enumerate(DATASETS.items()):
    # Load sequences and positions for labels
    lengths, labels_positions = load_sequences(d_dir)
    
    # Plot histogram
    ax = axs[i]
    n, bins, patches = ax.hist(lengths, bins=n_bins, color='blue', alpha=0.7)
    ax.set_title(dataset)
    
    # Add counts to each bin
    for count, x, patch in zip(n, bins, patches):
        ax.text(x + 0.5, count, str(int(count)), ha='center', va='bottom')
    
    # Add red file-index labels; the index value doubles as the y-offset so the labels are staggered vertically
    for label, pos in labels_positions.items():
        ax.text(pos, label, str(label), color='red', ha='center')
    
    # Optional: Additional formatting for readability
    ax.set_xlabel('Sequence Length')
    ax.set_ylabel('Frequency')
    ax.set_xlim([0, max(lengths) + 10])  # Adjust xlim to make sure labels fit

plt.tight_layout()
plt.show()
# %%


jyaacoub (Owner, Author) commented Apr 5, 2024

AlphaFlow low-memory code:

The linked issue suggests setting long_sequence_inference=True on the ModelConfig: bjing2016/alphaflow#17
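A minimal sketch of that suggestion, assuming the AlphaFlow fork exposes OpenFold's config module unchanged (the preset name here is illustrative):

from openfold.config import model_config

cfg = model_config(
    "initial_training",            # preset name; AlphaFlow may use a different one
    train=False,
    long_sequence_inference=True,  # switches on several memory-saving settings at once
)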

Outcome

This only produced 2 additional proteins before running into OOM again.


jyaacoub (Owner, Author) commented May 6, 2024

Solution

See commit jyaacoub/alphaflow@b93e289 for predict_deepspeed.py

To summarize, there are four ways to get around the memory issues, listed in order of least to greatest impact on time complexity (a hedged config sketch follows the list):

  1. --low_pres: Use low-precision parameters (torch.bfloat16 instead of torch.float32). This also improves runtime, since each computation is cheaper, at the risk of reduced accuracy.
  2. --chunk_size: Chunk calculations on the GPU by module. Setting this to 4 or 2 is usually sufficient; on 2x A100s, these settings combined with --low_pres got us to sequence lengths of 1070 and 1167, respectively.
  3. --cpu_offload: Offload parameters to the CPU as soon as they are not in use.
  4. --lma: Low-memory attention using Staats & Rabe's algorithm. This increases time complexity quite a bit and should only be used when absolutely necessary. To change the default LMA chunk sizes we must modify the OpenFold source code (see OS.Environ for LMA default chunk_sizes aqlaboratory/openfold#435).
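As a rough guide, here is how these four options plausibly map onto OpenFold-style config fields. This is a sketch, not the actual predict_deepspeed.py wiring: the field names (globals.chunk_size, globals.offload_inference, globals.use_lma) and the preset name come from OpenFold's config module, and the fork may handle them differently.

# Hedged sketch: a plausible mapping of the four flags onto OpenFold-style
# config fields. The real wiring lives in predict_deepspeed.py
# (jyaacoub/alphaflow@b93e289).
import torch
from openfold.config import model_config

cfg = model_config("initial_training", train=False)

# 1. --low_pres: bfloat16 roughly halves parameter/activation memory vs. float32
dtype = torch.bfloat16

# 2. --chunk_size: chunk module-level computations on the GPU (4 or 2 usually suffices)
cfg.globals.chunk_size = 4

# 3. --cpu_offload: move tensors off the GPU when they are not in use
cfg.globals.offload_inference = True

# 4. --lma: Staats & Rabe low-memory attention (largest slowdown; last resort)
cfg.globals.use_lma = True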

jyaacoub closed this as completed May 6, 2024
jyaacoub added the main hurdle/issue label Jun 15, 2024