More efficient sampling from KroneckerMultiTaskGP #2460

slishak-PX · 2024-08-05T15:52:19Z

Motivation

import torch
from botorch.models import KroneckerMultiTaskGP

n_inputs = 10
n_tasks = 4
n_train = 2048
n_test = 1
device = torch.device("cuda:0")

train_x = torch.randn(n_train, n_inputs, dtype=torch.float64, device=device)
train_y = torch.randn(n_train, n_tasks, dtype=torch.float64, device=device)

test_x = torch.randn(n_test, n_inputs, dtype=torch.float64, device=device)

gp = KroneckerMultiTaskGP(train_x, train_y)

posterior = gp.posterior(test_x)
posterior.rsample(torch.Size([256, 1]))

The final line requires allocation of 128GB of GPU memory, because of the call to torch.cholesky_solve with B shaped (256, 1, 8192, 1) and L shaped (8192, 8192).

By moving the largest batch dimension to the final position, we should achieve a more efficient operation.

Also fix docstring for MultitaskGPPosterior.

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Passes unit tests (specifically test_multitask.py).

Benchmarking results:

Related PRs

N/A

facebook-github-bot · 2024-08-05T15:52:25Z

Hi @slishak-PX!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

codecov · 2024-08-10T00:53:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.98%. Comparing base (e29e30a) to head (6417409).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2460   +/-   ##
=======================================
  Coverage   99.98%   99.98%           
=======================================
  Files         193      193           
  Lines       17062    17072   +10     
=======================================
+ Hits        17059    17069   +10     
  Misses          3        3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Balandat

Thanks for putting this up. Would love to see some benchmarks for this. Ultimately, this is something that ideally should be handled upstream in gpytorch, could you add a comment to that extent?

botorch/posteriors/multitask.py

Balandat · 2024-09-22T18:44:03Z

@slishak-PX have you had a chance to run some benchmarks on this?

slishak-PX · 2024-09-22T21:46:10Z

@Balandat sorry, I will prioritise this as soon as we have the CLA signed, hopefully in the next week or two

Co-authored-by: Max Balandat <[email protected]>

This reverts commit 3f6f697.

slishak-PX · 2024-09-30T17:29:14Z

Added benchmarking results. Uses the following code:

benchmark.py (run on the code in this PR, and BoTorch 0.12.0):

import pickle as pkl

import botorch
import torch
from tqdm import tqdm
from botorch.models import KroneckerMultiTaskGP

device = torch.device("cuda:0")


def get_data(n_inputs=10, n_tasks=4, n_train=128, n_test=1, seed=50):
    torch.manual_seed(seed)
    train_x = torch.randn(n_train, n_inputs, dtype=torch.float64, device=device)
    train_y = torch.randn(n_train, n_tasks, dtype=torch.float64, device=device)
    test_x = torch.randn(n_test, n_inputs, dtype=torch.float64, device=device)

    return train_x, train_y, test_x


def instantiate_and_sample(train_x, train_y, test_x, n_samples=1):
    with torch.no_grad():
        gp = KroneckerMultiTaskGP(train_x, train_y)
        posterior = gp.posterior(test_x)
        posterior.rsample(torch.Size([n_samples]))


def profile(func, *args, **kwargs):
    torch.cuda.reset_peak_memory_stats(device=device)
    m0 = torch.cuda.max_memory_allocated(device=device)

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    func(*args, **kwargs)
    end.record()

    torch.cuda.synchronize()
    time = start.elapsed_time(end)

    m1 = torch.cuda.max_memory_allocated(device=device)
    torch.cuda.empty_cache()

    memory = (m1 - m0) / 1024**3

    return memory, time


if __name__ == "__main__":
    fname = botorch.__version__ + "_results.pkl"
    print(fname)

    n_tasks_list = [2, 4, 8]
    n_train_list = [32, 128, 512]
    n_test_list = [1, 8, 64]
    n_samples_list = [1, 4, 16, 64, 256]

    results = []
    for n_tasks in tqdm(n_tasks_list, desc="n_tasks"):
        for n_train in tqdm(n_train_list, leave=False, desc="n_train"):
            for n_test in tqdm(n_test_list, leave=False, desc="n_test"):
                train_x, train_y, test_x = get_data(
                    n_tasks=n_tasks, n_train=n_train, n_test=n_test
                )
                for n_samples in tqdm(n_samples_list, leave=False, desc="n_sample"):
                    memory = []
                    time = []
                    for i in range(10):
                        try:
                            m, t = profile(
                                instantiate_and_sample,
                                train_x,
                                train_y,
                                test_x,
                                n_samples,
                            )
                        except:
                            print("Failed!")
                            print(
                                {
                                    "n_tasks": n_tasks,
                                    "n_train": n_train,
                                    "n_test": n_test,
                                    "n_samples": n_samples,
                                }
                            )
                            raise

                        if i > 0:
                            memory.append(m)
                            time.append(t)

                    results.append(
                        {
                            "n_tasks": n_tasks,
                            "n_train": n_train,
                            "n_test": n_test,
                            "n_samples": n_samples,
                            "memory": memory,
                            "time": time,
                        }
                    )

    with open(fname, "wb") as f:
        pkl.dump(results, f)

Analysis notebook:

import pickle as pkl

import numpy as np
import pandas as pd
import plotly.express as px


with open("Unknown_results.pkl", "rb") as f:
    results = pkl.load(f)
df_new = pd.DataFrame(results)
df_new["version"] = "PR 2460"

with open("0.12.0_results.pkl", "rb") as f:
    results = pkl.load(f)
df_old = pd.DataFrame(results)
df_old["version"] = "BoTorch 0.12.0"

df = pd.concat([df_new, df_old])
df["memory_mean"] = df["memory"].apply(np.mean)
df["memory_std"] = df["memory"].apply(np.std)
df["time_mean"] = df["time"].apply(np.mean)
df["time_std"] = df["time"].apply(np.std)

px.line(
    df,
    x="n_samples",
    y="memory_mean",
    error_y="memory_std",
    facet_col="n_tasks",
    facet_row="n_test",
    color="n_train",
    line_dash="version",
    log_x=True,
    log_y=True,
    width=800,
    height=800,
)

px.line(
    df,
    x="n_samples",
    y="time_mean",
    error_y="time_std",
    facet_col="n_tasks",
    facet_row="n_test",
    color="n_train",
    line_dash="version",
    log_x=True,
    log_y=True,
    width=800,
    height=800,
)

facebook-github-bot · 2024-10-01T04:15:49Z

@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

botorch/posteriors/multitask.py

Balandat

This is awesome, thanks a lot for contributing this change and the comprehensive benchmarks. Seems like there is no downside to using this from a perf perspective and the logic is also sufficiently straightforward so I'm not worried about tech debt.

The only ask I have before merging this in (besides fixing the flake8 lint) is to write a short unittest.

cc @jandylin, @SebastianAment re sampling from Kronecker structured GPs and interesting matrix solve efficiency gains...

botorch/posteriors/multitask.py

Co-authored-by: Max Balandat <[email protected]>

slishak-PX · 2024-10-01T08:47:05Z

You were absolutely right to ask for a unit test - the implementation was not entirely correct, although I think in the context of the use in MultitaskGPPosterior the error was inconsequential. Should all be corrected now. I've also re-run the benchmarks and there is no change.

test/posteriors/test_multitask.py

facebook-github-bot · 2024-10-01T14:55:12Z

@Balandat has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-10-01T18:42:51Z

@Balandat merged this pull request in 8924d1b.

Balandat · 2024-10-01T20:22:04Z

Many thanks for the contribution, @slishak-PX !

slishak-PX added 2 commits August 2, 2024 15:15

Improve memory usage of multitask posterior sampling

afcd13d

Update docs; fixes to pass unit tests

7bd66d8

Balandat reviewed Aug 24, 2024

View reviewed changes

botorch/posteriors/multitask.py Outdated Show resolved Hide resolved

botorch/posteriors/multitask.py Outdated Show resolved Hide resolved

botorch/posteriors/multitask.py Outdated Show resolved Hide resolved

facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Sep 29, 2024

slishak-PX and others added 6 commits September 29, 2024 16:23

Merge branch 'main' into mtgp-memory

1c73439

Apply suggestions from code review

b22b140

Co-authored-by: Max Balandat <[email protected]>

Move permuted solve into its own helper function

3f6f697

Revert "Move permuted solve into its own helper function"

e0809cc

This reverts commit 3f6f697.

Try again (avoid circular import with utils

d63e7b0

Bug fix

0a3d586

slishak-PX marked this pull request as ready for review September 30, 2024 17:30

slishak-PX requested a review from Balandat September 30, 2024 17:30

Balandat reviewed Oct 1, 2024

View reviewed changes

botorch/posteriors/multitask.py Outdated Show resolved Hide resolved

Balandat reviewed Oct 1, 2024

View reviewed changes

botorch/posteriors/multitask.py Outdated Show resolved Hide resolved

slishak-PX and others added 2 commits October 1, 2024 08:56

Update botorch/posteriors/multitask.py

68ba15d

Co-authored-by: Max Balandat <[email protected]>

Add unit test, and correct bugs found

f3baebc

Balandat reviewed Oct 1, 2024

View reviewed changes

test/posteriors/test_multitask.py Outdated Show resolved Hide resolved

Fix import ordering

6417409

facebook-github-bot closed this in 8924d1b Oct 1, 2024

facebook-github-bot added the Merged label Oct 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More efficient sampling from KroneckerMultiTaskGP #2460

More efficient sampling from KroneckerMultiTaskGP #2460

slishak-PX commented Aug 5, 2024 •

edited

Loading

facebook-github-bot commented Aug 5, 2024

codecov bot commented Aug 10, 2024 •

edited

Loading

Balandat left a comment

Balandat commented Sep 22, 2024

slishak-PX commented Sep 22, 2024

slishak-PX commented Sep 30, 2024

facebook-github-bot commented Oct 1, 2024

Balandat left a comment

slishak-PX commented Oct 1, 2024

facebook-github-bot commented Oct 1, 2024

facebook-github-bot commented Oct 1, 2024

Balandat commented Oct 1, 2024

More efficient sampling from KroneckerMultiTaskGP #2460

More efficient sampling from KroneckerMultiTaskGP #2460

Conversation

slishak-PX commented Aug 5, 2024 • edited Loading

Motivation

Have you read the Contributing Guidelines on pull requests?

Test Plan

Related PRs

facebook-github-bot commented Aug 5, 2024

Action Required

Process

codecov bot commented Aug 10, 2024 • edited Loading

Codecov Report

Balandat left a comment

Choose a reason for hiding this comment

Balandat commented Sep 22, 2024

slishak-PX commented Sep 22, 2024

slishak-PX commented Sep 30, 2024

facebook-github-bot commented Oct 1, 2024

Balandat left a comment

Choose a reason for hiding this comment

slishak-PX commented Oct 1, 2024

facebook-github-bot commented Oct 1, 2024

facebook-github-bot commented Oct 1, 2024

Balandat commented Oct 1, 2024

slishak-PX commented Aug 5, 2024 •

edited

Loading

codecov bot commented Aug 10, 2024 •

edited

Loading