
[train] set auto_transfer cuda device #26819

Merged: 4 commits into ray-project:master on Jul 21, 2022

Conversation

matthewdeng (Contributor) commented Jul 21, 2022

Signed-off-by: Matthew Deng [email protected]

Why are these changes needed?

This sets the CUDA Stream on the correct device (and not the default one) when calling train.torch.prepare_data_loader(auto_transfer=True).
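
To illustrate the intent of the change, here is a minimal sketch (not the actual diff from this PR; the device below is a placeholder for whatever GPU Ray Train assigns to the worker). The stream used for background host-to-device copies should be created on the worker's device instead of the default device 0:

import torch

device = torch.device("cuda:1")  # placeholder: the device assigned to this worker
# Previously the stream was created on the default device (cuda:0) in every worker,
# so all background transfers (and their memory) landed on GPU 0.
stream = torch.cuda.Stream(device=device)
with torch.cuda.stream(stream):
    batch = torch.ones(8).to(device, non_blocking=True)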

Repro

import subprocess

from torch.utils.data import DataLoader
from torchvision import datasets

from ray import train
from ray.air.config import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    training_data = datasets.FashionMNIST(
        root="/tmp/data_fashion_mnist",
        train=True,
        download=True,
    )
    train_dataloader = DataLoader(training_data)
    train_dataloader = train.torch.prepare_data_loader(train_dataloader, auto_transfer=True)
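    # Print per-GPU process memory so we can see which device each worker touches (see Before/After below).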
    subprocess.run(["nvidia-smi"])


if __name__ == "__main__":
    trainer = TorchTrainer(
        train_loop_per_worker=train_func,
        scaling_config=ScalingConfig(
            num_workers=4,
            use_gpu=True,
        ),
    )
    result = trainer.fit()

Before:

(BaseWorkerMixin pid=4197) +-----------------------------------------------------------------------------+
(BaseWorkerMixin pid=4197) | Processes:                                                                  |
(BaseWorkerMixin pid=4197) |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
(BaseWorkerMixin pid=4197) |        ID   ID                                                   Usage      |
(BaseWorkerMixin pid=4197) |=============================================================================|
(BaseWorkerMixin pid=4197) |    0   N/A  N/A     47105      C                                    1281MiB |
(BaseWorkerMixin pid=4197) |    0   N/A  N/A     47106      C                                    1281MiB |
(BaseWorkerMixin pid=4197) |    0   N/A  N/A     47107      C                                    1281MiB |
(BaseWorkerMixin pid=4197) |    0   N/A  N/A     47109      C                                    1281MiB |
(BaseWorkerMixin pid=4197) +-----------------------------------------------------------------------------+

After:

(BaseWorkerMixin pid=5604) +-----------------------------------------------------------------------------+
(BaseWorkerMixin pid=5604) | Processes:                                                                  |
(BaseWorkerMixin pid=5604) |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
(BaseWorkerMixin pid=5604) |        ID   ID                                                   Usage      |
(BaseWorkerMixin pid=5604) |=============================================================================|
(BaseWorkerMixin pid=5604) |    0   N/A  N/A     14123      C                                    1281MiB |
(BaseWorkerMixin pid=5604) |    1   N/A  N/A     14124      C                                    1281MiB |
(BaseWorkerMixin pid=5604) |    2   N/A  N/A     14125      C                                    1281MiB |
(BaseWorkerMixin pid=5604) |    3   N/A  N/A     14126      C                                    1281MiB |
(BaseWorkerMixin pid=5604) +-----------------------------------------------------------------------------+

Related issue number

Closes #26707: [air/train] Multi GPU training occupies more memory on first GPU

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@matthewdeng matthewdeng marked this pull request as ready for review July 21, 2022 04:24
JiahaoYao (Contributor)

Hi @matthewdeng, could you print out the CUDA memory?

https://discuss.pytorch.org/t/how-to-check-the-gpu-memory-being-used/131220

say

import torch

cuda_mem = [torch.cuda.memory_allocated(i) / 1024 / 1024 / 1024 for i in range(4)]  # GiB per GPU
assert min(cuda_mem) == max(cuda_mem) and cuda_mem[0] > 0

JiahaoYao (Contributor)

this feature is coool 🚀

JiahaoYao (Contributor) commented Jul 21, 2022

I mean, in the first case:

cuda_mem[0] = 4 × 1281 MiB, cuda_mem[1] = 0, ...

amogkam (Contributor) left a comment

Ah this is a great find!

Do you think we should treat this optimization as experimental and disable it by default until we can test it more rigorously?

Signed-off-by: Matthew Deng <[email protected]>
matthewdeng (Contributor, Author)

@amogkam hah I was actually thinking the same thing, updated!
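
As a usage note, a hypothetical sketch of what the opt-in, experimental behavior looks like for users (the default value is an assumption here, not taken from this diff):

train_dataloader = train.torch.prepare_data_loader(
    train_dataloader,
    auto_transfer=True,  # explicitly opt in to the experimental background transfer
)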

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Matthew Deng <[email protected]>
krfricke (Contributor) left a comment

LGTM - this should solve the benchmark script GPU util issue, right?

matthewdeng (Contributor, Author)

@krfricke yep if you're referring to #26707!

amogkam merged commit 728e2b3 into ray-project:master on Jul 21, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
This sets the CUDA Stream on the correct device (and not the default one) when calling train.torch.prepare_data_loader(auto_transfer=True).

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Rohan138 <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
This sets the CUDA Stream on the correct device (and not the default one) when calling train.torch.prepare_data_loader(auto_transfer=True).

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Stefan van der Kleij <[email protected]>