GPU idioms with cross-validation #133

Closed
jeffjennings opened this issue Feb 5, 2023 · 11 comments

jeffjennings (Contributor) commented Feb 5, 2023

Is your feature request related to a problem or opportunity? Please describe.

  • Ensure the routines called in a full fit pipeline can be run on a (single) GPU without error.

  • Ensure these routines don't overload VRAM. E.g. in the cross-val loop, Ian noted "the eventual model that we should adopt is to send one dataset to the GPU at a time, do the training for that, send the results back to the CPU, and then repeat for the next dataset subset. Because test_train_datasets is now a list of a bunch of K copies of the dataset, there's a good chance that it won't fit in GPU memory all at once, especially if the original dataset is large." See the sketch after this list.
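
A rough sketch of that per-fold idiom, as a minimal example only: `test_train_datasets` is assumed to be a list of `(train, test)` pairs, and `model_factory` and `model.loss()` are hypothetical stand-ins, not the package's actual API.

```python
import torch

def run_crossval(test_train_datasets, model_factory, device, n_iter=500):
    """Move one fold to the GPU at a time, train, score, return results to the CPU."""
    scores = []
    for train_ds, test_ds in test_train_datasets:
        # move only this fold's data to the GPU
        train_ds = train_ds.to(device)
        test_ds = test_ds.to(device)

        model = model_factory().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

        for _ in range(n_iter):
            optimizer.zero_grad()
            loss = model.loss(train_ds)  # hypothetical loss method
            loss.backward()
            optimizer.step()

        with torch.no_grad():
            score = model.loss(test_ds)

        # bring only the scalar score back to the CPU; free the GPU copies
        scores.append(score.cpu())
        del train_ds, test_ds, model
        if device.type == "cuda":
            torch.cuda.empty_cache()

    return scores
```

This keeps at most one fold's data in VRAM at any time, at the cost of re-transferring data between folds.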

Describe the solution you'd like

  • Integrate GPU support by adding .to(<device>) calls in the fit runner script, only where necessary and as early in the pipeline as possible.

  • Add GPU-specific tests for core routines (see the sketch below).
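
A minimal sketch of what a GPU test could look like, assuming pytest; the tensors here are generic stand-ins rather than the package's actual dataset/model objects.

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a CUDA GPU")
def test_loss_stays_on_gpu():
    device = torch.device("cuda")
    # generic stand-ins for gridded data, weights, and a model parameter
    data = torch.randn(10, 10, device=device)
    weight = torch.ones(10, 10, device=device)
    param = torch.zeros(10, 10, device=device, requires_grad=True)

    # a toy chi-squared-style loss evaluated entirely on the GPU
    loss = torch.sum(weight * (param - data) ** 2)
    loss.backward()

    # the loss and the gradients should never have left the GPU
    assert loss.device.type == "cuda"
    assert param.grad.device.type == "cuda"
```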

Additional context
A further improvement would be to handle multi-GPU workflows.

@jeffjennings jeffjennings added this to the v0.1.4 milestone Feb 5, 2023
jeffjennings (Contributor Author) commented:

Similar to #126, but resolving this issue may not fully resolve that one.

jeffjennings (Contributor Author) commented Feb 28, 2023

datasets.py now sets the default device arg in a method to device: torch.device = torch.device("cpu"). Is this best? I think it would be better for the default to be the current device, as it was before: otherwise tensors on the GPU can get silently moved to the CPU, and the new version requires extra explicit device specifications in function/class calls throughout a workflow to keep a given tensor on the same device.
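
For concreteness, a sketch of the alternative device=None pattern (an illustrative constructor, not the actual datasets.py signature): when no device is given, inherit it from the input tensor instead of forcing CPU.

```python
from typing import Optional

import torch

class DatasetSketch:
    """Illustrative only: the default device follows the input tensor."""

    def __init__(self, vis: torch.Tensor, device: Optional[torch.device] = None):
        # keep the data wherever it already lives unless told otherwise,
        # so GPU tensors are not silently moved back to the CPU
        self.device = device if device is not None else vis.device
        self.vis = vis.to(self.device)

# a GPU tensor stays on the GPU unless the caller asks for something else
if torch.cuda.is_available():
    vis = torch.randn(64, 64, dtype=torch.complex64, device="cuda")
    assert DatasetSketch(vis).vis.device.type == "cuda"
```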

iancze (Collaborator) commented Feb 28, 2023

That makes sense to me. @kadri-nizam, is there any reason you chose CPU as the default?

kadri-nizam (Contributor) commented:

Jeff is right; the default argument should be None instead of CPU. I misunderstood how the input tensors are copied in the constructor when I added the default argument.

@jeffjennings Shall I revert the changes or will you push it in a PR you're working on?

jeffjennings (Contributor Author) commented:

If it's convenient for you, that would be great! If not, I'll have time later this afternoon.

iancze added a commit that referenced this issue Feb 28, 2023
Reverting default device in dataset.py
iancze (Collaborator) commented Feb 28, 2023

The device args should be fixed by #170. Thanks, Kadri!

jeffjennings (Contributor Author) commented:

Yes, thank you @kadri-nizam !

@jeffjennings jeffjennings modified the milestones: v0.2.0, UML redesign Apr 4, 2023
@iancze iancze changed the title Codebase GPU compatibility GPU idioms with cross-validation Apr 4, 2023
iancze (Collaborator) commented Apr 4, 2023

The GriddedDataset now inherits from nn.Module thanks to #186, so the .to() component of this issue should be easier, or already solved, under a normal optimization loop. This issue asks about GPU setup in the context of cross-validation, though, which will be more complicated, so I changed the title to be a bit more specific.
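
To illustrate what the nn.Module inheritance buys here (a generic module with a registered buffer, standing in for GriddedDataset; not the actual class):

```python
import torch
from torch import nn

class BufferedDataset(nn.Module):
    """Stand-in for an nn.Module dataset: its buffers follow .to() calls."""

    def __init__(self, vis: torch.Tensor):
        super().__init__()
        # registered buffers are moved along with the module by .to(device)
        self.register_buffer("vis", vis)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = BufferedDataset(torch.randn(128, 128)).to(device)
model = nn.Linear(128, 128).to(device)

# in a normal optimization loop, the data and model now share a device
assert dataset.vis.device == next(model.parameters()).device
```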

I think the type of cross-validation we use will depend on whether the dataset is a GriddedDataset or in loose form. The question is then: can we write a DataLoader that uses either of these products to come up with an efficient idiom for running a cross-validation loop and sending the components to

  • a (single) GPU
  • a list of multiple GPUs

Do we need to think about using Ray Tune or Lightning to solve this issue?

For a GriddedDataset, changing the k-fold is really just a matter of changing the mask used to subset the full gridded dataset cube (rough sketch of that idea below).
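
A rough sketch of that mask-swapping idea, with generic tensors standing in for the gridded visibility cube (none of these names are the package's actual API): each fold is just a different boolean mask over the same cube, so only the mask changes between folds.

```python
import torch

# stand-in for a gridded visibility cube: (nchan, npix, npix)
cube = torch.randn(1, 64, 64)
base_mask = torch.ones_like(cube, dtype=torch.bool)  # cells that contain data

k = 5
# randomly assign each gridded cell to one of k folds
fold_id = torch.randint(0, k, cube.shape)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for fold in range(k):
    test_mask = base_mask & (fold_id == fold)
    train_mask = base_mask & (fold_id != fold)

    # the cube is shared across folds; only the masks differ,
    # and only the selected cells need to be sent to the GPU
    train_vis = cube[train_mask].to(device)
    test_vis = cube[test_mask].to(device)
    # ... train on train_vis, score on test_vis, move scalar results back to the CPU ...
```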

iancze (Collaborator) commented Dec 22, 2023

@jeffjennings is this still an issue for you after the updates to GriddedDataset and the idiom of .to()? If not, looking to close ☕

jeffjennings (Contributor Author) commented:

Feel free to close!

iancze (Collaborator) commented Dec 24, 2023

Thanks :)

@iancze iancze closed this as completed Dec 24, 2023