GPU idioms with cross-validation #133

Closed
jeffjennings opened this issue Feb 5, 2023 · 11 comments

jeffjennings (Contributor) commented Feb 5, 2023

Is your feature request related to a problem or opportunity? Please describe.

  • Ensure the routines called in a full fit pipeline can be run on a (single) GPU without error.

  • Ensure these routines don't overload VRAM. E.g. in the cross-val loop, Ian noted "the eventual model that we should adopt is to send one dataset to the GPU at a time, do the training for that, send the results back to the CPU, and then repeat for the next dataset subset. Because test_train_datasets is now a list of a bunch of K copies of the dataset, there's a good chance that it won't fit in GPU memory all at once, especially if the original dataset is large." See the sketch after this list.
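
A rough sketch of that per-fold idiom, as a minimal example only: `test_train_datasets` is assumed to be a list of `(train, test)` pairs, and `model_factory` and `model.loss()` are hypothetical stand-ins, not the package's actual API.

```python
import torch

def run_crossval(test_train_datasets, model_factory, device, n_iter=500):
    """Move one fold to the GPU at a time, train, score, return results to the CPU."""
    scores = []
    for train_ds, test_ds in test_train_datasets:
        # move only this fold's data to the GPU
        train_ds = train_ds.to(device)
        test_ds = test_ds.to(device)

        model = model_factory().to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

        for _ in range(n_iter):
            optimizer.zero_grad()
            loss = model.loss(train_ds)  # hypothetical loss method
            loss.backward()
            optimizer.step()

        with torch.no_grad():
            score = model.loss(test_ds)

        # bring only the scalar score back to the CPU; free the GPU copies
        scores.append(score.cpu())
        del train_ds, test_ds, model
        if device.type == "cuda":
            torch.cuda.empty_cache()

    return scores
```

This keeps at most one fold's data in VRAM at any time, at the cost of re-transferring data between folds.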

Describe the solution you'd like

  • Integrate GPU support by adding .to(<device>) calls in the fit runner script, only where necessary and as early in the pipeline as possible.

  • Add GPU-specific tests for core routines (see the sketch below).
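
A minimal sketch of what a GPU test could look like, assuming pytest; the tensors here are generic stand-ins rather than the package's actual dataset/model objects.

```python
import pytest
import torch

@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a CUDA GPU")
def test_loss_stays_on_gpu():
    device = torch.device("cuda")
    # generic stand-ins for gridded data, weights, and a model parameter
    data = torch.randn(10, 10, device=device)
    weight = torch.ones(10, 10, device=device)
    param = torch.zeros(10, 10, device=device, requires_grad=True)

    # a toy chi-squared-style loss evaluated entirely on the GPU
    loss = torch.sum(weight * (param - data) ** 2)
    loss.backward()

    # the loss and the gradients should never have left the GPU
    assert loss.device.type == "cuda"
    assert param.grad.device.type == "cuda"
```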

Additional context
A further improvement would be to handle multi-GPU workflows.

@jeffjennings jeffjennings added this to the v0.1.4 milestone Feb 5, 2023
jeffjennings (Contributor Author) commented:

Similar to #126, but resolving this issue may not fully resolve that one.

jeffjennings (Contributor Author) commented Feb 28, 2023

datasets.py now sets the default device arg in a method to device: torch.device = torch.device("cpu"). Is this best? I think it would be better for the default to be the current device, as it was before: otherwise tensors on the GPU can get silently moved to the CPU, and the new version requires extra explicit device specifications in function/class calls throughout a workflow to keep a given tensor on the same device.
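
For concreteness, a sketch of the alternative device=None pattern (an illustrative constructor, not the actual datasets.py signature): when no device is given, inherit it from the input tensor instead of forcing CPU.

```python
from typing import Optional

import torch

class DatasetSketch:
    """Illustrative only: the default device follows the input tensor."""

    def __init__(self, vis: torch.Tensor, device: Optional[torch.device] = None):
        # keep the data wherever it already lives unless told otherwise,
        # so GPU tensors are not silently moved back to the CPU
        self.device = device if device is not None else vis.device
        self.vis = vis.to(self.device)

# a GPU tensor stays on the GPU unless the caller asks for something else
if torch.cuda.is_available():
    vis = torch.randn(64, 64, dtype=torch.complex64, device="cuda")
    assert DatasetSketch(vis).vis.device.type == "cuda"
```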

iancze (Collaborator) commented Feb 28, 2023

That makes sense to me. @kadri-nizam, is there any reason you chose CPU as the default?

kadri-nizam (Contributor) commented:

Jeff is right; the default argument should be None instead of CPU. I misunderstood how the input tensors are copied in the constructor when I added the default argument.

@jeffjennings Shall I revert the changes or will you push it in a PR you're working on?

jeffjennings (Contributor Author) commented:

If it's convenient for you, that would be great! If not, I'll have time later this afternoon.

iancze added a commit that referenced this issue Feb 28, 2023
Reverting default device in dataset.py
iancze (Collaborator) commented Feb 28, 2023

The device args should be fixed by #170. Thanks, Kadri!

jeffjennings (Contributor Author) commented:

Yes, thank you @kadri-nizam !

@jeffjennings jeffjennings modified the milestones: v0.2.0, UML redesign Apr 4, 2023
@iancze iancze changed the title Codebase GPU compatibility GPU idioms with cross-validation Apr 4, 2023
iancze (Collaborator) commented Apr 4, 2023

The GriddedDataset now inherits from nn.Module thanks to #186, so the .to() component of this issue should be easier, or already solved, under a normal optimization loop. This issue asks about GPU setup in the context of cross-validation, though, which will be more complicated, so I changed the title to be a bit more specific.
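
To illustrate what the nn.Module inheritance buys here (a generic module with a registered buffer, standing in for GriddedDataset; not the actual class):

```python
import torch
from torch import nn

class BufferedDataset(nn.Module):
    """Stand-in for an nn.Module dataset: its buffers follow .to() calls."""

    def __init__(self, vis: torch.Tensor):
        super().__init__()
        # registered buffers are moved along with the module by .to(device)
        self.register_buffer("vis", vis)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = BufferedDataset(torch.randn(128, 128)).to(device)
model = nn.Linear(128, 128).to(device)

# in a normal optimization loop, the data and model now share a device
assert dataset.vis.device == next(model.parameters()).device
```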

I think the type of cross-validation we use will depend on whether the dataset is a GriddedDataset or in loose form. The question is then: can we write a DataLoader that uses either of these products to come up with an efficient idiom for running a cross-validation loop and sending the components to

  • a (single) GPU
  • a list of multiple GPUs

Do we need to think about using Ray Tune or Lightning to solve this issue?

For a GriddedDataset, changing the k-fold is really just a matter of changing the mask used to subset the full gridded dataset cube (rough sketch of that idea below).
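
A rough sketch of that mask-swapping idea, with generic tensors standing in for the gridded visibility cube (none of these names are the package's actual API): each fold is just a different boolean mask over the same cube, so only the mask changes between folds.

```python
import torch

# stand-in for a gridded visibility cube: (nchan, npix, npix)
cube = torch.randn(1, 64, 64)
base_mask = torch.ones_like(cube, dtype=torch.bool)  # cells that contain data

k = 5
# randomly assign each gridded cell to one of k folds
fold_id = torch.randint(0, k, cube.shape)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for fold in range(k):
    test_mask = base_mask & (fold_id == fold)
    train_mask = base_mask & (fold_id != fold)

    # the cube is shared across folds; only the masks differ,
    # and only the selected cells need to be sent to the GPU
    train_vis = cube[train_mask].to(device)
    test_vis = cube[test_mask].to(device)
    # ... train on train_vis, score on test_vis, move scalar results back to the CPU ...
```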

iancze (Collaborator) commented Dec 22, 2023

@jeffjennings is this still an issue for you after the updates to GriddedDataset and the idiom of .to()? If not, looking to close ☕

jeffjennings (Contributor Author) commented:

Feel free to close!

iancze (Collaborator) commented Dec 24, 2023

Thanks :)

@iancze iancze closed this as completed Dec 24, 2023