Using GPU to train the model #257

Closed
penny9287 opened this issue Jan 12, 2021 · 7 comments

@penny9287

Hello, I really appreciate your work. I'm wondering how to use a GPU to train the model: I always get errors when I use the CUDA device. Thanks a lot.
device = torch.device('cuda' if use_cuda else 'cpu')
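For context, the usual PyTorch pattern is to move both the model and every batch onto the same device before calling the optimizer. A minimal sketch with this package's Shampoo (the tiny model and the dummy batch below are placeholders, not code from this repo):

    import torch
    import torch.nn as nn
    import torch_optimizer as optim  # this package (pip install torch-optimizer)

    use_cuda = torch.cuda.is_available()
    device = torch.device('cuda' if use_cuda else 'cpu')

    # stand-in model; replace with your own network
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
    optimizer = optim.Shampoo(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # dummy batch standing in for one DataLoader batch of MNIST images
    images = torch.randn(32, 1, 28, 28)
    labels = torch.randint(0, 10, (32,))

    # inputs must live on the same device as the model's weights
    images, labels = images.to(device), labels.to(device)

    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()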

@jettify
Owner

jettify commented Jan 13, 2021

Please provide the exception and the optimizer you are using. Unfortunately the free CI does not have a GPU to test on :(

@penny9287
Author

I used the Shampoo optimizer, and other optimizers also fail when using the GPU:
    Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
    Error occurs, No graph saved
    Traceback (most recent call last):
      File "shampoo_gpu.py", line 181, in <module>
        main()
      File "shampoo_gpu.py", line 165, in main
        writer.add_graph(model, images)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/utils/tensorboard/writer.py", line 724, in add_graph
        self._get_file_writer().add_graph(graph(model, input_to_model, verbose))
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 292, in graph
        raise e
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/utils/tensorboard/_pytorch_graph.py", line 286, in graph
        trace = torch.jit.trace(model, args)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/jit/_trace.py", line 742, in trace
        _module_class,
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/jit/_trace.py", line 940, in trace_module
        _force_outplace,
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 725, in _call_impl
        result = self._slow_forward(*input, **kwargs)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 709, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "shampoo_gpu.py", line 21, in forward
        x = self.conv1(x)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 725, in _call_impl
        result = self._slow_forward(*input, **kwargs)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/module.py", line 709, in _slow_forward
        result = self.forward(*input, **kwargs)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
        return self._conv_forward(input, self.weight)
      File "/root/anaconda3/envs/optimizer/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
        self.padding, self.dilation, self.groups)
    RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

@jettify
Owner

jettify commented Jan 14, 2021

Are you running the MNIST example? I removed the line below:

        # visualize NN computation graph
        writer.add_graph(model, images)

It has nothing to do with training, but it fails with GPU tensors.
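If the graph visualization is still wanted, one possible workaround (not part of the repo) is to trace with a dummy input that lives on the same device as the model's weights. This sketch assumes `model` and `device` are set up as in the example script and that the network takes MNIST-shaped input:

    import torch
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter()
    # dummy input on the same device as the model, so tracing does not mix
    # CPU inputs with CUDA weights
    dummy_input = torch.zeros(1, 1, 28, 28, device=device)
    writer.add_graph(model, dummy_input)
    writer.close()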

@jettify
Owner

jettify commented Jan 14, 2021

Shampoo may still be faulty, since it does SVD on the CPU regardless of whether a GPU is present. I tested pid/yogi/diffgrad/apollo/adabelief and all of them work as expected.
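Roughly, the CPU round-trip being described looks like the sketch below (an illustration of the pattern, not the library's actual code; the helper name is made up):

    import torch

    def _matrix_power_cpu(matrix: torch.Tensor, power: float) -> torch.Tensor:
        # the decomposition runs on the CPU even when the preconditioner
        # matrix lives on the GPU, then the result is moved back
        u, s, v = torch.svd(matrix.cpu())
        return (u @ s.pow_(power).diag() @ v.t()).to(matrix.device)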

@penny9287
Author

Thanks a lot. I removed the line above and training on the GPU works!

jettify closed this as completed Jan 15, 2021
@alimoezzi

@jettify Any plan to update Shampoo so it does SVD on the GPU?
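For reference, newer PyTorch releases can run the decomposition directly on the tensor's device via torch.linalg.svd, so the CPU round-trip could in principle be dropped; a hypothetical sketch:

    import torch

    def _matrix_power_on_device(matrix: torch.Tensor, power: float) -> torch.Tensor:
        # torch.linalg.svd (PyTorch >= 1.8) runs on whatever device the
        # tensor is on, so no transfer to the CPU is needed
        u, s, vh = torch.linalg.svd(matrix, full_matrices=False)
        return u @ torch.diag(s.pow(power)) @ vh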

@alimoezzi

@jettify #439 is a fix for the Shampoo optimizer that upgrades its implementation.
