Enable multi-GPU in object detection #33561
@qubvel Hi Pavel, even though this is a draft (it is finished, but I want to clean up some of the code), I would like you to review the overall idea :) cc @amyeroberts
Thanks for opening this PR @SangbumChoi! What I would propose is creating a new script, e.g. [...] Otherwise, I think the changes here all look great :D
@amyeroberts I found that this flatten issue happens all over the detection pipeline (e.g. detr, groundingdino, deformable-detr, etc.). I think it would be better to add an additional argument at [...] BTW, as you suggested, I will separate the file to [...]
Agreed! I think @qubvel was working on something to enable this in trainer
Is there any progress on this issue? I would be very interested :)
@daniel-bogdoll Thanks for your interest. I think you can use the current script, but I plan to support Amy's suggestion.
[rank0]: File examples/pytorch/object-detection/run_object_detection_multi_gpu.py", line 216, in compute_metrics
This happens because the loss is present in batch_logits as a dict. I am facing this error when trying to evaluate. Did you also run into any issues during evaluation? I was able to fix this by modifying compute_metrics as: [...]
What does this PR do?
Fixes #33525
This PR is a concept for fundamentally fixing the error that occurs in multi-GPU setups. I call it a concept because torchmetrics does not accept the multi-GPU case, and I also want to clean up some of the code. The key point is that the `Trainer` class receives `prediction` and `labels` as nested tensors (e.g. batch size 4 with 2 GPUs gives a length of 8). However, to compute the evaluation metric properly, they should be shaped as `batch_size`. In addition, we should always encourage using `accelerate` by default.
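One way to picture the length mismatch described above is as a regrouping problem: entries gathered from all GPUs arrive as one longer sequence, while the metric code expects groups of `batch_size`. The following is only a minimal sketch under that reading, not the code from this PR. The helper name `regroup_by_batch_size` is hypothetical, and plain Python lists stand in for the tensors that `Trainer` would actually pass.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def regroup_by_batch_size(gathered: Sequence[T], batch_size: int) -> List[List[T]]:
    """Split a gathered sequence into consecutive chunks of `batch_size`.

    Hypothetical helper for illustration only. The final chunk may be
    shorter when the length is not a multiple of `batch_size`.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        list(gathered[start:start + batch_size])
        for start in range(0, len(gathered), batch_size)
    ]


# Assumed scenario from the description: batch size 4 on 2 GPUs,
# so one gathered step holds 8 per-image entries.
gathered_labels = [f"image_{i}" for i in range(8)]
chunks = regroup_by_batch_size(gathered_labels, batch_size=4)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Whether the real fix should regroup like this or flatten everything into a single per-image list depends on what the metric expects, which the description leaves open.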
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.