make_distribute_tutorial_work_in_google_colab #3022

Open: wants to merge 4 commits into base: main


@venkatram-dev venkatram-dev commented Aug 31, 2024

Fixes #3003, #3009

Description

For issue #3003:

mp.set_start_method("spawn") works locally (macOS), but it does not work well in Google Colab, which has some restrictions.
So I added the code snippet below to handle both environments.

    if "google.colab" in sys.modules:
        print("Running in Google Colab")
        mp.get_context("spawn")
    else:
        mp.set_start_method("spawn")

Please note that mp.set_start_method("fork") also works in Google Colab, but only if the code is run once; upon rerunning, it fails.
mp.get_context("spawn") allows multiple reruns without restarting the session.
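
The rerun behaviour described above can be checked in isolation. Below is a minimal sketch, assuming the standard-library `multiprocessing` module behaves the same way as the tutorial's `mp` for these two calls (the tutorial's import is not shown here, so that is an assumption). The helper names `context_is_local` and `second_set_raises` are hypothetical and exist only for illustration.

```python
import multiprocessing as mp


def context_is_local():
    """Check that get_context() leaves the global start method untouched."""
    # allow_none=True avoids fixing the default start method as a side effect.
    before = mp.get_start_method(allow_none=True)
    ctx = mp.get_context("spawn")  # returns a context object
    after = mp.get_start_method(allow_none=True)
    return ctx.get_start_method(), before == after


def second_set_raises():
    """Check that a repeated set_start_method() call raises RuntimeError."""
    try:
        mp.set_start_method("spawn")
    except RuntimeError:
        pass  # the start method was already set earlier in this session
    try:
        mp.set_start_method("spawn")  # second call without force=True
    except RuntimeError:
        return True
    return False


if __name__ == "__main__":
    print(context_is_local())
    print(second_set_raises())
```

This is consistent with a notebook cell failing on rerun when it calls `set_start_method()` again in the same session, while `get_context()` can be called repeatedly. Note that a context only affects objects created from it, for example `ctx.Process(...)`; this sketch does not check how the tutorial code starts its processes.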

Also added a clarification for issue #3009:

    - reading from ``tensor`` after ``dist.irecv()`` will result in undefined behaviour,
      until ``req.wait()`` has been executed.

Since both changes are in the same file (and are small), I am addressing both issues in one PR.
I am happy to split this into two PRs if needed.

Checklist

  • The issue that is being fixed is referenced in the description (see above "Fixes #ISSUE_NUMBER")
  • Only one issue is addressed in this pull request
  • Labels from the issue that this PR is fixing are added to this pull request
  • No unnecessary issues are included in this pull request.

cc @wconstab @osalpekar @H-Huang @kwen2501

pytorch-bot bot commented Aug 31, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3022

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@venkatram-dev (Author)

@svekars, please review.

Note: Since both changes are in the same file (and are small), I am addressing both issues together.
I am happy to make two PRs if needed.
