Migrate torch_xla.device() to torch.device('xla') #9253

Draft: wants to merge 3 commits into master from 9252

Conversation

@ghpvnist ghpvnist (Collaborator) commented May 27, 2025

fixes #9252

@yaoshiang yaoshiang (Collaborator) left a comment

Approved pending successful CI/CD tests. Thank you for finally getting this fixed! As a separate issue, we should update all our docs to reflect this (the Markdown files, not just the docstrings, which I think this PR covers). THANK YOU!!!

@ghpvnist ghpvnist force-pushed the 9252 branch 5 times, most recently from 2a8f9e9 to 0c01883 on June 9, 2025 at 20:38
@ghpvnist ghpvnist force-pushed the 9252 branch 8 times, most recently from b9ed0df to 2fd8ab0 on June 11, 2025 at 22:46
@ghpvnist ghpvnist (Collaborator, Author)

So, after some digging I realized that the semantics of torch.device('xla') are not the same as torch_xla.device() in many cases. See the torch.device docs. torch.device('xla') returns a device without an ordinal like xla:X; per the docs, a device without an ordinal stands for the current device of that type (for GPUs, it resolves to torch.cuda.current_device()). torch_xla.device(), however, behaves differently depending on the context:

1. With the XLA_USE_SPMD env var set, torch_xla.device() will always return xla:0.
2. Otherwise, torch_xla.device() returns torch.device(torch_xla._XLAC._xla_get_default_device()), which can be xla:0 or another ordinal depending on single- vs multi-process context.
3. If called as torch_xla.device(X) (equivalent to torch.device('xla:X')), it returns xla:X.

That said, I don't think we can fully remove the torch_xla.device API in favor of e.g. torch.device('xla:X'), because torch_xla.device is a wrapper with more functionality than the torch.device API.
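A minimal sketch of the difference described above (not part of this PR; assumes torch_xla is installed and that torch_xla.device accepts an optional ordinal, as noted in the thread):

```python
import torch
import torch_xla

# torch.device('xla') only constructs a device object; no ordinal is resolved.
d_plain = torch.device('xla')      # device(type='xla'), index is None
d_fixed = torch.device('xla:0')    # device(type='xla', index=0)

# torch_xla.device() resolves the ordinal for the calling process:
#   - with XLA_USE_SPMD set, it always returns xla:0
#   - otherwise it returns the process-local default device (xla:0, xla:1, ...)
d_auto = torch_xla.device()

# With an explicit ordinal the two APIs agree:
assert torch_xla.device(0) == torch.device('xla:0')
```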

@yaoshiang yaoshiang (Collaborator)

Thanks for this good analysis. Clearly the APIs are not the same, but do they have enough expressiveness to do the same thing? Setting an ordinal for the device number appears to be supported. The one area that does not appear supported is SPMD. I wonder if torch.device("gspmd") would be a better way to express it than torch_xla.device(), which always returns "xla:0" in that mode.

There is room to get this right. The closest analogue I can think of is torch.device("meta"); similarly, "gspmd" might represent a virtual device. That said, if we are going to redo the gSPMD stuff to align with DTensor, we should just worry about it then.

@ghpvnist ghpvnist (Collaborator, Author)

If you know which ordinal you are querying for, both have the same expressiveness. Apart from SPMD, torch.device(torch_xla._XLAC._xla_get_default_device()) cannot be expressed with the torch.device API, since torch.device requires knowing which ordinal you want (without one, it returns a device with no ordinal). By default in a non-SPMD context, torch_xla.device() returns both the device and its ordinal without you explicitly querying for it. This is where the semantics of the two APIs differ.
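A small illustration of that point (a sketch under the assumption of a working single-process torch_xla setup; the printed ordinal is only an example):

```python
import torch
import torch_xla

# torch.device('xla') carries no ordinal unless one is spelled out.
print(torch.device('xla').index)    # None
print(torch.device('xla:1').index)  # 1

# torch_xla.device() fills in the process-local ordinal for you.
print(torch_xla.device().index)     # e.g. 0 in a single-process run
```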

@yaoshiang yaoshiang (Collaborator)

Thanks. I don't know what the setter for torch_xla._XLAC._xla_get_default_device() is, but the fact that it lives in _XLAC and is not properly exposed outside of it is already a red flag that this part of the API is incomplete...

I wonder if this would allow us to have a get and set default device that is idiomatic.

https://docs.pytorch.org/docs/stable/generated/torch.set_default_device.html
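For reference, a hedged sketch of the idiomatic pattern that link describes; torch.set_default_device exists in PyTorch 2.0+ and torch.get_default_device in 2.3+, and whether an 'xla' device would be accepted there is an open question, not something this thread establishes:

```python
import torch

# torch.set_default_device changes the device used when tensors are created
# without an explicit device argument. 'meta' is used here only because it
# works without an accelerator; 'xla:0' would be the hypothetical torch_xla analogue.
torch.set_default_device('meta')
x = torch.ones(3)
print(x.device)                    # meta

# torch.get_default_device() reads the current default back (PyTorch 2.3+).
print(torch.get_default_device())  # meta
```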
