
wip: use torch from a wheel #9340


Draft · wants to merge 1 commit into master

Conversation

@rickeylev (Contributor) commented Jun 11, 2025

Using torch from a wheel will eliminate the ~20 minutes it takes to build from source.

A custom repo rule replicates the structure that python_init_repositories
expects (a directory with a dist/ folder containing a wheel) and adds the whl file
to the requirements.txt files.
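
For illustration, a minimal sketch of what such a repo rule could look like. The rule name, attributes, and exact file layout here are assumptions for the sketch, not the PR's actual code:

```starlark
# Hypothetical sketch only: names, attributes, and layout are illustrative.
def _torch_wheel_repo_impl(rctx):
    wheel_name = rctx.attr.url.rsplit("/", 1)[-1]

    # Download the prebuilt wheel into a dist/ directory, mirroring the layout
    # python_init_repositories expects from a local torch source build.
    rctx.download(
        url = rctx.attr.url,
        output = "dist/" + wheel_name,
        sha256 = rctx.attr.sha256,
    )

    # Point the requirements file at the downloaded wheel so the pip
    # integration installs it instead of building torch from source.
    rctx.file("requirements.txt", "./dist/{}\n".format(wheel_name))
    rctx.file("BUILD.bazel", 'exports_files(["requirements.txt"])\n')

torch_wheel_repo = repository_rule(
    implementation = _torch_wheel_repo_impl,
    attrs = {
        "url": attr.string(mandatory = True),
        "sha256": attr.string(default = ""),
    },
)
```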

TODO:

  • Handle different builds (cuda, nightly, etc.). There are thousands of pytorch wheels;
    it's unclear which combinations need to be pulled.

Work towards #9173

@bhavya01 (Collaborator)

Thanks for looking into this. It will be helpful for #9173.

@rickeylev (Contributor, PR author)

Thanks! There are a couple of questions whose answers would help me figure out the next steps:

  1. Can the ability to use a pytorch source checkout be removed entirely? It's easy to still allow using a locally built wheel.
  2. The ts_native_functions yaml and cpp files aren't in the wheel. Are they actually needed? From my local building, their absence hasn't resulted in any errors.
  3. Which versions and configurations of pytorch are needed? In order to use wheels, we'll need to specify them by URL (either via environment variables, or via a list in a .bzl file; see the sketch after this list).
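
As a hypothetical example of the ".bzl list" option from question 3, something like the following pin file could work. The file name, URL, and hash are placeholders, not real pins, and torch_wheel_repo refers to the rule sketched in the PR description above:

```starlark
# Hypothetical torch_wheels.bzl: one pinned wheel URL per build flavor.
TORCH_WHEEL_PINS = {
    "cpu": struct(
        url = "https://download.pytorch.org/whl/cpu/torch-<version>-<python-tag>-linux_x86_64.whl",
        sha256 = "<sha256-of-the-wheel>",
    ),
    # "cuda": struct(url = "...", sha256 = "..."),
    # "nightly-cpu": struct(url = "...", sha256 = "..."),
}

def register_torch_wheel(build = "cpu"):
    pin = TORCH_WHEEL_PINS[build]

    # torch_wheel_repo is the hypothetical repo rule from the sketch above.
    torch_wheel_repo(
        name = "torch_wheel",
        url = pin.url,
        sha256 = pin.sha256,
    )
```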

@iwknow (Contributor) commented Jun 16, 2025

I am also curious about which versions and configurations of pytorch this will pin to. Currently, pytorch/xla assumes the HEAD of torch's main branch, and some features rely on "unreleased" code (e.g. #8632 (comment)). Pinning pytorch to any released version will break that feature because a hashable TreeSpec is not included in any release. I believe there are many instances of this, which makes the choice of pytorch versions very limited.

@bhavya01 (Collaborator)

  1. Can the ability to use a pytorch source checkout be removed entirely? It's easy to still allow using a locally built wheel.
     I think that lazy_tensor_generator.py needs the pytorch source. I am trying to remove that dependency by copying the generated code into my repository rather than having it codegen'd, until I figure out a better solution.
  2. The ts_native_functions yaml and cpp files aren't in the wheel. Are they actually needed? From my local building, their absence hasn't resulted in any errors.
     Not sure if these are actually needed. I think we just need these generated files: https://github.com/bhavya01/playground/tree/main/torch_xla_generated_04232025/csrc
  3. Which versions and configurations of pytorch are needed? In order to use wheels, we'll need to specify them by URL (either via environment variables, or via a list in a .bzl file).
     We just need the CPU versions. For the nightly builds, we should use the nightly torch wheels. Each torch_xla stable release depends on the corresponding torch stable release.

@rickeylev (Contributor, PR author)

Thanks for the info, @bhavya01 and @iwknow

what versions of torch would this pin to

Whichever you want, for the most part. We could probably even have it automatically discover the latest nightlies and use those (I'm pretty sure, anyway; it's a bit more complicated, but I think I see a way to do it).
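
For what it's worth, a rough sketch of how a repo rule might auto-discover the newest CPU nightly from the pip-style wheel index at download.pytorch.org. The index layout and the naive string parsing here are assumptions, and Starlark has no regex, so this just scans hrefs:

```starlark
# Rough, hypothetical sketch: scan the nightly CPU index for wheel hrefs and
# download the newest matching one.
def _latest_torch_nightly_impl(rctx):
    index = "https://download.pytorch.org/whl/nightly/cpu/torch/"
    rctx.download(url = index, output = "index.html")
    html = rctx.read("index.html")

    # Keep hrefs that look like linux x86_64 wheels for one Python version,
    # dropping any #sha256=... fragment.
    candidates = sorted([
        part.split('"')[0].split("#")[0]
        for part in html.split('href="')[1:]
        if "linux_x86_64.whl" in part and "cp311" in part
    ])
    if not candidates:
        fail("no matching nightly torch wheel found in the index")

    # Nightly versions embed a date, so the lexically largest entry is the
    # newest within a given release series.
    latest = candidates[-1]

    # Hrefs may be relative to the index page or absolute paths on the host.
    url = ("https://download.pytorch.org" + latest) if latest.startswith("/") else (index + latest)
    rctx.download(url = url, output = "dist/" + latest.split("/")[-1])

latest_torch_nightly = repository_rule(implementation = _latest_torch_nightly_impl)
```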

I am also curious about which versions and configurations of pytorch this will pin to. Currently, pytorch/xla assumes the HEAD of torch's main branch, and some features rely on "unreleased" code (e.g. #8632 (comment)). Pinning pytorch to any released version will break that feature because a hashable TreeSpec is not included in any release. I believe there are many instances of this, which makes the choice of pytorch versions very limited.

oh hm, this is concerning for a couple reasons.

First, it simply prevents saving the ~20 minutes spent building torch in CI, which is about a third of the CI time. As a rule of thumb, productivity tends to plummet once presubmit checks take more than 10 minutes. The only other option is to fully bazelify torch itself -- possible in theory, but I'm skeptical of its feasibility in practice. Our experience building torch from source within Google has been hard and brittle (and my experience building torch from source outside Google isn't much better); that's unsurprising given the size and complexity of torch.

Second, if you're developing against torch head, then you're approximately locked to torch's release schedule. That seems to be what you want as a project anyway? ("Each stable torch-xla release depends on the corresponding stable torch release.")

I just want to make clear that this makes for a tough route. The problem with tracking head is that torch is very active, so it's constantly changing and almost every CI run essentially has to start from scratch. Similarly, you, as developers, pay that same large tax in your edit-run flow and are disincentivized from syncing, and it isn't obvious when you must update torch and when you shouldn't. On the infra side, it makes hopping in to address things harder because the setup is complex and volatile.
