sync comm_ops for dynamo (#1712)
Summary:
Pull Request resolved: #1712

## Problem
Torchrec comm_ops implement pipelining logic for the forward and backward passes using Awaitables and custom autograd Functions.

Custom autograd Functions are not fully supported by dynamo and carry many limitations, so the pipelining logic is not traceable right now.

Legacy torch.distributed collectives are also not traceable by dynamo.

## Solution

1/ Adding a synchronous path without pipelining logic for dynamo compilation (NoWait()).
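A minimal sketch of the NoWait idea (the interface is assumed from the description above, not copied from torchrec): it wraps an already-computed value behind the same `.wait()` interface an Awaitable exposes, so callers work unchanged on the synchronous path.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class NoWait(Generic[T]):
    """Awaitable-shaped wrapper around an already-computed value."""

    def __init__(self, obj: T) -> None:
        self._obj = obj

    def wait(self) -> T:
        # Nothing to wait for: the collective already ran synchronously.
        return self._obj


# A synchronous code path can return NoWait(tensor) where the
# pipelined path would return a real Awaitable.
result = NoWait(42).wait()  # -> 42
```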

2/ Using traceable functional_collectives instead of the legacy collectives.

3/ functional_collectives do not have autograd formulas in PyTorch, as they are not differentiable.

Adding autograd formulas with a BC check in torchrec/distributed/comm_ops.py.
The dispatch happens below Autograd, so dynamo will see these ops as leaves in the graph and will not trace through them.
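The actual change registers the formulas below the Autograd dispatch key; as a rough stand-in, the toy `autograd.Function` below only illustrates the kind of formula being attached — the backward of a gather-like collective is the corresponding reduce. All names here are illustrative, not torchrec's ops.

```python
import torch


class AllGatherLike(torch.autograd.Function):
    # Toy stand-in: "gathering" across 2 fake ranks is a repeat, and
    # its gradient is the corresponding reduce (sum of the slices).
    @staticmethod
    def forward(ctx, x):
        return x.repeat(2)

    @staticmethod
    def backward(ctx, grad_out):
        n = grad_out.shape[0] // 2
        return grad_out[:n] + grad_out[n:]


x = torch.ones(3, requires_grad=True)
y = AllGatherLike.apply(x)
y.sum().backward()
# x.grad is 2 * ones(3): each input element appears twice in y.
```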

4/ dist.distributed_c10d._get_default_group() is not traceable right now => the test specifies the PG explicitly.
Changed rank/world_size lookups from dist.get_world_size()/dist.get_rank() to pg.size() and pg.rank(), which are traceable via PGVariable.
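The pattern, sketched with an illustrative helper (the function and the fake group below are hypothetical, not torchrec code): the process group is passed in explicitly and queried directly, instead of reaching for the default group through module-level getters.

```python
def shard_bounds(pg, total: int):
    # pg is any object with the ProcessGroup .size()/.rank() interface.
    world_size = pg.size()  # traceable via dynamo's PGVariable
    rank = pg.rank()        # (unlike dist.get_world_size()/get_rank())
    per_rank = total // world_size
    start = rank * per_rank
    return start, start + per_rank


class _FakePG:
    # Stand-in so the sketch runs without initializing a real group.
    def size(self) -> int:
        return 4

    def rank(self) -> int:
        return 1


bounds = shard_bounds(_FakePG(), 100)  # -> (25, 50)
```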

5/ Syntactic changes for dynamo:
Dynamo does not support collection generators => replaced with explicit for loops over range(), etc.
SymInts do not support divmod() => replaced with // and %.
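Both rewrites, sketched on made-up data (the snippets are illustrative examples of the pattern, not the exact torchrec code):

```python
# (a) Collection generator -> explicit for loop over range().
splits = [10, 20, 30]
# Before: offsets = list(sum(splits[:i]) for i in range(len(splits)))
offsets = []
running = 0
for i in range(len(splits)):
    offsets.append(running)
    running += splits[i]
# offsets == [0, 10, 30]

# (b) divmod -> // and % (SymInt does not support divmod).
total, world_size = 100, 8
# Before: quotient, remainder = divmod(total, world_size)
quotient = total // world_size   # 12
remainder = total % world_size   # 4
```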

Reviewed By: joshuadeng

Differential Revision: D53707387

fbshipit-source-id: 6c4febf68471cb71da65973d1e4ff6e82eeb94d4
Ivan Kobzarev authored and facebook-github-bot committed Feb 29, 2024
1 parent 521aae1 commit 75772b9
Showing 3 changed files with 1,036 additions and 65 deletions.
