Fixed comm parser issue
Summary:
This diff fixes the following two comm parser issues:
1. process_group:init: support both uid and backend_id.
2. record_param_comms can have a different number of inputs.

Reviewed By: shengbao-zheng

Differential Revision: D56091619

fbshipit-source-id: 58e12a515b17150ee68557fc6b4ad729e1614d49
shengfukevin authored and facebook-github-bot committed Apr 16, 2024
1 parent 0a07342 commit c8e3f2f
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions train/comms/pt/commsTraceParser.py
@@ -233,7 +233,7 @@ def _parseExecutionTrace(
break

for pg in pgObj:
- backendId = pg["backend_id"]
+ backendId = pg["uid"] if "uid" in pg else pg["backend_id"]
ranks = pg["ranks"]
if isinstance(ranks, list):
pgId = int(pg["pg_name"])
@@ -256,7 +256,7 @@ def _parseExecutionTrace(
for node in in_trace.nodes.values():
if node.name == "record_param_comms":
shift = (
- 0 if len(node.inputs) == 8 else 1
+ 0 if len(node.inputs) == 8 or len(node.inputs) == 10 else 1
) # wait/barrier ops do not have an input tensor (len=7), shift index one over
newComm = commsArgs()
newComm.id = node.id
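
The two changed expressions can be exercised in isolation. The following is a minimal sketch, not code from commsTraceParser.py: the helper names `resolve_backend_id` and `input_shift` are hypothetical, and the sample dictionaries and input lists are made up for illustration. Only the two expressions themselves are taken from the diff.

```python
from typing import Any, Dict, List


def resolve_backend_id(pg: Dict[str, Any]) -> Any:
    """Hypothetical helper: prefer "uid" when the process-group entry has it,
    otherwise fall back to "backend_id" (same expression as in the diff)."""
    return pg["uid"] if "uid" in pg else pg["backend_id"]


def input_shift(inputs: List[Any]) -> int:
    """Hypothetical helper: 0 when the node has 8 or 10 inputs, else 1.

    Per the comment in the diff, wait/barrier ops have no input tensor
    (len=7), so their argument indices are shifted by one.
    """
    return 0 if len(inputs) == 8 or len(inputs) == 10 else 1


# Made-up entries covering both key layouts.
new_style_pg = {"uid": 3, "ranks": [0, 1], "pg_name": "0"}
old_style_pg = {"backend_id": 7, "ranks": [0, 1], "pg_name": "1"}

assert resolve_backend_id(new_style_pg) == 3
assert resolve_backend_id(old_style_pg) == 7
assert input_shift([None] * 8) == 0
assert input_shift([None] * 10) == 0
assert input_shift([None] * 7) == 1
```

Note that any input count other than 8 or 10 yields a shift of 1, so a trace with yet another input layout would be treated like a wait/barrier op.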