Matmul with DID loop split #3651
base: main
Conversation
!build
!test
fd = Model(d, b, s, e)
out_tensors = fd.execute([inp_tensor, sharded_weight_tensor])
print(f"Output tensor: {out_tensors[0].shape}")
print(f"Output tensor: {out_tensors[0].shape}") |
I'd avoid debug prints to stdout. If you intend to keep it, you may want to consider `logging.debug`, and look into how to do that in a way that's compatible with PyTorch.
expected_out_tensor = unsharded_out_tensor.view([b, s, d, e]).permute(2, 0, 1, 3)[
    rank : rank + 1
]
Can this be done with shard_tensor?
@@ -1313,9 +1314,37 @@ bool hasTrivialAllocationDomain(const TensorView* tv) {
   }
   const std::vector<IterDomain*>& alloc = tv->getMaybeAllocationDomain();
   const std::vector<IterDomain*>& logical = tv->getLogicalDomain();
-  return TensorDomain::noBroadcasts(TensorDomain::noReductions(logical)) ==
+  const auto alloc_no_red_bcast =
I'm unsure about changing `hasTrivialAllocationDomain`, which is used elsewhere. Philosophically, its name suggests it should check whether the allocation is trivial, and having a DID split is conceptually non-trivial.

More practically, how about using `inferShapeOfOutput`? I made it support DID loop split, and it's used by `KernelExecutor` to allocate outputs. I think in `MatmulOp::evaluate` you could do something like:

    sizes, strides = inferShapeOfOutput(tv, ee);
    create a meta tensor with those sizes and strides
    if that_meta_tensor.is_contiguous():
        return the result of at::matmul
    ...

When you change the code to use `at::matmul_out` (which I still think is a change we should land sooner rather than later), you can simplify `MatmulOp::evaluate` further to:

    sizes, strides = inferShapeOfOutput(tv, ee);
    out = at::empty_strided(sizes, strides, ...);
    at::matmul_out(...);
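For concreteness, a minimal sketch of that second variant might look like the following, assuming `inferShapeOfOutput` yields a (sizes, strides) pair and that `a` and `b` are the already-evaluated input tensors; the helper name `evaluateMatmul` and its signature are hypothetical, not taken from the actual nvFuser sources:

```cpp
// Hypothetical sketch only: the wrapper function and how sizes/strides are
// obtained are assumptions; the ATen calls (at::empty_strided,
// at::matmul_out) are real.
#include <ATen/ATen.h>

at::Tensor evaluateMatmul(
    const at::Tensor& a,
    const at::Tensor& b,
    const std::vector<int64_t>& sizes,    // from inferShapeOfOutput
    const std::vector<int64_t>& strides)  // from inferShapeOfOutput
{
  // Allocate the output with the inferred (possibly non-contiguous) layout
  // so the DID loop split's allocation domain is respected...
  at::Tensor out = at::empty_strided(sizes, strides, a.options());
  // ...then let ATen write the matmul result into it in place.
  at::matmul_out(out, a, b);
  return out;
}
```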
This PR modifies `hasTrivialAllocationDomain` to consider whether the TensorView has a DID loop split. In that case, we compare the corresponding IterDomains of the logical and allocation domains across all but the sharded logical axis.

Note: this does not guarantee that `MatmulOp` with a non-trivial stride order will work for DID loop split. I suspect that will require some additional changes to the `MatmulOp::evaluate` method.
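For illustration, the relaxed comparison described above might look roughly like this. This is a sketch under assumptions: `sharded_axis` stands in for the index of the sharded logical axis (however the PR actually computes it), and the surrounding context is the `hasTrivialAllocationDomain` body shown in the diff, where `logical` and `alloc` are already defined:

```cpp
// Sketch of the described check (assumptions noted above). After dropping
// reductions and broadcasts, the logical and allocation domains must match
// on every axis except the sharded one, whose allocation extent holds only
// the per-device slice.
const auto logical_nrb =
    TensorDomain::noBroadcasts(TensorDomain::noReductions(logical));
const auto alloc_nrb =
    TensorDomain::noBroadcasts(TensorDomain::noReductions(alloc));
if (logical_nrb.size() != alloc_nrb.size()) {
  return false;
}
for (size_t i = 0; i < logical_nrb.size(); ++i) {
  if (static_cast<int64_t>(i) == sharded_axis) {
    continue;  // hypothetical: skip the DID-sharded logical axis
  }
  if (logical_nrb[i] != alloc_nrb[i]) {
    return false;
  }
}
return true;
```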