### 🐛 Describe the bug
When running add.Tensor on ET (optimized kernels) with input shapes (8) and (1,8), the operator call fails at runtime because the output tensor's rank cannot be resized (full error below). The same model runs successfully in eager mode.
When the inputs are swapped so that (1,8) comes first, it runs successfully. This suggests a bug in the output size calculation when broadcasting. I haven't tested the portable kernels specifically.
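For reference, standard broadcasting is order-independent: (8) + (1,8) and (1,8) + (8) should both produce a rank-2 output of shape (1,8). A minimal sketch of the rule (hypothetical helper, not ExecuTorch code) illustrates what the output size calculation should yield:

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Compute the broadcast output shape per NumPy/PyTorch rules:
    align shapes on the trailing dimension, pad the shorter shape
    with 1s, and take the max of each compatible dimension pair."""
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != 1 and y != 1 and x != y:
            raise ValueError(f"incompatible shapes {a} and {b}")
        result.append(max(x, y))
    return tuple(reversed(result))


print(broadcast_shape((8,), (1, 8)))   # (1, 8)
print(broadcast_shape((1, 8), (8,)))   # (1, 8) — operand order doesn't matter
```

Given that, the "old=2, new=1" rank error below looks like the kernel is deriving the output rank from the first operand rather than from the broadcast of both.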
```python
import torch

from executorch.exir import to_edge_transform_and_lower
from executorch.runtime import Runtime


class Model(torch.nn.Module):
    def forward(self, x, y):
        return x + y


inputs = (
    torch.randn(8),
    torch.randn(1, 8),
)

et_program = to_edge_transform_and_lower(
    torch.export.export(Model(), inputs),
).to_executorch()
print(et_program.exported_program())

runtime = Runtime.get()
program = runtime.load_program(et_program.buffer)
method = program.load_method("forward")
method.execute(inputs)
```
Output:

```
[tensor_impl.cpp:78] Attempted to change the tensor rank which is immutable: old=2, new=1
[op_add_sub_impl.h:91] Check failed (error == Error::Ok): Failed to resize output tensor.
[method.cpp:1303] KernelCall failed at instruction 0:0 in operator aten::add.out: 0x12
```
### Versions
N/A