Default num_canonical_nodes to an even multiple of num_physical_nodes

I'm not sure which part of the math is the problem, but `get_partitions` will error out if `num_canonical_nodes / num_physical_nodes` is not a whole number. This could be resolved by making the default conditional on the number of physical nodes, i.e.:
```python
pn = num_physical_nodes
# Keep a user-supplied value; otherwise use the first multiple of pn above 120.
num_canonical_nodes = num_canonical_nodes or (120 // pn * pn + pn)
```
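For illustration, the same idea written as a standalone function. This is only a sketch: the name `default_canonical_nodes` and the `target` parameter are hypothetical and not part of the library; the value 120 is taken from the snippet above.

```python
from typing import Optional


def default_canonical_nodes(
    num_physical_nodes: int,
    num_canonical_nodes: Optional[int] = None,
    target: int = 120,  # hypothetical knob; 120 comes from the snippet above
) -> int:
    """Return an explicit value if given, else a multiple of num_physical_nodes."""
    if num_physical_nodes <= 0:
        raise ValueError("num_physical_nodes must be positive")
    if num_canonical_nodes:
        # An explicit value is passed through unchanged (and unvalidated).
        return num_canonical_nodes
    # Round target down to a multiple of the node count, then add one more step.
    return target // num_physical_nodes * num_physical_nodes + num_physical_nodes


print(default_canonical_nodes(6))  # 126
print(default_canonical_nodes(8))  # 128
```

Note that an explicitly passed value is still not checked here, so a user-supplied 128 on 6 nodes would hit the same error.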

Example I hit when attempting to train the 350M GPT example on 6 nodes:
```python
get_partitions(
    num_samples=364672,
    num_canonical_nodes=128,
    num_physical_nodes=6,
    ranks_per_node=4,
    workers_per_rank=1,
    batch_size=6
)
# => ValueError: cannot reshape array of size 364672 into shape (6)
```
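A quick check of the divisibility condition for the numbers in this example, as a sketch. The helper `is_whole_multiple` is hypothetical and only mirrors the condition described in this issue; it does not call `get_partitions`.

```python
def is_whole_multiple(num_canonical_nodes: int, num_physical_nodes: int) -> bool:
    """True when the canonical node count divides evenly across physical nodes."""
    return num_canonical_nodes % num_physical_nodes == 0


# Values from the failing call: 128 canonical nodes on 6 physical nodes.
print(is_whole_multiple(128, 6))  # False, since 128 = 21 * 6 + 2
# A multiple of 6 near 128 satisfies the condition.
print(is_whole_multiple(126, 6))  # True
```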

Default num_canonical_nodes to an even multiple of num_physical_nodes #215