Skip to content

[BUG] NUM_GPUS_PER_NODE not respected in inference #30

Open
@averageFOSSenjoyer

Description

@averageFOSSenjoyer

Describe the bug
When calculating the bandwidth with TP in get_latency_fwd_per_tp_comm and get_latency_fwd_per_layer_shared_dp_comm, the calculation defaults to intra-node BW in the former and the latter depends on a magic number 8 which I assume is referring to NUM_GPUS_PER_NODE.

# assuming tp_size <= number of GPUs per node, thus using intra-node bandwidth
latency_per_all_reduce = (elems_per_all_reduce * dtype_bytes /
(self.get_intra_node_bandwidth() * 10**9))

# assuming tp and dp are preferred when sharding intra node, pp is only applied across nodes
# when (dp_size * tp_size) <= 8, the data parallel processes are within a node
bandwidth = self.get_intra_node_bandwidth() if (
dp_size * tp_size) <= 8 else self.get_inter_node_bandwidth()

NUM_GPUS_PER_NODE = 8 # number of GPUs per node

Expected behavior
For get_latency_fwd_per_tp_comm, it should use get_intra_node_bandwidth when tp_size <= NUM_GPUS_PER_NODE and get_inter_node_bandwidth otherwise.

For get_latency_fwd_per_layer_shared_dp_comm, the magic number 8 should be replaced with NUM_GPUS_PER_NODE.

Looking at training, the tp_size <= NUM_GPUS_PER_NODE seems more like an enforcement than suggestion, should it be checked for infer as well?

assert tp_size <= num_gpus_per_node, (
f"tp_size must be <= {num_gpus_per_node}(num_gpus_per_node), tensor"
" parallelism requires high communication bandwidth to be efficient"
" and is best kept within a single node where high bandwidth NVLink"
" is available.")

Additional context
I'd be more than happy to provide a PR if the report is valid

Minor Issue
A default for mlp_gated_linear_units is not set when it hits the first if and misses the second. Can be reproduced with

python3 -m llm_analysis.analysis infer -m meta-llama/Llama-3.1-405b

if ffn_embed_dim:
expansion_ratio = ffn_embed_dim / hidden_dim
if expansion_ratio == 3.5:
mlp_gated_linear_units = True
else:
mlp_gated_linear_units = False

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions