Describe the bug
When calculating the bandwidth with TP in `get_latency_fwd_per_tp_comm` and `get_latency_fwd_per_layer_shared_dp_comm`, the calculation defaults to intra-node bandwidth in the former, while the latter depends on a magic number `8`, which I assume refers to `NUM_GPUS_PER_NODE`.

llm-analysis/llm_analysis/analysis.py, lines 1221 to 1223 in d841e40
llm-analysis/llm_analysis/analysis.py, lines 1247 to 1250 in d841e40
llm-analysis/llm_analysis/constant.py, line 37 in d841e40
Expected behavior
For `get_latency_fwd_per_tp_comm`, it should use `get_intra_node_bandwidth` when `tp_size <= NUM_GPUS_PER_NODE` and `get_inter_node_bandwidth` otherwise.
For `get_latency_fwd_per_layer_shared_dp_comm`, the magic number `8` should be replaced with `NUM_GPUS_PER_NODE`.
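A minimal sketch of the proposed selection rule, written as a standalone helper for illustration (`select_tp_comm_bandwidth` is a hypothetical name; in `analysis.py` the two bandwidth values would come from `get_intra_node_bandwidth` and `get_inter_node_bandwidth`, whose actual signatures may differ):

```python
NUM_GPUS_PER_NODE = 8  # mirrors llm_analysis/constant.py


def select_tp_comm_bandwidth(
    tp_size: int, intra_node_bw: float, inter_node_bw: float
) -> float:
    """Pick the bandwidth used for TP collectives: TP traffic stays on
    the fast intra-node fabric only while the TP group fits inside one
    node; once tp_size exceeds NUM_GPUS_PER_NODE it has to cross the
    slower inter-node link."""
    if tp_size <= NUM_GPUS_PER_NODE:
        return intra_node_bw
    return inter_node_bw


# Example: tp_size=16 on 8-GPU nodes should use the inter-node link.
assert select_tp_comm_bandwidth(16, 300e9, 25e9) == 25e9
```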
Looking at `training`, the `tp_size <= NUM_GPUS_PER_NODE` check seems more like an enforcement than a suggestion; should it be checked for `infer` as well?

llm-analysis/llm_analysis/analysis.py, lines 2695 to 2699 in d841e40
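If `infer` were to enforce the same bound, it might look like the sketch below (hedged: `check_tp_size_for_infer` is a hypothetical name, the actual check in `training` at the lines above may be phrased differently, and `assert` is only one way to enforce it):

```python
NUM_GPUS_PER_NODE = 8  # mirrors llm_analysis/constant.py


def check_tp_size_for_infer(tp_size: int) -> None:
    # Mirror the guard that training applies, so infer cannot silently
    # assume intra-node bandwidth for a TP group that spans nodes.
    assert tp_size <= NUM_GPUS_PER_NODE, (
        f"tp_size ({tp_size}) must be <= NUM_GPUS_PER_NODE "
        f"({NUM_GPUS_PER_NODE})"
    )
```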
Additional context
I'd be more than happy to provide a PR if the report is valid.
Minor Issue
A default for `mlp_gated_linear_units` is not set when a model config enters the first `if` but misses the second. Can be reproduced with `python3 -m llm_analysis.analysis infer -m meta-llama/Llama-3.1-405b`.

llm-analysis/llm_analysis/config.py, lines 216 to 221 in d841e40
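One possible shape of a fix, sketched as a standalone function: initialize the flag before the branch chain so that a config entering the first `if` but missing the second still yields a defined value. The branch conditions below are placeholders, not the actual ones at the lines above; only `mlp_gated_linear_units` is a real name from this report.

```python
def parse_mlp_gated_linear_units(hf_config: dict) -> bool:
    # Proposed: give the flag a default up front so no path through the
    # branch chain can leave it unset.
    mlp_gated_linear_units = False  # default before any branch runs
    if "intermediate_size" in hf_config:  # placeholder first branch
        pass  # sets other fields; does not touch mlp_gated_linear_units
    if hf_config.get("hidden_act") == "silu":  # placeholder second branch
        mlp_gated_linear_units = True
    return mlp_gated_linear_units
```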