[Grok-1] 1. upload moe configuration file for moe kernel optimization… (#193)

* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. support "--num-scheduler-steps" in benchmark_latency.py

* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. add copy of benchmark_latency.py to support "--num-scheduler-steps"

* [Grok-1] add option num-scheduler-steps in benchmark_latency.py
kkHuang-amd authored Sep 18, 2024
1 parent 54e0441 commit 40581f4
Showing 2 changed files with 10 additions and 4 deletions.
6 changes: 6 additions & 0 deletions benchmarks/benchmark_latency.py
@@ -47,6 +47,7 @@ def main(args: argparse.Namespace):
         distributed_executor_backend=args.distributed_executor_backend,
         otlp_traces_endpoint=args.otlp_traces_endpoint,
         enable_prefix_caching=args.enable_prefix_caching,
+        num_scheduler_steps=args.num_scheduler_steps,
     )

     sampling_params = SamplingParams(
@@ -279,5 +280,10 @@ def run_to_completion(profile_dir: Optional[str] = None):
         type=str,
         default=None,
         help='Target URL to which OpenTelemetry traces will be sent.')
+    parser.add_argument(
+        "--num-scheduler-steps",
+        type=int,
+        default=1,
+        help="Maximum number of forward steps per scheduler call.")
     args = parser.parse_args()
     main(args)
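
The change above follows the usual two-part pattern for exposing an engine option in a benchmark script: declare the flag on the parser, then forward the parsed value as a keyword argument. The following is a minimal, self-contained sketch of that pattern. `EngineConfig`, `build_parser`, and `config_from_args` are hypothetical names used only for illustration; the real script passes the value to the engine constructor shown in the diff.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Hypothetical stand-in for the engine constructor's keyword arguments."""
    enable_prefix_caching: bool = False
    num_scheduler_steps: int = 1


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Flag-forwarding sketch")
    parser.add_argument("--enable-prefix-caching", action="store_true")
    # Same flag definition as the one added in the diff.
    parser.add_argument(
        "--num-scheduler-steps",
        type=int,
        default=1,
        help="Maximum number of forward steps per scheduler call.")
    return parser


def config_from_args(args: argparse.Namespace) -> EngineConfig:
    # argparse maps "--num-scheduler-steps" to args.num_scheduler_steps.
    return EngineConfig(
        enable_prefix_caching=args.enable_prefix_caching,
        num_scheduler_steps=args.num_scheduler_steps,
    )


if __name__ == "__main__":
    args = build_parser().parse_args(["--num-scheduler-steps", "8"])
    print(config_from_args(args))
```

Because the default is 1, existing invocations of the benchmark should behave as before; a multi-step run would presumably be requested by appending `--num-scheduler-steps 8` (or another value) to the usual `benchmarks/benchmark_latency.py` command line.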
@@ -1,13 +1,13 @@
 {
     "8": {
         "BLOCK_SIZE_M": 16,
-        "BLOCK_SIZE_N": 128,
+        "BLOCK_SIZE_N": 64,
         "BLOCK_SIZE_K": 256,
         "GROUP_SIZE_M": 1,
-        "num_warps": 8,
+        "num_warps": 4,
         "num_stages": 0,
-        "waves_per_eu": 0,
+        "waves_per_eu": 1,
         "matrix_instr_nonkdim": 16,
-        "kpack": 2
+        "kpack": 1
     }
 }
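
The configuration file maps a batch size (the key `"8"`) to a set of kernel tuning parameters. The sketch below shows one way such a file could be consumed, using the new values from this commit. The helper names `load_configs` and `pick_config` are hypothetical, and the nearest-key fallback is an assumption about the consumer, not something this commit shows.

```python
import json

# New values from this commit, keyed by batch size as a string.
TUNED_JSON = """
{
    "8": {
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 256,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 0,
        "waves_per_eu": 1,
        "matrix_instr_nonkdim": 16,
        "kpack": 1
    }
}
"""


def load_configs(text: str) -> dict[int, dict[str, int]]:
    """Parse the JSON text and convert the batch-size keys to integers."""
    return {int(key): value for key, value in json.loads(text).items()}


def pick_config(configs: dict[int, dict[str, int]],
                batch_size: int) -> dict[str, int]:
    """Return the entry whose key is closest to batch_size (assumed policy)."""
    nearest = min(configs, key=lambda m: abs(m - batch_size))
    return configs[nearest]


if __name__ == "__main__":
    configs = load_configs(TUNED_JSON)
    print(pick_config(configs, batch_size=8))
```

With only one key present, every batch size resolves to the same entry under this assumed policy, so the tuned values apply regardless of the requested batch size.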
