Patch SGL Benchmark Test for Pytest Dashboard (#551)
# Description

The nightly SGLang benchmark tests had their first successful run last night:
https://github.com/nod-ai/shark-ai/actions/runs/11850084805/job/33024395622

And the results were uploaded to the dashboard successfully:
https://nod-ai.github.io/shark-ai/llm/sglang/?sort=result

However, since I used a mock to pipe the `bench_serving` script output
to `logger.info`, we ended up with results that appeared in the runner
log but did not appear in the dashboard:

```text
============ Serving Benchmark Result ============
INFO     __name__:mock.py:1189 Backend:                                 shortfin  
INFO     __name__:mock.py:1189 Traffic request rate:                    4         
INFO     __name__:mock.py:1189 Successful requests:                     10        
INFO     __name__:mock.py:1189 Benchmark duration (s):                  716.95    
INFO     __name__:mock.py:1189 Total input tokens:                      1960      
INFO     __name__:mock.py:1189 Total generated tokens:                  2774      
INFO     __name__:mock.py:1189 Total generated tokens (retokenized):    291       
INFO     __name__:mock.py:1189 Request throughput (req/s):              0.01      
INFO     __name__:mock.py:1189 Input token throughput (tok/s):          2.73      
INFO     __name__:mock.py:1189 Output token throughput (tok/s):         3.87      
INFO     __name__:mock.py:1189 ----------------End-to-End Latency----------------
INFO     __name__:mock.py:1189 Mean E2E Latency (ms):                   549509.25 
INFO     __name__:mock.py:1189 Median E2E Latency (ms):                 578828.23 
INFO     __name__:mock.py:1189 ---------------Time to First Token----------------
INFO     __name__:mock.py:1189 Mean TTFT (ms):                          327289.54 
INFO     __name__:mock.py:1189 Median TTFT (ms):                        367482.31 
INFO     __name__:mock.py:1189 P99 TTFT (ms):                           367972.81 
INFO     __name__:mock.py:1189 -----Time per Output Token (excl. 1st token)------
INFO     __name__:mock.py:1189 Mean TPOT (ms):                          939.35    
INFO     __name__:mock.py:1189 Median TPOT (ms):                        886.13    
INFO     __name__:mock.py:1189 P99 TPOT (ms):                           2315.83   
INFO     __name__:mock.py:1189 ---------------Inter-token Latency----------------
INFO     __name__:mock.py:1189 Mean ITL (ms):                           732.59    
INFO     __name__:mock.py:1189 Median ITL (ms):                         729.43    
INFO     __name__:mock.py:1189 P99 ITL (ms):                            1477.77   
INFO     __name__:mock.py:1189 ==================================================
```
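
For context, the capture itself is not part of this diff. A minimal sketch of the approach described above, assuming `builtins.print` was patched with a mock whose `side_effect` is `logger.info` (which would explain why the records are attributed to `mock.py:1189` rather than to `bench_serving`), might look like:

```python
import logging
from unittest.mock import patch

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_benchmark():
    # Stand-in for sglang's bench_serving, which prints its result table line by line.
    print("============ Serving Benchmark Result ============")
    print("Successful requests:                     10")


# Every print() call is intercepted by the mock, whose side_effect re-emits the
# text as an INFO record from inside mock.py -- hence the mock.py:1189 call site
# in the runner log above.
with patch("builtins.print", side_effect=logger.info):
    run_benchmark()
```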

The run also had a small bug that was obscured in the runner/terminal
logs but surfaced in the dashboard, and it prevented the jsonl files
from being generated after the benchmark results were collected.

By fixing the bug in the `bench_serving` input args and logging the
resulting jsonl file after each run, I was able to verify locally that
the output HTML contains the proper results:


![image](https://github.com/user-attachments/assets/6bc21b85-9579-42bf-9b64-bc26127ec696)
stbaione authored Nov 15, 2024
1 parent 8d9a923 commit 2d3bf36
Showing 3 changed files with 16 additions and 2 deletions.
1 change: 0 additions & 1 deletion app_tests/benchmark_tests/llm/conftest.py
@@ -21,7 +21,6 @@ def pre_process_model(request, tmp_path_factory):
settings = request.param["settings"]
batch_sizes = request.param["batch_sizes"]

tmp_dir = tmp_path_factory.mktemp("llm_benchmark_test")
mlir_path = tmp_dir / "model.mlir"
config_path = tmp_dir / "config.json"
vmfb_path = tmp_dir / "model.vmfb"
16 changes: 15 additions & 1 deletion app_tests/benchmark_tests/llm/sglang_benchmark_test.py
@@ -38,7 +38,19 @@
TOKENIZER_DIR = Path("/data/llama3.1/8b/")


@pytest.mark.parametrize("request_rate", [1, 2, 4, 8, 16, 32])
def log_jsonl_result(file_path):
with open(file_path, "r") as file:
json_string = file.readline().strip()

json_data = json.loads(json_string)
for key, val in json_data.items():
logger.info(f"{key.upper()}: {val}")


@pytest.mark.parametrize(
"request_rate",
[1, 2, 4, 8, 16, 32],
)
@pytest.mark.parametrize(
"pre_process_model",
[
@@ -101,6 +113,8 @@ def test_sglang_benchmark_server(request_rate, pre_process_model):
benchmark_process.join()

logger.info(f"Benchmark run completed in {str(time.time() - start)} seconds")
logger.info("======== RESULTS ========")
log_jsonl_result(benchmark_args.output_file)
except Exception as e:
logger.info(e)

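
For illustration, here is a usage sketch of the new `log_jsonl_result` helper against a hand-written result file; the field names and values are made up for the example and are not the exact keys `bench_serving` writes:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_jsonl_result(file_path):
    # Same helper as added in sglang_benchmark_test.py: read the first jsonl
    # record and log each field so it shows up in the runner output.
    with open(file_path, "r") as file:
        json_string = file.readline().strip()

    json_data = json.loads(json_string)
    for key, val in json_data.items():
        logger.info(f"{key.upper()}: {val}")


# Hypothetical result file -- keys are illustrative only.
with open("/tmp/example_benchmark.jsonl", "w") as f:
    f.write(json.dumps({"backend": "shortfin", "request_rate": 4, "mean_ttft_ms": 327289.54}) + "\n")

log_jsonl_result("/tmp/example_benchmark.jsonl")
# INFO:__main__:BACKEND: shortfin
# INFO:__main__:REQUEST_RATE: 4
# INFO:__main__:MEAN_TTFT_MS: 327289.54
```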
1 change: 1 addition & 0 deletions app_tests/benchmark_tests/llm/utils.py
@@ -37,6 +37,7 @@ def as_namespace(self) -> Namespace:
dataset_name="sharegpt",
random_input_len=None,
random_output_len=None,
random_range_ratio=0.0,
dataset_path="",
sharegpt_output_len=None,
multi=False,
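
As a hedged illustration of why the missing arg mattered (my reading of the description above, not something shown in the diff): the benchmark args are assembled into an `argparse.Namespace`, so any field the downstream `bench_serving` code reads but the namespace omits would raise an `AttributeError` and stop the jsonl file from ever being written:

```python
from argparse import Namespace

# Hypothetical, trimmed-down version of the benchmark args namespace.
args = Namespace(dataset_name="sharegpt", random_input_len=None, random_output_len=None)

try:
    # If downstream code reads an arg that was never set, the run dies here
    # instead of producing its jsonl output.
    _ = args.random_range_ratio
except AttributeError as err:
    print(f"benchmark aborts before writing jsonl: {err}")

# The one-line fix from utils.py above: provide the default explicitly.
args.random_range_ratio = 0.0
print(args.random_range_ratio)  # 0.0
```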
