Patch SGL Benchmark Test for Pytest Dashboard (#551)
# Description

The nightly SGLang benchmark tests had their first successful run last night:
https://github.com/nod-ai/shark-ai/actions/runs/11850084805/job/33024395622

And the results were uploaded to the dashboard successfully:
https://nod-ai.github.io/shark-ai/llm/sglang/?sort=result

However, since I used a mock to pipe the `bench_serving` script output
to `logger.info`, we ended up with results that appeared in the runner
log but did not appear in the dashboard:

```text
============ Serving Benchmark Result ============
INFO     __name__:mock.py:1189 Backend:                                 shortfin  
INFO     __name__:mock.py:1189 Traffic request rate:                    4         
INFO     __name__:mock.py:1189 Successful requests:                     10        
INFO     __name__:mock.py:1189 Benchmark duration (s):                  716.95    
INFO     __name__:mock.py:1189 Total input tokens:                      1960      
INFO     __name__:mock.py:1189 Total generated tokens:                  2774      
INFO     __name__:mock.py:1189 Total generated tokens (retokenized):    291       
INFO     __name__:mock.py:1189 Request throughput (req/s):              0.01      
INFO     __name__:mock.py:1189 Input token throughput (tok/s):          2.73      
INFO     __name__:mock.py:1189 Output token throughput (tok/s):         3.87      
INFO     __name__:mock.py:1189 ----------------End-to-End Latency----------------
INFO     __name__:mock.py:1189 Mean E2E Latency (ms):                   549509.25 
INFO     __name__:mock.py:1189 Median E2E Latency (ms):                 578828.23 
INFO     __name__:mock.py:1189 ---------------Time to First Token----------------
INFO     __name__:mock.py:1189 Mean TTFT (ms):                          327289.54 
INFO     __name__:mock.py:1189 Median TTFT (ms):                        367482.31 
INFO     __name__:mock.py:1189 P99 TTFT (ms):                           367972.81 
INFO     __name__:mock.py:1189 -----Time per Output Token (excl. 1st token)------
INFO     __name__:mock.py:1189 Mean TPOT (ms):                          939.35    
INFO     __name__:mock.py:1189 Median TPOT (ms):                        886.13    
INFO     __name__:mock.py:1189 P99 TPOT (ms):                           2315.83   
INFO     __name__:mock.py:1189 ---------------Inter-token Latency----------------
INFO     __name__:mock.py:1189 Mean ITL (ms):                           732.59    
INFO     __name__:mock.py:1189 Median ITL (ms):                         729.43    
INFO     __name__:mock.py:1189 P99 ITL (ms):                            1477.77   
INFO     __name__:mock.py:1189 ==================================================
```
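
For context, the capture itself is not part of this diff. A minimal sketch of the approach described above, assuming `builtins.print` was patched with a mock whose `side_effect` is `logger.info` (which would explain why the records are attributed to `mock.py:1189` rather than to `bench_serving`), might look like:

```python
import logging
from unittest.mock import patch

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def run_benchmark():
    # Stand-in for sglang's bench_serving, which prints its result table line by line.
    print("============ Serving Benchmark Result ============")
    print("Successful requests:                     10")


# Every print() call is intercepted by the mock, whose side_effect re-emits the
# text as an INFO record from inside mock.py -- hence the mock.py:1189 call site
# in the runner log above.
with patch("builtins.print", side_effect=logger.info):
    run_benchmark()
```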

The run also had a small bug that was obscured in the runner/terminal
logs but surfaced in the dashboard, and it prevented the jsonl files
from being generated after the benchmark results were collected.

By fixing the bug in the `bench_serving` input args and logging the
resulting jsonl file after each run, I was able to verify locally that
the output HTML contains the proper results:


![image](https://github.com/user-attachments/assets/6bc21b85-9579-42bf-9b64-bc26127ec696)
stbaione authored Nov 15, 2024
1 parent 8d9a923 commit 2d3bf36
Showing 3 changed files with 16 additions and 2 deletions.
1 change: 0 additions & 1 deletion app_tests/benchmark_tests/llm/conftest.py
@@ -21,7 +21,6 @@ def pre_process_model(request, tmp_path_factory):
settings = request.param["settings"]
batch_sizes = request.param["batch_sizes"]

tmp_dir = tmp_path_factory.mktemp("llm_benchmark_test")
mlir_path = tmp_dir / "model.mlir"
config_path = tmp_dir / "config.json"
vmfb_path = tmp_dir / "model.vmfb"
16 changes: 15 additions & 1 deletion app_tests/benchmark_tests/llm/sglang_benchmark_test.py
@@ -38,7 +38,19 @@
TOKENIZER_DIR = Path("/data/llama3.1/8b/")


@pytest.mark.parametrize("request_rate", [1, 2, 4, 8, 16, 32])
def log_jsonl_result(file_path):
with open(file_path, "r") as file:
json_string = file.readline().strip()

json_data = json.loads(json_string)
for key, val in json_data.items():
logger.info(f"{key.upper()}: {val}")


@pytest.mark.parametrize(
"request_rate",
[1, 2, 4, 8, 16, 32],
)
@pytest.mark.parametrize(
"pre_process_model",
[
@@ -101,6 +113,8 @@ def test_sglang_benchmark_server(request_rate, pre_process_model):
benchmark_process.join()

logger.info(f"Benchmark run completed in {str(time.time() - start)} seconds")
logger.info("======== RESULTS ========")
log_jsonl_result(benchmark_args.output_file)
except Exception as e:
logger.info(e)

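
For illustration, here is a usage sketch of the new `log_jsonl_result` helper against a hand-written result file; the field names and values are made up for the example and are not the exact keys `bench_serving` writes:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_jsonl_result(file_path):
    # Same helper as added in sglang_benchmark_test.py: read the first jsonl
    # record and log each field so it shows up in the runner output.
    with open(file_path, "r") as file:
        json_string = file.readline().strip()

    json_data = json.loads(json_string)
    for key, val in json_data.items():
        logger.info(f"{key.upper()}: {val}")


# Hypothetical result file -- keys are illustrative only.
with open("/tmp/example_benchmark.jsonl", "w") as f:
    f.write(json.dumps({"backend": "shortfin", "request_rate": 4, "mean_ttft_ms": 327289.54}) + "\n")

log_jsonl_result("/tmp/example_benchmark.jsonl")
# INFO:__main__:BACKEND: shortfin
# INFO:__main__:REQUEST_RATE: 4
# INFO:__main__:MEAN_TTFT_MS: 327289.54
```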
1 change: 1 addition & 0 deletions app_tests/benchmark_tests/llm/utils.py
@@ -37,6 +37,7 @@ def as_namespace(self) -> Namespace:
dataset_name="sharegpt",
random_input_len=None,
random_output_len=None,
random_range_ratio=0.0,
dataset_path="",
sharegpt_output_len=None,
multi=False,
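
As a hedged illustration of why the missing arg mattered (my reading of the description above, not something shown in the diff): the benchmark args are assembled into an `argparse.Namespace`, so any field the downstream `bench_serving` code reads but the namespace omits would raise an `AttributeError` and stop the jsonl file from ever being written:

```python
from argparse import Namespace

# Hypothetical, trimmed-down version of the benchmark args namespace.
args = Namespace(dataset_name="sharegpt", random_input_len=None, random_output_len=None)

try:
    # If downstream code reads an arg that was never set, the run dies here
    # instead of producing its jsonl output.
    _ = args.random_range_ratio
except AttributeError as err:
    print(f"benchmark aborts before writing jsonl: {err}")

# The one-line fix from utils.py above: provide the default explicitly.
args.random_range_ratio = 0.0
print(args.random_range_ratio)  # 0.0
```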
