Set max concurrent requests #2961

Closed
AllentDan wants to merge 2 commits

Conversation

AllentDan
Collaborator

No description provided.
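
Since the PR carries no description and its diff is not shown here, the following is only a generic sketch of what capping concurrent requests in an async server can look like. It is not LMDeploy's implementation: `handle_request`, `run`, the `stats` dictionary, and the `asyncio.sleep` stand-in for inference are all hypothetical names chosen for illustration.

```python
import asyncio


async def handle_request(request_id, limiter, stats):
    """Process one request while holding a slot in the limiter (hypothetical handler)."""
    async with limiter:  # waits here once max_concurrent requests are in flight
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for the actual inference call
        stats["active"] -= 1
    return request_id


async def run(num_requests, max_concurrent):
    """Submit all requests at once and let the semaphore bound the concurrency."""
    limiter = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(handle_request(i, limiter, stats) for i in range(num_requests))
    )
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run(num_requests=20, max_concurrent=4))
    print(f"completed={len(results)} peak_concurrency={peak}")
```

With a cap like this, requests beyond the limit wait before entering the engine instead of all competing for it at once; whether that matches what this PR changes is an assumption.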

@AllentDan
Collaborator Author

With this setting:

============ Serving Benchmark Result ============
Backend:                                 lmdeploy  
Traffic request rate:                    inf       
Successful requests:                     8405      
Benchmark duration (s):                  160.24    
Total input tokens:                      1950646   
Total generated tokens:                  1677697   
Total generated tokens (retokenized):    1677977   
Request throughput (req/s):              52.45     
Input token throughput (tok/s):          12173.05  
Output token throughput (tok/s):         10469.70  
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   83107.15  
Median E2E Latency (ms):                 84316.44  
---------------Time to First Token----------------
Mean TTFT (ms):                          77769.83  
Median TTFT (ms):                        79269.43  
P99 TTFT (ms):                           149355.97 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          182.14    
Median TPOT (ms):                        24.02     
P99 TPOT (ms):                           4690.68   
---------------Inter-token Latency----------------
Mean ITL (ms):                           325.74    
Median ITL (ms):                         154.95    
P99 ITL (ms):                            2165.01   
==================================================

Without this setting:

============ Serving Benchmark Result ============
Backend:                                 lmdeploy  
Traffic request rate:                    inf       
Successful requests:                     8005      
Benchmark duration (s):                  181.26    
Total input tokens:                      1854522   
Total generated tokens:                  1601078   
Total generated tokens (retokenized):    1591855   
Request throughput (req/s):              44.16     
Input token throughput (tok/s):          10231.09  
Output token throughput (tok/s):         8832.88   
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   95447.55  
Median E2E Latency (ms):                 99336.90  
---------------Time to First Token----------------
Mean TTFT (ms):                          90024.57  
Median TTFT (ms):                        89196.52  
P99 TTFT (ms):                           174165.09 
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          333.14    
Median TPOT (ms):                        0.00      
P99 TPOT (ms):                           11715.11  
---------------Inter-token Latency----------------
Mean ITL (ms):                           539.17    
Median ITL (ms):                         0.01      
P99 ITL (ms):                            3571.25
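
To make the two reports easier to compare, here is a small sketch that computes the relative change of the run with the setting against the run without it. The figures are copied from the reports above; the dictionary keys and the `relative_change` helper are names introduced here for illustration. The two runs completed different numbers of requests (8405 vs 8005), and the second run reports median TPOT and ITL of 0.00 ms and 0.01 ms, so the sketch compares means rather than medians.

```python
# Figures copied from the two benchmark reports in this comment.
with_setting = {
    "request_throughput_req_s": 52.45,
    "output_throughput_tok_s": 10469.70,
    "mean_e2e_ms": 83107.15,
    "mean_ttft_ms": 77769.83,
    "mean_tpot_ms": 182.14,
    "mean_itl_ms": 325.74,
}
without_setting = {
    "request_throughput_req_s": 44.16,
    "output_throughput_tok_s": 8832.88,
    "mean_e2e_ms": 95447.55,
    "mean_ttft_ms": 90024.57,
    "mean_tpot_ms": 333.14,
    "mean_itl_ms": 539.17,
}


def relative_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Positive is better for throughput; negative is better for latency.
changes = {
    name: relative_change(with_setting[name], without_setting[name])
    for name in with_setting
}

if __name__ == "__main__":
    for name, pct in changes.items():
        print(f"{name:28s} {pct:+.1f}%")
```

On these numbers, request throughput is roughly 19% higher and mean TTFT roughly 14% lower with the setting, while mean TPOT and mean ITL drop by about 45% and 40%.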

@AllentDan closed this on Dec 26, 2024