
support eos_token list in turbomind #3044

Open · wants to merge 9 commits into main
Conversation

@irexyc (Collaborator) commented Jan 16, 2025

Motivation

MinLengthLogitsProcessor in transformers supports a list of eos_token_id values; this PR adds the same support to turbomind's min-length penalty.
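For reference, the behaviour being matched (mask every eos id until the sequence reaches the minimum length) can be sketched in plain NumPy. The function name and array shapes here are illustrative, not turbomind's actual API:

```python
import numpy as np

def apply_min_length_penalty(logits, cur_len, min_length, eos_token_ids):
    """Forbid every eos token while the generated sequence is still too short.

    Mirrors what transformers' MinLengthLogitsProcessor does when given a
    list of eos_token_id values; pure-NumPy sketch, not the CUDA kernel.
    """
    if cur_len < min_length:
        for eos in eos_token_ids:
            logits[:, eos] = -np.inf  # eos cannot be sampled yet
    return logits

# batch of 2 requests, vocab of 5, two possible eos ids
logits = np.zeros((2, 5), dtype=np.float32)
out = apply_min_length_penalty(logits, cur_len=3, min_length=8,
                               eos_token_ids=[2, 4])
# columns 2 and 4 are now -inf for both rows; all other entries are untouched
```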

Modification

diff --git a/benchmark/profile_throughput.py b/benchmark/profile_throughput.py
index 2e4d2a3b..9fc9605f 100644
--- a/benchmark/profile_throughput.py
+++ b/benchmark/profile_throughput.py
@@ -108,6 +108,7 @@ class Engine:
                 session_id,
                 input_ids=input_ids,
                 gen_config=GenerationConfig(max_new_tokens=output_seqlen,
+                                            min_new_tokens=output_seqlen - 1,
                                             temperature=temperature,
                                             top_p=top_p,
                                             top_k=top_k,
Profiled with nsys:

FT_NVTX=ON /mnt/141/2024.5.1/target-linux-x64/nsys profile -t cuda,nvtx,osrt,cudnn,cublas -o output -f true --stats true  python ../benchmark/profile_throughput.py -n 20000 /home/chenxin/ShareGPT_V3_unfiltered_cleaned_split.json /home/chenxin/Llama-3.2-1B-Instruct


Before (single eos_token_id):

 Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)                                                  Name                                                
 --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
      0.0         45986089      16761    2743.6    2656.0      2272      3840        276.0  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, const int *, const int *, const int *, int,…

After (eos list of size sz):

(sz=1)0.0         48248303      16797    2872.4    2816.0      2464      3808        237.3  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, int, const int *, const int *, int, const i…
(sz=2)0.0         45838054      16854    2719.7    2656.0      2240      3552        170.6  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, int, const int *, const int *, int, const i…
(sz=4)0.0         45971238      16826    2732.2    2688.0      2304      3584        167.5  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, int, const int *, const int *, int, const i…
(sz=8)0.0         45888590      16819    2728.4    2656.0      2240      3680        171.4  void turbomind::batchApplyMinLengthPenalty<float>(T1 *, int, const int *, const int *, int, const i…
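The batched kernel's semantics, one min length and one end-id list per slot with the list padded to a fixed width sz, can be sketched in Python. The padding-with-(-1) convention and the argument layout are assumptions for illustration, not the kernel's real signature:

```python
import numpy as np

NEG_INF = -float("inf")

def batch_apply_min_length_penalty(logits, end_ids, min_lengths, seq_lens):
    """Per-request min-length penalty over a fixed-width end-id table.

    logits      : (batch, vocab) float array
    end_ids     : (batch, sz) int array, padded with -1 (assumed convention)
    min_lengths : (batch,) minimum generation length per request
    seq_lens    : (batch,) tokens generated so far per request
    """
    for b in range(logits.shape[0]):
        if seq_lens[b] < min_lengths[b]:
            for eos in end_ids[b]:
                if eos >= 0:  # skip padding slots
                    logits[b, eos] = NEG_INF
    return logits

logits = np.zeros((2, 6), dtype=np.float32)
end_ids = np.array([[5, -1], [3, 4]])  # sz = 2
out = batch_apply_min_length_penalty(logits, end_ids,
                                     min_lengths=[4, 4], seq_lens=[2, 10])
# slot 0 is still short, so token 5 is masked; slot 1 is long enough, untouched
```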

@lvhan028 lvhan028 requested review from lvhan028 and lzhangzz January 20, 2025 03:40
@lvhan028 (Collaborator):

Overall LGTM
