Coredump with long input #746
Comments
@lzhangzz hi, can you reproduce this on your side? If not, I can send you the prompt.
It's probably that one request was too long and got rejected, which produced an empty batch. You can try #747.
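To make the suspected failure mode concrete, here is a minimal sketch in Python; all names and structure are assumptions for illustration, not turbomind's actual C++ internals:

```python
from dataclasses import dataclass

# Hypothetical sketch of the suspected failure mode; names and structure
# are assumptions, not turbomind's actual internals.

@dataclass
class Request:
    history: list
    prompt: list
    finish_reason: str = ""

def schedule_step(queue, session_len):
    batch = []
    for r in queue:
        if len(r.history) + len(r.prompt) > session_len:
            r.finish_reason = "length"  # over-long request is rejected up front
        else:
            batch.append(r)
    # Without this guard, a step in which *every* request was rejected runs
    # the model on an empty batch; code that later indexes into the batch
    # (e.g. batch[0]) then crashes -- the suspected cause of the coredump.
    if not batch:
        return []
    return batch  # would be handed to the forward pass

# One over-long request -> empty batch, caught by the guard.
reqs = [Request(history=[0] * 6000, prompt=[0] * 3000)]
print(schedule_step(reqs, session_len=8192), reqs[0].finish_reason)  # [] length
```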
@lzhangzz It no longer reproduces for now.
@lzhangzz This looks like history + prompt length exceeding session_len, so the request is rejected outright.
For an outright reject, finish_reason should be set to "length", but the finish_reason actually returned is none.
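One hedged way to observe the contract this comment describes from the client side (the OpenAI-style endpoint, port, and model name are assumptions about a typical lmdeploy api_server setup, not taken from this issue):

```python
import requests

# Assumed lmdeploy api_server address and OpenAI-style response schema.
resp = requests.post(
    "http://localhost:23333/v1/chat/completions",
    json={
        "model": "internlm-chat-7b",  # hypothetical model name
        # A prompt long enough that history + prompt exceeds session_len.
        "messages": [{"role": "user", "content": "x " * 40000}],
    },
).json()

# Expected for an outright length reject: "length".
# Observed in this thread: finish_reason comes back as none/null.
print(resp["choices"][0]["finish_reason"])
```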
@lzhangzz It's probably not caused by your change; I rolled it back and still got the error. Maybe I changed something wrong on my side...
You'd need to check the turbomind log for details; the numbers computed there may differ slightly. Another possibility is that under especially long contexts the model may directly emit eoa; that also depends on how your NTK alpha is currently computed.
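For reference, one common NTK-aware scaling rule, as a sketch only; how alpha is actually computed in the deployed build may differ, as the comment notes:

```python
# Sketch of NTK-aware RoPE scaling; the exact alpha computation in the
# deployed model is an assumption and may differ from this textbook form.
def ntk_scaled_base(base: float, alpha: float, head_dim: int) -> float:
    # Raising the rotary base by alpha**(d / (d - 2)) stretches the usable
    # context roughly by a factor of alpha without retraining.
    return base * alpha ** (head_dim / (head_dim - 2))

print(ntk_scaled_base(10000.0, alpha=2.0, head_dim=128))  # ~20221.0
```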
Found the cause... I lowered the KV cache memory ratio, which truncated session_len.
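The root cause, sketched in hedged form: shrinking the KV-cache memory fraction lowers the number of tokens the cache can hold, which caps session_len. The function name and sizing formula below are assumptions about how such a budget can bound session length, not turbomind's exact code:

```python
def max_session_len(free_gpu_bytes, cache_ratio, num_layers, num_kv_heads,
                    head_dim, dtype_bytes=2):
    # K and V each store num_layers * num_kv_heads * head_dim values per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return int(free_gpu_bytes * cache_ratio) // per_token

# E.g. a 7B-class model (32 layers, 32 KV heads, head_dim 128, fp16)
# with 20 GB free: dropping the ratio from 0.5 to 0.2 cuts the cap sharply.
free = 20 * 1024**3
print(max_session_len(free, 0.5, 32, 32, 128))  # 20480 tokens
print(max_session_len(free, 0.2, 32, 32, 128))  # 8192 tokens -> truncated session_len
```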
Hi, I'm using fairly recent code from the main branch, #715, and also cherry-picked a newer commit, #738. I started 1 HTTP client sending serial requests, with prompt lengths of 1k-8k.
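A minimal serial client along the lines described; the original client is not shown in the issue, so the endpoint, port, payload shape, and model name here are all assumptions:

```python
import random
import string

import requests

# Hypothetical reproduction client: a single client issuing serial requests
# with prompt lengths in the 1k-8k range; endpoint and payload are assumptions.
URL = "http://localhost:23333/v1/chat/completions"

for i in range(100):
    n = random.randint(1000, 8000)
    prompt = "".join(random.choices(string.ascii_letters + " ", k=n))
    r = requests.post(URL, json={
        "model": "internlm-chat-7b",  # hypothetical model name
        "messages": [{"role": "user", "content": prompt}],
    })
    print(i, n, r.status_code)
```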
A coredump occurred:
It doesn't look like OOM.
Result of running gdb on the core file:
It points to this line in the current GitHub main branch: https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/models/llama/LlamaBatch.cc#L488C2-L488C2