Enabling prefix cache for multimodal model inference #2823
Unanswered
zhuchen1109 asked this question in Q&A
Replies: 1 comment 2 replies
- Prefix caching is not yet supported in the VLM case.
2 replies
- I'm using the internvl-8b model. Since my system prompt is very long, I'd like to enable prefix cache to speed up inference. Enabling it right now causes problems, though: the image tokens are just padding, so they are very likely to be matched by mistake. My question: if I modify the code to guarantee that the image part is never matched, would prefix cache then be effective for my task?
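To make the proposed change concrete, here is a minimal sketch of the idea, assuming a block-hash style prefix cache (the common design in LLM serving engines). It is not the project's actual code; all names here (`IMAGE_PAD_ID`, `BLOCK_SIZE`, `match_prefix_blocks`) are hypothetical. The guard simply refuses to match any cache block that contains image placeholder tokens, which is the behavior the question asks about:

```python
# Sketch only: block-level prefix matching that never reuses blocks
# containing image placeholder tokens. Names and values are hypothetical.
from hashlib import sha256
from typing import Dict, List, Optional

IMAGE_PAD_ID = 151655   # hypothetical placeholder id for image tokens
BLOCK_SIZE = 64         # tokens per KV-cache block

def _block_hash(prev: Optional[str], tokens: List[int]) -> str:
    """Chain-hash a block so a hit implies the entire prefix matches."""
    payload = (prev or "") + ",".join(map(str, tokens))
    return sha256(payload.encode()).hexdigest()

def match_prefix_blocks(token_ids: List[int],
                        cache: Dict[str, int]) -> int:
    """Return how many leading tokens can be served from the prefix cache.

    Stops at the first block that (a) is not full, (b) contains an image
    placeholder token, or (c) has no cached entry. Guard (b) is what keeps
    padded image blocks from being matched spuriously.
    """
    matched, prev = 0, None
    for start in range(0, len(token_ids), BLOCK_SIZE):
        block = token_ids[start:start + BLOCK_SIZE]
        if len(block) < BLOCK_SIZE:
            break                       # partial blocks are never cached
        if IMAGE_PAD_ID in block:
            break                       # never match across image tokens
        prev = _block_hash(prev, block)
        if prev not in cache:
            break
        matched += BLOCK_SIZE
    return matched
```

With a guard like this, the long text-only system prompt that precedes the first image can still be reused across requests, which is the part of the request that actually benefits from prefix caching; blocks that overlap image tokens, and everything after them, are simply recomputed.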