-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sequence length axis in tensor trim #723
Use sequence length axis in tensor trim #723
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we add such models with different sequence length dimension ID to GHA CI validation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we keep stateful mode after CB merge? If yes, then I think it make sense to add tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, CB is already merged and we keep stateful mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests can be added separately
const std::map<std::string, size_t> model_type_to_seq_len_axis{ | ||
{"chatglm", 0}, | ||
{"llama", 2}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilya-lavrenov , @Wovchena , This approach is not reliable, chatglm3 and chatglm2 have seq_len_axis=0
but glm-4-9b-chat has seq_len_axis=2
. All of them have same model_type='chatglm'
.
I'll try to detect seq_len_axis
by comparing kv tensor shape dimensions with generated tokens length.
Removed from merge queue due to: #723 (comment) |
trimm_tensor