support yarn in turbomind backend #2519
Conversation
@irexyc may check the pytorch engine as well. Please support it if the pytorch engine has no such feature.
auto freq = inv_freq_[i / 2];
float alpha = ((idx + i) / 2 - yarn_ramp_min) / (yarn_ramp_max - yarn_ramp_min);
alpha = fmaxf(0.f, fminf(1.f, alpha));
inv_freq_[i / 2] = freq * (1 - alpha + alpha / factor);
All these expensive divisions can be done in host code. See how the llama3 rope is implemented just a few lines above.
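A minimal sketch of that transformation, written in Python for brevity (the variable names mirror the diff above but are illustrative, not the actual turbomind ones): the two divisors depend only on yarn_ramp_min, yarn_ramp_max and factor, so their reciprocals can be computed once on the host and each element then needs only multiply-adds and a clamp.

def yarn_scaled_inv_freq(inv_freq, yarn_ramp_min, yarn_ramp_max, factor):
    # Host side: compute the reciprocals once instead of dividing per element.
    inv_ramp_range = 1.0 / (yarn_ramp_max - yarn_ramp_min)
    inv_factor = 1.0 / factor

    scaled = []
    for i, freq in enumerate(inv_freq):  # i plays the role of (idx + i) / 2
        # Per-element work reduces to a multiply-add plus a clamp.
        alpha = (i - yarn_ramp_min) * inv_ramp_range
        alpha = min(max(alpha, 0.0), 1.0)
        scaled.append(freq * (1.0 - alpha + alpha * inv_factor))
    return scaled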
@@ -53,13 +53,16 @@ class ModelConfig:
class AttentionConfig:
    rotary_embedding: int = 128
    rope_theta: float = 10000.0
    attention_factor: float = None
https://github.com/huggingface/transformers/blob/f2c388e3f946862f657acc1e21b272ec946fc66c/src/transformers/modeling_rope_utils.py#L189
There is another parameter, "partial_rotary_factor".
It seems the partial_rotary_factor changes the rotary dim of the default, dynamic ntk and yarn rope types.
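For reference, the linked transformers helper derives the rotary dim from partial_rotary_factor before building inv_freq. A rough paraphrase from memory, not a verbatim copy of modeling_rope_utils:

def rotary_dim(head_dim: int, partial_rotary_factor: float = 1.0) -> int:
    # Only the first `dim` channels of each head receive rotary embedding.
    return int(head_dim * partial_rotary_factor)

def default_inv_freq(base: float, head_dim: int, partial_rotary_factor: float = 1.0):
    dim = rotary_dim(head_dim, partial_rotary_factor)
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]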
@@ -236,6 +239,10 @@ def model_info(self):
                else llama3_scaling_type
            if scaling_type == 'dynamic':
                use_dynamic_ntk = 1
            attention_factor = model_arg['rope_scaling'].get(
                'attention_factor', None)
attention_factor = config.rope_scaling.get("attention_factor")
if attention_factor is None:
    attention_factor = 0.1 * math.log(factor) + 1.0
The previous code had this logic. Please add it when converting the model as well.
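Something along these lines in the converter would keep the previous behaviour. A sketch only, with model_arg following the diff above and the exact placement left to the author:

import math

rope_scaling = model_arg['rope_scaling']
factor = rope_scaling.get('factor', 1.0)
attention_factor = rope_scaling.get('attention_factor')
if attention_factor is None:
    # Default used by transformers for yarn when attention_factor is absent.
    attention_factor = 0.1 * math.log(factor) + 1.0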
* support yarn in turbomind backend
* update qwen2 model to support yarn rope in pytorch backend
* use mul
* refactor export rope params
* support partial_rotary_factor
* fix lint
* fix rope type
* Revert "support partial_rotary_factor"

This reverts commit cc4cce7.
Motivation
support yarn in turbomind backend
https://github.com/huggingface/transformers/blob/f2c388e3f946862f657acc1e21b272ec946fc66c/src/transformers/modeling_rope_utils.py#L163
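For context, the yarn implementation at that link interpolates the base inverse frequencies between an extrapolation band (original frequencies) and an interpolation band (frequencies divided by the scaling factor) using a linear ramp, and also returns an attention scaling factor. A condensed paraphrase, written from memory as a sketch rather than a verbatim copy:

import math

def yarn_rope_parameters(base, dim, factor, max_pos, beta_fast=32.0, beta_slow=1.0):
    def correction_dim(num_rotations):
        # Pair index whose wavelength completes `num_rotations` turns within max_pos.
        return dim * math.log(max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim // 2 - 1)

    inv_freq = []
    for i in range(dim // 2):
        freq_extra = base ** (-(2 * i) / dim)  # original (extrapolation) frequency
        freq_inter = freq_extra / factor       # position-interpolated frequency
        ramp = (i - low) / max(high - low, 1e-3)
        ramp = min(max(ramp, 0.0), 1.0)
        # ramp == 0 keeps the original frequency, ramp == 1 fully interpolates.
        inv_freq.append(freq_extra * (1.0 - ramp) + freq_inter * ramp)

    # Scaling applied to attention; same default as in the review comment above.
    attention_factor = 0.1 * math.log(factor) + 1.0
    return inv_freq, attention_factor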