
support yarn in turbomind backend #2519

Merged
merged 10 commits into from
Nov 4, 2024

Conversation

irexyc (Collaborator) commented Sep 26, 2024

lvhan028 (Collaborator) commented:

@irexyc, please check the PyTorch engine as well, and add support there if it does not have this feature yet.

lvhan028 requested review from lzhangzz and grimoire on Oct 7, 2024 13:59
lvhan028 added the enhancement (New feature or request) label on Oct 7, 2024
Comment on lines 122 to 125
auto freq = inv_freq_[i / 2];
// YaRN ramp: where this dimension falls between "no scaling" and "full scaling"
float alpha = ((idx + i) / 2 - yarn_ramp_min) / (yarn_ramp_max - yarn_ramp_min);
alpha = fmaxf(0.f, fminf(1.f, alpha));  // clamp to [0, 1]
// interpolate between the original frequency (alpha = 0) and freq / factor (alpha = 1)
inv_freq_[i / 2] = freq * (1 - alpha + alpha / factor);
Collaborator:

All these expensive divisions can be done in host code. See how the llama3 rope is implemented just a few lines above.
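For illustration, a minimal sketch of that idea: fold the ramp into inv_freq once ahead of time so the kernel only reads pre-scaled values. The function name and the way yarn_ramp_min, yarn_ramp_max and factor are passed in are assumptions for this sketch, not the merged turbomind code.

import numpy as np

# Hypothetical host-side precompute: apply the YaRN ramp to inv_freq once,
# so the device kernel avoids the per-element divisions shown above.
def yarn_scaled_inv_freq(rope_theta, rotary_dim, factor, yarn_ramp_min, yarn_ramp_max):
    inv_freq = 1.0 / rope_theta ** (np.arange(0, rotary_dim, 2) / rotary_dim)
    alpha = (np.arange(rotary_dim // 2) - yarn_ramp_min) / (yarn_ramp_max - yarn_ramp_min)
    alpha = np.clip(alpha, 0.0, 1.0)  # same clamp as the kernel code
    return inv_freq * ((1.0 - alpha) + alpha / factor)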

@@ -53,13 +53,16 @@ class ModelConfig:
class AttentionConfig:
    rotary_embedding: int = 128
    rope_theta: float = 10000.0
    attention_factor: float = None

irexyc (Collaborator, author):

It seems that partial_rotary_factor changes the rotary dimension for the default, dynamic NTK, and YaRN rope types.
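For context, a short self-contained sketch of how that convention works in transformers' rope utilities, which is presumably what is being matched here; the helper name and the example numbers are assumptions for illustration:

def get_rotary_dim(hidden_size, num_attention_heads, partial_rotary_factor=1.0):
    # partial_rotary_factor shrinks the dimension rope is applied to; the
    # default, dynamic-NTK and YaRN paths then all operate on this smaller dim.
    head_dim = hidden_size // num_attention_heads
    return int(head_dim * partial_rotary_factor)

# e.g. hidden_size=4096, 32 heads, partial_rotary_factor=0.5 -> rotary_dim=64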

@@ -236,6 +239,10 @@ def model_info(self):
            else llama3_scaling_type
        if scaling_type == 'dynamic':
            use_dynamic_ntk = 1
        attention_factor = model_arg['rope_scaling'].get(
            'attention_factor', None)
Collaborator:

https://github.com/huggingface/transformers/blob/f2c388e3f946862f657acc1e21b272ec946fc66c/src/transformers/modeling_rope_utils.py#L198

attention_factor = config.rope_scaling.get("attention_factor")
if attention_factor is None:
    attention_factor = 0.1 * math.log(factor) + 1.0
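
For example, with factor = 4.0 and no attention_factor in the config, this default works out to 0.1 * ln(4.0) + 1.0 ≈ 1.139.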

irexyc (Collaborator, author):

The previous code had this logic; it should be applied when converting the model as well.

lvhan028 merged commit e557f05 into InternLM:main on Nov 4, 2024
7 of 9 checks passed
AllentDan pushed a commit to AllentDan/lmdeploy that referenced this pull request Nov 13, 2024
* support yarn in turbomind backend

* update qwen2 model to support yarn rope in pytorch backend

* use mul

* refactor export rope params

* support partial_rotary_factor

* fix lint

* fix rope type

* Revert "support partial_rotary_factor"

This reverts commit cc4cce7.