Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NPU] Lm_head slice -> split #12231

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

cyita
Copy link
Contributor

@cyita cyita commented Oct 18, 2024

Description

lm_head from slice linears -> split dq linears.

1. Why the change?

2. User API changes

3. Summary of the change

4. How to test?

  • N/A
  • Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.
  • Application test
  • Document test
  • ...

5. New dependencies

  • New Python dependencies
    - Dependency1
    - Dependency2
    - ...
  • New Java/Scala dependencies and their license
    - Dependency1 and license1
    - Dependency2 and license2
    - ...

@@ -94,9 +94,9 @@ def optimize_llm_pre(model: torch.nn.Module, qtype, mixed_precision):
not cpu_lm_head:
# Do not split lm_head and use sym_int8 instead when mixed_precison is True
is_split = (not mixed_precision) and qtype == "sym_int4_rtn"
split_num = 14 if is_split else 1
split_num = 28 if is_split else 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make split_num=14 if use_split=False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering whether we should expose the use_split option to users.

Y (make split_num=14 if use_split=False)

  • Pros: We can recommend the users set use_split=False to avoid the performance degradation on driver versions below 2715
  • Cons: It may need to be clarified for the user.

N (should update the readme recommended driver section)

  • Pros: This approach is more straightforward for users, and less configuration is needed.
  • Cons: Users must ensure they use driver version 2908 or higher. For MTL users, we would need to fall back to split_num=14 w/ slice.

Please let me know if you have any comments. @Oscilloscope98 @jason-dai

@cyita cyita requested a review from jason-dai October 21, 2024 04:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants