
[Snippets] SplitDimensionM: heuristic update #28180

Conversation

@v-Golubev (Contributor) commented Dec 23, 2024

Details:

  • SplitDimensionM heuristics are divided into several logical parts
  • Added a new heuristic for big shapes which minimizes m_kernel (see the illustrative sketch below, after the tickets list)

Tickets:

  • N/A
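
To make the m_kernel point above concrete, here is a minimal illustrative sketch of the general idea behind splitting the M dimension: M is factored into an outer m_batch (moved next to the batch to increase the parallel work amount) and an inner m_kernel processed by the brgemm kernel. The function name, threshold, and search strategy below are assumptions for illustration, not the PR's actual implementation.

#include <cstddef>
#include <optional>
#include <utility>

// Illustrative only: split M into (m_batch, m_kernel) so that the parallel work
// amount batch_dim * m_batch reaches the target while each kernel still gets a
// reasonably large m_kernel block.
std::optional<std::pair<size_t, size_t>> try_split_m(size_t batch_dim,
                                                     size_t m_dim,
                                                     size_t optimal_parallelism_work_amount) {
    constexpr size_t min_kernel_m = 32;  // minimal M block processed by one kernel call
    // Walk divisors of m_dim from the largest kernel block downwards, so the
    // first match is the least aggressive split that still reaches the target.
    for (size_t m_kernel = m_dim / 2; m_kernel >= min_kernel_m; --m_kernel) {
        if (m_dim % m_kernel != 0)
            continue;
        const size_t m_batch = m_dim / m_kernel;
        if (batch_dim * m_batch >= optimal_parallelism_work_amount)
            return std::make_pair(m_batch, m_kernel);
    }
    return std::nullopt;  // no suitable divisor: caller falls back to another heuristic
}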

@v-Golubev requested review from a team as code owners December 23, 2024 10:15
@v-Golubev requested review from tsavina and removed request for a team December 23, 2024 10:15
@github-actions bot added the category: CPU (OpenVINO CPU plugin) and category: docs (OpenVINO documentation) labels Dec 23, 2024
@v-Golubev force-pushed the vg/snippets/split_m_update_heuristic branch 6 times, most recently from 205e16c to 78a43ee, December 23, 2024 19:20
@v-Golubev (Contributor, Author):

@a-sidorova @IvanNovoselov could you please take a look? Thanks in advance

return splited;
}

std::pair<size_t, size_t> SplitDimensionM::compute_aggressive_heuristic(size_t batch_dim, size_t m_dim, size_t optimal_parallelism_work_amount) {
constexpr size_t min_kernel_m = 32;
Contributor:

In the second case in compute_ideal_cases_heuristic, min_kernel_m is 64, while here it is 32.
What about always using 64 and making it a static const attribute of the class?
Or is there a difference between the heuristics, so that we really need a smaller min_kernel_m in the aggressive one?

Contributor Author:

I agree that it's better to have a single min_kernel_m value, but I think it should be 32, not 64. The value 64 was chosen empirically to avoid the cases in which the external repacking feature doesn't work and the overhead of repacking duplication inside the kernel outweighs the benefit of the split. If external repacking works (and it seems it will work in all cases after the tokenization adjustments), we can safely lower min_kernel_m for compute_ideal_cases_heuristic.

Contributor Author:

I applied your suggestion: min_kernel_m is now a static class member equal to 32.
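
For reference, a minimal sketch of what such a declaration could look like; the real class has more members, and the exact form in the PR may differ.

#include <cstddef>

// Sketch only: keep the threshold in one place so every heuristic uses the same value.
class SplitDimensionM {
public:
    // Minimal M block processed by one kernel invocation.
    static constexpr size_t min_kernel_m = 32;
    // ... the rest of the pass is omitted here
};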

Comment on lines 152 to 158
// If M dim is big enough, aggressive heuristic is used for kernel_m minimization.
// For smaller M dim, conservative heuristic is used to preserve old behaviour.
const bool big_m_dim = m_dim >= 4000;
Contributor:

By the way, we can also support the case with small M.
If batch < optimal_parallelism_work_amount and M is quite small (for example, M < 64), nothing needs to be updated or split; let's execute the subgraph as is.

I don't insist on doing it in this PR, but I have some models (for example, action-recognition or levit) with small values of batch and M where this pass is applied, producing M = 4 or even M = 1, which leads to a performance degradation.
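
A minimal sketch of the suggested early exit; the threshold and the free-function form are assumptions for illustration, not the actual code.

#include <cstddef>

// If the batch alone cannot reach the optimal parallel work amount but M is
// already small, splitting would only shrink the kernel (down to M = 4 or even
// M = 1), so it is better to execute the original shape as is.
bool should_skip_m_split(size_t batch_dim, size_t m_dim, size_t optimal_parallelism_work_amount) {
    constexpr size_t small_m_threshold = 64;  // hypothetical value from the discussion
    return batch_dim < optimal_parallelism_work_amount && m_dim < small_m_threshold;
}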

Contributor Author:

I like your idea. In third-party brgemm heuristics, I saw that the minimal allowed m_kernel is 16; we can probably take this into account in our heuristics.
It's also important that SplitDimensionM::split is used in the CPU callback (via can_be_optimized), so if it returns false, MHA tokenization doesn't happen. So another question we need to answer is whether we even need to tokenize such MHAs at all.
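
To illustrate the dependency being described: all names below are simplified stand-ins, not the plugin's real callback API.

#include <cstddef>

// Stand-in for SplitDimensionM::can_be_optimized: can some split of M reach the
// desired parallel work amount? The real check lives in the Snippets pass.
bool can_split_m(size_t batch_dim, size_t m_dim, size_t concurrency) {
    return m_dim > 1 && batch_dim * m_dim >= concurrency;
}

// Sketch of the CPU tokenization callback's role: if splitting is impossible,
// the MHA pattern is rejected and not tokenized at all.
bool tokenize_mha(size_t batch_dim, size_t m_dim, size_t concurrency) {
    return can_split_m(batch_dim, m_dim, concurrency);
}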

@a-sidorova (Contributor) commented Jan 2, 2025:

I think we should tokenize these cases too, just without splitting dimension M. These small kernels will be processed quickly by Snippets anyway.
For now, I suggest coming back to this when we have time.
Could you please add a comment summarizing this small discussion to the ticket, so we remember everything that's needed?

Contributor Author:

  1. I changed SplitDimensionM::split behavior based on performance experiments. Now there are no limitations on the M dim for the heuristics: we simply call them in the order split_ideally -> split_minimize_kernel_wa -> split_fallback_increase_parallel_wa, where each next function is called only if the previous one didn't succeed (see the sketch below).
  2. Based on the ticket's description, I can conclude that this PR already solves the described problem (I added a corresponding test case to split_dim_m.cpp), so I created another ticket for the small MHAs tokenization task: CVS-160154.
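
For readers following the thread, the call chain described in point 1 can be pictured roughly like this; the heuristic names come from the comment above, while the signatures and the wrapper are a hedged sketch rather than the PR's exact code.

#include <cstddef>

// Sketch of the chained heuristics: each one either fills (batch_m_dim, new_m_dim)
// and returns true, or returns false so the next, less strict heuristic is tried.
struct SplitDimensionMSketch {
    static bool split_ideally(size_t, size_t, size_t, size_t&, size_t&);
    static bool split_minimize_kernel_wa(size_t, size_t, size_t, size_t&, size_t&);
    static bool split_fallback_increase_parallel_wa(size_t, size_t, size_t, size_t&, size_t&);

    static bool split(size_t batch_dim, size_t m_dim, size_t optimal_parallelism_work_amount,
                      size_t& batch_m_dim, size_t& new_m_dim) {
        return split_ideally(batch_dim, m_dim, optimal_parallelism_work_amount, batch_m_dim, new_m_dim) ||
               split_minimize_kernel_wa(batch_dim, m_dim, optimal_parallelism_work_amount, batch_m_dim, new_m_dim) ||
               split_fallback_increase_parallel_wa(batch_dim, m_dim, optimal_parallelism_work_amount, batch_m_dim, new_m_dim);
    }
};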

@v-Golubev force-pushed the vg/snippets/split_m_update_heuristic branch from 78a43ee to 4242663, December 27, 2024 13:57
@v-Golubev force-pushed the vg/snippets/split_m_update_heuristic branch from 4242663 to 321e549, January 2, 2025 12:34
@v-Golubev (Contributor, Author):

The changes will be merged into master as part of PR #28179.

@v-Golubev closed this Jan 8, 2025