[Snippets] SplitDimensionM: heuristic update #28180
Conversation
Force-pushed from 205e16c to 78a43ee
@a-sidorova @IvanNovoselov could you please take a look? Thanks in advance
src/common/snippets/include/snippets/pass/split_dimension_m.hpp (outdated diff)
```cpp
    return splited;
}

std::pair<size_t, size_t> SplitDimensionM::compute_aggressive_heuristic(size_t batch_dim, size_t m_dim, size_t optimal_parallelism_work_amount) {
    constexpr size_t min_kernel_m = 32;
```
In the second case in `compute_ideal_cases_heuristic`, `min_kernel_m` is 64, while here it is 32. What about always using 64 and defining it as a static const attribute of the class? Or is there a difference between the heuristics, so that we really need a smaller `min_kernel_m` in the aggressive one?
I agree that it's better to have a single `min_kernel_m` value, but I think it should be 32, not 64. The value 64 was set empirically to avoid cases where the external repacking feature doesn't work and the overhead of repacking duplication inside the kernel outweighs the benefit of the splitting. If external repacking works (and it seems it will work in all cases after the tokenization adjustments), we can safely lower `min_kernel_m` for `compute_ideal_cases_heuristic`.
I applied your suggestion: `min_kernel_m` is now a static class member equal to 32.
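For illustration, a minimal compilable sketch of what such a class-level constant could look like — the class shape here is hypothetical; the real declaration lives in split_dimension_m.hpp:

```cpp
#include <cstddef>

// Hypothetical sketch of the shared constant: one class-level value instead of
// per-heuristic local constants, so all heuristics agree on the minimal kernel M.
class SplitDimensionM {
public:
    // Smallest kernel-side M block the splitting heuristics are allowed to produce.
    static constexpr std::size_t min_kernel_m = 32;
};
```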
```cpp
// If the M dim is big enough, the aggressive heuristic is used to minimize kernel_m.
// For a smaller M dim, the conservative heuristic is used to preserve the old behaviour.
const bool big_m_dim = m_dim >= 4000;
```
By the way, we could also support the case with a small `M`. If `batch < optimal_parallelism_work_amount` and `M` is quite small (for example, `M < 64`), nothing needs to be updated or split — let's execute as-is. I don't insist on doing it in this PR, but I have some models (for example, action-recognition or levit) with small values of batch and `M` where this pass is applied and produces `M = 4` or even `M = 1`, which leads to perf degradation. A sketch of such an early exit follows below.
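A minimal sketch of the suggested early exit, assuming a hypothetical helper and the example threshold of 64 from the comment above (the real pass signature differs):

```cpp
#include <cstddef>

// Hypothetical guard: returns true when splitting should be skipped entirely.
// If the batch alone can't reach the optimal parallel work amount and M is
// already small, splitting would only shrink the kernel further (down to
// M = 4 or M = 1 in the mentioned models), so the shape is better left as-is.
bool skip_m_split(std::size_t batch_dim, std::size_t m_dim,
                  std::size_t optimal_parallelism_work_amount) {
    constexpr std::size_t small_m_threshold = 64;  // example value from the comment
    return batch_dim < optimal_parallelism_work_amount && m_dim < small_m_threshold;
}
```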
I like your idea. In third-party brgemm heuristics, I saw that the minimal allowed `m_kernel` is 16; we could probably take that into account in our heuristics. It's also important that `SplitDimensionM::split` is used in the CPU callback (via `can_be_optimized`), so if it returns false, MHA tokenization doesn't happen. So another question we need to answer is whether we should even tokenize such MHAs at all.
I think we should tokenize these cases too, just without splitting dimension M. These small kernels will be processed quickly by Snippets anyway. For now, I suggest coming back to this when we have time. Could you please add a comment with this small discussion to the ticket, as a reminder of everything that's needed?
- I changed `SplitDimensionM::split` behavior based on perf experiments. There are now no limitations on the M dim for the heuristics; they are simply called in the following order: `split_ideally` -> `split_minimize_kernel_wa` -> `split_fallback_increase_parallel_wa`. Each next function is called only if the previous one didn't finish successfully (see the sketch below).
- Based on the ticket's description, I can conclude that this PR already solves the described problem (I added a corresponding test case to `split_dim_m.cpp`), so I created another ticket for the small-MHA tokenization task: CVS-160154
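A rough sketch of that fallback chain — the heuristic names come from the PR, while the bodies and the free-function signature here are placeholders:

```cpp
#include <cstddef>
#include <utility>

using SplitResult = std::pair<std::size_t, std::size_t>;  // {batch_m, kernel_m}

// Placeholder heuristics: in the real pass these inspect the shape and the
// available parallelism; here they only illustrate the success/failure contract.
static bool split_ideally(std::size_t, std::size_t, SplitResult&) { return false; }
static bool split_minimize_kernel_wa(std::size_t, std::size_t, SplitResult&) { return false; }
static bool split_fallback_increase_parallel_wa(std::size_t, std::size_t, SplitResult&) { return false; }

// Sketch of the dispatch described above: heuristics are tried in order of
// preference, and each later one runs only if the previous one failed.
bool split(std::size_t batch_dim, std::size_t m_dim, SplitResult& result) {
    return split_ideally(batch_dim, m_dim, result) ||
           split_minimize_kernel_wa(batch_dim, m_dim, result) ||
           split_fallback_increase_parallel_wa(batch_dim, m_dim, result);
}
```

Short-circuit `||` keeps the order of preference explicit while letting each heuristic commit its result only on success.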
Force-pushed from 78a43ee to 4242663
Force-pushed from 4242663 to 321e549
The changes will be merged into master within PR #28179.