[LLM] [NPU] StaticLLMPipeline: Compiler DQ update #1515

smirnov-alexey · 2025-01-09T12:01:57Z

Depends on openvinotoolkit/openvino#28316
Related to openvinotoolkit/openvino#28343

src/cpp/src/llm_pipeline_static.cpp

dmatveev · 2025-01-13T15:07:19Z

src/cpp/src/llm_pipeline_static.cpp

+        if (npudesc.has_value() && npudesc->compiler_dq) {
+            config.emplace("NPUW_DQ_FULL", "NO");
+            config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", true);
+        }


Why do you enable it only for CW (channel-wise) compressed models? The whole point of this feature was to make it work for group-quantized prefill.

dmatveev · 2025-01-13T15:08:00Z

src/cpp/src/llm_pipeline_static.cpp

+    if (npudesc.has_value() && npudesc->compiler_dq) {
+        config.emplace("NPUW_DQ_FULL", "NO");
+        config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", true);
+    }


You certainly don't need to do this twice; just do it in the shared section of the config (so it goes to both prefill and kvcache).

Also, when you switch to compiler DQ, you'll have to disable DCOFF; otherwise DCOFF will be applied to this model and it will run in FP16.
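
A minimal sketch of the suggestion above, not the PR's actual code. It assumes a plain `std::map` as a stand-in for `ov::AnyMap`, a reduced hypothetical `NPUDesc` with only the `compiler_dq` flag, a hypothetical function name `make_shared_config`, and that DCOFF is controlled by the `NPUW_DCOFF_TYPE` / `NPUW_DCOFF_SCALE` keys (an assumption; the thread does not name them). String values are used where the diff passes `true`.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for ov::AnyMap so the sketch compiles without OpenVINO.
using Config = std::map<std::string, std::string>;

// Hypothetical, reduced version of the NPUDesc used in the PR.
struct NPUDesc {
    bool compiler_dq = false;
};

// Builds the shared part of the config once; both the prefill and the
// kvcache configs would start from this map.
Config make_shared_config(const std::optional<NPUDesc>& npudesc) {
    Config config = {
        {"NPUW_DCOFF_TYPE", "f16"},   // assumed DCOFF keys
        {"NPUW_DCOFF_SCALE", "YES"},
    };
    if (npudesc.has_value() && npudesc->compiler_dq) {
        config.emplace("NPUW_DQ_FULL", "NO");
        config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES");
        // Drop DCOFF so the model is not decompressed to FP16 before
        // the compiler gets a chance to apply dynamic quantization.
        config.erase("NPUW_DCOFF_TYPE");
        config.erase("NPUW_DCOFF_SCALE");
    }
    return config;
}
```

Because the switch lives in one place, prefill and kvcache cannot drift apart, and the DCOFF removal cannot be forgotten in one of them.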

lmielick · 2025-01-13T15:21:43Z

src/cpp/src/llm_pipeline_static.cpp

-    if (std::find(device_caps.begin(), device_caps.end(),
-                  "COMPILER_DYNAMIC_QUANTIZATION") != device_caps.end()) {
+    const auto supported_properties = core.get_property("NPU", ov::supported_properties);
+    if (std::find(supported_properties.begin(), supported_properties.end(),


This looks like a sub-string search. Perhaps there is some OV utility to tokenize the list first?

Seems already simple enough


TolyaTalamanov · 2025-01-13T16:38:03Z

Propose to encapsulate it like this:
void enable_dq() {
    if (npudesc.has_value() && npudesc->compiler_dq) {
        "NPUW_DQ_FULL": "NO"
        "NPU_COMPILER_DYNAMIC_QUANTIZATION": true
    } else {
        "NPUW_DQ": "YES"
    }
}
We need to call enable_dq in every place where NPUW_DQ: YES was previously set directly.

As for the logic of whether to enable DQ, it was the following:

For the prefill model, DQ is enabled only when the model is channel-wise compressed.

For the generation model, DQ is always enabled.
@dmatveev Could you confirm this, please?

If it should be enabled for both CW and GQ models, we can call enable_dq() unconditionally for both configs.

Should be done here:

openvino.genai/src/cpp/src/llm_pipeline_static.cpp

Lines 502 to 511 in fa76cf7

    
ov::AnyMap get_default_common_config(const std::shared_ptr<ov::Model>& model) {
    auto config = get_baseline_common_config();
    const char* npu_l0 = std::getenv("DISABLE_OPENVINO_GENAI_NPU_L0");
    if (npu_l0 && std::atoi(npu_l0) == 1) {
        config.emplace("NPUW_WEIGHTS_BANK_ALLOC", "CPU");
    } else {
        config.emplace("NPUW_FUNCALL_FOR_ALL", "YES");
    }
    return config;
}
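
The enable_dq proposal in this comment is written as pseudocode. A compilable sketch could look like the following, assuming a `std::map` stand-in for `ov::AnyMap`, a reduced hypothetical `NPUDesc`, and explicit parameters in place of whatever state the original function was meant to capture. String values replace the boolean `true` from the proposal.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for ov::AnyMap so the sketch compiles without OpenVINO.
using Config = std::map<std::string, std::string>;

// Hypothetical, reduced version of the NPUDesc used in the PR.
struct NPUDesc {
    bool compiler_dq = false;
};

// Single entry point for "turn DQ on": picks compiler DQ when the
// device reports support for it, and NPUW DQ otherwise.
void enable_dq(Config& config, const std::optional<NPUDesc>& npudesc) {
    if (npudesc.has_value() && npudesc->compiler_dq) {
        config.emplace("NPUW_DQ_FULL", "NO");
        config.emplace("NPU_COMPILER_DYNAMIC_QUANTIZATION", "YES");
    } else {
        config.emplace("NPUW_DQ", "YES");
    }
}
```

Call sites that used to emplace `NPUW_DQ` directly would call `enable_dq(config, npudesc)` instead, so the choice between the two DQ flavours is made in exactly one place.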

dmatveev · 2025-01-13T18:25:43Z

@TolyaTalamanov I updated our internal guide (the one you've used before)

dmatveev

It gets more and more complex.

If it works, let's keep it like this, then propagate to the LLMCompiledModel (down to NPUW), remove it here, and refactor/rethink there.

@TolyaTalamanov please be sure to provide your review here as well.

dmatveev · 2025-01-13T18:27:33Z

src/cpp/src/llm_pipeline_static.cpp

-    if (std::find(device_caps.begin(), device_caps.end(),
-                  "COMPILER_DYNAMIC_QUANTIZATION") != device_caps.end()) {
+    const auto supported_properties = core.get_property("NPU", ov::supported_properties);
+    if (std::find(supported_properties.begin(), supported_properties.end(),


How is it a sub-string search? It looks more like a container. At least std::find works on it (this auto sometimes makes things harder to understand).
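
To illustrate the point, here is a small sketch assuming the property list behaves like a container of names (a `std::vector<std::string>` is used as a stand-in for whatever `core.get_property` returns). The helper name `has_property` and the sample property names are illustrative only. `std::find` compares whole elements, so a partial name does not match.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true only if some element of `props` equals `name` exactly;
// std::find uses operator== on whole elements, not sub-string matching.
bool has_property(const std::vector<std::string>& props, const std::string& name) {
    return std::find(props.begin(), props.end(), name) != props.end();
}
```

With a list containing `"NPU_COMPILER_DYNAMIC_QUANTIZATION"`, looking up the full name succeeds, while looking up `"COMPILER_DYNAMIC_QUANTIZATION"` does not, which is why the prefix matters in the query.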

dmatveev · 2025-01-13T18:28:13Z

src/cpp/src/llm_pipeline_static.cpp

        compiler_dq = true;
    }
    return std::make_optional(NPUDesc{arch, max_tiles, compiler_dq});
 }

-ov::AnyMap get_baseline_common_config() {
+ov::AnyMap get_baseline_common_config(bool enable_compiler_dq) {


Please pass the NPUDesc here instead - there's a request to get rid of "high-precision" options in the future as well.

src/cpp/src/llm_pipeline_static.cpp

TolyaTalamanov

The whole point was to encapsulate the compiler and non-compiler DQ logic in a function, and simply call that function wherever DQ is needed.

TolyaTalamanov · 2025-01-13T18:46:49Z

src/cpp/src/llm_pipeline_static.cpp

-ov::AnyMap get_default_common_config(const std::shared_ptr<ov::Model>& model) {
-    auto config = get_baseline_common_config();
+bool enable_compiler_dq(const std::optional<NPUDesc>& npudesc) {
+    return npudesc.has_value() && npudesc->compiler_dq;


What's the purpose of this function? Why not just use npudesc.has_value() && npudesc->compiler_dq directly?

TolyaTalamanov

enable_compiler_dq seems to be redundant

dmatveev · 2025-01-13T19:20:20Z

src/cpp/src/llm_pipeline_static.cpp

-    if (npudesc.has_value() && npudesc->compiler_dq) {
-        config.emplace("NPUW_DQ_FULL", "NO");
+    // Specify NPUW DQ if Compiler DQ is not enabled
+    if (!npudesc.has_value() || !npudesc->compiler_dq) {


Not gonna lie @TolyaTalamanov, it was much better WITH that tiny one-liner than without that.
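
For comparison, a sketch of the variant with the one-line predicate kept, which also takes the whole descriptor rather than a bool, as requested earlier in the review. It assumes a `std::map` stand-in for `ov::AnyMap` and a reduced hypothetical `NPUDesc`; the function name `get_npuw_dq_config` is hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for ov::AnyMap so the sketch compiles without OpenVINO.
using Config = std::map<std::string, std::string>;

// Hypothetical, reduced version of the NPUDesc used in the PR.
struct NPUDesc {
    bool compiler_dq = false;
};

// Named predicate: the negated form below reads as a sentence instead
// of a De Morgan expansion of two conditions.
bool enable_compiler_dq(const std::optional<NPUDesc>& npudesc) {
    return npudesc.has_value() && npudesc->compiler_dq;
}

// Hypothetical helper: takes the descriptor so that future NPU-specific
// switches do not require another signature change.
Config get_npuw_dq_config(const std::optional<NPUDesc>& npudesc) {
    Config config;
    // Specify NPUW DQ only if compiler DQ is not enabled.
    if (!enable_compiler_dq(npudesc)) {
        config.emplace("NPUW_DQ", "YES");
    }
    return config;
}
```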

dmatveev · 2025-01-13T19:25:44Z

Let's wait for the testing results

Update DQ query

ba70ef1

smirnov-alexey added this to the 2025.0 milestone Jan 9, 2025

smirnov-alexey requested review from dmatveev, ilya-lavrenov, PatrikStepan and TolyaTalamanov January 9, 2025 12:01

smirnov-alexey assigned dmatveev Jan 9, 2025

github-actions bot added the category: LLM LLM pipeline (stateful, static) label Jan 9, 2025

ilya-lavrenov added the category: NPU label Jan 9, 2025

smirnov-alexey mentioned this pull request Jan 9, 2025

[NPUW] Update compiler DQ query in LLMCompiledModel openvinotoolkit/openvino#28343

Open

smirnov-alexey commented Jan 9, 2025


smirnov-alexey added 3 commits January 13, 2025 14:29

Unconditionally utilize compiler DQ

f5dd5b1

DQ only when in supported props

cc44a0d

Add prefix

fa76cf7

dmatveev requested changes Jan 13, 2025


lmielick reviewed Jan 13, 2025


TolyaTalamanov reviewed Jan 13, 2025


smirnov-alexey added the Code Freeze label Jan 13, 2025

Align DQ behaviour

868a7ac

dmatveev reviewed Jan 13, 2025


TolyaTalamanov reviewed Jan 13, 2025


TolyaTalamanov approved these changes Jan 13, 2025


Address review comments

3c72f4d

dmatveev approved these changes Jan 13, 2025


dmatveev added this pull request to the merge queue Jan 13, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 13, 2025

smirnov-alexey added this pull request to the merge queue Jan 13, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 13, 2025

smirnov-alexey added 2 commits January 13, 2025 23:50

Merge branch 'master' into as/npuw_dq

24b1a63

Merge branch 'master' into as/npuw_dq

102a1d9

smirnov-alexey enabled auto-merge January 14, 2025 07:31

smirnov-alexey added this pull request to the merge queue Jan 14, 2025
