Another DxDispatch issue with QCOM Hexagon NPU on Windows (CompileOperator) #666

Open
fobrs opened this issue Nov 14, 2024 · 4 comments

@fobrs

fobrs commented Nov 14, 2024

Using NPU driver version 30.0.32.1000 (10/9/2024) on a Snapdragon Dev Box.

dxdispatch.exe .\models\dml_reduce.json -a Adreno
Running on 'Snapdragon(R) X Elite - X1E001DE - Qualcomm(R) Adreno(TM) GPU'
Compile Op
Dispatch 'sum': 1 iterations, 0.5781 ms median (CPU), 0.014063 ms median (GPU)
Resource 'input': 1, 2, 3, 4, 5, 6, 7, 8, 9
Resource 'output': 6, 15, 24

dxdispatch.exe ..\models\dml_reduce.json -a Hexagon
Running on 'Qualcomm(R) Hexagon(TM) NPU'
Compile Op
Failed to execute the model: ERROR while initializing 'sum': C:\projects\DirectML\DxDispatch\src\dxdispatch\DmlDispatchable.cpp(404)\dxdispatchImpl.dll!00007FFA4C17BDB8: (caller: 00007FFA4C2B9884) Exception(1) tid(3754) 887A0004 The specified device interface or feature level is not supported on this system.
[DmlDispatchable::Initialize(m_device->DML()->CompileOperator( m_operator.Get(), dmlDesc.executionFlags, IID_PPV_ARGS(m_compiledOperator.ReleaseAndGetAddressOf())))]
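
For readers unfamiliar with the code in the log above: 887A0004 is the HRESULT value of DXGI_ERROR_UNSUPPORTED, which matches the message text. The sketch below splits the value into its fields using the same bit layout as the HRESULT_SEVERITY, HRESULT_FACILITY, and HRESULT_CODE macros in winerror.h. The helper name `decode_hresult` and the one-entry lookup table are illustrative assumptions, not part of DxDispatch or DirectML.

```python
# Decode the HRESULT from the log, mirroring the bit layout used by the
# HRESULT_SEVERITY / HRESULT_FACILITY / HRESULT_CODE macros in winerror.h.

# Illustrative lookup table only; it is far from exhaustive.
KNOWN_HRESULTS = {
    0x887A0004: "DXGI_ERROR_UNSUPPORTED",
}


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code."""
    hr &= 0xFFFFFFFF  # normalize values that arrive as signed 32-bit ints
    return {
        "severity": (hr >> 31) & 0x1,     # 1 means failure
        "facility": (hr >> 16) & 0x1FFF,  # 0x87A is the DXGI facility
        "code": hr & 0xFFFF,
        "name": KNOWN_HRESULTS.get(hr, "unknown"),
    }


result = decode_hresult(0x887A0004)
print(result)
```

The failure bit is set and the facility is DXGI, so the rejection is reported at the device or driver level when CompileOperator is called, not by the JSON model parser.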

dxdispatch.exe -s
Snapdragon(R) X Elite - X1E001DE - Qualcomm(R) Adreno(TM) GPU
-Version: 31.0.56.0
-Hardware: true
-Integrated: true
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: true
-CORE_COMPUTE: true
-GENERIC_ML: true

Qualcomm(R) Hexagon(TM) NPU
-Version: 30.0.32.1000
-Hardware: true
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: false
-CORE_COMPUTE: false
-GENERIC_ML: true

Microsoft Basic Render Driver
-Version: 10.0.26100.2314
-Hardware: false
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
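
The adapter listing above can be compared programmatically. The following is a minimal sketch that parses the `dxdispatch.exe -s` text as it appears in this issue and reports which of the three capability flags each adapter advertises. The output format is assumed from this one listing, and `parse_adapters` and `supported_flags` are hypothetical helper names.

```python
# Parse the adapter listing quoted in this issue. The format is assumed from
# this single sample: a name line followed by "-Key: value" property lines.
ADAPTER_LISTING = """\
Snapdragon(R) X Elite - X1E001DE - Qualcomm(R) Adreno(TM) GPU
-Version: 31.0.56.0
-Hardware: true
-Integrated: true
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: true
-CORE_COMPUTE: true
-GENERIC_ML: true

Qualcomm(R) Hexagon(TM) NPU
-Version: 30.0.32.1000
-Hardware: true
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
-Shared System Memory: 15.79 GB
-D3D12_GRAPHICS: false
-CORE_COMPUTE: false
-GENERIC_ML: true

Microsoft Basic Render Driver
-Version: 10.0.26100.2314
-Hardware: false
-Integrated: false
-Dedicated Adapter Memory: 0 bytes
-Dedicated System Memory: 0 bytes
"""

FLAGS = ("D3D12_GRAPHICS", "CORE_COMPUTE", "GENERIC_ML")


def parse_adapters(text: str) -> dict:
    """Return {adapter name: {property: value}} in listing order."""
    adapters = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-") and current is not None:
            key, _, value = line[1:].partition(":")
            adapters[current][key.strip()] = value.strip()
        else:
            current = line  # any other non-empty line starts a new adapter
            adapters[current] = {}
    return adapters


def supported_flags(props: dict) -> list:
    """List the capability flags that are reported as true."""
    return [flag for flag in FLAGS if props.get(flag) == "true"]


adapters = parse_adapters(ADAPTER_LISTING)
for name, props in adapters.items():
    print(name, "->", supported_flags(props))
```

In this listing the Hexagon NPU advertises only GENERIC_ML, while the Adreno GPU advertises all three flags. The flags for the Basic Render Driver are not shown in the issue, so the sketch reports none for it.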

@ashumish-QCOM

Hi @fobrs

Thank you for reporting the issue with the Qualcomm(R) Hexagon(TM) NPU on Windows. We have noted the error during the initialization of the 'sum' operation.

To address this, you might want to ensure that you are using the latest version of the DirectML library and that your system meets all the necessary hardware and software requirements. Additionally, you can try running the model with different execution flags or configurations to see if that resolves the issue.

@fobrs
Author

fobrs commented Dec 10, 2024

Hi @ashumish-QCOM

I do run the latest version of DirectML and have installed the latest Qualcomm NPU driver.

The only thing I did was run the first example from the DirectML README, and it failed. What should I tweak?

**Getting Started**
See the [guide](https://github.com/microsoft/DirectML/blob/master/DxDispatch/doc/Guide.md) for detailed usage instructions.
The [models](https://github.com/microsoft/DirectML/blob/master/DxDispatch/models) directory contains some
simple examples to get started. For example, here's an example that invokes DML's reduction operator:

> dxdispatch.exe models/dml_reduce.json

Running on 'NVIDIA GeForce RTX 2070 SUPER'
Resource 'input': 1, 2, 3, 4, 5, 6, 7, 8, 9
Resource 'output': 6, 15, 24

I hope Qualcomm gets its NPU drivers 100% compatible without users requiring tweaking...

@ashumish-QCOM

Hi @fobrs

Here are some steps you can look into:

- Model Compatibility: Ensure the model dml_reduce.json is fully compatible with the Qualcomm Hexagon NPU. Some models may require specific optimizations.
- Execution Flags: Experiment with different execution flags in your dxdispatch command to see if that resolves the issue.
- DirectML Version: Ensure you are using DirectML version 1.15.4 or higher, as this version includes optimizations for the Qualcomm Hexagon NPU.

@fobrs
Author

fobrs commented Dec 12, 2024

The model consists of exactly one operator:

"sum": 
{
    "type": "DML_OPERATOR_REDUCE",
    "desc": 
    {
        "InputTensor": { "DataType": "FLOAT32", "Sizes": [1,1,3,3] },
        "OutputTensor": { "DataType": "FLOAT32", "Sizes": [1,1,3,1] },
        "Function": "DML_REDUCE_FUNCTION_SUM",
        "AxisCount": 1,
        "Axes": [3]
    }
}

The error occurs during compilation, so tweaking execution parameters makes no sense here.
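
To show how small this operator is, here is a plain-Python sketch of what the REDUCE description above computes: a sum over axis 3 of a [1,1,3,3] tensor, producing a [1,1,3,1] tensor. It assumes a packed row-major layout for the nine input values and does not use DirectML. `reduce_sum_last_axis` is a hypothetical helper name.

```python
# Reference computation for DML_REDUCE_FUNCTION_SUM over axis 3 of a
# 4-D tensor, assuming the values are stored packed in row-major order.

def reduce_sum_last_axis(values, sizes):
    """Sum each run of `sizes[3]` consecutive elements (the last axis)."""
    n, c, h, w = sizes
    if len(values) != n * c * h * w:
        raise ValueError("element count does not match tensor sizes")
    return [sum(values[start:start + w]) for start in range(0, len(values), w)]


input_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
output_values = reduce_sum_last_axis(input_values, [1, 1, 3, 3])
print(output_values)  # [6, 15, 24]
```

The result matches the `Resource 'output': 6, 15, 24` line that the Adreno GPU run prints, which is the output the Hexagon run never reaches.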
