
How do I know if the IPU device is employed? #21

Open
ocwins opened this issue Nov 17, 2023 · 13 comments
@ocwins

ocwins commented Nov 17, 2023

I have successfully run the demo "ipu_modelsx4_demo", but whether the IPU device is enabled or disabled in Device Manager, the demo runs flawlessly, and I can't tell if there is any difference.

I suggest releasing a single executable that runs some simple tests to confirm whether everything related to the IPU is working. It would be even better if such a demo did not need any environment setup (conda, Python, etc.), so that everybody could use it to test their hardware, not only developers.

Another suggestion is examples in pure C/C++ and other low-level tools. In our recent projects, we use pure C/C++/CUDA for inference. To be honest, that makes life much easier. With Ryzen AI, we still don't want to adopt any sophisticated solutions, but we need the low-level interfaces/tools and examples of how to use them.

A project like CUTLASS from NVIDIA is a good example. We don't use it, but from its code we can easily learn how to use their hardware effectively and efficiently.

@uday610
Collaborator

uday610 commented Nov 17, 2023

@ocwins, thanks for offering these suggestions. Let us review them internally.

The demo is possibly running on the CPU when the IPU device is disabled.

@ocwins
Author

ocwins commented Nov 17, 2023

> @ocwins, thanks for offering these suggestions. Let us review them internally.
>
> The demo is possibly running on the CPU when the IPU device is disabled.

A program that shows detailed info, and perhaps can run some benchmarks, would be best (with or without source code).

And at the moment, with the demo, the fps and CPU usage do not change significantly when I disable the IPU device. How can I know whether the IPU is properly employed? Is there some output that can be used to distinguish?

@ocwins
Author

ocwins commented Nov 17, 2023

> The demo is possibly running on the CPU when the IPU device is disabled.

I did some more investigation, and it seems that the demo "ipu_modelsx4_demo" always runs on the CPU regardless of whether the IPU device is enabled or not.

Is this file useful?

vitisai_ep_report.json
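The report file can indeed help here: if it records which device each graph node was assigned to, tallying those assignments shows whether anything actually landed on the IPU. A rough sketch (the key names `device`, `target`, and `provider` are guesses on my part, not the documented schema of vitisai_ep_report.json; adjust them to whatever the real report contains):

```python
# Sketch: tally every string value stored under a "device"-like key
# anywhere in the report JSON. The key names are assumptions, not the
# documented schema of vitisai_ep_report.json.
import json
from collections import Counter

def count_devices(report) -> Counter:
    """Recursively tally device assignments found in a parsed report."""
    tally = Counter()
    def walk(obj):
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key.lower() in ("device", "target", "provider") and isinstance(value, str):
                    tally[value] += 1
                walk(value)
        elif isinstance(obj, list):
            for item in obj:
                walk(item)
    walk(report)
    return tally

# Example with a made-up report shape; a real run would do:
#   report = json.load(open("vitisai_ep_report.json"))
sample = {"nodes": [{"op": "Conv", "device": "IPU"},
                    {"op": "Softmax", "device": "CPU"},
                    {"op": "Conv", "device": "IPU"}]}
print(count_devices(sample))  # Counter({'IPU': 2, 'CPU': 1})
```

If every node tallies as CPU, that would be consistent with the demo silently falling back to CPU execution.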

@rejectcookies

@uday610 Please have the team view this https://www.asrock.com/microsite/aiquickset/index.html

ASRock has made a simple-to-use application that they have unfortunately decided to restrict to their GPUs, but it is exactly what a consumer-friendly Ryzen AI app should be. I just want to install an app and start prompting for images, audio, and text. :)

@ocwins
Author

ocwins commented Nov 18, 2023

> @uday610 Please have the team view this https://www.asrock.com/microsite/aiquickset/index.html
>
> ASRock has made a simple-to-use application that they have unfortunately decided to restrict to their GPUs, but it is exactly what a consumer-friendly Ryzen AI app should be. I just want to install an app and start prompting for images, audio, and text. :)

In my opinion, we should not go that far at the current stage.

First, we need a program to test and benchmark our hardware. A simple MMA (matrix multiply-accumulate) demo with a correctness check could achieve this goal.
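To make the idea concrete, here is a minimal sketch of the shape such a correctness check could take, in pure Python for illustration only. In a real test the `accelerator_matmul` stand-in would be replaced by a call that submits the same matrices to the IPU runtime; everything else here is just the reference computation and the comparison.

```python
# Sketch of an MMA correctness check: compute a reference result with a
# naive triple loop, obtain a candidate result from the device under
# test (here a stand-in function), and compare element-wise.
import random

def matmul_ref(a, b):
    """Naive reference matrix multiply: C = A @ B."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def check(candidate, reference, tol=1e-5):
    """Element-wise comparison within an absolute tolerance."""
    return all(abs(c - r) <= tol
               for crow, rrow in zip(candidate, reference)
               for c, r in zip(crow, rrow))

def accelerator_matmul(a, b):
    # Stand-in for the IPU path; a real test would dispatch to the
    # accelerator here and time the call for a throughput figure.
    return matmul_ref(a, b)

random.seed(0)
a = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
b = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(8)]

result = accelerator_matmul(a, b)
print("PASS" if check(result, matmul_ref(a, b)) else "FAIL")  # PASS
```

Timing the accelerator call over large matrices would additionally give the throughput figure needed to verify the advertised TOPS.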

In my current understanding, the IPU is hardware that runs binaries, and AMD provides two binaries at the moment: 1x4 and 5x4. Running the 1x4 binary on the IPU provides 2 TOPS of compute for a single application, and up to five applications can use the IPU at the same time. The 5x4 binary provides 10 TOPS of compute for a single application, but only one application can run at a time.

A demo that only does arithmetic, without any other business logic, could prove that the IPU works well and can be pushed to its maximum compute. And with the source code of such a demo, developers could learn how to use the IPU to its full capability.

Personally, I do not like the ONNX (and other framework) code paths. I want to know how to operate the IPU directly, how to submit inputs, and where to get the results. If we could learn how to build a binary that runs on the IPU, it would be perfect.

AI is a set of apps, not the foundation that makes AI work; programs/examples/frameworks of and for AI are too far from the hardware. Something simple that uses the IPU through ONNX or Vitis AI is somewhat closer, but still not suitable for testing or demonstrating the IPU itself. Examples of running an LLM are too heavy for the purpose, which is making users and developers familiar with the hardware.

I know companies have their strategy and policy for a given period. The decision makers may not be willing to give developers outside the company everything under the hood.

What I can say is that a demo/example that is as small as possible (within the company's restrictions), and that tells us whether the hardware works properly and whether the maximum compute can be achieved, is necessary at this very early stage.

A tool like this would be good for the RyzenAI-SW team too. When users or developers run into trouble, you could ask them to run it to prove their environments are configured properly. @uday610, @andyluo7

Some apps showing what the IPU can do at a high level are good, but a tool that checks the hardware and its capability is a must. That is my opinion.

@rejectcookies

rejectcookies commented Nov 19, 2023

https://www.youtube.com/watch?v=IVPT6scMaaw

that publication date
current stage

If you only knew how bad things really are.

@ocwins
Author

ocwins commented Nov 19, 2023

> https://www.youtube.com/watch?v=IVPT6scMaaw
>
> that publication date
> current stage
>
> If you only knew how bad things really are.

What should I say...

"A thousand-mile journey begins with a single step." ;p

As far as I can see, getting from the current stage to one that is truly usable and useful is not far. Much of the work beyond the current stage has already been done; the trouble is that some work at (or before) the current stage has not been done, or not done well.

@uday610
Collaborator

uday610 commented Nov 27, 2023

Hi @ocwins
The multi-model demo was just updated in #27; you may try it.
Regarding your other suggestion, the team is developing a low-level utility/tool to interact with the IPU standalone. Hopefully we will be able to provide an early-access release soon; stay tuned.

Thank you,

@ocwins
Author

ocwins commented Nov 27, 2023

> Hi @ocwins The multi-model demo was just updated in #27; you may try it. Regarding your other suggestion, the team is developing a low-level utility/tool to interact with the IPU standalone. Hopefully we will be able to provide an early-access release soon; stay tuned.
>
> Thank you,

About my suggestion, thank you (and the team) for listening.

About the multi-model demo, it is still problematic. There are typos in generate_script.py:

```python
bat_file =["set XLNX_VART_FIRMWARE=%cd%\\..\\1x4.xclbin\n",
           "set PATH=%cd%\\..\\bin;%cd%\\..\\python;%cd%\\..;%PATH%\n" # MISSING: comma at end
           "set PYTHONPATH="+pythonpath_value+";%PYTHONPATH%" # MISSING: \n and comma
           "set DEBUG_ONNX_TASK=0\n",
           "set DEBUG_DEMO=0\n",
           "set NUM_OF_DPU_RUNNERS=4\n",
           "set XLNX_ENABLE_GRAPH_ENGINE_PAD=1\n",
           "set XLNX_ENABLE_GRAPH_ENGINE_DEPAD=1\n",
           "%cd%\\..\\bin\ipu_multi_models.exe %cd%\\config\\"]
```
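For reference, a corrected version of the list would restore the missing comma and "\n" so that Python no longer silently concatenates the adjacent string literals (the `pythonpath_value` assignment below is a placeholder of my own; generate_script.py computes the real value):

```python
# Placeholder value for illustration; generate_script.py computes this.
pythonpath_value = "%cd%\\..\\python"

# Corrected list: each entry is a separate, comma-terminated string
# ending in "\n", and the final backslash before the exe name is escaped.
bat_file = ["set XLNX_VART_FIRMWARE=%cd%\\..\\1x4.xclbin\n",
            "set PATH=%cd%\\..\\bin;%cd%\\..\\python;%cd%\\..;%PATH%\n",
            "set PYTHONPATH=" + pythonpath_value + ";%PYTHONPATH%\n",
            "set DEBUG_ONNX_TASK=0\n",
            "set DEBUG_DEMO=0\n",
            "set NUM_OF_DPU_RUNNERS=4\n",
            "set XLNX_ENABLE_GRAPH_ENGINE_PAD=1\n",
            "set XLNX_ENABLE_GRAPH_ENGINE_DEPAD=1\n",
            "%cd%\\..\\bin\\ipu_multi_models.exe %cd%\\config\\"]
```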

With the scripts corrected, ipu_multi_models.exe pops up a message box complaining that "glog.dll was not found" and fails to run. The old version didn't have this problem.

@lidachang-amd
Contributor

Hi @ocwins. In my conda environment, glog.dll was installed along with Anaconda3. Could you please try running `conda install glog` and then execute the program again? BTW, my Anaconda installer version is Anaconda3-2023.07-2-Windows-x86_64.

@ocwins
Author

ocwins commented Nov 28, 2023

Hi @lidachang-amd ,

The glog problem was resolved by installing it.

But there is a new problem. The console outputs:
FAIL : LoadLibrary failed with error 127 "" when trying to load "C:\RyzenAI\SW\demo\multi-model-exec\bin\onnxruntime_vitisai_ep.dll"
and a message box shows something similar.

The demo works if we replace onnxruntime_vitisai_ep.dll with its previous version. But like the previous version, it also runs well without the IPU (disabled in Device Manager), and the CPU usage is the same as when the IPU is enabled.

So it's hard to tell whether the IPU was employed in these experiments.
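One programmatic check worth trying is asking ONNX Runtime which execution providers the session actually ended up with, via `session.get_providers()`. A minimal sketch of the decision logic follows; the EP name "VitisAIExecutionProvider" is my assumption about how the Vitis AI EP registers itself, so verify it against your install:

```python
# Sketch: decide from a session's provider list whether the Vitis AI EP
# (and hence, presumably, the IPU) is in play. In a real script the
# list would come from ONNX Runtime:
#   import onnxruntime as ort
#   session = ort.InferenceSession(model_path, providers=[...])
#   providers = session.get_providers()
VITIS_EP = "VitisAIExecutionProvider"  # assumed EP name; verify locally

def ipu_in_use(providers):
    """True if the Vitis AI EP is among the session's active providers."""
    return VITIS_EP in providers

print(ipu_in_use([VITIS_EP, "CPUExecutionProvider"]))  # True
print(ipu_in_use(["CPUExecutionProvider"]))            # False
```

Note that even when the EP is listed, individual operators can still fall back to CPU, which is presumably what vitisai_ep_report.json breaks down per node.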

@lidachang-amd
Contributor

Did you delete the cache (C:\temp\{User_name}\vaip.cache) first? Alternatively, you can set XLNX_ENABLE_CACHE=0 to disable the cache.

@ocwins
Author

ocwins commented Nov 30, 2023

> Did you delete the cache (C:\temp\{User_name}\vaip.cache) first? Alternatively, you can set XLNX_ENABLE_CACHE=0 to disable the cache.

No difference is observed after deleting the cache or setting XLNX_ENABLE_CACHE=0.

BTW, the onnxruntime_vitisai_ep.dll provided in this version may not be properly compiled. The message box shows that an entry point could not be found. There could be a name mismatch caused by C++ name mangling.

savitha-srinivasan pushed a commit to savitha-srinivasan/RyzenAI-SW that referenced this issue Jul 29, 2024
Updated hello_world quantization to use enable_ipu_cnn