How do I know if the IPU device is employed? #21
@ocwins, thanks for offering these suggestions. Let us review your suggestions internally. The demo is possibly running on CPU when you disabled the IPU device.
A program that shows detailed info, ideally with some benchmarking capability, would be best (with or without source code). At the moment, the demo's fps/CPU usage does not change significantly when I disable the IPU device. How can I know if the IPU is properly employed? Are there any outputs that can be used to distinguish?
I did some more investigation, and it seems that the demo "ipu_modelsx4_demo" always runs on CPU regardless of whether the IPU device is enabled or not. Is this file useful?
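One way to answer this question (a minimal sketch, assuming a Python environment with onnxruntime and the Vitis AI execution provider installed) is to ask ONNX Runtime which execution providers a session actually resolved to; if `VitisAIExecutionProvider` is absent from the active list, inference has silently fallen back to CPU. The helper name `uses_ipu` is made up for the example:

```python
def uses_ipu(active_providers):
    """Return True if the Vitis AI (IPU) execution provider is active."""
    return "VitisAIExecutionProvider" in active_providers

# With onnxruntime installed, the active list comes from a real session:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.onnx",
#                               providers=["VitisAIExecutionProvider",
#                                          "CPUExecutionProvider"])
#   print(uses_ipu(sess.get_providers()))

print(uses_ipu(["VitisAIExecutionProvider", "CPUExecutionProvider"]))  # True
print(uses_ipu(["CPUExecutionProvider"]))                              # False
```

If the second case is what you see even with the IPU enabled, the model is running entirely on CPU, which would explain the identical fps/CPU usage.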
@uday610 Please have the team view this: https://www.asrock.com/microsite/aiquickset/index.html ASRock has made a simple-to-use application that they have unfortunately decided to restrict to their GPUs, but it is exactly what a consumer-friendly Ryzen AI app should be. I just want to install an app and start prompting images, audio, and text. :)
In my opinion, we should not go that far at the current stage. First, we need a program to test and benchmark our hardware. A simple MMA (matrix multiplication and accumulation) demo with a correctness check could achieve this goal.

In my current understanding, the IPU is hardware that runs binaries, and AMD has provided two binaries at the moment: 1x4 and 5x4. Running the 1x4 binary on the IPU provides 2 TOPS of computing power to a single application, and users can run five applications that employ the IPU at the same time. The 5x4 binary provides 10 TOPS to a single application, and users can run only one such application at a time.

A demo that only does arithmetic, without any other business logic, could prove that the IPU works well and can be pushed to its maximum computing power. And with the source code of a demo like this, developers, especially real programmers, could learn how to use the IPU to its full capability.

Personally, I do not like those ONNX (and other framework) examples. I want to know how to operate the IPU directly: how to submit inputs and where to get the results. If we could learn how to build a binary that runs on the IPU, that would be perfect. AI is a set of apps, not the foundation that makes AI work; programs/examples/frameworks of/for AI are too far from the hardware. Something simple that uses the IPU through ONNX or Vitis AI(?) is somewhat closer, but still not suitable for testing or demonstrating the IPU itself, and examples of running LLMs are too heavy for the purpose, which is making users and developers familiar with the hardware.

I know companies have their strategy and policy for a certain period, and the decision makers may not be willing to give developers outside the company everything under the hood. What I can say is that a demo/example as small as possible (within the company's restrictions) could tell us whether the hardware works properly and whether the maximum computing power can be achieved. A program like this is necessary at this very early stage.
A tool like this would be good for the RyzenAI-SW team too: when users or developers run into trouble, you can ask them to run this tool to prove their environments are configured properly. @uday610, @andyluo7 Apps that show what the IPU can do at a high level are good, but a tool that checks the hardware and its capability is a must. That is my opinion.
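The MMA demo with correctness check suggested above could start from a harness like this (a sketch only; `ipu_matmul` would be the hypothetical accelerated path, and the pure-Python reference stands in for it here so the harness runs anywhere):

```python
import random

def matmul_ref(a, b):
    """Naive reference matrix multiply used as the ground truth."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def check(matmul_under_test, n=8, seed=0):
    """Compare an implementation against the reference on random inputs."""
    rng = random.Random(seed)
    a = [[rng.randint(-8, 8) for _ in range(n)] for _ in range(n)]
    b = [[rng.randint(-8, 8) for _ in range(n)] for _ in range(n)]
    return matmul_under_test(a, b) == matmul_ref(a, b)

# In a real IPU benchmark, the accelerated path would be checked instead:
#   check(ipu_matmul)   # hypothetical IPU-backed implementation
print(check(matmul_ref))  # True: the reference trivially agrees with itself
```

Timing repeated calls to the accelerated path against the known TOPS figures (2 for 1x4, 10 for 5x4) would then show whether the hardware reaches its advertised throughput.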
https://www.youtube.com/watch?v=IVPT6scMaaw
If you only knew how bad things really are.
What should I say... "A thousand-mile trip begins with one step." ;p As far as I can see, the distance from the current stage to one that is totally usable and useful is not great. Much of the work beyond the current stage has been done; the trouble is that some work at (or before) the current stage has not been done, or not done well.
About my suggestion: thank you (and the team) for listening. About the multi-model demo, it is still problematic; there are typos in generate_script.py:
With the corrected scripts, ipu_multi_models.exe pops up a message box complaining that "glog.dll was not found" and fails to run. The old version didn't have this problem.
Hi @ocwins. In my conda environment, glog.dll was installed along with Anaconda3. Could you please try running conda install glog and then execute the program again? BTW, my Anaconda installer version is Anaconda3-2023.07-2-Windows-x86_64.
Hi @lidachang-amd, the glog problem is resolved by installing it. But there is a new problem; console output: The demo works if we replace onnxruntime_vitisai_ep.dll with its previous version. But like the previous version, it also runs fine with the IPU disabled in Device Manager, and the CPU usage is the same as when the IPU is enabled. So it is hard to tell whether the IPU was employed in these experiments.
Did you delete the cache (C:\temp\{User_name}\vaip.cache) first? Alternatively, you can set XLNX_ENABLE_CACHE=0 to disable the cache.
No difference is observed after deleting the cache or setting XLNX_ENABLE_CACHE. BTW, the onnxruntime_vitisai_ep.dll provided in this version may not be properly compiled: the message box shows that an entry point could not be found. There could be a name mismatch caused by C++ name mangling.
Updated hello_world quantization to use enable_ipu_cnn
I have succeeded in running the demo "ipu_modelsx4_demo", but whether I enable or disable the IPU device in Device Manager, the demo runs flawlessly, and I can't tell if there is any difference.
I suggest releasing a single executable that runs some simple tests to confirm whether everything related to the IPU is working. It would be even better if the demo needed no environment setup (conda, Python, etc.); then everybody could use it to test their hardware, not only developers.
Another suggestion is examples in pure C/C++ and other low-level tools. In our recent projects, we use pure C/C++/CUDA for inference. To be honest, that makes life much easier. With Ryzen AI, we still don't want to employ any sophisticated solutions, but we need the low-level interfaces/tools and examples of how to use them.
A project like CUTLASS from NVIDIA is a good example. We don't use it, but from its code we can easily learn how to use their hardware effectively and efficiently.